From bulasevich at openjdk.java.net Sun Nov 1 17:14:12 2020
From: bulasevich at openjdk.java.net (Boris Ulasevich)
Date: Sun, 1 Nov 2020 17:14:12 GMT
Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v4]
In-Reply-To:
References:
Message-ID:

> Let me revive the change request [3] to C2 and AArch64 that applies the Bitfield Insert instruction in the expression "(v1 & 0xFF) | ((v2 & 0xFF) << 8)".
>
> Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test.
>
> As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new BitfieldInsert node has been introduced to replace the Or+Shift+And sequence when possible; on AArch64 a single BFI instruction is emitted for the new node.
>
> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html
> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html
> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html

Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision:

  adding conversion ((a & 0xff) << 8) + (b & 0xff) -> ((a & 0xff) << 8) | (b & 0xff)

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/511/files
  - new: https://git.openjdk.java.net/jdk/pull/511/files/e5833fec..ee95fc6e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=511&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=511&range=02-03

  Stats: 151 lines in 2 files changed: 101 ins; 46 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/511.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/511/head:pull/511

PR: https://git.openjdk.java.net/jdk/pull/511

From xliu at openjdk.java.net Sun Nov 1 21:13:56 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Sun, 1 Nov 2020 21:13:56 GMT
Subject: RFR: 8254807: Optimize
startsWith() for String.substring()
In-Reply-To:
References:
Message-ID: <2Gv8l6fRNQIoDnX6pGWDZwHXnXNFKrziGNYxddI3z_s=.f5a48e50-9ba8-4a23-bd47-bd83c2cb1e14@github.com>

On Sat, 31 Oct 2020 14:44:05 GMT, Claes Redestad wrote:

>> The optimization transforms code from s=substring(base, beg, end); s.startsWith(prefix)
>> to substring(base, beg, end) | base.startsWith(prefix, beg).
>>
>> it reduces uses of substring. hopefully c2 optimizer can remove the used substring.
>
> test/micro/org/openjdk/bench/vm/compiler/SubstringAndStartsWith.java line 44:
>
>> 42: @Measurement(iterations = 20, time = 200, timeUnit = TimeUnit.MILLISECONDS)
>> 43: @State(Scope.Benchmark)
>> 44: public class SubstringAndStartsWith {
>
> I'd put this micro in org.openjdk.bench.java.lang and call it SubstringStartsWith

Got it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/974

From redestad at openjdk.java.net Sun Nov 1 23:43:00 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Sun, 1 Nov 2020 23:43:00 GMT
Subject: RFR: 8255720: Optimize bci_to_dp/-data by enabling iteration over raw DataLayouts
Message-ID:

MethodData::bci_to_dp and ciMethodData::bci_to_data show up in startup/warmup profiles - much of the overhead comes from allocating resource objects when iterating over ProfileData objects (next_data).

Providing a means to iterate over the raw DataLayout objects allows us to avoid explicitly allocating resource objects for purposes of calculating the next DataLayout address for the most common types.

This patch reduces overhead of MethodData::bci_to_dp and ciMethodData::bci_to_data by 80% or more in profiles and has a measurable impact on simple startup tests, e.g. ~250k instruction (~0.2% of total) reduction on Hello World.
Testing: tier1-3 passed

-------------

Commit messages:
 - Merge branch 'master' into bci_to_dp_opt
 - Optimize bci_to_data in ciMethodData
 - Optimize bci_to_data similarly, remove dead code
 - Optimize bci_to_dp by enabling iteration over DataLayouts with as few allocations as possible

Changes: https://git.openjdk.java.net/jdk/pull/988/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=988&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255720
  Stats: 95 lines in 4 files changed: 67 ins; 1 del; 27 mod
  Patch: https://git.openjdk.java.net/jdk/pull/988.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/988/head:pull/988

PR: https://git.openjdk.java.net/jdk/pull/988

From redestad at openjdk.java.net Sun Nov 1 23:48:02 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Sun, 1 Nov 2020 23:48:02 GMT
Subject: RFR: 8255721: Remove no-op clean_weak_method_links methods
Message-ID: <8DMw8Uh95wEVAR3n7glM67s8w8Hp3htB2JqbkyfPSIA=.c875a1d1-7958-4c83-aa1c-05955a3e4ca3@github.com>

ProfileData:: and DataLayout::clean_weak_method_links are both virtual but empty methods with no overrides, so removing them and simplifying MethodData::clean_weak_method_links should be a minor clean-up and possibly speed up redefinition a notch.

-------------

Commit messages:
 - Clean-up no-op clean_weak_method_links

Changes: https://git.openjdk.java.net/jdk/pull/990/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=990&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255721
  Stats: 28 lines in 2 files changed: 0 ins; 28 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/990.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/990/head:pull/990

PR: https://git.openjdk.java.net/jdk/pull/990

From kvn at openjdk.java.net Mon Nov 2 01:53:55 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Mon, 2 Nov 2020 01:53:55 GMT
Subject: RFR: 8255720: Optimize bci_to_dp/-data by enabling iteration over raw DataLayouts
In-Reply-To:
References:
Message-ID:

On Sun, 1 Nov 2020 22:50:39 GMT, Claes Redestad wrote:

> MethodData::bci_to_dp and ciMethodData::bci_to_data show up in startup/warmup profiles - much of the overhead allocating resource objects when iterating over ProfileData objects (next_data)
>
> Providing a means to iterate over the raw DataLayout objects allow us to avoid explicitly allocating resource objects for purposes of calculating the next DataLayout address for the most common types.
>
> This patch reduces overhead of MethodData::bci_to_dp and ciMethodData::bci_to_data by 80% or more in profiles and has a measurable impact on simple startup tests, e.g. ~250k instruction (~0.2% of total) reduction on Hello World.
>
> Testing: tier1-3 passed

Nice!

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/988

From kvn at openjdk.java.net Mon Nov 2 02:04:54 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Mon, 2 Nov 2020 02:04:54 GMT
Subject: RFR: 8255721: Remove no-op clean_weak_method_links methods
In-Reply-To: <8DMw8Uh95wEVAR3n7glM67s8w8Hp3htB2JqbkyfPSIA=.c875a1d1-7958-4c83-aa1c-05955a3e4ca3@github.com>
References: <8DMw8Uh95wEVAR3n7glM67s8w8Hp3htB2JqbkyfPSIA=.c875a1d1-7958-4c83-aa1c-05955a3e4ca3@github.com>
Message-ID:

On Sun, 1 Nov 2020 23:32:29 GMT, Claes Redestad wrote:

> ProfileData:: and DataLayout::clean_weak_method_links are both virtual but empty methods with no overrides, so removing them and simplifying MethodData::clean_weak_method_links should be a minor clean-up and possibly speed-up redefinition a notch.

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/990

From ngasson at openjdk.java.net Mon Nov 2 04:11:55 2020
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Mon, 2 Nov 2020 04:11:55 GMT
Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v5]
In-Reply-To:
References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com>
Message-ID:

On Fri, 30 Oct 2020 04:34:19 GMT, Yasumasa Suenaga wrote:

> > Sure, the change looks good to me. However I don't understand why CSR is not needed. It introduces new dcmd for Linux.

I think it's because interfaces that are for diagnostic purposes don't require a CSR.
See question 4 on the CSR FAQs: https://wiki.openjdk.java.net/display/csr/CSR+FAQs

-------------

PR: https://git.openjdk.java.net/jdk/pull/760

From xliu at openjdk.java.net Mon Nov 2 06:49:55 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Mon, 2 Nov 2020 06:49:55 GMT
Subject: RFR: 8254807: Optimize startsWith() for String.substring()
In-Reply-To:
References:
Message-ID:

On Sat, 31 Oct 2020 14:48:46 GMT, Claes Redestad wrote:

>> The optimization transforms code from s=substring(base, beg, end); s.startsWith(prefix)
>> to substring(base, beg, end) | base.startsWith(prefix, beg).
>>
>> it reduces uses of substring. hopefully c2 optimizer can remove the used substring.
>
> test/micro/org/openjdk/bench/vm/compiler/SubstringAndStartsWith.java line 45:
>
>> 43: @State(Scope.Benchmark)
>> 44: public class SubstringAndStartsWith {
>> 45: @Param({"1", "8", "32", "128", "256", "512"})
>
> Does each param value pull its weight? I'd consider cutting down the default list to 2-3 variants (you can always specify more values on the command line, etc.)

To be honest, I don't know the average length of a substring in Java. I added so many parameters to show that this kind of pattern allocates and copies strings with O(n) complexity, where n is the length of the substring.

To take your advice, I think I can model them with 3 representative lengths. How about this?
1. small = 4, like a variable name
2. medium size = 24, like a URL or a file path
3.
long string = 256 like a human-readable message

-------------

PR: https://git.openjdk.java.net/jdk/pull/974

From thartmann at openjdk.java.net Mon Nov 2 07:31:56 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Mon, 2 Nov 2020 07:31:56 GMT
Subject: RFR: 8255721: Remove no-op clean_weak_method_links methods
In-Reply-To: <8DMw8Uh95wEVAR3n7glM67s8w8Hp3htB2JqbkyfPSIA=.c875a1d1-7958-4c83-aa1c-05955a3e4ca3@github.com>
References: <8DMw8Uh95wEVAR3n7glM67s8w8Hp3htB2JqbkyfPSIA=.c875a1d1-7958-4c83-aa1c-05955a3e4ca3@github.com>
Message-ID:

On Sun, 1 Nov 2020 23:32:29 GMT, Claes Redestad wrote:

> ProfileData:: and DataLayout::clean_weak_method_links are both virtual but empty methods with no overrides, so removing them and simplifying MethodData::clean_weak_method_links should be a minor clean-up and possibly speed-up redefinition a notch.

Looks good.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/990

From xliu at openjdk.java.net Mon Nov 2 07:35:56 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Mon, 2 Nov 2020 07:35:56 GMT
Subject: RFR: 8254807: Optimize startsWith() for String.substring()
In-Reply-To:
References:
Message-ID:

On Sat, 31 Oct 2020 14:59:16 GMT, Claes Redestad wrote:

>> The optimization transforms code from s=substring(base, beg, end); s.startsWith(prefix)
>> to substring(base, beg, end) | base.startsWith(prefix, beg).
>>
>> it reduces uses of substring. hopefully c2 optimizer can remove the used substring.
>
> Some comments and nits on the microbenchmark.
>
> A general comment is that I think it would be good to add variants exercising UTF16 Strings: one where `sample` has some UTF-16 chars, and one where both `sample` and `prefix` do (latin-1 `sample` and UTF-16 `prefix` could be interesting too, to ensure this variant shortcuts quickly).
>
> Should the `prefix` be something a bit more complex than a single char string?
`startsWith("a", off)` is a case that'd be tempting to optimize down to `charAt(off) == 'a'` and then this micro might no longer do what it intends to do.

@cl4es Thank you for taking the time to review this.
I understand you would like to see more variants, such as UTF16 strings and different prefixes. This API-level substitution actually doesn't care about the underlying representation of the string or of the prefix passed to startsWith; it works in the same way.

The purpose of this microbench is to prove that substring() is not inevitable in a certain pattern. JIT compilers can achieve performance similar to the hand-crafted code. Right now, I have only a single variable, which is the length of the substring. The result shows that the throughput is independent of the length of the substring. My concern is that the results would become hard to interpret if we introduce more than one variable. Or should I write a group of benchmarks?

> test/micro/org/openjdk/bench/vm/compiler/SubstringAndStartsWith.java line 68:
>
>> 66: // compare prefix length with the length of substring
>> 67: if (prefix.length() > substrLength) return false;
>> 68: return sample.startsWith(prefix, substrLength); // substrLength here is actually the beginIdex of substring
>
> Suggestion:
>
> return sample.startsWith(prefix, substrLength); // substrLength here is actually the beginIndex of substring

Thanks. I will fix it in the next revision.
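
For readers following the thread, here is a minimal sketch of the equivalence this optimization relies on. The class and method names (`SubstringStartsWithSketch`, `viaSubstring`, `viaOffset`) are hypothetical and are not taken from the patch or the benchmark; only `String.substring(int, int)` and `String.startsWith(String, int)` are real JDK methods. The sketch assumes valid indices (0 <= beg <= end <= base.length()), so it does not model the exception that `substring` would throw otherwise.

```java
final class SubstringStartsWithSketch {
    // Original shape: materialize the substring, then test its prefix.
    static boolean viaSubstring(String base, int beg, int end, String prefix) {
        return base.substring(beg, end).startsWith(prefix);
    }

    // Rewritten shape: test the prefix directly on the base string.
    // The length guard keeps the match from running past 'end'.
    static boolean viaOffset(String base, int beg, int end, String prefix) {
        return prefix.length() <= end - beg && base.startsWith(prefix, beg);
    }

    public static void main(String[] args) {
        String base = "hello world";
        // substring(6, 11) is "world": both forms agree on true.
        System.out.println(viaSubstring(base, 6, 11, "wor") + " " + viaOffset(base, 6, 11, "wor"));
        // substring(6, 8) is "wo": the prefix is longer, both forms agree on false.
        System.out.println(viaSubstring(base, 6, 8, "wor") + " " + viaOffset(base, 6, 8, "wor"));
    }
}
```

Without the length guard the second call would return true, because `base.startsWith("wor", 6)` is free to read beyond index 8.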

-------------

PR: https://git.openjdk.java.net/jdk/pull/974

From thartmann at openjdk.java.net Mon Nov 2 07:43:56 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Mon, 2 Nov 2020 07:43:56 GMT
Subject: RFR: 8255720: Optimize bci_to_dp/-data by enabling iteration over raw DataLayouts
In-Reply-To:
References:
Message-ID:

On Sun, 1 Nov 2020 22:50:39 GMT, Claes Redestad wrote:

> MethodData::bci_to_dp and ciMethodData::bci_to_data show up in startup/warmup profiles - much of the overhead allocating resource objects when iterating over ProfileData objects (next_data)
>
> Providing a means to iterate over the raw DataLayout objects allow us to avoid explicitly allocating resource objects for purposes of calculating the next DataLayout address for the most common types.
>
> This patch reduces overhead of MethodData::bci_to_dp and ciMethodData::bci_to_data by 80% or more in profiles and has a measurable impact on simple startup tests, e.g. ~250k instruction (~0.2% of total) reduction on Hello World.
>
> Testing: tier1-3 passed

Looks good to me! Some of the code formatting could be improved though (missing new lines between method definitions).

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/988

From thartmann at openjdk.java.net Mon Nov 2 08:22:00 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Mon, 2 Nov 2020 08:22:00 GMT
Subject: RFR: 8255665: C2 should aggressively remove temporary hook nodes
Message-ID:

C2 often creates temporary "hook" nodes to keep other nodes alive. Although dead, these are sometimes not removed because they don't end up on the IGVN worklist. JDK-8040213 added detection of modified nodes that are not re-processed by IGVN but currently ignores dead nodes.

This patch includes the following changes:
- Adjust detection of modified nodes such that dead nodes are included as well.
This revealed several locations where dead nodes are not eagerly destructed (or not even added to the worklist for later removal). I've fixed all of these.
- No need to yank node inputs before calling `destruct`.
- `kill_dead_code` accidentally re-adds dead nodes to the `_modified_nodes` list. `Compile::remove_modified_node` should be called at the end to avoid this.
- Some removal of dead code.

Tested with tier1-3, higher tiers are running.

JDK-8255670 will further improve detection.

Thanks,
Tobias

-------------

Commit messages:
 - 8255665: C2 should aggressively remove temporary hook nodes

Changes: https://git.openjdk.java.net/jdk/pull/994/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=994&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255665
  Stats: 62 lines in 11 files changed: 7 ins; 30 del; 25 mod
  Patch: https://git.openjdk.java.net/jdk/pull/994.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/994/head:pull/994

PR: https://git.openjdk.java.net/jdk/pull/994

From thartmann at openjdk.java.net Mon Nov 2 08:39:07 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Mon, 2 Nov 2020 08:39:07 GMT
Subject: RFR: 8255665: C2 should aggressively remove temporary hook nodes [v2]
In-Reply-To:
References:
Message-ID:

> C2 often creates temporary "hook" nodes to keep other nodes alive. Although dead, these are sometimes not removed because they don't end up on the IGVN worklist. JDK-8040213 added detection of modified nodes that are not re-processed by IGVN but currently ignores dead nodes.
>
> This patch includes the following changes:
> - Adjust detection of modified nodes such that dead nodes are included as well. This revealed several locations where dead nodes are not eagerly destructed (or not even added to the worklist for later removal). I've fixed all of these.
> - No need to yank node inputs before calling `destruct`.
> - `kill_dead_code` accidentally re-adds dead nodes to the `_modified_nodes` list.
`Compile::remove_modified_node` should be called at the end to avoid this.
> - Some removal of dead code.
>
> Tested with tier1-3, higher tiers are running.
>
> JDK-8255670 will further improve detection.
>
> Thanks,
> Tobias

Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:

  Restored handling of constant nodes

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/994/files
  - new: https://git.openjdk.java.net/jdk/pull/994/files/aecffa98..e8899406

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=994&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=994&range=00-01

  Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/994.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/994/head:pull/994

PR: https://git.openjdk.java.net/jdk/pull/994

From thartmann at openjdk.java.net Mon Nov 2 09:38:05 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Mon, 2 Nov 2020 09:38:05 GMT
Subject: RFR: 8255672: Replace PhaseTransform::eqv by pointer equality check
Message-ID:

`PhaseTransform::eqv(n1, n2)` can be replaced by `n1 == n2`. Code in adlc also refers to it but already uses `==` instead.

Thanks,
Tobias

-------------

Commit messages:
 - 8255672: Replace PhaseTransform::eqv by pointer equality check

Changes: https://git.openjdk.java.net/jdk/pull/999/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=999&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255672
  Stats: 138 lines in 10 files changed: 30 ins; 15 del; 93 mod
  Patch: https://git.openjdk.java.net/jdk/pull/999.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/999/head:pull/999

PR: https://git.openjdk.java.net/jdk/pull/999

From chagedorn at openjdk.java.net Mon Nov 2 10:05:57 2020
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 2 Nov 2020 10:05:57 GMT
Subject: RFR: 8255672: Replace PhaseTransform::eqv by pointer equality check
In-Reply-To:
References:
Message-ID: <6Vuu3pBxF_TUoq1Vll-gP9HhXoFnFJ10BXPuFWKVhsQ=.3a241bbe-3437-4eed-b5d4-de260ea041ed@github.com>

On Mon, 2 Nov 2020 09:29:40 GMT, Tobias Hartmann wrote:

> `PhaseTransform::eqv(n1, n2)` can be replaced by `n1 == n2`. Code in adlc also refers to it but already uses `==` instead.
>
> Thanks,
> Tobias

Looks good to me!

src/hotspot/share/opto/subnode.cpp line 458:

> 456: // Convert "x - (x+y)" into "-y"
> 457: return new SubFNode(phase->makecon(TypeF::ZERO),in(2)->in(2));
> 458: }

Missing space before `in(2)->in(2)`.

src/hotspot/share/opto/movenode.cpp line 129:

> 127: if (phase->type(in(Condition)) == TypeInt::ZERO) {
> 128: return in(IfFalse); // Always pick left(false) input
> 129: }

Could be merged with the `if` above but might be clearer if left as it is now.

src/hotspot/share/opto/divnode.cpp line 1174:

> 1172: if (in(1) == in(2)) {
> 1173: return TypeLong::ZERO;
> 1174: }

Could be merged with the `if` above.

src/hotspot/share/opto/divnode.cpp line 999:

> 997: if (in(1) == in(2)) {
> 998: return TypeInt::ZERO;
> 999: }

Could be merged with the `if` above.

src/hotspot/share/opto/callnode.cpp line 1232:

> 1230: if (in(0) == this) {
> 1231: return Type::TOP; // Dead infinite loop
> 1232: }

Could be merged with the `if` above but might be clearer if left as it is now.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/999

From mdoerr at openjdk.java.net Mon Nov 2 10:09:00 2020
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Mon, 2 Nov 2020 10:09:00 GMT
Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v3]
In-Reply-To:
References:
Message-ID:

On Sat, 31 Oct 2020 05:02:06 GMT, Ziviani wrote:

>> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0.
>> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
>> Ref: PowerISA 3.1, page 129.
>>
>> These instructions are particularly interesting to improve the following
>> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in
>> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>>
>> Long.toString, which generates such a pattern in getChars, has shown a
>> good performance gain by using these new instructions.
>>
>> Example:
>> for (int i = 0; i < 200_000; i++)
>> res = Long.toString((long)i);
>>
>> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString
>>
>> Without setbc (average): 0.1178 seconds
>> With setbc (average): 0.0396 seconds
>
> Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.

Thanks for doing this. Please check my inline comments.
If you would like to benchmark C1, you can use -XX:TieredStopAtLevel=1 to switch off C2.
When you factor the new logic out, I highly prefer to use it everywhere: C2, C1 (LIR_Assembler::comp_fl2i), interpreter (TemplateTable::lcmp, TemplateTable::float_cmp)

src/hotspot/cpu/ppc/ppc.ad line 11422:

> 11420:
> 11421: // Manifest a CmpL3 result in an integer register.
> 11422: instruct cmpL3_reg_reg_Ex(iRegIdst dst, iRegLsrc src1, iRegLsrc src2) %{

"_Ex" should be removed since it doesn't use expand any more.

src/hotspot/cpu/ppc/ppc.ad line 11425:

> 11423: match(Set dst (CmpL3 src1 src2));
> 11424: ins_cost(DEFAULT_COST*5);
> 11425: size(20);

VM_Version::has_brw() ? 16 : 20

src/hotspot/cpu/ppc/ppc.ad line 11427:

> 11425: size(20);
> 11426:
> 11427: format %{ "cmpL3_reg_reg_Ex $dst, $src1, $src2" %}

"_Ex" should be removed since it doesn't use expand any more.

src/hotspot/cpu/ppc/ppc.ad line 11441:

> 11439: __ srawi(R0, R0, 31);
> 11440: }
> 11441: __ orr($dst$$Register, $dst$$Register, R0);

Better factor this out to macroAssembler. E.g. MacroAssembler::set_cmp3(Register dst); // set dst to -1, 0, +1 depending on CR0

src/hotspot/cpu/ppc/ppc.ad line 11766:

> 11764:
> 11765: // Compare float, generate -1,0,1
> 11766: instruct cmpF3_reg_reg_Ex(iRegIdst dst, regF src1, regF src2) %{

"_Ex" should be removed since it doesn't use expand any more.

src/hotspot/cpu/ppc/ppc.ad line 11859:

> 11857:
> 11858: // Compare double, generate -1,0,1
> 11859: instruct cmpD3_reg_reg_Ex(iRegIdst dst, regD src1, regD src2) %{

"_Ex" should be removed since it doesn't use expand any more.

src/hotspot/cpu/ppc/ppc.ad line 11864:

> 11862: size(20);
> 11863:
> 11864: format %{ "cmpD3_reg_reg_Ex $dst, $src1, $src2" %}

"_Ex" should be removed since it doesn't use expand any more.

src/hotspot/cpu/ppc/ppc.ad line 11862:

> 11860: match(Set dst (CmpD3 src1 src2));
> 11861: ins_cost(DEFAULT_COST*5);
> 11862: size(20);

VM_Version::has_brw() ? 16 : 20

src/hotspot/cpu/ppc/ppc.ad line 11769:

> 11767: match(Set dst (CmpF3 src1 src2));
> 11768: ins_cost(DEFAULT_COST*5);
> 11769: size(20);

VM_Version::has_brw() ? 16 : 20

src/hotspot/cpu/ppc/ppc.ad line 11771:

> 11769: size(20);
> 11770:
> 11771: format %{ "cmpF3_reg_reg_Ex $dst, $src1, $src2" %}

"_Ex" should be removed since it doesn't use expand any more.

-------------

Changes requested by mdoerr (Reviewer).
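
To make the -1/0/+1 result discussed in this review concrete, here is a small Java model of the two instructions as they are described in the quoted PR text (setbc yields 1 when the condition bit is set, setnbc yields -1). It is only a sketch: `Cmp3Sketch`, `setbc`, `setnbc` and `cmp3` are hypothetical Java names, the boolean arguments stand in for condition register bits, and the actual instruction sequence in ppc.ad is not reproduced here. The OR at the end is an assumption about how the two partial results can be combined.

```java
final class Cmp3Sketch {
    // Model of setbc: 1 if the condition bit is set, otherwise 0.
    static int setbc(boolean crBit) {
        return crBit ? 1 : 0;
    }

    // Model of setnbc: -1 if the condition bit is set, otherwise 0.
    static int setnbc(boolean crBit) {
        return crBit ? -1 : 0;
    }

    // Three-way compare: at most one of the two bits is set,
    // so OR-ing the partial results gives -1, 0 or +1.
    static int cmp3(long a, long b) {
        boolean lt = a < b;
        boolean gt = a > b;
        return setnbc(lt) | setbc(gt);
    }

    public static void main(String[] args) {
        System.out.println(cmp3(1L, 2L)); // prints -1
        System.out.println(cmp3(2L, 2L)); // prints 0
        System.out.println(cmp3(3L, 2L)); // prints 1
    }
}
```

The result matches `Long.compare(a, b)` for these inputs, which is the contract a factored-out helper such as the suggested `set_cmp3` would presumably have to satisfy.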

PR: https://git.openjdk.java.net/jdk/pull/907

From redestad at openjdk.java.net Mon Nov 2 10:12:09 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Mon, 2 Nov 2020 10:12:09 GMT
Subject: RFR: 8255721: Remove no-op clean_weak_method_links methods [v2]
In-Reply-To: <8DMw8Uh95wEVAR3n7glM67s8w8Hp3htB2JqbkyfPSIA=.c875a1d1-7958-4c83-aa1c-05955a3e4ca3@github.com>
References: <8DMw8Uh95wEVAR3n7glM67s8w8Hp3htB2JqbkyfPSIA=.c875a1d1-7958-4c83-aa1c-05955a3e4ca3@github.com>
Message-ID:

> ProfileData:: and DataLayout::clean_weak_method_links are both virtual but empty methods with no overrides, so removing them and simplifying MethodData::clean_weak_method_links should be a minor clean-up and possibly speed-up redefinition a notch.

Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Merge branch 'master' into methoddata_cleanup
 - Remove undefined declaration
 - Clean-up no-op clean_weak_method_links

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/990/files
  - new: https://git.openjdk.java.net/jdk/pull/990/files/4d5c18f4..653a325f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=990&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=990&range=00-01

  Stats: 4 lines in 2 files changed: 0 ins; 2 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/990.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/990/head:pull/990

PR: https://git.openjdk.java.net/jdk/pull/990

From chagedorn at openjdk.java.net Mon Nov 2 10:41:55 2020
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 2 Nov 2020 10:41:55 GMT
Subject: RFR: 8255665: C2 should aggressively remove temporary hook nodes [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 2 Nov 2020 08:39:07 GMT, Tobias Hartmann wrote:

>> C2 often creates temporary
"hook" nodes to keep other nodes alive. Although dead, these are sometimes not removed because they don't end up on the IGVN worklist. JDK-8040213 added detection of modified nodes that are not re-processed by IGVN but currently ignores dead nodes. >> >> This patch includes the following changes: >> - Adjust detection of modified nodes such that dead nodes are includes as well. This revealed several locations were dead nodes are not eagerly destructed (or not even added to the worklist for later removal). I've fixed all of these. >> - No need to yank node inputs before calling `destruct`. >> - `kill_dead_code` accidentally re-adds dead nodes to the `_modified_nodes` list. `Compile::remove_modified_node` should be called at the end to avoid this. >> - Some removal of dead code. >> >> Tested with tier1-3, higher tiers are running. >> >> JDK-8255670 will further improve detection. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Restored handling of constant nodes Nice clean up, looks good to me! ------------- Marked as reviewed by chagedorn (Reviewer). 

PR: https://git.openjdk.java.net/jdk/pull/994

From ysuenaga at openjdk.java.net Mon Nov 2 11:03:59 2020
From: ysuenaga at openjdk.java.net (Yasumasa Suenaga)
Date: Mon, 2 Nov 2020 11:03:59 GMT
Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v5]
In-Reply-To: <1IQqVGMwEJYYpHWOQXdtJvIp3hs6mjEJkaMfTb3lwWo=.5cb02e47-d773-41ba-b206-b2657417124c@github.com>
References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> <1IQqVGMwEJYYpHWOQXdtJvIp3hs6mjEJkaMfTb3lwWo=.5cb02e47-d773-41ba-b206-b2657417124c@github.com>
Message-ID:

On Tue, 27 Oct 2020 04:21:33 GMT, Nick Gasson wrote:

>> When using the Linux "perf" tool to do system profiling, symbol names of
>> running Java methods cannot be decoded, resulting in unhelpful output
>> such as:
>>
>> 10.52% [JIT] tid 236748 [.] 0x00007f6fdb75d223
>>
>> Perf can read a simple text file format describing the mapping between
>> address ranges and symbol names for a particular process [1].
>>
>> It's possible to generate this already for Java processes using a JVMTI
>> plugin such as perf-map-agent [2]. However this requires compiling
>> third-party code and then loading the agent into your Java process. It
>> would be more convenient if Hotspot could write this file directly using
>> a diagnostic command. The information required is almost identical to
>> that of the existing Compiler.codelist command.
>>
>> This patch adds a Compiler.perfmap diagnostic command on Linux only. To
>> use, first run "jcmd Compiler.perfmap" and then "perf top" or
>> "perf record" and the report should show decoded Java symbol names for
>> that process.
>>
>> As this just writes a snapshot of the code cache when the command is
>> run, it will become stale if methods are compiled later or unloaded.
>> However this shouldn't be a big problem in practice if the map file is
>> generated after the application has warmed up.
>>
>> [1] https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt
>> [2] https://github.com/jvm-profiling-tools/perf-map-agent
>
> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision:
>
> Make DumpPerfMapAtExit a diagnostic option

Ok, I reviewed this change.
I guess you should close [JDK-8254723](https://bugs.openjdk.java.net/browse/JDK-8254723).

-------------

Marked as reviewed by ysuenaga (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/760

From rwestrel at redhat.com Mon Nov 2 12:23:23 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 02 Nov 2020 13:23:23 +0100
Subject: Bad graph detected in build_loop_late but I have no clue
In-Reply-To:
References:
Message-ID: <871rhcdk1w.fsf@redhat.com>

> Doesn't that paper describe global code motion? Is build_loop_early/late() actually doing code motion before real loop optimizations?
>
> In particular, I don't understand this statement.
> set_ctrl(n, least);
>
> Is this set_ctrl(Y, X) saying Y control dependent on X. The definition of control dependence is in [2]?

No, it's not a control dependence. A control dependence would be an edge between Y and X. build_loop_early/late() is not doing code motion either. In that case, Y is a floating node. So there's no need for code motion as it can freely move within the constraints encoded by input and output edges. What build_loop_early/late() does is find the earliest legal placement for a floating node (after all its inputs), the latest legal placement (before all its uses) and the best placement (as late as possible but out of loops if possible). That placement is then recorded by set_ctrl(), that is, by associating a floating node with a control node (which cannot float). The result is not a schedule of nodes either because how 2 floating nodes should be ordered is not computed and recorded.

> So far, I found that PhaseIdealLoop::_nodes[IDX] can be any of 3 different values.
>
> 1. NULL, which means IDX is dead.
> 2. A CFG node with the lowest bit set. Assigned by set_ctrl.
> 3. IdealLoopTree*, when this node is the head of a loop.
>
> Do I understand this data-structure right? So PhaseIdealLoop doesn't have "BasicBlocks" and it uses _nodes to mark where a node belongs to?

Yes, so for instance if you need to know what loop a floating node is part of, you first retrieve its control with get_ctrl() and then the loop that the control node is part of with get_loop().

> I understand that (legal->is_Start() && !early->is_Root()) is a legit assertion, I believe I mess up the ideal graph somewhere and cause this fiasco.
> Could you give me a pointer which node is broken? Or, could you share me with some hints how to debug this kind of problem?

That:

> idom[3] 760 If === 522 559 [[ 1604 486 ]] P=0.999999, C=-1.000000 !jvms: Handler::parseURL @ bci:88
> idom[4] 522 Region === 522 799 800 [[ 522 531 253 254 680 760 ]] !jvms: String::coder @ bci:3 String::length @ bci:6 String::regionMatches @ bci:27 Handler::parseURL @ bci:75

and

> idom[22] 531 If === 522 559 [[ 940 261 ]] P=0.999999, C=-1.000000 !jvms: String::coder @ bci:14 String::length @ bci:6 String::regionMatches @ bci:27 Handler::parseURL @ bci:75
> idom[23] 522 Region === 522 799 800 [[ 522 531 253 254 680 760 ]] !jvms: String::coder @ bci:3 String::length @ bci:6 String::regionMatches @ bci:27 Handler::parseURL @ bci:75

Doesn't seem right. Both If nodes have Region 522 as control input. That's not legal. There should only be one control use for 522.

Roland.

From rwestrel at redhat.com Mon Nov 2 12:49:34 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 02 Nov 2020 13:49:34 +0100
Subject: Question about a few properties of ideal graph
In-Reply-To: <171D40B4-23CB-49E0-8850-4C4EF021C972@amazon.com>
References: <171D40B4-23CB-49E0-8850-4C4EF021C972@amazon.com>
Message-ID: <87y2jjdiu9.fsf@redhat.com>

> 1. Useless.
> [1] defines an operation as useless if no operation uses its results, or if all uses of the results are dead (10.2)
> Presumably, a node is useless if it's not useful. Can I say identify_useful_nodes() is the same as the definition above?
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L302

I would say yes. In the ideal graph, nodes that have no use are useless and can be removed. identify_useful_nodes() is applied after parsing to remove nodes that have no use, because parsing leaves some behind.

> 2. Unreachable. In my understanding, a node is unreachable if control flow can never get there. I feel this definition only fits in CFG. Is it still the same meaning in ideal graph?
>
> According to the code here: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/node.cpp#L744,
> it looks like a node is unreachable if
> 1) no use or
> 2) its type is TOP or
> 3) its control input node's is_top() is true.
>
> Is it complementary to Node::is_reachable_from_root()? To be honest, I don't understand why finding the root from a node by following uses using BFS is the same thing as the control flow can reach it from the root.

I would say 1) is not unreachable. It can still be reachable but it is useless. Not all nodes have a control input (AddNode for instance), so looking only at control inputs can't be sufficient. I would say a node is unreachable if one of its required inputs is top or NULL.

Region/Phi are special cases because they merge multiple control flows. So if one of their inputs becomes top but all other inputs are not, the Region/Phi is still reachable.

> 3. dead: dead is everywhere in c2. I feel it could refer to different things in different places.
>    1. useless? e.g. Compile::_dead_node_list
>       https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L326
>    2. no direct use e.g. outcnt() == 0 is dead.
>       https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/phaseX.hpp#L511
>    3.
>       sometimes, I feel c2 removes a dead node because it's unreachable.
>
> Is there a definition of dead node in c2? Or are dead/useless/unreachable all the same thing in ideal graph?

Not sure if there's a consistent use of "dead" in the source base.

> 4. top. What's the semantic if a node's is_top() is true? Is it the same thing as its type is TOP?

Node::is_top() returns true for the unique top node (constant node that has type TOP). I don't think you can say the type TOP and the node top are the same thing. Let's say you have an AddNode with a top input. The AddNode now has type TOP, which at GVN would lead that AddNode to be replaced by the top node.

> In type lattice, TOP is vague to me too. I feel that the type of a node is TOP has a slightly different meaning for different nodes. If the node is a CFG node, TOP seems to mean control flow can't reach it.
> If a value node's type is TOP, I guess it means the value of this node is undefined. Am I correct here?

I don't think TOP has a different meaning for control nodes. One thing about the ideal graph is that control flow is often treated as just another dependence.

> 5. root node
> Can I say the root node of each compilation unit (CU) is the beginning and the end of that CU?

I think so.

> So far, I feel the inputs of a root node are return values and side effect. Is that correct?

Side effect is a bit vague, I guess. Return and other exits from the method such as deoptimization. Maybe there are other edges added from the root node, maybe temporarily, to keep some nodes from becoming useless. Not sure.

> If I traverse uses of nodes from root, I should return to root eventually? If yes, it means ideal graph is a DAG.

Yes, you should. But it's not a DAG anyway because there can be loops.

Roland.
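[Reading aid, not part of the original message.] The distinction drawn above between a useless node (nothing uses its result) and an unreachable node (one of its required inputs is top or NULL) can be sketched with a toy model. This is a minimal illustration under stated assumptions, not HotSpot code: `ToyNode`, `isUseless` and `isUnreachable` are hypothetical names, and the Region/Phi special case is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, not part of C2.
final class ToyNode {
    // Stand-in for the unique top node.
    static final ToyNode TOP = new ToyNode("top");

    final String name;
    final List<ToyNode> inputs = new ArrayList<>(); // required inputs
    final List<ToyNode> uses = new ArrayList<>();   // nodes consuming this node

    ToyNode(String name, ToyNode... in) {
        this.name = name;
        for (ToyNode n : in) {
            inputs.add(n);
            if (n != null) {
                n.uses.add(this); // keep the reverse (use) edge in sync
            }
        }
    }

    // "Useless" in the sense discussed above: no node uses the result.
    boolean isUseless() {
        return uses.isEmpty();
    }

    // "Unreachable" in the sense discussed above: a required input is top or null.
    // Region/Phi, which stay reachable with one top input, are not modelled here.
    boolean isUnreachable() {
        for (ToyNode in : inputs) {
            if (in == null || in == TOP) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyNode a = new ToyNode("a");
        ToyNode b = new ToyNode("b");
        ToyNode add = new ToyNode("add", a, b);      // no uses, inputs are fine
        ToyNode add2 = new ToyNode("add2", a, TOP);  // has a use, but a top input
        ToyNode ret = new ToyNode("return", add2);

        System.out.println(add.name + ": useless=" + add.isUseless()
                + " unreachable=" + add.isUnreachable());
        System.out.println(add2.name + ": useless=" + add2.isUseless()
                + " unreachable=" + add2.isUnreachable());
        System.out.println(ret.name + " has " + ret.inputs.size() + " input(s)");
    }
}
```

In this model the two properties are independent: `add` is useless but not unreachable, while `add2` is unreachable but not useless.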
From roland at openjdk.java.net Mon Nov 2 13:07:08 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 2 Nov 2020 13:07:08 GMT Subject: RFR: 8255400: Shenandoah: C2 failures after JDK-8255000 Message-ID: At barrier expansion time, the IR graph may contain a Halt node whose control is a region. In that case, code that wires raw memory creates a memory Phi at the region. But that Phi has no use because the Halt node doesn't consume any memory. That dead Phi causes the assert to trigger. I propose some adjustments so a Phi is not created in that case. ------------- Commit messages: - jcheck - more test - fix - test Changes: https://git.openjdk.java.net/jdk/pull/1000/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1000&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255400 Stats: 87 lines in 2 files changed: 81 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/1000.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1000/head:pull/1000 PR: https://git.openjdk.java.net/jdk/pull/1000 From redestad at openjdk.java.net Mon Nov 2 13:09:08 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 2 Nov 2020 13:09:08 GMT Subject: RFR: 8255720: Optimize bci_to_dp/-data by enabling iteration over raw DataLayouts [v2] In-Reply-To: References: Message-ID: > MethodData::bci_to_dp and ciMethodData::bci_to_data show up in startup/warmup profiles - much of the overhead allocating resource objects when iterating over ProfileData objects (next_data) > > Providing a means to iterate over the raw DataLayout objects allow us to avoid explicitly allocating resource objects for purposes of calculating the next DataLayout address for the most common types. > > This patch reduces overhead of MethodData::bci_to_dp and ciMethodData::bci_to_data by 80% or more in profiles and has a measurable impact on simple startup tests, e.g. ~250k instruction (~0.2% of total) reduction on Hello World. 
> > Testing: tier1-3 passed Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - More lines - Merge branch 'master' into bci_to_dp_opt - Merge branch 'master' into bci_to_dp_opt - Optimize bci_to_data in ciMethodData - Optimize bci_to_data similarly, remove dead code - Optimize bci_to_dp by enabling iteration over DataLayouts with as few allocations as possible ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/988/files - new: https://git.openjdk.java.net/jdk/pull/988/files/0d8e8f04..b12ff3b3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=988&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=988&range=00-01 Stats: 682 lines in 21 files changed: 593 ins; 37 del; 52 mod Patch: https://git.openjdk.java.net/jdk/pull/988.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/988/head:pull/988 PR: https://git.openjdk.java.net/jdk/pull/988 From thartmann at openjdk.java.net Mon Nov 2 13:14:59 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 2 Nov 2020 13:14:59 GMT Subject: RFR: 8255721: Remove no-op clean_weak_method_links methods [v2] In-Reply-To: References: <8DMw8Uh95wEVAR3n7glM67s8w8Hp3htB2JqbkyfPSIA=.c875a1d1-7958-4c83-aa1c-05955a3e4ca3@github.com> Message-ID: On Mon, 2 Nov 2020 10:12:09 GMT, Claes Redestad wrote: >> ProfileData:: and DataLayout::clean_weak_method_links are both virtual but empty methods with no overrides, so removing them and simplifying MethodData::clean_weak_method_links should be a minor clean-up and possibly speed-up redefinition a notch. > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into methoddata_cleanup > - Remove undefined declaration > - Clean-up no-op clean_weak_method_links Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/990 From thartmann at openjdk.java.net Mon Nov 2 13:16:02 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 2 Nov 2020 13:16:02 GMT Subject: RFR: 8255720: Optimize bci_to_dp/-data by enabling iteration over raw DataLayouts [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 13:09:08 GMT, Claes Redestad wrote: >> MethodData::bci_to_dp and ciMethodData::bci_to_data show up in startup/warmup profiles - much of the overhead allocating resource objects when iterating over ProfileData objects (next_data) >> >> Providing a means to iterate over the raw DataLayout objects allow us to avoid explicitly allocating resource objects for purposes of calculating the next DataLayout address for the most common types. >> >> This patch reduces overhead of MethodData::bci_to_dp and ciMethodData::bci_to_data by 80% or more in profiles and has a measurable impact on simple startup tests, e.g. ~250k instruction (~0.2% of total) reduction on Hello World. >> >> Testing: tier1-3 passed > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - More lines > - Merge branch 'master' into bci_to_dp_opt > - Merge branch 'master' into bci_to_dp_opt > - Optimize bci_to_data in ciMethodData > - Optimize bci_to_data similarly, remove dead code > - Optimize bci_to_dp by enabling iteration over DataLayouts with as few allocations as possible Looks good, thanks for making these changes. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/988 From redestad at openjdk.java.net Mon Nov 2 13:23:57 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 2 Nov 2020 13:23:57 GMT Subject: RFR: 8255720: Optimize bci_to_dp/-data by enabling iteration over raw DataLayouts [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 01:50:52 GMT, Vladimir Kozlov wrote: >> Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - More lines >> - Merge branch 'master' into bci_to_dp_opt >> - Merge branch 'master' into bci_to_dp_opt >> - Optimize bci_to_data in ciMethodData >> - Optimize bci_to_data similarly, remove dead code >> - Optimize bci_to_dp by enabling iteration over DataLayouts with as few allocations as possible > > Nice! @vnkozlov @TobiHartmann - thank you for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/988 From rkennke at openjdk.java.net Mon Nov 2 13:24:57 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 2 Nov 2020 13:24:57 GMT Subject: RFR: 8255400: Shenandoah: C2 failures after JDK-8255000 In-Reply-To: References: Message-ID: <6L73HWSfNYZTr5AT1zuVXK0U6f8QVXw11drbIee3Q-A=.564585b7-5532-4699-b502-7b1d940cfb86@github.com> On Mon, 2 Nov 2020 09:51:01 GMT, Roland Westrelin wrote: > At barrier expansion time, the IR graph may contain a Halt node whose > control is a region. In that case, code that wires raw memory creates > a memory Phi at the region. But that Phi has no use because the Halt > node doesn't consume any memory. That dead Phi causes the assert to > trigger. I propose some adjustments so a Phi is not created in that > case. Looks good to me! Thank you! ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1000 From redestad at openjdk.java.net Mon Nov 2 13:25:01 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 2 Nov 2020 13:25:01 GMT Subject: RFR: 8255721: Remove no-op clean_weak_method_links methods [v2] In-Reply-To: References: <8DMw8Uh95wEVAR3n7glM67s8w8Hp3htB2JqbkyfPSIA=.c875a1d1-7958-4c83-aa1c-05955a3e4ca3@github.com> Message-ID: On Mon, 2 Nov 2020 02:02:17 GMT, Vladimir Kozlov wrote: >> Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into methoddata_cleanup >> - Remove undefined declaration >> - Clean-up no-op clean_weak_method_links > > Good. @vnkozlov @TobiHartmann - thank you for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/990 From redestad at openjdk.java.net Mon Nov 2 13:31:57 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 2 Nov 2020 13:31:57 GMT Subject: Integrated: 8255720: Optimize bci_to_dp/-data by enabling iteration over raw DataLayouts In-Reply-To: References: Message-ID: On Sun, 1 Nov 2020 22:50:39 GMT, Claes Redestad wrote: > MethodData::bci_to_dp and ciMethodData::bci_to_data show up in startup/warmup profiles - much of the overhead allocating resource objects when iterating over ProfileData objects (next_data) > > Providing a means to iterate over the raw DataLayout objects allow us to avoid explicitly allocating resource objects for purposes of calculating the next DataLayout address for the most common types. > > This patch reduces overhead of MethodData::bci_to_dp and ciMethodData::bci_to_data by 80% or more in profiles and has a measurable impact on simple startup tests, e.g. ~250k instruction (~0.2% of total) reduction on Hello World. 
> > Testing: tier1-3 passed This pull request has now been integrated. Changeset: 120aec70 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/120aec70 Stats: 99 lines in 4 files changed: 71 ins; 1 del; 27 mod 8255720: Optimize bci_to_dp/-data by enabling iteration over raw DataLayouts Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/988 From redestad at openjdk.java.net Mon Nov 2 13:31:58 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 2 Nov 2020 13:31:58 GMT Subject: Integrated: 8255721: Remove no-op clean_weak_method_links methods In-Reply-To: <8DMw8Uh95wEVAR3n7glM67s8w8Hp3htB2JqbkyfPSIA=.c875a1d1-7958-4c83-aa1c-05955a3e4ca3@github.com> References: <8DMw8Uh95wEVAR3n7glM67s8w8Hp3htB2JqbkyfPSIA=.c875a1d1-7958-4c83-aa1c-05955a3e4ca3@github.com> Message-ID: On Sun, 1 Nov 2020 23:32:29 GMT, Claes Redestad wrote: > ProfileData:: and DataLayout::clean_weak_method_links are both virtual but empty methods with no overrides, so removing them and simplifying MethodData::clean_weak_method_links should be a minor clean-up and possibly speed-up redefinition a notch. This pull request has now been integrated. 
Changeset: 4b775e64 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/4b775e64 Stats: 29 lines in 2 files changed: 0 ins; 29 del; 0 mod 8255721: Remove no-op clean_weak_method_links methods Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/990 From thartmann at openjdk.java.net Mon Nov 2 14:33:56 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 2 Nov 2020 14:33:56 GMT Subject: RFR: 8255665: C2 should aggressively remove temporary hook nodes [v2] In-Reply-To: References: Message-ID: <7a1meIvsgTw-HfZ0QSPZl3nQ1MPDhRBq1OfDvG3tlYs=.134a2f16-a9e2-463a-986e-08fe94d5358b@github.com> On Mon, 2 Nov 2020 10:38:49 GMT, Christian Hagedorn wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Restored handling of constant nodes > > Nice clean up, looks good to me! Thanks Christian! ------------- PR: https://git.openjdk.java.net/jdk/pull/994 From adinn at openjdk.java.net Mon Nov 2 14:37:02 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Mon, 2 Nov 2020 14:37:02 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v4] In-Reply-To: References: Message-ID: On Sun, 1 Nov 2020 17:14:12 GMT, Boris Ulasevich wrote: >> Let me revive the change request [3] to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 & 0xFF) | ((v2 & 0xFF) << 8)". >> >> Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test. >> >> As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new BitfieldInsert node has been introduced to replace Or+Shift+And sequence when possible, on AARCH a single BFI instruction is emitted for the new node. 
>> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html >> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html >> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > adding conversion ((a & 0xff) << 8) + (b & 0xff) -> ((a & 0xff) << 8) | (b & 0xff) src/hotspot/share/opto/addnode.cpp line 239: > 237: > 238: //---------------------------Bitmask helper------------------------------------ > 239: // Get a bitmask for expression: "(a & 0xFF) << 8" -> 0xFF00 This comment is thoroughly misleading and inadequate. Please describe correctly what this routine does and why. src/hotspot/share/opto/addnode.cpp line 260: > 258: > 259: static julong get_bitmaskL(PhaseGVN *phase, Node* value) { > 260: int opcode = value->Opcode(); If you are not going to provide an equivalent comment to explain this routine then at least point the reader back to the comment for the preceding routine. src/hotspot/share/opto/addnode.cpp line 363: > 361: } > 362: > 363: // Convert "((a & 0xFF) << 8) + (b & 0xFF)" into "((a & 0xFF) << 8) | (b & 0xFF)" Where do the constants 0xFF and 8 in this comment come from? src/hotspot/share/opto/addnode.cpp line 488: > 486: > 487: // Convert "((a & 0xFF) << 8) + (b & 0xFF)" into "((a & 0xFF) << 8) | (b & 0xFF)" > 488: julong bitmask1 = get_bitmaskL(phase, in(1)); Same question as before. 
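[Reading aid, not part of the original message.] On the question of where 0xFF and 8 come from: as far as the quoted comment shows, they are one concrete instance of a more general identity. An addition can be replaced by a bitwise OR whenever the two operands cannot have a set bit in common, because then no carry is produced. A minimal Java sketch of that identity follows; `canReplaceAddWithOr` and the mask arguments are hypothetical names chosen for illustration, not code from the patch under review.

```java
// Illustration only: hypothetical helper, not code from the patch under review.
final class AddToOrDemo {
    // mask1 and mask2 describe which bits each operand may have set.
    // If the masks are disjoint, x + y never carries, so x + y == (x | y).
    static boolean canReplaceAddWithOr(long mask1, long mask2) {
        return (mask1 & mask2) == 0;
    }

    public static void main(String[] args) {
        long mask1 = 0xFFL << 8; // possible bits of ((a & 0xFF) << 8)
        long mask2 = 0xFFL;      // possible bits of (b & 0xFF)
        System.out.println("disjoint masks: " + canReplaceAddWithOr(mask1, mask2));

        // Spot-check the identity for the expression quoted in the review.
        for (long a = 0; a < 1024; a += 37) {
            for (long b = 0; b < 1024; b += 41) {
                long x = (a & 0xFF) << 8;
                long y = b & 0xFF;
                if (x + y != (x | y)) {
                    throw new AssertionError("mismatch for a=" + a + ", b=" + b);
                }
            }
        }

        // With overlapping masks the identity can fail: 1 + 1 is 2, but 1 | 1 is 1.
        System.out.println("overlapping masks: " + canReplaceAddWithOr(0xFFL, 0x01L));
    }
}
```

Under this reading, a comment in the source could state the disjoint-mask condition directly and mention `(a & 0xFF) << 8` only as an example.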
-------------

PR: https://git.openjdk.java.net/jdk/pull/511

From adinn at openjdk.java.net Mon Nov 2 14:40:55 2020
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Mon, 2 Nov 2020 14:40:55 GMT
Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v4]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Nov 2020 17:14:12 GMT, Boris Ulasevich wrote:

>> Let me revive the change request [3] to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 & 0xFF) | ((v2 & 0xFF) << 8)".
>>
>> Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test.
>>
>> As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new BitfieldInsert node has been introduced to replace Or+Shift+And sequence when possible, on AARCH a single BFI instruction is emitted for the new node.
>>
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html
>> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html
>> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html
>
> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   adding conversion ((a & 0xff) << 8) + (b & 0xff) -> ((a & 0xff) << 8) | (b & 0xff)

src/hotspot/share/opto/addnode.cpp line 848:

> 846:   Node *src = in(2);
> 847:   int offset = 0;
> 848:   // Convert Or(v1, Shift(And(v2, 0xFF), shift)) into BitfieldInsert(dst, src, width, offset)

This comment is incoherent, misleading and totally inadequate. What is the 'Or' there for? Why does it describe an And nested in a Shift when the code expects other cases? This is no use whatsoever to anyone who has to maintain your code.
------------- PR: https://git.openjdk.java.net/jdk/pull/511 From thartmann at openjdk.java.net Mon Nov 2 14:54:07 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 2 Nov 2020 14:54:07 GMT Subject: RFR: 8255672: Replace PhaseTransform::eqv by pointer equality check [v2] In-Reply-To: References: Message-ID: > `PhaseTransform::eqv(n1, n2)` can be replaced by `n1 == n2`. Code in adlc also refers to it but already uses `==` instead. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Added missing whitespace ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/999/files - new: https://git.openjdk.java.net/jdk/pull/999/files/ecfac16c..9731df02 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=999&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=999&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/999.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/999/head:pull/999 PR: https://git.openjdk.java.net/jdk/pull/999 From thartmann at openjdk.java.net Mon Nov 2 14:54:08 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 2 Nov 2020 14:54:08 GMT Subject: RFR: 8255672: Replace PhaseTransform::eqv by pointer equality check [v2] In-Reply-To: <6Vuu3pBxF_TUoq1Vll-gP9HhXoFnFJ10BXPuFWKVhsQ=.3a241bbe-3437-4eed-b5d4-de260ea041ed@github.com> References: <6Vuu3pBxF_TUoq1Vll-gP9HhXoFnFJ10BXPuFWKVhsQ=.3a241bbe-3437-4eed-b5d4-de260ea041ed@github.com> Message-ID: On Mon, 2 Nov 2020 10:03:25 GMT, Christian Hagedorn wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Added missing whitespace > > Looks good to me! Thanks for the review Christian. I'd like to not merge the ifs because I think these are separate cases and I would like to keep the impact on surrounding code to a minimum. 
Are you okay with that? > src/hotspot/share/opto/subnode.cpp line 458: > >> 456: // Convert "x - (x+y)" into "-y" >> 457: return new SubFNode(phase->makecon(TypeF::ZERO),in(2)->in(2)); >> 458: } > > Missing space before `in(2)->in(2)`. Good catch! ------------- PR: https://git.openjdk.java.net/jdk/pull/999 From chagedorn at openjdk.java.net Mon Nov 2 15:09:58 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 2 Nov 2020 15:09:58 GMT Subject: RFR: 8255672: Replace PhaseTransform::eqv by pointer equality check [v2] In-Reply-To: References: Message-ID: <_4kVZ1J5tXaWgKsuK6iFENPAF3TOhp7Pysnbau5Cl48=.27a62ef0-d091-4d36-a2ab-c489880d2cd6@github.com> On Mon, 2 Nov 2020 14:54:07 GMT, Tobias Hartmann wrote: >> `PhaseTransform::eqv(n1, n2)` can be replaced by `n1 == n2`. Code in adlc also refers to it but already uses `==` instead. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added missing whitespace Yes, I think that is reasonable to leave the code as it is and only do the clean-ups directly related to `PhaseTransform::eqv`. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/999 From redestad at openjdk.java.net Mon Nov 2 15:17:55 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 2 Nov 2020 15:17:55 GMT Subject: RFR: 8255672: Replace PhaseTransform::eqv by pointer equality check [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 14:54:07 GMT, Tobias Hartmann wrote: >> `PhaseTransform::eqv(n1, n2)` can be replaced by `n1 == n2`. Code in adlc also refers to it but already uses `==` instead. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added missing whitespace Nice cleanup! ------------- Marked as reviewed by redestad (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/999

From roland at openjdk.java.net Mon Nov 2 15:53:07 2020
From: roland at openjdk.java.net (Roland Westrelin)
Date: Mon, 2 Nov 2020 15:53:07 GMT
Subject: Integrated: 8255400: Shenandoah: C2 failures after JDK-8255000
In-Reply-To:
References:
Message-ID:

On Mon, 2 Nov 2020 09:51:01 GMT, Roland Westrelin wrote:

> At barrier expansion time, the IR graph may contain a Halt node whose
> control is a region. In that case, code that wires raw memory creates
> a memory Phi at the region. But that Phi has no use because the Halt
> node doesn't consume any memory. That dead Phi causes the assert to
> trigger. I propose some adjustments so a Phi is not created in that
> case.

This pull request has now been integrated.

Changeset: a3aad119
Author: Roland Westrelin
URL: https://git.openjdk.java.net/jdk/commit/a3aad119
Stats: 87 lines in 2 files changed: 81 ins; 0 del; 6 mod

8255400: Shenandoah: C2 failures after JDK-8255000

Reviewed-by: rkennke

-------------

PR: https://git.openjdk.java.net/jdk/pull/1000

From roland at openjdk.java.net Mon Nov 2 16:22:12 2020
From: roland at openjdk.java.net (Roland Westrelin)
Date: Mon, 2 Nov 2020 16:22:12 GMT
Subject: RFR: 8255150: Add utility methods to check long indexes and ranges
Message-ID:

This change adds 3 new methods to Objects:

  public static long checkIndex(long index, long length)
  public static long checkFromToIndex(long fromIndex, long toIndex, long length)
  public static long checkFromIndexSize(long fromIndex, long size, long length)

This mirrors the int utility methods that were added by JDK-8135248, with the same motivations.

As is the case with the int checkIndex(), the long checkIndex() method is JIT compiled as an intrinsic. This allows the JIT to compile checkIndex to an unsigned comparison and properly recognize it as a range check, which then becomes a candidate for the existing range check optimizations.
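[Reading aid, not part of the original message.] The "unsigned comparison" mentioned above relies on a standard trick: for a non-negative length, the two-sided test `0 <= index && index < length` is equivalent to a single unsigned less-than, because a negative index reinterpreted as unsigned is larger than any non-negative length. The sketch below illustrates that equivalence in plain Java; `checkIndexSketch` is a hypothetical stand-in written for this note, not the JDK implementation or the C2 intrinsic.

```java
// Sketch only: a hypothetical stand-in, not the JDK implementation.
final class CheckIndexSketch {
    // Returns index if 0 <= index < length, otherwise throws.
    // The unsigned comparison is only equivalent when length >= 0,
    // so a negative length is rejected explicitly first.
    static long checkIndexSketch(long index, long length) {
        if (length < 0 || Long.compareUnsigned(index, length) >= 0) {
            throw new IndexOutOfBoundsException(
                    "Index " + index + " out of bounds for length " + length);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(checkIndexSketch(3L, 10L)); // in range
        try {
            checkIndexSketch(-1L, 10L); // negative index fails the same single test
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

One comparison instead of two is what lets the check be treated like an array bounds check by the existing range check optimizations.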
This has proven to be important for panama's MemorySegment API, and a prototype of this change (with some extra c2 improvements) showed that panama micro benchmark results improve significantly.

This change includes:

- the API change
- the C2 intrinsic
- tests for the API and the C2 intrinsic

This is joint work with Paul, who reviewed and reworked the API change and filed the CSR.

-------------

Commit messages:
 - Update headers and add intrinsic to Graal test ignore list
 - move compiler test and add bug to test
 - non x86_64 arch support
 - c2 test case
 - intrinsic
 - Use overloads of method names.
 - Vladimir's comments
 - checkLongIndex

Changes: https://git.openjdk.java.net/jdk/pull/1003/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1003&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8255150
Stats: 895 lines in 30 files changed: 848 ins; 1 del; 46 mod
Patch: https://git.openjdk.java.net/jdk/pull/1003.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1003/head:pull/1003

PR: https://git.openjdk.java.net/jdk/pull/1003

From rwestrel at redhat.com Mon Nov 2 16:32:35 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 02 Nov 2020 17:32:35 +0100
Subject: Question about a few properties of ideal graph
In-Reply-To: <87y2jjdiu9.fsf@redhat.com>
References: <171D40B4-23CB-49E0-8850-4C4EF021C972@amazon.com> <87y2jjdiu9.fsf@redhat.com>
Message-ID: <87pn4vd8ik.fsf@redhat.com>

> Region/Phi are special cases because they merge multiple control
> flows. So if one of their inputs become top but all other inputs are
> not, the Region/Phi is still reachable.

Actually, to be accurate, it also depends on whether the Region is a loop head or not. A back edge becoming top doesn't make the Region unreachable.

Roland.
From github.com+58006833+xbzhang99 at openjdk.java.net Mon Nov 2 16:52:04 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 2 Nov 2020 16:52:04 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms Message-ID: Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number ------------- Commit messages: - Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large Changes: https://git.openjdk.java.net/jdk/pull/894/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255368 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Mon Nov 2 16:52:05 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 2 Nov 2020 16:52:05 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 04:32:41 GMT, Xubo Zhang wrote: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number I am from Intel Corp. 
Intel is OCA signatory ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From burban at openjdk.java.net Mon Nov 2 17:14:09 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 17:14:09 GMT Subject: RFR: 8255766: Fix linux+arm64 build after 8254072 Message-ID: Fixing this problem: 1501 | assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same"); Verified via a linux+arm64 slowdebug build and a windows+arm64 slowdebug build. ------------- Commit messages: - 8255766: Fix linux+arm64 build after 8254072 Changes: https://git.openjdk.java.net/jdk/pull/1013/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1013&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255766 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1013.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1013/head:pull/1013 PR: https://git.openjdk.java.net/jdk/pull/1013 From kvn at openjdk.java.net Mon Nov 2 17:28:01 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 2 Nov 2020 17:28:01 GMT Subject: RFR: 8255766: Fix linux+arm64 build after 8254072 In-Reply-To: References: Message-ID: <3rDZqiozaBj5QlHPfsxnmcY_25zoTgEj1VastnnP2IY=.847356ee-fda7-47ed-8de2-83a3f0e783fb@github.com> On Mon, 2 Nov 2020 17:04:53 GMT, Bernhard Urban-Forster wrote: > Fixing this problem: > > 1501 | assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same"); > Verified via a linux+arm64 slowdebug build and a windows+arm64 slowdebug build. Let me test it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1013 From darcy at openjdk.java.net Mon Nov 2 17:45:02 2020 From: darcy at openjdk.java.net (Joe Darcy) Date: Mon, 2 Nov 2020 17:45:02 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms In-Reply-To: References: Message-ID: <_NtWpo5mp-I1zmPtn883ezvoO1jA7WdJflfoFXcpL70=.96a06d93-47a4-48dd-88fc-f3c777315291@github.com> On Wed, 28 Oct 2020 04:32:41 GMT, Xubo Zhang wrote: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number The regression tests for exp should be explicitly updated to cover the previously erroneous input if they do not do so already. ------------- Changes requested by darcy (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/894 From ihse at openjdk.java.net Mon Nov 2 17:45:55 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 2 Nov 2020 17:45:55 GMT Subject: RFR: 8255766: Fix linux+arm64 build after 8254072 In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 17:04:53 GMT, Bernhard Urban-Forster wrote: > Fixing this problem: > > 1501 | assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same"); > Verified via a linux+arm64 slowdebug build and a windows+arm64 slowdebug build. It looks OK to me, given that it compiles with gcc+msdev. ------------- Marked as reviewed by ihse (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1013 From ihse at openjdk.java.net Mon Nov 2 17:45:56 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 2 Nov 2020 17:45:56 GMT Subject: RFR: 8255766: Fix linux+arm64 build after 8254072 In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 17:42:51 GMT, Magnus Ihse Bursie wrote: >> Fixing this problem: >> >> 1501 | assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same"); >> Verified via a linux+arm64 slowdebug build and a windows+arm64 slowdebug build. > > It looks OK to me, given that it compiles with gcc+msdev. Someone in hotspot team needs to determine if this is a "trivial" fix or not. ------------- PR: https://git.openjdk.java.net/jdk/pull/1013 From jbhateja at openjdk.java.net Mon Nov 2 17:50:56 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 2 Nov 2020 17:50:56 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v3] In-Reply-To: References: <94qadtiTzSkdsJAc_8IWrLxpBvmfiBXMf_W9Z965P80=.9a59a5db-2209-4007-94bb-16ccd8ff0b77@github.com> Message-ID: On Fri, 16 Oct 2020 14:50:15 GMT, Nils Eliasson wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Replacing explicit type checks with existing type checking routines > > src/hotspot/share/opto/cfgnode.cpp line 396: > >> 394: } >> 395: >> 396: bool RegionNode::is_self_loop(Node* n) { > > A bit expensive to DFS the entire graph to find a self loop. You don't need to visit nodes outside the loop. But you might not need to do this at all - see my comments further down. 
DONE ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Mon Nov 2 17:58:03 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 2 Nov 2020 17:58:03 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v3] In-Reply-To: References: <94qadtiTzSkdsJAc_8IWrLxpBvmfiBXMf_W9Z965P80=.9a59a5db-2209-4007-94bb-16ccd8ff0b77@github.com> Message-ID: On Fri, 16 Oct 2020 14:53:32 GMT, Nils Eliasson wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Replacing explicit type checks with existing type checking routines > > src/hotspot/share/opto/cfgnode.cpp line 436: > >> 434: Node* rep_node = NULL; >> 435: PhaseIterGVN *igvn = phase->is_IterGVN(); >> 436: if (in(1)->is_top() && !in(2)->is_top()) { > > The Phi-nodes for loops are always normalized - in(1) will be loop-entry and in(2) is the backedge. So if in(1) is top - in(2) will be a self loop. Yes, the self-loop check is no longer needed, so I removed it together with the associated phi disintegration logic. There was also a problem with the input connections of exit_region (the convergence region after the partially inlined fast-path region and the stub-call slow-path region) that was distorting the graph shape; this has been fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From akozlov at openjdk.java.net Mon Nov 2 18:00:56 2020 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Mon, 2 Nov 2020 18:00:56 GMT Subject: RFR: 8255766: Fix linux+arm64 build after 8254072 In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 17:43:32 GMT, Magnus Ihse Bursie wrote: >> It looks OK to me, given that it compiles with gcc+msdev. > > Someone in hotspot team needs to determine if this is a "trivial" fix or not. A recently introduced checked cast[1] looks to be useful here, it would allow to remove the ad-hoc assert.
[1] https://github.com/openjdk/jdk/commit/3302d3adb5fbd5729c1677d927dd4c8af1c428a4 ------------- PR: https://git.openjdk.java.net/jdk/pull/1013 From aph at openjdk.java.net Mon Nov 2 18:00:57 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 2 Nov 2020 18:00:57 GMT Subject: RFR: 8255766: Fix linux+arm64 build after 8254072 In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 17:55:46 GMT, Anton Kozlov wrote: >> Someone in hotspot team needs to determine if this is a "trivial" fix or not. > > A recently introduced checked cast[1] looks to be useful here, it would allow to remove the ad-hoc assert. > > [1] https://github.com/openjdk/jdk/commit/3302d3adb5fbd5729c1677d927dd4c8af1c428a4
diff --git a/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp b/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp
index eb6195fd675..92a07a84d2a 100644
--- a/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp
@@ -1498,8 +1498,7 @@ nmethod* SharedRuntime::generate_native_wrapper(MacroAssembler* masm,
 
   // Generate stack overflow check
   if (UseStackBanging) {
-    assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same");
-    __ bang_stack_with_offset((int)StackOverflow::stack_shadow_zone_size());
+    __ bang_stack_with_offset(checked_cast<int>(StackOverflow::stack_shadow_zone_size()));
   } else {
     Unimplemented();
   }
------------- PR: https://git.openjdk.java.net/jdk/pull/1013 From jbhateja at openjdk.java.net Mon Nov 2 18:00:58 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 2 Nov 2020 18:00:58 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v4] In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 18:33:22 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains four commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Replacing explicit type checks with existing type checking routines >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions. > > There is regression after 8252847 changes: 8254890. > It should be fixed before we proceed with these changes. Hi @vnkozlov, @neliasso, kindly let me know if there are any review comments that need to be addressed. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From kvn at openjdk.java.net Mon Nov 2 18:21:54 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 2 Nov 2020 18:21:54 GMT Subject: RFR: 8255766: Fix linux+arm64 build after 8254072 In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 17:58:02 GMT, Andrew Haley wrote: >> A recently introduced checked cast[1] looks to be useful here, it would allow to remove the ad-hoc assert. >> >> [1] https://github.com/openjdk/jdk/commit/3302d3adb5fbd5729c1677d927dd4c8af1c428a4 >
> diff --git a/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp b/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp
> index eb6195fd675..92a07a84d2a 100644
> --- a/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp
> +++ b/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp
> @@ -1498,8 +1498,7 @@ nmethod* SharedRuntime::generate_native_wrapper(MacroAssembler* masm,
>
>    // Generate stack overflow check
>    if (UseStackBanging) {
> -    assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same");
> -    __ bang_stack_with_offset((int)StackOverflow::stack_shadow_zone_size());
> +    __ bang_stack_with_offset(checked_cast<int>(StackOverflow::stack_shadow_zone_size()));
>    } else {
>      Unimplemented();
>    }
I like Andrew's proposal. It is more clean. Let me test it too.
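As a rough illustration of the idea behind the change discussed above (folding the ad-hoc assert and the C-style cast into one checked narrowing conversion), here is a minimal sketch. It assumes a simple round-trip check; `checked_cast_sketch`, `shadow_zone_size_sketch` and `bang_offset_sketch` are hypothetical names, the 20-page zone size is an invented value, and HotSpot's actual `checked_cast` and `StackOverflow::stack_shadow_zone_size()` may well differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a checked narrowing cast (not HotSpot's code).
// Converts `value` to T2 and asserts that converting the result back to T1
// gives the original value, i.e. that the conversion lost no information.
template <typename T2, typename T1>
inline T2 checked_cast_sketch(T1 value) {
  T2 result = static_cast<T2>(value);
  assert(static_cast<T1>(result) == value && "narrowing lost information");
  return result;
}

// Hypothetical stand-in for StackOverflow::stack_shadow_zone_size().
// Assumed value for illustration only: 20 pages of 4 KiB each.
inline std::size_t shadow_zone_size_sketch() {
  return static_cast<std::size_t>(20) * 4096;
}

// Mirrors the shape of the patched call site: one expression both narrows
// size_t to int and checks (in builds with asserts enabled) that it fits.
inline int bang_offset_sketch() {
  return checked_cast_sketch<int>(shadow_zone_size_sketch());
}
```

On a platform where `size_t` is wider than `int`, a value that does not fit would trip the assert in a debug build instead of being silently truncated, which appears to be what the removed hand-written assert was guarding against.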
------------- PR: https://git.openjdk.java.net/jdk/pull/1013 From kvn at openjdk.java.net Mon Nov 2 18:43:55 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 2 Nov 2020 18:43:55 GMT Subject: RFR: 8255665: C2 should aggressively remove temporary hook nodes [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 08:39:07 GMT, Tobias Hartmann wrote: >> C2 often creates temporary "hook" nodes to keep other nodes alive. Although dead, these are sometimes not removed because they don't end up on the IGVN worklist. JDK-8040213 added detection of modified nodes that are not re-processed by IGVN but currently ignores dead nodes. >> >> This patch includes the following changes: >> - Adjust detection of modified nodes such that dead nodes are included as well. This revealed several locations where dead nodes are not eagerly destructed (or not even added to the worklist for later removal). I've fixed all of these. >> - No need to yank node inputs before calling `destruct`. >> - `kill_dead_code` accidentally re-adds dead nodes to the `_modified_nodes` list. `Compile::remove_modified_node` should be called at the end to avoid this. >> - Some removal of dead code. >> >> Tested with tier1-3, higher tiers are running. >> >> JDK-8255670 will further improve detection. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Restored handling of constant nodes src/hotspot/share/opto/cfgnode.cpp line 2308: > 2306: igvn->_worklist.remove(hook); > 2307: } > 2308: hook->destruct(); I think we should pass PhaseGVN* into destruct() and do the removal from the worklist there, because it looks like a repetitive pattern.
Also, we can take the Compile pointer from PhaseGVN instead of calling Compile::current(): https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/node.cpp#L584 ------------- PR: https://git.openjdk.java.net/jdk/pull/994 From burban at openjdk.java.net Mon Nov 2 18:45:08 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 18:45:08 GMT Subject: RFR: 8255766: Fix linux+arm64 build after 8254072 [v2] In-Reply-To: References: Message-ID: > Fixing this problem: > > 1501 | assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same"); > Verified via a linux+arm64 slowdebug build and a windows+arm64 slowdebug build. Bernhard Urban-Forster has updated the pull request incrementally with one additional commit since the last revision: use checked_cast utility ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1013/files - new: https://git.openjdk.java.net/jdk/pull/1013/files/30ef4190..5584ac72 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1013&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1013&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1013.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1013/head:pull/1013 PR: https://git.openjdk.java.net/jdk/pull/1013 From burban at openjdk.java.net Mon Nov 2 18:45:08 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 18:45:08 GMT Subject: RFR: 8255766: Fix linux+arm64 build after 8254072 [v2] In-Reply-To: References: Message-ID: <0v1_YYD_JI2qmoKZYcPhm_3_dEIRwCTiHkKAxWJ5_5w=.f917fa64-2f18-41b9-9ea9-2e85b09c6ad0@github.com> On Mon, 2 Nov 2020 18:19:06 GMT, Vladimir Kozlov wrote:
>> diff --git a/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp b/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp
>> index eb6195fd675..92a07a84d2a 100644
>> --- a/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp
>> +++ b/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp
>> @@ -1498,8 +1498,7 @@ nmethod* SharedRuntime::generate_native_wrapper(MacroAssembler* masm,
>>
>>    // Generate stack overflow check
>>    if (UseStackBanging) {
>> -    assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same");
>> -    __ bang_stack_with_offset((int)StackOverflow::stack_shadow_zone_size());
>> +    __ bang_stack_with_offset(checked_cast<int>(StackOverflow::stack_shadow_zone_size()));
>>    } else {
>>      Unimplemented();
>>    }
>
> I like Andrew's proposal. It is more clean. Let me test it too.
Ah, this came just-in-time. I've updated the PR. Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/1013 From kvn at openjdk.java.net Mon Nov 2 18:49:54 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 2 Nov 2020 18:49:54 GMT Subject: RFR: 8255672: Replace PhaseTransform::eqv by pointer equality check [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 14:54:07 GMT, Tobias Hartmann wrote: >> `PhaseTransform::eqv(n1, n2)` can be replaced by `n1 == n2`. Code in adlc also refers to it but already uses `==` instead. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added missing whitespace Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/999 From never at openjdk.java.net Mon Nov 2 19:25:03 2020 From: never at openjdk.java.net (Tom Rodriguez) Date: Mon, 2 Nov 2020 19:25:03 GMT Subject: RFR: 8254827: JVMCI: Enable it for Windows+AArch64 [v3] In-Reply-To: References: Message-ID: <-eax17fyDoxXQFT0QSzGlFaJC1z8nLtkn_EbJKApNPQ=.93cceeb4-2046-4761-81a7-87eb89e78660@github.com> On Tue, 20 Oct 2020 15:46:36 GMT, Bernhard Urban-Forster wrote: >> Use r18 as allocatable register on Linux only.
>> >> A bootstrap works now (it has been crashing before due to r18 being allocated): >> $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version >> Bootstrapping JVMCI................................. in 17990 ms >> (compiled 3330 methods) >> openjdk version "16-internal" 2021-03-16 >> OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode) >> >> Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well. > > Bernhard Urban-Forster has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - add missing precompiled.hpp include > - Merge remote-tracking branch 'upstream/master' into 8254827-enable-jvmci-win-aarch64 > - rename argument to canUsePlatformRegister > - comment for platformRegister > - 8254827: JVMCI: Enable it for Windows+AArch64 > > Use r18 as allocatable register on Linux only. > > A bootstrap works now (it has been crashing before due to r18 being allocated): > ```console > $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version > Bootstrapping JVMCI................................. in 17990 ms > (compiled 3330 methods) > openjdk version "16-internal" 2021-03-16 > OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk) > OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode) > ``` > > Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well. Marked as reviewed by never (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/685 From kvn at openjdk.java.net Mon Nov 2 19:36:05 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 2 Nov 2020 19:36:05 GMT Subject: RFR: 8254827: JVMCI: Enable it for Windows+AArch64 [v3] In-Reply-To: References: Message-ID: On Tue, 20 Oct 2020 15:46:36 GMT, Bernhard Urban-Forster wrote: >> Use r18 as allocatable register on Linux only. >> >> A bootstrap works now (it has been crashing before due to r18 being allocated): >> $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version >> Bootstrapping JVMCI................................. in 17990 ms >> (compiled 3330 methods) >> openjdk version "16-internal" 2021-03-16 >> OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode) >> >> Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well. > > Bernhard Urban-Forster has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - add missing precompiled.hpp include > - Merge remote-tracking branch 'upstream/master' into 8254827-enable-jvmci-win-aarch64 > - rename argument to canUsePlatformRegister > - comment for platformRegister > - 8254827: JVMCI: Enable it for Windows+AArch64 > > Use r18 as allocatable register on Linux only. > > A bootstrap works now (it has been crashing before due to r18 being allocated): > ```console > $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version > Bootstrapping JVMCI................................. 
in 17990 ms > (compiled 3330 methods) > openjdk version "16-internal" 2021-03-16 > OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk) > OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode) > ``` > > Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well. Changes requested by kvn (Reviewer). make/autoconf/jvm-features.m4 line 309: > 307: if test "x$OPENJDK_TARGET_CPU" = "xx86_64"; then > 308: AC_MSG_RESULT([yes]) > 309: elif test "x$OPENJDK_TARGET_CPU" = "xaarch64"; then You are missing the same change for JVM_FEATURES_CHECK_JVMCI. Unless it is done intentionally. ------------- PR: https://git.openjdk.java.net/jdk/pull/685 From kvn at openjdk.java.net Mon Nov 2 19:50:54 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 2 Nov 2020 19:50:54 GMT Subject: RFR: 8255766: Fix linux+arm64 build after 8254072 [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 18:45:08 GMT, Bernhard Urban-Forster wrote: >> Fixing this problem: >> >> 1501 | assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same"); >> Verified via a linux+arm64 slowdebug build and a windows+arm64 slowdebug build. > > Bernhard Urban-Forster has updated the pull request incrementally with one additional commit since the last revision: > > use checked_cast utility Updated changes passed our tier1 build and testing on aarch64. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1013 From burban at openjdk.java.net Mon Nov 2 20:08:03 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 20:08:03 GMT Subject: RFR: 8254827: JVMCI: Enable it for Windows+AArch64 [v3] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 19:33:39 GMT, Vladimir Kozlov wrote: >> Bernhard Urban-Forster has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - add missing precompiled.hpp include >> - Merge remote-tracking branch 'upstream/master' into 8254827-enable-jvmci-win-aarch64 >> - rename argument to canUsePlatformRegister >> - comment for platformRegister >> - 8254827: JVMCI: Enable it for Windows+AArch64 >> >> Use r18 as allocatable register on Linux only. >> >> A bootstrap works now (it has been crashing before due to r18 being allocated): >> ```console >> $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version >> Bootstrapping JVMCI................................. in 17990 ms >> (compiled 3330 methods) >> openjdk version "16-internal" 2021-03-16 >> OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode) >> ``` >> >> Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well. > > make/autoconf/jvm-features.m4 line 309: > >> 307: if test "x$OPENJDK_TARGET_CPU" = "xx86_64"; then >> 308: AC_MSG_RESULT([yes]) >> 309: elif test "x$OPENJDK_TARGET_CPU" = "xaarch64"; then > > You are missing the same change for JVM_FEATURES_CHECK_JVMCI. > Unless it is done intentionally. 
It's done a couple lines below: https://github.com/openjdk/jdk/pull/685/files#diff-a09b08bcd422d0a8fb32a95ccf85051ac1e69bef2bd420d579f74d8efa286d2fL343 Or do you mean something else? ------------- PR: https://git.openjdk.java.net/jdk/pull/685 From kvn at openjdk.java.net Mon Nov 2 20:25:01 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 2 Nov 2020 20:25:01 GMT Subject: RFR: 8254827: JVMCI: Enable it for Windows+AArch64 [v3] In-Reply-To: References: Message-ID: On Tue, 20 Oct 2020 15:46:36 GMT, Bernhard Urban-Forster wrote: >> Use r18 as allocatable register on Linux only. >> >> A bootstrap works now (it has been crashing before due to r18 being allocated): >> $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version >> Bootstrapping JVMCI................................. in 17990 ms >> (compiled 3330 methods) >> openjdk version "16-internal" 2021-03-16 >> OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode) >> >> Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well. > > Bernhard Urban-Forster has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - add missing precompiled.hpp include > - Merge remote-tracking branch 'upstream/master' into 8254827-enable-jvmci-win-aarch64 > - rename argument to canUsePlatformRegister > - comment for platformRegister > - 8254827: JVMCI: Enable it for Windows+AArch64 > > Use r18 as allocatable register on Linux only. 
> > A bootstrap works now (it has been crashing before due to r18 being allocated): > ```console > $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version > Bootstrapping JVMCI................................. in 17990 ms > (compiled 3330 methods) > openjdk version "16-internal" 2021-03-16 > OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk) > OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode) > ``` > > Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well. Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/685 From kvn at openjdk.java.net Mon Nov 2 20:25:02 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 2 Nov 2020 20:25:02 GMT Subject: RFR: 8254827: JVMCI: Enable it for Windows+AArch64 [v3] In-Reply-To: References: Message-ID: <-w7BvxPLjNvak1PLDvan5QXC8435O2Pq4kGf-5X8LTU=.34aae93f-69a0-456b-a389-ebc1e1dfe218@github.com> On Mon, 2 Nov 2020 20:05:10 GMT, Bernhard Urban-Forster wrote: >> make/autoconf/jvm-features.m4 line 309: >> >>> 307: if test "x$OPENJDK_TARGET_CPU" = "xx86_64"; then >>> 308: AC_MSG_RESULT([yes]) >>> 309: elif test "x$OPENJDK_TARGET_CPU" = "xaarch64"; then >> >> You are missing the same change for JVM_FEATURES_CHECK_JVMCI. >> Unless it is done intentionally. > > It's done a couple lines below: https://github.com/openjdk/jdk/pull/685/files#diff-a09b08bcd422d0a8fb32a95ccf85051ac1e69bef2bd420d579f74d8efa286d2fL343 > > Or do you mean something else? Uhh. Sorry, I thought JVMCI definition will be first, before Graal and did not look below. 
------------- PR: https://git.openjdk.java.net/jdk/pull/685 From burban at openjdk.java.net Mon Nov 2 20:29:57 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 20:29:57 GMT Subject: Integrated: 8255766: Fix linux+arm64 build after 8254072 In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 17:04:53 GMT, Bernhard Urban-Forster wrote: > Fixing this problem: > > 1501 | assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same"); > Verified via a linux+arm64 slowdebug build and a windows+arm64 slowdebug build. This pull request has now been integrated. Changeset: bee864fb Author: Bernhard Urban-Forster Committer: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/bee864fb Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod 8255766: Fix linux+arm64 build after 8254072 Reviewed-by: kvn, ihse ------------- PR: https://git.openjdk.java.net/jdk/pull/1013 From burban at openjdk.java.net Mon Nov 2 20:37:56 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 20:37:56 GMT Subject: RFR: 8254827: JVMCI: Enable it for Windows+AArch64 [v3] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 20:22:21 GMT, Vladimir Kozlov wrote: >> Bernhard Urban-Forster has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - add missing precompiled.hpp include >> - Merge remote-tracking branch 'upstream/master' into 8254827-enable-jvmci-win-aarch64 >> - rename argument to canUsePlatformRegister >> - comment for platformRegister >> - 8254827: JVMCI: Enable it for Windows+AArch64 >> >> Use r18 as allocatable register on Linux only.
>> >> A bootstrap works now (it has been crashing before due to r18 being allocated): >> ```console >> $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version >> Bootstrapping JVMCI................................. in 17990 ms >> (compiled 3330 methods) >> openjdk version "16-internal" 2021-03-16 >> OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode) >> ``` >> >> Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well. > > Marked as reviewed by kvn (Reviewer). Thanks for the review Tom and Magnus, and for the comments to Vladimir and Doug. ------------- PR: https://git.openjdk.java.net/jdk/pull/685 From burban at openjdk.java.net Mon Nov 2 20:40:56 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 20:40:56 GMT Subject: RFR: 8255703: jaotc: Add Windows+Arm64 support In-Reply-To: <0kuY-CWrhC0HciZuOkVAxe-Ixai1PP2rFF-GY1ovjFw=.86cdaaa3-fe43-4e69-ad35-f43d15c933c0@github.com> References: <0kuY-CWrhC0HciZuOkVAxe-Ixai1PP2rFF-GY1ovjFw=.86cdaaa3-fe43-4e69-ad35-f43d15c933c0@github.com> Message-ID: On Fri, 30 Oct 2020 22:38:50 GMT, Vladimir Kozlov wrote: > You can contribute Graal (jdk.internal.vm.compiler) changes to GraalVM. Probably unnecessary as well. `AArch64HotSpotConstantRetrievalOp` is only used in the `jaotc` context.
------------- PR: https://git.openjdk.java.net/jdk/pull/972 From burban at openjdk.java.net Mon Nov 2 20:40:58 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 20:40:58 GMT Subject: Withdrawn: 8255703: jaotc: Add Windows+Arm64 support In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 22:07:34 GMT, Bernhard Urban-Forster wrote: > Quite bad timing given https://github.com/openjdk/jdk/pull/960 is happening, but I'm gonna publish those changes anyway. > > Tests aren't passing yet, that's the current result of `test/hotspot/jtreg:tier1_compiler_aot_jvmci`: > Test results: passed: 120; failed: 22; error: 8 > > Depends on https://github.com/openjdk/jdk/pull/685. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/972 From ngasson at openjdk.java.net Tue Nov 3 01:59:57 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Tue, 3 Nov 2020 01:59:57 GMT Subject: Integrated: 8254723: add diagnostic command to write Linux perf map file In-Reply-To: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: On Tue, 20 Oct 2020 09:27:45 GMT, Nick Gasson wrote: > When using the Linux "perf" tool to do system profiling, symbol names of > running Java methods cannot be decoded, resulting in unhelpful output > such as: > > 10.52% [JIT] tid 236748 [.] 0x00007f6fdb75d223 > > Perf can read a simple text file format describing the mapping between > address ranges and symbol names for a particular process [1]. > > It's possible to generate this already for Java processes using a JVMTI > plugin such as perf-map-agent [2]. However this requires compiling > third-party code and then loading the agent into your Java process. It > would be more convenient if Hotspot could write this file directly using > a diagnostic command. 
The information required is almost identical to > that of the existing Compiler.codelist command. > > This patch adds a Compiler.perfmap diagnostic command on Linux only. To > use, first run "jcmd Compiler.perfmap" and then "perf top" or > "perf record" and the report should show decoded Java symbol names for > that process. > > As this just writes a snapshot of the code cache when the command is > run, it will become stale if methods are compiled later or unloaded. > However this shouldn't be a big problem in practice if the map file is > generated after the application has warmed up. > > [1] https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt > [2] https://github.com/jvm-profiling-tools/perf-map-agent This pull request has now been integrated. Changeset: 50357d13 Author: Nick Gasson URL: https://git.openjdk.java.net/jdk/commit/50357d13 Stats: 171 lines in 8 files changed: 169 ins; 1 del; 1 mod 8254723: add diagnostic command to write Linux perf map file Reviewed-by: ysuenaga, sspitsyn ------------- PR: https://git.openjdk.java.net/jdk/pull/760 From thartmann at openjdk.java.net Tue Nov 3 07:23:57 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 3 Nov 2020 07:23:57 GMT Subject: RFR: 8255672: Replace PhaseTransform::eqv by pointer equality check [v2] In-Reply-To: <_4kVZ1J5tXaWgKsuK6iFENPAF3TOhp7Pysnbau5Cl48=.27a62ef0-d091-4d36-a2ab-c489880d2cd6@github.com> References: <_4kVZ1J5tXaWgKsuK6iFENPAF3TOhp7Pysnbau5Cl48=.27a62ef0-d091-4d36-a2ab-c489880d2cd6@github.com> Message-ID: <7Qs1Ck8vJ_ExLhZcNdL-g10gy5BpP2olel41JT1QtOs=.2569eb7c-3938-47f6-8e38-5a80cd77731f@github.com> On Mon, 2 Nov 2020 15:06:52 GMT, Christian Hagedorn wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Added missing whitespace > > Yes, I think that is reasonable to leave the code as it is and only do the clean-ups directly related to `PhaseTransform::eqv`. Looks good! 
@chhagedorn, @cl4es, @vnkozlov thanks for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/999 From thartmann at openjdk.java.net Tue Nov 3 07:23:58 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 3 Nov 2020 07:23:58 GMT Subject: Integrated: 8255672: Replace PhaseTransform::eqv by pointer equality check In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 09:29:40 GMT, Tobias Hartmann wrote: > `PhaseTransform::eqv(n1, n2)` can be replaced by `n1 == n2`. Code in adlc also refers to it but already uses `==` instead. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 15805741 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/15805741 Stats: 138 lines in 10 files changed: 30 ins; 15 del; 93 mod 8255672: Replace PhaseTransform::eqv by pointer equality check Reviewed-by: chagedorn, redestad, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/999 From neliasso at openjdk.java.net Tue Nov 3 08:22:06 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 3 Nov 2020 08:22:06 GMT Subject: RFR: 8255011: UnexpectedDeoptimizationAllTest.java timed out Message-ID: Hi, This patch updates the code cache stress tests. I haven't been able to reproduce the timeout or retrieve a core file. What I can see is that the tests provoke compile storms, and that the single C1 thread (on a 4-CPU system) sometimes has trouble keeping up. A factor may also be that the tests' run time scales with the timeout, so that the time allotted as margin before the timeout is only 20% of the total run time. Combined with Xcomp, and given that the test may run concurrently with other stress tests, it is plausible that a timeout occurs. I suggest capping the tests at 60 seconds of testing. I've experimented with measuring how much work is done and using that as a metric, but the different tests that use the CodeCacheStressRunner have completely different profiles.
In UnexpectedDeoptimizationAllTest.java I have adjusted the sleep time to 100 millis between the invalidations of the entire code cache. In UnexpectedDeoptimizationTest.java I have added a sleep of 10 millis between deoptimizing parts of the stack. The idea is to give the stack time to grow a bit before the next deoptimization. Otherwise the test might end up running mostly in the interpreter. Please review, Nils Eliasson ------------- Commit messages: - 8255011: UnexpectedDeoptimizationAllTest.java timed out Changes: https://git.openjdk.java.net/jdk/pull/1030/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1030&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255011 Stats: 13 lines in 3 files changed: 9 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1030.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1030/head:pull/1030 PR: https://git.openjdk.java.net/jdk/pull/1030 From rcastanedalo at openjdk.java.net Tue Nov 3 09:13:05 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 3 Nov 2020 09:13:05 GMT Subject: RFR: 8255797: ciReplay: improve documentation of replay file syntax in parser In-Reply-To: <5RkKpm4MP2OequYmfLKTXjR76f5KeYLd596RWQkQXzQ=.615f9615-35d0-4b73-b6ea-e600e3945730@github.com> References: <5RkKpm4MP2OequYmfLKTXjR76f5KeYLd596RWQkQXzQ=.615f9615-35d0-4b73-b6ea-e600e3945730@github.com> Message-ID: On Tue, 3 Nov 2020 08:55:14 GMT, Roberto Castañeda Lozano wrote: > Complete and disambiguate the informal specification of the replay file syntax > given in the ciReplay class implementation. Complete and disambiguate the informal specification of the replay file syntax given in the ciReplay class implementation. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1033 From rcastanedalo at openjdk.java.net Tue Nov 3 09:13:05 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 3 Nov 2020 09:13:05 GMT Subject: RFR: 8255797: ciReplay: improve documentation of replay file syntax in parser Message-ID: <5RkKpm4MP2OequYmfLKTXjR76f5KeYLd596RWQkQXzQ=.615f9615-35d0-4b73-b6ea-e600e3945730@github.com> Complete and disambiguate the informal specification of the replay file syntax given in the ciReplay class implementation. ------------- Commit messages: - 8255797: ciReplay: improve documentation of replay file syntax in parser Changes: https://git.openjdk.java.net/jdk/pull/1033/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1033&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255797 Stats: 10 lines in 1 file changed: 3 ins; 2 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/1033.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1033/head:pull/1033 PR: https://git.openjdk.java.net/jdk/pull/1033 From rcastanedalo at openjdk.java.net Tue Nov 3 09:13:05 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 3 Nov 2020 09:13:05 GMT Subject: RFR: 8255797: ciReplay: improve documentation of replay file syntax in parser In-Reply-To: References: <5RkKpm4MP2OequYmfLKTXjR76f5KeYLd596RWQkQXzQ=.615f9615-35d0-4b73-b6ea-e600e3945730@github.com> Message-ID: On Tue, 3 Nov 2020 08:56:33 GMT, Roberto Castañeda Lozano wrote: >> Complete and disambiguate the informal specification of the replay file syntax >> given in the ciReplay class implementation. > > Complete and disambiguate the informal specification of the replay file syntax > given in the ciReplay class implementation. Tested by building on linux-x64. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1033 From chagedorn at openjdk.java.net Tue Nov 3 10:27:05 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 3 Nov 2020 10:27:05 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v2] In-Reply-To: References: Message-ID: > The dominance failures start to occur after the fix for [JDK-8249749](https://bugs.openjdk.java.net/browse/JDK-8249749) which enabled the method `SWPointer::scaled_iv_plus_offset` to call itself recursively and walk the graph to match more instead of stopping immediately (no recursion): > https://github.com/openjdk/jdk/commit/092389e3c91822b1e3f56f203cb7b90e84673f8e#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81L3789-R3812 > > We check in `SWPointer::offset_plus_k` if a node is invariant and if so then we choose it as invariant. However, we now have cases in the Renaissance benchmarks where we select an invariant that is pinned to a `CastIINode` between the main and pre loop. An example is shown in the attached image. 5913 SubI is found as an invariant with the improved recursive search enabled by JDK-8249749. The control of 5913 SubI (with `get_ctrl`) is 5298 CastII. The problem is now that we use the invariant 5913 SubI in the pre loop limit check of 5281 CountedLoopEnd (done in `SuperWord::align_initial_loop_index`) because we assume that since the invariant is not part of the main loop, it can float into the pre loop. But this is prevented by 5298 CastII. This results in the dominance assertion failure when checking if the earliest control of 5270 Bool in the pre loop (5297 IfTrue because of 5913 SubI that is used by 5270 Bool) dominates the LCA of 5270 Bool (the pre loop header node). > > My suggestion is to improve the invariant check in `SWPointer::offset_plus_k` to also check if the found invariant is not dominated by the pre loop end node. 
Repeated testing of the RenaissanceStressTest has not resulted in any dominance failures anymore. > ![dominance_failure](https://user-images.githubusercontent.com/17833009/97696669-3752d200-1aa6-11eb-9a42-2e36550e2b8b.png) Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Check dominance with pre loop head instead of tail ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/954/files - new: https://git.openjdk.java.net/jdk/pull/954/files/2a05f6c6..e9c99dcc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=954&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=954&range=00-01 Stats: 83 lines in 2 files changed: 21 ins; 13 del; 49 mod Patch: https://git.openjdk.java.net/jdk/pull/954.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/954/head:pull/954 PR: https://git.openjdk.java.net/jdk/pull/954 From chagedorn at openjdk.java.net Tue Nov 3 10:27:07 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 3 Nov 2020 10:27:07 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v2] In-Reply-To: References: Message-ID: <5Jpkc4T-PNVCLiSAAHLCR8WTAf_KRFS8uyMmFq22Fnk=.2a3f7abb-1b62-4a90-b99a-342dd747f198@github.com> On Fri, 30 Oct 2020 17:13:41 GMT, Vladimir Kozlov wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Check dominance with pre loop head instead of tail > > I would also suggest to run locally next jtreg command and compare number of created vector nodes to make sure your changes did not affect common cases: > > `$ jtreg -testjdk:/my_jdk/ -va -javaoptions:'-server -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+TraceNewVectors' -J-Djavatest.maxOutputSize=1000000 compiler/c2/cr6340864/ compiler/codegen/ compiler/loopopts/superword/ compiler/vectorization >new_vects.log` > > `$ grep "new Vector node:" 
new_vects.log|wc` @vnkozlov Thanks for your review! > $ jtreg -testjdk:/my_jdk/ -va -javaoptions:'-server -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+TraceNewVectors' -J-Djavatest.maxOutputSize=1000000 compiler/c2/cr6340864/ compiler/codegen/ compiler/loopopts/superword/ compiler/vectorization >new_vects.log > > $ grep "new Vector node:" new_vects.log|wc I ran that with the new commit - there is no difference between the fix and base without fix: $ grep "new Vector node:" new_vects_fix.log | wc 9439 227498 1941126 $ grep "new Vector node:" new_vects_base.log | wc 9437 227521 1941437 > src/hotspot/share/opto/superword.cpp line 3801: > >> 3799: } >> 3800: >> 3801: bool SWPointer::invariant_not_dominated_by_pre_loop_end(Node* n) { > > I think we should have only one invariant() method which does this additional check. > And have separate method is_main_loop_member() where currently we check !invariant(). That's a good idea, fixed. > src/hotspot/share/opto/superword.cpp line 3810: > >> 3808: // This happens, for example, when n_c is a CastII node that prevents data nodes to flow above the main loop and into >> 3809: // the pre loop. Use the cached version as the real pre loop end might not be found anymore with get_pre_loop_end(). >> 3810: return !phase()->is_dominator(_slp->cached_pre_loop_end(), n_c); > > I think it is not enough. We don't want invariant be inside pre-loop. > Invariant should be node outside original loop (before splitting and unrolling). > We should check that n_c dominates pre-loop head. > What do you think? I think that makes sense. I updated that. Tests still pass. 
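The two dominance checks discussed in this exchange can be compared on a toy dominator chain. This is only a minimal sketch under stated assumptions, not the C2 code: `Ctrl`, the `is_dominator` helper as written here, and the five sample nodes are hypothetical stand-ins for the real control nodes and for the `phase()->is_dominator(...)` call quoted above.

```cpp
// Hypothetical, simplified control node: it only records its immediate dominator.
struct Ctrl {
  const Ctrl* idom;  // immediate dominator, nullptr for the root
};

// True if `d` dominates `n` (a node dominates itself): walk n's idom chain upwards.
inline bool is_dominator(const Ctrl* d, const Ctrl* n) {
  for (const Ctrl* c = n; c != nullptr; c = c->idom) {
    if (c == d) return true;
  }
  return false;
}

// First version of the check: the control n_c of the candidate invariant
// must not be dominated by the pre loop end.
inline bool not_dominated_by_pre_loop_end(const Ctrl* n_c, const Ctrl* pre_end) {
  return !is_dominator(pre_end, n_c);
}

// Stricter version suggested in review: n_c must dominate the pre loop head,
// i.e. the node lives outside the original loop altogether.
inline bool dominates_pre_loop_head(const Ctrl* n_c, const Ctrl* pre_head) {
  return is_dominator(n_c, pre_head);
}

// Sample chain (assumed shape): entry -> pre_head -> pre_body -> pre_end -> cast,
// where `cast` models a CastII placed between the pre loop and the main loop.
const Ctrl entry{nullptr};
const Ctrl pre_head{&entry};
const Ctrl pre_body{&pre_head};
const Ctrl pre_end{&pre_body};
const Ctrl cast{&pre_end};
```

On this chain both checks reject a node pinned at `cast`, but only the stricter one also rejects a node whose control is `pre_body`, that is, a node inside the pre loop itself, which is the case the review comment was concerned about.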
------------- PR: https://git.openjdk.java.net/jdk/pull/954 From rrich at openjdk.java.net Tue Nov 3 12:41:54 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Tue, 3 Nov 2020 12:41:54 GMT Subject: Integrated: 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown In-Reply-To: References: Message-ID: On Tue, 20 Oct 2020 21:53:10 GMT, Richard Reingruber wrote: > The following test cases try to provoke VMOutOfMemoryException during object reallocation because of JVMTI PopFrame / ForceEarlyReturn: > > EAPopFrameNotInlinedReallocFailure > EAPopInlinedMethodWithScalarReplacedObjectsReallocFailure > EAForceEarlyReturnOfInlinedMethodWithScalarReplacedObjectsReallocFailure > > For ZGC (so far) this is not 100% reliable. > > Just ignoring the runs where the expected OOME was not raised was not accepted. > > Summary of the now accepted solution: > > - The 3 problematic test cases are skipped if ZGC is selected. > > - They are also skipped if no OOME during object reallocation can be expected because allocations are not eliminated. > > - In consumeAllMemory, as a last step, empty LinkedList nodes are created without long array to fill up small blocks of free memory. > > - EATests.java is removed from the problem list for ZGC. This pull request has now been integrated. 
Changeset: 63461d59 Author: Richard Reingruber URL: https://git.openjdk.java.net/jdk/commit/63461d59 Stats: 65 lines in 2 files changed: 37 ins; 8 del; 20 mod 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown Reviewed-by: cjplummer, sspitsyn, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/775 From kvn at openjdk.java.net Tue Nov 3 17:47:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 3 Nov 2020 17:47:59 GMT Subject: RFR: 8255838: Use 32-bit immediate movslq in macro assembler if 64-bit value fits in 32 bits on x86_64 In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 15:12:50 GMT, Jorn Vernee wrote: > Currently, `MacroAssembler::movptr(Address dst, intptr_t src)` on x64 uses an intermediate scratch register to store the 64-bit immediate. > > void MacroAssembler::movptr(Address dst, intptr_t src) { > mov64(rscratch1, src); > movq(dst, rscratch1); > } > > But, if the value fits into 32 bits, we can also explicitly use the 32-bit immediate overload, which saves an instruction and a register use, by using movslq/movabs instead (sign-extended move). > > This ends up saving about 90k instructions on hello world. It also reduces the size of the interpreter by about 4k. > > Special thanks to Claes for prior discussion and help with testing. > > Testing: tier1-3, local startup profiling Good. ------------- Marked as reviewed by kvn (Reviewer). 
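The condition at the heart of the change quoted above, a 64-bit value that can be encoded as a sign-extended 32-bit immediate, can be sketched in plain C++. This is an illustration only, not the HotSpot code: `fits_in_simm32` and `instructions_needed` are hypothetical names, and the instruction counts merely mirror the description (two instructions through a scratch register versus one direct store).

```cpp
#include <cstdint>

// Hypothetical helper (not a HotSpot function): true if `value` lies in the
// int32_t range, so a 32-bit immediate that the CPU sign-extends to 64 bits
// reproduces it exactly.
constexpr bool fits_in_simm32(int64_t value) {
  return value >= INT32_MIN && value <= INT32_MAX;
}

// Hypothetical cost model following the description above: one store with a
// 32-bit immediate, or a 64-bit load into a scratch register plus a store.
constexpr int instructions_needed(int64_t value) {
  return fits_in_simm32(value) ? 1 : 2;
}

static_assert(fits_in_simm32(0), "zero fits");
static_assert(fits_in_simm32(-1), "-1 is all ones after sign extension");
static_assert(!fits_in_simm32(INT64_C(0xFFFFFFFF)),
              "upper 32 bits are zero, but sign extension would turn it into -1");
static_assert(!fits_in_simm32(INT64_C(0x80000000)), "one past INT32_MAX");
```

The `0xFFFFFFFF` case shows why the test has to be about sign extension rather than about the upper 32 bits being zero.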
PR: https://git.openjdk.java.net/jdk/pull/1038 From kvn at openjdk.java.net Tue Nov 3 18:11:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 3 Nov 2020 18:11:59 GMT Subject: RFR: 8255797: ciReplay: improve documentation of replay file syntax in parser In-Reply-To: <5RkKpm4MP2OequYmfLKTXjR76f5KeYLd596RWQkQXzQ=.615f9615-35d0-4b73-b6ea-e600e3945730@github.com> References: <5RkKpm4MP2OequYmfLKTXjR76f5KeYLd596RWQkQXzQ=.615f9615-35d0-4b73-b6ea-e600e3945730@github.com> Message-ID: On Tue, 3 Nov 2020 08:55:14 GMT, Roberto Castañeda Lozano wrote: > Complete and disambiguate the informal specification of the replay file syntax > given in the ciReplay class implementation. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1033 From kvn at openjdk.java.net Tue Nov 3 18:24:02 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 3 Nov 2020 18:24:02 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v2] In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 17:13:41 GMT, Vladimir Kozlov wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Check dominance with pre loop head instead of tail > > I would also suggest to run locally next jtreg command and compare number of created vector nodes to make sure your changes did not affect common cases: > > `$ jtreg -testjdk:/my_jdk/ -va -javaoptions:'-server -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+TraceNewVectors' -J-Djavatest.maxOutputSize=1000000 compiler/c2/cr6340864/ compiler/codegen/ compiler/loopopts/superword/ compiler/vectorization >new_vects.log` > > `$ grep "new Vector node:" new_vects.log|wc` > @vnkozlov Thanks for your review! 
> > > $ jtreg -testjdk:/my_jdk/ -va -javaoptions:'-server -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+TraceNewVectors' -J-Djavatest.maxOutputSize=1000000 compiler/c2/cr6340864/ compiler/codegen/ compiler/loopopts/superword/ compiler/vectorization >new_vects.log > > $ grep "new Vector node:" new_vects.log|wc > > I ran that with the new commit - there is no difference between the fix and base without fix: > $ grep "new Vector node:" new_vects_fix.log | wc > 9439 227498 1941126 > $ grep "new Vector node:" new_vects_base.log | wc > 9437 227521 1941437 You have 2 more new vectors with fix! 9439 vs 9437 It is actually more than I tested last time: 9421 Good - no regression. ------------- PR: https://git.openjdk.java.net/jdk/pull/954 From kvn at openjdk.java.net Tue Nov 3 18:33:04 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 3 Nov 2020 18:33:04 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v2] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 10:27:05 GMT, Christian Hagedorn wrote: >> The dominance failures start to occur after the fix for [JDK-8249749](https://bugs.openjdk.java.net/browse/JDK-8249749) which enabled the method `SWPointer::scaled_iv_plus_offset` to call itself recursively and walk the graph to match more instead of stopping immediately (no recursion): >> https://github.com/openjdk/jdk/commit/092389e3c91822b1e3f56f203cb7b90e84673f8e#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81L3789-R3812 >> >> We check in `SWPointer::offset_plus_k` if a node is invariant and if so then we choose it as invariant. However, we now have cases in the Renaissance benchmarks where we select an invariant that is pinned to a `CastIINode` between the main and pre loop. An example is shown in the attached image. 5913 SubI is found as an invariant with the improved recursive search enabled by JDK-8249749. The control of 5913 SubI (with `get_ctrl`) is 5298 CastII. 
The problem is now that we use the invariant 5913 SubI in the pre loop limit check of 5281 CountedLoopEnd (done in `SuperWord::align_initial_loop_index`) because we assume that since the invariant is not part of the main loop, it can float into the pre loop. But this is prevented by 5298 CastII. This results in the dominance assertion failure when checking if the earliest control of 5270 Bool in the pre loop (5297 IfTrue because of 5913 SubI that is used by 5270 Bool) dominates the LCA of 5270 Bool (the pre loop header node). >> >> My suggestion is to improve the invariant check in `SWPointer::offset_plus_k` to also check if the found invariant is not dominated by the pre loop end node. Repeated testing of the RenaissanceStressTest has not resulted in any dominance failures anymore. >> ![dominance_failure](https://user-images.githubusercontent.com/17833009/97696669-3752d200-1aa6-11eb-9a42-2e36550e2b8b.png) > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Check dominance with pre loop head instead of tail Changes requested by kvn (Reviewer). src/hotspot/share/opto/superword.cpp line 3989: > 3987: } > 3988: } > 3989: if (!is_main_loop_member(n)) { Please, add comment here explaining why !is_main_loop_member() is used in this case instead of invariant(). Other places looks good. 
------------- PR: https://git.openjdk.java.net/jdk/pull/954 From kvn at openjdk.java.net Tue Nov 3 19:00:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 3 Nov 2020 19:00:59 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v9] In-Reply-To: References: <8Ryyxuf5P2D6WNyj4riYCTgN0U6WLrLpBmxhNbnmPpQ=.b2ed5660-99d0-49d1-83e0-8b2de518d7b8@github.com> Message-ID: <3OBqYTJqjla1_OhTl3dXiNKljr3yyba_OIzxlNvHgnk=.ba8e45f8-34da-42e0-ae0c-e30197c2438b@github.com> On Fri, 23 Oct 2020 12:01:11 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectornode.hpp line 826: >> >>> 824: class VectorMaskGenNode : public TypeNode { >>> 825: public: >>> 826: VectorMaskGenNode(Node* src, const Type* ty, const Type* ety): TypeNode(ty, 2), _elemType(ety) { >> >> Sorry, I don't quite understand the arguments here. What does 'src' mean to the mask? > > ty -> Node type , long in this case since for X86 mask register is 64 bit wide. > ety -> Mask element type, currently used during LoadVectorMasked/StoreVectorMasked idealization to compute the block sizes for constant masks and replace masked vector operations with non-masked if block size is equal to vector size. Src has been replaced by a better name "length" used for mask computation. Please, use meaningful names. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From burban at openjdk.java.net Tue Nov 3 19:07:58 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Tue, 3 Nov 2020 19:07:58 GMT Subject: Integrated: 8254827: JVMCI: Enable it for Windows+AArch64 In-Reply-To: References: Message-ID: On Thu, 15 Oct 2020 15:00:47 GMT, Bernhard Urban-Forster wrote: > Use r18 as allocatable register on Linux only. 
> > A bootstrap works now (it has been crashing before due to r18 being allocated): > $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version > Bootstrapping JVMCI................................. in 17990 ms > (compiled 3330 methods) > openjdk version "16-internal" 2021-03-16 > OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk) > OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode) > > Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well. This pull request has now been integrated. Changeset: 88ee9733 Author: Bernhard Urban-Forster Committer: Tom Rodriguez URL: https://git.openjdk.java.net/jdk/commit/88ee9733 Stats: 24 lines in 4 files changed: 15 ins; 0 del; 9 mod 8254827: JVMCI: Enable it for Windows+AArch64 Reviewed-by: ihse, never, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/685 From kvn at openjdk.java.net Tue Nov 3 19:28:05 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 3 Nov 2020 19:28:05 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 10:28:00 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). 
Once PR-61 gets integrated we can extend this patch to cover all the primitive types. >> >> Performance Results: >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java >> ArrayCopyPartialInlineSize : 32
>>
>> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
>> -- | -- | -- | -- | --
>> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
>> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
>> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
>> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
>> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
>> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
>> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
>> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
>> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
>> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
>> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
>> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
>> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
>> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
>> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
>> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
>> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
>> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
>> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
>> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
>> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
>> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
>> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
>> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
>> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
>> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
>> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
>> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
>> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
>> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
>> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
>> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
>> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
>> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
>> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
>> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
>> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
>> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
>> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
>> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
>> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
>> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
>> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
>> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
>> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
>> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
>> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
>> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
>> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
>> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
>> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
>> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
>> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
>> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
>> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
>> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
>> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
>> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
>> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
>> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
>> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
>> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
>> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
>> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
>> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
>> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
>> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
>> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
>> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
>> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
>> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
>> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
>> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
>> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
>> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
>> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
>> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
>> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>>
>> Detailed Reports: >> Baseline : 
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) >> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - JDK-8252848: Review comments addressed. > - Merge remote-tracking branch 'origin' into JDK-8252848 > - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - JDK-8252848 : Review comments resolution. > - Merge remote-tracking branch 'upstream' into JDK-8252848 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Replacing explicit type checks with existing type checking routines > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - ... and 1 more: https://git.openjdk.java.net/jdk/compare/4031cb41...9e85592a Changes requested by kvn (Reviewer). src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8074: > 8072: break; > 8073: default: > 8074: assert(false,"Should not reach here."); Please, use fatal() here and print typ value. 
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8055: > 8053: > 8054: > 8055: void MacroAssembler::evmovdqu(BasicType typ, KRegister kmask, XMMRegister dst, Address src, int vector_len) { Don't shorten words: typ - > type src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8079: > 8077: } > 8078: > 8079: void MacroAssembler::evmovdqu(BasicType typ, KRegister kmask, Address dst, XMMRegister src, int vector_len) { typ -> type src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8098: > 8096: break; > 8097: default: > 8098: assert(false,"Should not reach here."); fatal() src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1093: > 1091: // AVX512 Unaligned > 1092: void evmovdqu(BasicType typ, KRegister kmask, Address dst, XMMRegister src, int vector_len); > 1093: void evmovdqu(BasicType typ, KRegister kmask, XMMRegister dst, Address src, int vector_len); typ -> type src/hotspot/cpu/x86/vm_version_x86.cpp line 1409: > 1407: ArrayCopyPartialInlineSize != 32 && > 1408: ArrayCopyPartialInlineSize != 64)) { > 1409: int pi_size = 0; What is 'pi'? src/hotspot/cpu/x86/vm_version_x86.cpp line 1410: > 1408: ArrayCopyPartialInlineSize != 64)) { > 1409: int pi_size = 0; > 1410: if (MaxVectorSize > 32 && AVX3Threshold == 0) { I think we can compare with 64 here because MaxVectorSize value is power of 2: (MaxVectorSize >= 64 src/hotspot/cpu/x86/vm_version_x86.cpp line 1423: > 1421: if (ArrayCopyPartialInlineSize > MaxVectorSize) { > 1422: ArrayCopyPartialInlineSize = MaxVectorSize; > 1423: warning("Setting ArrayCopyPartialInlineSize as MaxVectorSize"); warning only if ArrayCopyPartialInlineSize is not default. src/hotspot/share/opto/arraycopynode.hpp line 184: > 182: static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase, ArrayCopyNode*& ac); > 183: > 184: static int get_partial_inline_vector_lane_count(BasicType type, int con_len); 'con' defined in Dictionary as 'an instance of deceiving or tricking someone'. Please, don't use short words which may confuse. 
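The review remark above that `MaxVectorSize > 32` can be written as a comparison with 64 relies on the value being a power of two. A minimal sketch of that reasoning follows; `is_power_of_2` and `checks_agree` are hypothetical helpers written for this note, and the plain `unsigned` parameter only stands in for the VM flag.

```cpp
// Hypothetical helper: true for 1, 2, 4, 8, ... (exactly one bit set).
constexpr bool is_power_of_2(unsigned x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// True if "size > 32" and "size >= 64" give the same answer for this size.
constexpr bool checks_agree(unsigned size) {
  return (size > 32) == (size >= 64);
}

// For powers of two there is no value strictly between 32 and 64,
// so the two comparisons are interchangeable.
static_assert(is_power_of_2(16) && checks_agree(16), "16: both false");
static_assert(is_power_of_2(32) && checks_agree(32), "32: both false");
static_assert(is_power_of_2(64) && checks_agree(64), "64: both true");

// A non-power-of-two such as 48 is where the two forms would differ.
static_assert(!is_power_of_2(48) && !checks_agree(48), "48: > 32 but not >= 64");
```

The `>= 64` form states the intended threshold directly, which is presumably why the reviewer preferred it.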
src/hotspot/share/opto/macroArrayCopy.cpp line 202: > 200: bool PhaseMacroExpand::generate_partial_inlining_block(Node** ctrl, MergeMemNode** mem, const TypePtr* adr_type, > 201: RegionNode** exit_block, Node** result_memory, Node* length, > 202: Node* src_start, Node* dst_start, BasicType type) { I need more time to look on this method. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From bulasevich at openjdk.java.net Tue Nov 3 20:56:14 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Tue, 3 Nov 2020 20:56:14 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v5] In-Reply-To: References: Message-ID: > Let me revive the change request [3] to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 & 0xFF) | ((v2 & 0xFF) << 8)". > > Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test. > > As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new BitfieldInsert node has been introduced to replace Or+Shift+And sequence when possible, on AARCH a single BFI instruction is emitted for the new node. 
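The equivalence behind the transformation described above can be checked with a small software model. This is a sketch for illustration, not the C2 matcher or the new node: `bitfield_insert`, `or_shift_and` and `with_insert` are hypothetical names, and the model assumes `0 < width < 32` and `lsb + width <= 32`.

```cpp
#include <cstdint>

// Hypothetical software model of a bitfield insert: copy the low `width` bits
// of `src` into `dst` starting at bit `lsb`, leaving the other bits of `dst`
// unchanged. Assumes 0 < width < 32 and lsb + width <= 32.
constexpr uint32_t bitfield_insert(uint32_t dst, uint32_t src,
                                   unsigned lsb, unsigned width) {
  return (dst & ~(((UINT32_C(1) << width) - 1) << lsb)) |
         ((src & ((UINT32_C(1) << width) - 1)) << lsb);
}

// The Or+Shift+And form quoted in the message.
constexpr uint32_t or_shift_and(uint32_t v1, uint32_t v2) {
  return (v1 & 0xFF) | ((v2 & 0xFF) << 8);
}

// The same value expressed as one insert of v2's low byte at bit 8.
constexpr uint32_t with_insert(uint32_t v1, uint32_t v2) {
  return bitfield_insert(v1 & 0xFF, v2, 8, 8);
}

static_assert(or_shift_and(0x12345678u, 0xABCDEF01u) == 0x0178u, "low bytes 0x78 and 0x01");
static_assert(with_insert(0x12345678u, 0xABCDEF01u) == 0x0178u, "same result via insert");
```

The insert form needs no separate mask and shift of `v2`, which is the saving a single instruction can provide on hardware that has such an operation.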
> > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html > [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html > [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: update comments and function name to make them clearer ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/511/files - new: https://git.openjdk.java.net/jdk/pull/511/files/ee95fc6e..775c3533 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=511&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=511&range=03-04 Stats: 30 lines in 1 file changed: 8 ins; 0 del; 22 mod Patch: https://git.openjdk.java.net/jdk/pull/511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/511/head:pull/511 PR: https://git.openjdk.java.net/jdk/pull/511 From redestad at openjdk.java.net Tue Nov 3 20:58:55 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 3 Nov 2020 20:58:55 GMT Subject: RFR: 8255838: Use 32-bit immediate movslq in macro assembler if 64-bit value fits in 32 bits on x86_64 In-Reply-To: References: Message-ID: <0nRtpc9CMUyecx2YmnF0bCbdUXe4KvMFFdfpbCx7Z08=.7ea1fd92-809a-4c84-adbb-1ab07701b4e2@github.com> On Tue, 3 Nov 2020 15:12:50 GMT, Jorn Vernee wrote: > Currently, `MacroAssembler::movptr(Address dst, intptr_t src)` on x64 uses an intermediate scratch register to store the 64-bit immediate. > > void MacroAssembler::movptr(Address dst, intptr_t src) { > mov64(rscratch1, src); > movq(dst, rscratch1); > } > > But, if the value fits into 32 bits, we can also explicitly use the 32-bit immediate overload, which saves an instruction and a register use, by using movslq/movabs instead (sign-extended move). > > This ends up saving about 90k instructions on hello world. It also reduces the size of the interpreter by about 4k. 
> > Special thanks to Claes for prior discussion and help with testing. > > Testing: tier1-3, local startup profiling Marked as reviewed by redestad (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1038 From iignatyev at openjdk.java.net Tue Nov 3 22:17:58 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 3 Nov 2020 22:17:58 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 08:14:29 GMT, Nils Eliasson wrote: > Hi, > > This patch updates the code cache stress tests. I haven't been able to reproduce or retrieve a core file. > > What I can see is that the tests provoke compile storms, and that the single C1 thread (on a 4CPU system) sometimes has trouble keeping up. A factor may also be that the tests' run time scales with the timeout - so that the time allotted as margin before the timeout is only 20% of the total runtime. Combining this with Xcomp, and that the test may run concurrently with other stress tests, it is reasonable that a timeout may occur. > > I suggest capping the tests to 60 seconds of testing. I've experimented with measuring how much work is done and using that as a metric - but the different tests that use the CodeCacheStressRunner have completely different profiles. > > In UnexpectedDeoptimizationAllTest.java I have adjusted the sleep time to 100 millis between the invalidations of the entire code cache. > > In UnexpectedDeoptimizationTest.java I have added a sleep of 10 millis between deoptimizing parts of the stack. The idea is to give the stack time to grow a bit before the next deoptimization. Otherwise the test might end up running mostly in the interpreter. > > Please review, > Nils Eliasson Changes requested by iignatyev (Reviewer).
test/hotspot/jtreg/compiler/codecache/stress/CodeCacheStressRunner.java line 45: > 43: long timeout = Utils.adjustTimeout(Utils.DEFAULT_TEST_TIMEOUT); > 44: timeout *= 0.75; > 45: new TimeLimitedRunner(timeout, 2.0d, this::test).call(); why do you need `end_time`? won't it be enough to just set `timeout` to 60_000? ------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From iignatyev at openjdk.java.net Tue Nov 3 22:25:00 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 3 Nov 2020 22:25:00 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 22:14:51 GMT, Igor Ignatyev wrote: >> Hi, >> >> This patch updates the code cache stress tests. I haven't been able to reproduce or retrieve a core file. >> >> What I can see is that the tests provokes compile storms, and that the single C1 thread (on a 4CPU system) sometimes has trouble keeping up. A factor may also be that the tests run time scale with the timeout time - so that the time allotted as margin before the timeout is only 20% of the total runtime. Combining this with Xcomp, and that the test may run concurrently with other stress tests, it is reasonable that a timeout may occur. >> >> I suggest to cap the tests to 60 seconds of testing. I've experimented with meassuring how much work is done and use that as a metric - but the different tests that use the CodeCacheStressRunner has completely different profiles. >> >> In UnexpectedDeoptimizationAllTest.java I have adjusted the sleep time to 100 millis between the invalidations of the entire code cache. >> >> In UnexpectedDeoptimizationTest.java I have added a sleep of 10 miilis between deoptimizing parts of the stack. The idea is to give the stack time to growth a bit before the next deoptimization. Otherwise the test might end up running mostly in the interpreter. >> >> Please review, >> Nils Eliasson > > Changes requested by iignatyev (Reviewer). 
@lepestock, IIRC, you are/were working on fixing timeouts in some other code cache tests which use `compiler/codecache/stress/CodeCacheStressRunner`, so I believe this might be of interest to you. ------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From iignatyev at openjdk.java.net Tue Nov 3 22:52:57 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 3 Nov 2020 22:52:57 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 08:14:29 GMT, Nils Eliasson wrote: > Hi, > > This patch updates the code cache stress tests. I haven't been able to reproduce or retrieve a core file. > > What I can see is that the tests provokes compile storms, and that the single C1 thread (on a 4CPU system) sometimes has trouble keeping up. A factor may also be that the tests run time scale with the timeout time - so that the time allotted as margin before the timeout is only 20% of the total runtime. Combining this with Xcomp, and that the test may run concurrently with other stress tests, it is reasonable that a timeout may occur. > > I suggest to cap the tests to 60 seconds of testing. I've experimented with meassuring how much work is done and use that as a metric - but the different tests that use the CodeCacheStressRunner has completely different profiles. > > In UnexpectedDeoptimizationAllTest.java I have adjusted the sleep time to 100 millis between the invalidations of the entire code cache. > > In UnexpectedDeoptimizationTest.java I have added a sleep of 10 miilis between deoptimizing parts of the stack. The idea is to give the stack time to growth a bit before the next deoptimization. Otherwise the test might end up running mostly in the interpreter. > > Please review, > Nils Eliasson @neliasso , could you please explain why in `UnexpectedDeoptimizationAllTest` you "have adjusted the sleep time to 100 millis between the invalidations of the entire code cache"? 
------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From ihse at openjdk.java.net Tue Nov 3 22:59:01 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Tue, 3 Nov 2020 22:59:01 GMT Subject: RFR: 8255861: Also update jaotc.1 for JDK 16 Message-ID: <198z3XcXIAA8f_vbpy8Kq7t5m-yRENMRk8V2Gv4T9Rc=.6d48b32b-9746-45b2-b43e-22298beb5258@github.com> Unfortunately, I missed updating jaotc.1 in JDK-8255853, since it is no longer built by Oracle and we don't maintain the man page source as markdown any longer. This is not a reason for not updating the open troff file, however. ------------- Commit messages: - 8255861: Also update jaotc.1 for JDK 16 Changes: https://git.openjdk.java.net/jdk/pull/1045/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1045&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255861 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1045.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1045/head:pull/1045 PR: https://git.openjdk.java.net/jdk/pull/1045 From kvn at openjdk.java.net Tue Nov 3 23:31:54 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 3 Nov 2020 23:31:54 GMT Subject: RFR: 8255861: Also update jaotc.1 for JDK 16 In-Reply-To: <198z3XcXIAA8f_vbpy8Kq7t5m-yRENMRk8V2Gv4T9Rc=.6d48b32b-9746-45b2-b43e-22298beb5258@github.com> References: <198z3XcXIAA8f_vbpy8Kq7t5m-yRENMRk8V2Gv4T9Rc=.6d48b32b-9746-45b2-b43e-22298beb5258@github.com> Message-ID: On Tue, 3 Nov 2020 22:51:43 GMT, Magnus Ihse Bursie wrote: > Unfortunately, I missed updating jaotc.1 in JDK-8255853, since it is no longer built by Oracle and we don't maintain the man page source as markdown any longer. > > This is not a reason for not updating the open troff file, however. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1045 From ihse at openjdk.java.net Tue Nov 3 23:56:58 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Tue, 3 Nov 2020 23:56:58 GMT Subject: Integrated: 8255861: Also update jaotc.1 for JDK 16 In-Reply-To: <198z3XcXIAA8f_vbpy8Kq7t5m-yRENMRk8V2Gv4T9Rc=.6d48b32b-9746-45b2-b43e-22298beb5258@github.com> References: <198z3XcXIAA8f_vbpy8Kq7t5m-yRENMRk8V2Gv4T9Rc=.6d48b32b-9746-45b2-b43e-22298beb5258@github.com> Message-ID: <6d9VY6UwFfrgStj9uNz6dY4hzgPrw5uVvwRxD6dCX28=.40b0fcfc-2383-48c6-b3e6-3bf96d8e5e17@github.com> On Tue, 3 Nov 2020 22:51:43 GMT, Magnus Ihse Bursie wrote: > Unfortunately, I missed updating jaotc.1 in JDK-8255853, since it is no longer built by Oracle and we don't maintain the man page source as markdown any longer. > > This is not a reason for not updating the open troff file, however. This pull request has now been integrated. Changeset: 2668d232 Author: Magnus Ihse Bursie URL: https://git.openjdk.java.net/jdk/commit/2668d232 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8255861: Also update jaotc.1 for JDK 16 Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1045 From thartmann at openjdk.java.net Wed Nov 4 07:15:55 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 4 Nov 2020 07:15:55 GMT Subject: RFR: 8255797: ciReplay: improve documentation of replay file syntax in parser In-Reply-To: <5RkKpm4MP2OequYmfLKTXjR76f5KeYLd596RWQkQXzQ=.615f9615-35d0-4b73-b6ea-e600e3945730@github.com> References: <5RkKpm4MP2OequYmfLKTXjR76f5KeYLd596RWQkQXzQ=.615f9615-35d0-4b73-b6ea-e600e3945730@github.com> Message-ID: On Tue, 3 Nov 2020 08:55:14 GMT, Roberto Casta?eda Lozano wrote: > Complete and disambiguate the informal specification of the replay file syntax > given in the ciReplay class implementation. Marked as reviewed by thartmann (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1033 From thartmann at openjdk.java.net Wed Nov 4 07:20:56 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 4 Nov 2020 07:20:56 GMT Subject: RFR: 8255838: Use 32-bit immediate movslq in macro assembler if 64-bit value fits in 32 bits on x86_64 In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 15:12:50 GMT, Jorn Vernee wrote: > Currently, MacroAssembler::movptr(Address dst, intptr_t src) on x64 uses an intermediate scratch register to store the 64-bit immediate. > > void MacroAssembler::movptr(Address dst, intptr_t src) { > mov64(rscratch1, src); > movq(dst, rscratch1); > } > > But, if the value fits into 32 bits, we can also explicitly use the 32-bit immediate overload, which saves an instruction and a register use, by using movslq/movabs instead (sign-extended move). > > This ends up saving about 90k instructions on hello world. It also reduces the size of the interpreter by about 4k. > > Special thanks to Claes for prior discussion and help with testing. > > Testing: tier1-3, local startup profiling Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1038 From rcastanedalo at openjdk.java.net Wed Nov 4 07:28:56 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 4 Nov 2020 07:28:56 GMT Subject: RFR: 8255797: ciReplay: improve documentation of replay file syntax in parser In-Reply-To: References: <5RkKpm4MP2OequYmfLKTXjR76f5KeYLd596RWQkQXzQ=.615f9615-35d0-4b73-b6ea-e600e3945730@github.com> Message-ID: On Wed, 4 Nov 2020 07:13:09 GMT, Tobias Hartmann wrote: >> Complete and disambiguate the informal specification of the replay file syntax >> given in the ciReplay class implementation. > > Marked as reviewed by thartmann (Reviewer). Thanks for reviewing, Vladimir and Tobias!
------------- PR: https://git.openjdk.java.net/jdk/pull/1033 From rcastanedalo at openjdk.java.net Wed Nov 4 07:34:54 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 4 Nov 2020 07:34:54 GMT Subject: Integrated: 8255797: ciReplay: improve documentation of replay file syntax in parser In-Reply-To: <5RkKpm4MP2OequYmfLKTXjR76f5KeYLd596RWQkQXzQ=.615f9615-35d0-4b73-b6ea-e600e3945730@github.com> References: <5RkKpm4MP2OequYmfLKTXjR76f5KeYLd596RWQkQXzQ=.615f9615-35d0-4b73-b6ea-e600e3945730@github.com> Message-ID: On Tue, 3 Nov 2020 08:55:14 GMT, Roberto Castañeda Lozano wrote: > Complete and disambiguate the informal specification of the replay file syntax > given in the ciReplay class implementation. This pull request has now been integrated. Changeset: c7a2c245 Author: Roberto Castañeda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/c7a2c245 Stats: 10 lines in 1 file changed: 3 ins; 2 del; 5 mod 8255797: ciReplay: improve documentation of replay file syntax in parser Complete and disambiguate the informal specification of the replay file syntax given in the ciReplay class implementation. Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1033 From xxinliu at amazon.com Wed Nov 4 09:22:32 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Wed, 4 Nov 2020 09:22:32 +0000 Subject: Question about a few properties of ideal graph In-Reply-To: <87y2jjdiu9.fsf@redhat.com> References: <171D40B4-23CB-49E0-8850-4C4EF021C972@amazon.com> <87y2jjdiu9.fsf@redhat.com> Message-ID: <818864CB-1B57-46EA-86B8-B629C5F370E4@amazon.com> Hi, Roland, Thank you for taking the time to explain those basic concepts. I have spent many hours diving into the code, but your comments and pointers help starters like me to connect the dots into a graph. Many things in C2 don't make sense until I can build up a knowledge graph. We owe you a debt of gratitude for educating us.
On 11/2/20, 4:50 AM, "Roland Westrelin" wrote: > 1. Useless. [1] defines an operation as useless if no operation uses its results, or if all uses of the results are dead (10.2) > Presumably, a node is useless if it's not useful. Can I say identify_useful_nodes() is the same as the definition above? > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L302 I would say yes. In the ideal graph, nodes that have no use are useless and can be removed. identify_useful_nodes() is applied after parsing to remove nodes that have no use because parsing leaves some behind it. > 2. Unreachable. In my understanding, a node is unreachable if control flow never gets there. I feel this definition only fits in CFG. Is it still the same meaning in ideal graph? > > According to the code here: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/node.cpp#L744, > it looks like a node is unreachable if > 1) no use or > 2) its type is TOP or > 3) its control input node's is_top() is true. > > Is it complementary to Node::is_reachable_from_root()? To be honest, I don't understand why finding the root from a node by following uses using BFS is the same thing as the control flow being able to reach it from the root. I would say 1) is not unreachable. It can still be reachable but it is useless. Not all nodes have a control input (AddNode for instance) so looking only at control inputs can't be sufficient. I would say, a node is unreachable if one of its required inputs is top or NULL. Region/Phi are special cases because they merge multiple control flows. So if one of their inputs becomes top but all other inputs are not, the Region/Phi is still reachable. I see, now I understand the following member function actually means "unreachable OR useless".
bool Node::is_unreachable(PhaseIterGVN &igvn) const { assert(!is_Mach(), "doesn't work with MachNodes"); return outcnt() == 0 || (in(0) != NULL && in(0)->is_top()); } > 3. dead: dead is everywhere in c2. I feel it could refer to different things in different places. > 1. useless? e.g. Compile::_dead_node_list > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L326 > 2. no direct use e.g. outcnt() == 0 is dead. > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/phaseX.hpp#L511 > 3. sometimes, I feel c2 removes a dead node because it's unreachable. > > Is there a definition of dead node in c2? Or dead/useless/unreachable are all same thing in ideal graph? Not sure if there's a consistent use of "dead" in the source base. > 4. top. What's semantic if a node's is_top() is true? Is it same thing as its type is TOP? Node::is_top() returns true for the unique top node (constant node that has type TOP). I don't think you can say the type TOP and the node top are the same thing. Let's say you have a AddNode with a top input. The AddNode now has type TOP which at GVN would lead that AddNode to be replaced by the top node. yes, thanks for confirming that. I found a TOP node is an ad-hoc node in a CU, whose Node::_out is NULL An interesting property of the top node is that Node::add_out() and Node::del_out() are nop's. It's convenient to use it in replace_xxx(old, C->top()). > In type lattice, TOP is vague to me too. I feel that the type of a node is TOP has a slight different meaning for different nodes. If the node is a CFG node, TOP seems to mean control flow can't reach to it. > If a value node whose type is TOP, I guess it means the value of this node is undefined. I am correct here? I don't think TOP has a different meaning for control nodes. One thing about the ideal graph is that control flow is often treated as just another dependence. I see. I really need to learn to treat a control as a special "value" in ideal graph. 
> 5. root node > Can I say the root node of each compilation unit(CU) is the beginning and the end of that CU? I think so. > So far, I feel the inputs of a root node are return values and side effect. It that correct? Side effect is a bit vague, I guess. Return and other exits from the method such as deoptimization. Maybe there are other edges added from the root node, maybe temporarily, to keep some nodes from becoming useless. Not sure. I use sideeffect here to represent i/o state and mem state. RootNode has 'i/o' and mem input nodes. I believe they represent that external side effect of current CU execution. > If I traverse uses of nodes from root, I should return to root eventually? if yes, it means ideal graph is a DAG. Yes, you should. But it's not a DAG anyway because there can be loops. Got it! Roland. From chagedorn at openjdk.java.net Wed Nov 4 13:31:06 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 4 Nov 2020 13:31:06 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v3] In-Reply-To: References: Message-ID: > The dominance failures start to occur after the fix for [JDK-8249749](https://bugs.openjdk.java.net/browse/JDK-8249749) which enabled the method `SWPointer::scaled_iv_plus_offset` to call itself recursively and walk the graph to match more instead of stopping immediately (no recursion): > https://github.com/openjdk/jdk/commit/092389e3c91822b1e3f56f203cb7b90e84673f8e#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81L3789-R3812 > > We check in `SWPointer::offset_plus_k` if a node is invariant and if so then we choose it as invariant. However, we now have cases in the Renaissance benchmarks where we select an invariant that is pinned to a `CastIINode` between the main and pre loop. An example is shown in the attached image. 5913 SubI is found as an invariant with the improved recursive search enabled by JDK-8249749. 
The control of 5913 SubI (with `get_ctrl`) is 5298 CastII. The problem is now that we use the invariant 5913 SubI in the pre loop limit check of 5281 CountedLoopEnd (done in `SuperWord::align_initial_loop_index`) because we assume that since the invariant is not part of the main loop, it can float into the pre loop. But this is prevented by 5298 CastII. This results in the dominance assertion failure when checking if the earliest control of 5270 Bool in the pre loop (5297 IfTrue because of 5913 SubI that is used by 5270 Bool) dominates the LCA of 5270 Bool (the pre loop header node). > > My suggestion is to improve the invariant check in `SWPointer::offset_plus_k` to also check if the found invariant is not dominated by the pre loop end node. Repeated testing of the RenaissanceStressTest has not resulted in any dominance failures anymore. > ![dominance_failure](https://user-images.githubusercontent.com/17833009/97696669-3752d200-1aa6-11eb-9a42-2e36550e2b8b.png) Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Update comments and invariant selection in offset_plus_k ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/954/files - new: https://git.openjdk.java.net/jdk/pull/954/files/e9c99dcc..6eff991c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=954&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=954&range=01-02 Stats: 10 lines in 1 file changed: 4 ins; 1 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/954.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/954/head:pull/954 PR: https://git.openjdk.java.net/jdk/pull/954 From chagedorn at openjdk.java.net Wed Nov 4 13:31:07 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 4 Nov 2020 13:31:07 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v2] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 18:30:10 GMT, 
Vladimir Kozlov wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Check dominance with pre loop head instead of tail > > src/hotspot/share/opto/superword.cpp line 3989: > >> 3987: } >> 3988: } >> 3989: if (!is_main_loop_member(n)) { > > Please add a comment here explaining why !is_main_loop_member() is used in this case instead of invariant(). > Other places look good. I updated it and also moved the following `invariant()` check into the if block as it can only hold if `!is_main_loop_member()` is true. ------------- PR: https://git.openjdk.java.net/jdk/pull/954 From redestad at openjdk.java.net Wed Nov 4 15:22:01 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 4 Nov 2020 15:22:01 GMT Subject: RFR: 8255900: x86: Reduce impact when VerifyOops is disabled Message-ID: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> - Don't generate stub _verify_oop_subroutine_entry on startup if -VerifyOops - Make VerifyOops check in calls to MacroAssembler::verify_oop inlineable. (VerifyOops is a develop flag, so this allows more aggressive code elimination in a few places) - (trivial pile-on) remove unused set_word_if_not_zero Slightly improves static footprint, gets rid of a stub and reduces jitter during jitting in product builds.
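The "inlineable check" idea in the summary above can be sketched in plain C++ as follows. This is a simplified model under stated assumptions, not the patch itself: `VerifyOopsSketch`, `verify_oop_slow`, `verify_oop_checked` and `slow_path_calls` are hypothetical names, and the flag is modelled as a compile-time constant `false`, which is how a develop flag behaves in a product build.

```cpp
#include <cassert>

// Stand-in for a develop flag as seen by a product build: a compile-time
// constant, so any branch guarded by it can be removed by the compiler.
// (Hypothetical name; not the real HotSpot flag.)
static constexpr bool VerifyOopsSketch = false;

// Counts how often the expensive path runs, so the effect is observable.
static int slow_path_calls = 0;

// Out-of-line worker that would emit the verification code.
static void verify_oop_slow(int reg) {
    (void)reg;
    ++slow_path_calls;
}

// Small inline wrapper: the flag test sits in the caller, so with a constant
// false flag the call to the worker disappears entirely at each call site.
static inline void verify_oop_checked(int reg) {
    if (!VerifyOopsSketch) return;
    verify_oop_slow(reg);
}
```

Keeping the flag test in a header-visible wrapper, rather than inside the out-of-line worker, is what lets call sites be eliminated when the flag is constant.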
------------- Commit messages: - Fix copy-paste error - add checked variants of _verify_oop to avoid calls in product builds - Merge branch 'master' into verify_oop_stub - Cleanup verify_oop Changes: https://git.openjdk.java.net/jdk/pull/1058/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1058&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255900 Stats: 27 lines in 4 files changed: 13 ins; 9 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/1058.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1058/head:pull/1058 PR: https://git.openjdk.java.net/jdk/pull/1058 From neliasso at openjdk.java.net Wed Nov 4 15:59:10 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 4 Nov 2020 15:59:10 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch updates the code cache stress tests. I haven't been able to reproduce or retrieve a core file. > > What I can see is that the tests provokes compile storms, and that the single C1 thread (on a 4CPU system) sometimes has trouble keeping up. A factor may also be that the tests run time scale with the timeout time - so that the time allotted as margin before the timeout is only 20% of the total runtime. Combining this with Xcomp, and that the test may run concurrently with other stress tests, it is reasonable that a timeout may occur. > > I suggest to cap the tests to 60 seconds of testing. I've experimented with meassuring how much work is done and use that as a metric - but the different tests that use the CodeCacheStressRunner has completely different profiles. > > In UnexpectedDeoptimizationAllTest.java I have adjusted the sleep time to 100 millis between the invalidations of the entire code cache. > > In UnexpectedDeoptimizationTest.java I have added a sleep of 10 miilis between deoptimizing parts of the stack. 
The idea is to give the stack time to growth a bit before the next deoptimization. Otherwise the test might end up running mostly in the interpreter. > > Please review, > Nils Eliasson Nils Eliasson has updated the pull request incrementally with three additional commits since the last revision: - changed wait time - removed file - simplify_timeout ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1030/files - new: https://git.openjdk.java.net/jdk/pull/1030/files/cf3ec2cc..5050a2ef Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1030&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1030&range=00-01 Stats: 8 lines in 2 files changed: 0 ins; 5 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1030.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1030/head:pull/1030 PR: https://git.openjdk.java.net/jdk/pull/1030 From neliasso at openjdk.java.net Wed Nov 4 15:59:10 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 4 Nov 2020 15:59:10 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 22:49:50 GMT, Igor Ignatyev wrote: > @neliasso , > > could you please explain why in `UnexpectedDeoptimizationAllTest` you "have adjusted the sleep time to 100 millis between the invalidations of the entire code cache"? I was experimenting with different levels of contention. 10 millis + Xcomp gets 30% fewer methods compiled, but in all other cases 10 millis results in more compilations. It's a toss for me. I reverted since it doesn't really matter. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From neliasso at openjdk.java.net Wed Nov 4 15:59:11 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 4 Nov 2020 15:59:11 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out [v2] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 22:14:02 GMT, Igor Ignatyev wrote: >> Nils Eliasson has updated the pull request incrementally with three additional commits since the last revision: >> >> - changed wait time >> - removed file >> - simplify_timeout > > test/hotspot/jtreg/compiler/codecache/stress/CodeCacheStressRunner.java line 45: > >> 43: long timeout = Utils.adjustTimeout(Utils.DEFAULT_TEST_TIMEOUT); >> 44: timeout *= 0.75; >> 45: new TimeLimitedRunner(timeout, 2.0d, this::test).call(); > > why do you need `end_time`? won't it be enough to just set `timeout` to 60_000? I had a much more complex version at first. I realize now that it is equivalent to just setting 60 secs. I have fixed it in the update. ------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From neliasso at openjdk.java.net Wed Nov 4 16:28:54 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 4 Nov 2020 16:28:54 GMT Subject: RFR: 8255900: x86: Reduce impact when VerifyOops is disabled In-Reply-To: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> References: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> Message-ID: <88u6xCsIwiZ7Sq549-tTRU8IKhjPc8gXcuMk7LvW5o0=.2350d90c-dc8a-4070-8b36-105097fa5068@github.com> On Wed, 4 Nov 2020 14:49:41 GMT, Claes Redestad wrote: > - Don't generate stub _verify_oop_subroutine_entry on startup if -VerifyOops > - Make VerifyOops check in calls to MacroAssembler::verify_oop inlineable.
(VerifyOops is a develop flag, so this allows more aggressive code elimination in a few place) > - (trivial pile-on) remove unused set_word_if_not_zero > > Slightly improves static footprint, gets rid of a stub and reduce jitter during jitting in product builds. Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1058 From adinn at openjdk.java.net Wed Nov 4 17:44:58 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Wed, 4 Nov 2020 17:44:58 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v5] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 20:56:14 GMT, Boris Ulasevich wrote: >> Let me revive the change request [3] to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 & 0xFF) | ((v2 & 0xFF) << 8)". >> >> Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test. >> >> As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new BitfieldInsert node has been introduced to replace Or+Shift+And sequence when possible, on AARCH a single BFI instruction is emitted for the new node. 
>> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html >> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html >> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > update comments and function name to make them clearer

src/hotspot/share/opto/addnode.cpp line 872:

> 870: return new BitfieldInsertINode(dst, value, phase->intcon(offset), phase->intcon(width));
> 871: }
> 872: }

This code and its accompanying comments need to be made much clearer:

- `dst` and `src` are a rather perverse choice of names for the inputs of the Or node. `l` and `r` or `left` and `right`, following the convention in the preceding code, would be better.

- You mention `dst` in the comment to identify it as the left `Or` input but do not clearly identify `src` with the two alternative matched patterns for the right hand side of the `Or`.

- The name `shift` is used in those two patterns for an operand that actually has to be a constant bit mask for the transformation to be applicable (so why not use `mask`, `1s_mask` or `constmask`?).

- Your comment incoherently employs different notations: i.e. you use a term for the `BitfieldInsert` expression and refer to the outermost node using the term name `Or` but you specify the patterns using the infix C language operators `&` and `<<`.

- The qualifying comment 'if the Or argument value range masks do not overlap' states a condition for the replacement without properly explaining the meaning of that condition i.e. that the various subexpressions operate on disjoint bitfields of the integral value being computed.
- You use var `mask` for the const mask node and then reuse it for the constant value it identifies (directly after using `mask` to compute `width` which is used to define `mask`, making it look like mask is used to define itself).

- Most importantly, the introductory comment does not provide a clear summary of what sort of graph shape is being replaced and how the match and bit range constraints legitimize that replacement.

I would suggest the following as a replacement:

  if (can_reshape && !phase->C->major_progress() && Matcher::match_rule_supported(Op_BitfieldInsertI)) {
    // If the right input of this Or is an And with mask or an LShifted
    // And with mask and the left and right inputs can be determined
    // to construct values lying in disjoint bit ranges then the Or
    // can be replaced with BitfieldInsert.
    //
    // There are two substitution rules:
    //
    // 1) (Or left (And value mask)) => (BitfieldInsert left value width 0)
    //    where width == bitcount(mask) AND
    //          (value_range_mask(left) & mask) == 0
    //
    // 2) (Or left (LShift (And value mask) offset)) => (BitfieldInsert left value width offset)
    //    where width == bitcount(mask) AND
    //          (value_range_mask(left) & (mask << offset)) == 0
    // n.b.
    // mask is an integer constant comprising a contiguous sequence of 1s
    // value_range_mask(node) computes a mask identifying the range of bits
    // that could be set by its argument
    Node *left = in(1);
    Node *right = in(2);
    Node *andi = NULL;
    int offset = 0;
    if (right->Opcode() == Op_LShiftI && right->in(1)->Opcode() == Op_AndI && right->in(2)->is_Con()) {
      andi = right->in(1);
      offset = right->in(2)->get_int();
    } else if (right->Opcode() == Op_AndI) {
      andi = right;
    }
    if (andi != NULL) {
      Node* mask = andi->in(2);
      if (mask->is_Con() && is_power_of_2(mask->get_int() + 1)) {
        Node* value = andi->in(1);
        int width = exact_log2(mask->get_int() + 1);
        int maskval = ((1 << width) - 1) << offset;
        if (width + offset <= 32 && ((value_range_mask(phase, left) & maskval) == 0)) {
          return new BitfieldInsertINode(left, value, phase->intcon(offset), phase->intcon(width));
        }
      }

Note that I am reusing the SEXPR format used in the ad files to describe the graph patterns and using (C-like) pseudo-code to define the substitution conditions.

src/hotspot/share/opto/addnode.cpp line 963:

> 961: }
> 962: }
> 963: }

This needs reworking as above. I suggest you comment it by referring the reader back to the previous method.

------------- PR: https://git.openjdk.java.net/jdk/pull/511 From minqi at openjdk.java.net Wed Nov 4 17:44:56 2020 From: minqi at openjdk.java.net (Yumin Qi) Date: Wed, 4 Nov 2020 17:44:56 GMT Subject: RFR: 8255900: x86: Reduce impact when VerifyOops is disabled In-Reply-To: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> References: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> Message-ID: On Wed, 4 Nov 2020 14:49:41 GMT, Claes Redestad wrote: > - Don't generate stub _verify_oop_subroutine_entry on startup if -VerifyOops > - Make VerifyOops check in calls to MacroAssembler::verify_oop inlineable.
(VerifyOops is a develop flag, so this allows more aggressive code elimination in a few place) > - (trivial pile-on) remove unused set_word_if_not_zero > > Slightly improves static footprint, gets rid of a stub and reduce jitter during jitting in product builds. Why only for x86_64 not including other platforms? ------------- PR: https://git.openjdk.java.net/jdk/pull/1058 From adinn at openjdk.java.net Wed Nov 4 17:56:57 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Wed, 4 Nov 2020 17:56:57 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v5] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 17:40:12 GMT, Andrew Dinn wrote: >> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments and function name to make them clearer > > src/hotspot/share/opto/addnode.cpp line 872: > >> 870: return new BitfieldInsertINode(dst, value, phase->intcon(offset), phase->intcon(width)); >> 871: } >> 872: } > > This code and its accompanying comments need to be made much clearer: > > - `dst` and `src` is a rather perverse choice of names for the inputs of the Or node. `l` and `r` or `left` and `right`, following the convention in the preceding code, would be better. > > - You mention `dst` in the comment to identify it as the left `Or` input but do not clearly identify `src` with the two alternative matched patterns for the the right hand side of the `Or`. > > - The name `shift `is used in those two patterns for an operand that actually has to be a constant bit mask for the transformation to be applicable (so why not use `mask`, `1s_mask` or `constmask`?). > > - Your comment incoherently employs different notations: i.e. you use a term for the `BitfieldInsert` expression and refer to the outermost node using the term name `Or` but you specify the patterns using the infix C language operators `&` and `<<`. 
> > - The qualifiying comment 'if the Or argument value range masks do not overlap' states a condition for the replacement without properly explaining the meaning of that condition i.e. that the various subexpressions operate on disjoint bitfields of the integral value being computed. > > - You use var `mask` for the const mask node and then reuse it for the constant value it identifies (directly after using `mask` to compute `width` which is used to define `mask`, making it look like mask is used to define itself). > > - Most importantly, the introductory comment does not provide a clear summary of what sort of graph shape is being replaced and how the match and bit range constraints legitimize that replacement. > > I would suggest the following as a replacement: > > if (can_reshape && !phase->C->major_progress() && Matcher::match_rule_supported(Op_BitfieldInsertI)) { > // If the right input of this Or is an And with mask or an LShifted > // And with mask and the left and right inputs can be determined > // to construct values lying in disjoint bit ranges then the Or > // can be replaced with BitfieldInsert. > // > // There are two substitution rules: > // > // 1) (Or left (And value mask)) => (BitfieldInsert left value width 0)) > // where width == bitcount(mask) AND > // (value_range_mask(left) & mask) == 0 > // > // 2) (Or left (LShift (And value mask) offset) => (BitfieldInsert left value width 0) > // where width == bitcount(mask) AND > // (value_range_mask(left) & (mask << offset)) == 0 > // n.b. 
> // mask is an integer constant comprising a contiguous sequence of 1s > // value_range_mask(node) computes a mask identifying the range of bits > // that could be set by its argument > > Node *left = in(1); > Node *right = in(2); > Node *andi = NULL; > int offset = 0; > > if (right->Opcode() == Op_LShiftI && right->in(1)->Opcode() == Op_AndI && right->in(2)->is_Con()) { > andi = right->in(1); > offset = right->in(2)->get_int(); > } else if (right->Opcode() == Op_AndI) { > andi = right; > } > if (andi != NULL) { > Node* mask = andi->in(2); > if (mask->is_Con() && is_power_of_2(mask->get_int() + 1)) { > Node* value = andi->in(1); > int width = exact_log2(mask->get_int() + 1); > int maskval = ((1 << width) - 1) << offset; > if (width + offset <= 32 && ((value_range_mask(phase, left) & maskval) == 0)) { > return new BitfieldInsertINode(left, value, phase->intcon(offset), phase->intcon(width)); > } > } > > Note that I am reusing the SEXPR format used in the ad files to describe the graph patterns and using (C-like) pseudo-code to define the substitution conditions. Hmm, that is not quite right because in case 2 you are also checking that the shift is constant. So, the overall description should change to say // If the right input of this Or is an And with mask or an LShifted // And with mask with constant shift and the left and right inputs // can be determined . . . 
-------------

PR: https://git.openjdk.java.net/jdk/pull/511

From jvernee at openjdk.java.net Wed Nov 4 18:12:57 2020
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Wed, 4 Nov 2020 18:12:57 GMT
Subject: Integrated: 8255838: Use 32-bit immediate movslq in macro assembler if 64-bit value fits in 32 bits on x86_64
In-Reply-To:
References:
Message-ID: <-QogyhUBZmh1XoX_wQtGgQI3oqYp3VbxyloVP_wRnHE=.cfc76426-043b-4c7c-b0d1-ee3e86d6c26b@github.com>

On Tue, 3 Nov 2020 15:12:50 GMT, Jorn Vernee wrote:

> Currently, MacroAssembler::movptr(Address dst, intptr_t src) on x64 uses an intermediate scratch register to store the 64-bit immediate.
>
> void MacroAssembler::movptr(Address dst, intptr_t src) {
>   mov64(rscratch1, src);
>   movq(dst, rscratch1);
> }
>
> But, if the value fits into 32 bits, we can also explicitly use the 32-bit immediate overload, which saves an instruction and a register use, by using movslq/movabs instead (sign-extended move).
>
> This ends up saving about 90k instructions on hello world. It also reduces the size of the interpreter by about 4k.
>
> Special thanks to Claes for prior discussion and help with testing.
>
> Testing: tier1-3, local startup profiling

This pull request has now been integrated.
Changeset: 160759ce Author: Jorn Vernee URL: https://git.openjdk.java.net/jdk/commit/160759ce Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod 8255838: Use 32-bit immediate movslq in macro assembler if 64-bit value fits in 32 bits on x86_64 Reviewed-by: azeemj, kvn, redestad, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1038 From redestad at openjdk.java.net Wed Nov 4 18:56:55 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 4 Nov 2020 18:56:55 GMT Subject: RFR: 8255900: x86: Reduce impact when VerifyOops is disabled In-Reply-To: References: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> Message-ID: On Wed, 4 Nov 2020 17:42:30 GMT, Yumin Qi wrote: > Why only for x86_64 not including other platforms? It's uneasy changing platforms I don't have access to to test. Generally it seems more appropriate that someone with access ports the changes. This case might be trivial enough to port without testing, but I looked at aarch64 which does not use the same macro structure, for example, so I backed away from attempting that. ------------- PR: https://git.openjdk.java.net/jdk/pull/1058 From minqi at openjdk.java.net Wed Nov 4 19:10:00 2020 From: minqi at openjdk.java.net (Yumin Qi) Date: Wed, 4 Nov 2020 19:10:00 GMT Subject: RFR: 8255900: x86: Reduce impact when VerifyOops is disabled In-Reply-To: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> References: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> Message-ID: On Wed, 4 Nov 2020 14:49:41 GMT, Claes Redestad wrote: > - Don't generate stub _verify_oop_subroutine_entry on startup if -VerifyOops > - Make VerifyOops check in calls to MacroAssembler::verify_oop inlineable. 
(VerifyOops is a develop flag, so this allows more aggressive code elimination in a few place) > - (trivial pile-on) remove unused set_word_if_not_zero > > Slightly improves static footprint, gets rid of a stub and reduce jitter during jitting in product builds. Then looks good to me. ------------- Marked as reviewed by minqi (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1058 From kvn at openjdk.java.net Wed Nov 4 19:24:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 4 Nov 2020 19:24:57 GMT Subject: RFR: 8255900: x86: Reduce impact when VerifyOops is disabled In-Reply-To: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> References: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> Message-ID: <1quZg5MPENLj0Nlu32wA3emIPtcObY4IVBYCxYsI-wY=.b987385a-fa10-4a5a-a37a-b3816651aeb8@github.com> On Wed, 4 Nov 2020 14:49:41 GMT, Claes Redestad wrote: > - Don't generate stub _verify_oop_subroutine_entry on startup if -VerifyOops > - Make VerifyOops check in calls to MacroAssembler::verify_oop inlineable. (VerifyOops is a develop flag, so this allows more aggressive code elimination in a few place) > - (trivial pile-on) remove unused set_word_if_not_zero > > Slightly improves static footprint, gets rid of a stub and reduce jitter during jitting in product builds. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1058 From enikitin at openjdk.java.net Wed Nov 4 20:02:54 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 4 Nov 2020 20:02:54 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 15:49:22 GMT, Nils Eliasson wrote: >> @neliasso , >> >> could you please explain why in `UnexpectedDeoptimizationAllTest` you "have adjusted the sleep time to 100 millis between the invalidations of the entire code cache"? 
> >> @neliasso , >> >> could you please explain why in `UnexpectedDeoptimizationAllTest` you "have adjusted the sleep time to 100 millis between the invalidations of the entire code cache"? > > I was experimenting with different levels of contention. 10 millis + Xcomp gets 30% fewer methods compiled, but in all other cases 10 millis results in more compilations. It's a toss for me. > > I reverted since it doesn't really matter. @neliasso, could you please explain how those compile storms cause timeouts? As far as I could find, the JVM doesn't wait for the threads to finish, it gives them approx. [10 seconds](http://hg.openjdk.java.net/jdk/jdk/file/ee1d592a9f53/src/hotspot/share/runtime/vmOperations.cpp#l388), and then just [exits](http://hg.openjdk.java.net/jdk/jdk/file/ee1d592a9f53/src/hotspot/share/runtime/vmOperations.cpp#l424). So we only have to make sure that our 20% is larger than 10s + some reasonable margin, right? ------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From redestad at openjdk.java.net Wed Nov 4 20:41:14 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 4 Nov 2020 20:41:14 GMT Subject: RFR: 8255900: x86: Reduce impact when VerifyOops is disabled [v2] In-Reply-To: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> References: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> Message-ID: > - Don't generate stub _verify_oop_subroutine_entry on startup if -VerifyOops > - Make VerifyOops check in calls to MacroAssembler::verify_oop inlineable. (VerifyOops is a develop flag, so this allows more aggressive code elimination in a few place) > - (trivial pile-on) remove unused set_word_if_not_zero > > Slightly improves static footprint, gets rid of a stub and reduce jitter during jitting in product builds. Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Revert arm/aarch64 changes - Merge branch 'master' into verify_oop_stub - Apply similar improvements on arm/aarch64 - Fix copy-paste error - add checked variants of _verify_oop to avoid calls in product builds - Merge branch 'master' into verify_oop_stub - Cleanup verify_oop ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1058/files - new: https://git.openjdk.java.net/jdk/pull/1058/files/0e083d99..5407d476 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1058&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1058&range=00-01 Stats: 1820 lines in 271 files changed: 737 ins; 712 del; 371 mod Patch: https://git.openjdk.java.net/jdk/pull/1058.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1058/head:pull/1058 PR: https://git.openjdk.java.net/jdk/pull/1058 From neliasso at openjdk.java.net Wed Nov 4 20:55:58 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 4 Nov 2020 20:55:58 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 15:49:22 GMT, Nils Eliasson wrote: >> @neliasso , >> >> could you please explain why in `UnexpectedDeoptimizationAllTest` you "have adjusted the sleep time to 100 millis between the invalidations of the entire code cache"? > >> @neliasso , >> >> could you please explain why in `UnexpectedDeoptimizationAllTest` you "have adjusted the sleep time to 100 millis between the invalidations of the entire code cache"? > > I was experimenting with different levels of contention. 10 millis + Xcomp gets 30% fewer methods compiled, but in all other cases 10 millis results in more compilations. It's a toss for me. > > I reverted since it doesn't really matter. > @neliasso, could you please explain how those compile storms cause timeouts? 
> As far as I could find, the JVM doesn't wait for the threads to finish, it gives them approx. [10 seconds](http://hg.openjdk.java.net/jdk/jdk/file/ee1d592a9f53/src/hotspot/share/runtime/vmOperations.cpp#l388), and then just [exits](http://hg.openjdk.java.net/jdk/jdk/file/ee1d592a9f53/src/hotspot/share/runtime/vmOperations.cpp#l424).

I don't know exactly since 1) I can't reproduce it 2) I didn't get any core or thread dump from the failure.

What I do know is 1) The failure was on a slow platform 2) It was probably running concurrently with other tests 3) It was running with -Xcomp

Using JFR and jcmd I have seen - 1) very long compile queues (> 500 for C1)

I did some measurements on workload too. In normal mode 6500 iterations of CodeCacheStressRunner were performed in 30 sec, with -Xcomp only 250. But in total compiles -Xcomp had 30% fewer compilations.

What I also do know is that the CodeCache lock will be contended by the compile threads and the test thread that is invalidating everything in the cache. With long compile queues the MethodCompileQueue lock can also become contended because the time held is proportional to the number of compile tasks in the queue. In register_method both locks are held.

>
> So we only have to make sure that our 20% is larger than 10s + some reasonable margin, right?
>

Yes. I suggest 60 seconds test time and minimum of 60 seconds margin.
I also suggest to not scale the test time with the timeout fraction since the run times become excessive. I experimented with keeping the CodeCacheStressRunner iterations constant but the different tests have completely different profiles and the variants are too many. 60 seconds is still a huge amount of compiles and invalidations. (times 2 since all tests are run with and without segmented code cache).
------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From iignatyev at openjdk.java.net Wed Nov 4 21:05:54 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 4 Nov 2020 21:05:54 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 20:53:25 GMT, Nils Eliasson wrote: >>> @neliasso , >>> >>> could you please explain why in `UnexpectedDeoptimizationAllTest` you "have adjusted the sleep time to 100 millis between the invalidations of the entire code cache"? >> >> I was experimenting with different levels of contention. 10 millis + Xcomp gets 30% fewer methods compiled, but in all other cases 10 millis results in more compilations. It's a toss for me. >> >> I reverted since it doesn't really matter. > >> @neliasso, could you please explain how those compile storms cause timeouts? As far as I could find, the JVM doesn't wait for the threads to finish, it gives them approx. [10 seconds](http://hg.openjdk.java.net/jdk/jdk/file/ee1d592a9f53/src/hotspot/share/runtime/vmOperations.cpp#l388), and then just [exits](http://hg.openjdk.java.net/jdk/jdk/file/ee1d592a9f53/src/hotspot/share/runtime/vmOperations.cpp#l424). > > I don't know exactly since 1) I can't reproduce it 2) I didn't get any core or thread dump from the failure. > > What I do know is 1) The failure was on a slow platform 2) It was probably running concurrently with other tests 3) It was running with -Xcomp > > Using JFR and jcmd I have seen - 1) very long compile queues (> 500 for C1) > > I did some measurements on workload too. In normal mode 6500 iterations of CodeCacheStressRunner was performed in 30 sec, with -Xcomp only 250. But in total compiles -Xcomp had 30% less compilations. > > What I also do know is that the CodeCache lock will be contended by the compile threads and the test thread that is invalidating everything in the cache. 
With long compile queues the MethodCompileQueue lock can also become contented because the time held is proportional to the number of compile tasks in the queue. In register_method both locks are held. > >> >> So we only have to make sure that our 20% is larger than 10s + some reasonable margin, right? >> > > Yes. I suggest 60 seconds test time and minimum of 60 seconds margin. > I also suggest to not scale the test time with the timeout fraction since the run times becomes excessive. I experimented with keeping the CodeCaceStressRunner iterations constant but the different tests have completely different profiles and the variants are too many. 60 seconds is still a huge amount of compiles and invalidations. (times 2 since all tests are run with and without segmented code cache). > > @neliasso , > > could you please explain why in `UnexpectedDeoptimizationAllTest` you "have adjusted the sleep time to 100 millis between the invalidations of the entire code cache"? > > I was experimenting with different levels of contention. 10 millis + Xcomp gets 30% fewer methods compiled, but in all other cases 10 millis results in more compilations. It's a toss for me. > > I reverted since it doesn't really matter. ?? , but could you please add comments to both `sleep` so the future readers would know why they are there? ------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From iignatyev at openjdk.java.net Wed Nov 4 21:11:03 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 4 Nov 2020 21:11:03 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: References: Message-ID: <4QuQ060iNSZDh6mFBAI3mcWFDOI3IyXwn34EXf-MQuY=.baf764b7-f681-44d4-a489-a434076d2073@github.com> On Wed, 4 Nov 2020 20:53:25 GMT, Nils Eliasson wrote: >... In normal mode 6500 iterations of CodeCacheStressRunner was performed in 30 sec, with -Xcomp only 250. But in total compiles -Xcomp had 30% less compilations... 
would it make sense to update `CodeCacheStressRunner` and/or individual tests to count iterations and have a reasonable (test-specific?) threshold below which a test isn't considered passed, but instead considered skipped (ie throws `jtreg.SkippedException`). this way we would know if tests run enough iterations in the allocated amount of time, and if they don't in some configuration, we will have data to make proper decision on either increasing time budget, decreasing threshold, or excluding configurations. -- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From redestad at openjdk.java.net Wed Nov 4 21:42:59 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 4 Nov 2020 21:42:59 GMT Subject: RFR: 8255900: x86: Reduce impact when VerifyOops is disabled [v2] In-Reply-To: <1quZg5MPENLj0Nlu32wA3emIPtcObY4IVBYCxYsI-wY=.b987385a-fa10-4a5a-a37a-b3816651aeb8@github.com> References: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> <1quZg5MPENLj0Nlu32wA3emIPtcObY4IVBYCxYsI-wY=.b987385a-fa10-4a5a-a37a-b3816651aeb8@github.com> Message-ID: On Wed, 4 Nov 2020 19:22:33 GMT, Vladimir Kozlov wrote: >> Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Revert arm/aarch64 changes >> - Merge branch 'master' into verify_oop_stub >> - Apply similar improvements on arm/aarch64 >> - Fix copy-paste error >> - add checked variants of _verify_oop to avoid calls in product builds >> - Merge branch 'master' into verify_oop_stub >> - Cleanup verify_oop > > Good. @vnkozlov @neliasso @yminqi - thank you for reviews! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1058 From redestad at openjdk.java.net Wed Nov 4 21:43:00 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 4 Nov 2020 21:43:00 GMT Subject: Integrated: 8255900: x86: Reduce impact when VerifyOops is disabled In-Reply-To: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> References: <4KIoPcjQ3ibI-BXMRVwuBHrVRekcsQDaEepwo2gIrmc=.7704e62b-771e-451f-83ea-8875524a064f@github.com> Message-ID: On Wed, 4 Nov 2020 14:49:41 GMT, Claes Redestad wrote: > - Don't generate stub _verify_oop_subroutine_entry on startup if -VerifyOops > - Make VerifyOops check in calls to MacroAssembler::verify_oop inlineable. (VerifyOops is a develop flag, so this allows more aggressive code elimination in a few place) > - (trivial pile-on) remove unused set_word_if_not_zero > > Slightly improves static footprint, gets rid of a stub and reduce jitter during jitting in product builds. This pull request has now been integrated. Changeset: a0ade220 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/a0ade220 Stats: 27 lines in 4 files changed: 13 ins; 9 del; 5 mod 8255900: x86: Reduce impact when VerifyOops is disabled Reviewed-by: neliasso, minqi, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1058 From neliasso at openjdk.java.net Wed Nov 4 21:58:56 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 4 Nov 2020 21:58:56 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: <4QuQ060iNSZDh6mFBAI3mcWFDOI3IyXwn34EXf-MQuY=.baf764b7-f681-44d4-a489-a434076d2073@github.com> References: <4QuQ060iNSZDh6mFBAI3mcWFDOI3IyXwn34EXf-MQuY=.baf764b7-f681-44d4-a489-a434076d2073@github.com> Message-ID: <4PgK8q-Ol6Mp3CvzR77QnabQYk6E1-fJUz54zx83-IE=.cbd2429f-9f41-4a3e-894b-37cd16ef0cb3@github.com> On Wed, 4 Nov 2020 21:08:30 GMT, Igor Ignatyev wrote: > > ... 
> > In normal mode 6500 iterations of CodeCacheStressRunner was performed in 30 sec, with -Xcomp only 250. But in total compiles -Xcomp had 30% less compilations...
>
> would it make sense to update `CodeCacheStressRunner` and/or individual tests to count iterations and have a reasonable (test-specific?) threshold below which a test isn't considered passed, but instead considered skipped (ie throws `jtreg.SkippedException`). this way we would know if tests run enough iterations in the allocated amount of time, and if they don't in some configuration, we will have data to make proper decision on either increasing time budget, decreasing threshold, or excluding configurations.
>
> -- Igor

I experimented with counting the CodeCacheStressRunner iterations but the different tests have completely different profiles and the variants are many - 4 tests, 2 modes (Xcomp) that all need separate values.

I would like to see that we added thread dumps and core files on timeouts. Then we can experiment with lower timeouts since it will be possible to diagnose them quickly.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1030

From redestad at openjdk.java.net Wed Nov 4 22:43:03 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Wed, 4 Nov 2020 22:43:03 GMT
Subject: RFR: 8255909: Remove unused delayed_value methods
Message-ID:

Usage of and definitions of Assembler::delayed_value have been removed, but declarations linger along with various platform-specific definitions of delayed_value_impl (all of which are unused)

Clean up and remove.

Untested on arm/aarch/ppc/s390, but as there are no mentions of delayed_value in the repository after this cleanup this is hopefully OK.
------------- Commit messages: - Remove unused delayed_value methods Changes: https://git.openjdk.java.net/jdk/pull/1065/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1065&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255909 Stats: 181 lines in 15 files changed: 0 ins; 181 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1065.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1065/head:pull/1065 PR: https://git.openjdk.java.net/jdk/pull/1065 From iignatyev at openjdk.java.net Wed Nov 4 22:51:03 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 4 Nov 2020 22:51:03 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: <4PgK8q-Ol6Mp3CvzR77QnabQYk6E1-fJUz54zx83-IE=.cbd2429f-9f41-4a3e-894b-37cd16ef0cb3@github.com> References: <4QuQ060iNSZDh6mFBAI3mcWFDOI3IyXwn34EXf-MQuY=.baf764b7-f681-44d4-a489-a434076d2073@github.com> <4PgK8q-Ol6Mp3CvzR77QnabQYk6E1-fJUz54zx83-IE=.cbd2429f-9f41-4a3e-894b-37cd16ef0cb3@github.com> Message-ID: On Wed, 4 Nov 2020 21:56:27 GMT, Nils Eliasson wrote: > would like to see that we added thread dumps and core files on timeouts. we actually do that since JDK 9, [JEP 279](https://bugs.openjdk.java.net/browse/JDK-8075621) added the failure handler which runs many things including `jstack`, `gdb -ex thread apply all backtrace` (or its equivalent) to get java and native thread dumps and `kill -ABRT` to generate core files on timeouts. however b/c it's run concurrently to a timed out process, you don't always get the data. so I'm wondering what you meant. 
-- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From minqi at openjdk.java.net Wed Nov 4 23:46:55 2020 From: minqi at openjdk.java.net (Yumin Qi) Date: Wed, 4 Nov 2020 23:46:55 GMT Subject: RFR: 8255909: Remove unused delayed_value methods In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 22:12:29 GMT, Claes Redestad wrote: > Usage of and definitions of Assembler::delayed_value have been removed, but declarations linger along with various platform specific definitions of delayed_value_impl (all of which areunused) > > Clean up and remove. > > Untested on arm/aarch/ppc/s390, but as there are no mentions of delayed_value in the repository after this cleanup this is hopefully OK. Looks good to me. ------------- Marked as reviewed by minqi (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1065 From coleenp at openjdk.java.net Thu Nov 5 00:08:55 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 5 Nov 2020 00:08:55 GMT Subject: RFR: 8255909: Remove unused delayed_value methods In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 22:12:29 GMT, Claes Redestad wrote: > Usage of and definitions of Assembler::delayed_value have been removed, but declarations linger along with various platform specific definitions of delayed_value_impl (all of which areunused) > > Clean up and remove. > > Untested on arm/aarch/ppc/s390, but as there are no mentions of delayed_value in the repository after this cleanup this is hopefully OK. Wow, this seems to be code left over from sparc. Looks good, and trivial if it builds. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1065 From dholmes at openjdk.java.net Thu Nov 5 00:43:09 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 5 Nov 2020 00:43:09 GMT Subject: RFR: 8255563: Missing NULL checks after JDK-8233624 Message-ID: JDK-8233624 introduced a check for illegal native method names that will result in the `NativeLookup::*_jni_name` methods returning NULL. JVMCI `registerNativeMethods` uses these APIs without any check for a NULL return, however the methods being registered are part of the JVMCI implementation so should never have an illegal name, so adding a guarantee suffices to deal with that (see comments in JBS issue). ------------- Commit messages: - Fixed JDK-8255563 Changes: https://git.openjdk.java.net/jdk/pull/1068/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1068&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255563 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1068.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1068/head:pull/1068 PR: https://git.openjdk.java.net/jdk/pull/1068 From kvn at openjdk.java.net Thu Nov 5 01:09:54 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 5 Nov 2020 01:09:54 GMT Subject: RFR: 8255563: Missing NULL checks after JDK-8233624 In-Reply-To: References: Message-ID: <_Ja8OuAy8MhCBbMHcd8OjwHYfvbVwzIvINlrqQSy5Go=.b341cc4d-dcbf-43bd-b315-eb425247ba63@github.com> On Thu, 5 Nov 2020 00:37:30 GMT, David Holmes wrote: > JDK-8233624 introduced a check for illegal native method names that will result in the `NativeLookup::*_jni_name` methods returning NULL. JVMCI `registerNativeMethods` uses these APIs without any check for a NULL return, however the methods being registered are part of the JVMCI implementation so should never have an illegal name, so adding a guarantee suffices to deal with that (see comments in JBS issue). Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1068 From dholmes at openjdk.java.net Thu Nov 5 04:41:56 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 5 Nov 2020 04:41:56 GMT Subject: RFR: 8255563: Missing NULL checks after JDK-8233624 In-Reply-To: <_Ja8OuAy8MhCBbMHcd8OjwHYfvbVwzIvINlrqQSy5Go=.b341cc4d-dcbf-43bd-b315-eb425247ba63@github.com> References: <_Ja8OuAy8MhCBbMHcd8OjwHYfvbVwzIvINlrqQSy5Go=.b341cc4d-dcbf-43bd-b315-eb425247ba63@github.com> Message-ID: On Thu, 5 Nov 2020 01:07:24 GMT, Vladimir Kozlov wrote: >> JDK-8233624 introduced a check for illegal native method names that will result in the `NativeLookup::*_jni_name` methods returning NULL. JVMCI `registerNativeMethods` uses these APIs without any check for a NULL return, however the methods being registered are part of the JVMCI implementation so should never have an illegal name, so adding a guarantee suffices to deal with that (see comments in JBS issue). > > Looks good. @dougxc could you review this please. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/1068 From kvn at openjdk.java.net Thu Nov 5 05:06:55 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 5 Nov 2020 05:06:55 GMT Subject: RFR: 8255914: [AOT] Using AOT flag should give warning when AOT is not included in build In-Reply-To: References: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> Message-ID: On Thu, 5 Nov 2020 04:32:30 GMT, David Holmes wrote: > This all looks fine, but can you simply replace UNSUPPORTED_OPTION_NULL with UNSUPPORTED_OPTION_INIT and update the two existing usages, so that we don't have three versions of the macro? Okay. I agree with that. But, please, note it will change current VM behavior for these flags. UNSUPPORTED_OPTION_INIT macro unconditionally set flag's value without checking current value and it gives warning regardless value specified on command line. 
UNSUPPORTED_OPTION_NULL macro does not give warning if flag's value specified on command line is NULL. ------------- PR: https://git.openjdk.java.net/jdk/pull/1071 From david.holmes at oracle.com Thu Nov 5 06:02:51 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 5 Nov 2020 16:02:51 +1000 Subject: RFR: 8255914: [AOT] Using AOT flag should give warning when AOT is not included in build In-Reply-To: References: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> Message-ID: <1b9a2b3c-b9e2-b8aa-a9c9-867f8f3986dd@oracle.com> On 5/11/2020 3:06 pm, Vladimir Kozlov wrote: > On Thu, 5 Nov 2020 04:32:30 GMT, David Holmes wrote: > >> This all looks fine, but can you simply replace UNSUPPORTED_OPTION_NULL with UNSUPPORTED_OPTION_INIT and update the two existing usages, so that we don't have three versions of the macro? > > Okay. I agree with that. > > But, please, note it will change current VM behavior for these flags. > UNSUPPORTED_OPTION_INIT macro unconditionally set flag's value without checking current value and it gives warning regardless value specified on command line. > UNSUPPORTED_OPTION_NULL macro does not give warning if flag's value specified on command line is NULL. Sorry I missed that subtlety in how the macros are expressed. Please leave as is. 
Thanks, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/1071 > From dholmes at openjdk.java.net Thu Nov 5 06:05:54 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 5 Nov 2020 06:05:54 GMT Subject: RFR: 8255914: [AOT] Using AOT flag should give warning when AOT is not included in build In-Reply-To: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> References: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> Message-ID: On Thu, 5 Nov 2020 04:21:01 GMT, Vladimir Kozlov wrote: > Currently if AOT feature is not included in a build AOT flags specified on command line are silently ignored: > $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version > java version "16-internal" 2021-03-16 > > It should give warning: > > $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version > Java HotSpot(TM) 64-Bit Server VM warning: -XX:+UseAOT not supported in this VM > java version "16-internal" 2021-03-16 Marked as reviewed by dholmes (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1071 From kvn at openjdk.java.net Thu Nov 5 06:11:54 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 5 Nov 2020 06:11:54 GMT Subject: RFR: 8255914: [AOT] Using AOT flag should give warning when AOT is not included in build In-Reply-To: References: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> Message-ID: On Thu, 5 Nov 2020 06:03:27 GMT, David Holmes wrote: >> Currently if AOT feature is not included in a build AOT flags specified on command line are silently ignored: >> $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version >> java version "16-internal" 2021-03-16 >> >> It should give warning: >> >> $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version >> Java HotSpot(TM) 64-Bit Server VM warning: -XX:+UseAOT not supported in this VM >> java version "16-internal" 2021-03-16 > > Marked as reviewed by dholmes (Reviewer). Thank you, David. ------------- PR: https://git.openjdk.java.net/jdk/pull/1071 From thartmann at openjdk.java.net Thu Nov 5 07:40:55 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 5 Nov 2020 07:40:55 GMT Subject: RFR: 8255909: Remove unused delayed_value methods In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 22:12:29 GMT, Claes Redestad wrote: > Usage of and definitions of Assembler::delayed_value have been removed, but declarations linger along with various platform specific definitions of delayed_value_impl (all of which are unused) > > Clean up and remove. > > Untested on arm/aarch/ppc/s390, but as there are no mentions of delayed_value in the repository after this cleanup this is hopefully OK. Nice cleanup! ------------- Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1065 From redestad at openjdk.java.net Thu Nov 5 08:39:55 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 5 Nov 2020 08:39:55 GMT Subject: Integrated: 8255909: Remove unused delayed_value methods In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 22:12:29 GMT, Claes Redestad wrote: > Usage of and definitions of Assembler::delayed_value have been removed, but declarations linger along with various platform specific definitions of delayed_value_impl (all of which are unused) > > Clean up and remove. > > Untested on arm/aarch/ppc/s390, but as there are no mentions of delayed_value in the repository after this cleanup this is hopefully OK. This pull request has now been integrated. Changeset: 700447f7 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/700447f7 Stats: 181 lines in 15 files changed: 0 ins; 181 del; 0 mod 8255909: Remove unused delayed_value methods Reviewed-by: minqi, coleenp, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1065 From redestad at openjdk.java.net Thu Nov 5 08:39:54 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 5 Nov 2020 08:39:54 GMT Subject: RFR: 8255909: Remove unused delayed_value methods In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 23:44:33 GMT, Yumin Qi wrote: >> Usage of and definitions of Assembler::delayed_value have been removed, but declarations linger along with various platform specific definitions of delayed_value_impl (all of which are unused) >> >> Clean up and remove. >> >> Untested on arm/aarch/ppc/s390, but as there are no mentions of delayed_value in the repository after this cleanup this is hopefully OK. > > Looks good to me. @yminqi @coleenp @TobiHartmann - thank you for the reviews!
------------- PR: https://git.openjdk.java.net/jdk/pull/1065 From roland at openjdk.java.net Thu Nov 5 08:50:00 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 5 Nov 2020 08:50:00 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah Message-ID: This is a Shenandoah bug but the proposed fix is in shared code. In an infinite loop, a barrier is located right after the loop head and above the never branch. When the barrier is expanded, control flow is added between the loop and the never branch. During loop verification the assert fires because it doesn't expect any control flow between the never branch and the loop head. While it would have been nice to fix this Shenandoah issue in Shenandoah code, I think the cleaner fix is to preserve the invariant that the never branch is always right after the loop head in an infinite loop. In the proposed patch, this is achieved by moving all uses of the loop head to the never branch when it's constructed. ------------- Commit messages: - fix & test Changes: https://git.openjdk.java.net/jdk/pull/1073/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1073&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255936 Stats: 13 lines in 2 files changed: 6 ins; 5 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1073.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1073/head:pull/1073 PR: https://git.openjdk.java.net/jdk/pull/1073 From jbhateja at openjdk.java.net Thu Nov 5 09:06:10 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 5 Nov 2020 09:06:10 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v12] In-Reply-To: References: Message-ID: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. 
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. > 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. > 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. > > Performance Results: > System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz > Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java > ArrayCopyPartialInlineSize : 32 > > JMH | Block Size | Baseline (ns/op) | Partial Inling (ns/op) | Gain > -- | -- | -- | -- | -- > ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997 > ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866 > ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829 > ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564 > ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909 > ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667 > ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773 > ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179 > ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874 > ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228 > ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082 > ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769 > ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672 > ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844 > ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788 > ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947 > ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157 > ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038 > 
ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953 > ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584 > ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443 > ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724 > ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302 > ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756 > ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836 > ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289 > ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512 > ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831 > ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362 > ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974 > ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475 > ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064 > ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095 > ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621 > ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903 > ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081 > ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886 > ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287 > ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945 > ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067 > ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604 > ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204 > ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213 > ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003 > ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627 > ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174 > ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741 > 
ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313 > ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734 > ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345 > ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994 > ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803 > ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055 > ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957 > ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541 > ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416 > ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692 > ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774 > ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467 > ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137 > ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237 > ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138 > ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353 > ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523 > ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588 > ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453 > ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983 > ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321 > ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053 > ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507 > ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559 > ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747 > ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095 > ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273 > ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139 > ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629 > 
ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935 > ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385 > ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096 > ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975 > > Detailed Reports: > Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) > WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: JDK-8252848: Review comments resolution. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/302/files - new: https://git.openjdk.java.net/jdk/pull/302/files/9e85592a..689426d6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=11 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=10-11 Stats: 28 lines in 6 files changed: 2 ins; 0 del; 26 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Thu Nov 5 09:06:12 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 5 Nov 2020 09:06:12 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 19:25:30 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - JDK-8252848: Review comments addressed. 
>> - Merge remote-tracking branch 'origin' into JDK-8252848 >> - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - JDK-8252848 : Review comments resolution. >> - Merge remote-tracking branch 'upstream' into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Replacing explicit type checks with existing type checking routines >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - ... and 1 more: https://git.openjdk.java.net/jdk/compare/4031cb41...9e85592a > > Changes requested by kvn (Reviewer). Hi @vnkozlov , I have resolved your review comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From bulasevich at openjdk.java.net Thu Nov 5 09:27:18 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Thu, 5 Nov 2020 09:27:18 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v6] In-Reply-To: References: Message-ID: > Let me revive the change request [3] to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 & 0xFF) | ((v2 & 0xFF) << 8)". > > Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test. > > As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new BitfieldInsert node has been introduced to replace Or+Shift+And sequence when possible, on AARCH a single BFI instruction is emitted for the new node. 
> > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html > [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html > [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: comprehensive comment and code cleanup for BitfieldInsert transformation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/511/files - new: https://git.openjdk.java.net/jdk/pull/511/files/775c3533..f210850f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=511&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=511&range=04-05 Stats: 58 lines in 1 file changed: 31 ins; 8 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/511/head:pull/511 PR: https://git.openjdk.java.net/jdk/pull/511 From bulasevich at openjdk.java.net Thu Nov 5 09:27:18 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Thu, 5 Nov 2020 09:27:18 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v5] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 17:54:35 GMT, Andrew Dinn wrote: >> src/hotspot/share/opto/addnode.cpp line 872: >> >>> 870: return new BitfieldInsertINode(dst, value, phase->intcon(offset), phase->intcon(width)); >>> 871: } >>> 872: } >> >> This code and its accompanying comments need to be made much clearer: >> >> - `dst` and `src` is a rather perverse choice of names for the inputs of the Or node. `l` and `r` or `left` and `right`, following the convention in the preceding code, would be better. >> >> - You mention `dst` in the comment to identify it as the left `Or` input but do not clearly identify `src` with the two alternative matched patterns for the the right hand side of the `Or`. 
>>
>> - The name `shift` is used in those two patterns for an operand that actually has to be a constant bit mask for the transformation to be applicable (so why not use `mask`, `1s_mask` or `constmask`?).
>>
>> - Your comment incoherently employs different notations: i.e. you use a term for the `BitfieldInsert` expression and refer to the outermost node using the term name `Or` but you specify the patterns using the infix C language operators `&` and `<<`.
>>
>> - The qualifying comment 'if the Or argument value range masks do not overlap' states a condition for the replacement without properly explaining the meaning of that condition i.e. that the various subexpressions operate on disjoint bitfields of the integral value being computed.
>>
>> - You use var `mask` for the const mask node and then reuse it for the constant value it identifies (directly after using `mask` to compute `width` which is used to define `mask`, making it look like mask is used to define itself).
>>
>> - Most importantly, the introductory comment does not provide a clear summary of what sort of graph shape is being replaced and how the match and bit range constraints legitimize that replacement.
>>
>> I would suggest the following as a replacement:
>>
>>     if (can_reshape && !phase->C->major_progress() && Matcher::match_rule_supported(Op_BitfieldInsertI)) {
>>       // If the right input of this Or is an And with mask or an LShifted
>>       // And with mask and the left and right inputs can be determined
>>       // to construct values lying in disjoint bit ranges then the Or
>>       // can be replaced with BitfieldInsert.
>>       //
>>       // There are two substitution rules:
>>       //
>>       // 1) (Or left (And value mask)) => (BitfieldInsert left value 0 width)
>>       //    where width == bitcount(mask) AND
>>       //          (value_range_mask(left) & mask) == 0
>>       //
>>       // 2) (Or left (LShift (And value mask) offset)) => (BitfieldInsert left value offset width)
>>       //    where width == bitcount(mask) AND
>>       //          (value_range_mask(left) & (mask << offset)) == 0
>>       // n.b.
>>       //   mask is an integer constant comprising a contiguous sequence of 1s
>>       //   value_range_mask(node) computes a mask identifying the range of bits
>>       //   that could be set by its argument
>>
>>       Node *left = in(1);
>>       Node *right = in(2);
>>       Node *andi = NULL;
>>       int offset = 0;
>>
>>       if (right->Opcode() == Op_LShiftI && right->in(1)->Opcode() == Op_AndI && right->in(2)->is_Con()) {
>>         andi = right->in(1);
>>         offset = right->in(2)->get_int();
>>       } else if (right->Opcode() == Op_AndI) {
>>         andi = right;
>>       }
>>       if (andi != NULL) {
>>         Node* mask = andi->in(2);
>>         if (mask->is_Con() && is_power_of_2(mask->get_int() + 1)) {
>>           Node* value = andi->in(1);
>>           int width = exact_log2(mask->get_int() + 1);
>>           int maskval = ((1 << width) - 1) << offset;
>>           if (width + offset <= 32 && ((value_range_mask(phase, left) & maskval) == 0)) {
>>             return new BitfieldInsertINode(left, value, phase->intcon(offset), phase->intcon(width));
>>           }
>>         }
>>
>> Note that I am reusing the SEXPR format used in the ad files to describe the graph patterns and using (C-like) pseudo-code to define the substitution conditions.
>
> Hmm, that is not quite right because in case 2 you are also checking that the shift is constant. So, the overall description should change to say
>
>     // If the right input of this Or is an And with mask or an LShifted
>     // And with mask with constant shift and the left and right inputs
>     // can be determined . . .

Yes, that is right. Thank you!!
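
For readers following the thread, the two substitution rules quoted above can be sanity-checked outside of C2 with a small plain-Java model. This is only a sketch under stated assumptions: `bfi`, `lowOnesWidth` and `canUseBfi` are hypothetical helper names invented for illustration, they are not part of HotSpot, and `leftRangeMask` merely stands in for whatever `value_range_mask(phase, left)` would compute in the patch.

```java
public class BitfieldInsertSketch {
    // Hypothetical model of a 32-bit bitfield insert: copy the low `width`
    // bits of `value` into `dst` starting at bit position `offset`.
    static int bfi(int dst, int value, int offset, int width) {
        int mask = (width == 32) ? -1 : (1 << width) - 1;
        int field = mask << offset;
        return (dst & ~field) | ((value & mask) << offset);
    }

    // Width of a mask of the form 2^width - 1 (contiguous low-order ones),
    // or -1 if the mask does not have that shape. The all-ones mask is
    // rejected as well, to keep the sketch simple.
    static int lowOnesWidth(int mask) {
        if (mask <= 0 || ((mask + 1) & mask) != 0) {
            return -1;
        }
        return Integer.bitCount(mask);
    }

    // True when `left | ((value & mask) << offset)` could be rewritten as a
    // bitfield insert, given a conservative mask of the bits `left` may set.
    static boolean canUseBfi(int leftRangeMask, int mask, int offset) {
        int width = lowOnesWidth(mask);
        if (width < 0 || offset < 0 || width + offset > 32) {
            return false;
        }
        return (leftRangeMask & (mask << offset)) == 0;
    }

    public static void main(String[] args) {
        int v1 = 0x1234;
        int v2 = 0x5678;
        int left = v1 & 0xFF;                    // may only set bits 0..7
        int orForm = left | ((v2 & 0xFF) << 8);  // the Or+Shift+And shape
        int bfiForm = bfi(left, v2, 8, 8);       // the single-insert shape
        // prints "7834 7834"
        System.out.println(Integer.toHexString(orForm) + " " + Integer.toHexString(bfiForm));
        System.out.println(canUseBfi(0xFF, 0xFF, 8));    // disjoint ranges: true
        System.out.println(canUseBfi(0xFFFF, 0xFF, 8));  // overlapping ranges: false
    }
}
```

The disjointness test is what makes the rewrite safe: when `left` cannot set any bit inside the inserted field, clearing that field before the insert changes nothing, so the Or form and the insert form agree.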
------------- PR: https://git.openjdk.java.net/jdk/pull/511 From neliasso at openjdk.java.net Thu Nov 5 09:41:55 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 5 Nov 2020 09:41:55 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: References: <4QuQ060iNSZDh6mFBAI3mcWFDOI3IyXwn34EXf-MQuY=.baf764b7-f681-44d4-a489-a434076d2073@github.com> <4PgK8q-Ol6Mp3CvzR77QnabQYk6E1-fJUz54zx83-IE=.cbd2429f-9f41-4a3e-894b-37cd16ef0cb3@github.com> Message-ID: <8vcZ76yRXbl9BD-1zBTltusg8ZGQ-gp5kr6tDVTVcvA=.9176b5ca-048d-4b18-990e-ecc144da1464@github.com> On Wed, 4 Nov 2020 22:48:06 GMT, Igor Ignatyev wrote: > we actually do that since JDK 9, [JEP 279](https://bugs.openjdk.java.net/browse/JDK-8075621) added the failure handler which runs many things including `jstack`, `gdb -ex thread apply all backtrace` (or its equivalent) to get java and native thread dumps and `kill -ABRT` to generate core files on timeouts. however b/c it's run concurrently to a timed out process, you don't always get the data. so I'm wondering what you meant. That has completely passed me by. I must look into it. Regarding the test: I think 60 secs is plenty of time for this test, especially since it runs multiple times. If there is a deadlock - the test will timeout regardless of timeout time. Adding some workload measurement would be nice - but not worth the time in my opinion. ------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From neliasso at openjdk.java.net Thu Nov 5 09:46:08 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 5 Nov 2020 09:46:08 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out [v3] In-Reply-To: References: Message-ID: > Hi, > > This patch updates the code cache stress tests. I haven't been able to reproduce or retrieve a core file. 
> > What I can see is that the tests provoke compile storms, and that the single C1 thread (on a 4CPU system) sometimes has trouble keeping up. A factor may also be that the tests' run time scales with the timeout time - so that the time allotted as margin before the timeout is only 20% of the total runtime. Combining this with Xcomp, and that the test may run concurrently with other stress tests, it is reasonable that a timeout may occur. > > I suggest to cap the tests to 60 seconds of testing. I've experimented with measuring how much work is done and use that as a metric - but the different tests that use the CodeCacheStressRunner have completely different profiles. > > In UnexpectedDeoptimizationAllTest.java I have adjusted the sleep time to 100 millis between the invalidations of the entire code cache. > > In UnexpectedDeoptimizationTest.java I have added a sleep of 10 millis between deoptimizing parts of the stack. The idea is to give the stack time to grow a bit before the next deoptimization. Otherwise the test might end up running mostly in the interpreter.
> > Please review, > Nils Eliasson Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: add comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1030/files - new: https://git.openjdk.java.net/jdk/pull/1030/files/5050a2ef..5d72bf94 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1030&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1030&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1030.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1030/head:pull/1030 PR: https://git.openjdk.java.net/jdk/pull/1030 From adinn at openjdk.java.net Thu Nov 5 09:55:00 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 5 Nov 2020 09:55:00 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: References: <5n3SJE02oD_SW_psT84VEJh22lomGJfJtARdyjf0Kcw=.acff1dc7-3dbd-4c8d-8889-f434570e6da2@github.com> Message-ID: On Tue, 20 Oct 2020 06:52:06 GMT, Boris Ulasevich wrote: >> One other thing: AFAICS this change doesn't work with Add as well as Or. Looking at the logic in addnode.cpp I don't think there's any reason for not doing that, is there? I've often seen ((a & 0xff) << 8) + (b & 0xff). Or is more common than And, I think, but we want to get as much leverage out of this work as we can. > >> I've often seen ((a & 0xff) << 8) + (b & 0xff) > Good idea. I will add AddI -> OrI transformation for the case. Thanks for clarifying the code. I still have one thing I'd like to establish and, perhaps, document in the code or maybe just in this thread. I may have missed something in the preceding email thread but at present I am not aware of why you need to delay application of this transform to a new post-loops optimization stage (you said it was needed but I didn't see any reason given).
Since that adds more complexity to the overall optimization process I think it needs carefully justifying. Of course, it might be that this requirement also applies for other Ideal transforms, so I am not suggesting adding this extra step is necessarily a bad idea. Could you explain why it is needed in this case? ------------- PR: https://git.openjdk.java.net/jdk/pull/511 From bulasevich at openjdk.java.net Thu Nov 5 10:23:57 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Thu, 5 Nov 2020 10:23:57 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: References: <5n3SJE02oD_SW_psT84VEJh22lomGJfJtARdyjf0Kcw=.acff1dc7-3dbd-4c8d-8889-f434570e6da2@github.com> Message-ID: On Thu, 5 Nov 2020 09:52:28 GMT, Andrew Dinn wrote: > why you need to delay application of this transform to a new post-loops optimization stage Unfortunately, BitfieldInsert transformation conflicts with vectorization: - if or/and/shift was converted to BFI it is no longer vectorized - vectorized or/and/shift operations are faster than BFI I delayed my transformation to be sure loop and vectorization transformations is already done at the moment. ------------- PR: https://git.openjdk.java.net/jdk/pull/511 From dnsimon at openjdk.java.net Thu Nov 5 10:33:54 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 5 Nov 2020 10:33:54 GMT Subject: RFR: 8255563: Missing NULL checks after JDK-8233624 In-Reply-To: References: <_Ja8OuAy8MhCBbMHcd8OjwHYfvbVwzIvINlrqQSy5Go=.b341cc4d-dcbf-43bd-b315-eb425247ba63@github.com> Message-ID: On Thu, 5 Nov 2020 04:39:10 GMT, David Holmes wrote: >> Looks good. > > @dougxc could you review this please. Thanks. Looks good to me. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1068 From jvernee at openjdk.java.net Thu Nov 5 12:01:57 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Thu, 5 Nov 2020 12:01:57 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 10:47:18 GMT, Roland Westrelin wrote: > This change add 3 new methods in Objects: > > public static long checkIndex(long index, long length) > public static long checkFromToIndex(long fromIndex, long toIndex, long length) > public static long checkFromIndexSize(long fromIndex, long size, long length) > > This mirrors the int utility methods that were added by JDK-8135248 > with the same motivations. > > As is the case with the int checkIndex(), the long checkIndex() method > is JIT compiled as an intrinsic. It allows the JIT to compile > checkIndex to an unsigned comparison and properly recognize it as > a range check that then becomes a candidate for the existing range check > optimizations. This has proven to be important for panama's > MemorySegment API and a prototype of this change (with some extra c2 > improvements) showed that panama micro benchmark results improve > significantly. > > This change includes: > > - the API change > - the C2 intrinsic > - tests for the API and the C2 intrinsic > > This is a joint work with Paul who reviewed and reworked the API change > and filled the CSR. Hi Roland, Paul. Thank you for tackling this! We have to jump through quite a few hoops in the implementation of the foreign memory access API in order to leverage the intrinsification of int-based index checks, and even then we are not covering the cases where the numbers are larger than ints. Looking forward to being able to remove those hacks! Although I'm not a 'Reviewer', I've done a pass over the code and left some inline comments that I hope you find useful. 
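
As background for the review comments that follow, the semantics of the proposed long index checks and the unsigned-comparison form mentioned in the RFR can be sketched in plain Java. This is a minimal illustration, assuming only the behavior described in the RFR text; the method names ending in `Sketch` are hypothetical stand-ins and this is not the JDK implementation or the C2 intrinsic.

```java
public class CheckLongIndexSketch {
    // Straightforward signed form of the bounds test.
    static boolean inBoundsSigned(long index, long length) {
        return index >= 0 && index < length;
    }

    // Single unsigned comparison. It agrees with the signed form only when
    // length is non-negative: a negative index then looks like a huge
    // unsigned value and fails the comparison.
    static boolean inBoundsUnsigned(long index, long length) {
        return Long.compareUnsigned(index, length) < 0;
    }

    // Hypothetical stand-in for a long checkIndex(index, length).
    static long checkIndexSketch(long index, long length) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException(
                "Index " + index + " out of bounds for length " + length);
        }
        return index;
    }

    // Hypothetical stand-in for a long checkFromIndexSize(fromIndex, size, length).
    static long checkFromIndexSizeSketch(long fromIndex, long size, long length) {
        // The OR of the three values is negative if any one of them is negative.
        if ((length | fromIndex | size) < 0 || size > length - fromIndex) {
            throw new IndexOutOfBoundsException(
                "Range [" + fromIndex + ", " + fromIndex + " + " + size
                + ") out of bounds for length " + length);
        }
        return fromIndex;
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -1L, 0L, 1L, 5L, 1L << 40, Long.MAX_VALUE};
        for (long length : samples) {
            if (length < 0) {
                continue; // the unsigned trick is only claimed for length >= 0
            }
            for (long index : samples) {
                if (inBoundsSigned(index, length) != inBoundsUnsigned(index, length)) {
                    throw new AssertionError(index + " vs " + length);
                }
            }
        }
        System.out.println(checkIndexSketch(5L, 1L << 40)); // prints 5
    }
}
```

The loop in `main` only spot-checks a handful of boundary values; it is meant to show why a compiler can treat the two-sided signed test as one unsigned range check, not to prove it.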
src/hotspot/share/opto/graphKit.hpp line 109: > 107: if (bt == T_INT) { > 108: assert((jlong)(jint)con == con, "not an int"); > 109: return intcon((jint) con); Could also use `checked_cast` here (from globalDefinitions.hpp), which does a similar check. Suggestion: return intcon(checked_cast<jint>(con)); src/hotspot/share/opto/graphKit.hpp line 111: > 109: return intcon((jint) con); > 110: } > 111: return longcon(con); This ends up defaulting to longcon for any basic type that is not T_INT. Maybe it's nice to have an assert here? Suggestion: assert(bt == T_LONG, "basic type not an int or long"); return longcon(con); src/hotspot/share/opto/type.cpp line 1353: > 1351: jint int_hi = (jint)hi; > 1352: assert(((jlong)int_lo) == lo && ((jlong)int_hi) == hi, "bounds are not ints"); > 1353: return TypeInt::make(int_lo, int_hi, w); checked_cast could also be used here. Suggestion: return TypeInt::make(checked_cast<jint>(lo), checked_cast<jint>(hi), w); src/java.base/share/classes/java/lang/IndexOutOfBoundsException.java line 83: > 81: */ > 82: public IndexOutOfBoundsException(long index) { > 83: super("Index out of range: " + index); For consistency with the class name: Suggestion: super("Index out of bounds: " + index); test/jdk/java/util/Objects/CheckLongIndex.java line 94: > 92: long apply(long a, long b, long c); > 93: } > 94: Seems unused? Suggestion: ------------- PR: https://git.openjdk.java.net/jdk/pull/1003 From adinn at openjdk.java.net Thu Nov 5 13:41:56 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 5 Nov 2020 13:41:56 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: References: <5n3SJE02oD_SW_psT84VEJh22lomGJfJtARdyjf0Kcw=.acff1dc7-3dbd-4c8d-8889-f434570e6da2@github.com> Message-ID: On Thu, 5 Nov 2020 10:21:07 GMT, Boris Ulasevich wrote: >> Thanks for clarifying the code. I still have one thing I'd like to establish and, perhaps, document in the code or maybe just in this thread.
>> >> I may have missed something in the preceding email thread but at present I am not aware of why you need to delay application of this transform to a new post-loops optimization stage (you said it was needed but I didn't see any reason given). Since that adds more complexity to the overall optimization process I think it needs carefully justifying. Of course, it might be that this requirement also applies for other Ideal transforms, so I am not suggesting adding this extra step is necessarily a bad idea. Could you explain why it is needed in this case? > >> why you need to delay application of this transform to a new post-loops optimization stage > > Unfortunately, the BitfieldInsert transformation conflicts with vectorization: > - if or/and/shift was converted to BFI it is no longer vectorized > - vectorized or/and/shift operations are faster than BFI > > I delayed my transformation to be sure that loop and vectorization transformations are already done by that point. Ok, so please adjust the comment at addnode.cpp:848 from // major_progress() check postpones the transformation after loop optimization to // major_progress() check postpones this transformation after loop optimization // allowing vectorization of the OR/And/Shift operations as a preferred transformation ------------- PR: https://git.openjdk.java.net/jdk/pull/511 From adinn at openjdk.java.net Thu Nov 5 14:10:59 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 5 Nov 2020 14:10:59 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v6] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 09:27:18 GMT, Boris Ulasevich wrote: >> Let me revive the change request [3] to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 & 0xFF) | ((v2 & 0xFF) << 8)". >> >> Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test.
>> >> As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new BitfieldInsert node has been introduced to replace Or+Shift+And sequence when possible, on AARCH a single BFI instruction is emitted for the new node. >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html >> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html >> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > comprehensive comment and code cleanup for BitfieldInsert transformation Once the requested comment change is implemented to explain why this transform is delayed I think this will be ready to ship. That said ... I am not 100% convinced that this change is properly justified. I agree with Andrew Haley's earlier observation that this is not going to accrue much benefit. It will only make a measurable difference for 1) heavy bit-twiddling apps which 2) are cpu bound rather than memory bound (i.e. where the reduction in register -> register ops is not rendered irrelevant by memory transfer delays) and 3) spend a significant proportion of their time merging masked and shifted data. That's a pretty small target. Set against that limited potential gain is the cost of doing one or two extra checks at every invocation of Or::ideal plus the once-per-compile cost of setting up and performing an extra compiler optimization step. So, it is not clear to me that this change will definitely be an improvement. I'm accepting it on the grounds that any degradation of compile-time performance is probably going to be marginal too (I hope so).
In future I think it would be wise to spend time picking more rewarding candidates for optimization and, more importantly, discussing the likelihood of them being useful on the compiler-dev or aarch64-port lists before proceeding to implement them and present them for a review. That should help to obtain a much better return on effort than has resulted from this patch. ------------- Marked as reviewed by adinn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/511 From roland at openjdk.java.net Thu Nov 5 15:47:31 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 5 Nov 2020 15:47:31 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v2] In-Reply-To: References: Message-ID: > This change adds 3 new methods in Objects: > > public static long checkIndex(long index, long length) > public static long checkFromToIndex(long fromIndex, long toIndex, long length) > public static long checkFromIndexSize(long fromIndex, long size, long length) > > This mirrors the int utility methods that were added by JDK-8135248 > with the same motivations. > > As is the case with the int checkIndex(), the long checkIndex() method > is JIT compiled as an intrinsic. It allows the JIT to compile > checkIndex to an unsigned comparison and properly recognize it as > a range check that then becomes a candidate for the existing range check > optimizations. This has proven to be important for panama's > MemorySegment API and a prototype of this change (with some extra c2 > improvements) showed that panama micro benchmark results improve > significantly. > > This change includes: > > - the API change > - the C2 intrinsic > - tests for the API and the C2 intrinsic > > This is joint work with Paul who reviewed and reworked the API change > and filed the CSR. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains ten additional commits since the last revision: - Jorn's comments - Update headers and add intrinsic to Graal test ignore list - move compiler test and add bug to test - non x86_64 arch support - c2 test case - intrinsic - Use overloads of method names. Simplify internally to avoid overload resolution issues, leverging List for the exception mapper. - Vladimir's comments - checkLongIndex ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1003/files - new: https://git.openjdk.java.net/jdk/pull/1003/files/74df12d3..b47184ac Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1003&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1003&range=00-01 Stats: 11784 lines in 698 files changed: 6370 ins; 3584 del; 1830 mod Patch: https://git.openjdk.java.net/jdk/pull/1003.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1003/head:pull/1003 PR: https://git.openjdk.java.net/jdk/pull/1003 From roland at openjdk.java.net Thu Nov 5 15:47:32 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 5 Nov 2020 15:47:32 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v2] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 11:58:43 GMT, Jorn Vernee wrote: > Although I'm not a 'Reviewer', I've done a pass over the code and left some inline comments that I hope you find useful. Thanks for the review! All suggestions (except the exception message one that I commented on) look good to me. I applied them. > src/java.base/share/classes/java/lang/IndexOutOfBoundsException.java line 83: > >> 81: */ >> 82: public IndexOutOfBoundsException(long index) { >> 83: super("Index out of range: " + index); > > For consistency with the class name: > Suggestion: > > super("Index out of bounds: " + index); Not sure about that one as it's a copy of the message from IndexOutOfBoundsException(int index). Both would need to be changed for consistency. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1003 From jvernee at openjdk.java.net Thu Nov 5 16:47:56 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Thu, 5 Nov 2020 16:47:56 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v2] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 15:42:54 GMT, Roland Westrelin wrote: >> src/java.base/share/classes/java/lang/IndexOutOfBoundsException.java line 83: >> >>> 81: */ >>> 82: public IndexOutOfBoundsException(long index) { >>> 83: super("Index out of range: " + index); >> >> For consistency with the class name: >> Suggestion: >> >> super("Index out of bounds: " + index); > > Not sure about that one as it's a copy of the message from IndexOutOfBoundsException(int index). Both would need to be changed for consistency. Ok, I didn't notice that it was taken from elsewhere. Never mind then. ------------- PR: https://git.openjdk.java.net/jdk/pull/1003 From mdoerr at openjdk.java.net Thu Nov 5 16:52:02 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 5 Nov 2020 16:52:02 GMT Subject: RFR: 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap Message-ID: JDK-8237363 introduced "assert(Universe::heap()->is_in..." check in CompressedOops::decode functions. This assertion restricts the usability of the decode functions. There are periods of time (during GC) at which we can't use " Universe::heap()->is_in" because the pointer gets switched between old and new location, but "Universe::heap()->is_in" is not yet accurate. PPC64 code has a usage of CompressedOops::decode which is affected by this problem. (It was observed with SerialGC, see JBS.) We could also use a weaker assertion, but seems like other people value the stronger assertion more. So I suggest to use decode_raw as workaround for PPC64. 
------------- Commit messages: - 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap Changes: https://git.openjdk.java.net/jdk/pull/1078/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1078&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255598 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1078.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1078/head:pull/1078 PR: https://git.openjdk.java.net/jdk/pull/1078 From mdoerr at openjdk.java.net Thu Nov 5 17:05:03 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 5 Nov 2020 17:05:03 GMT Subject: RFR: 8255959: Timeouts in VectorConversion tests Message-ID: We observed many timeouts in the following test/jdk/jdk/incubator/vector tests: Vector128ConversionTests.java Vector256ConversionTests.java Vector512ConversionTests.java Vector64ConversionTests.java VectorMaxConversionTests.java Some machines don't support vector instructions or fewer of them and C2 uses slower alternatives. Maybe there are options to make the tests faster, but I just propose to use a larger timeout value for now. 
------------- Commit messages: - 8255959: Timeouts in VectorConversion tests Changes: https://git.openjdk.java.net/jdk/pull/1079/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1079&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255959 Stats: 5 lines in 5 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/1079.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1079/head:pull/1079 PR: https://git.openjdk.java.net/jdk/pull/1079 From iveresov at openjdk.java.net Thu Nov 5 17:13:00 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Thu, 5 Nov 2020 17:13:00 GMT Subject: RFR: 8255914: [AOT] Using AOT flag should give warning when AOT is not included in build In-Reply-To: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> References: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> Message-ID: On Thu, 5 Nov 2020 04:21:01 GMT, Vladimir Kozlov wrote: > Currently if AOT feature is not included in a build AOT flags specified on command line are silently ignored: > $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version > java version "16-internal" 2021-03-16 > > It should give warning: > > $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version > Java HotSpot(TM) 64-Bit Server VM warning: -XX:+UseAOT not supported in this VM > java version "16-internal" 2021-03-16 Marked as reviewed by iveresov (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1071 From kvn at openjdk.java.net Thu Nov 5 17:21:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 5 Nov 2020 17:21:58 GMT Subject: RFR: 8255914: [AOT] Using AOT flag should give warning when AOT is not included in build In-Reply-To: References: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> Message-ID: On Thu, 5 Nov 2020 17:09:47 GMT, Igor Veresov wrote: >> Currently if AOT feature is not included in a build AOT flags specified on command line are silently ignored: >> $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version >> java version "16-internal" 2021-03-16 >> >> It should give warning: >> >> $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version >> Java HotSpot(TM) 64-Bit Server VM warning: -XX:+UseAOT not supported in this VM >> java version "16-internal" 2021-03-16 > > Marked as reviewed by iveresov (Reviewer). Thank you, Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/1071 From kvn at openjdk.java.net Thu Nov 5 17:21:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 5 Nov 2020 17:21:59 GMT Subject: Integrated: 8255914: [AOT] Using AOT flag should give warning when AOT is not included in build In-Reply-To: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> References: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> Message-ID: <9IE2ar8hY-gqvpI2iYgnybZ62HNwuE_Yhc2wFMZhR8A=.a50748df-89b9-4a55-af13-24af45393be2@github.com> On Thu, 5 Nov 2020 04:21:01 GMT, Vladimir Kozlov wrote: > Currently if AOT feature is not included in a build AOT flags specified on command line are silently ignored: > $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version > java version "16-internal" 2021-03-16 > > It should give warning: > > $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version > Java HotSpot(TM) 64-Bit 
Server VM warning: -XX:+UseAOT not supported in this VM > java version "16-internal" 2021-03-16 This pull request has now been integrated. Changeset: 1b59595e Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/1b59595e Stats: 28 lines in 2 files changed: 28 ins; 0 del; 0 mod 8255914: [AOT] Using AOT flag should give warning when AOT is not included in build Reviewed-by: dholmes, iveresov ------------- PR: https://git.openjdk.java.net/jdk/pull/1071 From neliasso at openjdk.java.net Thu Nov 5 17:31:03 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 5 Nov 2020 17:31:03 GMT Subject: RFR: 8255914: [AOT] Using AOT flag should give warning when AOT is not included in build In-Reply-To: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> References: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> Message-ID: On Thu, 5 Nov 2020 04:21:01 GMT, Vladimir Kozlov wrote: > Currently if AOT feature is not included in a build AOT flags specified on command line are silently ignored: > $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version > java version "16-internal" 2021-03-16 > > It should give warning: > > $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version > Java HotSpot(TM) 64-Bit Server VM warning: -XX:+UseAOT not supported in this VM > java version "16-internal" 2021-03-16 Looks good! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1071 From kvn at openjdk.java.net Thu Nov 5 17:38:01 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 5 Nov 2020 17:38:01 GMT Subject: RFR: 8255914: [AOT] Using AOT flag should give warning when AOT is not included in build In-Reply-To: References: <5LnO5kvk5GEiviuVD8BG7zLJggwAEA3Q32UnHnMogh4=.ce751828-6589-4baa-b4a9-9a964d3d2541@github.com> Message-ID: <1x8KZqksew3sfjsyqjhCLYg6SqO_CaF4Vehmq2WvqKU=.5961cfd0-dea0-478e-ba93-2b54153fe343@github.com> On Thu, 5 Nov 2020 17:27:52 GMT, Nils Eliasson wrote: >> Currently if AOT feature is not included in a build AOT flags specified on command line are silently ignored: >> $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version >> java version "16-internal" 2021-03-16 >> >> It should give warning: >> >> $ java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -version >> Java HotSpot(TM) 64-Bit Server VM warning: -XX:+UseAOT not supported in this VM >> java version "16-internal" 2021-03-16 > > Looks good! Thank you, Nils ------------- PR: https://git.openjdk.java.net/jdk/pull/1071 From iignatyev at openjdk.java.net Thu Nov 5 17:46:00 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Thu, 5 Nov 2020 17:46:00 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out [v3] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 09:46:08 GMT, Nils Eliasson wrote: >> Hi, >> >> This patch updates the code cache stress tests. I haven't been able to reproduce or retrieve a core file. >> >> What I can see is that the tests provokes compile storms, and that the single C1 thread (on a 4CPU system) sometimes has trouble keeping up. A factor may also be that the tests run time scale with the timeout time - so that the time allotted as margin before the timeout is only 20% of the total runtime. Combining this with Xcomp, and that the test may run concurrently with other stress tests, it is reasonable that a timeout may occur. 
>> >> I suggest to cap the tests to 60 seconds of testing. I've experimented with measuring how much work is done and use that as a metric - but the different tests that use the CodeCacheStressRunner have completely different profiles. >> >> In UnexpectedDeoptimizationAllTest.java I have adjusted the sleep time to 100 millis between the invalidations of the entire code cache. >> >> In UnexpectedDeoptimizationTest.java I have added a sleep of 10 millis between deoptimizing parts of the stack. The idea is to give the stack time to grow a bit before the next deoptimization. Otherwise the test might end up running mostly in the interpreter. >> >> Please review, >> Nils Eliasson > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > add comment LGTM. However, I'd like you to wait for an explicit 'ok' from Evgeny (@lepestock) as well. BTW, you need to either update the title of this PR to `[TESTBUG] compiler/codecache/stress/UnexpectedDeoptimizationAllTest.java timed out` or change [8255011](https://bugs.openjdk.java.net/browse/JDK-8255011)'s title to `[TESTBUG] UnexpectedDeoptimizationAllTest.java timed out` -- Igor ------------- Marked as reviewed by iignatyev (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1030 From iignatyev at openjdk.java.net Thu Nov 5 17:46:01 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Thu, 5 Nov 2020 17:46:01 GMT Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: <8vcZ76yRXbl9BD-1zBTltusg8ZGQ-gp5kr6tDVTVcvA=.9176b5ca-048d-4b18-990e-ecc144da1464@github.com> References: <4QuQ060iNSZDh6mFBAI3mcWFDOI3IyXwn34EXf-MQuY=.baf764b7-f681-44d4-a489-a434076d2073@github.com> <4PgK8q-Ol6Mp3CvzR77QnabQYk6E1-fJUz54zx83-IE=.cbd2429f-9f41-4a3e-894b-37cd16ef0cb3@github.com> <8vcZ76yRXbl9BD-1zBTltusg8ZGQ-gp5kr6tDVTVcvA=.9176b5ca-048d-4b18-990e-ecc144da1464@github.com> Message-ID: On Thu, 5 Nov 2020 09:39:20 GMT, Nils Eliasson wrote: > Adding some workload measurement would be nice - but not worth the time in my opinion. sure-sure, I didn't mean to do it as part of this PR, just a thought for future improvements in these tests. ------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From ayang at openjdk.java.net Thu Nov 5 18:38:58 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 5 Nov 2020 18:38:58 GMT Subject: RFR: 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 16:46:43 GMT, Martin Doerr wrote: > JDK-8237363 introduced "assert(Universe::heap()->is_in..." check in CompressedOops::decode functions. > This assertion restricts the usability of the decode functions. There are periods of time (during GC) at which we can't use " Universe::heap()->is_in" because the pointer gets switched between old and new location, but "Universe::heap()->is_in" is not yet accurate. > PPC64 code has a usage of CompressedOops::decode which is affected by this problem. (It was observed with SerialGC, see JBS.) > We could also use a weaker assertion, but seems like other people value the stronger assertion more. So I suggest to use decode_raw as workaround for PPC64. 
Given the following naming pair, I would have guessed that `decode_raw` can handle null properly, so the explicit check of null in this PR struck me as a surprise. Why so? static inline oop decode_raw_not_null(narrowOop v); static inline oop decode_raw(narrowOop v); I think having some concise inline comments to explain why `decode_raw` is used over `decode` could improve readability. ------------- Changes requested by ayang (Author). PR: https://git.openjdk.java.net/jdk/pull/1078 From psandoz at openjdk.java.net Thu Nov 5 20:28:58 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 5 Nov 2020 20:28:58 GMT Subject: RFR: 8255959: Timeouts in VectorConversion tests In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 16:59:57 GMT, Martin Doerr wrote: > We observed many timeouts in the following test/jdk/jdk/incubator/vector tests: > Vector128ConversionTests.java > Vector256ConversionTests.java > Vector512ConversionTests.java > Vector64ConversionTests.java > VectorMaxConversionTests.java > Some machines don't support vector instructions or fewer of them and C2 uses slower alternatives. > > Maybe there are options to make the tests faster, but I just propose to use a larger timeout value for now. I have not observed such timeouts in our test infrastructure, but was wondering if this may cause such issues. Perhaps PPC does not support the conversion intrinsics? The timeout value is quite high, 1800 (i cannot recall what the default is). Ideally we should split this test per species. I regret that this test is not produced from a template. Would you mind first if we can try a quick experiment to reduce the time taken? In `AbstractVectorConversionTest` there is a loop using an upper bound of `INVOC_COUNT`, perhaps if any of the species input to the relevant methods have a length > the length of the corresponding preferred species, then we can reduce the loop count.
------------- PR: https://git.openjdk.java.net/jdk/pull/1079 From ecaspole at openjdk.java.net Thu Nov 5 22:19:03 2020 From: ecaspole at openjdk.java.net (Eric Caspole) Date: Thu, 5 Nov 2020 22:19:03 GMT Subject: RFR: 8255965: LogCompilation: add sort by nmethod code size Message-ID: While profiling an issue I added this sort by code size to LogCompilation, using -z $ java -ea -jar target/LogCompilation-1.0-SNAPSHOT.jar -z 2000-2.log | head 879 4 com.fee.fi.fo.Fum::foobar (3076 bytes)(code size: 57344) 853 make_not_entrant 853 3 com.fee.fi.fo.Fum::foobar (3076 bytes)(code size: 55968) 895 4 com.fee.fi.fo.Fum::baz (2238 bytes)(code size: 46112) 888 4 com.fee.fi.fo.Fum::quux (2165 bytes)(code size: 43200) The code size = stub_offset - insts_offset from what is in the log. This makes it easier to see, for example, if changing compiler XX options make huge differences in inlining. ------------- Commit messages: - 8255965: LogCompilation: add sort by nmethod code size Changes: https://git.openjdk.java.net/jdk/pull/1085/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1085&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255965 Stats: 96 lines in 6 files changed: 91 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/1085.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1085/head:pull/1085 PR: https://git.openjdk.java.net/jdk/pull/1085 From mdoerr at openjdk.java.net Thu Nov 5 22:29:11 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 5 Nov 2020 22:29:11 GMT Subject: RFR: 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap [v2] In-Reply-To: References: Message-ID: > JDK-8237363 introduced "assert(Universe::heap()->is_in..." check in CompressedOops::decode functions. > This assertion restricts the usability of the decode functions. 
There are periods of time (during GC) at which we can't use " Universe::heap()->is_in" because the pointer gets switched between old and new location, but "Universe::heap()->is_in" is not yet accurate. > PPC64 code has a usage of CompressedOops::decode which is affected by this problem. (It was observed with SerialGC, see JBS.) > We could also use a weaker assertion, but seems like other people value the stronger assertion more. So I suggest to use decode_raw as workaround for PPC64. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: add comment and use CompressedOops::is_null ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1078/files - new: https://git.openjdk.java.net/jdk/pull/1078/files/4ffffa95..400ecfda Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1078&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1078&range=00-01 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1078.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1078/head:pull/1078 PR: https://git.openjdk.java.net/jdk/pull/1078 From kvn at openjdk.java.net Thu Nov 5 22:33:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 5 Nov 2020 22:33:58 GMT Subject: RFR: 8255965: LogCompilation: add sort by nmethod code size In-Reply-To: References: Message-ID: <1pzy5RPGK1v9O3utvsOwnPkKbXXYC29BEo58XTMjcbg=.427e9ca9-bbda-473e-a1af-89348f244fc5@github.com> On Thu, 5 Nov 2020 22:13:54 GMT, Eric Caspole wrote: > While profiling an issue I added this sort by code size to LogCompilation, using -z > > $ java -ea -jar target/LogCompilation-1.0-SNAPSHOT.jar -z 2000-2.log | head > > 879 4 com.fee.fi.fo.Fum::foobar (3076 bytes)(code size: 57344) > 853 make_not_entrant > 853 3 com.fee.fi.fo.Fum::foobar (3076 bytes)(code size: 55968) > 895 4 com.fee.fi.fo.Fum::baz (2238 bytes)(code size: 46112) > 888 4 com.fee.fi.fo.Fum::quux (2165 bytes)(code 
size: 43200) > > The code size = stub_offset - insts_offset from what is in the log. > This makes it easier to see, for example, if changing compiler XX options make huge differences in inlining. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1085 From mdoerr at openjdk.java.net Thu Nov 5 22:33:58 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 5 Nov 2020 22:33:58 GMT Subject: RFR: 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap [v2] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 18:36:16 GMT, Albert Mingkun Yang wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> add comment and use CompressedOops::is_null > > Given the following naming pair, I would have guessed that `decode_raw` can handle null properly, so the explicit check of null in this PR struck me as a surprise. Why so? > > static inline oop decode_raw_not_null(narrowOop v); > static inline oop decode_raw(narrowOop v); > > I think having some concise inline comments to explain why `decode_raw` is used over `decode` could improve readability. @albertnetymk: Thanks for looking at this. I've added a comment. I was also surprised that decode_raw doesn't handle null properly. I'd have expected decode_raw to call decode_raw_not_null, but it's implemented vice-versa. Therefore the null check before the call. Note that there's another usage with preceding null check in oopDesc::load_oop_raw.
------------- PR: https://git.openjdk.java.net/jdk/pull/1078 From redestad at openjdk.java.net Thu Nov 5 22:40:58 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 5 Nov 2020 22:40:58 GMT Subject: RFR: 8255965: LogCompilation: add sort by nmethod code size In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 22:13:54 GMT, Eric Caspole wrote: > While profiling an issue I added this sort by code size to LogCompilation, using -z > > $ java -ea -jar target/LogCompilation-1.0-SNAPSHOT.jar -z 2000-2.log | head > > 879 4 com.fee.fi.fo.Fum::foobar (3076 bytes)(code size: 57344) > 853 make_not_entrant > 853 3 com.fee.fi.fo.Fum::foobar (3076 bytes)(code size: 55968) > 895 4 com.fee.fi.fo.Fum::baz (2238 bytes)(code size: 46112) > 888 4 com.fee.fi.fo.Fum::quux (2165 bytes)(code size: 43200) > > The code size = stub_offset - insts_offset from what is in the log. > This makes it easier to see, for example, if changing compiler XX options make huge differences in inlining. Marked as reviewed by redestad (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1085 From mdoerr at openjdk.java.net Thu Nov 5 23:10:58 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 5 Nov 2020 23:10:58 GMT Subject: RFR: 8255959: Timeouts in VectorConversion tests In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 20:25:44 GMT, Paul Sandoz wrote: >> We observed many timeouts in the following test/jdk/jdk/incubator/vector tests: >> Vector128ConversionTests.java >> Vector256ConversionTests.java >> Vector512ConversionTests.java >> Vector64ConversionTests.java >> VectorMaxConversionTests.java >> Some machines don't support vector instructions or fewer of them and C2 uses slower alternatives. >> >> Maybe there are options to make the tests faster, but I just propose to use a larger timeout value for now. > > I have not observed such timeouts in our test infrastructure, but was wondering if this may cause such issues. 
Perhaps PPC does not support the conversion intrinsics? > > The timeout value is quite high, 1800 (i cannot recall what the default is). Ideally we should split this test per species. I regret that this test is not produced from a template. > > Would you mind first if we can try a quick experiment to reduce the time taken? In `AbstractVectorConversionTest` there is a loop using an upper bound of `INVOC_COUNT`, perhaps if any of the species input to the relevant methods have a length > the length of the corresponding preferred species, then we can reduce the loop count. At the moment, neither PPC nor s390 support any conversion intrinsics. Modern s390 (or z/Architecture to be more precise) machines have vector instructions, but nobody implemented them in hotspot. So s390 uses the regular 8 Byte registers for vectors. PPC only uses 16 Byte vectors on modern hardware (8 Byte on older hardware). Default timeout is 1200 and I found out that 50% more makes the tests happy on all our machines. We could do an experiment, but I'm not familiar with the test. ------------- PR: https://git.openjdk.java.net/jdk/pull/1079 From ecaspole at openjdk.java.net Thu Nov 5 23:56:59 2020 From: ecaspole at openjdk.java.net (Eric Caspole) Date: Thu, 5 Nov 2020 23:56:59 GMT Subject: Integrated: 8255965: LogCompilation: add sort by nmethod code size In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 22:13:54 GMT, Eric Caspole wrote: > While profiling an issue I added this sort by code size to LogCompilation, using -z > > $ java -ea -jar target/LogCompilation-1.0-SNAPSHOT.jar -z 2000-2.log | head > > 879 4 com.fee.fi.fo.Fum::foobar (3076 bytes)(code size: 57344) > 853 make_not_entrant > 853 3 com.fee.fi.fo.Fum::foobar (3076 bytes)(code size: 55968) > 895 4 com.fee.fi.fo.Fum::baz (2238 bytes)(code size: 46112) > 888 4 com.fee.fi.fo.Fum::quux (2165 bytes)(code size: 43200) > > The code size = stub_offset - insts_offset from what is in the log. 
> This makes it easier to see, for example, if changing compiler XX options make huge differences in inlining. This pull request has now been integrated. Changeset: 57b98fa5 Author: Eric Caspole URL: https://git.openjdk.java.net/jdk/commit/57b98fa5 Stats: 96 lines in 6 files changed: 91 ins; 0 del; 5 mod 8255965: LogCompilation: add sort by nmethod code size Reviewed-by: kvn, redestad ------------- PR: https://git.openjdk.java.net/jdk/pull/1085 From dlong at openjdk.java.net Fri Nov 6 00:02:58 2020 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 6 Nov 2020 00:02:58 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v2] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 15:47:31 GMT, Roland Westrelin wrote: >> This change add 3 new methods in Objects: >> >> public static long checkIndex(long index, long length) >> public static long checkFromToIndex(long fromIndex, long toIndex, long length) >> public static long checkFromIndexSize(long fromIndex, long size, long length) >> >> This mirrors the int utility methods that were added by JDK-8135248 >> with the same motivations. >> >> As is the case with the int checkIndex(), the long checkIndex() method >> is JIT compiled as an intrinsic. It allows the JIT to compile >> checkIndex to an unsigned comparison and properly recognize it as >> a range check that then becomes a candidate for the existing range check >> optimizations. This has proven to be important for panama's >> MemorySegment API and a prototype of this change (with some extra c2 >> improvements) showed that panama micro benchmark results improve >> significantly. >> >> This change includes: >> >> - the API change >> - the C2 intrinsic >> - tests for the API and the C2 intrinsic >> >> This is a joint work with Paul who reviewed and reworked the API change >> and filled the CSR. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: > > - Jorn's comments > - Update headers and add intrinsic to Graal test ignore list > - move compiler test and add bug to test > - non x86_64 arch support > - c2 test case > - intrinsic > - Use overloads of method names. > > Simplify internally to avoid overload resolution > issues, leverging List for the exception > mapper. > - Vladimir's comments > - checkLongIndex C2 changes look good, except for new code block in inline_preconditions_checkIndex could use a comment. src/hotspot/share/opto/library_call.cpp line 1015: > 1013: Deoptimization::Action_make_not_entrant); > 1014: } > 1015: A comment here explaining what the code below is doing would be helpful. ------------- Changes requested by dlong (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1003 From dholmes at openjdk.java.net Fri Nov 6 00:38:56 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 6 Nov 2020 00:38:56 GMT Subject: RFR: 8255563: Missing NULL checks after JDK-8233624 In-Reply-To: References: <_Ja8OuAy8MhCBbMHcd8OjwHYfvbVwzIvINlrqQSy5Go=.b341cc4d-dcbf-43bd-b315-eb425247ba63@github.com> Message-ID: On Thu, 5 Nov 2020 10:31:28 GMT, Doug Simon wrote: >> @dougxc could you review this please. Thanks. > > Looks good to me. Thanks for the reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/1068 From github.com+670087+jrziviani at openjdk.java.net Fri Nov 6 00:57:00 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Fri, 6 Nov 2020 00:57:00 GMT Subject: RFR: 8248191: [PPC64] Replace lxvd2x/stxvd2x with lxvx/stxvx for Power10 Message-ID: The pair lxvx/stxvx are more modern VSX instructions to load/store data. These should benefit the Vector API because lxvd2x/stxvd2x may require xxswapd, leading to a more difficult code generation. 
------------- Commit messages: - 8248191: [PPC64] Replace lxvd2x/stxvd2x with lxvx/stxvx for Power10 Changes: https://git.openjdk.java.net/jdk/pull/1086/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1086&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8248191 Stats: 115 lines in 4 files changed: 87 ins; 0 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/1086.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1086/head:pull/1086 PR: https://git.openjdk.java.net/jdk/pull/1086 From psandoz at openjdk.java.net Fri Nov 6 01:02:58 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Fri, 6 Nov 2020 01:02:58 GMT Subject: RFR: 8255959: Timeouts in VectorConversion tests In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 23:08:32 GMT, Martin Doerr wrote: >> I have not observed such timeouts in our test infrastructure, but was wondering if this may cause such issues. Perhaps PPC does not support the conversion intrinsics? >> >> The timeout value is quite high, 1800 (i cannot recall what the default is). Ideally we should split this test per species. I regret that this test is not produced from a template. >> >> Would you mind first if we can try a quick experiment to reduce the time taken? In `AbstractVectorConversionTest` there is a loop using an upper bound of `INVOC_COUNT`, perhaps if any of the species input to the relevant methods have a length > the length of the corresponding preferred species, then we can reduce the loop count. > > At the moment, neither PPC nor s390 support any conversion intrinsics. Modern s390 (or > z/Architecture to be more precise) machines have vector instructions, but nobody implemented them in hotspot. So s390 uses the regular 8 Byte registers for vectors. PPC only uses 16 Byte vectors on modern hardware (8 Byte on older hardware). > > Default timeout is 1200 and I found out that 50% more makes the tests happy on all our machines. 
> > We could do an experiment, but I'm not familiar with the test. Perhaps the following patch might help. Still for say 512 conversion test on my mac that has no AVX512 support the test runs (including compilation) in about 60s. With the patch it reduces to about 40s. If you run jtreg in verbose mode, `-va` it should output individual test times. Perhaps some are taking longer than others? diff --git a/test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java b/test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java index d1303bfd295..1754af2110a 100644 --- a/test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java +++ b/test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java @@ -551,7 +551,8 @@ abstract class AbstractVectorConversionTest { int m = Math.max(dst_species_len,src_species_len) / Math.min(src_species_len,dst_species_len); int [] parts = getPartsArray(m, is_contracting_conv); - for (int ic = 0; ic < INVOC_COUNT; ic++) { + int count = invocationCount(INVOC_COUNT, SPECIES, OSPECIES); + for (int ic = 0; ic < count; ic++) { for (int i=0, j=0; i < in_len; i += src_species_len, j+= dst_species_len) { int part = parts[i % parts.length]; var av = Vector64ConversionTests.vectorFactory(unboxed_a, i, SPECIES); @@ -592,7 +593,8 @@ abstract class AbstractVectorConversionTest { int m = Math.max(dst_vector_size,src_vector_size) / Math.min(dst_vector_size, src_vector_size); int [] parts = getPartsArray(m, is_contracting_conv); - for (int ic = 0; ic < INVOC_COUNT; ic++) { + int count = invocationCount(INVOC_COUNT, SPECIES, OSPECIES); + for (int ic = 0; ic < count; ic++) { for (int i = 0, j=0; i < in_len; i += src_vector_lane_cnt, j+= dst_vector_lane_cnt) { int part = parts[i % parts.length]; var av = Vector64ConversionTests.vectorFactory(unboxed_a, i, SPECIES); @@ -609,4 +611,15 @@ abstract class AbstractVectorConversionTest { } assertResultsEquals(boxed_res, boxed_ref, dst_vector_lane_cnt); } + + static int invocationCount(int c, 
VectorSpecies... species) { + return Arrays.asList(species).stream().allMatch(AbstractVectorConversionTest::leqPreferred) + ? c + : Math.min(c, c / 100); + } + + static boolean leqPreferred(VectorSpecies species) { + VectorSpecies preferred = VectorSpecies.ofPreferred(species.elementType()); + return species.length() <= preferred.length(); + } } ------------- PR: https://git.openjdk.java.net/jdk/pull/1079 From kvn at openjdk.java.net Fri Nov 6 01:11:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 6 Nov 2020 01:11:59 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v12] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 09:06:10 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
>> >> Performance Results: >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java >> ArrayCopyPartialInlineSize : 32 >> >> JMH | Block Size | Baseline (ns/op) | Partial Inling (ns/op) | Gain >> -- | -- | -- | -- | -- >> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997 >> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866 >> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829 >> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564 >> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909 >> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667 >> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773 >> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179 >> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874 >> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228 >> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082 >> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769 >> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672 >> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844 >> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788 >> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947 >> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157 >> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038 >> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953 >> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584 >> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443 >> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724 >> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302 >> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756 >> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836 >> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 
0.987907289 >> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512 >> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831 >> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362 >> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974 >> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475 >> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064 >> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095 >> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621 >> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903 >> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081 >> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886 >> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287 >> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945 >> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067 >> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604 >> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204 >> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213 >> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003 >> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627 >> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174 >> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741 >> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313 >> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734 >> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345 >> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994 >> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803 >> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055 >> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957 >> 
ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541 >> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416 >> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692 >> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774 >> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467 >> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137 >> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237 >> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138 >> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353 >> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523 >> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588 >> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453 >> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983 >> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321 >> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053 >> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507 >> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559 >> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747 >> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095 >> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273 >> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139 >> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629 >> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935 >> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385 >> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096 >> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975 >> >> Detailed Reports: >> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) >> 
WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8252848: Review comments resolution. Changes requested by kvn (Reviewer). src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 2133: > 2131: MacroAssembler::evmovdqu(typ, kmask, dst, src, vector_len); > 2132: } > 2133: typ -> type src/hotspot/cpu/x86/c2_MacroAssembler_x86.hpp line 125: > 123: void evmovdqu(BasicType typ, KRegister kmask, XMMRegister dst, Address src, int vector_len); > 124: void evmovdqu(BasicType typ, KRegister kmask, Address dst, XMMRegister src, int vector_len); > 125: typ -> type src/hotspot/share/opto/cfgnode.hpp line 106: > 104: bool try_clean_mem_phi(PhaseGVN *phase); > 105: bool is_self_loop(Node* n, PhaseGVN *phase); > 106: bool try_phi_disintegration(PhaseGVN *phase); Why these changes and where new definitions? ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From kvn at openjdk.java.net Fri Nov 6 01:12:03 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 6 Nov 2020 01:12:03 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 19:16:21 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - JDK-8252848: Review comments addressed. >> - Merge remote-tracking branch 'origin' into JDK-8252848 >> - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. 
>> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - JDK-8252848 : Review comments resolution. >> - Merge remote-tracking branch 'upstream' into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Replacing explicit type checks with existing type checking routines >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - ... and 1 more: https://git.openjdk.java.net/jdk/compare/4031cb41...9e85592a > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1423: > >> 1421: if (ArrayCopyPartialInlineSize > MaxVectorSize) { >> 1422: ArrayCopyPartialInlineSize = MaxVectorSize; >> 1423: warning("Setting ArrayCopyPartialInlineSize as MaxVectorSize"); > > warning only if ArrayCopyPartialInlineSize is not default. I don't see your fix for my comment. I asked to add `if(!FLAG_IS_DEFAULT(ArrayCopyPartialInlineSize))` check ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From dholmes at openjdk.java.net Fri Nov 6 01:40:57 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 6 Nov 2020 01:40:57 GMT Subject: Integrated: 8255563: Missing NULL checks after JDK-8233624 In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 00:37:30 GMT, David Holmes wrote: > JDK-8233624 introduced a check for illegal native method names that will result in the `NativeLookup::*_jni_name` methods returning NULL. JVMCI `registerNativeMethods` uses these APIs without any check for a NULL return, however the methods being registered are part of the JVMCI implementation so should never have an illegal name, so adding a guarantee suffices to deal with that (see comments in JBS issue). This pull request has now been integrated. 
Changeset: 5dfb42fc Author: David Holmes URL: https://git.openjdk.java.net/jdk/commit/5dfb42fc Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8255563: Missing NULL checks after JDK-8233624 Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1068 From dongbo at openjdk.java.net Fri Nov 6 03:44:06 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Fri, 6 Nov 2020 03:44:06 GMT Subject: RFR: 8255949: AArch64: Add support for vectorized shift right and accumulate Message-ID:

This adds support for the missing NEON shift right and accumulate instructions, i.e. SSRA and USRA, to the AArch64 backend.

Verified with linux-aarch64-server-release, tier1-3.

Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance testing.
We see about 20% improvement across the different basic types on Kunpeng 916. The JMH results:

Benchmark                                         (count)  (seed)  Mode  Cnt    Score    Error  Units
# before, Kunpeng 916
VectorShiftAccumulate.shiftRightAccumulateByte       1028       0  avgt   10  146.259 ±  0.123  ns/op
VectorShiftAccumulate.shiftRightAccumulateInt        1028       0  avgt   10  454.781 ±  3.856  ns/op
VectorShiftAccumulate.shiftRightAccumulateLong       1028       0  avgt   10  938.842 ± 23.288  ns/op
VectorShiftAccumulate.shiftRightAccumulateShort      1028       0  avgt   10  205.493 ±  4.938  ns/op
VectorShiftAccumulate.shiftURightAccumulateByte      1028       0  avgt   10  905.483 ±  0.309  ns/op (not vectorized)
VectorShiftAccumulate.shiftURightAccumulateChar      1028       0  avgt   10  220.847 ±  5.868  ns/op
VectorShiftAccumulate.shiftURightAccumulateInt       1028       0  avgt   10  442.587 ±  6.980  ns/op
VectorShiftAccumulate.shiftURightAccumulateLong      1028       0  avgt   10  936.289 ± 21.458  ns/op
# after shift right and accumulate, Kunpeng 916
VectorShiftAccumulate.shiftRightAccumulateByte       1028       0  avgt   10  125.586 ±  0.204  ns/op
VectorShiftAccumulate.shiftRightAccumulateInt        1028       0  avgt   10  365.973 ±  6.466  ns/op
VectorShiftAccumulate.shiftRightAccumulateLong       1028       0  avgt   10  804.605 ± 12.336  ns/op
VectorShiftAccumulate.shiftRightAccumulateShort      1028       0  avgt   10  170.123 ±  4.678  ns/op
VectorShiftAccumulate.shiftURightAccumulateByte      1028       0  avgt   10  905.779 ±  0.587  ns/op (not vectorized)
VectorShiftAccumulate.shiftURightAccumulateChar      1028       0  avgt   10  185.799 ±  4.764  ns/op
VectorShiftAccumulate.shiftURightAccumulateInt       1028       0  avgt   10  364.360 ±  6.522  ns/op
VectorShiftAccumulate.shiftURightAccumulateLong      1028       0  avgt   10  800.737 ± 13.735  ns/op

We checked the shiftURightAccumulateByte test. Its performance stays the same because it is not vectorized with or without this patch, due to:

src/hotspot/share/opto/vectornode.cpp, line 226:
  case Op_URShiftI:
    switch (bt) {
    case T_BOOLEAN:return Op_URShiftVB;
    case T_CHAR:   return Op_URShiftVS;
    case T_BYTE:
    case T_SHORT:  return 0; // Vector logical right shift for signed short
                             // values produces incorrect Java result for
                             // negative data because java code should convert
                             // a short value into int value with sign
                             // extension before a shift.
    case T_INT:    return Op_URShiftVI;
    default:       ShouldNotReachHere(); return 0;
    }

We also tried the existing vector operation micro urShiftB, i.e.:

test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java, line 116
    @Benchmark
    public void urShiftB() {
        for (int i = 0; i < COUNT; i++) {
            resB[i] = (byte) (bytesA[i] >>> 3);
        }
    }

It is not vectorized either. It seems to be hard to match Java code with the URShiftVB node.
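The comment quoted from vectornode.cpp can be checked with a small, self-contained Java sketch. The class and method names below are hypothetical and not part of the patch; `laneShift` only models what an 8-bit lane-wise logical shift would compute, it is not a claim about the exact instruction C2 would emit.

```java
public class UnsignedShiftByteSketch {
    // What Java specifies for (byte) (b >>> n): the byte is promoted to int
    // with sign extension, shifted, and then narrowed back to 8 bits.
    static byte javaShift(byte b, int n) {
        return (byte) (b >>> n);
    }

    // Model of a per-lane 8-bit logical right shift: zeros are shifted into
    // the top of the byte itself, with no sign extension beforehand.
    static byte laneShift(byte b, int n) {
        return (byte) ((b & 0xFF) >>> n);
    }

    public static void main(String[] args) {
        byte negative = (byte) 0xF8; // -8
        byte positive = (byte) 0x40; // 64
        // The two models disagree for negative input ...
        System.out.println(javaShift(negative, 3) + " vs " + laneShift(negative, 3));
        // ... and agree for non-negative input.
        System.out.println(javaShift(positive, 3) + " vs " + laneShift(positive, 3));
    }
}
```

For -8 the Java semantics keep the sign-extended ones in the low byte, while a byte-lane logical shift clears the top bits, so the results differ. That mismatch is the reason given in the quoted source comment for returning 0 for the signed sub-word types.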
------------- Commit messages: - 8255949: AArch64: Add support for vectorized shift right and accumulate Changes: https://git.openjdk.java.net/jdk/pull/1087/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1087&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255949 Stats: 349 lines in 3 files changed: 349 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1087.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1087/head:pull/1087 PR: https://git.openjdk.java.net/jdk/pull/1087 From dlong at openjdk.java.net Fri Nov 6 04:27:57 2020 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 6 Nov 2020 04:27:57 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v2] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 23:58:21 GMT, Dean Long wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: >> >> - Jorn's comments >> - Update headers and add intrinsic to Graal test ignore list >> - move compiler test and add bug to test >> - non x86_64 arch support >> - c2 test case >> - intrinsic >> - Use overloads of method names. >> >> Simplify internally to avoid overload resolution >> issues, leverging List for the exception >> mapper. >> - Vladimir's comments >> - checkLongIndex > > src/hotspot/share/opto/library_call.cpp line 1015: > >> 1013: Deoptimization::Action_make_not_entrant); >> 1014: } >> 1015: > > A comment here explaining what the code below is doing would be helpful. This code wasn't here before, so I'm guessing it's needed for T_LONG. For T_INT is it just wasted work? 
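As background for the range check discussed in this thread, the following Java sketch shows the equivalence that lets an index check be expressed as one unsigned comparison. It is only an illustration: the class and method names are hypothetical, it does not call the new `Objects` methods, and it says nothing about how the C2 intrinsic is implemented.

```java
public class UnsignedRangeCheckSketch {
    // Straightforward form: two signed comparisons.
    static boolean inRangeSigned(long index, long length) {
        return index >= 0 && index < length;
    }

    // Single unsigned comparison. Equivalent whenever length >= 0, because a
    // negative index reinterpreted as unsigned is larger than any such length.
    static boolean inRangeUnsigned(long index, long length) {
        return Long.compareUnsigned(index, length) < 0;
    }

    public static void main(String[] args) {
        long[] lengths = {0L, 1L, 10L, Long.MAX_VALUE};
        long[] indexes = {Long.MIN_VALUE, -1L, 0L, 9L, 10L, Long.MAX_VALUE};
        boolean allAgree = true;
        for (long length : lengths) {
            for (long index : indexes) {
                allAgree &= inRangeSigned(index, length) == inRangeUnsigned(index, length);
            }
        }
        System.out.println("signed and unsigned checks agree: " + allAgree);
    }
}
```

The equivalence relies on the length being non-negative, so a caller using the unsigned form has to establish that precondition separately.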
------------- PR: https://git.openjdk.java.net/jdk/pull/1003 From jbhateja at openjdk.java.net Fri Nov 6 07:10:57 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 6 Nov 2020 07:10:57 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 01:02:43 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/vm_version_x86.cpp line 1423: >> >>> 1421: if (ArrayCopyPartialInlineSize > MaxVectorSize) { >>> 1422: ArrayCopyPartialInlineSize = MaxVectorSize; >>> 1423: warning("Setting ArrayCopyPartialInlineSize as MaxVectorSize"); >> >> warning only if ArrayCopyPartialInlineSize is not default. > > I don't see your fix for my comment. I asked to add `if(!FLAG_IS_DEFAULT(ArrayCopyPartialInlineSize))` check The default value of ArrayCopyPartialInlineSize is -1, with a value range of [-1,64], and the default value of MaxVectorSize is 0, with a value range of [0,max_int]; control flow therefore reaches this warning only for a non-default value of ArrayCopyPartialInlineSize. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Fri Nov 6 07:13:57 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 6 Nov 2020 07:13:57 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v12] In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 01:07:34 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8252848: Review comments resolution. > > src/hotspot/share/opto/cfgnode.hpp line 106: > >> 104: bool try_clean_mem_phi(PhaseGVN *phase); >> 105: bool is_self_loop(Node* n, PhaseGVN *phase); >> 106: bool try_phi_disintegration(PhaseGVN *phase); > > Why these changes and where new definitions?
The definitions were removed earlier, so these declarations are no longer needed. Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Fri Nov 6 07:23:07 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 6 Nov 2020 07:23:07 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v13] In-Reply-To: References: Message-ID: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. > 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. > 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. > 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
> > Performance Results: > System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz > Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java > ArrayCopyPartialInlineSize : 32 > > JMH | Block Size | Baseline (ns/op) | Partial Inling (ns/op) | Gain > -- | -- | -- | -- | -- > ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997 > ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866 > ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829 > ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564 > ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909 > ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667 > ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773 > ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179 > ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874 > ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228 > ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082 > ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769 > ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672 > ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844 > ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788 > ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947 > ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157 > ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038 > ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953 > ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584 > ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443 > ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724 > ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302 > ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756 > ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836 > ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289 > 
ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512 > ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831 > ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362 > ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974 > ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475 > ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064 > ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095 > ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621 > ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903 > ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081 > ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886 > ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287 > ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945 > ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067 > ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604 > ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204 > ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213 > ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003 > ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627 > ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174 > ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741 > ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313 > ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734 > ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345 > ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994 > ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803 > ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055 > ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957 > ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541 > 
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - Merge remote-tracking branch 'upstream' into JDK-8252848 - JDK-8252848 : Review comments resolved - JDK-8252848: Review comments resolution. - JDK-8252848: Review comments addressed. - Merge remote-tracking branch 'origin' into JDK-8252848 - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - JDK-8252848 : Review comments resolution. - Merge remote-tracking branch 'upstream' into JDK-8252848 - ... and 4 more: https://git.openjdk.java.net/jdk/compare/5dfb42fc...ed343a9e ------------- Changes: https://git.openjdk.java.net/jdk/pull/302/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=12 Stats: 535 lines in 27 files changed: 485 ins; 23 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From roland at openjdk.java.net Fri Nov 6 08:27:59 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 6 Nov 2020 08:27:59 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v2] In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 04:25:40 GMT, Dean Long wrote: >> src/hotspot/share/opto/library_call.cpp line 1015: >> >>> 1013: Deoptimization::Action_make_not_entrant); >>> 1014: } >>> 1015: >> >> A comment here explaining what the code below is doing would be helpful. > > This code wasn't here before, so I'm guessing it's needed for T_LONG. 
For T_INT is it just wasted work?

Code in IdealLoopTree::is_range_check_if() uses this check:

    if (range->Opcode() != Op_LoadRange && !iff->is_RangeCheck()) {
      const TypeInt* tint = phase->_igvn.type(range)->isa_int();
      if (tint == NULL || tint->empty() || tint->_lo < 0) {
        // Allow predication on positive values that aren't LoadRanges.
        // This allows optimization of loops where the length of the
        // array is a known value and doesn't need to be loaded back
        // from the array.
        return false;

That is, it assumes that everything on the right-hand side of a RangeCheck test is positive. I think it's cleaner and less dangerous to explicitly cast the length to >= 0 when the intrinsic is built. In a subsequent patch I intend to drop the && !iff->is_RangeCheck() check above.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1003

From roland at openjdk.java.net Fri Nov 6 08:35:17 2020
From: roland at openjdk.java.net (Roland Westrelin)
Date: Fri, 6 Nov 2020 08:35:17 GMT
Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Nov 2020 23:59:43 GMT, Dean Long wrote:

> C2 changes look good, except for new code block in inline_preconditions_checkIndex could use a comment.

Thanks for the review. I added a comment for this block and some other comments for the rest of this method.
-------------

PR: https://git.openjdk.java.net/jdk/pull/1003

From roland at openjdk.java.net Fri Nov 6 08:35:17 2020
From: roland at openjdk.java.net (Roland Westrelin)
Date: Fri, 6 Nov 2020 08:35:17 GMT
Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v3]
In-Reply-To:
References:
Message-ID:

> This change adds 3 new methods in Objects:
>
> public static long checkIndex(long index, long length)
> public static long checkFromToIndex(long fromIndex, long toIndex, long length)
> public static long checkFromIndexSize(long fromIndex, long size, long length)
>
> This mirrors the int utility methods that were added by JDK-8135248
> with the same motivations.
>
> As is the case with the int checkIndex(), the long checkIndex() method
> is JIT compiled as an intrinsic. It allows the JIT to compile
> checkIndex to an unsigned comparison and properly recognize it as
> a range check that then becomes a candidate for the existing range check
> optimizations. This has proven to be important for Panama's
> MemorySegment API and a prototype of this change (with some extra C2
> improvements) showed that Panama micro benchmark results improve
> significantly.
>
> This change includes:
>
> - the API change
> - the C2 intrinsic
> - tests for the API and the C2 intrinsic
>
> This is joint work with Paul, who reviewed and reworked the API change
> and filed the CSR.
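
The "unsigned comparison" mentioned in the description above can be sketched in plain Java. This is only an illustration, not the JDK implementation or the C2 intrinsic: the class name `LongIndexCheckSketch` and the method `checkIndexSketch` are hypothetical, and the sketch relies on the length being non-negative, which is the same property the thread discusses casting explicitly when the intrinsic is built.

```java
public class LongIndexCheckSketch {
    // Hypothetical stand-in for a long index check; not the JDK source.
    static long checkIndexSketch(long index, long length) {
        // Once length is known to be >= 0, one unsigned comparison rejects both
        // negative indexes (they look huge when read as unsigned) and
        // indexes that are >= length.
        if (length < 0 || Long.compareUnsigned(index, length) >= 0) {
            throw new IndexOutOfBoundsException(
                "Index " + index + " out of bounds for length " + length);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(checkIndexSketch(3L, 10L));
        try {
            checkIndexSketch(-1L, 10L);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The explicit `length < 0` test is part of the sketch because, without it, a negative length read as unsigned would let almost any index through.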
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: intrinsic comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1003/files - new: https://git.openjdk.java.net/jdk/pull/1003/files/b47184ac..aaacd328 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1003&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1003&range=01-02 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1003.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1003/head:pull/1003 PR: https://git.openjdk.java.net/jdk/pull/1003 From ayang at openjdk.java.net Fri Nov 6 09:13:58 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 6 Nov 2020 09:13:58 GMT Subject: RFR: 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap [v2] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 22:31:28 GMT, Martin Doerr wrote: > I've added a comment. Thank you for that. I believe `decode_raw` handles null well; `decode_raw_not_null` is there just to have an extra assertion when the caller *knows* that the oop is not null. > Note that there's another usage with preceding null check in oopDesc::load_oop_raw. Indeed, but I think that one is unnecessary as well, which could (or should) be addressed in this PR or another. ------------- PR: https://git.openjdk.java.net/jdk/pull/1078 From mdoerr at openjdk.java.net Fri Nov 6 09:38:54 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 6 Nov 2020 09:38:54 GMT Subject: RFR: 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap [v2] In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 09:11:23 GMT, Albert Mingkun Yang wrote: >> @albertnetymk: Thanks for looking at this. I've added a comment. >> I was also surprised about that decode_raw doesn't handle null properly. 
I'd have expected decode_raw to call decode_raw_not_null, but it's implemented vice-versa. Therefore the null check before the call. Note that there's another usage with preceding null check in oopDesc::load_oop_raw. > >> I've added a comment. > > Thank you for that. > > I believe `decode_raw` handles null well; `decode_raw_not_null` is there just to have an extra assertion when the caller *knows* that the oop is not null. > >> Note that there's another usage with preceding null check in oopDesc::load_oop_raw. > > Indeed, but I think that one is unnecessary as well, which could (or should) be addressed in this PR or another. Unfortunately, no. Assume we're using "HeapBasedNarrowOop". decode_raw adds the base, so decode_raw(0) returns base which is wrong. Also see Stefan's comment in the bug: "I see that decode and decode_raw has different semantics w.r.t. 0, so be careful if you change this code to use decode_raw" ------------- PR: https://git.openjdk.java.net/jdk/pull/1078 From aph at openjdk.java.net Fri Nov 6 10:06:55 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 6 Nov 2020 10:06:55 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v6] In-Reply-To: References: Message-ID: <47eNWrFw1oErxMVcCMe515H9r28dIIv9hQTGa6-F_oU=.d95ab327-b586-4ebc-bf17-a57d31e0ef75@github.com> On Thu, 5 Nov 2020 14:08:03 GMT, Andrew Dinn wrote: > > In future I think it would be wise to spend time picking more rewarding candidates for optimization and, more importantly, discussing the likelihood of them being useful on the compiler-dev or aarch64-port lists before proceeding to implement them and present them for a review. That should help to obtain a much better return on effort than that has resulted from this patch. I have to echo this. Compilation takes time and we should use it wisely. Given the expected advantage of this patch, I simply cannot tell whether it is worth the effort. 
I suppose we can take some comfort from the fact that C2 only kicks in after a lot of work has been done, but I don't like to be in such a quandary.

-------------

PR: https://git.openjdk.java.net/jdk/pull/511

From aph at redhat.com Fri Nov 6 10:08:51 2020
From: aph at redhat.com (Andrew Haley)
Date: Fri, 6 Nov 2020 10:08:51 +0000
Subject: RFR: 8255949: AArch64: Add support for vectorized shift right and accumulate
In-Reply-To:
References:
Message-ID: <48ebc303-9947-6c10-4430-1a11c6d98d8e@redhat.com>

On 11/6/20 3:44 AM, Dong Bo wrote:

> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance test.
> We see roughly 20% improvement across the different basic types on Kunpeng916.

Do you find it disappointing that there is such a small improvement? Do you know why that is? Perhaps the benchmark is memory bound, or somesuch?

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From enikitin at openjdk.java.net Fri Nov 6 10:16:56 2020
From: enikitin at openjdk.java.net (Evgeny Nikitin)
Date: Fri, 6 Nov 2020 10:16:56 GMT
Subject: RFR: 8255011: [TESTBUG] UnexpectedDeoptimizationAllTest.java timed out [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Nov 2020 17:43:16 GMT, Igor Ignatyev wrote:

>> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision:
>>
>> add comment
>
> LGTM. However, I'd like you to wait for an explicit 'ok' from Evgeny (@lepestock) as well.
>
> BTW, you need to either update the title of this PR to `[TESTBUG] compiler/codecache/stress/UnexpectedDeoptimizationAllTest.java timed out` or change [8255011](https://bugs.openjdk.java.net/browse/JDK-8255011)'s title to `[TESTBUG] UnexpectedDeoptimizationAllTest.java timed out`
>
> -- Igor

LGTM. I was considering a similar change as well, so I agree.
------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From vlivanov at openjdk.java.net Fri Nov 6 10:34:57 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 6 Nov 2020 10:34:57 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v6] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 09:27:18 GMT, Boris Ulasevich wrote: >> Let me revive the change request [3] to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 & 0xFF) | ((v2 & 0xFF) << 8)". >> >> Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test. >> >> As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new BitfieldInsert node has been introduced to replace Or+Shift+And sequence when possible, on AARCH a single BFI instruction is emitted for the new node. >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html >> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html >> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > comprehensive comment and code cleanup for BitfieldInsert transformation Changes requested by vlivanov (Reviewer). src/hotspot/share/opto/addnode.cpp line 246: > 244: static juint value_range_mask(PhaseGVN *phase, Node* value) { > 245: int opcode = value->Opcode(); > 246: if (opcode == Op_LShiftI && value->in(2)->is_Con()) { `Node::is_Con()` returns `true` for TOP node, so it's incorrect to assume the value is of type int after the check. 
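
For readers following the review, the expression pattern quoted in the PR description, "(v1 & 0xFF) | ((v2 & 0xFF) << 8)", can be related to a bitfield insert with a small Java sketch. The helper `bfi` below is a hypothetical software model of the AArch64 BFI behaviour (copy the low `width` bits of `src` into `dst` at bit `lsb`, leaving the other bits of `dst` unchanged); it is not code from the patch and says nothing about how the BitfieldInsert node is actually matched in C2.

```java
public class BitfieldInsertSketch {
    // Hypothetical model of a 32-bit bitfield insert: the low `width` bits of
    // `src` replace bits [lsb, lsb + width) of `dst`; all other bits of `dst` are kept.
    static int bfi(int dst, int src, int lsb, int width) {
        int mask = (width == 32) ? -1 : ((1 << width) - 1);
        return (dst & ~(mask << lsb)) | ((src & mask) << lsb);
    }

    // The Or+Shift+And form quoted in the PR description.
    static int orShiftAnd(int v1, int v2) {
        return (v1 & 0xFF) | ((v2 & 0xFF) << 8);
    }

    // The same value expressed as one And followed by one bitfield insert.
    static int viaBfi(int v1, int v2) {
        return bfi(v1 & 0xFF, v2, 8, 8);
    }

    public static void main(String[] args) {
        int v1 = 0x12345678;
        int v2 = 0xABCDEF01;
        System.out.println(Integer.toHexString(orShiftAnd(v1, v2)));
        System.out.println(Integer.toHexString(viaBfi(v1, v2)));
    }
}
```

The two forms agree because the masked operands occupy disjoint bit ranges (bits 0..7 and bits 8..15), so inserting the second field cannot disturb the first.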
src/hotspot/share/opto/addnode.cpp line 849: > 847: } > 848: // major_progress() check postpones the transformation after loop optimization > 849: if (can_reshape && !phase->C->major_progress() && Matcher::match_rule_supported(Op_BitfieldInsertI)) { There's `C->post_loop_opts_phase()` flag added recently specifically for that purpose. src/hotspot/share/opto/addnode.cpp line 959: > 957: } > 958: } > 959: if (can_reshape && !phase->C->major_progress() && Matcher::match_rule_supported(Op_BitfieldInsertL)) { Same here: please, use `C->post_loop_opts_phase()` instead. src/hotspot/share/opto/compile.cpp line 2000: > 1998: > 1999: // execute posponed transformations > 2000: void Compile::post_optimize_loops_igvn(PhaseIterGVN& igvn) { Preferred way is to register nodes for processing using `Compile::record_for_post_loop_opts_igvn()`. That way you avoid additional pass over the graph. ------------- PR: https://git.openjdk.java.net/jdk/pull/511 From roland at openjdk.java.net Fri Nov 6 11:13:04 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 6 Nov 2020 11:13:04 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah [v2] In-Reply-To: References: Message-ID: > This is a Shenandoah bug but the proposed fix is in shared code. > > In an infinite loop, a barrier is located right after the loop head > and above the never branch. When the barrier is expanded, control flow > is added between the loop and the never branch. During loop > verification the assert fires because it doesn't expect any control > flow between the never branch and the loop head. > > While it would have been nice to fix this Shenandoah issue in > Shenandoah code, I think the cleaner fix is to preserve the invariant > that the never branch is always right after the loop head in an > infinite loop. In the proposed patch, this is achieved by moving all > uses of the loop head to the never branch when it's constructed. 
Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: fix & test ------------- Changes: https://git.openjdk.java.net/jdk/pull/1073/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1073&range=01 Stats: 13 lines in 2 files changed: 6 ins; 5 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1073.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1073/head:pull/1073 PR: https://git.openjdk.java.net/jdk/pull/1073 From ayang at openjdk.java.net Fri Nov 6 11:12:58 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 6 Nov 2020 11:12:58 GMT Subject: RFR: 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap [v2] In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 09:36:05 GMT, Martin Doerr wrote: > Assume we're using "HeapBasedNarrowOop". decode_raw adds the base, so decode_raw(0) returns base which is wrong. Yes, you are right. Thank you for pointing this out. ------------- PR: https://git.openjdk.java.net/jdk/pull/1078 From ayang at openjdk.java.net Fri Nov 6 11:12:57 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 6 Nov 2020 11:12:57 GMT Subject: RFR: 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap [v2] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 22:29:11 GMT, Martin Doerr wrote: >> JDK-8237363 introduced "assert(Universe::heap()->is_in..." check in CompressedOops::decode functions. >> This assertion restricts the usability of the decode functions. There are periods of time (during GC) at which we can't use " Universe::heap()->is_in" because the pointer gets switched between old and new location, but "Universe::heap()->is_in" is not yet accurate. >> PPC64 code has a usage of CompressedOops::decode which is affected by this problem. (It was observed with SerialGC, see JBS.) 
>> We could also use a weaker assertion, but seems like other people value the stronger assertion more. So I suggest to use decode_raw as workaround for PPC64. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > add comment and use CompressedOops::is_null Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/1078 From roland at openjdk.java.net Fri Nov 6 11:17:55 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 6 Nov 2020 11:17:55 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 08:44:03 GMT, Roland Westrelin wrote: > This is a Shenandoah bug but the proposed fix is in shared code. > > In an infinite loop, a barrier is located right after the loop head > and above the never branch. When the barrier is expanded, control flow > is added between the loop and the never branch. During loop > verification the assert fires because it doesn't expect any control > flow between the never branch and the loop head. > > While it would have been nice to fix this Shenandoah issue in > Shenandoah code, I think the cleaner fix is to preserve the invariant > that the never branch is always right after the loop head in an > infinite loop. In the proposed patch, this is achieved by moving all > uses of the loop head to the never branch when it's constructed. 
I rebased the change ------------- PR: https://git.openjdk.java.net/jdk/pull/1073 From kvn at openjdk.java.net Fri Nov 6 16:18:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 6 Nov 2020 16:18:57 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v3] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 13:31:06 GMT, Christian Hagedorn wrote: >> The dominance failures start to occur after the fix for [JDK-8249749](https://bugs.openjdk.java.net/browse/JDK-8249749) which enabled the method `SWPointer::scaled_iv_plus_offset` to call itself recursively and walk the graph to match more instead of stopping immediately (no recursion): >> https://github.com/openjdk/jdk/commit/092389e3c91822b1e3f56f203cb7b90e84673f8e#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81L3789-R3812 >> >> We check in `SWPointer::offset_plus_k` if a node is invariant and if so then we choose it as invariant. However, we now have cases in the Renaissance benchmarks where we select an invariant that is pinned to a `CastIINode` between the main and pre loop. An example is shown in the attached image. 5913 SubI is found as an invariant with the improved recursive search enabled by JDK-8249749. The control of 5913 SubI (with `get_ctrl`) is 5298 CastII. The problem is now that we use the invariant 5913 SubI in the pre loop limit check of 5281 CountedLoopEnd (done in `SuperWord::align_initial_loop_index`) because we assume that since the invariant is not part of the main loop, it can float into the pre loop. But this is prevented by 5298 CastII. This results in the dominance assertion failure when checking if the earliest control of 5270 Bool in the pre loop (5297 IfTrue because of 5913 SubI that is used by 5270 Bool) dominates the LCA of 5270 Bool (the pre loop header node). 
>> >> My suggestion is to improve the invariant check in `SWPointer::offset_plus_k` to also check if the found invariant is not dominated by the pre loop end node. Repeated testing of the RenaissanceStressTest has not resulted in any dominance failures anymore. >> ![dominance_failure](https://user-images.githubusercontent.com/17833009/97696669-3752d200-1aa6-11eb-9a42-2e36550e2b8b.png) > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update comments and invariant selection in offset_plus_k Updated changes looks good ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/954 From roland at openjdk.java.net Fri Nov 6 16:46:07 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 6 Nov 2020 16:46:07 GMT Subject: RFR: 8254887: C2: assert(cl->trip_count() > 0) failed: peeling a fully unrolled loop Message-ID: A loop's trip count is computed to have exact trip count 6. Then: 1- pre/main/post loops are created which brings the trip count from 6 to 5 2- main loop is unrolled which brings the trip count to 2 3- main loop is peeled: Trip count is 1 4- pre/main/post loops are created again. Trip count of main loop is 0. 5- peeling is attempted again and the assert fires IdealLoopTree::policy_peeling() doesn't attempt peeling if the trip count is 1. I propose that IdealLoopTree::policy_range_check() (that causes the pre/main/post loops insertion the second time) performs the same check so step 4 doesn't happen. 
-------------

Commit messages:
- test
- fix

Changes: https://git.openjdk.java.net/jdk/pull/1096/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1096&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8254887
Stats: 63 lines in 2 files changed: 62 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/1096.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1096/head:pull/1096

PR: https://git.openjdk.java.net/jdk/pull/1096

From neliasso at openjdk.java.net Fri Nov 6 20:42:54 2020
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Fri, 6 Nov 2020 20:42:54 GMT
Subject: RFR: 8255011: [TESTBUG] compiler/codecache/stress/UnexpectedDeoptimizationAllTest.java timed out [v3]
In-Reply-To:
References:
Message-ID: <_p2H_mL27u3aBHA9ANJ6KdAacvwI-_dCZACVutqVu7M=.de1ec598-4d09-4057-aaff-0ed7f3b93ac0@github.com>

On Fri, 6 Nov 2020 10:14:24 GMT, Evgeny Nikitin wrote:

>> LGTM. However, I'd like you to wait for an explicit 'ok' from Evgeny (@lepestock) as well.
>>
>> BTW, you need to either update the title of this PR to `[TESTBUG] compiler/codecache/stress/UnexpectedDeoptimizationAllTest.java timed out` or change [8255011](https://bugs.openjdk.java.net/browse/JDK-8255011)'s title to `[TESTBUG] UnexpectedDeoptimizationAllTest.java timed out`
>>
>> -- Igor
>
> LGTM. I was considering a similar change as well, so I agree.

Thank you Evgeny and Igor!

-------------

PR: https://git.openjdk.java.net/jdk/pull/1030

From github.com+58006833+xbzhang99 at openjdk.java.net Sat Nov 7 07:33:06 2020
From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang)
Date: Sat, 7 Nov 2020 07:33:06 GMT
Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v2]
In-Reply-To:
References:
Message-ID:

> Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platforms. The cause was that some jump instructions used jge instead of jae.
Also changed movdqu to movsd as it was supposed to load a 64-bit number Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: Added test cases for exp at the value of 1024 and 10000 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/894/files - new: https://git.openjdk.java.net/jdk/pull/894/files/72630558..305d915b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From jiefu at openjdk.java.net Sat Nov 7 07:54:59 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 7 Nov 2020 07:54:59 GMT Subject: RFR: 8256009: Remove src/hotspot/share/adlc/Test/i486.ad Message-ID: src/hotspot/share/adlc/Test/i486.ad is empty. It might be better to remove it. Thanks. ------------- Commit messages: - 8256009: Remove src/hotspot/share/adlc/Test/i486.ad Changes: https://git.openjdk.java.net/jdk/pull/1107/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1107&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256009 Stats: 0 lines in 1 file changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1107.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1107/head:pull/1107 PR: https://git.openjdk.java.net/jdk/pull/1107 From dongbo at openjdk.java.net Sat Nov 7 08:43:59 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Sat, 7 Nov 2020 08:43:59 GMT Subject: RFR: 8255949: AArch64: Add support for vectorized shift right and accumulate In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 03:36:57 GMT, Dong Bo wrote: > This supports missing NEON shift right and accumulate instructions, i.e. SSRA and USRA, for AArch64 backend. 
>
> Verified with linux-aarch64-server-release, tier1-3.
>
> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance test.
> We see roughly 20% improvement across the different basic types on Kunpeng916. The JMH results:
> Benchmark                                        (count)  (seed)  Mode  Cnt    Score    Error  Units
> # before, Kunpeng 916
> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  146.259 ±  0.123  ns/op
> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  454.781 ±  3.856  ns/op
> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  938.842 ± 23.288  ns/op
> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  205.493 ±  4.938  ns/op
> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.483 ±  0.309  ns/op (not vectorized)
> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  220.847 ±  5.868  ns/op
> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  442.587 ±  6.980  ns/op
> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  936.289 ± 21.458  ns/op
> # after shift right and accumulate, Kunpeng 916
> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  125.586 ±  0.204  ns/op
> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  365.973 ±  6.466  ns/op
> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  804.605 ± 12.336  ns/op
> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  170.123 ±  4.678  ns/op
> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.779 ±  0.587  ns/op (not vectorized)
> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  185.799 ±  4.764  ns/op
> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  364.360 ±  6.522  ns/op
> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  800.737 ± 13.735  ns/op
>
> We checked the shiftURightAccumulateByte test; the performance stays the same since it is not vectorized with or without this patch, due to:
> src/hotspot/share/opto/vectornode.cpp, line 226:
>   case Op_URShiftI:
>     switch (bt) {
>     case T_BOOLEAN:return Op_URShiftVB;
>     case T_CHAR:   return Op_URShiftVS;
>     case T_BYTE:
>     case T_SHORT:  return 0; // Vector logical right shift for signed short
>                              // values produces incorrect Java result for
>                              // negative data because java code should convert
>                              // a short value into int value with sign
>                              // extension before a shift.
>     case T_INT:    return Op_URShiftVI;
>     default:       ShouldNotReachHere(); return 0;
>     }
> We also tried the existing vector operation micro urShiftB, i.e.:
> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java, line 116
>     @Benchmark
>     public void urShiftB() {
>         for (int i = 0; i < COUNT; i++) {
>             resB[i] = (byte) (bytesA[i] >>> 3);
>         }
>     }
> It is not vectorized either. It seems hard to match Java code with the URShiftVB node.

> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
>
> On 11/6/20 3:44 AM, Dong Bo wrote:
>
> > Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance test.
> > We see roughly 20% improvement across the different basic types on Kunpeng916.
>
> Do you find it disappointing that there is such a small improvement?
> Do you know why that is? Perhaps the benchmark is memory bound, or
> somesuch?
>

@theRealAph Thanks for the quick review.

For test shiftURightAccumulateByte, as stated before, it is not vectorized with or without this patch, so the performance is the same.

For the other tests (14.13%~19.53% improvement), I checked the profile from `-prof perfasm` in the JMH framework.
The runtime is mostly taken by load/store instructions rather than by shifting and accumulating.
As far as I considered, there is no way that we can test these improvements without these memory accesses. BTW, according to the hardware PMU counters, 99.617%~99.901% the memory accesses mainly hit in L1/L2 data cache. But the cpu cycles took for load/store in L1/L2 data cache can still be serveral times more than shifting and accumulating registers. I think that's why the improvements are small, hope this could address what you considered, thanks. The profile with test shiftRightAccumulateByte (14.13% improvement): # Before ?? 0x0000ffff68309804: add x6, x2, x15 ?? 0x0000ffff68309808: add x7, x3, x15 19.81% ?? 0x0000ffff6830980c: ldr q16, [x6,#16] 3.81% ?? 0x0000ffff68309810: ldr q17, [x7,#16] ?? 0x0000ffff68309814: sshr v16.16b, v16.16b, #1 ?? 0x0000ffff68309818: add v16.16b, v16.16b, v17.16b ?? 0x0000ffff6830981c: add x15, x4, x15 ?? 0x0000ffff68309820: str q16, [x15,#16] 4.06% ?? 0x0000ffff68309824: ldr q16, [x6,#32] 3.79% ?? 0x0000ffff68309828: ldr q17, [x7,#32] ?? 0x0000ffff6830982c: sshr v16.16b, v16.16b, #1 ?? 0x0000ffff68309830: add v16.16b, v16.16b, v17.16b ?? 0x0000ffff68309834: str q16, [x15,#32] 6.05% ?? 0x0000ffff68309838: ldr q16, [x6,#48] 3.48% ?? 0x0000ffff6830983c: ldr q17, [x7,#48] ?? 0x0000ffff68309840: sshr v16.16b, v16.16b, #1 ?? 0x0000ffff68309844: add v16.16b, v16.16b, v17.16b 0.25% ?? 0x0000ffff68309848: str q16, [x15,#48] 8.67% ?? 0x0000ffff6830984c: ldr q16, [x6,#64] 4.30% ?? 0x0000ffff68309850: ldr q17, [x7,#64] ?? 0x0000ffff68309854: sshr v16.16b, v16.16b, #1 ?? 0x0000ffff68309858: add v16.16b, v16.16b, v17.16b 0.06% ?? 0x0000ffff6830985c: str q16, [x15,#64] # After ?? 0x0000ffff98308d64: add x6, x2, x15 14.77% ?? 0x0000ffff98308d68: ldr q16, [x6,#16] ?? 0x0000ffff98308d6c: add x7, x3, x15 4.55% ?? 0x0000ffff98308d70: ldr q17, [x7,#16] ?? 0x0000ffff98308d74: ssra v17.16b, v16.16b, #1 ?? 0x0000ffff98308d78: add x15, x4, x15 0.02% ?? 0x0000ffff98308d7c: str q17, [x15,#16] 6.14% ?? 0x0000ffff98308d80: ldr q16, [x6,#32] 5.22% ?? 
0x0000ffff98308d84: ldr q17, [x7,#32]
         0x0000ffff98308d88: ssra v17.16b, v16.16b, #1
         0x0000ffff98308d8c: str q17, [x15,#32]
  5.26%  0x0000ffff98308d90: ldr q16, [x6,#48]
  5.14%  0x0000ffff98308d94: ldr q17, [x7,#48]
         0x0000ffff98308d98: ssra v17.16b, v16.16b, #1
         0x0000ffff98308d9c: str q17, [x15,#48]
  6.56%  0x0000ffff98308da0: ldr q16, [x6,#64]
  5.10%  0x0000ffff98308da4: ldr q17, [x7,#64]
         0x0000ffff98308da8: ssra v17.16b, v16.16b, #1
  0.06%  0x0000ffff98308dac: str q17, [x15,#64]

-------------

PR: https://git.openjdk.java.net/jdk/pull/1087

From aph at redhat.com Sat Nov 7 09:13:23 2020
From: aph at redhat.com (Andrew Haley)
Date: Sat, 7 Nov 2020 09:13:23 +0000
Subject: RFR: 8255949: AArch64: Add support for vectorized shift right and accumulate
In-Reply-To:
References:
Message-ID: <52aee308-a18a-3c3c-e2aa-ee978b1e58d5@redhat.com>

On 11/7/20 8:43 AM, Dong Bo wrote:

> For the other tests (14.13%~19.53% improvement), I checked the profile from `-prof perfasm` in the JMH framework.
> The runtime is mainly taken by load/store instructions rather than by shifting and accumulating.
> As far as I can tell, there is no way to test these improvements without these memory accesses.
>
> BTW, according to the hardware PMU counters, 99.617%~99.901% of the memory accesses hit in the L1/L2 data cache.
> But the CPU cycles taken by loads/stores in the L1/L2 data cache can still be several times more than those taken by shifting and accumulating in registers.
>
> I think that's why the improvements are small; I hope this addresses your concern, thanks.

OK, but let's think about how this works in the real world outside
benchmarking. If you're missing L1 it really doesn't matter much what
you do with the data, that 12-cycle load latency is going to dominate
whether you use vectorized shifts or not.

Hopefully, though, shifting and accumulating isn't the only thing
you're doing with that data. Probably, you're going to be doing
other things with it too.

With that in mind, please produce a benchmark that fits in L1, so
that we can see if it works better.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From vlivanov at openjdk.java.net Sat Nov 7 11:41:58 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Sat, 7 Nov 2020 11:41:58 GMT
Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v3]
In-Reply-To:
References:
Message-ID: <0P2H9rV2KCehYBo_hE-uYriC_S0YAs_1QOePRcRtQjI=.614ae3b3-316b-4bd9-a547-680acb193d3d@github.com>

On Fri, 6 Nov 2020 08:35:17 GMT, Roland Westrelin wrote:

>> This change adds 3 new methods in Objects:
>>
>> public static long checkIndex(long index, long length)
>> public static long checkFromToIndex(long fromIndex, long toIndex, long length)
>> public static long checkFromIndexSize(long fromIndex, long size, long length)
>>
>> This mirrors the int utility methods that were added by JDK-8135248
>> with the same motivations.
>>
>> As is the case with the int checkIndex(), the long checkIndex() method
>> is JIT compiled as an intrinsic. It allows the JIT to compile
>> checkIndex to an unsigned comparison and properly recognize it as
>> a range check that then becomes a candidate for the existing range check
>> optimizations. This has proven to be important for panama's
>> MemorySegment API and a prototype of this change (with some extra c2
>> improvements) showed that panama micro benchmark results improve
>> significantly.
>>
>> This change includes:
>>
>> - the API change
>> - the C2 intrinsic
>> - tests for the API and the C2 intrinsic
>>
>> This is joint work with Paul, who reviewed and reworked the API change
>> and filed the CSR.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
>   intrinsic comments

Marked as reviewed by vlivanov (Reviewer).

src/hotspot/share/opto/castnode.cpp line 100:

> 98:     }
> 99:     case Op_CastLL: {
> 100:       assert(!carry_dependency, "carry dependency not supported");

Any particular reason to reject control dependency (except it is not used right now)?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1003

From redestad at openjdk.java.net Sun Nov 8 20:47:03 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Sun, 8 Nov 2020 20:47:03 GMT
Subject: RFR: 8221404: C2: Convert RegMask and IndexSet to use uintptr_t
Message-ID:

This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making storage unsigned further allows for a few simplifications, e.g. is_bound_set where there was logic to deal with sign extension that can no longer happen.

To evaluate performance impact I created the included JMH microbenchmark which uses the RepeatCompilation command to repeat the compilation of a few methods: One trivial (`trivialMath`), one "regular" (`mixHashCode`), and one largish (`largeMethod`) with a lot of locals. These are designed to put no stress, some stress and quite a bit of stress on register allocation:

Baseline:
Benchmark                                    Mode  Cnt     Score    Error  Units
SimpleRepeatCompilation.largeMethod_baseline   ss   10   168.919 ±  2.839  ms/op
SimpleRepeatCompilation.largeMethod_repeat     ss   10  8920.305 ± 40.531  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c1  ss   10   153.961 ±  2.762  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c2  ss   10  8242.061 ± 71.989  ms/op
SimpleRepeatCompilation.mixHashCode_baseline   ss   10    69.526 ±  7.098  ms/op
SimpleRepeatCompilation.mixHashCode_repeat     ss   10  6733.627 ± 63.689  ms/op
SimpleRepeatCompilation.mixHashCode_repeat_c1  ss   10   316.862 ± 29.682  ms/op
SimpleRepeatCompilation.mixHashCode_repeat_c2  ss   10  4544.604 ± 57.439  ms/op
SimpleRepeatCompilation.trivialMath_baseline   ss   10    21.757 ±  1.553  ms/op
SimpleRepeatCompilation.trivialMath_repeat     ss   10   499.214 ±
35.984  ms/op
SimpleRepeatCompilation.trivialMath_repeat_c1  ss   10   100.345 ±  2.168  ms/op
SimpleRepeatCompilation.trivialMath_repeat_c2  ss   10   398.528 ±  4.718  ms/op

Patched:
Benchmark                                    Mode  Cnt     Score    Error  Units
SimpleRepeatCompilation.largeMethod_baseline   ss   10   164.355 ±  3.531  ms/op
SimpleRepeatCompilation.largeMethod_repeat     ss   10  8516.033 ± 22.408  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c1  ss   10   151.181 ± 12.869  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c2  ss   10  7857.373 ± 52.826  ms/op
SimpleRepeatCompilation.mixHashCode_baseline   ss   10    65.085 ±  5.643  ms/op
SimpleRepeatCompilation.mixHashCode_repeat     ss   10  6601.693 ± 57.898  ms/op
SimpleRepeatCompilation.mixHashCode_repeat_c1  ss   10   315.845 ± 27.474  ms/op
SimpleRepeatCompilation.mixHashCode_repeat_c2  ss   10  4456.847 ± 30.459  ms/op
SimpleRepeatCompilation.trivialMath_baseline   ss   10    21.273 ±  2.115  ms/op
SimpleRepeatCompilation.trivialMath_repeat     ss   10   506.873 ± 18.994  ms/op
SimpleRepeatCompilation.trivialMath_repeat_c1  ss   10   100.184 ±  3.008  ms/op
SimpleRepeatCompilation.trivialMath_repeat_c2  ss   10   397.010 ±  4.531  ms/op

This shows that there's no significant change on `trivialMath`, `mixHashCode` sees a small improvement (~2%) and `largeMethod` sees a larger improvement (~4-5%) on C2 and Tiered tests with compiler repetition.

Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.

-------------

Commit messages:
 - unsigned overflow in find_last_elem (found by some tier6 tests)
 - Fix and clarify low_bits
 - Merge branch 'master' into c2_uintptr_t
 - Improve bitfield comments
 - ALL_BITS clash, rename constants.
 - Fix comments from Vladimir and Mikael. A few additional cleanups.
 - 32-bit compat: 63U -> BitsPerWord-1U
 - 32-bit: Long -> Word
 - C2: Convert RegMask and IndexSet to use uintptr_t

Changes: https://git.openjdk.java.net/jdk/pull/1102/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1102&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8221404
  Stats: 477 lines in 5 files changed: 283 ins; 25 del; 169 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1102.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1102/head:pull/1102

PR: https://git.openjdk.java.net/jdk/pull/1102

From kvn at openjdk.java.net Sun Nov 8 20:47:04 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Sun, 8 Nov 2020 20:47:04 GMT
Subject: RFR: 8221404: C2: Convert RegMask and IndexSet to use uintptr_t
In-Reply-To:
References:
Message-ID:

On Fri, 6 Nov 2020 21:21:56 GMT, Claes Redestad wrote:

> This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making storage unsigned further allows for a few simplifications, e.g. is_bound_set where there was logic to deal with sign extension that can no longer happen.
>
> To evaluate performance impact I created the included JMH microbenchmark which uses the RepeatCompilation command to repeat the compilation of a few methods: One trivial (`trivialMath`), one "regular" (`mixHashCode`), and one largish (`largeMethod`) with a lot of locals. These are designed to put no stress, some stress and quite a bit of stress on register allocation:
>
> Baseline:
> Benchmark                                    Mode  Cnt     Score    Error  Units
> SimpleRepeatCompilation.largeMethod_baseline   ss   10   168.919 ±  2.839  ms/op
> SimpleRepeatCompilation.largeMethod_repeat     ss   10  8920.305 ± 40.531  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c1  ss   10   153.961 ±  2.762  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c2  ss   10  8242.061 ± 71.989  ms/op
> SimpleRepeatCompilation.mixHashCode_baseline   ss   10    69.526 ±
7.098  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat     ss   10  6733.627 ± 63.689  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c1  ss   10   316.862 ± 29.682  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c2  ss   10  4544.604 ± 57.439  ms/op
> SimpleRepeatCompilation.trivialMath_baseline   ss   10    21.757 ±  1.553  ms/op
> SimpleRepeatCompilation.trivialMath_repeat     ss   10   499.214 ± 35.984  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c1  ss   10   100.345 ±  2.168  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c2  ss   10   398.528 ±  4.718  ms/op
>
> Patched:
> Benchmark                                    Mode  Cnt     Score    Error  Units
> SimpleRepeatCompilation.largeMethod_baseline   ss   10   164.355 ±  3.531  ms/op
> SimpleRepeatCompilation.largeMethod_repeat     ss   10  8516.033 ± 22.408  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c1  ss   10   151.181 ± 12.869  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c2  ss   10  7857.373 ± 52.826  ms/op
> SimpleRepeatCompilation.mixHashCode_baseline   ss   10    65.085 ±  5.643  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat     ss   10  6601.693 ± 57.898  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c1  ss   10   315.845 ± 27.474  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c2  ss   10  4456.847 ± 30.459  ms/op
> SimpleRepeatCompilation.trivialMath_baseline   ss   10    21.273 ±  2.115  ms/op
> SimpleRepeatCompilation.trivialMath_repeat     ss   10   506.873 ± 18.994  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c1  ss   10   100.184 ±  3.008  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c2  ss   10   397.010 ±  4.531  ms/op
>
> This shows that there's no significant change on `trivialMath`, `mixHashCode` sees a small improvement (~2%) and `largeMethod` sees a larger improvement (~4-5%) on C2 and Tiered tests with compiler repetition.
>
> Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.

Looks good in general.
You may want to compare RA times from -XX:+LogCompilation to see clear difference.

src/hotspot/share/opto/regmask.cpp line 85:

> 83:       return SlotsPerVecA;
> 84:     default:
> 85:       // Op_VecS and the rest ideal registers.

Add assert to make sure we see only expected values here.

src/hotspot/share/opto/indexSet.hpp line 105:

> 103:   // access to IndexSet and IndexSetIterator.
> 104:
> 105:   // A BitBlock is composed of some number of 64 bit words. When a BitBlock

64- or 32-bit words

-------------

PR: https://git.openjdk.java.net/jdk/pull/1102

From redestad at openjdk.java.net Sun Nov 8 20:47:04 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Sun, 8 Nov 2020 20:47:04 GMT
Subject: RFR: 8221404: C2: Convert RegMask and IndexSet to use uintptr_t
In-Reply-To:
References:
Message-ID:

On Fri, 6 Nov 2020 21:55:47 GMT, Vladimir Kozlov wrote:

>> This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making storage unsigned further allows for a few simplifications, e.g. is_bound_set where there was logic to deal with sign extension that can no longer happen.
>>
>> To evaluate performance impact I created the included JMH microbenchmark which uses the RepeatCompilation command to repeat the compilation of a few methods: One trivial (`trivialMath`), one "regular" (`mixHashCode`), and one largish (`largeMethod`) with a lot of locals. These are designed to put no stress, some stress and quite a bit of stress on register allocation:
>>
>> Baseline:
>> Benchmark                                    Mode  Cnt     Score    Error  Units
>> SimpleRepeatCompilation.largeMethod_baseline   ss   10   168.919 ±  2.839  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat     ss   10  8920.305 ± 40.531  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c1  ss   10   153.961 ±  2.762  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c2  ss   10  8242.061 ± 71.989  ms/op
>> SimpleRepeatCompilation.mixHashCode_baseline   ss   10    69.526 ±  7.098  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat     ss   10  6733.627 ±
63.689  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c1  ss   10   316.862 ± 29.682  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c2  ss   10  4544.604 ± 57.439  ms/op
>> SimpleRepeatCompilation.trivialMath_baseline   ss   10    21.757 ±  1.553  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat     ss   10   499.214 ± 35.984  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c1  ss   10   100.345 ±  2.168  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c2  ss   10   398.528 ±  4.718  ms/op
>>
>> Patched:
>> Benchmark                                    Mode  Cnt     Score    Error  Units
>> SimpleRepeatCompilation.largeMethod_baseline   ss   10   164.355 ±  3.531  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat     ss   10  8516.033 ± 22.408  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c1  ss   10   151.181 ± 12.869  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c2  ss   10  7857.373 ± 52.826  ms/op
>> SimpleRepeatCompilation.mixHashCode_baseline   ss   10    65.085 ±  5.643  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat     ss   10  6601.693 ± 57.898  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c1  ss   10   315.845 ± 27.474  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c2  ss   10  4456.847 ± 30.459  ms/op
>> SimpleRepeatCompilation.trivialMath_baseline   ss   10    21.273 ±  2.115  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat     ss   10   506.873 ± 18.994  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c1  ss   10   100.184 ±  3.008  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c2  ss   10   397.010 ±  4.531  ms/op
>>
>> This shows that there's no significant change on `trivialMath`, `mixHashCode` sees a small improvement (~2%) and `largeMethod` sees a larger improvement (~4-5%) on C2 and Tiered tests with compiler repetition.
>>
>> Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.
>
> Looks good in general.
> You may want to compare RA times from -XX:+LogCompilation to see clear difference.

Using +CITime to get a breakdown of a sample run of Regalloc times for largeMethod_repeat_c2, baseline:

  C2 Compile Time:       8.731 s
  ...
    Regalloc:            4.759 s
      Ctor Chaitin:        0.000 s
      Build IFG (virt):    0.190 s
      Build IFG (phys):    1.523 s
      Compute Liveness:    0.235 s
      Regalloc Split:      0.284 s
      Postalloc Copy Rem:  0.283 s
      Merge multidefs:     0.011 s
      Fixup Spills:        0.012 s
      Compact:             0.005 s
      Coalesce 1:          0.127 s
      Coalesce 2:          0.002 s
      Coalesce 3:          0.747 s
      Cache LRG:           0.005 s
      Simplify:            0.375 s
      Select:              0.423 s
      Other:               0.536 s

Patch:

  C2 Compile Time:       8.317 s
  ...
    Regalloc:            4.340 s
      Ctor Chaitin:        0.000 s
      Build IFG (virt):    0.162 s
      Build IFG (phys):    1.344 s
      Compute Liveness:    0.237 s
      Regalloc Split:      0.284 s
      Postalloc Copy Rem:  0.279 s
      Merge multidefs:     0.011 s
      Fixup Spills:        0.012 s
      Compact:             0.004 s
      Coalesce 1:          0.121 s
      Coalesce 2:          0.002 s
      Coalesce 3:          0.680 s
      Cache LRG:           0.005 s
      Simplify:            0.345 s
      Select:              0.362 s
      Other:               0.490 s

Timings appear pretty stable from run-to-run. No significant change in other phases.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1102

From rkennke at openjdk.java.net Sun Nov 8 21:40:03 2020
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Sun, 8 Nov 2020 21:40:03 GMT
Subject: RFR: 8256020: Don't resurrect objects on argument-dependency access
Message-ID:

In Shenandoah-testing, we noticed that compiler/jsr292/CallSiteDepContextTest.java fails with the following error:

Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp:92), pid=906849, tid=907073
Error: Before Updating References, Marked; Must be marked in complete bitmap

Referenced from:
  interior location: 0x00000000fff87504
  0x00000000fff874f8 - klass 0x000000010004ecd8 java.lang.invoke.MutableCallSite
    allocated after mark start
    not after update watermark
    marked strong
    marked weak
    not in collection set
  mark: mark(is_neutral no_hash age=0)
  region: | 2565|R |BTE fff80000, fffc0000, fffc0000|TAMS fff80000|UWM fffc0000|U 256K|T 0B|G 256K|S 0B|L 0B|CP 0

Object:
  0x00000000d80a9210 - klass 0x000000010004cf58 java.lang.invoke.DirectMethodHandle
    not allocated after mark start
    not after update watermark
    not marked strong
    not marked weak
    in collection set
  mark: mark(is_neutral no_hash age=0)
  region: | 9|CS |BTE d8080000, d80c0000, d80c0000|TAMS d80c0000|UWM d80c0000|U 256K|T 256K|G 0B|S 0B|L 22464B|CP 0

Forwardee:
  (the object itself)

In other words, a reachable (marked) MutableCallSite references an unreachable DirectMethodHandle. That reference would subsequently become dangling and lead to crashes if accessed.

I narrowed it down to the access in Dependencies::DepStream::recorded_oop_at(int i) which is done as 'strong', which means that it would return the reference even if it is unreachable, e.g. during concurrent class-unloading. This resurrection of the unreachable DMH is potentially fatal: eventually the reference will become dangling (after GC) and lead to crashes when accessed.

I believe that access should be 'phantom' instead which causes GCs like Shenandoah and ZGC to return NULL when encountering unreachable objects.

(Notice that the bug only manifested after JDK-8255691, we accidentally applied the resurrection-preventing weak-LRB on strong access too)

Testing: the offending CallSiteDepContextTest.java, tier1+UseShenandoahGC+ShenandoahVerify, tier2+UseShenandoahGC+ShenandoahVerify, hotspot_gc_shenandoah

-------------

Commit messages:
 - 8256020: Don't resurrect objects on argument-dependency access

Changes: https://git.openjdk.java.net/jdk/pull/1113/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1113&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8256020
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1113.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1113/head:pull/1113

PR: https://git.openjdk.java.net/jdk/pull/1113

From dongbo at openjdk.java.net Mon Nov 9 05:55:53 2020
From: dongbo at openjdk.java.net (Dong Bo)
Date: Mon, 9 Nov 2020 05:55:53 GMT
Subject: RFR: 8255949: AArch64: Add support for vectorized shift right and accumulate
In-Reply-To:
References:
Message-ID: <9sN31hCWHAUnlqwM9Trqe5aY3reNRta7HBvrQPTO83A=.e2728fe7-4616-4a71-a0c5-b56b36e7b9c2@github.com>

On Sat, 7 Nov 2020 08:40:52 GMT, Dong Bo wrote:

>> This supports missing NEON shift right and accumulate instructions, i.e. SSRA and USRA, for AArch64 backend.
>>
>> Verified with linux-aarch64-server-release, tier1-3.
>>
>> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance test.
>> We witness about ~20% improvement with different basic types on Kunpeng916. The JMH results:
>> Benchmark                                        (count)  (seed)  Mode  Cnt    Score    Error  Units
>> # before, Kunpeng 916
>> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  146.259 ±  0.123  ns/op
>> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  454.781 ±  3.856  ns/op
>> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  938.842 ± 23.288  ns/op
>> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  205.493 ±
4.938  ns/op
>> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.483 ±  0.309  ns/op (not vectorized)
>> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  220.847 ±  5.868  ns/op
>> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  442.587 ±  6.980  ns/op
>> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  936.289 ± 21.458  ns/op
>> # after shift right and accumulate, Kunpeng 916
>> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  125.586 ±  0.204  ns/op
>> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  365.973 ±  6.466  ns/op
>> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  804.605 ± 12.336  ns/op
>> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  170.123 ±  4.678  ns/op
>> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.779 ±  0.587  ns/op (not vectorized)
>> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  185.799 ±  4.764  ns/op
>> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  364.360 ±  6.522  ns/op
>> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  800.737 ± 13.735  ns/op
>>
>> We checked the shiftURightAccumulateByte test, the performance stays same since it is not vectorized with or without this patch, due to:
>> src/hotspot/share/opto/vectornode.cpp, line 226:
>>   case Op_URShiftI:
>>     switch (bt) {
>>     case T_BOOLEAN:return Op_URShiftVB;
>>     case T_CHAR:   return Op_URShiftVS;
>>     case T_BYTE:
>>     case T_SHORT:  return 0; // Vector logical right shift for signed short
>>                              // values produces incorrect Java result for
>>                              // negative data because java code should convert
>>                              // a short value into int value with sign
>>                              // extension before a shift.
>>     case T_INT:    return Op_URShiftVI;
>>     default:       ShouldNotReachHere(); return 0;
>>     }
>> We also tried the existing vector operation micro urShiftB, i.e.:
>> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java, line 116
>>     @Benchmark
>>     public void urShiftB() {
>>         for (int i = 0; i < COUNT; i++) {
>>             resB[i] = (byte) (bytesA[i] >>> 3);
>>         }
>>     }
>> It is not vectorized either. It seems hard to match Java code with the URShiftVB node.
>
>> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
>>
>> On 11/6/20 3:44 AM, Dong Bo wrote:
>>
>> > Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance test.
>> > We witness about ~20% improvement with different basic types on Kunpeng916.
>>
>> Do you find it disappointing that there is such a small improvement?
>> Do you know why that is? Perhaps the benchmark is memory bound, or
>> somesuch?
>>
>
> @theRealAph Thanks for the quick review.
>
> For test shiftURightAccumulateByte, as mentioned before, it is not vectorized with or without this patch, so the performance is the same.
>
> For the other tests (14.13%~19.53% improvement), I checked the profile from `-prof perfasm` in the JMH framework.
> The runtime is mainly taken by load/store instructions rather than by shifting and accumulating.
> As far as I can tell, there is no way to test these improvements without these memory accesses.
>
> BTW, according to the hardware PMU counters, 99.617%~99.901% of the memory accesses hit in the L1/L2 data cache.
> But the CPU cycles taken by loads/stores in the L1/L2 data cache can still be several times more than those taken by shifting and accumulating in registers.
>
> I think that's why the improvements are small; I hope this addresses your concern, thanks.
>
> The profile with test shiftRightAccumulateByte (14.13% improvement):
>
> # Before
>          0x0000ffff68309804: add x6, x2, x15
>
0x0000ffff68309808: add x7, x3, x15
>  19.81%  0x0000ffff6830980c: ldr q16, [x6,#16]
>   3.81%  0x0000ffff68309810: ldr q17, [x7,#16]
>          0x0000ffff68309814: sshr v16.16b, v16.16b, #1
>          0x0000ffff68309818: add v16.16b, v16.16b, v17.16b
>          0x0000ffff6830981c: add x15, x4, x15
>          0x0000ffff68309820: str q16, [x15,#16]
>   4.06%  0x0000ffff68309824: ldr q16, [x6,#32]
>   3.79%  0x0000ffff68309828: ldr q17, [x7,#32]
>          0x0000ffff6830982c: sshr v16.16b, v16.16b, #1
>          0x0000ffff68309830: add v16.16b, v16.16b, v17.16b
>          0x0000ffff68309834: str q16, [x15,#32]
>   6.05%  0x0000ffff68309838: ldr q16, [x6,#48]
>   3.48%  0x0000ffff6830983c: ldr q17, [x7,#48]
>          0x0000ffff68309840: sshr v16.16b, v16.16b, #1
>          0x0000ffff68309844: add v16.16b, v16.16b, v17.16b
>   0.25%  0x0000ffff68309848: str q16, [x15,#48]
>   8.67%  0x0000ffff6830984c: ldr q16, [x6,#64]
>   4.30%  0x0000ffff68309850: ldr q17, [x7,#64]
>          0x0000ffff68309854: sshr v16.16b, v16.16b, #1
>          0x0000ffff68309858: add v16.16b, v16.16b, v17.16b
>   0.06%  0x0000ffff6830985c: str q16, [x15,#64]
>
> # After
>          0x0000ffff98308d64: add x6, x2, x15
>  14.77%  0x0000ffff98308d68: ldr q16, [x6,#16]
>          0x0000ffff98308d6c: add x7, x3, x15
>   4.55%  0x0000ffff98308d70: ldr q17, [x7,#16]
>          0x0000ffff98308d74: ssra v17.16b, v16.16b, #1
>          0x0000ffff98308d78: add x15, x4, x15
>   0.02%  0x0000ffff98308d7c: str q17, [x15,#16]
>   6.14%  0x0000ffff98308d80: ldr q16, [x6,#32]
>   5.22%  0x0000ffff98308d84: ldr q17, [x7,#32]
>          0x0000ffff98308d88: ssra v17.16b, v16.16b, #1
>          0x0000ffff98308d8c: str q17, [x15,#32]
>   5.26%  0x0000ffff98308d90: ldr q16, [x6,#48]
>   5.14%  0x0000ffff98308d94: ldr q17, [x7,#48]
>          0x0000ffff98308d98: ssra v17.16b, v16.16b, #1
>          0x0000ffff98308d9c: str q17, [x15,#48]
>   6.56%  0x0000ffff98308da0: ldr q16, [x6,#64]
>   5.10%  0x0000ffff98308da4: ldr q17, [x7,#64]
>          0x0000ffff98308da8: ssra v17.16b, v16.16b, #1
>   0.06%
0x0000ffff98308dac: str q17, [x15,#64]

> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
>
> On 11/7/20 8:43 AM, Dong Bo wrote:
>
> > I think that's why the improvements are small; I hope this addresses your concern, thanks.
>
> OK, but let's think about how this works in the real world outside
> benchmarking. If you're missing L1 it really doesn't matter much what
> you do with the data, that 12-cycle load latency is going to dominate
> whether you use vectorized shifts or not.
>
> Hopefully, though, shifting and accumulating isn't the only thing
> you're doing with that data. Probably, you're going to be doing
> other things with it too.
>
> With that in mind, please produce a benchmark that fits in L1, so
> that we can see if it works better.
>

I think the benchmark fits L1 already.

Tests shift(U)RightAccumulateLong handle the maximum size of data.
The array size is 1028 (count=1028), the basic type is long (8B), and there are 3 arrays. So the data size is about 24KB.
The data cache of Kunpeng916 (cpu cortex-A72) is 32KB per core, so it can hold all the data accessed.

According to the PMU counters, the cache miss count is negligible.
The perf L1-DCache profile of shiftRightAccumulateByte (improvement 14.13%, 3KB data size):

# r3: L1-DCache refill
# r4: L1-DCache accesses
# (1 - r3/r4) is L1-DCache hit ratio
$ perf stat -C 0-3 -e r3,r4 -I 1000
#        time         counts  unit  events
  1.000212280         32,169        r3
  1.000212280  2,582,977,726        r4
  2.000423060         34,958        r3
  2.000423060  2,582,545,543        r4
  3.000591100         67,446        r3
  3.000591100  2,583,826,062        r4
  4.000828880         35,932        r3
  4.000828880  2,583,342,061        r4
  5.001060280         39,008        r3
  5.001060280  2,582,724,118        r4

The cache refill noise may come from OS interrupts, context switches or somesuch.

I tried 2 other ways to see if we can do better, but both failed.

1.
Reducing the number of load instructions, like:

--- a/test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java
+++ b/test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java
@@ -81,7 +81,7 @@ public class VectorShiftAccumulate {
     @Benchmark
     public void shiftRightAccumulateByte() {
         for (int i = 0; i < count; i++) {
-            bytesD[i] = (byte) (bytesA[i] + (bytesB[i] >> 1));
+            bytesD[i] = (byte) (0x45 + (bytesB[i] >> 1));
         }
     }

The improvement dropped to 7.24%, because an additional `mov` is added to keep the constant unchanged during the loop:

# after, shift and accumulate, additional mov is added
         0x0000ffff703075a4: add x16, x3, x15
 15.65%  0x0000ffff703075a8: ldr q17, [x16,#16]
         0x0000ffff703075ac: mov v18.16b, v16.16b
         0x0000ffff703075b0: ssra v18.16b, v17.16b, #1

# before, default
 10.43%  0x0000ffff98309f0c: ldr q16, [x6,#16]
  4.41%  0x0000ffff98309f10: ldr q17, [x7,#16]
         0x0000ffff98309f14: sshr v16.16b, v16.16b, #1
         0x0000ffff98309f18: add v16.16b, v16.16b, v17.16b

2. Adding more shift and accumulate operations, like:

--- a/test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java
+++ b/test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java
@@ -82,6 +82,7 @@ public class VectorShiftAccumulate {
     public void shiftRightAccumulateByte() {
         for (int i = 0; i < count; i++) {
             bytesD[i] = (byte) (bytesA[i] + (bytesB[i] >> 1));
+            bytesA[i] = (byte) (bytesB[i] + (bytesD[i] >> 1));
         }
     }

But it fails to vectorize. :(

-------------

PR: https://git.openjdk.java.net/jdk/pull/1087

From shade at openjdk.java.net Mon Nov 9 08:04:53 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 9 Nov 2020 08:04:53 GMT
Subject: RFR: 8256009: Remove src/hotspot/share/adlc/Test/i486.ad
In-Reply-To:
References:
Message-ID:

On Sat, 7 Nov 2020 07:50:17 GMT, Jie Fu wrote:

> src/hotspot/share/adlc/Test/i486.ad is empty.
> It might be better to remove it.
> Thanks.

Might also be a good time to rewrite the references to `i486.ad` to either `x86.ad` or `x86_32.ad` or `x86_64.ad`:

$ ack i486.ad src
src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp
343:    // TODO: The encoding of D2I in i486.ad can cause an exception

src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp
552:    // TODO: The encoding of D2I in i486.ad can cause an exception

src/hotspot/cpu/ppc/ppc.ad
3891:  // Comment taken from i486.ad:
3903:  // Comment taken from i486.ad:

src/hotspot/share/opto/parse2.cpp
2340:  // For i486.ad, FILD doesn't restrict precision to 24 or 53 bits.
2355:  // For i486.ad, rounding is always necessary (see _l2f above).

src/hotspot/share/opto/machnode.hpp
154:  // Only returns non-null value for i486.ad's indOffset32X

src/hotspot/share/opto/machnode.cpp
298:  // In i486.ad, indOffset32X uses base==RegI and disp==RegP,

src/hotspot/share/adlc/output_c.cpp
3003:  // See BugId 4796752, operand indOffset32X in i486.ad

src/hotspot/share/runtime/synchronizer.cpp
62:// variants of the enter-exit fast-path operations. See i486.ad fast_lock(),

-------------

PR: https://git.openjdk.java.net/jdk/pull/1107

From roland at openjdk.java.net Mon Nov 9 08:31:04 2020
From: roland at openjdk.java.net (Roland Westrelin)
Date: Mon, 9 Nov 2020 08:31:04 GMT
Subject: RFR: 8250607: C2: Filter type in PhiNode::Value() for induction variables of trip-counted integer loops
Message-ID:

PhiNode::Value() has special logic to compute the type of an iv phi based on the counted loop's init value and limit. The type is recomputed from scratch on every call to PhiNode::Value(). As the loop is transformed, PhiNode::Value() may return a type for the iv phi that widens. For instance, for:

for (int i = 0; i < 100; i++) {
}

PhiNode::Value() returns accurate bounds for the iv phi. But if the loop is transformed to pre/main/post loops, the init and limit no longer have types that allow an accurate computation of the iv phi bounds.
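
As a rough illustration of the widening described above, here is a toy interval model in Java. It is not C2 code: the `Range` class, its `intersect` method and the concrete bounds are all hypothetical, and C2's real type lattice is much richer than a closed int interval. The sketch only shows that intersecting a freshly recomputed range with a previously recorded one can never produce a wider result.

```java
// Toy model only: every name and bound here is an assumption made for
// illustration, not something taken from the HotSpot sources.
final class IvRangeSketch {

    /** Closed int interval [lo, hi], a hypothetical stand-in for an int type's bounds. */
    static final class Range {
        final int lo;
        final int hi;

        Range(int lo, int hi) {
            if (lo > hi) {
                throw new IllegalArgumentException("empty range");
            }
            this.lo = lo;
            this.hi = hi;
        }

        /** Keeps only the values allowed by both ranges. */
        Range intersect(Range other) {
            return new Range(Math.max(lo, other.lo), Math.min(hi, other.hi));
        }

        @Override
        public String toString() {
            return "[" + lo + ", " + hi + "]";
        }
    }

    public static void main(String[] args) {
        // Hypothetical bounds recorded while init and limit were still constants.
        Range recorded = new Range(0, 99);
        // Hypothetical result of recomputing from scratch once init and limit
        // no longer have precise types: nothing better than "any int".
        Range recomputed = new Range(Integer.MIN_VALUE, Integer.MAX_VALUE);

        // Intersecting with the recorded range keeps the narrower bounds.
        System.out.println(recomputed.intersect(recorded));
    }
}
```

In this model the recomputed range alone would widen to the full int range, while the intersection stays at the recorded bounds.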

The fix is to filter with the recorded _type for the Phi on every call of PhiNode::Value().

This change was considered before (by Christian) but was not proposed for integration because of a performance regression on a micro benchmark. I investigated the performance regression and added my findings to the bug report. While I'm not 100% sure I found the root cause of the regression, the differences I see in the ideal graph of the hottest methods of the micro benchmark with the change are fairly small and I don't think that regression should block this fix.

-------------

Commit messages:
 - fix

Changes: https://git.openjdk.java.net/jdk/pull/1114/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1114&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8250607
  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1114.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1114/head:pull/1114

PR: https://git.openjdk.java.net/jdk/pull/1114

From chagedorn at openjdk.java.net Mon Nov 9 08:56:54 2020
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 9 Nov 2020 08:56:54 GMT
Subject: RFR: 8250607: C2: Filter type in PhiNode::Value() for induction variables of trip-counted integer loops
In-Reply-To:
References:
Message-ID: <4Z5F3vWCBLuXbKXeitG6ezYNtIlrUl0SeaB6kZWzwkE=.1827e983-9ff3-4070-9efe-67d4bfc8695d@github.com>

On Mon, 9 Nov 2020 08:26:17 GMT, Roland Westrelin wrote:

> PhiNode::Value() has special logic to compute the type of an iv phi
> based on the counted loop's init value and limit. The type is
> recomputed from scratch on every call to PhiNode::Value(). As the loop
> is transformed, PhiNode::Value() may return a type for the iv phi that
> widens. For instance, for:
>
> for (int i = 0; i < 100; i++) {
> }
>
> PhiNode::Value() returns accurate bounds for the iv phi.
> But if the loop is transformed to pre/main/post loops, the init and limit
> no longer have types that allow an accurate computation of the iv phi
> bounds.
>
> The fix is to filter with the recorded _type for the Phi on every call
> of PhiNode::Value().
>
> This change was considered before (by Christian) but was not proposed
> for integration because of a performance regression on a micro
> benchmark. I investigated the performance regression and added my
> findings to the bug report. While I'm not 100% sure I found the root
> cause of the regression, the differences I see in the ideal graph of
> the hottest methods of the micro benchmark with the change are fairly
> small and I don't think that regression should block this fix.

Thanks for the investigation of the micro benchmark regression! I also think that having the additional type information should justify the small regression and should not block this change. But it would be interesting to hear what others think about that.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1114

From chagedorn at openjdk.java.net Mon Nov 9 09:04:58 2020
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 9 Nov 2020 09:04:58 GMT
Subject: RFR: 8254887: C2: assert(cl->trip_count() > 0) failed: peeling a fully unrolled loop
In-Reply-To:
References:
Message-ID:

On Fri, 6 Nov 2020 16:41:28 GMT, Roland Westrelin wrote:

> A loop's trip count is computed to have exact trip count 6. Then:
>
> 1- pre/main/post loops are created which brings the trip count from 6
> to 5
> 2- main loop is unrolled which brings the trip count to 2
> 3- main loop is peeled: Trip count is 1
> 4- pre/main/post loops are created again. Trip count of main loop is 0.
> 5- peeling is attempted again and the assert fires
>
> IdealLoopTree::policy_peeling() doesn't attempt peeling if the trip
> count is 1.
> I propose that IdealLoopTree::policy_range_check() (that
> causes the pre/main/post loops insertion the second time) performs the
> same check so step 4 doesn't happen.

Looks good to me!

src/hotspot/share/opto/loopTransform.cpp line 1066:

> 1064: Node *trip_counter = cl->phi();
> 1065:
> 1066: // check for vectorized loops, some opts are no longer needed

Maybe you can update this comment to also include the new check.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1096

From roland at openjdk.java.net Mon Nov 9 09:09:25 2020
From: roland at openjdk.java.net (Roland Westrelin)
Date: Mon, 9 Nov 2020 09:09:25 GMT
Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v4]
In-Reply-To:
References:
Message-ID:

> This change adds 3 new methods in Objects:
>
> public static long checkIndex(long index, long length)
> public static long checkFromToIndex(long fromIndex, long toIndex, long length)
> public static long checkFromIndexSize(long fromIndex, long size, long length)
>
> This mirrors the int utility methods that were added by JDK-8135248
> with the same motivations.
>
> As is the case with the int checkIndex(), the long checkIndex() method
> is JIT compiled as an intrinsic. It allows the JIT to compile
> checkIndex to an unsigned comparison and properly recognize it as
> a range check that then becomes a candidate for the existing range check
> optimizations. This has proven to be important for Panama's
> MemorySegment API and a prototype of this change (with some extra C2
> improvements) showed that Panama micro benchmark results improve
> significantly.
>
> This change includes:
>
> - the API change
> - the C2 intrinsic
> - tests for the API and the C2 intrinsic
>
> This is joint work with Paul who reviewed and reworked the API change
> and filed the CSR.

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase.
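As a rough illustration of the unsigned-comparison trick mentioned in the description above, the following Java sketch checks a long index with a single unsigned compare. `LongIndexCheckSketch` is a hypothetical stand-in written for this note; it is not the JDK implementation or the C2 intrinsic, and the exception message format is an assumption.

```java
// Hypothetical sketch of a long index check; not the JDK's Objects code.
final class LongIndexCheckSketch {
    // Returns index if 0 <= index < length, otherwise throws.
    static long checkIndex(long index, long length) {
        // For a non-negative length, one unsigned comparison covers both
        // "index < 0" and "index >= length", because a negative index
        // is a very large value when interpreted as unsigned.
        if (length < 0 || Long.compareUnsigned(index, length) >= 0) {
            throw new IndexOutOfBoundsException(
                "Index " + index + " out of bounds for length " + length);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(checkIndex(5L, 10L));
        try {
            checkIndex(-1L, 10L);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The explicit `length < 0` test is needed in this sketch because a negative length would otherwise look like a huge unsigned bound and let every non-negative index through.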
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - CastLL should define carry_depency - intrinsic comments - Jorn's comments - Update headers and add intrinsic to Graal test ignore list - move compiler test and add bug to test - non x86_64 arch support - c2 test case - intrinsic - Use overloads of method names. Simplify internally to avoid overload resolution issues, leverging List for the exception mapper. - Vladimir's comments - ... and 1 more: https://git.openjdk.java.net/jdk/compare/976fa4c6...692b4298 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1003/files - new: https://git.openjdk.java.net/jdk/pull/1003/files/aaacd328..692b4298 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1003&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1003&range=02-03 Stats: 515 lines in 41 files changed: 400 ins; 66 del; 49 mod Patch: https://git.openjdk.java.net/jdk/pull/1003.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1003/head:pull/1003 PR: https://git.openjdk.java.net/jdk/pull/1003 From roland at openjdk.java.net Mon Nov 9 09:12:58 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 9 Nov 2020 09:12:58 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v3] In-Reply-To: <0P2H9rV2KCehYBo_hE-uYriC_S0YAs_1QOePRcRtQjI=.614ae3b3-316b-4bd9-a547-680acb193d3d@github.com> References: <0P2H9rV2KCehYBo_hE-uYriC_S0YAs_1QOePRcRtQjI=.614ae3b3-316b-4bd9-a547-680acb193d3d@github.com> Message-ID: On Sat, 7 Nov 2020 11:38:37 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
>
> src/hotspot/share/opto/castnode.cpp line 100:
>
>> 98: }
>> 99: case Op_CastLL: {
>> 100: assert(!carry_dependency, "carry dependency not supported");
>
> Any particular reason to reject control dependency (except it is not used right now)?

Because it's not used. But actually now that I think more about it, I realize it's required. I updated the change with support for carry_dependency for CastLL.

Thanks for the review!

-------------

PR: https://git.openjdk.java.net/jdk/pull/1003

From aph at redhat.com Mon Nov 9 09:37:24 2020
From: aph at redhat.com (Andrew Haley)
Date: Mon, 9 Nov 2020 09:37:24 +0000
Subject: RFR: 8255949: AArch64: Add support for vectorized shift right and accumulate
In-Reply-To: <9sN31hCWHAUnlqwM9Trqe5aY3reNRta7HBvrQPTO83A=.e2728fe7-4616-4a71-a0c5-b56b36e7b9c2@github.com>
References: <9sN31hCWHAUnlqwM9Trqe5aY3reNRta7HBvrQPTO83A=.e2728fe7-4616-4a71-a0c5-b56b36e7b9c2@github.com>
Message-ID: <8f8899ce-790c-13d1-973c-6def204758aa@redhat.com>

On 11/9/20 5:55 AM, Dong Bo wrote:
> On Sat, 7 Nov 2020 08:40:52 GMT, Dong Bo wrote:
>
>>> This supports missing NEON shift right and accumulate instructions, i.e. SSRA and USRA, for AArch64 backend.
>>>
>>> Verified with linux-aarch64-server-release, tier1-3.
>>>
>>> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance test.
>>> We see about a 20% improvement with different basic types on Kunpeng916. The JMH results:
>>> Benchmark                                        (count)  (seed)  Mode  Cnt    Score    Error  Units
>>> # before, Kunpeng 916
>>> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  146.259 ±  0.123  ns/op
>>> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  454.781 ±  3.856  ns/op
>>> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  938.842 ± 23.288  ns/op
>>> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  205.493 ±  4.938  ns/op
>>> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.483 ±  0.309  ns/op (not vectorized)
>>> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  220.847 ±  5.868  ns/op
>>> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  442.587 ±  6.980  ns/op
>>> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  936.289 ± 21.458  ns/op
>>> # after shift right and accumulate, Kunpeng 916
>>> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  125.586 ±  0.204  ns/op
>>> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  365.973 ±  6.466  ns/op
>>> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  804.605 ± 12.336  ns/op
>>> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  170.123 ±  4.678  ns/op
>>> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.779 ±  0.587  ns/op (not vectorized)
>>> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  185.799 ±  4.764  ns/op
>>> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  364.360 ±  6.522  ns/op
>>> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  800.737 ± 13.735  ns/op
>>>
>>> We checked the shiftURightAccumulateByte test; the performance stays the same since it is not vectorized with or without this patch, due to:
>>> src/hotspot/share/opto/vectornode.cpp, line 226:
>>> case Op_URShiftI:
>>>   switch (bt) {
>>>   case T_BOOLEAN:return Op_URShiftVB;
>>>   case T_CHAR:   return Op_URShiftVS;
>>>   case T_BYTE:
>>>   case T_SHORT:  return 0; // Vector logical right shift for signed short
>>>                            // values produces incorrect Java result for
>>>                            // negative data because java code should convert
>>>                            // a short value into int value with sign
>>>                            // extension before a shift.
>>>   case T_INT:    return Op_URShiftVI;
>>>   default:       ShouldNotReachHere(); return 0;
>>>   }
>>> We also tried the existing vector operation micro urShiftB, i.e.:
>>> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java, line 116
>>> @Benchmark
>>> public void urShiftB() {
>>>     for (int i = 0; i < COUNT; i++) {
>>>         resB[i] = (byte) (bytesA[i] >>> 3);
>>>     }
>>> }
>>> It is not vectorized either. It seems hard to match Java code with the URShiftVB node.
>>
>>> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
>>>
>>> On 11/6/20 3:44 AM, Dong Bo wrote:
>>>
>>>> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance test.
>>>> We witness about ~20% with different basic types on Kunpeng916.
>>>
>>> Do you find it disappointing that there is such a small improvement?
>>> Do you know why that is? Perhaps the benchmark is memory bound, or
>>> somesuch?
>>>
>>
>> @theRealAph Thanks for the quick review.
>>
>> For test shiftURightAccumulateByte, as claimed before, it is not vectorized with/without this patch, so the performance is the same.
>>
>> For other tests (14.13%~19.53% improvement), I checked the profile from `-prof perfasm` in the JMH framework.
>> The runtime is mainly taken by load/store instructions rather than shifting and accumulating.
>> As far as I can tell, there is no way that we can test these improvements without these memory accesses.
>>
>> BTW, according to the hardware PMU counters, 99.617%~99.901% of the memory accesses hit in L1/L2 data cache.
>> But the CPU cycles taken for load/store in L1/L2 data cache can still be several times more than shifting and accumulating registers.
>>
>> I think that's why the improvements are small. I hope this addresses your concern, thanks.
>>
>> The profile with test shiftRightAccumulateByte (14.13% improvement):
>>
>> # Before
>>         0x0000ffff68309804: add x6, x2, x15
>>         0x0000ffff68309808: add x7, x3, x15
>> 19.81%  0x0000ffff6830980c: ldr q16, [x6,#16]
>>  3.81%  0x0000ffff68309810: ldr q17, [x7,#16]
>>         0x0000ffff68309814: sshr v16.16b, v16.16b, #1
>>         0x0000ffff68309818: add v16.16b, v16.16b, v17.16b
>>         0x0000ffff6830981c: add x15, x4, x15
>>         0x0000ffff68309820: str q16, [x15,#16]
>>  4.06%  0x0000ffff68309824: ldr q16, [x6,#32]
>>  3.79%  0x0000ffff68309828: ldr q17, [x7,#32]
>>         0x0000ffff6830982c: sshr v16.16b, v16.16b, #1
>>         0x0000ffff68309830: add v16.16b, v16.16b, v17.16b
>>         0x0000ffff68309834: str q16, [x15,#32]
>>  6.05%  0x0000ffff68309838: ldr q16, [x6,#48]
>>  3.48%  0x0000ffff6830983c: ldr q17, [x7,#48]
>>         0x0000ffff68309840: sshr v16.16b, v16.16b, #1
>>         0x0000ffff68309844: add v16.16b, v16.16b, v17.16b
>>  0.25%  0x0000ffff68309848: str q16, [x15,#48]
>>  8.67%  0x0000ffff6830984c: ldr q16, [x6,#64]
>>  4.30%  0x0000ffff68309850: ldr q17, [x7,#64]
>>         0x0000ffff68309854: sshr v16.16b, v16.16b, #1
>>         0x0000ffff68309858: add v16.16b, v16.16b, v17.16b
>>  0.06%  0x0000ffff6830985c: str q16, [x15,#64]
>>
>> # After
>>         0x0000ffff98308d64: add x6, x2, x15
>> 14.77%  0x0000ffff98308d68: ldr q16, [x6,#16]
>>         0x0000ffff98308d6c: add x7, x3, x15
>>  4.55%  0x0000ffff98308d70: ldr q17, [x7,#16]
>>         0x0000ffff98308d74: ssra v17.16b, v16.16b, #1
>>         0x0000ffff98308d78: add x15, x4, x15
>>  0.02%  0x0000ffff98308d7c: str q17, [x15,#16]
>>  6.14%  0x0000ffff98308d80: ldr q16, [x6,#32]
>>  5.22%  0x0000ffff98308d84: ldr q17, [x7,#32]
>>         0x0000ffff98308d88: ssra v17.16b, v16.16b, #1
>>         0x0000ffff98308d8c: str q17, [x15,#32]
>>  5.26%  0x0000ffff98308d90: ldr q16, [x6,#48]
>>  5.14%  0x0000ffff98308d94: ldr q17, [x7,#48]
>>         0x0000ffff98308d98: ssra v17.16b, v16.16b, #1
>>         0x0000ffff98308d9c: str q17, [x15,#48]
>>  6.56%  0x0000ffff98308da0: ldr q16, [x6,#64]
>>  5.10%  0x0000ffff98308da4: ldr q17, [x7,#64]
>>         0x0000ffff98308da8: ssra v17.16b, v16.16b, #1
>>  0.06%  0x0000ffff98308dac: str q17, [x15,#64]
>
>> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
>>
>> On 11/7/20 8:43 AM, Dong Bo wrote:
>>
>>> I think that's why the improvements are small, hope this could address what you considered, thanks.
>>
>> OK, but let's think about how this works in the real world outside
>> benchmarking. If you're missing L1 it really doesn't matter much what
>> you do with the data, that 12-cycle load latency is going to dominate
>> whether you use vectorized shifts or not.
>>
>> Hopefully, though, shifting and accumulating isn't the only thing
>> you're doing with that data. Probably, you're going to be doing
>> other things with it too.
>>
>> With that in mind, please produce a benchmark that fits in L1, so
>> that we can see if it works better.
>>
> I think the benchmark fits L1 already.
>
> Tests shift(U)RightAccumulateLong handle the maximum size of data.
> The array size is 1028 (count=1028), basic type long (8B), there are 3 arrays. So the data size is about 24KB.
> The data cache of Kunpeng916 (CPU Cortex-A72) is 32KB per core, it can hold all the data accessed.

Wow, OK. So the problem is that the memory system can barely keep up with
the processor, even when all data is coming in from L1. Fair enough.

Approved.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From tschatzl at openjdk.java.net Mon Nov 9 10:16:58 2020
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Mon, 9 Nov 2020 10:16:58 GMT
Subject: RFR: 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Nov 2020 22:29:11 GMT, Martin Doerr wrote:

>> JDK-8237363 introduced "assert(Universe::heap()->is_in..."
check in CompressedOops::decode functions. >> This assertion restricts the usability of the decode functions. There are periods of time (during GC) at which we can't use " Universe::heap()->is_in" because the pointer gets switched between old and new location, but "Universe::heap()->is_in" is not yet accurate. >> PPC64 code has a usage of CompressedOops::decode which is affected by this problem. (It was observed with SerialGC, see JBS.) >> We could also use a weaker assertion, but seems like other people value the stronger assertion more. So I suggest to use decode_raw as workaround for PPC64. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > add comment and use CompressedOops::is_null An alternative change could be investigating whether setting `top()` earlier in the serial gc full gc. It seems to only be a problem with that collector, as the others are seemingly not affected. I'm okay with this change too though. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1078 From jiefu at openjdk.java.net Mon Nov 9 11:26:10 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 9 Nov 2020 11:26:10 GMT Subject: RFR: 8256009: Remove src/hotspot/share/adlc/Test/i486.ad [v2] In-Reply-To: References: Message-ID: > src/hotspot/share/adlc/Test/i486.ad is empty. > It might be better to remove it. > Thanks. 
Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Update the comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1107/files - new: https://git.openjdk.java.net/jdk/pull/1107/files/8fe31182..773f19a2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1107&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1107&range=00-01 Stats: 11 lines in 8 files changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/1107.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1107/head:pull/1107 PR: https://git.openjdk.java.net/jdk/pull/1107 From shade at openjdk.java.net Mon Nov 9 11:26:11 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 9 Nov 2020 11:26:11 GMT Subject: RFR: 8256009: Remove src/hotspot/share/adlc/Test/i486.ad [v2] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 11:23:29 GMT, Jie Fu wrote: >> src/hotspot/share/adlc/Test/i486.ad is empty. >> It might be better to remove it. >> Thanks. > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Update the comments I like this, thanks! Make sure compiler folks approve this. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1107 From jiefu at openjdk.java.net Mon Nov 9 11:26:11 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 9 Nov 2020 11:26:11 GMT Subject: RFR: 8256009: Remove src/hotspot/share/adlc/Test/i486.ad In-Reply-To: References: Message-ID: <91Qj7Vgu9UahuWWj7L6t40F4UyYLNw8wYoNOjdB7_5k=.88ca16d5-64a5-43d8-b71e-9f31a65991f1@github.com> On Mon, 9 Nov 2020 08:02:13 GMT, Aleksey Shipilev wrote: > Might also be a good time to rewrite the references to `x486.ad` to either `x86.ad` or `x86_32.ad` or `x86_64.ad`: > > ``` > $ ack i486.ad src > src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp > 343: // TODO: The encoding of D2I in i486.ad can cause an exception > > src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp > 552: // TODO: The encoding of D2I in i486.ad can cause an exception > > src/hotspot/cpu/ppc/ppc.ad > 3891: // Comment taken from i486.ad: > 3903: // Comment taken from i486.ad: > > src/hotspot/share/opto/parse2.cpp > 2340: // For i486.ad, FILD doesn't restrict precision to 24 or 53 bits. > 2355: // For i486.ad, rounding is always necessary (see _l2f above). > > src/hotspot/share/opto/machnode.hpp > 154: // Only returns non-null value for i486.ad's indOffset32X > > src/hotspot/share/opto/machnode.cpp > 298: // In i486.ad, indOffset32X uses base==RegI and disp==RegP, > > src/hotspot/share/adlc/output_c.cpp > 3003: // See BugId 4796752, operand indOffset32X in i486.ad > > src/hotspot/share/runtime/synchronizer.cpp > 62:// variants of the enter-exit fast-path operations. See i486.ad fast_lock(), > ``` Good idea. Updated. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/1107 From shade at openjdk.java.net Mon Nov 9 12:04:09 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 9 Nov 2020 12:04:09 GMT Subject: RFR: 8256036: Shenandoah: MethodHandles adapters section overflows after JDK-8255762 Message-ID: This seems to happen only under release bits, and only with Shenandoah. 
This shows up in tier1 after JDK-8255762, but only in release builds. The section in question is "MethodHandles adapters". $ CONF=linux-x86_64-server-release make run-test TEST=java/lang/invoke/6987555/Test6987555.java TEST_VM_OPTS="-XX:+UseShenandoahGC" STDOUT: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (codeBuffer.cpp:971), pid=907468, tid=907470 # guarantee(sect->end() <= tend) failed: sanity: MethodHandles adapters, 0x00007f76902a9da9, 0x00007f76902a9d38 # # JRE version: (16.0) (build ) # Java VM: OpenJDK 64-Bit Server VM (16-internal+0-adhoc.shade.jdk, interpreted mode, sharing, compressed oops, shenandoah gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x58f47f] CodeBuffer::verify_section_allocation()+0x1ff Additional testing: - [x] `java/lang/invoke` tests on Linux x86_64 {fastdebug,release} with Shenandoah - [x] `java/lang/invoke` tests on Linux x86_32 {fastdebug,release} with Shenandoah - [x] `java/lang/invoke` tests on Linux AArch64 {fastdebug,release} with Shenandoah ------------- Commit messages: - 8256036: Shenandoah: MethodHandles adapters section overflows after JDK-8255762 Changes: https://git.openjdk.java.net/jdk/pull/1121/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1121&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256036 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1121.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1121/head:pull/1121 PR: https://git.openjdk.java.net/jdk/pull/1121 From dongbo4 at huawei.com Mon Nov 9 12:40:05 2020 From: dongbo4 at huawei.com (dongbo (E)) Date: Mon, 9 Nov 2020 20:40:05 +0800 Subject: RFR: 8255949: AArch64: Add support for vectorized shift right and accumulate In-Reply-To: <8f8899ce-790c-13d1-973c-6def204758aa@redhat.com> References: <9sN31hCWHAUnlqwM9Trqe5aY3reNRta7HBvrQPTO83A=.e2728fe7-4616-4a71-a0c5-b56b36e7b9c2@github.com> 
<8f8899ce-790c-13d1-973c-6def204758aa@redhat.com> Message-ID: <206e2245-1ae0-fdf5-85ec-6ca8d956c896@huawei.com> On 2020/11/9 17:37, Andrew Haley wrote: > On 11/9/20 5:55 AM, Dong Bo wrote: >> On Sat, 7 Nov 2020 08:40:52 GMT, Dong Bo wrote: >> >>>> This supports missing NEON shift right and accumulate instructions, i.e. SSRA and USRA, for AArch64 backend. >>>> >>>> Verified with linux-aarch64-server-release, tier1-3. >>>> >>>> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance test. >>>> We witness about ~20% with different basic types on Kunpeng916. >>>> >>>> >>>> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_ >>>> >>>> On 11/6/20 3:44 AM, Dong Bo wrote: >>>> >>>>> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance test. >>>>> We witness about ~20% with different basic types on Kunpeng916. >>>> Do you find it disappointing that there is such a small improvement? >>>> Do you konw why that is? Perhaps the benchmark is memory bound, or >>>> somesuch? >>>> @theRealAph Thanks for the quick review. >>>> >>>> BTW, according to the hardware PMU counters, 99.617%~99.901% the memory accesses mainly hit in L1/L2 data cache. >>>> But the cpu cycles took for load/store in L1/L2 data cache can still be serveral times more than shifting and accumulating registers. >>>> >>>> I think that's why the improvements are small, hope this could address what you considered, thanks. >>> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_ >>> >>> On 11/7/20 8:43 AM, Dong Bo wrote: >>> >>>> I think that's why the improvements are small, hope this could address what you considered, thanks. >>> OK, but let's think about how this works in the real world outside >>> benchmarking. 
>>> If you're missing L1 it really doesn't matter much what
>>> you do with the data, that 12-cycle load latency is going to dominate
>>> whether you use vectorized shifts or not.
>>>
>>> Hopefully, though, shifting and accumulating isn't the only thing
>>> you're doing with that data. Probably, you're going to be doing
>>> other things with it too.
>>>
>>> With that in mind, please produce a benchmark that fits in L1, so
>>> that we can see if it works better.
>>>
>> I think the benchmark fits L1 already.
>>
>> Tests shift(U)RightAccumulateLong handle the maximum size of data.
>> The array size is 1028 (count=1028), basic type long (8B), there are 3 arrays. So the data size is about 24KB.
>> The data cache of Kunpeng916 (CPU Cortex-A72) is 32KB per core, it can hold all the data accessed.
> Wow, OK. So the problem is that the memory system can barely keep up with
> the processor, even when all data is coming in from L1. Fair enough.

Totally agree.

> Approved.

Thanks. Could you please approve this on the GitHub page of this PR?
Link: https://git.openjdk.java.net/jdk/pull/1087

BTW, the Base64.encode intrinsic we discussed a few days ago has not been approved either.
Is there any further consideration for that?
Base64.encode PR link: https://git.openjdk.java.net/jdk/pull/992

From jiefu at openjdk.java.net Mon Nov 9 13:07:58 2020
From: jiefu at openjdk.java.net (Jie Fu)
Date: Mon, 9 Nov 2020 13:07:58 GMT
Subject: RFR: 8256036: Shenandoah: MethodHandles adapters section overflows after JDK-8255762
In-Reply-To:
References:
Message-ID:

On Mon, 9 Nov 2020 11:31:16 GMT, Aleksey Shipilev wrote:

> This seems to happen only under release bits, and only with Shenandoah. This shows up in tier1 after JDK-8255762, but only in release builds. The section in question is "MethodHandles adapters".
>
> $ CONF=linux-x86_64-server-release make run-test TEST=java/lang/invoke/6987555/Test6987555.java TEST_VM_OPTS="-XX:+UseShenandoahGC"
>
> STDOUT:
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (codeBuffer.cpp:971), pid=907468, tid=907470
> # guarantee(sect->end() <= tend) failed: sanity: MethodHandles adapters, 0x00007f76902a9da9, 0x00007f76902a9d38
> #
> # JRE version: (16.0) (build )
> # Java VM: OpenJDK 64-Bit Server VM (16-internal+0-adhoc.shade.jdk, interpreted mode, sharing, compressed oops, shenandoah gc, linux-amd64)
> # Problematic frame:
> # V [libjvm.so+0x58f47f] CodeBuffer::verify_section_allocation()+0x1ff
>
> Additional testing:
> - [x] `java/lang/invoke` tests on Linux x86_64 {fastdebug,release} with Shenandoah
> - [x] `java/lang/invoke` tests on Linux x86_32 {fastdebug,release} with Shenandoah
> - [x] `java/lang/invoke` tests on Linux AArch64 {fastdebug,release} with Shenandoah

After JDK-8255762, the adapter_code_size seems to have increased by 213 bytes with Shenandoah.
So changing it from 3000 to 4000 seems fine.

-------------

Marked as reviewed by jiefu (Committer).

PR: https://git.openjdk.java.net/jdk/pull/1121

From redestad at openjdk.java.net Mon Nov 9 14:17:16 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Mon, 9 Nov 2020 14:17:16 GMT
Subject: RFR: 8221404: C2: Convert RegMask and IndexSet to use uintptr_t [v2]
In-Reply-To:
References:
Message-ID: <0HQav-gTuUWrHL0tAWhIuZEQeSh_DAq1O4ENVyyL6W0=.0ccceb67-0606-4e77-b42e-4bf39db0f58f@github.com>

> This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making storage unsigned further allows for a few simplifications, e.g. is_bound_set where there was logic to deal with sign extension that can no longer happen.
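A small Java sketch may help picture the two effects described in the paragraph above: wider storage words mean fewer words to loop over, and unsigned storage avoids sign extension on right shifts. This is only an analogy under stated assumptions: `MaskWordsSketch`, the 256-bit mask size and the helper names are hypothetical, and Java's `>>`/`>>>` operators stand in for shifts on signed and unsigned C++ types.

```java
// Hypothetical illustration; not the RegMask or IndexSet code.
final class MaskWordsSketch {
    // Number of storage words needed to hold a mask of `bits` bits.
    static int wordsNeeded(int bits, int bitsPerWord) {
        return (bits + bitsPerWord - 1) / bitsPerWord;
    }

    // Arithmetic right shift: copies the sign bit into the vacated positions.
    static int signedShift(int word, int n) {
        return word >> n;
    }

    // Logical right shift: fills vacated positions with zeros, as a shift
    // on an unsigned type would.
    static int unsignedShift(int word, int n) {
        return word >>> n;
    }

    public static void main(String[] args) {
        // Assumed mask size of 256 bits, chosen only for the example.
        System.out.println(wordsNeeded(256, 32) + " int-sized words");
        System.out.println(wordsNeeded(256, 64) + " 64-bit words");
        int top = 0x80000000; // only the highest bit set
        System.out.println(Integer.toHexString(signedShift(top, 1)));
        System.out.println(Integer.toHexString(unsignedShift(top, 1)));
    }
}
```

With signed storage, code that shifts a word whose top bit is set has to mask off the smeared sign bits; with unsigned storage that extra handling is unnecessary.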
>
> To evaluate performance impact I created the included JMH microbenchmark which uses the RepeatCompilation command to repeat the compilation of a few methods: One trivial (`trivialMath`), one "regular" (`mixHashCode`), and one largish (`largeMethod`) with a lot of locals. These are designed to put no stress, some stress and quite a bit of stress on register allocation:
>
> Baseline:
> Benchmark                                      Mode  Cnt     Score    Error  Units
> SimpleRepeatCompilation.largeMethod_baseline     ss   10   168.919 ±  2.839  ms/op
> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8920.305 ± 40.531  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   153.961 ±  2.762  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  8242.061 ± 71.989  ms/op
> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    69.526 ±  7.098  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6733.627 ± 63.689  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   316.862 ± 29.682  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4544.604 ± 57.439  ms/op
> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.757 ±  1.553  ms/op
> SimpleRepeatCompilation.trivialMath_repeat       ss   10   499.214 ± 35.984  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.345 ±  2.168  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   398.528 ±  4.718  ms/op
>
> Patched:
> Benchmark                                      Mode  Cnt     Score    Error  Units
> SimpleRepeatCompilation.largeMethod_baseline     ss   10   164.355 ±  3.531  ms/op
> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8516.033 ± 22.408  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   151.181 ± 12.869  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  7857.373 ± 52.826  ms/op
> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    65.085 ±  5.643  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6601.693 ± 57.898  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   315.845 ± 27.474  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4456.847 ± 30.459  ms/op
> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.273 ±  2.115  ms/op
> SimpleRepeatCompilation.trivialMath_repeat       ss   10   506.873 ± 18.994  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.184 ±  3.008  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   397.010 ±  4.531  ms/op
>
> This shows that there's no significant change on `trivialMath`, `mixHashCode` sees a small improvement (~2%) and `largeMethod` sees a larger improvement (~4-5%) on C2 and Tiered tests with compiler repetition.
>
> Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Avoid using ULL

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1102/files
  - new: https://git.openjdk.java.net/jdk/pull/1102/files/fde1fc5e..38c60560

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1102&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1102&range=00-01

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1102.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1102/head:pull/1102

PR: https://git.openjdk.java.net/jdk/pull/1102

From vlivanov at openjdk.java.net Mon Nov 9 14:40:03 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Mon, 9 Nov 2020 14:40:03 GMT
Subject: RFR: 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails
Message-ID: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com>

-XX:+PrintDeoptimizationDetails triggers intermittent crashes.

I spotted 2 independent problems which the patch addresses:
* `markWord::print_on` doesn't handle displaced header case well (the pointer stored in the header may be stale);
* `InstanceKlass::oop_print_value_on` dumps some specific details about `MemberName`, but the code assumes the instance is fully initialized.
It's not necessarily the case: for example, deoptimization can happen when the `MemberName` constructor is being executed.

Testing:
- [x] manually verified that the crashes go away with -XX:+PrintDeoptimizationDetails
- [x] hs-precheckin-comp,hs-tier1,hs-tier2

-------------

Commit messages:
 - Fix PrintDeoptimizationDetails

Changes: https://git.openjdk.java.net/jdk/pull/1124/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1124&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8256050
  Stats: 30 lines in 7 files changed: 15 ins; 0 del; 15 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1124.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1124/head:pull/1124

PR: https://git.openjdk.java.net/jdk/pull/1124

From vlivanov at openjdk.java.net Mon Nov 9 14:43:06 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Mon, 9 Nov 2020 14:43:06 GMT
Subject: RFR: 8255367: C2: Deoptimization during vector box construction is broken
Message-ID: <5zCMxKvPC0ssR26TpxkHQ6r1OXBVr9k7eE8W8nCpuSM=.a4f867e5-5619-48ee-ba57-07c77e1ee550@github.com>

Vector box allocation is a multi-step process: it involves 2 allocations (typed vector instance + primitive array) and 2 initializing stores (vector value store into primitive array and field initialization in typed vector instance).

If deoptimization happens at any of the aforementioned allocation points, the result is broken: either the wrong instance is put on the stack (primitive array instead of typed vector) or the vector initializing store is missing.

There are 2 ways to fix the problem:
- piggy-back on rematerialization;
- reexecute the bytecode which allocates the instance.

I chose the latter option because it's simpler to implement. (Rematerialization requires some adjustment of JVM state associated with each allocation to record vector type and value.) The downside is there shouldn't be any side effects present.
It's not a problem right now, because boxing happens only at vector intrinsics use sites and the only intrinsic which has any side effects is vector store operation (it doesn't produce vectors, hence, no boxing needed). The actual fix is small: adding `PreserveReexecuteState` in `LibraryCallKit::box_vector` is enough for the problem to go away. The rest is cleanups/refactorings. Testing: - [x] jdk/incubator/vector tests w/ -XX:+DeoptimizeALot & -XX:UseAVX={3,2,1,0} - [ ] hs-precheckin-comp, hs-tier1, hs-tier2 Thanks! ------------- Commit messages: - Trivial cleanup - Fix deoptimization during vector boxing Changes: https://git.openjdk.java.net/jdk/pull/1126/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1126&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255367 Stats: 171 lines in 4 files changed: 79 ins; 84 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/1126.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1126/head:pull/1126 PR: https://git.openjdk.java.net/jdk/pull/1126 From redestad at openjdk.java.net Mon Nov 9 14:55:56 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 9 Nov 2020 14:55:56 GMT Subject: RFR: 8256036: Shenandoah: MethodHandles adapters section overflows after JDK-8255762 In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 11:31:16 GMT, Aleksey Shipilev wrote: > This seems to happen only under release bits, and only with Shenandoah. This shows up in tier1 after JDK-8255762, but only in release builds. The section in question is "MethodHandles adapters". 
> > $ CONF=linux-x86_64-server-release make run-test TEST=java/lang/invoke/6987555/Test6987555.java TEST_VM_OPTS="-XX:+UseShenandoahGC" > > STDOUT: > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (codeBuffer.cpp:971), pid=907468, tid=907470 > # guarantee(sect->end() <= tend) failed: sanity: MethodHandles adapters, 0x00007f76902a9da9, 0x00007f76902a9d38 > # > # JRE version: (16.0) (build ) > # Java VM: OpenJDK 64-Bit Server VM (16-internal+0-adhoc.shade.jdk, interpreted mode, sharing, compressed oops, shenandoah gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x58f47f] CodeBuffer::verify_section_allocation()+0x1ff > > Additional testing: > - [x] `java/lang/invoke` tests on Linux x86_64 {fastdebug,release} with Shenandoah > - [x] `java/lang/invoke` tests on Linux x86_32 {fastdebug,release} with Shenandoah > - [x] `java/lang/invoke` tests on Linux AArch64 {fastdebug,release} with Shenandoah Unfortunate. I've considered taking a stab at having these buffers downsize to their actual use after creation, similar to how we do things in the interpreter, but it's significant amount of work for a questionable benefit. Fine adding back a little bit of extra room. ------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1121 From vlivanov at openjdk.java.net Mon Nov 9 15:07:02 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 9 Nov 2020 15:07:02 GMT Subject: RFR: 8256054: C2: Floating-point min/max operations on vectors intermittently produce wrong results for NaN values Message-ID: Floating-point min/max operations on vectors intermittently produce wrong results for NaN values. The problem boils down to a missing "TEMP dst" declaration on AVX512-related AD instruction. Without it`dst` vector register may match one of the input registers and it breaks the computation since it assumes all the used registers are different. 
The fix adds missing effect and also introduces additional asserts to catch similar problems in the future. Testing: - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot on AVX512-capable hardware - [x] hs-precheckin-comp, hs-tier1, hs-tier2 Thanks! ------------- Commit messages: - Fix EVEX variants of MinV/MaxV on x86. Changes: https://git.openjdk.java.net/jdk/pull/1128/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1128&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256054 Stats: 8 lines in 2 files changed: 7 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1128.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1128/head:pull/1128 PR: https://git.openjdk.java.net/jdk/pull/1128 From redestad at openjdk.java.net Mon Nov 9 15:20:58 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 9 Nov 2020 15:20:58 GMT Subject: RFR: 8256054: C2: Floating-point min/max operations on vectors intermittently produce wrong results for NaN values In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 14:57:09 GMT, Vladimir Ivanov wrote: > Floating-point min/max operations on vectors intermittently produce wrong results for NaN values. > > The problem boils down to a missing "TEMP dst" declaration on AVX512-related AD instruction. Without it`dst` vector register may match one of the input registers and it breaks the computation since it assumes all the used registers are different. > > The fix adds missing effect and also introduces additional asserts to catch similar problems in the future. > > Testing: > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 > > Thanks! Marked as reviewed by redestad (Reviewer). 
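For readers less familiar with why a missing "TEMP dst" matters, here is a minimal Java sketch of the aliasing hazard. It is not the actual AD instruction sequence: the three-step NaN-aware max, the array-based "register file", and all names are hypothetical and chosen only to mirror the symptom described above (wrong results for NaN when `dst` shares a register with an input).

```java
final class TempDstAliasingSketch {
    // Mimics a raw hardware-style max: yields y whenever the comparison is
    // false, which includes every case where either input is NaN.
    static float rawMax(float x, float y) {
        return x > y ? x : y;
    }

    // Hypothetical multi-step NaN-aware max over a tiny "register file".
    // The sequence writes dst before it has finished reading input a, so it
    // is only correct when dst is a different register from a.
    static float nanAwareMax(float[] regs, int dst, int a, int b) {
        regs[dst] = rawMax(regs[a], regs[b]);   // step 1: early write to dst
        boolean aIsNaN = Float.isNaN(regs[a]);  // step 2: needs the original a
        if (aIsNaN) {
            regs[dst] = regs[a];                // step 3: propagate NaN from a
        }
        return regs[dst];
    }

    public static void main(String[] args) {
        // dst (slot 2) is distinct from the inputs (slots 0 and 1).
        float distinct = nanAwareMax(new float[] {Float.NaN, 1.0f, 0.0f}, 2, 0, 1);
        // dst aliases input a (both slot 0), as an allocator may choose
        // when the destination is not declared as a temporary.
        float aliased = nanAwareMax(new float[] {Float.NaN, 1.0f}, 0, 0, 1);
        System.out.println("distinct dst: " + distinct);
        System.out.println("aliased dst:  " + aliased);
    }
}
```

With distinct registers the NaN input is propagated; with `dst` aliased to `a`, step 1 clobbers the NaN before step 2 inspects it.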
-------------

PR: https://git.openjdk.java.net/jdk/pull/1128

From aph at openjdk.java.net Mon Nov 9 16:10:58 2020
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 9 Nov 2020 16:10:58 GMT
Subject: RFR: 8255949: AArch64: Add support for vectorized shift right and accumulate
In-Reply-To:
References:
Message-ID:

On Fri, 6 Nov 2020 03:36:57 GMT, Dong Bo wrote:

> This adds support for the missing NEON shift right and accumulate instructions, i.e. SSRA and USRA, to the AArch64 backend.
>
> Verified with linux-aarch64-server-release, tier1-3.
>
> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance testing.
> We see an improvement of up to about 20% across the basic types on Kunpeng 916. The JMH results:
> Benchmark                                        (count)  (seed)  Mode  Cnt    Score    Error  Units
> # before, Kunpeng 916
> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  146.259 ±  0.123  ns/op
> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  454.781 ±  3.856  ns/op
> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  938.842 ± 23.288  ns/op
> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  205.493 ±  4.938  ns/op
> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.483 ±  0.309  ns/op (not vectorized)
> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  220.847 ±  5.868  ns/op
> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  442.587 ±  6.980  ns/op
> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  936.289 ± 21.458  ns/op
> # after shift right and accumulate, Kunpeng 916
> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  125.586 ±  0.204  ns/op
> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  365.973 ±  6.466  ns/op
> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  804.605 ± 12.336  ns/op
> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  170.123 ±  4.678  ns/op
> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.779 ±  0.587  ns/op (not vectorized)
> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  185.799 ±  4.764  ns/op
> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  364.360 ±  6.522  ns/op
> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  800.737 ± 13.735  ns/op
>
> We checked the shiftURightAccumulateByte test: its performance stays the same because it is not vectorized with or without this patch, due to:
> src/hotspot/share/opto/vectornode.cpp, line 226:
>   case Op_URShiftI:
>     switch (bt) {
>     case T_BOOLEAN: return Op_URShiftVB;
>     case T_CHAR:    return Op_URShiftVS;
>     case T_BYTE:
>     case T_SHORT:   return 0; // Vector logical right shift for signed short
>                               // values produces incorrect Java result for
>                               // negative data because java code should convert
>                               // a short value into int value with sign
>                               // extension before a shift.
>     case T_INT:     return Op_URShiftVI;
>     default:        ShouldNotReachHere(); return 0;
>     }
> We also tried the existing vector operation micro urShiftB, i.e.:
> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java, line 116
>     @Benchmark
>     public void urShiftB() {
>         for (int i = 0; i < COUNT; i++) {
>             resB[i] = (byte) (bytesA[i] >>> 3);
>         }
>     }
> It is not vectorized either. It seems hard to match Java code to the URShiftVB node.

Marked as reviewed by aph (Reviewer).
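The sign-extension point in the quoted vectornode.cpp comment can be checked with a small, self-contained Java sketch. The class and method names below are illustrative only and do not come from the patch; `laneShift` is an assumption about what a lane-wise 8-bit logical shift would compute, written in scalar Java for comparison.

```java
final class ByteUnsignedShiftDemo {
    // Java semantics for the urShiftB benchmark body: the byte is first
    // sign-extended to a 32-bit int, shifted, and only then narrowed.
    static byte javaShift(byte b, int shift) {
        return (byte) (b >>> shift);
    }

    // Assumed behaviour of an 8-bit-lane logical right shift: zeros are
    // shifted in at bit 7 of the byte itself, with no sign extension.
    static byte laneShift(byte b, int shift) {
        return (byte) ((b & 0xFF) >>> shift);
    }

    public static void main(String[] args) {
        byte negative = (byte) 0x80; // -128
        byte positive = (byte) 0x40; // 64
        System.out.println("java semantics, negative: " + javaShift(negative, 3));
        System.out.println("lane semantics, negative: " + laneShift(negative, 3));
        System.out.println("java semantics, positive: " + javaShift(positive, 3));
        System.out.println("lane semantics, positive: " + laneShift(positive, 3));
    }
}
```

The two agree for non-negative bytes and differ for negative ones, which is consistent with the T_BYTE and T_SHORT cases returning 0 in the switch quoted above.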
-------------

PR: https://git.openjdk.java.net/jdk/pull/1087

From aph at redhat.com Mon Nov 9 16:14:08 2020
From: aph at redhat.com (Andrew Haley)
Date: Mon, 9 Nov 2020 16:14:08 +0000
Subject: RFR: 8255949: AArch64: Add support for vectorized shift right and accumulate
In-Reply-To: <206e2245-1ae0-fdf5-85ec-6ca8d956c896@huawei.com>
References: <9sN31hCWHAUnlqwM9Trqe5aY3reNRta7HBvrQPTO83A=.e2728fe7-4616-4a71-a0c5-b56b36e7b9c2@github.com> <8f8899ce-790c-13d1-973c-6def204758aa@redhat.com> <206e2245-1ae0-fdf5-85ec-6ca8d956c896@huawei.com>
Message-ID: <8d572935-d752-4958-fb05-bf09f5631840@redhat.com>

On 09/11/2020 12:40, dongbo (E) wrote:
> BTW, the Base64.encode intrinsic we discussed a few days ago has not been
> approved either.
>
> Is there any further consideration for that?
>
> Base64.encode PR link: https://git.openjdk.java.net/jdk/pull/992

Yes, there was one minor style thing:

    Register doff = c_rarg4; // position for writing to dest array
    Register isURL = c_rarg5; // Base64 or URL character set
    Register codec = r6;
    Register length = r7;

I guess the change in naming style here is because isURL really is an argument, while codec and length are temps, but this expects the reader to be aware that r6 == c_rarg6 and r7 == c_rarg7. I just find the change in register naming style confusing.

Otherwise it's all perfectly fine.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From psandoz at openjdk.java.net Mon Nov 9 16:25:53 2020
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Mon, 9 Nov 2020 16:25:53 GMT
Subject: RFR: 8256054: C2: Floating-point min/max operations on vectors intermittently produce wrong results for NaN values
In-Reply-To:
References:
Message-ID:

On Mon, 9 Nov 2020 14:57:09 GMT, Vladimir Ivanov wrote:

> Floating-point min/max operations on vectors intermittently produce wrong results for NaN values.
>
> The problem boils down to a missing "TEMP dst" declaration on an AVX512-related AD instruction. Without it, the `dst` vector register may match one of the input registers, which breaks the computation since it assumes all the used registers are different.
>
> The fix adds the missing effect and also introduces additional asserts to catch similar problems in the future.
>
> Testing:
> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot on AVX512-capable hardware
> - [x] hs-precheckin-comp, hs-tier1, hs-tier2
>
> Thanks!

Marked as reviewed by psandoz (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/1128

From kvn at openjdk.java.net Mon Nov 9 17:04:59 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Mon, 9 Nov 2020 17:04:59 GMT
Subject: RFR: 8221404: C2: Convert RegMask and IndexSet to use uintptr_t [v2]
In-Reply-To: <0HQav-gTuUWrHL0tAWhIuZEQeSh_DAq1O4ENVyyL6W0=.0ccceb67-0606-4e77-b42e-4bf39db0f58f@github.com>
References: <0HQav-gTuUWrHL0tAWhIuZEQeSh_DAq1O4ENVyyL6W0=.0ccceb67-0606-4e77-b42e-4bf39db0f58f@github.com>
Message-ID:

On Mon, 9 Nov 2020 14:17:16 GMT, Claes Redestad wrote:

>> This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making storage unsigned further allows for a few simplifications, e.g. in is_bound_set, where there was logic to deal with sign extension that can no longer happen.
>>
>> To evaluate the performance impact I created the included JMH microbenchmark, which uses the RepeatCompilation command to repeat the compilation of a few methods: one trivial (`trivialMath`), one "regular" (`mixHashCode`), and one largish (`largeMethod`) with a lot of locals. These are designed to put no stress, some stress and quite a bit of stress on register allocation:
>>
>> Baseline:
>> Benchmark                                      Mode  Cnt     Score    Error  Units
>> SimpleRepeatCompilation.largeMethod_baseline     ss   10   168.919 ±  2.839  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8920.305 ± 40.531  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   153.961 ±  2.762  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  8242.061 ± 71.989  ms/op
>> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    69.526 ±  7.098  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6733.627 ± 63.689  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   316.862 ± 29.682  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4544.604 ± 57.439  ms/op
>> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.757 ±  1.553  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat       ss   10   499.214 ± 35.984  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.345 ±  2.168  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   398.528 ±  4.718  ms/op
>>
>> Patched:
>> Benchmark                                      Mode  Cnt     Score    Error  Units
>> SimpleRepeatCompilation.largeMethod_baseline     ss   10   164.355 ±  3.531  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8516.033 ± 22.408  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   151.181 ± 12.869  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  7857.373 ± 52.826  ms/op
>> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    65.085 ±  5.643  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6601.693 ± 57.898  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   315.845 ± 27.474  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4456.847 ± 30.459  ms/op
>> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.273 ±  2.115  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat       ss   10   506.873 ± 18.994  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.184 ±  3.008  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   397.010 ±  4.531  ms/op
>>
>> This shows that there's no significant change on `trivialMath`, while `mixHashCode` sees a small improvement (~2%) and `largeMethod` sees a larger improvement (~4-5%) on the C2 and Tiered tests with compiler repetition.
>>
>> Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Avoid using ULL

Looks good.

src/hotspot/share/opto/indexSet.hpp line 67:

> 65: block_index_length = 8,
> 66: // Split over 4 or 8 words depending on bitness
> 67: word_index_length = block_index_length - LogBitsPerWord,

Nice. I also thought about using "word" definitions.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1102

From redestad at openjdk.java.net Mon Nov 9 17:10:57 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Mon, 9 Nov 2020 17:10:57 GMT
Subject: RFR: 8221404: C2: Convert RegMask and IndexSet to use uintptr_t [v2]
In-Reply-To:
References: <0HQav-gTuUWrHL0tAWhIuZEQeSh_DAq1O4ENVyyL6W0=.0ccceb67-0606-4e77-b42e-4bf39db0f58f@github.com>
Message-ID:

On Mon, 9 Nov 2020 16:59:52 GMT, Vladimir Kozlov wrote:

>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Avoid using ULL
>
> src/hotspot/share/opto/indexSet.hpp line 67:
>
>> 65: block_index_length = 8,
>> 66: // Split over 4 or 8 words depending on bitness
>> 67: word_index_length = block_index_length - LogBitsPerWord,
>
> Nice. I also thought about using "word" definitions.

Thanks! @vidmik pushed me to come up with derived definitions here rather than adding another magic constant for 64-bit.
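To make the "4 or 8 words depending on bitness" comment concrete, the arithmetic behind the derived definition can be sketched in Java. This only mirrors the quoted lines 65-67; the class and method names are illustrative, and the assumption that the number of words per block is `1 << word_index_length` is inferred from the quoted comment rather than taken from the patch.

```java
final class IndexSetLayoutSketch {
    // Mirrors `block_index_length = 8` from the quoted indexSet.hpp line 65.
    static final int BLOCK_INDEX_LENGTH = 8;

    // log2 of the word size in bits: 5 for 32-bit words, 6 for 64-bit words.
    // Stands in for HotSpot's LogBitsPerWord constant.
    static int logBitsPerWord(int bitsPerWord) {
        return Integer.numberOfTrailingZeros(bitsPerWord);
    }

    // Mirrors `word_index_length = block_index_length - LogBitsPerWord`.
    static int wordIndexLength(int bitsPerWord) {
        return BLOCK_INDEX_LENGTH - logBitsPerWord(bitsPerWord);
    }

    // Assumption: a block is split over 2^word_index_length words.
    static int wordsPerBlock(int bitsPerWord) {
        return 1 << wordIndexLength(bitsPerWord);
    }

    public static void main(String[] args) {
        for (int bits : new int[] {32, 64}) {
            System.out.println(bits + "-bit words: word_index_length="
                    + wordIndexLength(bits)
                    + ", words per block=" + wordsPerBlock(bits));
        }
    }
}
```

Under these assumptions the block size stays at 256 bits on both bitnesses, and only the word count changes, which is what lets the definition be derived instead of hard-coded per platform.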
------------- PR: https://git.openjdk.java.net/jdk/pull/1102 From kvn at openjdk.java.net Mon Nov 9 17:15:56 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 9 Nov 2020 17:15:56 GMT Subject: RFR: 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails In-Reply-To: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> References: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> Message-ID: On Mon, 9 Nov 2020 13:01:18 GMT, Vladimir Ivanov wrote: > -XX:+PrintDeoptimizationDetails triggers intermittent crashes. I spotted 2 independent problems which the patch addresses: > * `markWord::print_on` doesn't handle displaced header case well (the pointer stored in the header may be stale); > * `InstanceKlass::oop_print_value_on` dumps some specific details about `MemberName`, but the code assumes the instance is fully initialized. It's necessarily the case: for example, deoptimization can happen when `MemberName` constructor is being executed. > > Testing: > - [x] manually verified that the crashes go away -XX:+PrintDeoptimizationDetails > - [x] hs-precheckin-comp,hs-tier1,hs-tier2 Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1124 From vlivanov at openjdk.java.net Mon Nov 9 17:30:04 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 9 Nov 2020 17:30:04 GMT Subject: RFR: 8256061: RegisterSaver::save_live_registers() omits upper halves of ZMM0-15 registers Message-ID: <3mXn6Hal7wiro0H9-_iK5qI_FCnQlTw0k051aqouKZU=.f2ec8278-df12-4157-ab5e-a18b716e56df@github.com> `YMM0-15` registers are handled specially when CPU registers are saved. They are split in 2 parts (128-bit each) and put in different parts of the frame (see `RegisterSaver::layout` for details). AVX512 adds 16 more vector registers (ZMM16-31) and those are saved full-sized in a separate region. 
But `RegisterSaver::save_live_registers()` doesn't do anything special for `ZMM0-15` and their upper halves are lost (though there's space reserved for them in the frame). The fix adds missing logic which saves upper halves (256-bit in size) of ZMM0-15 registers. Thus every ZMM0-15 register ends up split into 3 parts which are stored independently in the frame. Testing (with some other relevant patches): - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware - [x] hs-precheckin-comp, hs-tier1, hs-tier2 ------------- Commit messages: - Save upper halves of ZMM0-15 when AVX512 support is enabled. Changes: https://git.openjdk.java.net/jdk/pull/1131/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1131&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256061 Stats: 23 lines in 1 file changed: 14 ins; 0 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/1131.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1131/head:pull/1131 PR: https://git.openjdk.java.net/jdk/pull/1131 From vlivanov at openjdk.java.net Mon Nov 9 19:01:01 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 9 Nov 2020 19:01:01 GMT Subject: RFR: 8256058: Improve vector register handling in RegisterMap::pd_location() on x86 Message-ID: <24Ohbh-GTkS0HUALP3xJi3RPILa63LFMPV_N6IJ9s8o=.34f1b8f0-08d2-4a39-bdbc-2b06ba6df1e0@github.com> `RegisterMap` handles registers on a per-slot basis: every register is split into slot-sized (32-bit) parts that are tracked independently. On x86 vector registers are up to 512-bit in size and occupy up to 16 slots. In order to save on constructing RegisterMaps, `RegisterMap`s are sparsely populated: location of the first slot is recorded and the rest is derived from it and `RegisterMap::pd_location()` is used to compute the address of a particular slot if it is missing in the map. 
As I mentioned in #1131, frame layout for vector registers is quite complicated: ZMM0-15 are split in 3 parts (2 x 128-bit + 1 x 256-bit) when saved while ZMM16-31 are stored in full. Proposed patch reifies those details in `RegisterMap::pd_location()` logic and it becomes possible to initialize just 3 slot locations for ZMM0-15 to be able to recover every slot location inside the register while for ZMM16-31 initializing a single (base) slot is enough. Testing (along with some other relevant patches): - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware - [x] hs-precheckin-comp, hs-tier1, hs-tier2 ------------- Commit messages: - Improve vector register handling in RegisterMap::pd_location() Changes: https://git.openjdk.java.net/jdk/pull/1132/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1132&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256058 Stats: 35 lines in 1 file changed: 19 ins; 7 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/1132.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1132/head:pull/1132 PR: https://git.openjdk.java.net/jdk/pull/1132 From vlivanov at openjdk.java.net Mon Nov 9 19:17:02 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 9 Nov 2020 19:17:02 GMT Subject: RFR: 8256056: Deoptimization stub doesn't save vector registers on x86 Message-ID: Deoptimization stub doesn't save vector registers on x86 and it may break vector rematerialization when a vector value (produced through Vector API) ends up in a register (and not spilled on stack). The fix unconditionally saves all wide vector registers when going through deopt stub. If there are any performance concerns with that, it's possible to introduce a dedicated variant and choose between 2 versions when patching nmethods (like what happens with safepoint stubs). But deoptimization is already quite expensive to save much on that, so I decided to keep things as is. 
As a cleanup, I made `save_vectors` parameter explicit to make the intentions clearer. Testing (with some other relevant patches): - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware - [x] hs-precheckin-comp, hs-tier1, hs-tier2 ------------- Commit messages: - Save save vector registers in deopt stub. Changes: https://git.openjdk.java.net/jdk/pull/1134/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1134&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256056 Stats: 16 lines in 1 file changed: 5 ins; 1 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/1134.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1134/head:pull/1134 PR: https://git.openjdk.java.net/jdk/pull/1134 From sviswanathan at openjdk.java.net Mon Nov 9 19:23:56 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 9 Nov 2020 19:23:56 GMT Subject: RFR: 8256054: C2: Floating-point min/max operations on vectors intermittently produce wrong results for NaN values In-Reply-To: References: Message-ID: <8xyunX3va_TPcnQQ_iOZR376mgpVeSZrlJhVCci3N6U=.e13245ba-8c22-4684-aef9-e67c91ddc898@github.com> On Mon, 9 Nov 2020 16:22:57 GMT, Paul Sandoz wrote: >> Floating-point min/max operations on vectors intermittently produce wrong results for NaN values. >> >> The problem boils down to a missing "TEMP dst" declaration on AVX512-related AD instruction. Without it`dst` vector register may match one of the input registers and it breaks the computation since it assumes all the used registers are different. >> >> The fix adds missing effect and also introduces additional asserts to catch similar problems in the future. >> >> Testing: >> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot on AVX512-capable hardware >> - [x] hs-precheckin-comp, hs-tier1, hs-tier2 >> >> Thanks! > > Marked as reviewed by psandoz (Reviewer). Changes look good. Thanks for fixing this. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1128 From sviswanathan at openjdk.java.net Mon Nov 9 20:02:55 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 9 Nov 2020 20:02:55 GMT Subject: RFR: 8256056: Deoptimization stub doesn't save vector registers on x86 In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 18:59:30 GMT, Vladimir Ivanov wrote: > Deoptimization stub doesn't save vector registers on x86 and it may break vector rematerialization when a vector value (produced through Vector API) ends up in a register (and not spilled on stack). > > The fix unconditionally saves all wide vector registers when going through deopt stub. If there are any performance concerns with that, it's possible to introduce a dedicated variant and choose between 2 versions when patching nmethods (like what happens with safepoint stubs). But deoptimization is already quite expensive to save much on that, so I decided to keep things as is. > > As a cleanup, I made `save_vectors` parameter explicit to make the intentions clearer. > > Testing (with some other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 If possible, it will be good to do this optionally. Doing it unconditionally, the concern is the frequency drop with larger vector width. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1134 From mdoerr at openjdk.java.net Mon Nov 9 20:43:03 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 9 Nov 2020 20:43:03 GMT Subject: RFR: 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap [v2] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 10:14:06 GMT, Thomas Schatzl wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> add comment and use CompressedOops::is_null > > An alternative change could be investigating whether setting `top()` earlier in the serial gc full gc. It seems to only be a problem with that collector, as the others are seemingly not affected. > > I'm okay with this change too though. Thanks for the reviews! @tschatzl: Calling `set_top` during phase 2 sounds like a nice idea, but it's currently done in `reset_after_compaction` and it may possibly break other things if I move it to an earlier phase. So I should probably better go ahead with the current version. ------------- PR: https://git.openjdk.java.net/jdk/pull/1078 From dcubed at openjdk.java.net Mon Nov 9 21:05:57 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 9 Nov 2020 21:05:57 GMT Subject: RFR: 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails In-Reply-To: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> References: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> Message-ID: On Mon, 9 Nov 2020 13:01:18 GMT, Vladimir Ivanov wrote: > -XX:+PrintDeoptimizationDetails triggers intermittent crashes. 
I spotted 2 independent problems which the patch addresses: > * `markWord::print_on` doesn't handle displaced header case well (the pointer stored in the header may be stale); > * `InstanceKlass::oop_print_value_on` dumps some specific details about `MemberName`, but the code assumes the instance is fully initialized. It's necessarily the case: for example, deoptimization can happen when `MemberName` constructor is being executed. > > Testing: > - [x] manually verified that the crashes go away -XX:+PrintDeoptimizationDetails > - [x] hs-precheckin-comp,hs-tier1,hs-tier2 Looks good. Your call on whether to add the comment I proposed. src/hotspot/share/runtime/basicLock.cpp line 34: > 32: markWord mark_word = displaced_header(); > 33: if (mark_word.value() != 0) { > 34: bool print_monitor_info = (owner != NULL) && (owner->mark() == markWord::from_pointer((void*)this)); Could use a comment between L33 and L34: // Print monitor info if there's an owning oop and it refers to this BasicLock. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1124 From dlong at openjdk.java.net Mon Nov 9 21:14:55 2020 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 9 Nov 2020 21:14:55 GMT Subject: RFR: 8256054: C2: Floating-point min/max operations on vectors intermittently produce wrong results for NaN values In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 14:57:09 GMT, Vladimir Ivanov wrote: > Floating-point min/max operations on vectors intermittently produce wrong results for NaN values. > > The problem boils down to a missing "TEMP dst" declaration on AVX512-related AD instruction. Without it`dst` vector register may match one of the input registers and it breaks the computation since it assumes all the used registers are different. > > The fix adds missing effect and also introduces additional asserts to catch similar problems in the future. 
> > Testing: > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 > > Thanks! Marked as reviewed by dlong (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1128 From github.com+58006833+xbzhang99 at openjdk.java.net Mon Nov 9 21:18:08 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 9 Nov 2020 21:18:08 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v3] In-Reply-To: References: Message-ID: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number Xubo Zhang has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8255368' of github.com:xbzhang99/bugfixes into JDK-8255368 - Added test cases for exp at the value of 1024 and 10000 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/894/files - new: https://git.openjdk.java.net/jdk/pull/894/files/305d915b..757192c3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=01-02 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From vlivanov at openjdk.java.net Mon Nov 9 21:24:55 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 9 Nov 2020 21:24:55 GMT Subject: RFR: 8256056: Deoptimization stub doesn't save vector registers on x86 In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 20:00:20 GMT, Sandhya Viswanathan wrote: > If possible, it will be good to do this optionally. 
As I mentioned earlier, it's possible, but it requires 2 variants of the deopt stub and dispatching logic to choose the proper one depending on the location it is patched with.

The deoptimization process is already expensive and execution ends up in the interpreter anyway. Is the frequency drop such a problem in the case of the deopt stub that it justifies the increase in complexity?

Also, I'd like to point out that on older CPUs with AVX512 support (like Skylake), UseAVX=3 is turned off by default to avoid generating EVEX-encoded instructions on CPUs which have frequency scaling issues. So, if UseAVX=3 is specified, there are already many places (which are more performance sensitive IMO) with EVEX-encoded instructions.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1134

From sviswanathan at openjdk.java.net Mon Nov 9 21:28:56 2020
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Mon, 9 Nov 2020 21:28:56 GMT
Subject: RFR: 8256061: RegisterSaver::save_live_registers() omits upper halves of ZMM0-15 registers
In-Reply-To: <3mXn6Hal7wiro0H9-_iK5qI_FCnQlTw0k051aqouKZU=.f2ec8278-df12-4157-ab5e-a18b716e56df@github.com>
References: <3mXn6Hal7wiro0H9-_iK5qI_FCnQlTw0k051aqouKZU=.f2ec8278-df12-4157-ab5e-a18b716e56df@github.com>
Message-ID:

On Mon, 9 Nov 2020 16:44:23 GMT, Vladimir Ivanov wrote:

> `YMM0-15` registers are handled specially when CPU registers are saved. They are split in 2 parts (128-bit each) and put in different parts of the frame (see `RegisterSaver::layout` for details). AVX512 adds 16 more vector registers (ZMM16-31) and those are saved full-sized in a separate region. But `RegisterSaver::save_live_registers()` doesn't do anything special for `ZMM0-15` and their upper halves are lost (though there's space reserved for them in the frame).
>
> The fix adds missing logic which saves upper halves (256-bit in size) of ZMM0-15 registers. Thus every ZMM0-15 register ends up split into 3 parts which are stored independently in the frame.
>
> Testing (with some other relevant patches):
> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware
> - [x] hs-precheckin-comp, hs-tier1, hs-tier2

It looks like the upper 256 bits of ZMM0-15 are already being saved as part of the following statements in sharedRuntime_x86_64.cpp (lines 191-211):

    if (save_vectors) {
      // Save upper half of YMM registers(0..15)
      int base_addr = XSAVE_AREA_YMM_BEGIN;
      for (int n = 0; n < 16; n++) {
        __ vextractf128_high(Address(rsp, base_addr+n*16), as_XMMRegister(n));
      }
      if (VM_Version::supports_evex()) {
        // Save upper half of ZMM registers(0..15)
        base_addr = XSAVE_AREA_ZMM_BEGIN;
        for (int n = 0; n < 16; n++) {
          __ vextractf64x4_high(Address(rsp, base_addr+n*32), as_XMMRegister(n));
        }
        // Save full ZMM registers(16..num_xmm_regs)
        base_addr = XSAVE_AREA_UPPERBANK;
        off = 0;
        int vector_len = Assembler::AVX_512bit;
        for (int n = 16; n < num_xmm_regs; n++) {
          __ evmovdqul(Address(rsp, base_addr+(off++*64)), as_XMMRegister(n), vector_len);
        }
      }
    }

-------------

PR: https://git.openjdk.java.net/jdk/pull/1131

From darcy at openjdk.java.net Mon Nov 9 21:34:57 2020
From: darcy at openjdk.java.net (Joe Darcy)
Date: Mon, 9 Nov 2020 21:34:57 GMT
Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 9 Nov 2020 21:18:08 GMT, Xubo Zhang wrote:

>> Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae.
Also changed movdqu to movsd as it was supposed to load a 64-bit number > > Xubo Zhang has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8255368' of github.com:xbzhang99/bugfixes into JDK-8255368 > - Added test cases for exp at the value of 1024 and 10000 test/jdk/java/lang/Math/WorstCaseTests.java line 117: > 115: {+0x1.1D5C2DAEBE367p4, 0x1.A8C02E974C314p25}, > 116: {+0x1.C44CE0D716A1Ap4, 0x1.B890CA8637AE1p40}, > 117: {+0x4.0p8, Double.POSITIVE_INFINITY}, //bug 8255368 gave 0 This is not an appropriate test to update to cover this bug. This test is specifically probing at difficult cases in double arithmetic for the underlying mathematically function as opposed to flaws in a particular implementation. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From vlivanov at openjdk.java.net Mon Nov 9 21:36:59 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 9 Nov 2020 21:36:59 GMT Subject: RFR: 8256061: RegisterSaver::save_live_registers() omits upper halves of ZMM0-15 registers In-Reply-To: References: <3mXn6Hal7wiro0H9-_iK5qI_FCnQlTw0k051aqouKZU=.f2ec8278-df12-4157-ab5e-a18b716e56df@github.com> Message-ID: On Mon, 9 Nov 2020 21:26:06 GMT, Sandhya Viswanathan wrote: >> `YMM0-15` registers are handled specially when CPU registers are saved. They are split in 2 parts (128-bit each) and put in different parts of the frame (see `RegisterSaver::layout` for details). AVX512 adds 16 more vector registers (ZMM16-31) and those are saved full-sized in a separate region. But `RegisterSaver::save_live_registers()` doesn't do anything special for `ZMM0-15` and their upper halves are lost (though there's space reserved for them in the frame). >> >> The fix adds missing logic which saves upper halves (256-bit in size) of ZMM0-15 registers. Thus every ZMM0-15 register ends up split into 3 parts which are stored independently in the frame. 
>>
>> Testing (with some other relevant patches):
>> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware
>> - [x] hs-precheckin-comp, hs-tier1, hs-tier2
>
> It looks like the upper 256 bits of ZMM0-15 are already being saved as part of the following statements in sharedRuntime_x86_64.cpp (lines 191-211):
>
>     if (save_vectors) {
>       // Save upper half of YMM registers(0..15)
>       int base_addr = XSAVE_AREA_YMM_BEGIN;
>       for (int n = 0; n < 16; n++) {
>         __ vextractf128_high(Address(rsp, base_addr+n*16), as_XMMRegister(n));
>       }
>       if (VM_Version::supports_evex()) {
>         // Save upper half of ZMM registers(0..15)
>         base_addr = XSAVE_AREA_ZMM_BEGIN;
>         for (int n = 0; n < 16; n++) {
>           __ vextractf64x4_high(Address(rsp, base_addr+n*32), as_XMMRegister(n));
>         }
>         // Save full ZMM registers(16..num_xmm_regs)
>         base_addr = XSAVE_AREA_UPPERBANK;
>         off = 0;
>         int vector_len = Assembler::AVX_512bit;
>         for (int n = 16; n < num_xmm_regs; n++) {
>           __ evmovdqul(Address(rsp, base_addr+(off++*64)), as_XMMRegister(n), vector_len);
>         }
>       }
>     }

Thanks for the correction, Sandhya.
You are absolutely right that the values are already saved.
The problem is the `OopMap` doesn't reflect that and the patch fixes that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1131

From sviswanathan at openjdk.java.net Mon Nov 9 21:40:56 2020
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Mon, 9 Nov 2020 21:40:56 GMT
Subject: RFR: 8256056: Deoptimization stub doesn't save vector registers on x86
In-Reply-To:
References:
Message-ID:

On Mon, 9 Nov 2020 21:21:51 GMT, Vladimir Ivanov wrote:

>> If possible, it will be good to do this optionally.
>> Doing it unconditionally, the concern is the frequency drop with larger vector width.
>
>> If possible, it will be good to do this optionally.
> As I mentioned earlier, it's possible, but requires 2 variants of deopt stub and dispatching logic to choose the proper one depending on the location it is patched with.
>
> Deoptimization process is already expensive and execution ends up in interpreter anyway.
>
> Is the frequency drop such a problem in case of deopt stub to justify the increase in complexity?
> Also, I'd like to point out that on older CPUs with AVX512 supported (like Skylake) AVX=3 is turned off by default to avoid generating EVEX-encoded instructions on CPUs which have frequency scaling issues. So, if AVX=3 is specified, there are already many places (which are more performance sensitive IMO) with EVEX-encoded instructions.

Maybe not. Let us go with your solution and we can revisit if the performance runs show otherwise.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1134

From vlivanov at openjdk.java.net Mon Nov 9 22:09:02 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Mon, 9 Nov 2020 22:09:02 GMT
Subject: RFR: 8256073: Improve vector rematerialization support
Message-ID:

Having #1131, #1132, #1134, and #1136 in place, the only missing piece left to have vector rematerialization fully working is support of non-contiguous vector values in vector rematerialization logic. This patch covers that.

Current version makes the assumption that vector values are contiguously laid in memory. It's the case for on-stack locations, but for in-register values it's not the case (at least, on x86). Rewritten version doesn't make such assumption for in-register case anymore and processes every vector element independently.

(Along the way, the refactoring fixes a bug when handling a corner case: the case when a vector instance is scalarized by EA and the primitive array field (VectorPayload.payload) has a constant value (NULL) is erroneously treated as requiring custom rematerialization and it hits an assert.)
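
The difference between the contiguous assumption and per-element processing can be sketched roughly as follows. This is only a toy model in Java, not the HotSpot code: the class `VectorRematSketch`, its methods, and the idea of modelling a saved frame as a `byte[]` with a slot-to-offset function are all hypothetical and introduced purely for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.IntUnaryOperator;

// Hypothetical model for illustration only; none of these names exist in HotSpot.
final class VectorRematSketch {
    static final int SLOT_BYTES = 4; // one 32-bit slot

    // Contiguous assumption: every slot directly follows the first one in memory.
    static int[] readContiguous(byte[] frame, int baseOffset, int numSlots) {
        return readPerSlot(frame, slot -> baseOffset + slot * SLOT_BYTES, numSlots);
    }

    // Per-element variant: the location of every slot is resolved independently,
    // so the parts of one vector value may live in different areas of the frame.
    static int[] readPerSlot(byte[] frame, IntUnaryOperator slotOffset, int numSlots) {
        ByteBuffer buf = ByteBuffer.wrap(frame).order(ByteOrder.LITTLE_ENDIAN);
        int[] payload = new int[numSlots];
        for (int slot = 0; slot < numSlots; slot++) {
            payload[slot] = buf.getInt(slotOffset.applyAsInt(slot));
        }
        return payload;
    }
}
```

With a value whose lower and upper halves are saved in different regions, `readContiguous` would pick up unrelated bytes for the upper half, while `readPerSlot` recovers the whole payload as long as the slot-to-offset function reflects the actual layout.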
Testing (with other relevant patches): - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX={3,2,1,0} on AVX512-capable hardware - [x] hs-precheckin-comp, hs-tier1, hs-tier2 ------------- Commit messages: - Improve vector rematerialization support Changes: https://git.openjdk.java.net/jdk/pull/1136/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1136&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256073 Stats: 136 lines in 2 files changed: 32 ins; 70 del; 34 mod Patch: https://git.openjdk.java.net/jdk/pull/1136.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1136/head:pull/1136 PR: https://git.openjdk.java.net/jdk/pull/1136 From github.com+58006833+xbzhang99 at openjdk.java.net Mon Nov 9 22:12:57 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 9 Nov 2020 22:12:57 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v3] In-Reply-To: <_NtWpo5mp-I1zmPtn883ezvoO1jA7WdJflfoFXcpL70=.96a06d93-47a4-48dd-88fc-f3c777315291@github.com> References: <_NtWpo5mp-I1zmPtn883ezvoO1jA7WdJflfoFXcpL70=.96a06d93-47a4-48dd-88fc-f3c777315291@github.com> Message-ID: On Mon, 2 Nov 2020 17:42:27 GMT, Joe Darcy wrote: >> Xubo Zhang has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8255368' of github.com:xbzhang99/bugfixes into JDK-8255368 >> - Added test cases for exp at the value of 1024 and 10000 > > The regression tests for exp should be explicitly updated to cover the previously erroneous input if they do not do so already. Hi Darcy, Where should the test be? A new test file? Best regards, Xubo From: Joe Darcy Sent: Monday, November 9, 2020 1:33 PM To: openjdk/jdk Cc: Zhang, Xubo ; Mention Subject: Re: [openjdk/jdk] 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms (#894) @jddarcy commented on this pull request. 
________________________________ In test/jdk/java/lang/Math/WorstCaseTests.java: > @@ -114,8 +114,8 @@ private static int testWorstExp() { {+0x1.A8EAD058BC6B8p3, 0x1.1D71965F516ADp19}, {+0x1.1D5C2DAEBE367p4, 0x1.A8C02E974C314p25}, {+0x1.C44CE0D716A1Ap4, 0x1.B890CA8637AE1p40}, - {+0x4.0p8, Double.POSITIVE_INFINITY}, - {+0x2.71p12, Double.POSITIVE_INFINITY}, + {+0x4.0p8, Double.POSITIVE_INFINITY}, //bug 8255368 gave 0 This is not an appropriate test to update to cover this bug. This test is specifically probing at difficult cases in double arithmetic for the underlying mathematically function as opposed to flaws in a particular implementation. ? You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From redestad at openjdk.java.net Mon Nov 9 23:13:56 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 9 Nov 2020 23:13:56 GMT Subject: RFR: 8256056: Deoptimization stub doesn't save vector registers on x86 In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 18:59:30 GMT, Vladimir Ivanov wrote: > Deoptimization stub doesn't save vector registers on x86 and it may break vector rematerialization when a vector value (produced through Vector API) ends up in a register (and not spilled on stack). > > The fix unconditionally saves all wide vector registers when going through deopt stub. If there are any performance concerns with that, it's possible to introduce a dedicated variant and choose between 2 versions when patching nmethods (like what happens with safepoint stubs). But deoptimization is already quite expensive to save much on that, so I decided to keep things as is. > > As a cleanup, I made `save_vectors` parameter explicit to make the intentions clearer. 
>
> Testing (with some other relevant patches):
> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware
> - [x] hs-precheckin-comp, hs-tier1, hs-tier2

Looks OK - on my AVX=2-enabled test system the added overhead of the extra work during bootstrap is negligible (~16k instructions).

-------------

Marked as reviewed by redestad (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1134

From sviswanathan at openjdk.java.net Mon Nov 9 23:30:55 2020
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Mon, 9 Nov 2020 23:30:55 GMT
Subject: RFR: 8256061: RegisterSaver::save_live_registers() omits upper halves of ZMM0-15 registers
In-Reply-To:
References: <3mXn6Hal7wiro0H9-_iK5qI_FCnQlTw0k051aqouKZU=.f2ec8278-df12-4157-ab5e-a18b716e56df@github.com>
Message-ID:

On Mon, 9 Nov 2020 21:34:26 GMT, Vladimir Ivanov wrote:

>> It looks like the upper 256 bits of ZMM0-15 are already being saved as part of the following statements in sharedRuntime_x86_64.cpp (lines 191-211):
>>
>>     if (save_vectors) {
>>       // Save upper half of YMM registers(0..15)
>>       int base_addr = XSAVE_AREA_YMM_BEGIN;
>>       for (int n = 0; n < 16; n++) {
>>         __ vextractf128_high(Address(rsp, base_addr+n*16), as_XMMRegister(n));
>>       }
>>       if (VM_Version::supports_evex()) {
>>         // Save upper half of ZMM registers(0..15)
>>         base_addr = XSAVE_AREA_ZMM_BEGIN;
>>         for (int n = 0; n < 16; n++) {
>>           __ vextractf64x4_high(Address(rsp, base_addr+n*32), as_XMMRegister(n));
>>         }
>>         // Save full ZMM registers(16..num_xmm_regs)
>>         base_addr = XSAVE_AREA_UPPERBANK;
>>         off = 0;
>>         int vector_len = Assembler::AVX_512bit;
>>         for (int n = 16; n < num_xmm_regs; n++) {
>>           __ evmovdqul(Address(rsp, base_addr+(off++*64)), as_XMMRegister(n), vector_len);
>>         }
>>       }
>>     }
>
> Thanks for the correction, Sandhya.
> You are absolutely right that the values are already saved.
> The problem is the `OopMap` doesn't reflect that and the patch fixes that. Thanks for the clarification, the changes looks good to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/1131 From sviswanathan at openjdk.java.net Tue Nov 10 00:07:00 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 10 Nov 2020 00:07:00 GMT Subject: RFR: 8256058: Improve vector register handling in RegisterMap::pd_location() on x86 In-Reply-To: <24Ohbh-GTkS0HUALP3xJi3RPILa63LFMPV_N6IJ9s8o=.34f1b8f0-08d2-4a39-bdbc-2b06ba6df1e0@github.com> References: <24Ohbh-GTkS0HUALP3xJi3RPILa63LFMPV_N6IJ9s8o=.34f1b8f0-08d2-4a39-bdbc-2b06ba6df1e0@github.com> Message-ID: <8UNcmlgDfe0n91qyB--BweQ1itzs4BNTu2gsuFmZ9YE=.f781e71c-9509-4d28-b538-334c15f63ec5@github.com> On Mon, 9 Nov 2020 16:54:56 GMT, Vladimir Ivanov wrote: > `RegisterMap` handles registers on a per-slot basis: every register is split into slot-sized (32-bit) parts that are tracked independently. On x86 vector registers are up to 512-bit in size and occupy up to 16 slots. In order to save on constructing RegisterMaps, `RegisterMap`s are sparsely populated: location of the first slot is recorded and the rest is derived from it and `RegisterMap::pd_location()` is used to compute the address of a particular slot if it is missing in the map. > > As I mentioned in #1131, frame layout for vector registers is quite complicated: ZMM0-15 are split in 3 parts (2 x 128-bit + 1 x 256-bit) when saved while ZMM16-31 are stored in full. > > Proposed patch reifies those details in `RegisterMap::pd_location()` logic and it becomes possible to initialize just 3 slot locations for ZMM0-15 to be able to recover every slot location inside the register while for ZMM16-31 initializing a single (base) slot is enough. > > Testing (along with some other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 Changes look good. 
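
To make the layout quoted above a bit more concrete, here is a minimal sketch (in Java, not the actual `RegisterMap::pd_location()` logic) of how a 32-bit slot index inside a 512-bit register could be mapped to the saved part it belongs to. It assumes that slots 0-3 form the low 128 bits, slots 4-7 the upper YMM half and slots 8-15 the upper ZMM half; the class `ZmmSlotLayout`, the `Part` enum and both methods are hypothetical names used only for illustration.

```java
// Hypothetical sketch; names are illustrative and do not exist in HotSpot.
final class ZmmSlotLayout {
    static final int SLOT_BYTES = 4;               // one 32-bit slot
    static final int SLOTS_PER_ZMM = 64 / SLOT_BYTES; // 16 slots per 512-bit register

    enum Part { XMM_LOW_128, YMM_HIGH_128, ZMM_HIGH_256, ZMM_FULL_512 }

    // Which independently stored part holds the given slot of register zmm<reg>.
    static Part partOf(int reg, int slot) {
        check(reg, slot);
        if (reg >= 16) return Part.ZMM_FULL_512;  // ZMM16-31 are stored in full
        if (slot < 4)  return Part.XMM_LOW_128;   // bits 0..127
        if (slot < 8)  return Part.YMM_HIGH_128;  // bits 128..255
        return Part.ZMM_HIGH_256;                 // bits 256..511
    }

    // Byte offset of the slot relative to the start of its part.
    static int offsetInPart(int reg, int slot) {
        int bytes = slot * SLOT_BYTES;
        switch (partOf(reg, slot)) {
            case YMM_HIGH_128: return bytes - 16;
            case ZMM_HIGH_256: return bytes - 32;
            default:           return bytes;      // XMM_LOW_128 and ZMM_FULL_512
        }
    }

    private static void check(int reg, int slot) {
        if (reg < 0 || reg > 31 || slot < 0 || slot >= SLOTS_PER_ZMM) {
            throw new IllegalArgumentException("reg=" + reg + ", slot=" + slot);
        }
    }
}
```

Under these assumptions, recording one base location per part (three for ZMM0-15, one for ZMM16-31) is enough to derive the address of any slot as base plus `offsetInPart`.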
------------- PR: https://git.openjdk.java.net/jdk/pull/1132 From kvn at openjdk.java.net Tue Nov 10 00:16:55 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 10 Nov 2020 00:16:55 GMT Subject: RFR: 8256058: Improve vector register handling in RegisterMap::pd_location() on x86 In-Reply-To: <24Ohbh-GTkS0HUALP3xJi3RPILa63LFMPV_N6IJ9s8o=.34f1b8f0-08d2-4a39-bdbc-2b06ba6df1e0@github.com> References: <24Ohbh-GTkS0HUALP3xJi3RPILa63LFMPV_N6IJ9s8o=.34f1b8f0-08d2-4a39-bdbc-2b06ba6df1e0@github.com> Message-ID: On Mon, 9 Nov 2020 16:54:56 GMT, Vladimir Ivanov wrote: > `RegisterMap` handles registers on a per-slot basis: every register is split into slot-sized (32-bit) parts that are tracked independently. On x86 vector registers are up to 512-bit in size and occupy up to 16 slots. In order to save on constructing RegisterMaps, `RegisterMap`s are sparsely populated: location of the first slot is recorded and the rest is derived from it and `RegisterMap::pd_location()` is used to compute the address of a particular slot if it is missing in the map. > > As I mentioned in #1131, frame layout for vector registers is quite complicated: ZMM0-15 are split in 3 parts (2 x 128-bit + 1 x 256-bit) when saved while ZMM16-31 are stored in full. > > Proposed patch reifies those details in `RegisterMap::pd_location()` logic and it becomes possible to initialize just 3 slot locations for ZMM0-15 to be able to recover every slot location inside the register while for ZMM16-31 initializing a single (base) slot is enough. > > Testing (along with some other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 Good ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1132 From kvn at openjdk.java.net Tue Nov 10 00:23:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 10 Nov 2020 00:23:57 GMT Subject: RFR: 8256061: RegisterSaver::save_live_registers() omits upper halves of ZMM0-15 registers In-Reply-To: <3mXn6Hal7wiro0H9-_iK5qI_FCnQlTw0k051aqouKZU=.f2ec8278-df12-4157-ab5e-a18b716e56df@github.com> References: <3mXn6Hal7wiro0H9-_iK5qI_FCnQlTw0k051aqouKZU=.f2ec8278-df12-4157-ab5e-a18b716e56df@github.com> Message-ID: <-F5bYNjrIidP-2Jeln2nLg-Vr9VJNKqDk2PUUNMCNGA=.20bfdfea-65cf-4f32-8b7c-c8a6621cf6e0@github.com> On Mon, 9 Nov 2020 16:44:23 GMT, Vladimir Ivanov wrote: > `YMM0-15` registers are handled specially when CPU registers are saved. They are split in 2 parts (128-bit each) and put in different parts of the frame (see `RegisterSaver::layout` for details). AVX512 adds 16 more vector registers (ZMM16-31) and those are saved full-sized in a separate region. But `RegisterSaver::save_live_registers()` doesn't do anything special for `ZMM0-15` and their upper halves are lost (though there's space reserved for them in the frame). > > The fix adds missing logic which saves upper halves (256-bit in size) of ZMM0-15 registers. Thus every ZMM0-15 register ends up split into 3 parts which are stored independently in the frame. > > Testing (with some other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 Good ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1131

From kvn at openjdk.java.net Tue Nov 10 00:43:58 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 10 Nov 2020 00:43:58 GMT
Subject: RFR: 8256056: Deoptimization stub doesn't save vector registers on x86
In-Reply-To:
References:
Message-ID:

On Mon, 9 Nov 2020 18:59:30 GMT, Vladimir Ivanov wrote:

> Deoptimization stub doesn't save vector registers on x86 and it may break vector rematerialization when a vector value (produced through Vector API) ends up in a register (and not spilled on stack).
>
> The fix unconditionally saves all wide vector registers when going through deopt stub. If there are any performance concerns with that, it's possible to introduce a dedicated variant and choose between 2 versions when patching nmethods (like what happens with safepoint stubs). But deoptimization is already quite expensive to save much on that, so I decided to keep things as is.
>
> As a cleanup, I made `save_vectors` parameter explicit to make the intentions clearer.
>
> Testing (with some other relevant patches):
> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware
> - [x] hs-precheckin-comp, hs-tier1, hs-tier2

We did record that compiled code has wide vectors. Maybe we need a specialized deopt stub for 512-bit vectors.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1134

From dongbo at openjdk.java.net Tue Nov 10 01:20:56 2020
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 10 Nov 2020 01:20:56 GMT
Subject: RFR: 8255949: AArch64: Add support for vectorized shift right and accumulate
In-Reply-To:
References:
Message-ID:

On Mon, 9 Nov 2020 16:08:04 GMT, Andrew Haley wrote:

>> This supports missing NEON shift right and accumulate instructions, i.e. SSRA and USRA, for AArch64 backend.
>>
>> Verified with linux-aarch64-server-release, tier1-3.
>>
>> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance test.
>> We witness about ~20% improvement with different basic types on Kunpeng916. The JMH results:
>> Benchmark                                        (count)  (seed)  Mode  Cnt    Score    Error  Units
>> # before, Kunpeng 916
>> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  146.259 ±  0.123  ns/op
>> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  454.781 ±  3.856  ns/op
>> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  938.842 ± 23.288  ns/op
>> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  205.493 ±  4.938  ns/op
>> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.483 ±  0.309  ns/op  (not vectorized)
>> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  220.847 ±  5.868  ns/op
>> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  442.587 ±  6.980  ns/op
>> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  936.289 ± 21.458  ns/op
>> # after shift right and accumulate, Kunpeng 916
>> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  125.586 ±  0.204  ns/op
>> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  365.973 ±  6.466  ns/op
>> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  804.605 ± 12.336  ns/op
>> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  170.123 ±  4.678  ns/op
>> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.779 ±  0.587  ns/op  (not vectorized)
>> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  185.799 ±  4.764  ns/op
>> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  364.360 ±  6.522  ns/op
>> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  800.737 ± 13.735  ns/op
>>
>> We checked the shiftURightAccumulateByte test, the performance stays same since it is not vectorized with or without this patch, due to:
>> src/hotspot/share/opto/vectornode.cpp, line 226:
>>   case Op_URShiftI:
>>     switch (bt) {
>>     case T_BOOLEAN:return Op_URShiftVB;
>>     case T_CHAR:   return Op_URShiftVS;
>>     case T_BYTE:
>>     case T_SHORT:  return 0; // Vector logical right shift for signed short
>>                              // values produces incorrect Java result for
>>                              // negative data because java code should convert
>>                              // a short value into int value with sign
>>                              // extension before a shift.
>>     case T_INT:    return Op_URShiftVI;
>>     default:       ShouldNotReachHere(); return 0;
>>     }
>> We also tried the existing vector operation micro urShiftB, i.e.:
>> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java, line 116
>>     @Benchmark
>>     public void urShiftB() {
>>         for (int i = 0; i < COUNT; i++) {
>>             resB[i] = (byte) (bytesA[i] >>> 3);
>>         }
>>     }
>> It is not vectorized either. It seems hard to match Java code with the URShiftVB node.
>
> Marked as reviewed by aph (Reviewer).

@theRealAph Thanks for the review.
I'll fix the register naming style of Base64.encode intrinsic in that PR as suggested.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1087

From dongbo at openjdk.java.net Tue Nov 10 01:28:57 2020
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 10 Nov 2020 01:28:57 GMT
Subject: Integrated: 8255949: AArch64: Add support for vectorized shift right and accumulate
In-Reply-To:
References:
Message-ID:

On Fri, 6 Nov 2020 03:36:57 GMT, Dong Bo wrote:

> This supports missing NEON shift right and accumulate instructions, i.e. SSRA and USRA, for AArch64 backend.
>
> Verified with linux-aarch64-server-release, tier1-3.
>
> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/VectorShiftAccumulate.java` for performance test.
> We witness about ~20% improvement with different basic types on Kunpeng916. The JMH results:
> Benchmark                                        (count)  (seed)  Mode  Cnt    Score    Error  Units
> # before, Kunpeng 916
> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  146.259 ±  0.123  ns/op
> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  454.781 ±  3.856  ns/op
> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  938.842 ± 23.288  ns/op
> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  205.493 ±  4.938  ns/op
> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.483 ±  0.309  ns/op  (not vectorized)
> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  220.847 ±  5.868  ns/op
> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  442.587 ±  6.980  ns/op
> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  936.289 ± 21.458  ns/op
> # after shift right and accumulate, Kunpeng 916
> VectorShiftAccumulate.shiftRightAccumulateByte      1028       0  avgt   10  125.586 ±  0.204  ns/op
> VectorShiftAccumulate.shiftRightAccumulateInt       1028       0  avgt   10  365.973 ±  6.466  ns/op
> VectorShiftAccumulate.shiftRightAccumulateLong      1028       0  avgt   10  804.605 ± 12.336  ns/op
> VectorShiftAccumulate.shiftRightAccumulateShort     1028       0  avgt   10  170.123 ±  4.678  ns/op
> VectorShiftAccumulate.shiftURightAccumulateByte     1028       0  avgt   10  905.779 ±  0.587  ns/op  (not vectorized)
> VectorShiftAccumulate.shiftURightAccumulateChar     1028       0  avgt   10  185.799 ±  4.764  ns/op
> VectorShiftAccumulate.shiftURightAccumulateInt      1028       0  avgt   10  364.360 ±  6.522  ns/op
> VectorShiftAccumulate.shiftURightAccumulateLong     1028       0  avgt   10  800.737 ± 13.735  ns/op
>
> We checked the shiftURightAccumulateByte test, the performance stays same since it is not vectorized with or without this patch, due to:
> src/hotspot/share/opto/vectornode.cpp, line 226:
>   case Op_URShiftI:
>     switch (bt) {
>     case T_BOOLEAN:return Op_URShiftVB;
>     case T_CHAR:   return Op_URShiftVS;
>     case T_BYTE:
>     case T_SHORT:  return 0; // Vector logical right shift for signed short
>                              // values produces incorrect Java result for
>                              // negative data because java code should convert
>                              // a short value into int value with sign
>                              // extension before a shift.
>     case T_INT:    return Op_URShiftVI;
>     default:       ShouldNotReachHere(); return 0;
>     }
> We also tried the existing vector operation micro urShiftB, i.e.:
> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java, line 116
>     @Benchmark
>     public void urShiftB() {
>         for (int i = 0; i < COUNT; i++) {
>             resB[i] = (byte) (bytesA[i] >>> 3);
>         }
>     }
> It is not vectorized either. It seems hard to match Java code with the URShiftVB node.

This pull request has now been integrated.

Changeset: f71f9dc9
Author: Dong Bo
Committer: Fei Yang
URL: https://git.openjdk.java.net/jdk/commit/f71f9dc9
Stats: 349 lines in 3 files changed: 349 ins; 0 del; 0 mod

8255949: AArch64: Add support for vectorized shift right and accumulate

Reviewed-by: aph

-------------

PR: https://git.openjdk.java.net/jdk/pull/1087

From zhaixiang at loongson.cn Tue Nov 10 01:31:31 2020
From: zhaixiang at loongson.cn (Leslie Zhai)
Date: Tue, 10 Nov 2020 09:31:31 +0800
Subject: [RFC] Introduce BreakAtCompileId
In-Reply-To:
References:
Message-ID: <684da2a3-9d2b-b0e8-c3f5-6c1f8a2556b7@loongson.cn>

Hi Mikael,

Thanks for your teaching!

On 2020-11-10 06:24, Mikael Vidstedt wrote:
> Please take this to hotspot-dev-compiler if you want to continue the discussion.
>
>
> FWIW "continue" does take an argument, would that work for your use case?

Yes! It should work for my use case.
It doesn't need to make the same wheel any more :)

>
> (gdb) help continue
> Continue program being debugged, after signal or breakpoint.
> Usage: continue [N]
> If proceeding from breakpoint, a number N may be used as an argument,
> which means to set the ignore count of that breakpoint to N - 1 (so that
> the breakpoint won't break until the Nth time it is reached).

NullCheck also generated signal SIGSEGV:

064     LD    Rd, [Rt + #12 (8-bit)] @ indOffset8     # compressed ptr @ loadN ! Field: java/lang/String.value
068     decode_heap_oop    Rd, Rt @ decodeHeapOop
070     MOV    Rd, [Rt + #12 (8-bit)] @ indOffset8 @ loadRange
=>074     NullCheck Rd

It needs to press `c` when the base register is indeed NULL by comparing with other CPUs. So `continue N-1` might not be precise.

Thanks,
Leslie Zhai

>
> If non-stop mode is enabled, continue only the current thread,
> otherwise all the threads in the program are continued. To
> continue all stopped threads in non-stop mode, use the -a option.
> Specifying -a and an ignore count simultaneously is an error.
>
>
> Cheers,
> Mikael
>
>
>> On Nov 9, 2020, at 7:48 AM, Leslie Zhai wrote:
>>
>> Hi,
>>
>> OptoBreakpoint is very useful for debugging C2 compiler when porting some cpu for jdk8u. It is able to `b breakpoint` for breaking at the prolog about for example Compile_id = 2. But when fixed several bugs for C2, it needs to press `c` time and time again for reaching the proper Compile_id.
>>
>> I `grep` Opto related options. But there is no such one to break at the precise Compile_id. Please teach me if there was :)
>>
>> So here comes BreakAtCompileId. It is able to break at the precise Compile_id which you want to debug.
>>
>>
>> gdb -ex=r --args $JAVA -Xcomp -XX:+PrintCompilation -XX:BreakAtCompileId=430 -version
>>
>>
>> (gdb) b breakpoint
>> Breakpoint 1 at 0x7ffff633b0d0: breakpoint. (133 locations)
>>
>> 6333 433 3 java.util.ImmutableCollections$List12::get (35 bytes) made not entrant
>>
>> Thread 2 "java" hit Breakpoint 1, 0x00007ffff711f410 in os::breakpoint() () at /home/zhaixiang/projects
>> 452 if (pslash != NULL) {
>> (gdb) si
>> 0x00007fffe01b46d1 in ?? ()
>> (gdb) x/22i $pc-44
>>    0x7fffe01b46a5: shl    $0x3,%edx
>>    0x7fffe01b46a8: movabs $0x800000000,%r11
>>    0x7fffe01b46b2: add    %r11,%r10
>>    0x7fffe01b46b5: cmp    %r10,%rax
>>    0x7fffe01b46b8: jne    0x7fffd87cf920
>>    0x7fffe01b46be: nop
>>    0x7fffe01b46bf: nop
>>    0x7fffe01b46c0: mov    %eax,-0x16000(%rsp)
>>    0x7fffe01b46c7: push   %rbp
>>    0x7fffe01b46c8: sub    $0x10,%rsp
>>    0x7fffe01b46cc: callq  0x7ffff711f410 <_ZN2os10breakpointEv>
>> => 0x7fffe01b46d1: mov    0xc(%rsi),%r11d
>>    0x7fffe01b46d5: mov    0x10(%rsi),%r10d
>>    0x7fffe01b46d9: cmp    %r11d,%r10d
>>    0x7fffe01b46dc: je     0x7fffe01b46f6
>>    0x7fffe01b46de: mov    $0x1,%eax
>>    0x7fffe01b46e3: add    $0x10,%rsp
>>    0x7fffe01b46e7: pop    %rbp
>>    0x7fffe01b46e8: cmp    0x128(%r15),%rsp
>>    0x7fffe01b46ef: ja     0x7fffe01b46fa
>>    0x7fffe01b46f5: retq
>>    0x7fffe01b46f6: xor    %eax,%eax
>> (gdb) i r
>> rax            0x7fffe01b46c0      140736953272000
>> rbx            0x8004699e8         34364365288
>> rcx            0x1                 1
>> rdx            0x7fffc5068980      140736498928000
>> rsi            0x62a020b40         26474580800
>> rdi            0x1                 1
>> rbp            0x7ffff5a668d0      0x7ffff5a668d0
>> rsp            0x7ffff5a66290      0x7ffff5a66290
>> r8             0x80008eeb0         34360323760
>> r9             0x7ffb4cf60         34354810720
>> r10            0x800012488         34359813256
>> r11            0x800000000         34359738368
>> r12            0x0                 0
>> r13            0x7292dae5          1922226917
>> r14            0x62a020818         26474579992
>> r15            0x7ffff00271e0      140737220080096
>> rip            0x7fffe01b46d1      0x7fffe01b46d1
>> eflags         0x206               [ PF IF ]
>> cs             0x33                51
>> ss             0x2b                43
>> ds             0x0                 0
>> es             0x0                 0
>> fs             0x0                 0
>> gs             0x0                 0
>>
>>
>> Thanks,
>> Leslie Zhai
>>

From ngasson at openjdk.java.net Tue Nov 10 03:24:02 2020
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Tue, 10 Nov 2020 03:24:02 GMT
Subject: RFR: 8256025: AArch64: MachCallRuntimeNode::ret_addr_offset() is incorrect for stub calls
Message-ID:
The PR for JDK-8254231 introduces a new assertion in opto/output.cpp to check the current instruction offset against the offset of the call return address reported by ret_addr_offset(). This fails on AArch64 because MachCallRuntimeNode::ret_addr_offset() claims four instructions are generated for a stub call (far branch) but actually it's just one (blr to stub or trampoline). Tested tier1. ------------- Commit messages: - 8256025: AArch64: MachCallRuntimeNode::ret_addr_offset() is incorrect for stub calls Changes: https://git.openjdk.java.net/jdk/pull/1138/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1138&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256025 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1138.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1138/head:pull/1138 PR: https://git.openjdk.java.net/jdk/pull/1138 From mikael.vidstedt at oracle.com Tue Nov 10 04:45:14 2020 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Mon, 9 Nov 2020 20:45:14 -0800 Subject: [RFC] Introduce BreakAtCompileId In-Reply-To: <684da2a3-9d2b-b0e8-c3f5-6c1f8a2556b7@loongson.cn> References: <684da2a3-9d2b-b0e8-c3f5-6c1f8a2556b7@loongson.cn> Message-ID: > On Nov 9, 2020, at 5:31 PM, Leslie Zhai wrote: > > Hi Mikael, > > Thanks for your teaching! > > > ? 2020?11?10? 06:24, Mikael Vidstedt ??: >> Please take this to hotspot-dev-compiler if you want to continue the discussion. >> >> >> FWIW ?continue? does take an argument, would that work for your use case? > > Yes! It should work for my use case. It doesn't need to make the same wheel any more :) > >> >> (gdb) help continue >> Continue program being debugged, after signal or breakpoint. >> Usage: continue [N] >> If proceeding from breakpoint, a number N may be used as an argument, >> which means to set the ignore count of that breakpoint to N - 1 (so that >> the breakpoint won't break until the Nth time it is reached). 
> > NullCheck also generated signal SIGSEGV: > > > 064 LD Rd, [Rt + #12 (8-bit)] @ indOffset8 # compressed ptr @ loadN ! Field: java/lang/String.value > 068 decode_heap_oop Rd, Rt @ decodeHeapOop > 070 MOV Rd, [Rt + #12 (8-bit)] @ indOffset8 @ loadRange > =>074 NullCheck Rd > > > It needs to press `c` when the base register is indeed NULL by comparing with other CPUs. So continue N -1 might not precise. Indeed. You may or may not be interested in another gdb command: handle SIGSEGV nostop noprint There are of course pros and cons to ignoring that and other signals like it. Cheers, Mikael > > Thanks, > Leslie Zhai > >> >> If non-stop mode is enabled, continue only the current thread, >> otherwise all the threads in the program are continued. To >> continue all stopped threads in non-stop mode, use the -a option. >> Specifying -a and an ignore count simultaneously is an error. >> >> >> Cheers, >> Mikael >> >> >>> On Nov 9, 2020, at 7:48 AM, Leslie Zhai wrote: >>> >>> Hi, >>> >>> OptoBreakpoint is very useful for debugging C2 compiler when porting some cpu for jdk8u. It is able to `b breakpoint` for breaking at the prolog about for example Compile_id = 2. But when fixed several bugs for C2, it needs to press `c` times and times again for reaching the proper Compile_id. >>> >>> I `grep` Opto related options. But there is no such one to break at the precise Compile_id. Please teach me if there was :) >>> >>> So here comes BreakAtCompileId. It is able to break at the precise Compile_id which you want to debug. >>> >>> >>> gdb -ex=r --args $JAVA -Xcomp -XX:+PrintCompilation -XX:BreakAtCompileId=430 -version >>> >>> >>> (gdb) b breakpoint >>> Breakpoint 1 at 0x7ffff633b0d0: breakpoint. (133 locations) >>> >>> 6333 433 3 java.util.ImmutableCollections$List12::get (35 bytes) made not entrant >>> >>> Thread 2 "java" hit Breakpoint 1, 0x00007ffff711f410 in os::breakpoint() () at /home/zhaixiang/projects >>> 452 if (pslash != NULL) { >>> (gdb) si >>> 0x00007fffe01b46d1 in ?? 
()
>>> (gdb) x/22i $pc-44
>>> 0x7fffe01b46a5: shl $0x3,%edx
>>> 0x7fffe01b46a8: movabs $0x800000000,%r11
>>> 0x7fffe01b46b2: add %r11,%r10
>>> 0x7fffe01b46b5: cmp %r10,%rax
>>> 0x7fffe01b46b8: jne 0x7fffd87cf920
>>> 0x7fffe01b46be: nop
>>> 0x7fffe01b46bf: nop
>>> 0x7fffe01b46c0: mov %eax,-0x16000(%rsp)
>>> 0x7fffe01b46c7: push %rbp
>>> 0x7fffe01b46c8: sub $0x10,%rsp
>>> 0x7fffe01b46cc: callq 0x7ffff711f410 <_ZN2os10breakpointEv>
>>> => 0x7fffe01b46d1: mov 0xc(%rsi),%r11d
>>> 0x7fffe01b46d5: mov 0x10(%rsi),%r10d
>>> 0x7fffe01b46d9: cmp %r11d,%r10d
>>> 0x7fffe01b46dc: je 0x7fffe01b46f6
>>> 0x7fffe01b46de: mov $0x1,%eax
>>> 0x7fffe01b46e3: add $0x10,%rsp
>>> 0x7fffe01b46e7: pop %rbp
>>> 0x7fffe01b46e8: cmp 0x128(%r15),%rsp
>>> 0x7fffe01b46ef: ja 0x7fffe01b46fa
>>> 0x7fffe01b46f5: retq
>>> 0x7fffe01b46f6: xor %eax,%eax
>>> (gdb) i r
>>> rax 0x7fffe01b46c0 140736953272000
>>> rbx 0x8004699e8 34364365288
>>> rcx 0x1 1
>>> rdx 0x7fffc5068980 140736498928000
>>> rsi 0x62a020b40 26474580800
>>> rdi 0x1 1
>>> rbp 0x7ffff5a668d0 0x7ffff5a668d0
>>> rsp 0x7ffff5a66290 0x7ffff5a66290
>>> r8 0x80008eeb0 34360323760
>>> r9 0x7ffb4cf60 34354810720
>>> r10 0x800012488 34359813256
>>> r11 0x800000000 34359738368
>>> r12 0x0 0
>>> r13 0x7292dae5 1922226917
>>> r14 0x62a020818 26474579992
>>> r15 0x7ffff00271e0 140737220080096
>>> rip 0x7fffe01b46d1 0x7fffe01b46d1
>>> eflags 0x206 [ PF IF ]
>>> cs 0x33 51
>>> ss 0x2b 43
>>> ds 0x0 0
>>> es 0x0 0
>>> fs 0x0 0
>>> gs 0x0 0
>>>
>>>
>>> Thanks,
>>> Leslie Zhai
>>>
>

From zhaixiang at loongson.cn Tue Nov 10 06:07:47 2020
From: zhaixiang at loongson.cn (Leslie Zhai)
Date: Tue, 10 Nov 2020 14:07:47 +0800
Subject: [RFC] Introduce BreakAtCompileId
In-Reply-To:
References: <684da2a3-9d2b-b0e8-c3f5-6c1f8a2556b7@loongson.cn>
Message-ID: <6c3f9e9f-a83e-98d7-6cc6-2afa1d46c62b@loongson.cn>

On 2020-11-10 12:45, Mikael Vidstedt wrote:
>
>> On Nov 9, 2020, at 5:31 PM, Leslie Zhai wrote:
>>
>> NullCheck also generated signal SIGSEGV:
>>
>>
>> 064 LD Rd, [Rt + #12 (8-bit)] @ indOffset8 # compressed ptr @ loadN ! Field: java/lang/String.value
>> 068 decode_heap_oop Rd, Rt @ decodeHeapOop
>> 070 MOV Rd, [Rt + #12 (8-bit)] @ indOffset8 @ loadRange
>> =>074 NullCheck Rd
>>
>>
>> It needs to press `c` when the base register is indeed NULL by comparing with other CPUs. So continue N -1 might not precise.
> Indeed. You may or may not be interested in another gdb command: handle SIGSEGV nostop noprint

But some NullChecks should not be NULL, so I couldn't just echo `handle SIGSEGV nostop noprint` >> ~/.gdbinit

I fixed a NullCheck related bug for C2 compiler this morning :)

Cheers,
Leslie Zhai

> There are of course pros and cons to ignoring that and other signals like it.
>
> Cheers,
> Mikael

From shade at openjdk.java.net Tue Nov 10 06:32:57 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 10 Nov 2020 06:32:57 GMT
Subject: Integrated: 8256036: Shenandoah: MethodHandles adapters section overflows after JDK-8255762
In-Reply-To:
References:
Message-ID:

On Mon, 9 Nov 2020 11:31:16 GMT, Aleksey Shipilev wrote:

> This seems to happen only under release bits, and only with Shenandoah. This shows up in tier1 after JDK-8255762, but only in release builds. The section in question is "MethodHandles adapters".
> > $ CONF=linux-x86_64-server-release make run-test TEST=java/lang/invoke/6987555/Test6987555.java TEST_VM_OPTS="-XX:+UseShenandoahGC" > > STDOUT: > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (codeBuffer.cpp:971), pid=907468, tid=907470 > # guarantee(sect->end() <= tend) failed: sanity: MethodHandles adapters, 0x00007f76902a9da9, 0x00007f76902a9d38 > # > # JRE version: (16.0) (build ) > # Java VM: OpenJDK 64-Bit Server VM (16-internal+0-adhoc.shade.jdk, interpreted mode, sharing, compressed oops, shenandoah gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x58f47f] CodeBuffer::verify_section_allocation()+0x1ff > > Additional testing: > - [x] `java/lang/invoke` tests on Linux x86_64 {fastdebug,release} with Shenandoah > - [x] `java/lang/invoke` tests on Linux x86_32 {fastdebug,release} with Shenandoah > - [x] `java/lang/invoke` tests on Linux AArch64 {fastdebug,release} with Shenandoah This pull request has now been integrated. Changeset: 01567b51 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/01567b51 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8256036: Shenandoah: MethodHandles adapters section overflows after JDK-8255762 Reviewed-by: jiefu, redestad ------------- PR: https://git.openjdk.java.net/jdk/pull/1121 From xxinliu at amazon.com Tue Nov 10 07:49:12 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 10 Nov 2020 07:49:12 +0000 Subject: [RFC] Introduce BreakAtCompileId In-Reply-To: <6c3f9e9f-a83e-98d7-6cc6-2afa1d46c62b@loongson.cn> References: <684da2a3-9d2b-b0e8-c3f5-6c1f8a2556b7@loongson.cn> <6c3f9e9f-a83e-98d7-6cc6-2afa1d46c62b@loongson.cn> Message-ID: <91A0558B-03A2-44E6-9455-5DAF39A41460@amazon.com> Hi, Leslie, Actually, there're 2 compile_id-specific flags -- CIBreakAt and CIBreakAtOSR in global.hpp They are now not in use due to JDK-8255216. In my understanding, compiler driver simply ignores them. 
If they are in effect, I believe you should stop at both compiler-and-run-time.

May I know what's the meaning of RFC in the subject? Are you proposing a new feature?
I've seen some RFC emails in hotspot maillists, but I am still not familiar with the formal process of a proposal. I know JEP, but it's more formal and for bigger project, right?
Would you mind sharing more information about the RFC process?

Thanks,
--lx

From chagedorn at openjdk.java.net Tue Nov 10 08:18:55 2020
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Tue, 10 Nov 2020 08:18:55 GMT
Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 6 Nov 2020 16:16:01 GMT, Vladimir Kozlov wrote:

>> Christian Hagedorn has updated the pull request incrementally with one additional
commit since the last revision:
>>
>> Update comments and invariant selection in offset_plus_k
>
> Updated changes looks good

@vnkozlov Thanks a lot for the careful review!

-------------

PR: https://git.openjdk.java.net/jdk/pull/954

From zhaixiang at loongson.cn Tue Nov 10 08:21:23 2020
From: zhaixiang at loongson.cn (Leslie Zhai)
Date: Tue, 10 Nov 2020 16:21:23 +0800
Subject: [RFC] Introduce BreakAtCompileId
In-Reply-To: <91A0558B-03A2-44E6-9455-5DAF39A41460@amazon.com>
References: <684da2a3-9d2b-b0e8-c3f5-6c1f8a2556b7@loongson.cn> <6c3f9e9f-a83e-98d7-6cc6-2afa1d46c62b@loongson.cn> <91A0558B-03A2-44E6-9455-5DAF39A41460@amazon.com>
Message-ID:

Hi Liu,

Thanks for your kind response!

On 2020-11-10 15:49, Liu, Xin wrote:
> Hi, Leslie,
>
> Actually, there're 2 compile_id-specific flags -- CIBreakAt and CIBreakAtOSR in global.hpp
> They are now not in use due to JDK-8255216. In my understanding, compiler driver simply ignores them.
> If they are in effect, I believe you should stop at both compiler-and-run-time.

Thanks for the information about JDK-8255216. We are debugging C2 and have disabled the C1 compiler for some CPU, so perhaps there is no need to consider breaking at OSR or that sort of thing. Please point out my fault :)

>
> May I know what's the meaning of RFC in the subject? Are you proposing a new feature?
> I've seen some RFC emails in hotspot maillists, but I am still not familiar with the formal process of a proposal. I know JEP, but it's more formal and for bigger project, right?
> Would you mind sharing more information about the RFC process?

Ao Qi, our leader, suggested that I request comments on BreakAtC2CompileId. Actually the name `BreakAtCompileId` is not precise. In the very beginning, `java -Xcomp -XX:+OptoBreakpoint -version` was enough to debug the first, second, and other methods with a compile_id below 10 for the C2 compiler. But after fixing several bugs, and especially because `handle SIGSEGV nostop noprint` cannot be used due to some NullCheck related bugs, I introduced `BreakAtCompileId` for internal debugging purposes. Sorry that we are so busy debugging the execute thread for the C2 compiler, so that is all the information I can share.

Thanks,
Leslie Zhai

From rcastanedalo at openjdk.java.net Tue Nov 10 08:30:08 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Tue, 10 Nov 2020 08:30:08 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v4]
In-Reply-To:
<9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
Message-ID:

> Prevent exponential number of calls to `ConvI2LNode::Ideal()` when AddIs are used multiple times by other AddIs in the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)). This is achieved by (1) reusing existing ConvI2Ls if possible rather than eagerly creating new ones and (2) postponing the optimization of newly created ConvI2Ls. Remove hook node solution introduced in [8217359](https://github.com/openjdk/jdk/commit/cf554816d1952f722143e9d03ec669e80f955adf), since this is subsumed by (2). Use `phase->is_IterGVN()` rather than `can_reshape` to check if `ConvI2LNode::Ideal()` is called within iterative GVN, for clarity. Add regression tests that cover different shapes and sizes of AddI subgraphs, implicitly checking (by not timing out) that there is no combinatorial explosion.

Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:

 - Use 'hash_find_insert' to look for existing ConvI2L nodes
 - Merge master
 - Update tests

   Simplify JVM arguments and run each test case 100000 times to still trigger C2. Use randomization to avoid constant propagation in C2. Increase the load of the stress tests and their timeout to 30s to further reduce the risk of false positives.
 - Merge master
 - Generalize the fix to handle any input where AddIs are used multiple times by other AddIs, which could also lead to an exponential number of calls to ConvI2LNode::Ideal(). This is achieved by (1) reusing existing ConvI2Ls if possible rather than eagerly creating new ones and (2) postponing the optimization of newly created ConvI2Ls. Remove "hook" node solution introduced in JDK-8217359 since this is subsumed by (2). Test that ConvI2LNode::Ideal() is called within iterative GVN using phase->is_IterGVN() rather than can_reshape, for clarity.

   Merge all tests into a single class. Reimplement the microbenchmark as a test case that should time out in case of a combinatorial explosion. Add a second similar microbenchmark that demonstrates the need for this generalization.
 - Merge master
 - 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially

   In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within ConvI2LNode::Ideal(), handle the special case x = y by feeding both inputs of AddL from a single ConvI2L node rather than creating two semantically equivalent ConvI2L nodes. This avoids an exponential number of calls to ConvI2LNode::Ideal() when dealing with long chains of AddI nodes. Disable the optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero during parsing anyway. Add a set of regression tests for the transformation that cover different shapes of AddI subgraphs. Also add a microbenchmark that exercises the special case, for performance regression testing.
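
The blow-up described in the commit messages above can be illustrated with a small sketch. This is not the HotSpot code: `Node`, `convertNaive`, and `convertWithReuse` are hypothetical names, and the model only counts how often a conversion would be pushed through a chain in which every add is used twice by the next one. Without reuse the count doubles per level; with one conversion per distinct node it grows linearly.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class ConvI2LSharingSketch {
    // Hypothetical stand-in for an int node: a leaf has no inputs, an add has two.
    static final class Node {
        final Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
        boolean isLeaf() { return left == null; }
    }

    static long naiveCalls;
    static long reuseCalls;

    // Creates a fresh conversion for every use of an input, shared or not.
    static void convertNaive(Node n) {
        naiveCalls++;
        if (!n.isLeaf()) {
            convertNaive(n.left);
            convertNaive(n.right);
        }
    }

    // Reuses the conversion already created for a given input node.
    static void convertWithReuse(Node n, Map<Node, Boolean> done) {
        if (done.putIfAbsent(n, Boolean.TRUE) != null) {
            return; // this node was converted before, nothing new to do
        }
        reuseCalls++;
        if (!n.isLeaf()) {
            convertWithReuse(n.left, done);
            convertWithReuse(n.right, done);
        }
    }

    // Builds x, x+x, (x+x)+(x+x), ... so that each add is used twice by the next.
    static Node chain(int depth) {
        Node n = new Node(null, null);
        for (int i = 0; i < depth; i++) {
            n = new Node(n, n);
        }
        return n;
    }

    // Returns {calls without reuse, calls with reuse} for a chain of the given depth.
    static long[] count(int depth) {
        naiveCalls = 0;
        reuseCalls = 0;
        Node root = chain(depth);
        convertNaive(root);
        convertWithReuse(root, new IdentityHashMap<>());
        return new long[] { naiveCalls, reuseCalls };
    }

    public static void main(String[] args) {
        long[] c = count(20);
        System.out.println("naive=" + c[0] + " reuse=" + c[1]);
    }
}
```

For a chain of depth d the first counter is 2^(d+1) - 1 and the second is d + 1, which is the difference between a test that times out and one that finishes immediately.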
-------------

Changes: https://git.openjdk.java.net/jdk/pull/727/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=727&range=03
  Stats: 190 lines in 2 files changed: 180 ins; 4 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/727.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/727/head:pull/727

PR: https://git.openjdk.java.net/jdk/pull/727

From rcastanedalo at openjdk.java.net Tue Nov 10 08:39:56 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Tue, 10 Nov 2020 08:39:56 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v3]
In-Reply-To:
References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> <2zse5jr98dv4fKliUBNS-TKg1-XPe5zwBeJMhPXN3Ic=.aa4a0672-f863-4358-a8f8-3237b21575dc@github.com>
Message-ID:

On Thu, 22 Oct 2020 17:03:13 GMT, Vladimir Kozlov wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update tests
>>
>> Simplify JVM arguments and run each test case 100000 times to still trigger
>> C2. Use randomization to avoid constant propagation in C2. Increase the load of
>> the stress tests and their timeout to 30s to further reduce the risk of false
>> positives.
>
> Good.

Thanks! I just simplified the PR by using IGVN's hash table mechanism rather than custom logic to find existing ConvI2L nodes (thanks to Vladimir Ivanov for the suggestion). The new version is tested on tier1-3. Please review.
------------- PR: https://git.openjdk.java.net/jdk/pull/727 From aph at openjdk.java.net Tue Nov 10 08:52:54 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 10 Nov 2020 08:52:54 GMT Subject: RFR: 8256025: AArch64: MachCallRuntimeNode::ret_addr_offset() is incorrect for stub calls In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 03:19:23 GMT, Nick Gasson wrote: > The PR for JDK-8254231 introduces a new assertion in opto/output.cpp to > check the current instruction offset against the offset of the call > return address reported by ret_addr_offset(). This fails on AArch64 > because MachCallRuntimeNode::ret_addr_offset() claims four instructions > are generated for a stub call (far branch) but actually it's just > one (blr to stub or trampoline). > > Tested tier1. So here's a weird thing: this code has been wrong forever, but apparently it never mattered. I wonder why it didn't break anything before now. ------------- Marked as reviewed by aph (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1138 From jiefu at openjdk.java.net Tue Nov 10 09:22:54 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 10 Nov 2020 09:22:54 GMT Subject: RFR: 8256009: Remove src/hotspot/share/adlc/Test/i486.ad [v2] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 11:22:13 GMT, Aleksey Shipilev wrote: > I like this, thanks! Make sure compiler folks approve this. Thanks @shipilev for your review. May I get one more review from the hotspot compiler team? Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1107 From vlivanov at openjdk.java.net Tue Nov 10 09:56:02 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 10 Nov 2020 09:56:02 GMT Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v4] In-Reply-To: References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> Message-ID: On Tue, 10 Nov 2020 08:30:08 GMT, Roberto Casta?eda Lozano wrote: >> Prevent exponential number of calls to `ConvI2LNode::Ideal()` when AddIs are used multiple times by other AddIs in the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)). This is achieved by (1) reusing existing ConvI2Ls if possible rather than eagerly creating new ones and (2) postponing the optimization of newly created ConvI2Ls. Remove hook node solution introduced in [8217359](https://github.com/openjdk/jdk/commit/cf554816d1952f722143e9d03ec669e80f955adf), since this is subsumed by (2). Use `phase->is_IterGVN()` rather than `can_reshape` to check if `ConvI2LNode::Ideal()` is called within iterative GVN, for clarity. Add regression tests that cover different shapes and sizes of AddI subgraphs, implicitly checking (by not timing out) that there is no combinatorial explosion. > > Roberto Casta?eda Lozano has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Use 'hash_find_insert' to look for existing ConvI2L nodes > - Merge master > - Update tests > > Simplify JVM arguments and run each test case 100000 times to still trigger > C2. Use randomization to avoid constant propagation in C2. Increase the load of > the stress tests and their timeout to 30s to further reduce the risk of false > positives. 
> - Merge master > - Generalize the fix to handle any input where AddIs are used multiple times by > other AddIs, which could also lead to an exponential number of calls to > ConvI2LNode::Ideal(). This is achieved by (1) reusing existing ConvI2Ls if > possible rather than eagerly creating new ones and (2) postponing the > optimization of newly created ConvI2Ls. Remove "hook" node solution introduced > in JDK-8217359 since this is subsumed by (2). Test that ConvI2LNode::Ideal() is > called within iterative GVN using phase->is_IterGVN() rather than can_reshape, > for clarity. > > Merge all tests into a single class. Reimplement the microbenchmark as a test > case that should time out in case of a combinatorial explosion. Add a second > similar microbenchmark that demonstrates the need for this generalization. > - Merge master > - 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially > > In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within > ConvI2LNode::Ideal(), handle the special case x = y by feeding both inputs of > AddL from a single ConvI2L node rather than creating two semantically equivalent > ConvI2L nodes. This avoids an exponential number of calls to > ConvI2LNode::Ideal() when dealing with long chains of AddI nodes. Disable the > optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero > during parsing anyway. Add a set of regression tests for the transformation that > cover different shapes of AddI subgraphs. Also add a microbenchmark that > exercises the special case, for performance regression testing. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
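A rough illustration of why reusing existing ConvI2L nodes matters, written as a toy Java model rather than C2 code. `Node`, `convertNaive` and `convertReusing` are hypothetical names, and the identity set only stands in for the lookup of already existing conversion nodes described in the commit messages above. In a chain where every add feeds both inputs of the next add, converting each use independently visits an exponential number of nodes, while remembering converted nodes visits each node once.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Toy model only, not HotSpot code. A Node is either a leaf or an add of two inputs.
final class ConvPushDownSketch {
    static final class Node {
        final Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
        boolean isLeaf() { return left == null; }
    }

    // Builds n_0 = leaf and n_{i+1} = Add(n_i, n_i), so each add is used twice.
    static Node chain(int depth) {
        Node n = new Node(null, null);
        for (int i = 0; i < depth; i++) {
            n = new Node(n, n);
        }
        return n;
    }

    // Pushes a conversion through every use separately; returns the number of visits.
    static long convertNaive(Node n) {
        if (n.isLeaf()) return 1;
        return 1 + convertNaive(n.left) + convertNaive(n.right);
    }

    // Skips nodes that were already converted (assumed analogue of reusing an
    // existing ConvI2L); returns the number of visits.
    static long convertReusing(Node n, Set<Node> converted) {
        if (!converted.add(n)) return 0;
        if (n.isLeaf()) return 1;
        return 1 + convertReusing(n.left, converted) + convertReusing(n.right, converted);
    }

    static long countNaive(int depth) {
        return convertNaive(chain(depth));
    }

    static long countReusing(int depth) {
        Set<Node> converted = Collections.newSetFromMap(new IdentityHashMap<>());
        return convertReusing(chain(depth), converted);
    }

    public static void main(String[] args) {
        System.out.println("naive visits:   " + countNaive(20));
        System.out.println("reusing visits: " + countReusing(20));
    }
}
```

For a chain of depth d the naive count follows T(d) = 1 + 2*T(d-1), while the reusing variant visits d + 1 nodes. The real fix additionally postpones the optimization of newly created ConvI2Ls, which this sketch does not model.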
PR: https://git.openjdk.java.net/jdk/pull/727 From rcastanedalo at openjdk.java.net Tue Nov 10 10:02:55 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 10 Nov 2020 10:02:55 GMT Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v4] In-Reply-To: References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> Message-ID: On Tue, 10 Nov 2020 09:53:33 GMT, Vladimir Ivanov wrote: > Looks good. Thanks for reviewing, Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/727 From jvernee at openjdk.java.net Tue Nov 10 11:06:01 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 10 Nov 2020 11:06:01 GMT Subject: RFR: 8256025: AArch64: MachCallRuntimeNode::ret_addr_offset() is incorrect for stub calls In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 08:49:45 GMT, Andrew Haley wrote: >> The PR for JDK-8254231 introduces a new assertion in opto/output.cpp to >> check the current instruction offset against the offset of the call >> return address reported by ret_addr_offset(). This fails on AArch64 >> because MachCallRuntimeNode::ret_addr_offset() claims four instructions >> are generated for a stub call (far branch) but actually it's just >> one (blr to stub or trampoline). >> >> Tested tier1. > > So here's a weird thing: this code has been wrong forever, but apparently it never mattered. I wonder why it didn't break anything before now. @theRealAph The bug that this catches manifests when the reported return offset lines up _exactly_ with that of a later call. In that case two calls will use the same PC for their oop map, and one will be overwritten. Maybe we've been lucky that this is never actually the case for ARM, but I'd imagine the oop map annotation in PrintAssembly output might be in the wrong place, and the oop map offset it prints should be wrong.
But, as long as the return PC it reports is unique, I don't think it will cause an immediate problem to functionality. ------------- PR: https://git.openjdk.java.net/jdk/pull/1138 From neliasso at openjdk.java.net Tue Nov 10 11:28:58 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 10 Nov 2020 11:28:58 GMT Subject: Integrated: 8255011: [TESTBUG] compiler/codecache/stress/UnexpectedDeoptimizationAllTest.java timed out In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 08:14:29 GMT, Nils Eliasson wrote: > Hi, > > This patch updates the code cache stress tests. I haven't been able to reproduce or retrieve a core file. > > What I can see is that the tests provoke compile storms, and that the single C1 thread (on a 4CPU system) sometimes has trouble keeping up. A factor may also be that the tests' run time scales with the timeout time - so that the time allotted as margin before the timeout is only 20% of the total runtime. Combining this with Xcomp, and that the test may run concurrently with other stress tests, it is reasonable that a timeout may occur. > > I suggest capping the tests at 60 seconds of testing. I've experimented with measuring how much work is done and using that as a metric - but the different tests that use the CodeCacheStressRunner have completely different profiles. > > In UnexpectedDeoptimizationAllTest.java I have adjusted the sleep time to 100 millis between the invalidations of the entire code cache. > > In UnexpectedDeoptimizationTest.java I have added a sleep of 10 millis between deoptimizing parts of the stack. The idea is to give the stack time to grow a bit before the next deoptimization. Otherwise the test might end up running mostly in the interpreter. > > Please review, > Nils Eliasson This pull request has now been integrated.
Changeset: e281b135 Author: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/e281b135 Stats: 12 lines in 2 files changed: 8 ins; 2 del; 2 mod 8255011: [TESTBUG] compiler/codecache/stress/UnexpectedDeoptimizationAllTest.java timed out Change CodeCacheStressRunner to have a 60 second test time Reviewed-by: iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/1030 From mdoerr at openjdk.java.net Tue Nov 10 11:52:55 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 10 Nov 2020 11:52:55 GMT Subject: Integrated: 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 16:46:43 GMT, Martin Doerr wrote: > JDK-8237363 introduced an "assert(Universe::heap()->is_in..." check in the CompressedOops::decode functions. > This assertion restricts the usability of the decode functions. There are periods of time (during GC) at which we can't use "Universe::heap()->is_in" because the pointer gets switched between old and new location, but "Universe::heap()->is_in" is not yet accurate. > PPC64 code has a usage of CompressedOops::decode which is affected by this problem. (It was observed with SerialGC, see JBS.) > We could also use a weaker assertion, but it seems like other people value the stronger assertion more. So I suggest using decode_raw as a workaround for PPC64. This pull request has now been integrated.
Changeset: 9d07259f Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/9d07259f Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod 8255598: [PPC64] assert(Universe::heap()->is_in(result)) failed: object not in heap Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/1078 From vlivanov at openjdk.java.net Tue Nov 10 12:39:14 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 10 Nov 2020 12:39:14 GMT Subject: RFR: 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails [v2] In-Reply-To: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> References: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> Message-ID: > -XX:+PrintDeoptimizationDetails triggers intermittent crashes. I spotted 2 independent problems which the patch addresses: > * `markWord::print_on` doesn't handle displaced header case well (the pointer stored in the header may be stale); > * `InstanceKlass::oop_print_value_on` dumps some specific details about `MemberName`, but the code assumes the instance is fully initialized. It's not necessarily the case: for example, deoptimization can happen when `MemberName` constructor is being executed.
> > Testing: > - [x] manually verified that the crashes go away -XX:+PrintDeoptimizationDetails > - [x] hs-precheckin-comp,hs-tier1,hs-tier2 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1124/files - new: https://git.openjdk.java.net/jdk/pull/1124/files/a41c82cd..89550ba2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1124&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1124&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1124.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1124/head:pull/1124 PR: https://git.openjdk.java.net/jdk/pull/1124 From vlivanov at openjdk.java.net Tue Nov 10 12:39:15 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 10 Nov 2020 12:39:15 GMT Subject: RFR: 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails [v2] In-Reply-To: References: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> Message-ID: On Mon, 9 Nov 2020 21:03:05 GMT, Daniel D. Daugherty wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > Looks good. Your call on whether to add the comment I proposed. Thanks for the reviews, Vladimir and Dan! > src/hotspot/share/runtime/basicLock.cpp line 34: > >> 32: markWord mark_word = displaced_header(); >> 33: if (mark_word.value() != 0) { >> 34: bool print_monitor_info = (owner != NULL) && (owner->mark() == markWord::from_pointer((void*)this)); > > Could use a comment between L33 and L34: > // Print monitor info if there's an owning oop and it refers to this BasicLock. Yes, I'll incorporate the comment. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1124 From vlivanov at openjdk.java.net Tue Nov 10 12:42:57 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 10 Nov 2020 12:42:57 GMT Subject: RFR: 8256054: C2: Floating-point min/max operations on vectors intermittently produce wrong results for NaN values In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 21:12:39 GMT, Dean Long wrote: >> Floating-point min/max operations on vectors intermittently produce wrong results for NaN values. >> >> The problem boils down to a missing "TEMP dst" declaration on an AVX512-related AD instruction. Without it, the `dst` vector register may match one of the input registers, which breaks the computation since it assumes all the used registers are different. >> >> The fix adds missing effect and also introduces additional asserts to catch similar problems in the future. >> >> Testing: >> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot on AVX512-capable hardware >> - [x] hs-precheckin-comp, hs-tier1, hs-tier2 >> >> Thanks! > > Marked as reviewed by dlong (Reviewer). Thanks for the reviews, Claes, Paul, Sandhya, and Dean. ------------- PR: https://git.openjdk.java.net/jdk/pull/1128 From vlivanov at openjdk.java.net Tue Nov 10 12:42:58 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 10 Nov 2020 12:42:58 GMT Subject: Integrated: 8256054: C2: Floating-point min/max operations on vectors intermittently produce wrong results for NaN values In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 14:57:09 GMT, Vladimir Ivanov wrote: > Floating-point min/max operations on vectors intermittently produce wrong results for NaN values. > > The problem boils down to a missing "TEMP dst" declaration on an AVX512-related AD instruction. Without it, the `dst` vector register may match one of the input registers, which breaks the computation since it assumes all the used registers are different.
> > The fix adds missing effect and also introduces additional asserts to catch similar problems in the future. > > Testing: > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 > > Thanks! This pull request has now been integrated. Changeset: e6df13e6 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/e6df13e6 Stats: 8 lines in 2 files changed: 7 ins; 0 del; 1 mod 8256054: C2: Floating-point min/max operations on vectors intermittently produce wrong results for NaN values Reviewed-by: redestad, psandoz, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/1128 From vlivanov at openjdk.java.net Tue Nov 10 12:44:56 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 10 Nov 2020 12:44:56 GMT Subject: Integrated: 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails In-Reply-To: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> References: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> Message-ID: <0MlWpgsM3xXFe-OlCzKhH1rr9TsTN8WhIWWyddoZLXI=.0156927b-9657-45b9-a5e1-6add0f2d35ec@github.com> On Mon, 9 Nov 2020 13:01:18 GMT, Vladimir Ivanov wrote: > -XX:+PrintDeoptimizationDetails triggers intermittent crashes. I spotted 2 independent problems which the patch addresses: > * `markWord::print_on` doesn't handle displaced header case well (the pointer stored in the header may be stale); > * `InstanceKlass::oop_print_value_on` dumps some specific details about `MemberName`, but the code assumes the instance is fully initialized. It's not necessarily the case: for example, deoptimization can happen when `MemberName` constructor is being executed. > > Testing: > - [x] manually verified that the crashes go away -XX:+PrintDeoptimizationDetails > - [x] hs-precheckin-comp,hs-tier1,hs-tier2 This pull request has now been integrated. 
Changeset: 3455fa9b Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/3455fa9b Stats: 31 lines in 7 files changed: 16 ins; 0 del; 15 mod 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails Reviewed-by: kvn, dcubed ------------- PR: https://git.openjdk.java.net/jdk/pull/1124 From thartmann at openjdk.java.net Tue Nov 10 13:07:01 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 10 Nov 2020 13:07:01 GMT Subject: RFR: 8256009: Remove src/hotspot/share/adlc/Test/i486.ad [v2] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 11:26:10 GMT, Jie Fu wrote: >> src/hotspot/share/adlc/Test/i486.ad is empty. >> It might be better to remove it. >> Thanks. > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Update the comments Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1107 From thartmann at openjdk.java.net Tue Nov 10 13:17:59 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 10 Nov 2020 13:17:59 GMT Subject: RFR: 8250607: C2: Filter type in PhiNode::Value() for induction variables of trip-counted integer loops In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 08:26:17 GMT, Roland Westrelin wrote: > PhiNode::Value() has special logic to compute the type of an iv phi > based on the counted loop's init value and limit. The type is > recomputed from scratch on every call to PhiNode::Value(). As the loop > is transformed, PhiNode::Value() may return a type for the iv phi that > widens. For instance, for: > > for (int i = 0; i < 100; i++) { > } > > PhiNode::Value() returns accurate bounds for the iv phi. But if the > loop is transformed to pre/main/post loops, the init and limit no > longer have types that allow an accurate computation of the > iv phi bounds. > > The fix is to filter with the recorded _type for the Phi on every call > of PhiNode::Value().
> > This change was considered before (by Christian) but was not proposed > for integration because of a performance regression on a micro > benchmark. I investigated the performance regression and added my > findings to the bug report. While I'm not 100% sure I found the root > cause of the regression, the differences I see in the ideal graph of > the hottest methods of the micro benchmark with the change are fairly > small and I don't think that regression should block this fix. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1114 From thartmann at openjdk.java.net Tue Nov 10 13:27:55 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 10 Nov 2020 13:27:55 GMT Subject: RFR: 8254887: C2: assert(cl->trip_count() > 0) failed: peeling a fully unrolled loop In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 16:41:28 GMT, Roland Westrelin wrote: > A loop's trip count is computed to have exact trip count 6. Then: > > 1- pre/main/post loops are created which brings the trip count from 6 > to 5 > 2- main loop is unrolled which brings the trip count to 2 > 3- main loop is peeled: Trip count is 1 > 4- pre/main/post loops are created again. Trip count of main loop is 0. > 5- peeling is attempted again and the assert fires > > IdealLoopTree::policy_peeling() doesn't attempt peeling if the trip > count is 1. I propose that IdealLoopTree::policy_range_check() (that > causes the pre/main/post loops insertion the second time) performs the > same check so step 4 doesn't happen. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
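The five steps quoted above for 8254887 can be replayed with a toy Java model. This is only a sketch under loud assumptions: `TripCountSketch` and its methods are hypothetical, and the arithmetic for each transformation (subtract one, halve, subtract one) is chosen solely because it reproduces the trip counts quoted in the message, not because it mirrors the C2 implementation.

```java
// Toy model only, not HotSpot code. Each method is an assumed simplification
// of how a loop transformation changes the main loop's exact trip count.
final class TripCountSketch {
    // Assumption: creating pre/main/post loops moves one iteration out of the main loop.
    static int afterPreMainPost(int trips) { return trips - 1; }

    // Assumption: unrolling by two halves the trip count (integer division).
    static int afterUnroll(int trips) { return trips / 2; }

    // Assumption: peeling removes one iteration.
    static int afterPeel(int trips) { return trips - 1; }

    // Replays steps 1-4. With the guard enabled, the second pre/main/post
    // insertion is skipped when the trip count is 1, as the message proposes
    // for policy_range_check().
    static int run(boolean guardRangeCheck) {
        int trips = 6;
        trips = afterPreMainPost(trips); // step 1: 6 -> 5
        trips = afterUnroll(trips);      // step 2: 5 -> 2
        trips = afterPeel(trips);        // step 3: 2 -> 1
        if (!(guardRangeCheck && trips == 1)) {
            trips = afterPreMainPost(trips); // step 4: 1 -> 0
        }
        return trips;
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + run(false));
        System.out.println("with guard:    " + run(true));
    }
}
```

Without the guard the model ends at a trip count of 0, the state in which a further peeling attempt would hit the `cl->trip_count() > 0` assert. With the guard it stays at 1, where, per the message, policy_peeling() does not attempt peeling.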
PR: https://git.openjdk.java.net/jdk/pull/1096 From thartmann at openjdk.java.net Tue Nov 10 13:45:55 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 10 Nov 2020 13:45:55 GMT Subject: RFR: 8221404: C2: Convert RegMask and IndexSet to use uintptr_t [v2] In-Reply-To: <0HQav-gTuUWrHL0tAWhIuZEQeSh_DAq1O4ENVyyL6W0=.0ccceb67-0606-4e77-b42e-4bf39db0f58f@github.com> References: <0HQav-gTuUWrHL0tAWhIuZEQeSh_DAq1O4ENVyyL6W0=.0ccceb67-0606-4e77-b42e-4bf39db0f58f@github.com> Message-ID: On Mon, 9 Nov 2020 14:17:16 GMT, Claes Redestad wrote:

>> This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making storage unsigned further allows for a few simplifications, e.g. is_bound_set where there was logic to deal with sign extension that can no longer happen.
>>
>> To evaluate performance impact I created the included JMH microbenchmark which uses the RepeatCompilation command to repeat the compilation of a few methods: One trivial (`trivialMath`), one "regular" (`mixHashCode`), and one largish (`largeMethod`) with a lot of locals. These are designed to put no stress, some stress and quite a bit of stress on register allocation:
>>
>> Baseline:
>> Benchmark                                      Mode  Cnt     Score    Error  Units
>> SimpleRepeatCompilation.largeMethod_baseline     ss   10   168.919 ±  2.839  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8920.305 ± 40.531  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   153.961 ±  2.762  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  8242.061 ± 71.989  ms/op
>> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    69.526 ±  7.098  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6733.627 ± 63.689  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   316.862 ± 29.682  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4544.604 ± 57.439  ms/op
>> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.757 ±  1.553  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat       ss   10   499.214 ± 35.984  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.345 ±  2.168  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   398.528 ±  4.718  ms/op
>>
>> Patched:
>> Benchmark                                      Mode  Cnt     Score    Error  Units
>> SimpleRepeatCompilation.largeMethod_baseline     ss   10   164.355 ±  3.531  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8516.033 ± 22.408  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   151.181 ± 12.869  ms/op
>> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  7857.373 ± 52.826  ms/op
>> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    65.085 ±  5.643  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6601.693 ± 57.898  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   315.845 ± 27.474  ms/op
>> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4456.847 ± 30.459  ms/op
>> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.273 ±  2.115  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat       ss   10   506.873 ± 18.994  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.184 ±  3.008  ms/op
>> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   397.010 ±  4.531  ms/op
>>
>> This shows that there's no significant change on `trivialMath`, `mixHashCode` sees a small improvement (~2%) and `largeMethod` sees a larger improvement (~4-5%) on C2 and Tiered tests with compiler repetition.
>>
>> Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
> Avoid using ULL

Looks good to me.

------------- Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1102 From jiefu at openjdk.java.net Tue Nov 10 13:51:56 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 10 Nov 2020 13:51:56 GMT Subject: RFR: 8256009: Remove src/hotspot/share/adlc/Test/i486.ad [v2] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 13:03:41 GMT, Tobias Hartmann wrote: >> Jie Fu has updated the pull request incrementally with one additional commit since the last revision: >> >> Update the comments > > Looks good to me. Thanks @TobiHartmann for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/1107 From jiefu at openjdk.java.net Tue Nov 10 13:51:57 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 10 Nov 2020 13:51:57 GMT Subject: Integrated: 8256009: Remove src/hotspot/share/adlc/Test/i486.ad In-Reply-To: References: Message-ID: <29nzCfhxBInxZRyGPnWRXRxEpcw5Vn8V3w1vJ7P1Kx4=.c7480514-e2a3-44fb-83da-ccd09dc0beb2@github.com> On Sat, 7 Nov 2020 07:50:17 GMT, Jie Fu wrote: > src/hotspot/share/adlc/Test/i486.ad is empty. > It might be better to remove it. > Thanks. This pull request has now been integrated. 
Changeset: a1d4b9f3 Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/a1d4b9f3 Stats: 11 lines in 9 files changed: 0 ins; 0 del; 11 mod 8256009: Remove src/hotspot/share/adlc/Test/i486.ad Reviewed-by: shade, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1107 From eosterlund at openjdk.java.net Tue Nov 10 14:11:55 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Tue, 10 Nov 2020 14:11:55 GMT Subject: RFR: 8256020: Don't resurrect objects on argument-dependency access In-Reply-To: References: Message-ID: On Sun, 8 Nov 2020 21:35:29 GMT, Roman Kennke wrote: > In Shenandoah-testing, we noticed that compiler/jsr292/CallSiteDepContextTest.java fails with the following error: > > Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp:92), pid=906849, tid=907073 > Error: Before Updating References, Marked; Must be marked in complete bitmap > > Referenced from: > interior location: 0x00000000fff87504 > 0x00000000fff874f8 - klass 0x000000010004ecd8 java.lang.invoke.MutableCallSite > allocated after mark start > not after update watermark > marked strong > marked weak > not in collection set > mark: mark(is_neutral no_hash age=0) > region: | 2565|R |BTE fff80000, fffc0000, fffc0000|TAMS fff80000|UWM fffc0000|U 256K|T 0B|G 256K|S 0B|L 0B|CP 0 > > Object: > 0x00000000d80a9210 - klass 0x000000010004cf58 java.lang.invoke.DirectMethodHandle > not allocated after mark start > not after update watermark > not marked strong > not marked weak > in collection set > mark: mark(is_neutral no_hash age=0) > region: | 9|CS |BTE d8080000, d80c0000, d80c0000|TAMS d80c0000|UWM d80c0000|U 256K|T 256K|G 0B|S 0B|L 22464B|CP 0 > > Forwardee: > (the object itself) > > In other words, a reachable (marked) MutableCallSite references an unreachable DirectMethodHandle. That reference would subsequently become dangling and lead to crashes if accessed.
> > I narrowed it down to the access in Dependencies::DepStream::recorded_oop_at(int i) which is done as 'strong', which means that it would return the reference even if it is unreachable, e.g. during concurrent class-unloading. This resurrection of the unreachable DMH is potentially fatal: eventually the reference will become dangling (after GC) and lead to crashes when accessed. I believe that access should be 'phantom' instead which causes GCs like Shenandoah and ZGC to return NULL when encountering unreachable objects. > > (Notice that the bug only manifested after JDK-8255691, we accidentally applied the resurrection-preventing weak-LRB on strong access too) > > Testing: the offending CallSiteDepContextTest.java, tier1+UseShenandoahGC+ShenandoahVerify, tier2+UseShenandoahGC+ShenandoahVerify, hotspot_gc_shenandoah So your theory is that someone calls Dependencies::DepStream::recorded_oop_at on an nmethod, after marking terminated, leaking out a dead object. For your theory to be true, you would have acquired a is_unloading() nmethod from somewhere, and called Dependencies::DepStream::recorded_oop_at on it. That immediately excludes e.g. all on-stack nmethods, all nmethods handed out through dependency contexts, and all nmethods handed out through only_alive_and_not_unloading CodeCache iterators, which is almost all of them. There are very few code cache iterations that expose is_unloading() nmethods, and what they have in common is that they are *not* poking around at oops. So I suppose I really don't understand what path you could possibly track this to happen, where you have an is_unloading() nmethod, and start poking around at its oops. Would you mind elaborating a bit more, from what context you think Dependencies::DepStream::recorded_oop_at() is being called on an is_unloading() nmethod? 
------------- PR: https://git.openjdk.java.net/jdk/pull/1113 From mdoerr at openjdk.java.net Tue Nov 10 15:16:54 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 10 Nov 2020 15:16:54 GMT Subject: RFR: 8255959: Timeouts in VectorConversion tests In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 00:59:56 GMT, Paul Sandoz wrote:

>> At the moment, neither PPC nor s390 support any conversion intrinsics. Modern s390 (or
>> z/Architecture to be more precise) machines have vector instructions, but nobody implemented them in hotspot. So s390 uses the regular 8 Byte registers for vectors. PPC only uses 16 Byte vectors on modern hardware (8 Byte on older hardware).
>>
>> Default timeout is 1200 and I found out that 50% more makes the tests happy on all our machines.
>>
>> We could do an experiment, but I'm not familiar with the test.
>
> Perhaps the following patch might help.
>
> Still, for say the 512 conversion test on my Mac, which has no AVX512 support, the test runs (including compilation) in about 60s. With the patch it reduces to about 40s.
>
> If you run jtreg in verbose mode, `-va`, it should output individual test times. Perhaps some are taking longer than others?
>
> diff --git a/test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java b/test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java
> index d1303bfd295..1754af2110a 100644
> --- a/test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java
> +++ b/test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java
> @@ -551,7 +551,8 @@ abstract class AbstractVectorConversionTest {
>          int m = Math.max(dst_species_len,src_species_len) / Math.min(src_species_len,dst_species_len);
>
>          int [] parts = getPartsArray(m, is_contracting_conv);
> -        for (int ic = 0; ic < INVOC_COUNT; ic++) {
> +        int count = invocationCount(INVOC_COUNT, SPECIES, OSPECIES);
> +        for (int ic = 0; ic < count; ic++) {
>              for (int i=0, j=0; i < in_len; i += src_species_len, j+= dst_species_len) {
>                  int part = parts[i % parts.length];
>                  var av = Vector64ConversionTests.vectorFactory(unboxed_a, i, SPECIES);
> @@ -592,7 +593,8 @@ abstract class AbstractVectorConversionTest {
>          int m = Math.max(dst_vector_size,src_vector_size) / Math.min(dst_vector_size, src_vector_size);
>
>          int [] parts = getPartsArray(m, is_contracting_conv);
> -        for (int ic = 0; ic < INVOC_COUNT; ic++) {
> +        int count = invocationCount(INVOC_COUNT, SPECIES, OSPECIES);
> +        for (int ic = 0; ic < count; ic++) {
>              for (int i = 0, j=0; i < in_len; i += src_vector_lane_cnt, j+= dst_vector_lane_cnt) {
>                  int part = parts[i % parts.length];
>                  var av = Vector64ConversionTests.vectorFactory(unboxed_a, i, SPECIES);
> @@ -609,4 +611,15 @@ abstract class AbstractVectorConversionTest {
>          }
>          assertResultsEquals(boxed_res, boxed_ref, dst_vector_lane_cnt);
>      }
> +
> +    static int invocationCount(int c, VectorSpecies... species) {
> +        return Arrays.asList(species).stream().allMatch(AbstractVectorConversionTest::leqPreferred)
> +                ? c
> +                : Math.min(c, c / 100);
> +    }
> +
> +    static boolean leqPreferred(VectorSpecies species) {
> +        VectorSpecies preferred = VectorSpecies.ofPreferred(species.elementType());
> +        return species.length() <= preferred.length();
> +    }
>  }

Hi Paul,

your proposal helps a bit. The test has passed on some machines, but not on all of them. "contracting_conversion_scalar" is still very prominent on PPC. Do we need to implement intrinsics to get this fast?

In addition, there's another timeout in AddTest.java on x86: https://bugs.openjdk.java.net/browse/JDK-8255915

So it seems like there's more work to do. Suggestions?

------------- PR: https://git.openjdk.java.net/jdk/pull/1079 From rkennke at openjdk.java.net Tue Nov 10 16:17:06 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 10 Nov 2020 16:17:06 GMT Subject: RFR: 8256020: Don't resurrect objects on argument-dependency access In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 14:08:53 GMT, Erik Österlund wrote: >> In Shenandoah-testing, we noticed that compiler/jsr292/CallSiteDepContextTest.java fails with the following error: >> >> CONF=linux-x86_64-server-fastdebug make run-test TEST=compiler/jsr292/CallSiteDepContextTest.java TEST_VM_OPTS="-XX:+UseShenandoahGC -XX:+ShenandoahVerify" >> >> # Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp:92), pid=2318905, tid=2318938 >> # Error: Before Updating References, Marked; Must be marked in complete bitmap >> >> Referenced from: >> interior location: 0x00000000fff8514c >> 0x00000000fff85140 - klass 0x000000010004ecd8 java.lang.invoke.MutableCallSite >> allocated after mark start >> not after update watermark >> marked strong >> marked weak >> not in collection set >> mark: mark(is_neutral no_hash age=0) >> region: | 2565|R |BTE fff80000, fffc0000, fffc0000|TAMS fff80000|UWM fffc0000|U 256K|T 0B|G 256K|S 0B|L 0B|CP 0 >> >> Object: >> 0x00000000d80a9210 - klass 0x000000010004cf58
java.lang.invoke.DirectMethodHandle >> not allocated after mark start >> not after update watermark >> not marked strong >> not marked weak >> in collection set >> mark: mark(is_neutral no_hash age=0) >> region: | 9|CS |BTE d8080000, d80c0000, d80c0000|TAMS d80c0000|UWM d80c0000|U 256K|T 256K|G 0B|S 0B|L 22464B|CP 0 >> >> Forwardee: >> (the object itself) >> >> In other words, a reachable (marked) MutableCallSite references an unreachable DirectMethodHandle. That reference would subsequently become dangling and lead to crashes if accessed. >> >> I narrowed it down to the access in Dependencies::DepStream::recorded_oop_at(int i) which is done as 'strong', which means that it would return the reference even if it is unreachable, e.g. during concurrent class-unloading. This resurrection of the unreachable DMH is potentially fatal: eventually the reference will become dangling (after GC) and lead to crashes when accessed. I believe that access should be 'phantom' instead which causes GCs like Shenandoah and ZGC to return NULL when encountering unreachable objects. >> >> (Notice that the bug only manifested after JDK-8255691, we accidentally applied the resurrection-preventing weak-LRB on strong access too) >> >> Testing: the offending CallSiteDepContextTest.java, tier1+UseShenandoahGC+ShenandoahVerify, tier2+UseShenandoahGC+ShenandoahVerify, hotspot_gc_shenandoah > > So your theory is that someone calls Dependencies::DepStream::recorded_oop_at on an nmethod, after marking terminated, leaking out a dead object. For your theory to be true, you would have acquired a is_unloading() nmethod from somewhere, and called Dependencies::DepStream::recorded_oop_at on it. That immediately excludes e.g. all on-stack nmethods, all nmethods handed out through dependency contexts, and all nmethods handed out through only_alive_and_not_unloading CodeCache iterators, which is almost all of them. 
There are very few code cache iterations that expose is_unloading() nmethods, and what they have in common is that they are *not* poking around at oops.
>
> So I suppose I really don't understand what path you could possibly track this to happen, where you have an is_unloading() nmethod, and start poking around at its oops. Would you mind elaborating a bit more, from what context you think Dependencies::DepStream::recorded_oop_at() is being called on an is_unloading() nmethod?

I am digging a little deeper.
Planting an assert in the relevant place in the barrier shows that we are exposing an unmarked object in this code path in nmethod::flush_dependencies():

    oop call_site = deps.argument_oop(0);
    if (delete_immediately) {
      assert_locked_or_safepoint(CodeCache_lock);
      MethodHandles::remove_dependent_nmethod(call_site, this);
    } else {
      MethodHandles::clean_dependency_context(call_site);
    }

    # Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp:112), pid=2396340, tid=2396367
    # assert(obj == __null || !_heap->is_concurrent_weak_root_in_progress() || _heap->marking_context()->is_marked(obj)) failed: only expose marked objects

    Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
    V [libjvm.so+0x626fa0] AccessInternal::PostRuntimeDispatch, (AccessInternal::BarrierType)2, 544868ul>::oop_access_barrier(void*)+0x290
    V [libjvm.so+0x132b3ea] nmethod::oop_at(int) const+0x4a
    V [libjvm.so+0x9e862e] Dependencies::DepStream::argument_oop(int)+0x7e
    V [libjvm.so+0x132fe21] nmethod::flush_dependencies(bool)+0x1f1
    V [libjvm.so+0x1580e79] ShenandoahNMethodUnlinkClosure::do_nmethod(nmethod*)+0x3d9
    V [libjvm.so+0x16113fc] ShenandoahNMethodTableSnapshot::concurrent_nmethods_do(NMethodClosure*)+0x8c
    V [libjvm.so+0x157f19b] ShenandoahUnlinkTask::work(unsigned int)+0x2b
    V [libjvm.so+0x18faa54] GangWorker::run_task(WorkData)+0x84
    V [libjvm.so+0x18fabb3] GangWorker::loop()+0x63
    V [libjvm.so+0x17b6008] Thread::call_run()+0xf8
    V [libjvm.so+0x13b6d2e] thread_native_entry(Thread*)+0x10e

In other words, it gets the unmarked call_site and then accesses it afterwards. TBH, I am not totally sure that we aren't doing something wrong somewhere.
------------- PR: https://git.openjdk.java.net/jdk/pull/1113 From eosterlund at openjdk.java.net Tue Nov 10 16:35:58 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 10 Nov 2020 16:35:58 GMT Subject: RFR: 8256020: Don't resurrect objects on argument-dependency access In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 16:14:07 GMT, Roman Kennke wrote: > > So your theory is that someone calls Dependencies::DepStream::recorded_oop_at on an nmethod, after marking terminated, leaking out a dead object. For your theory to be true, you would have acquired a is_unloading() nmethod from somewhere, and called Dependencies::DepStream::recorded_oop_at on it. That immediately excludes e.g. all on-stack nmethods, all nmethods handed out through dependency contexts, and all nmethods handed out through only_alive_and_not_unloading CodeCache iterators, which is almost all of them. There are very few code cache iterations that expose is_unloading() nmethods, and what they have in common is that they are _not_ poking around at oops. > > So I suppose I really don't understand what path you could possibly track this to happen, where you have an is_unloading() nmethod, and start poking around at its oops. Would you mind elaborating a bit more, from what context you think Dependencies::DepStream::recorded_oop_at() is being called on an is_unloading() nmethod? > > I am digging a little deeper. 
Planting an assert in relevant place barrier shows that we are exposing an unmarked object in this code path in nmethod::flush_dependencies(): > > ``` > oop call_site = deps.argument_oop(0); > if (delete_immediately) { > assert_locked_or_safepoint(CodeCache_lock); > MethodHandles::remove_dependent_nmethod(call_site, this); > } else { > MethodHandles::clean_dependency_context(call_site); > } > ``` > > ``` > # Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp:112), pid=2396340, tid=2396367 > # assert(obj == __null || !_heap->is_concurrent_weak_root_in_progress() || _heap->marking_context()->is_marked(obj)) failed: only expose marked objects > > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x626fa0] AccessInternal::PostRuntimeDispatch, (AccessInternal::BarrierType)2, 544868ul>::oop_access_barrier(void*)+0x290 > V [libjvm.so+0x132b3ea] nmethod::oop_at(int) const+0x4a > V [libjvm.so+0x9e862e] Dependencies::DepStream::argument_oop(int)+0x7e > V [libjvm.so+0x132fe21] nmethod::flush_dependencies(bool)+0x1f1 > V [libjvm.so+0x1580e79] ShenandoahNMethodUnlinkClosure::do_nmethod(nmethod*)+0x3d9 > V [libjvm.so+0x16113fc] ShenandoahNMethodTableSnapshot::concurrent_nmethods_do(NMethodClosure*)+0x8c > V [libjvm.so+0x157f19b] ShenandoahUnlinkTask::work(unsigned int)+0x2b > V [libjvm.so+0x18faa54] GangWorker::run_task(WorkData)+0x84 > V [libjvm.so+0x18fabb3] GangWorker::loop()+0x63 > V [libjvm.so+0x17b6008] Thread::call_run()+0xf8 > V [libjvm.so+0x13b6d2e] thread_native_entry(Thread*)+0x10e > ``` > > In other words, it gets the unmarked call_site, then does access it afterwards. TBH, I am not totally sure that we aren't doing something wrong somewhere. ? This is all very intentional. This is called by the class unloading code, and is the very reason why it is strong. 
The class unloading code, concurrent or STW doesn't matter, peeks and walks a chain of dead objects to find and clean up native dependency contexts of CallSites. This means that we have to expose the dead objects to the class unloading code, so that we can walk the dead objects, find the dependency context, and clean it. So if we would perform your requested change, we would probably crash here with SIGSEGV, trying to dereference NULL, as it is expected that the call_site is not NULL.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1113

From kvn at openjdk.java.net Tue Nov 10 16:37:01 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 10 Nov 2020 16:37:01 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v4]
In-Reply-To:
References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
Message-ID:

On Tue, 10 Nov 2020 08:30:08 GMT, Roberto Castañeda Lozano wrote:

>> Prevent exponential number of calls to `ConvI2LNode::Ideal()` when AddIs are used multiple times by other AddIs in the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)). This is achieved by (1) reusing existing ConvI2Ls if possible rather than eagerly creating new ones and (2) postponing the optimization of newly created ConvI2Ls. Remove hook node solution introduced in [8217359](https://github.com/openjdk/jdk/commit/cf554816d1952f722143e9d03ec669e80f955adf), since this is subsumed by (2). Use `phase->is_IterGVN()` rather than `can_reshape` to check if `ConvI2LNode::Ideal()` is called within iterative GVN, for clarity. Add regression tests that cover different shapes and sizes of AddI subgraphs, implicitly checking (by not timing out) that there is no combinatorial explosion.
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: > > - Use 'hash_find_insert' to look for existing ConvI2L nodes > - Merge master > - Update tests > > Simplify JVM arguments and run each test case 100000 times to still trigger > C2. Use randomization to avoid constant propagation in C2. Increase the load of > the stress tests and their timeout to 30s to further reduce the risk of false > positives. > - Merge master > - Generalize the fix to handle any input where AddIs are used multiple times by > other AddIs, which could also lead to an exponential number of calls to > ConvI2LNode::Ideal(). This is achieved by (1) reusing existing ConvI2Ls if > possible rather than eagerly creating new ones and (2) postponing the > optimization of newly created ConvI2Ls. Remove "hook" node solution introduced > in JDK-8217359 since this is subsumed by (2). Test that ConvI2LNode::Ideal() is > called within iterative GVN using phase->is_IterGVN() rather than can_reshape, > for clarity. > > Merge all tests into a single class. Reimplement the microbenchmark as a test > case that should time out in case of a combinatorial explosion. Add a second > similar microbenchmark that demonstrates the need for this generalization. > - Merge master > - 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially > > In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within > ConvI2LNode::Ideal(), handle the special case x = y by feeding both inputs of > AddL from a single ConvI2L node rather than creating two semantically equivalent > ConvI2L nodes. This avoids an exponential number of calls to > ConvI2LNode::Ideal() when dealing with long chains of AddI nodes. Disable the > optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero > during parsing anyway. Add a set of regression tests for the transformation that > cover different shapes of AddI subgraphs. 
Also add a microbenchmark that
> exercises the special case, for performance regression testing.

Good

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/727

From redestad at openjdk.java.net Tue Nov 10 16:52:20 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Tue, 10 Nov 2020 16:52:20 GMT
Subject: RFR: 8221404: C2: Convert RegMask and IndexSet to use uintptr_t [v3]
In-Reply-To:
References:
Message-ID:

> This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making storage unsigned further allows for a few simplifications, e.g. is_bound_set, where there was logic to deal with sign extension that can no longer happen.
>
> To evaluate performance impact I created the included JMH microbenchmark which uses the RepeatCompilation command to repeat the compilation of a few methods: one trivial (`trivialMath`), one "regular" (`mixHashCode`), and one largish (`largeMethod`) with a lot of locals. These are designed to put no stress, some stress and quite a bit of stress on register allocation:
>
> Baseline:
> Benchmark                                      Mode  Cnt     Score    Error  Units
> SimpleRepeatCompilation.largeMethod_baseline     ss   10   168.919 ±  2.839  ms/op
> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8920.305 ± 40.531  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   153.961 ±  2.762  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  8242.061 ± 71.989  ms/op
> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    69.526 ±  7.098  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6733.627 ± 63.689  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   316.862 ± 29.682  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4544.604 ± 57.439  ms/op
> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.757 ±  1.553  ms/op
> SimpleRepeatCompilation.trivialMath_repeat       ss   10   499.214 ± 35.984  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.345 ±  2.168  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   398.528 ±  4.718  ms/op
>
> Patched:
> Benchmark                                      Mode  Cnt     Score    Error  Units
> SimpleRepeatCompilation.largeMethod_baseline     ss   10   164.355 ±  3.531  ms/op
> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8516.033 ± 22.408  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   151.181 ± 12.869  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  7857.373 ± 52.826  ms/op
> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    65.085 ±  5.643  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6601.693 ± 57.898  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   315.845 ± 27.474  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4456.847 ± 30.459  ms/op
> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.273 ±  2.115  ms/op
> SimpleRepeatCompilation.trivialMath_repeat       ss   10   506.873 ± 18.994  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.184 ±  3.008  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   397.010 ±  4.531  ms/op
>
> This shows that there's no significant change on `trivialMath`, while `mixHashCode` sees a small improvement (~2%) and `largeMethod` sees a larger improvement (~4-5%) on C2 and Tiered tests with compiler repetition.
>
> Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.

Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision:

 - Merge branch 'master' into c2_uintptr_t
 - Copyrights
 - Merge branch 'master' into c2_uintptr_t
 - Avoid using ULL
 - unsigned overflow in find_last_elem (found by some tier6 tests)
 - Fix and clarify low_bits
 - Merge branch 'master' into c2_uintptr_t
 - Improve bitfield comments
 - ALL_BITS clash, rename constants.
 - Fix comments from Vladimir and Mikael. A few additional cleanups.
 - ... and 3 more: https://git.openjdk.java.net/jdk/compare/e9af71d3...3e29c4e1

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1102/files
  - new: https://git.openjdk.java.net/jdk/pull/1102/files/38c60560..3e29c4e1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1102&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1102&range=01-02

  Stats: 48818 lines in 390 files changed: 26968 ins; 14479 del; 7371 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1102.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1102/head:pull/1102

PR: https://git.openjdk.java.net/jdk/pull/1102

From redestad at openjdk.java.net Tue Nov 10 16:52:22 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Tue, 10 Nov 2020 16:52:22 GMT
Subject: Integrated: 8221404: C2: Convert RegMask and IndexSet to use uintptr_t
In-Reply-To:
References:
Message-ID:

On Fri, 6 Nov 2020 21:21:56 GMT, Claes Redestad wrote:

> This patch refactors RegMask and IndexSet to use uintptr_t rather than int for storage, which may shorten some code paths and loops on 64-bit VMs. Making storage unsigned further allows for a few simplifications, e.g. is_bound_set, where there was logic to deal with sign extension that can no longer happen.
>
> To evaluate performance impact I created the included JMH microbenchmark which uses the RepeatCompilation command to repeat the compilation of a few methods: one trivial (`trivialMath`), one "regular" (`mixHashCode`), and one largish (`largeMethod`) with a lot of locals. These are designed to put no stress, some stress and quite a bit of stress on register allocation:
>
> Baseline:
> Benchmark                                      Mode  Cnt     Score    Error  Units
> SimpleRepeatCompilation.largeMethod_baseline     ss   10   168.919 ±  2.839  ms/op
> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8920.305 ± 40.531  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   153.961 ±  2.762  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  8242.061 ± 71.989  ms/op
> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    69.526 ±  7.098  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6733.627 ± 63.689  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   316.862 ± 29.682  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4544.604 ± 57.439  ms/op
> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.757 ±  1.553  ms/op
> SimpleRepeatCompilation.trivialMath_repeat       ss   10   499.214 ± 35.984  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.345 ±  2.168  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   398.528 ±  4.718  ms/op
>
> Patched:
> Benchmark                                      Mode  Cnt     Score    Error  Units
> SimpleRepeatCompilation.largeMethod_baseline     ss   10   164.355 ±  3.531  ms/op
> SimpleRepeatCompilation.largeMethod_repeat       ss   10  8516.033 ± 22.408  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c1    ss   10   151.181 ± 12.869  ms/op
> SimpleRepeatCompilation.largeMethod_repeat_c2    ss   10  7857.373 ± 52.826  ms/op
> SimpleRepeatCompilation.mixHashCode_baseline     ss   10    65.085 ±  5.643  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat       ss   10  6601.693 ± 57.898  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c1    ss   10   315.845 ± 27.474  ms/op
> SimpleRepeatCompilation.mixHashCode_repeat_c2    ss   10  4456.847 ± 30.459  ms/op
> SimpleRepeatCompilation.trivialMath_baseline     ss   10    21.273 ±  2.115  ms/op
> SimpleRepeatCompilation.trivialMath_repeat       ss   10   506.873 ± 18.994  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c1    ss   10   100.184 ±  3.008  ms/op
> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10   397.010 ±  4.531  ms/op
>
> This shows that there's no significant change on `trivialMath`, while `mixHashCode` sees a small improvement (~2%) and `largeMethod` sees a larger improvement (~4-5%) on C2 and Tiered tests with compiler repetition.
>
> Testing: tier 1-7 on all Oracle platforms, local testing and verification of linux-x86.
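
For readers unfamiliar with why the storage word width matters, here is a minimal sketch in Java (not the HotSpot C++ code). It assumes a 256-bit mask and a "find the lowest set bit" query; the class and method names are hypothetical and only illustrate that wider words leave fewer words to scan for the same mask.

```java
// Hypothetical sketch, not HotSpot code: the same 256-bit mask stored once as
// 32-bit words and once as 64-bit words.
final class WordWidthSketch {
    // Lowest set bit of a mask held in 32-bit words, or -1 if the mask is empty.
    static int firstSetBit32(int[] words) {
        for (int i = 0; i < words.length; i++) {
            if (words[i] != 0) {
                return i * Integer.SIZE + Integer.numberOfTrailingZeros(words[i]);
            }
        }
        return -1;
    }

    // Same query over 64-bit words: for the same mask there are half as many
    // words to scan.
    static int firstSetBit64(long[] words) {
        for (int i = 0; i < words.length; i++) {
            if (words[i] != 0L) {
                return i * Long.SIZE + Long.numberOfTrailingZeros(words[i]);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int bit = 200;               // arbitrary example bit index
        int[] narrow = new int[8];   // 8 x 32 = 256 bits
        long[] wide = new long[4];   // 4 x 64 = 256 bits
        narrow[bit / Integer.SIZE] |= 1 << (bit % Integer.SIZE);
        wide[bit / Long.SIZE] |= 1L << (bit % Long.SIZE);
        System.out.println(firstSetBit32(narrow));
        System.out.println(firstSetBit64(wide));
    }
}
```

Both methods report the same bit index; the difference is only in how many words the loop may have to visit, which is the kind of shortening the description refers to for 64-bit VMs.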
This pull request has now been integrated.

Changeset: 6ae5e5b6
Author:    Claes Redestad
URL:       https://git.openjdk.java.net/jdk/commit/6ae5e5b6
Stats:     479 lines in 5 files changed: 283 ins; 25 del; 171 mod

8221404: C2: Convert RegMask and IndexSet to use uintptr_t

Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/1102

From rkennke at openjdk.java.net Tue Nov 10 17:19:56 2020
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Tue, 10 Nov 2020 17:19:56 GMT
Subject: RFR: 8256020: Don't resurrect objects on argument-dependency access
In-Reply-To:
References:
Message-ID:

On Tue, 10 Nov 2020 16:32:53 GMT, Erik Österlund wrote:

> > > So your theory is that someone calls Dependencies::DepStream::recorded_oop_at on an nmethod, after marking terminated, leaking out a dead object. For your theory to be true, you would have acquired a is_unloading() nmethod from somewhere, and called Dependencies::DepStream::recorded_oop_at on it. That immediately excludes e.g. all on-stack nmethods, all nmethods handed out through dependency contexts, and all nmethods handed out through only_alive_and_not_unloading CodeCache iterators, which is almost all of them. There are very few code cache iterations that expose is_unloading() nmethods, and what they have in common is that they are _not_ poking around at oops.
> > > So I suppose I really don't understand what path you could possibly track this to happen, where you have an is_unloading() nmethod, and start poking around at its oops. Would you mind elaborating a bit more, from what context you think Dependencies::DepStream::recorded_oop_at() is being called on an is_unloading() nmethod?
> > > > > > I am digging a little deeper.
Planting an assert in relevant place barrier shows that we are exposing an unmarked object in this code path in nmethod::flush_dependencies(): > > ``` > > oop call_site = deps.argument_oop(0); > > if (delete_immediately) { > > assert_locked_or_safepoint(CodeCache_lock); > > MethodHandles::remove_dependent_nmethod(call_site, this); > > } else { > > MethodHandles::clean_dependency_context(call_site); > > } > > ``` > > > > > > ``` > > # Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp:112), pid=2396340, tid=2396367 > > # assert(obj == __null || !_heap->is_concurrent_weak_root_in_progress() || _heap->marking_context()->is_marked(obj)) failed: only expose marked objects > > > > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > > V [libjvm.so+0x626fa0] AccessInternal::PostRuntimeDispatch, (AccessInternal::BarrierType)2, 544868ul>::oop_access_barrier(void*)+0x290 > > V [libjvm.so+0x132b3ea] nmethod::oop_at(int) const+0x4a > > V [libjvm.so+0x9e862e] Dependencies::DepStream::argument_oop(int)+0x7e > > V [libjvm.so+0x132fe21] nmethod::flush_dependencies(bool)+0x1f1 > > V [libjvm.so+0x1580e79] ShenandoahNMethodUnlinkClosure::do_nmethod(nmethod*)+0x3d9 > > V [libjvm.so+0x16113fc] ShenandoahNMethodTableSnapshot::concurrent_nmethods_do(NMethodClosure*)+0x8c > > V [libjvm.so+0x157f19b] ShenandoahUnlinkTask::work(unsigned int)+0x2b > > V [libjvm.so+0x18faa54] GangWorker::run_task(WorkData)+0x84 > > V [libjvm.so+0x18fabb3] GangWorker::loop()+0x63 > > V [libjvm.so+0x17b6008] Thread::call_run()+0xf8 > > V [libjvm.so+0x13b6d2e] thread_native_entry(Thread*)+0x10e > > ``` > > > > > > In other words, it gets the unmarked call_site, then does access it afterwards. TBH, I am not totally sure that we aren't doing something wrong somewhere. ? > > This is all very intentional. This is called by the class unloading code, and is the very reason why it is strong. 
The class unloading code, concurrent or STW doesn't matter, peeks and walks a chain of dead objects to find and clean up native dependency contexts of CallSites. This means that we have to expose the dead objects to the class unloading code, so that we can walk the dead objects, find the dependency context, and clean it. So if we would perform your requested change, we would probably crash here with SIGSEGV, trying to dereference NULL, as it is expected that the call_site is not NULL. Right. No, it doesn't crash there. It only ever fails our verification. I see that a couple of IN_NATIVE accesses are also decorated with AS_NO_KEEPALIVE. We interpret that as 'skip SATB barrier' during marking, I wonder if that is even the correct interpretation. The original assert seems to imply that a new MutableCallSite refers to an old but unmarked DirectMethodHandle. ------------- PR: https://git.openjdk.java.net/jdk/pull/1113 From eosterlund at openjdk.java.net Tue Nov 10 17:59:59 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 10 Nov 2020 17:59:59 GMT Subject: RFR: 8256020: Don't resurrect objects on argument-dependency access In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 17:17:13 GMT, Roman Kennke wrote: > > > > So your theory is that someone calls Dependencies::DepStream::recorded_oop_at on an nmethod, after marking terminated, leaking out a dead object. For your theory to be true, you would have acquired a is_unloading() nmethod from somewhere, and called Dependencies::DepStream::recorded_oop_at on it. That immediately excludes e.g. all on-stack nmethods, all nmethods handed out through dependency contexts, and all nmethods handed out through only_alive_and_not_unloading CodeCache iterators, which is almost all of them. There are very few code cache iterations that expose is_unloading() nmethods, and what they have in common is that they are _not_ poking around at oops. 
> > > > > So I suppose I really don't understand what path you could possibly track this to happen, where you have an is_unloading() nmethod, and start poking around at its oops. Would you mind elaborating a bit more, from what context you think Dependencies::DepStream::recorded_oop_at() is being called on an is_unloading() nmethod? > > > > > > > > > > > > I am digging a little deeper. Planting an assert in relevant place barrier shows that we are exposing an unmarked object in this code path in nmethod::flush_dependencies(): > > > > ``` > > > > oop call_site = deps.argument_oop(0); > > > > if (delete_immediately) { > > > > assert_locked_or_safepoint(CodeCache_lock); > > > > MethodHandles::remove_dependent_nmethod(call_site, this); > > > > } else { > > > > MethodHandles::clean_dependency_context(call_site); > > > > } > > > > ``` > > > > > > > > > > > > ``` > > > > # Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp:112), pid=2396340, tid=2396367 > > > > # assert(obj == __null || !_heap->is_concurrent_weak_root_in_progress() || _heap->marking_context()->is_marked(obj)) failed: only expose marked objects > > > > > > > > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > > > > V [libjvm.so+0x626fa0] AccessInternal::PostRuntimeDispatch, (AccessInternal::BarrierType)2, 544868ul>::oop_access_barrier(void*)+0x290 > > > > V [libjvm.so+0x132b3ea] nmethod::oop_at(int) const+0x4a > > > > V [libjvm.so+0x9e862e] Dependencies::DepStream::argument_oop(int)+0x7e > > > > V [libjvm.so+0x132fe21] nmethod::flush_dependencies(bool)+0x1f1 > > > > V [libjvm.so+0x1580e79] ShenandoahNMethodUnlinkClosure::do_nmethod(nmethod*)+0x3d9 > > > > V [libjvm.so+0x16113fc] ShenandoahNMethodTableSnapshot::concurrent_nmethods_do(NMethodClosure*)+0x8c > > > > V [libjvm.so+0x157f19b] ShenandoahUnlinkTask::work(unsigned int)+0x2b > > > > V [libjvm.so+0x18faa54] 
GangWorker::run_task(WorkData)+0x84 > > > > V [libjvm.so+0x18fabb3] GangWorker::loop()+0x63 > > > > V [libjvm.so+0x17b6008] Thread::call_run()+0xf8 > > > > V [libjvm.so+0x13b6d2e] thread_native_entry(Thread*)+0x10e > > > > ``` > > > > > > > > > > > > In other words, it gets the unmarked call_site, then does access it afterwards. TBH, I am not totally sure that we aren't doing something wrong somewhere. ? > > > > > > This is all very intentional. This is called by the class unloading code, and is the very reason why it is strong. The class unloading code, concurrent or STW doesn't matter, peeks and walks a chain of dead objects to find and clean up native dependency contexts of CallSites. This means that we have to expose the dead objects to the class unloading code, so that we can walk the dead objects, find the dependency context, and clean it. So if we would perform your requested change, we would probably crash here with SIGSEGV, trying to dereference NULL, as it is expected that the call_site is not NULL. > > > > Right. No, it doesn't crash there. It only ever fails our verification. Okay. > I see that a couple of IN_NATIVE accesses are also decorated with AS_NO_KEEPALIVE. We interpret that as 'skip SATB barrier' during marking, I wonder if that is even the correct interpretation. That is the right interpretation. > The original assert seems to imply that a new MutableCallSite refers to an old but unmarked DirectMethodHandle. Sounds like the old MutableCallSite escaped the snapshot at the beginning somehow. Maybe it is related to your new reference processor? 
-------------

PR: https://git.openjdk.java.net/jdk/pull/1113

From joe.darcy at oracle.com Tue Nov 10 19:04:07 2020
From: joe.darcy at oracle.com (Joe Darcy)
Date: Tue, 10 Nov 2020 11:04:07 -0800
Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v3]
In-Reply-To:
References: <_NtWpo5mp-I1zmPtn883ezvoO1jA7WdJflfoFXcpL70=.96a06d93-47a4-48dd-88fc-f3c777315291@github.com>
Message-ID:

On 11/9/2020 2:12 PM, Xubo Zhang wrote:
> On Mon, 2 Nov 2020 17:42:27 GMT, Joe Darcy wrote:
>
>>> Xubo Zhang has updated the pull request incrementally with two additional commits since the last revision:
>>>
>>> - Merge branch 'JDK-8255368' of github.com:xbzhang99/bugfixes into JDK-8255368
>>> - Added test cases for exp at the value of 1024 and 10000
>> The regression tests for exp should be explicitly updated to cover the previously erroneous input if they do not do so already.
> Hi Darcy,
> Where should the test be? A new test file?
>
Given the absence of an existing dedicated exp test, yes, a new file in the test/jdk/java/lang/Math directory. I suggest looking at Atan2Tests.java as a model for another test that just probes a few values.

HTH,

-Joe

From rkennke at openjdk.java.net Tue Nov 10 19:08:54 2020
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Tue, 10 Nov 2020 19:08:54 GMT
Subject: RFR: 8256020: Don't resurrect objects on argument-dependency access
In-Reply-To:
References:
Message-ID: <0AUPxoKLq_6Ujz5I87PyO_l8LstqBqq2qPwSCh9kugQ=.dabe16f2-9162-4efe-abc2-5eda5f433bbc@github.com>

On Tue, 10 Nov 2020 17:56:53 GMT, Erik Österlund wrote:

> > The original assert seems to imply that a new MutableCallSite refers to an old but unmarked DirectMethodHandle.
>
> Sounds like the old MutableCallSite escaped the snapshot at the beginning somehow. Maybe it is related to your new reference processor?

Hmm, I think it is more subtle than that.

I changed all occurrences of argument_oop() one-by-one to the 'phantom' variant to narrow it down.
It really is the flush_dependencies() method that breaks it. However, as you noted, it cannot accept NULL there, otherwise it would crash. However, we *do* have a special code-path there that returns the original object instead of NULL *when the calling thread is not a Java thread* (See https://bugs.openjdk.java.net/browse/JDK-8237874). I believe this is why the fix helps. OTOH, if I don't do this, the remainder of the barrier would evacuate the object, and thus make the unreachable object alive again. :-S I suspect that we must skip evacuating objects when we get AS_NO_KEEPALIVE, but that seems to result in a "memory stomping" error. Grrrr.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1113

From rcastanedalo at openjdk.java.net Tue Nov 10 19:17:57 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Tue, 10 Nov 2020 19:17:57 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v4]
In-Reply-To:
References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
Message-ID:

On Tue, 10 Nov 2020 16:34:36 GMT, Vladimir Kozlov wrote:

> Good

Thanks Vladimir!

-------------

PR: https://git.openjdk.java.net/jdk/pull/727

From eosterlund at openjdk.java.net Tue Nov 10 20:12:55 2020
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 10 Nov 2020 20:12:55 GMT
Subject: RFR: 8256020: Don't resurrect objects on argument-dependency access
In-Reply-To: <0AUPxoKLq_6Ujz5I87PyO_l8LstqBqq2qPwSCh9kugQ=.dabe16f2-9162-4efe-abc2-5eda5f433bbc@github.com>
References: <0AUPxoKLq_6Ujz5I87PyO_l8LstqBqq2qPwSCh9kugQ=.dabe16f2-9162-4efe-abc2-5eda5f433bbc@github.com>
Message-ID:

On Tue, 10 Nov 2020 19:05:41 GMT, Roman Kennke wrote:

> > > > > The original assert seems to imply that a new MutableCallSite refers to an old but unmarked DirectMethodHandle.
> > > > > > Sounds like the old MutableCallSite escaped the snapshot at the beginning somehow. Maybe it is related to your new reference processor?
> > > > Hmm, I think it is more subtle than that.
> > I changed all occurrences of argument_oop() one-by-one to the 'phantom' variant to narrow it down. It really is the flush_dependencies() method that breaks it. However, as you noted, it cannot accept NULL there, otherwise it would crash. However, we *do* have a special code-path there that returns the original object instead of NULL *when the calling thread is not a Java thread* (See https://bugs.openjdk.java.net/browse/JDK-8237874). I believe this is why the fix helps. OTOH, if I don't do this, the remainder of the barrier would evacuate the object, and thus make the unreachable object alive again. :-S I suspect that we must skip evacuating objects when we get AS_NO_KEEPALIVE, but that seems to result in a "memory stomping" error. Grrrr.

Any non-raw access load should expose only the to-space object. But that is completely orthogonal to whether it should be marked or not. And obviously, having completely different semantics for accesses depending on whether the access is performed on a Java thread or not, is not a good idea. Sounds like the barrier code needs fixing.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1113

From rkennke at openjdk.java.net Tue Nov 10 20:19:53 2020
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Tue, 10 Nov 2020 20:19:53 GMT
Subject: RFR: 8256020: Don't resurrect objects on argument-dependency access
In-Reply-To:
References: <0AUPxoKLq_6Ujz5I87PyO_l8LstqBqq2qPwSCh9kugQ=.dabe16f2-9162-4efe-abc2-5eda5f433bbc@github.com>
Message-ID:

On Tue, 10 Nov 2020 20:10:03 GMT, Erik Österlund wrote:

> Any non-raw access load should expose only the to-space object. But that is completely orthogonal to whether it should be marked or not.
And obviously, having completely different semantics for accesses depending on whether the access is performed on a Java thread or not, is not a good idea. Sounds like the barrier code needs fixing. Why are these native accesses not raw anyway? The trouble in Shenandoah is that if an object is unreachable, and we evacuate it, then it becomes 'live' at least in the sense that it is now beyond top-at-mark-start (TAMS), i.e. implicitly live. I believe this is what the verifier is ultimately complaining about. This is *not* a marking issue. I have a fix that I'll push shortly, after I have done more testing. It returns the naked (from-space) oop when encountering AS_NO_KEEPALIVE on an unmarked object. That seems to fix this particular testcase and seems to be in the spirit of AS_NO_KEEPALIVE, assuming that an AS_NO_KEEPALIVE access does not do anything nasty like storing the oop elsewhere. ------------- PR: https://git.openjdk.java.net/jdk/pull/1113 From github.com+1981974+kuaiwei at openjdk.java.net Tue Nov 10 21:29:04 2020 From: github.com+1981974+kuaiwei at openjdk.java.net (kuaiwei) Date: Tue, 10 Nov 2020 21:29:04 GMT Subject: Withdrawn: 8253049: Enhance itable_stub for AArch64 and x86_64 In-Reply-To: References: Message-ID: On Fri, 11 Sep 2020 11:58:34 GMT, kuaiwei wrote: > Currently itable_stub goes through the instanceKlass's itable twice to look up a method entry. The resolved klass is used for type checking and the method holder klass is used to find the method entry. In many cases, we observed that the resolved klass is the same as the holder klass, so we can improve the itable stub based on that. If they are the same klass, the stub uses a fast loop to check only one klass. If not, a slow loop is used to check both klasses. > > Even when entering the slow loop, the new implementation can be better than the old one in some cases, because the new stub only needs to go through the itable once, which reduces memory operations.
> > > bug: https://bugs.openjdk.java.net/browse/JDK-8253049 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/128 From mdoerr at openjdk.java.net Tue Nov 10 22:16:59 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 10 Nov 2020 22:16:59 GMT Subject: RFR: 8248191: [PPC64] Replace lxvd2x/stxvd2x with lxvx/stxvx for Power10 In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 00:52:20 GMT, Ziviani wrote: > The pair lxvx/stxvx are more modern VSX instructions to load/store data. > These should benefit the Vector API because lxvd2x/stxvd2x may require > xxswapd, leading to a more difficult code generation. Did you run Vector API tests (test/jdk/jdk/incubator/vector) on Power 10? Doesn't this mean we use a different Byte order on Power 9 than on Power 10 on little endian? ------------- PR: https://git.openjdk.java.net/jdk/pull/1086 From xliu at openjdk.java.net Tue Nov 10 23:04:20 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 10 Nov 2020 23:04:20 GMT Subject: RFR: 8254807: Optimize startsWith() for String.substring() [v2] In-Reply-To: References: Message-ID: <3GycuTnPQw5R5pN8fGkBYBqn6jUtBDen6-W8v0v9BNc=.f30fcebb-7a9a-4ead-b772-abc0ebcde5ee@github.com> > The optimization transforms code from s=substring(base, beg, end); s.startsWith(prefix) > to substring(base, beg, end) | base.startsWith(prefix, beg). > > it reduces uses of substring. hopefully c2 optimizer can remove the used substring. Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8254807: Optimize startsWith() for String.substring() improve microbenchmark based on the PR feedback. 
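For readers following the thread, the rewrite described for 8254807 can be sketched at the Java source level. This is only an illustration under assumptions, not the C2 change itself: the class and method names are hypothetical, and the explicit length check is one way to express the "stricter boundary checks" needed so that the offset form matches the substring form.

```java
public class SubstringStartsWithSketch {
    // Original shape: materializes a temporary String before testing the prefix.
    static boolean viaSubstring(String base, int beg, int end, String prefix) {
        return base.substring(beg, end).startsWith(prefix);
    }

    // Rewritten shape: tests the prefix in place, without the temporary String.
    // The bounds check mirrors the exception behavior of substring(), and the
    // length check keeps the match inside the [beg, end) window.
    static boolean viaOffset(String base, int beg, int end, String prefix) {
        if (beg < 0 || end > base.length() || beg > end) {
            throw new StringIndexOutOfBoundsException(
                "begin " + beg + ", end " + end + ", length " + base.length());
        }
        return prefix.length() <= end - beg && base.startsWith(prefix, beg);
    }

    public static void main(String[] args) {
        String base = "hello world";
        String[] prefixes = {"", "w", "wor", "world", "worlds", "x"};
        for (String prefix : prefixes) {
            boolean a = viaSubstring(base, 6, 9, prefix);
            boolean b = viaOffset(base, 6, 9, prefix);
            System.out.println("prefix=\"" + prefix + "\" substring=" + a + " offset=" + b);
        }
    }
}
```

Without the `prefix.length() <= end - beg` check, a prefix that runs past `end` but still matches characters of `base` would be accepted by the offset form and rejected by the substring form.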
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/974/files - new: https://git.openjdk.java.net/jdk/pull/974/files/56108f76..2543aa88 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=974&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=974&range=00-01 Stats: 180 lines in 2 files changed: 110 ins; 70 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/974.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/974/head:pull/974 PR: https://git.openjdk.java.net/jdk/pull/974 From ngasson at openjdk.java.net Wed Nov 11 02:27:08 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 11 Nov 2020 02:27:08 GMT Subject: RFR: 8256025: AArch64: MachCallRuntimeNode::ret_addr_offset() is incorrect for stub calls [v2] In-Reply-To: References: Message-ID: > The PR for JDK-8254231 introduces a new assertion in opto/output.cpp to > check the current instruction offset against the offset of the call > return address reported by ret_addr_offset(). This fails on AArch64 > because MachCallRuntimeNode::ret_addr_offset() claims four instructions > are generated for a stub call (far branch) but actually it's just > one (blr to stub or trampoline). > > Tested tier1. 
Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: Fix comment blr->bl ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1138/files - new: https://git.openjdk.java.net/jdk/pull/1138/files/1570be1e..b934df8e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1138&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1138&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1138.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1138/head:pull/1138 PR: https://git.openjdk.java.net/jdk/pull/1138 From ngasson at openjdk.java.net Wed Nov 11 02:27:09 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 11 Nov 2020 02:27:09 GMT Subject: RFR: 8256025: AArch64: MachCallRuntimeNode::ret_addr_offset() is incorrect for stub calls [v2] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 11:03:41 GMT, Jorn Vernee wrote: >> So here's a weird thing: this code has been wrong forever, but apparently it never mattered. I wonder why it didn't break anything before now. > > @theRealAph The bug that this catches manifests when the reported return offset lines up _exactly_ with that of a later call. In that case two calls will use the same PC for their oop map, and one will be overwritten. > > Maybe we've been lucky that this is never actually the case for ARM, but I'd imaging the oop map annotation in PrintAssembly output might be in the wrong place, and the oop map offset it prints should be wrong. But, as long as the return PC it reports is unique, I don't think it will cause an immediate problem to functionality. Fixed the comment: `trampoline_call` generates `bl` not `blr`. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1138 From xliu at openjdk.java.net Wed Nov 11 05:37:56 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 11 Nov 2020 05:37:56 GMT Subject: RFR: 8254807: Optimize startsWith() for String.substring() [v2] In-Reply-To: References: Message-ID: <5XrnxBftqMeq-7XmbKlLDjhdBCZVolKCDPc9POdPubs=.f5591f89-4d87-4124-8cee-1f116a14a38f@github.com> On Mon, 2 Nov 2020 07:32:40 GMT, Xin Liu wrote: >> Some comments and nits on the microbenchmark. >> >> A general comment is that I think it would be good to add variants exercising UTF16 Strings: one where `sample` has some UTF-16 chars, and one where both `sample` and `prefix` do (latin-1 `sample` and UTF-16 `prefix` could be interesting too, to ensure this variant shortcuts quickly). >> >> Should the `prefix` be something a bit more complex than a single char string? `startsWith("a", off)` is a case that'd be tempting to optimize down to `charAt(off) == 'a'` and then this micro might no longer do what it intends to do. > > @cl4es > > Thank you for taking time to review this. I understand you would like to see more variants, such as UTF16 strings and different prefixes. > > This api-level substitution actually doesn't care about the underlying representation of the string and the prefix of startsWith. It works in the same way. The purpose of this microbench is to prove that substring() is not inevitable in a certain pattern. JIT compilers can achieve performance similar to the hand-crafted code. Right now, I have only a single variable, which is the length of substring. > The result shows that the throughput is independent of the lengths of substrings. > > My concern is that we would make results discernible if we introduce more than one variable. Or should I write a group of benchmarks? Here is the result of the updated microbenchmark. The doubleBytes group uses UTF-16 strings; the singleByte group uses ASCII strings. Without the optimization, the throughput is just 1/4 to 1/5 of that of the hand-crafted code. With the optimization, the throughput can reach over 80% of the hand-crafted code. The reason it can't reach 100% of the throughput of the hand-crafted equivalent is that it has to generate stricter boundary checks.

// without OptimizeSubstring
Benchmark                                                  (substrLength)   Mode  Cnt       Score      Error  Units
SubstringStartsWith.substr2StartsWith_doubleBytes                       4  thrpt  125   40847.644 ± 1269.274  ops/ms
SubstringStartsWith.substr2StartsWith_doubleBytes                      24  thrpt  125   39284.961 ±  155.109  ops/ms
SubstringStartsWith.substr2StartsWith_doubleBytes                     256  thrpt  125   12370.385 ±  622.280  ops/ms
SubstringStartsWith.substr2StartsWith_singleByte                        4  thrpt  125   60633.544 ± 1989.512  ops/ms
SubstringStartsWith.substr2StartsWith_singleByte                       24  thrpt  125   60613.490 ± 1024.846  ops/ms
SubstringStartsWith.substr2StartsWith_singleByte                      256  thrpt  125   30212.033 ± 1106.752  ops/ms
SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes               4  thrpt  125  154666.457 ±    7.132  ops/ms
SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes              24  thrpt  125  154659.583 ±    7.663  ops/ms
SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes             256  thrpt  125  154665.357 ±    6.414  ops/ms
SubstringStartsWith.substr2StartsWith_noalloc_singleByte                4  thrpt  125  162833.972 ±    8.170  ops/ms
SubstringStartsWith.substr2StartsWith_noalloc_singleByte               24  thrpt  125  162834.059 ±    7.862  ops/ms
SubstringStartsWith.substr2StartsWith_noalloc_singleByte              256  thrpt  125  162456.046 ±  217.304  ops/ms

// with OptimizeSubstring
Benchmark                                                  (substrLength)   Mode  Cnt       Score      Error  Units
SubstringStartsWith.substr2StartsWith_doubleBytes                       4  thrpt  125  119789.181 ± 3374.123  ops/ms
SubstringStartsWith.substr2StartsWith_doubleBytes                      24  thrpt  125  123740.637 ±   31.982  ops/ms
SubstringStartsWith.substr2StartsWith_doubleBytes                     256  thrpt  125  123701.525 ±   68.741  ops/ms
SubstringStartsWith.substr2StartsWith_singleByte                        4  thrpt  125  134529.257 ±    6.331  ops/ms
SubstringStartsWith.substr2StartsWith_singleByte                       24  thrpt  125  134517.222 ±    6.373  ops/ms
SubstringStartsWith.substr2StartsWith_singleByte                      256  thrpt  125  134527.929 ±    4.660  ops/ms
SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes               4  thrpt  125  154668.784 ±    6.990  ops/ms
SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes              24  thrpt  125  154567.971 ±   69.286  ops/ms
SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes             256  thrpt  125  154630.115 ±   36.157  ops/ms
SubstringStartsWith.substr2StartsWith_noalloc_singleByte                4  thrpt  125  162852.894 ±    5.933  ops/ms
SubstringStartsWith.substr2StartsWith_noalloc_singleByte               24  thrpt  125  162850.630 ±    5.990  ops/ms
SubstringStartsWith.substr2StartsWith_noalloc_singleByte              256  thrpt  125  162841.513 ±    6.441  ops/ms

------------- PR: https://git.openjdk.java.net/jdk/pull/974 From bulasevich at openjdk.java.net Wed Nov 11 07:51:06 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Wed, 11 Nov 2020 07:51:06 GMT Subject: RFR: 8254661: arm32: additional cleanup after fixing SIGSEGV Message-ID: This change fixes the fastdebug build assertion.
This is actually a missing change for [8253901: ARM32: SIGSEGV during monitorexit](https://github.com/openjdk/jdk/pull/503) ------------- Commit messages: - 8254661: arm32: additional cleanup after fixing SIGSEGV Changes: https://git.openjdk.java.net/jdk/pull/1157/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1157&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254661 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1157.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1157/head:pull/1157 PR: https://git.openjdk.java.net/jdk/pull/1157 From xliu at openjdk.java.net Wed Nov 11 08:10:00 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 11 Nov 2020 08:10:00 GMT Subject: RFR: 8254807: Optimize startsWith() for String.substring() [v2] In-Reply-To: <5XrnxBftqMeq-7XmbKlLDjhdBCZVolKCDPc9POdPubs=.f5591f89-4d87-4124-8cee-1f116a14a38f@github.com> References: <5XrnxBftqMeq-7XmbKlLDjhdBCZVolKCDPc9POdPubs=.f5591f89-4d87-4124-8cee-1f116a14a38f@github.com> Message-ID: On Wed, 11 Nov 2020 05:35:36 GMT, Xin Liu wrote: >> @cl4es >> >> Thank you for taking time to review this. I understand you would like to see more variants, such as UTF16 strings and different prefixes. >> >> This api-level substitution actually doesn't care about the underlying representation of the string and the prefix of startsWith. It works in the same way. The purpose of this microbench is to prove that substring() is not inevitable in a certain pattern. JIT compilers can achieve performance similar to the hand-crafted code. Right now, I have only a single variable, which is the length of substring. >> The result shows that the throughput is independent of the lengths of substrings. >> >> My concern is that we would make results discernible if we introduce more than one variable. Or should I write a group of benchmarks? > > Here is the result of the updated microbenchmark. > The doubleBytes group uses UTF-16 strings; the singleByte group uses ASCII strings. > Without the optimization, the throughput is just 1/4 to 1/5 of that of the hand-crafted code. > > With the optimization, the throughput can reach over 80% of the hand-crafted code. The reason it can't reach 100% of the throughput of the hand-crafted equivalent is that it has to generate stricter boundary checks.
>
> // without OptimizeSubstring
> Benchmark                                                  (substrLength)   Mode  Cnt       Score      Error  Units
> SubstringStartsWith.substr2StartsWith_doubleBytes                       4  thrpt  125   40847.644 ± 1269.274  ops/ms
> SubstringStartsWith.substr2StartsWith_doubleBytes                      24  thrpt  125   39284.961 ±  155.109  ops/ms
> SubstringStartsWith.substr2StartsWith_doubleBytes                     256  thrpt  125   12370.385 ±  622.280  ops/ms
> SubstringStartsWith.substr2StartsWith_singleByte                        4  thrpt  125   60633.544 ± 1989.512  ops/ms
> SubstringStartsWith.substr2StartsWith_singleByte                       24  thrpt  125   60613.490 ± 1024.846  ops/ms
> SubstringStartsWith.substr2StartsWith_singleByte                      256  thrpt  125   30212.033 ± 1106.752  ops/ms
> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes               4  thrpt  125  154666.457 ±    7.132  ops/ms
> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes              24  thrpt  125  154659.583 ±    7.663  ops/ms
> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes             256  thrpt  125  154665.357 ±    6.414  ops/ms
> SubstringStartsWith.substr2StartsWith_noalloc_singleByte                4  thrpt  125  162833.972 ±    8.170  ops/ms
> SubstringStartsWith.substr2StartsWith_noalloc_singleByte               24  thrpt  125  162834.059 ±    7.862  ops/ms
> SubstringStartsWith.substr2StartsWith_noalloc_singleByte              256  thrpt  125  162456.046 ±  217.304  ops/ms
>
> // with OptimizeSubstring
> Benchmark                                                  (substrLength)   Mode  Cnt       Score      Error  Units
> SubstringStartsWith.substr2StartsWith_doubleBytes                       4  thrpt  125  119789.181 ± 3374.123  ops/ms
> SubstringStartsWith.substr2StartsWith_doubleBytes                      24  thrpt  125  123740.637 ±   31.982  ops/ms
> SubstringStartsWith.substr2StartsWith_doubleBytes                     256  thrpt  125  123701.525 ±   68.741  ops/ms
> SubstringStartsWith.substr2StartsWith_singleByte                        4  thrpt  125  134529.257 ±    6.331  ops/ms
> SubstringStartsWith.substr2StartsWith_singleByte                       24  thrpt  125  134517.222 ±    6.373  ops/ms
> SubstringStartsWith.substr2StartsWith_singleByte                      256  thrpt  125  134527.929 ±    4.660  ops/ms
> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes               4  thrpt  125  154668.784 ±    6.990  ops/ms
> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes              24  thrpt  125  154567.971 ±   69.286  ops/ms
> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes             256  thrpt  125  154630.115 ±   36.157  ops/ms
> SubstringStartsWith.substr2StartsWith_noalloc_singleByte                4  thrpt  125  162852.894 ±    5.933  ops/ms
> SubstringStartsWith.substr2StartsWith_noalloc_singleByte               24  thrpt  125  162850.630 ±    5.990  ops/ms
> SubstringStartsWith.substr2StartsWith_noalloc_singleByte              256  thrpt  125  162841.513 ±    6.441  ops/ms

Hello, may I ping for a review of this patch? Many business-oriented Java applications manipulate strings a lot. Compared to arithmetic on scalars, a string as an oop is more expensive. Furthermore, many string objects need to be allocated on the heap, so they increase the GC workload. By analyzing where string objects are constructed, I found that one source is String.substring(). My goal is to eliminate as many substring calls as possible. This is my first attempt to enable a substring optimization. If it works out, I will apply the API-substitution approach to other APIs such as String.charAt, StringBuilder::append, and even String.split(). Thank you in advance. ------------- PR: https://git.openjdk.java.net/jdk/pull/974 From ngasson at openjdk.java.net Wed Nov 11 09:58:59 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 11 Nov 2020 09:58:59 GMT Subject: RFR: 8254661: arm32: additional cleanup after fixing SIGSEGV In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 07:46:35 GMT, Boris Ulasevich wrote: > This change fixes the fastdebug build assertion.
This is actually a missing change for [8253901: ARM32: SIGSEGV during monitorexit](https://github.com/openjdk/jdk/pull/503) Looks correct and I've tested on arm32 (not a Reviewer). ------------- Marked as reviewed by ngasson (Committer). PR: https://git.openjdk.java.net/jdk/pull/1157 From ngasson at openjdk.java.net Wed Nov 11 10:02:58 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 11 Nov 2020 10:02:58 GMT Subject: Integrated: 8256025: AArch64: MachCallRuntimeNode::ret_addr_offset() is incorrect for stub calls In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 03:19:23 GMT, Nick Gasson wrote: > The PR for JDK-8254231 introduces a new assertion in opto/output.cpp to > check the current instruction offset against the offset of the call > return address reported by ret_addr_offset(). This fails on AArch64 > because MachCallRuntimeNode::ret_addr_offset() claims four instructions > are generated for a stub call (far branch) but actually it's just > one (blr to stub or trampoline). > > Tested tier1. This pull request has now been integrated. Changeset: 79ac0418 Author: Nick Gasson URL: https://git.openjdk.java.net/jdk/commit/79ac0418 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8256025: AArch64: MachCallRuntimeNode::ret_addr_offset() is incorrect for stub calls Reviewed-by: aph ------------- PR: https://git.openjdk.java.net/jdk/pull/1138 From shade at openjdk.java.net Wed Nov 11 10:05:58 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 11 Nov 2020 10:05:58 GMT Subject: RFR: 8254661: arm32: additional cleanup after fixing SIGSEGV In-Reply-To: <2PWXYyNoRNdctf6Pfd2qxJhN66mHqxi8b5UroA4Dx0Y=.3d56bb42-c244-4235-8959-d73adb4cc83c@github.com> References: <2PWXYyNoRNdctf6Pfd2qxJhN66mHqxi8b5UroA4Dx0Y=.3d56bb42-c244-4235-8959-d73adb4cc83c@github.com> Message-ID: On Wed, 11 Nov 2020 10:02:46 GMT, Aleksey Shipilev wrote: >> This change fixes the fastdebug build assertion. 
This is actually a missing change for [8253901: ARM32: SIGSEGV during monitorexit](https://github.com/openjdk/jdk/pull/503) > > Thanks, looks fine to me. I was a bit concerned that `InterpreterRuntime::monitorExit` might carry some result in `R0`, but it seems to return `void`, so that looks fine. Note that bots would not allow to integrate until PR title and JBS synopsis match. ------------- PR: https://git.openjdk.java.net/jdk/pull/1157 From shade at openjdk.java.net Wed Nov 11 10:05:58 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 11 Nov 2020 10:05:58 GMT Subject: RFR: 8254661: arm32: additional cleanup after fixing SIGSEGV In-Reply-To: References: Message-ID: <2PWXYyNoRNdctf6Pfd2qxJhN66mHqxi8b5UroA4Dx0Y=.3d56bb42-c244-4235-8959-d73adb4cc83c@github.com> On Wed, 11 Nov 2020 07:46:35 GMT, Boris Ulasevich wrote: > This change fixes the fastdebug build assertion. This is actually a missing change for [8253901: ARM32: SIGSEGV during monitorexit](https://github.com/openjdk/jdk/pull/503) Thanks, looks fine to me. I was a bit concerned that `InterpreterRuntime::monitorExit` might carry some result in `R0`, but it seems to return `void`, so that looks fine. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1157 From rcastanedalo at openjdk.java.net Wed Nov 11 10:22:08 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 11 Nov 2020 10:22:08 GMT Subject: Integrated: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially In-Reply-To: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> Message-ID: On Mon, 19 Oct 2020 08:36:53 GMT, Roberto Castañeda Lozano wrote: > Prevent exponential number of calls to `ConvI2LNode::Ideal()` when AddIs are used multiple times by other AddIs in the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)). This is achieved by (1) reusing existing ConvI2Ls if possible rather than eagerly creating new ones and (2) postponing the optimization of newly created ConvI2Ls. Remove hook node solution introduced in [8217359](https://github.com/openjdk/jdk/commit/cf554816d1952f722143e9d03ec669e80f955adf), since this is subsumed by (2). Use `phase->is_IterGVN()` rather than `can_reshape` to check if `ConvI2LNode::Ideal()` is called within iterative GVN, for clarity. Add regression tests that cover different shapes and sizes of AddI subgraphs, implicitly checking (by not timing out) that there is no combinatorial explosion. This pull request has now been integrated. Changeset: 432c387e Author: Roberto Castañeda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/432c387e Stats: 190 lines in 2 files changed: 180 ins; 4 del; 6 mod 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially Prevent exponential number of calls to ConvI2LNode::Ideal() when AddIs are used multiple times by other AddIs in the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)).
This is achieved by (1) reusing existing ConvI2Ls if possible rather than eagerly creating new ones and (2) postponing the optimization of newly created ConvI2Ls. Remove hook node solution introduced in 8217359, since this is subsumed by (2). Use phase->is_IterGVN() rather than can_reshape to check if ConvI2LNode::Ideal() is called within iterative GVN, for clarity. Add regression tests that cover different shapes and sizes of AddI subgraphs, implicitly checking (by not timing out) that there is no combinatorial explosion. Co-authored-by: Vladimir Ivanov Reviewed-by: vlivanov, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/727 From bulasevich at openjdk.java.net Wed Nov 11 11:12:57 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Wed, 11 Nov 2020 11:12:57 GMT Subject: Integrated: 8254661: arm32: additional cleanup after fixing SIGSEGV In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 07:46:35 GMT, Boris Ulasevich wrote: > This change fixes the fastdebug build assertion. This is actually a missing change for [8253901: ARM32: SIGSEGV during monitorexit](https://github.com/openjdk/jdk/pull/503) This pull request has now been integrated. Changeset: 362feaae Author: Boris Ulasevich URL: https://git.openjdk.java.net/jdk/commit/362feaae Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod 8254661: arm32: additional cleanup after fixing SIGSEGV Reviewed-by: ngasson, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1157 From mdoerr at openjdk.java.net Wed Nov 11 13:56:02 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 11 Nov 2020 13:56:02 GMT Subject: RFR: 8256166: [C2] Registers get confused on Big Endian after 8221404 Message-ID: C2 crashes with broken register encoding on Big Endian platforms after JDK-8221404 was pushed. Reason is that int values "_RM_I" are pairwise swapped on 64 bit Big Endian platforms. 
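The "pairwise swapped" effect reported for 8256166 can be reproduced in isolation. The sketch below is not the HotSpot RegMask code; it assumes only what the message states, namely that pairs of 32-bit values end up being viewed as one 64-bit word, and the class and method names are hypothetical. It stores two ints back to back and reads the same eight bytes as a long under each byte order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WordPairOrderSketch {
    // Write two 32-bit values consecutively, then reinterpret the same
    // eight bytes as a single 64-bit word using the given byte order.
    static long asWord(int first, int second, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES).order(order);
        buf.putInt(first).putInt(second);
        return buf.getLong(0);
    }

    public static void main(String[] args) {
        int first = 0x00000001;
        int second = 0x00000002;
        // Little endian: the first int lands in the low half of the word.
        System.out.printf("little endian: 0x%016x%n",
                asWord(first, second, ByteOrder.LITTLE_ENDIAN));
        // Big endian: the first int lands in the high half of the word.
        System.out.printf("big endian:    0x%016x%n",
                asWord(first, second, ByteOrder.BIG_ENDIAN));
    }
}
```

Code that computes bit positions on the 64-bit view while the initial values are supplied as 32-bit halves therefore sees each pair in the opposite order on a big-endian machine, unless the halves are swapped when the word is assembled.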
------------- Commit messages: - 8256166: [C2] Registers get confused on Big Endian after 8221404 Changes: https://git.openjdk.java.net/jdk/pull/1165/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1165&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256166 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1165.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1165/head:pull/1165 PR: https://git.openjdk.java.net/jdk/pull/1165 From github.com+670087+jrziviani at openjdk.java.net Wed Nov 11 14:00:00 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Wed, 11 Nov 2020 14:00:00 GMT Subject: RFR: 8248191: [PPC64] Replace lxvd2x/stxvd2x with lxvx/stxvx for Power10 In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 22:13:53 GMT, Martin Doerr wrote: > Did you run Vector API tests (test/jdk/jdk/incubator/vector) on Power 10? Just did it:

==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
   jtreg:test/jdk/jdk/incubator                         73    73     0     0
==============================
TEST SUCCESS

> Doesn't this mean we use a different Byte order on Power 9 than on Power 10 on little endian? Hmm, good question. I'll investigate it. Looking at ISA 3.1 (and comparing with 3.0) I don't see the difference, but I'll check it. _(side note: if you want me to run any test on a P10, just tell me)_ ------------- PR: https://git.openjdk.java.net/jdk/pull/1086 From redestad at openjdk.java.net Wed Nov 11 14:12:59 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 11 Nov 2020 14:12:59 GMT Subject: RFR: 8256166: [C2] Registers get confused on Big Endian after 8221404 In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 13:50:38 GMT, Martin Doerr wrote: > C2 crashes with broken register encoding on Big Endian platforms after JDK-8221404 was pushed. > Reason is that int values "_RM_I" are pairwise swapped on 64 bit Big Endian platforms.
Sorry for causing this headache. Patch looks OK to me, though I wish there was a cleaner way of doing this. Some time in the future I think we should consider refactoring this constructor to be RegMask(uintptr_t, uintptr_t, ...) and drop the _RM_I alias, but that pushes the complexity down into .ad files which means it'll probably be even trickier to maintain. ------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1165 From mdoerr at openjdk.java.net Wed Nov 11 15:17:56 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 11 Nov 2020 15:17:56 GMT Subject: RFR: 8256166: [C2] Registers get confused on Big Endian after 8221404 In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 14:10:40 GMT, Claes Redestad wrote: >> C2 crashes with broken register encoding on Big Endian platforms after JDK-8221404 was pushed. >> Reason is that int values "_RM_I" are pairwise swapped on 64 bit Big Endian platforms. > > Sorry for causing this headache. Patch looks OK to me, though I wish there was a cleaner way of doing this. > > Some time in the future I think we should consider refactoring this constructor to be RegMask(uintptr_t, uintptr_t, ...) and drop the _RM_I alias, but that pushes the complexity down into .ad files which means it'll probably be even trickier to maintain. Hi Claes, thanks for your help and review! Yeah, looks like more cleanup could be done, but we need to fix this severe issue urgently. I'll integrate it if there are no objections. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1165 From thartmann at openjdk.java.net Wed Nov 11 15:25:01 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 11 Nov 2020 15:25:01 GMT Subject: RFR: 8256166: [C2] Registers get confused on Big Endian after 8221404 In-Reply-To: References: Message-ID: <7aT2OP6ejurk12ctDWFIeXU6z1yiMVYrC7mKBc9hX9o=.65b4da4c-97da-4d7b-9f25-f11b15d6cfa6@github.com> On Wed, 11 Nov 2020 13:50:38 GMT, Martin Doerr wrote: > C2 crashes with broken register encoding on Big Endian platforms after JDK-8221404 was pushed. > Reason is that int values "_RM_I" are pairwise swapped on 64 bit Big Endian platforms. Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1165 From redestad at openjdk.java.net Wed Nov 11 15:31:58 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 11 Nov 2020 15:31:58 GMT Subject: RFR: 8256166: [C2] Registers get confused on Big Endian after 8221404 In-Reply-To: <7aT2OP6ejurk12ctDWFIeXU6z1yiMVYrC7mKBc9hX9o=.65b4da4c-97da-4d7b-9f25-f11b15d6cfa6@github.com> References: <7aT2OP6ejurk12ctDWFIeXU6z1yiMVYrC7mKBc9hX9o=.65b4da4c-97da-4d7b-9f25-f11b15d6cfa6@github.com> Message-ID: On Wed, 11 Nov 2020 15:22:19 GMT, Tobias Hartmann wrote: >> C2 crashes with broken register encoding on Big Endian platforms after JDK-8221404 was pushed. >> Reason is that int values "_RM_I" are pairwise swapped on 64 bit Big Endian platforms. > > Marked as reviewed by thartmann (Reviewer). No objection from me, and I think this is both urgent and simple enough to not have to wait for objections. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1165 From mdoerr at openjdk.java.net Wed Nov 11 15:31:59 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 11 Nov 2020 15:31:59 GMT Subject: RFR: 8256166: [C2] Registers get confused on Big Endian after 8221404 In-Reply-To: References: <7aT2OP6ejurk12ctDWFIeXU6z1yiMVYrC7mKBc9hX9o=.65b4da4c-97da-4d7b-9f25-f11b15d6cfa6@github.com> Message-ID: On Wed, 11 Nov 2020 15:25:36 GMT, Claes Redestad wrote: >> Marked as reviewed by thartmann (Reviewer). > > No objection from me, and I think this is both urgent and simple enough to not have to wait for objections. Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/1165 From mdoerr at openjdk.java.net Wed Nov 11 15:32:00 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 11 Nov 2020 15:32:00 GMT Subject: Integrated: 8256166: [C2] Registers get confused on Big Endian after 8221404 In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 13:50:38 GMT, Martin Doerr wrote: > C2 crashes with broken register encoding on Big Endian platforms after JDK-8221404 was pushed. > Reason is that int values "_RM_I" are pairwise swapped on 64 bit Big Endian platforms. This pull request has now been integrated. 
Changeset: 436019b8 Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/436019b8 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8256166: [C2] Registers get confused on Big Endian after 8221404 Reviewed-by: redestad, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1165 From neliasso at openjdk.java.net Wed Nov 11 16:12:05 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 11 Nov 2020 16:12:05 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v13] In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 07:23:07 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
>>
>> Performance Results:
>> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>> ArrayCopyPartialInlineSize : 32
>>
>> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
>> -- | -- | -- | -- | --
>> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
>> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
>> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
>> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
>> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
>> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
>> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
>> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
>> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
>> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
>> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
>> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
>> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
>> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
>> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
>> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
>> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
>> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
>> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
>> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
>> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
>> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
>> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
>> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
>> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
>> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
>> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
>> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
>> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
>> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
>> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
>> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
>> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
>> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
>> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
>> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
>> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
>> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
>> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
>> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
>> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
>> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
>> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
>> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
>> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
>> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
>> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
>> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
>> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
>> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
>> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
>> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
>> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
>> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
>> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
>> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
>> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
>> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
>> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
>> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
>> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
>> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
>> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
>> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
>> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
>> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
>> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
>> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
>> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
>> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
>> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
>> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
>> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
>> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
>> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
>> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
>> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
>> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>>
>> Detailed Reports:
>> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
>> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:
>
> - Merge remote-tracking branch 'upstream' into JDK-8252848
> - JDK-8252848 : Review comments resolved
> - JDK-8252848: Review comments resolution.
> - JDK-8252848: Review comments addressed.
> - Merge remote-tracking branch 'origin' into JDK-8252848
> - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions.
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
> - JDK-8252848 : Review comments resolution.
> - Merge remote-tracking branch 'upstream' into JDK-8252848
> - ... and 4 more: https://git.openjdk.java.net/jdk/compare/5dfb42fc...ed343a9e

src/hotspot/share/opto/cfgnode.hpp line 104:

> 102: virtual Node* Ideal(PhaseGVN* phase, bool can_reshape);
> 103: virtual const RegMask &out_RegMask() const;
> 104: bool try_clean_mem_phi(PhaseGVN *phase);

This changed line looks like a mistake. Please revert.

-------------

PR: https://git.openjdk.java.net/jdk/pull/302

From neliasso at openjdk.java.net Wed Nov 11 16:12:03 2020
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Wed, 11 Nov 2020 16:12:03 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11]
In-Reply-To:
References:
Message-ID:

On Thu, 29 Oct 2020 10:28:00 GMT, Jatin Bhateja wrote:

>> Summary:
>>
>> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes.
>> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
>> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
>> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:
>
> - JDK-8252848: Review comments addressed.
> - Merge remote-tracking branch 'origin' into JDK-8252848
> - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions.
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
> - JDK-8252848 : Review comments resolution.
> - Merge remote-tracking branch 'upstream' into JDK-8252848
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
> - Replacing explicit type checks with existing type checking routines
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
> - ... and 1 more: https://git.openjdk.java.net/jdk/compare/4031cb41...9e85592a

Do you have any tests that exercise the different possible versions?
- dynamic length with both small and long copies
- dynamic length that can be proven always less than PartialInliningSize
- constant size less than PartialInliningSize

Except for these minor comments, and the tests, I am ready to approve.
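
The three copy shapes listed above could be exercised with something like the following Java sketch. It is only an illustration under stated assumptions: the class and method names are made up, the threshold constant simply mirrors the ArrayCopyPartialInlineSize default of 32 from the summary, and the code checks functional correctness only. It does not verify which code path C2 actually selects for each shape.

```java
import java.util.Arrays;

public class ArrayCopyShapes {
    // Hypothetical stand-in for the ArrayCopyPartialInlineSize default (32 bytes).
    static final int PARTIAL_INLINE_SIZE = 32;

    // Shape 1: dynamic length that may be small or long.
    static byte[] copyDynamic(byte[] src, int len) {
        byte[] dst = new byte[len];
        System.arraycopy(src, 0, dst, 0, len);
        return dst;
    }

    // Shape 2: dynamic length that is masked into the range 0..31,
    // so it is always below the threshold.
    static byte[] copyBounded(byte[] src, int len) {
        int n = len & (PARTIAL_INLINE_SIZE - 1);
        byte[] dst = new byte[n];
        System.arraycopy(src, 0, dst, 0, n);
        return dst;
    }

    // Shape 3: constant length below the threshold.
    static byte[] copyConstant(byte[] src) {
        byte[] dst = new byte[16];
        System.arraycopy(src, 0, dst, 0, 16);
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = new byte[2048];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        // Repeat so the methods get hot enough to be JIT-compiled.
        for (int iter = 0; iter < 20_000; iter++) {
            int len = (iter % 2 == 0) ? 5 : 1200; // small and long copies
            check(copyDynamic(src, len), src, len);
            check(copyBounded(src, iter), src, iter & (PARTIAL_INLINE_SIZE - 1));
            check(copyConstant(src), src, 16);
        }
        System.out.println("all copies matched");
    }

    static void check(byte[] actual, byte[] src, int len) {
        if (!Arrays.equals(actual, Arrays.copyOf(src, len))) {
            throw new AssertionError("mismatch for length " + len);
        }
    }
}
```

Running the same class with ArrayCopyPartialInlineSize set to 0 and to 32 and comparing results would be one way to cover both the stub path and the inlined path.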
src/hotspot/share/opto/cfgnode.cpp line 397: > 395: } > 396: > 397: Remove unnecessary empty line src/hotspot/share/opto/node.hpp line 162: > 160: class StoreVectorScatterNode; > 161: class VectorMaskCmpNode; > 162: Remove line break or move it down one line below "class VectorSet;" ------------- Changes requested by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/302 From zgu at openjdk.java.net Wed Nov 11 16:53:01 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 11 Nov 2020 16:53:01 GMT Subject: RFR: 8256051: nmethod_entry_barrier stub miscalculates xmm spill size on x86_32 Message-ID: <5rwWQ1KNBVs3x8STyNXBWYLrP43atygm6TwaF-4D5wM=.4f9b85ec-b5af-4515-90b0-885ec817b7a6@github.com> nmethod_entry_barrier stub miscalculates xmm spill size on x86_32, instead of 4 * wordSize, it uses 2 * wordSize. This bug only affects Shenandoah, as it is the only GC that supports concurrent class unloading on x86_32. ------------- Commit messages: - Fix xmm spill size on x86_32 Changes: https://git.openjdk.java.net/jdk/pull/1170/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1170&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256051 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1170.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1170/head:pull/1170 PR: https://git.openjdk.java.net/jdk/pull/1170 From redestad at openjdk.java.net Wed Nov 11 17:41:04 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 11 Nov 2020 17:41:04 GMT Subject: RFR: 8256203: Simplify RegMask::Empty Message-ID: - Simplify RegMask::Empty to use default constructor. - Add missing validation in the empty constructor. 
------------- Commit messages: - Simplify RegMask::Empty Changes: https://git.openjdk.java.net/jdk/pull/1167/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1167&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256203 Stats: 9 lines in 2 files changed: 2 ins; 5 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1167.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1167/head:pull/1167 PR: https://git.openjdk.java.net/jdk/pull/1167 From shade at openjdk.java.net Wed Nov 11 17:41:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 11 Nov 2020 17:41:57 GMT Subject: RFR: 8256051: nmethod_entry_barrier stub miscalculates xmm spill size on x86_32 In-Reply-To: <5rwWQ1KNBVs3x8STyNXBWYLrP43atygm6TwaF-4D5wM=.4f9b85ec-b5af-4515-90b0-885ec817b7a6@github.com> References: <5rwWQ1KNBVs3x8STyNXBWYLrP43atygm6TwaF-4D5wM=.4f9b85ec-b5af-4515-90b0-885ec817b7a6@github.com> Message-ID: On Wed, 11 Nov 2020 16:46:43 GMT, Zhengyu Gu wrote: > nmethod_entry_barrier stub miscalculates xmm spill size on x86_32, instead of 4 * wordSize, it uses 2 * wordSize. > > This bug only affects Shenandoah, as it is the only GC that supports concurrent class unloading on x86_32. D'oh. Seems like a simple error while copy-pasting from `stubGenerator_x86_64.cpp`. Looks fine and trivial to me. I ran `x86_32` + `tier1` + `Shenandoah` tests on my own with this change, and it improves the test results a lot. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1170 From redestad at openjdk.java.net Wed Nov 11 18:55:00 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 11 Nov 2020 18:55:00 GMT Subject: RFR: 8256205: Simplify compiler calling convention handling Message-ID: Clean up and simplify some of the calling convention handling: - Remove Matcher::calling_convention/c_calling_convention and replace select few call sites with direct calls to SharedRuntime - Remove unused is_outgoing args. 
Since the SPARC removal, is_outgoing has no effect on the calling_convention or return_value methods.
- Move in_preserved_stack_slots to SharedRuntime to match out_preserved_stack_slots.

This has a tiny positive impact by reducing calls and improving inlining (at least gcc has a hard time inlining anything that goes in the .ad files, even when it's defined to the same class), but is mainly a cleanup effort.

Testing: Oracle tier1-2 testing; S390, PPC and ARM32 builds

-------------

Commit messages:
- trailing whitespace
- Merge branch 'master' into call_conv
- Merge branch 'master' into call_conv
- Remove in_preserve_stack_slots to SharedRuntime
- Fix missing return_value fixup
- Drop out_preserve_stack_slots from adlc output
- Clean-up calling_convention handling

Changes: https://git.openjdk.java.net/jdk/pull/1168/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1168&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8256205
Stats: 359 lines in 31 files changed: 54 ins; 255 del; 50 mod
Patch: https://git.openjdk.java.net/jdk/pull/1168.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1168/head:pull/1168

PR: https://git.openjdk.java.net/jdk/pull/1168

From mchung at openjdk.java.net Wed Nov 11 18:57:04 2020
From: mchung at openjdk.java.net (Mandy Chung)
Date: Wed, 11 Nov 2020 18:57:04 GMT
Subject: RFR: 8230501: Class data support for hidden classes
Message-ID:

Provide the `Lookup::defineHiddenClassWithClassData` API that allows live objects to be shared between a hidden class and other classes. A hidden class can load these live objects as dynamically-computed constants via this API.

Specdiff
http://cr.openjdk.java.net/~mchung/jdk16/webrevs/8230501/specdiff/overview-summary.html

With this class data support and hidden classes, `sun.misc.Unsafe::defineAnonymousClass` will be deprecated for removal.
Existing libraries should replace their calls to `sun.misc.Unsafe::defineAnonymousClass` with `Lookup::defineHiddenClass` or `Lookup::defineHiddenClassWithClassData`. This patch also updates the implementation of lambda meta factory and `MemoryAccessVarHandleGenerator` to use class data. No performance difference observed in the jdk.incubator.foreign microbenchmarks. A side note: `MemoryAccessVarHandleGenerator` is removed in the upcoming integration of JDK-8254162 but it helps validating the class data support. Background ---------- This is an enhancement following up JEP 371: Hidden Classes w.r.t. "Constant-pool patching" in the "Risks and Assumption" section. A VM-anonymous class can be defined with its constant-pool entries already resolved to concrete values. This allows critical constants to be shared between a VM-anonymous class and the language runtime that defines it, and between multiple VM-anonymous classes. For example, a language runtime will often have `MethodHandle` objects in its address space that would be useful to newly-defined VM-anonymous classes. Instead of the runtime serializing the objects to constant-pool entries in VM-anonymous classes and then generating bytecode in those classes to laboriously `ldc` the entries, the runtime can simply supply `Unsafe::defineAnonymousClass` with references to its live objects. The relevant constant-pool entries in the newly-defined VM-anonymous class are pre-linked to those objects, improving performance and reducing footprint. In addition, this allows VM-anonymous classes to refer to each other: Constant-pool entries in a class file are based on names. They thus cannot refer to nameless VM-anonymous classes. A language runtime can, however, easily track the live Class objects for its VM-anonymous classes and supply them to `Unsafe::defineAnonymousClass`, thus pre-linking the new class's constant pool entries to other VM-anonymous classes. 
This extends hidden classes to allow live objects to be injected into a hidden class and loaded via condy.

Details
-------

A new `Lookup::defineHiddenClassWithClassData` API takes an additional `classData` argument compared to `Lookup::defineHiddenClass`. Class data can be method handles, lookup objects, arbitrary user objects or collections of all of the above.

This method behaves as if calling `Lookup::defineHiddenClass` to define a hidden class with a private static unnamed field that is initialized with `classData` at the first instruction of the class initializer.

`MethodHandles::classData(Lookup lookup, String name, Class type)` is a bootstrap method to load the class data of the given lookup's lookup class. The hidden class will be initialized when the `classData` method is called if it has not been initialized already.

For class data containing more than one element, libraries can create their own convenience method to load a single live object via condy. We can reconsider if such a convenience method is needed in the future.

Frameworks sometimes want to dynamically create a hidden class (HC), add it to the lookup class's nest, and have the HC carry secrets hidden from that nest. In this case, frameworks should not use private static finals (in the HCs they spin) to hold secrets, because a nestmate of the HC may obtain access to such a private static final and observe the framework's secret. They should use condy instead. In addition, we need to differentiate whether a lookup object is created from the original lookup class or created by teleporting, e.g. `Lookup::in` and `MethodHandles::privateLookupIn`.

This proposes to add a new `ORIGINAL` bit that is only set if the lookup object is created by `MethodHandles::lookup` or by bootstrap method invocation.
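
As a rough illustration of the proposed `ORIGINAL` bit, the sketch below checks whether a lookup still carries original access after teleporting. It assumes that the bit is exposed as a `Lookup.ORIGINAL` constant and that `Lookup::in` on a different class drops it, which is how the `java.lang.invoke` API in JDK 16 and later is understood to behave; the class and method names here are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;

public class OriginalAccessSketch {
    // Hypothetical teleport target in the same package and nest.
    static class Other { }

    // True if the lookup's mode bits include ORIGINAL.
    static boolean hasOriginal(Lookup lookup) {
        return (lookup.lookupModes() & Lookup.ORIGINAL) != 0;
    }

    // A lookup obtained directly from MethodHandles::lookup.
    static Lookup fresh() {
        return MethodHandles.lookup();
    }

    // The same lookup teleported to a different lookup class.
    static Lookup teleported() {
        return MethodHandles.lookup().in(Other.class);
    }

    public static void main(String[] args) {
        System.out.println("fresh has ORIGINAL:      " + hasOriginal(fresh()));
        System.out.println("teleported has ORIGINAL: " + hasOriginal(teleported()));
    }
}
```

Code that needs to gate an operation on original access can test the mode bit this way instead of comparing `lookupModes()` against a fixed value.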
The operations that apply only to a Lookup object with original access are:
- creating method handles for caller-sensitive methods
- obtaining class data associated with the lookup class

No change to `Lookup::hasFullPrivilegeAccess` and `Lookup::toString`, which ignore the ORIGINAL bit.

Compatibility Risks
-------------------

`Lookup::lookupModes` includes a new `ORIGINAL` bit. Most lookup operations ignore this original bit, except creating method handles for caller-sensitive methods, which expects the lookup from the original lookup class. Existing code that compares the return value of `lookupModes` to a fixed value may be impacted. However, existing clients have no need to expect a fixed value of lookup modes. The incompatibility risk of this spec change is low.

-------------

Commit messages:
- fix incorrect merge
- more clean up
- merge
- Keep classDataAt package-private
- Merge branch 'master' of https://github.com/openjdk/jdk into class-data
- MethodHandles::hasFullPrivilegeAccess and Lookup::toString ignores ORIGINAL bit
- revert some test changes
- Merge branch 'master' of https://github.com/openjdk/jdk into class-data
- Add ORIGINAL access
- Merge branch 'master' of https://github.com/openjdk/jdk into class-data
- ...
and 5 more: https://git.openjdk.java.net/jdk/compare/2e19026d...5a3e29ba Changes: https://git.openjdk.java.net/jdk/pull/1171/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1171&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8230501 Stats: 970 lines in 16 files changed: 778 ins; 80 del; 112 mod Patch: https://git.openjdk.java.net/jdk/pull/1171.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1171/head:pull/1171 PR: https://git.openjdk.java.net/jdk/pull/1171 From forax at univ-mlv.fr Wed Nov 11 19:01:41 2020 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 11 Nov 2020 20:01:41 +0100 (CET) Subject: RFR: 8230501: Class data support for hidden classes In-Reply-To: References: Message-ID: <1180961384.2348941.1605121301491.JavaMail.zimbra@u-pem.fr> Hi Mandy, maybe a stupid question but why this mechanism is limited to hidden classes ? regards, R?mi ----- Mail original ----- > De: "Mandy Chung" > ?: "core-libs-dev" , "hotspot compiler" > Envoy?: Mercredi 11 Novembre 2020 19:57:04 > Objet: RFR: 8230501: Class data support for hidden classes > Provide the `Lookup::defineHiddenClassWithClassData` API that allows live > objects > be shared between a hidden class and other classes. A hidden class can load > these live objects as dynamically-computed constants via this API. > > Specdiff > http://cr.openjdk.java.net/~mchung/jdk16/webrevs/8230501/specdiff/overview-summary.html > > With this class data support and hidden classes, > `sun.misc.Unsafe::defineAnonymousClass` > will be deprecated for removal. Existing libraries should replace their > calls to `sun.misc.Unsafe::defineAnonymousClass` with > `Lookup::defineHiddenClass` > or `Lookup::defineHiddenClassWithClassData`. > > This patch also updates the implementation of lambda meta factory and > `MemoryAccessVarHandleGenerator` to use class data. No performance difference > observed in the jdk.incubator.foreign microbenchmarks. 
A side note: > `MemoryAccessVarHandleGenerator` is removed in the upcoming integration of > JDK-8254162 but it helps validating the class data support. > > Background > ---------- > > This is an enhancement following up JEP 371: Hidden Classes w.r.t. > "Constant-pool patching" in the "Risks and Assumption" section. > > A VM-anonymous class can be defined with its constant-pool entries already > resolved to concrete values. This allows critical constants to be shared > between a VM-anonymous class and the language runtime that defines it, and > between multiple VM-anonymous classes. For example, a language runtime will > often have `MethodHandle` objects in its address space that would be useful > to newly-defined VM-anonymous classes. Instead of the runtime serializing > the objects to constant-pool entries in VM-anonymous classes and then > generating bytecode in those classes to laboriously `ldc` the entries, > the runtime can simply supply `Unsafe::defineAnonymousClass` with references > to its live objects. The relevant constant-pool entries in the newly-defined > VM-anonymous class are pre-linked to those objects, improving performance > and reducing footprint. In addition, this allows VM-anonymous classes to > refer to each other: Constant-pool entries in a class file are based on names. > They thus cannot refer to nameless VM-anonymous classes. A language runtime can, > however, easily track the live Class objects for its VM-anonymous classes and > supply them to `Unsafe::defineAnonymousClass`, thus pre-linking the new class's > constant pool entries to other VM-anonymous classes. > > This extends the hidden classes to allow live objects to be injected > in a hidden class and loaded them via condy. > > Details > ------- > > A new `Lookup::defineHiddenClassWithClassData` API takes additional > `classData` argument compared to `Lookup::defineHiddenClass`. 
> Class data can be method handles, lookup objects, arbitrary user objects > or collections of all of the above. > > This method behaves as if calling `Lookup::defineHiddenClass` to define > a hidden class with a private static unnamed field that is initialized > with `classData` at the first instruction of the class initializer. > > `MethodHandles::classData(Lookup lookup, String name, Class type)` > is a bootstrap method to load the class data of the given lookup's lookup class. > The hidden class will be initialized when `classData` method is called if > the hidden class has not been initialized. > > For a class data containing more than one single element, libraries can > create their convenience method to load a single live object via condy. > We can reconsider if such a convenience method is needed in the future. > > Frameworks sometimes want to dynamically create a hidden class (HC) and add it > it the lookup class nest and have HC to carry secrets hidden from that nest. > In this case, frameworks should not to use private static finals (in the HCs > they spin) to hold secrets because a nestmate of HC may obtain access to > such a private static final and observe the framework's secret. It should use > condy. In addition, we need to differentiate if a lookup object is created from > the original lookup class or created from teleporting e.g. `Lookup::in` > and `MethodHandles::privateLookupIn`. > > This proposes to add a new `ORIGINAL` bit that is only set if the lookup > object is created by `MethodHandles::lookup` or by bootstrap method invocation. > The operations only apply to a Lookup object with original access are: > - create method handles for caller-sensitve methods > - obtain class data associated with the lookup class > > No change to `Lookup::hasFullPrivilegeAccess` and `Lookup::toString` which > ignores the ORIGINAL bit. > > > Compatibility Risks > ------------------- > > `Lookup::lookupModes` includes a new `ORIGINAL` bit. 
Most lookup operations > ignore this original bit, except creating method handles for caller-sensitive > methods, > which expects the lookup from the original lookup class. Existing code that compares > the return value of `lookupModes` to a fixed value may be impacted. However, > existing clients have no need to expect a fixed value of lookup modes. > The incompatibility risk of this spec change is low. > > ------------- > > Commit messages: > - fix incorrect merge > - more clean up > - merge > - Keep classDataAt package-private > - Merge branch 'master' of https://github.com/openjdk/jdk into class-data > - MethodHandles::hasFullPrivilegeAccess and Lookup::toString ignores ORIGINAL > bit > - revert some test changes > - Merge branch 'master' of https://github.com/openjdk/jdk into class-data > - Add ORIGINAL access > - Merge branch 'master' of https://github.com/openjdk/jdk into class-data > - ... and 5 more: https://git.openjdk.java.net/jdk/compare/2e19026d...5a3e29ba > > Changes: https://git.openjdk.java.net/jdk/pull/1171/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1171&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8230501 > Stats: 970 lines in 16 files changed: 778 ins; 80 del; 112 mod > Patch: https://git.openjdk.java.net/jdk/pull/1171.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/1171/head:pull/1171 > > PR: https://git.openjdk.java.net/jdk/pull/1171 From zgu at openjdk.java.net Wed Nov 11 19:15:57 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 11 Nov 2020 19:15:57 GMT Subject: Integrated: 8256051: nmethod_entry_barrier stub miscalculates xmm spill size on x86_32 In-Reply-To: <5rwWQ1KNBVs3x8STyNXBWYLrP43atygm6TwaF-4D5wM=.4f9b85ec-b5af-4515-90b0-885ec817b7a6@github.com> References: <5rwWQ1KNBVs3x8STyNXBWYLrP43atygm6TwaF-4D5wM=.4f9b85ec-b5af-4515-90b0-885ec817b7a6@github.com> Message-ID: <0qb1bbBFIQOhMJ27jemslBFuCplazCNVRm9kdNU3eTs=.097ce556-ba33-4b2d-a94e-084d99953b41@github.com> On Wed, 11 Nov 2020 16:46:43
GMT, Zhengyu Gu wrote: > nmethod_entry_barrier stub miscalculates xmm spill size on x86_32, instead of 4 * wordSize, it uses 2 * wordSize. > > This bug only affects Shenandoah, as it is the only GC that supports concurrent class unloading on x86_32. This pull request has now been integrated. Changeset: bfa060f0 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/bfa060f0 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8256051: nmethod_entry_barrier stub miscalculates xmm spill size on x86_32 Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1170 From zgu at openjdk.java.net Wed Nov 11 19:15:55 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 11 Nov 2020 19:15:55 GMT Subject: RFR: 8256051: nmethod_entry_barrier stub miscalculates xmm spill size on x86_32 In-Reply-To: References: <5rwWQ1KNBVs3x8STyNXBWYLrP43atygm6TwaF-4D5wM=.4f9b85ec-b5af-4515-90b0-885ec817b7a6@github.com> Message-ID: On Wed, 11 Nov 2020 17:39:35 GMT, Aleksey Shipilev wrote: > D'oh. Seems like a simple error while copy-pasting from `stubGenerator_x86_64.cpp`. Looks fine and trivial to me. I ran `x86_32` + `tier1` + `Shenandoah` tests on my own with this change, and it improves the test results a lot. Thanks, Aleksey. ------------- PR: https://git.openjdk.java.net/jdk/pull/1170 From redestad at openjdk.java.net Wed Nov 11 20:14:00 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 11 Nov 2020 20:14:00 GMT Subject: RFR: 8256238: Remove Matcher::pass_original_key_for_aes Message-ID: This removes Matcher::pass_original_key_for_aes() and related code. This was added to workaround some limitations with the AES provider on SPARC and is now effectively dead code. 
------------- Commit messages: - Remove Matcher::pass_original_key_for_aes Changes: https://git.openjdk.java.net/jdk/pull/1175/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1175&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256238 Stats: 112 lines in 9 files changed: 0 ins; 101 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/1175.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1175/head:pull/1175 PR: https://git.openjdk.java.net/jdk/pull/1175 From github.com+58006833+xbzhang99 at openjdk.java.net Wed Nov 11 20:35:17 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Wed, 11 Nov 2020 20:35:17 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v4] In-Reply-To: References: Message-ID: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/894/files - new: https://git.openjdk.java.net/jdk/pull/894/files/757192c3..fc1dff49 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=02-03 Stats: 69 lines in 3 files changed: 65 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Wed Nov 11 20:39:14 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Wed, 11 Nov 2020 20:39:14 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v5] In-Reply-To: References: Message-ID: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/894/files - new: https://git.openjdk.java.net/jdk/pull/894/files/fc1dff49..b23c8cba Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Wed Nov 11 20:50:55 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Wed, 11 Nov 2020 20:50:55 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v5] In-Reply-To: References: <_NtWpo5mp-I1zmPtn883ezvoO1jA7WdJflfoFXcpL70=.96a06d93-47a4-48dd-88fc-f3c777315291@github.com> Message-ID: On Mon, 9 Nov 2020 22:10:07 GMT, Xubo Zhang wrote: >> The regression tests for exp should be explicitly updated to cover the previously erroneous input if they do not do so already. > > Hi Darcy, > Where should the test be? A new test file? > > Best regards, > Xubo > > From: Joe Darcy > Sent: Monday, November 9, 2020 1:33 PM > To: openjdk/jdk > Cc: Zhang, Xubo ; Mention > Subject: Re: [openjdk/jdk] 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms (#894) > > > @jddarcy commented on this pull request. 
> > ________________________________ > > In test/jdk/java/lang/Math/WorstCaseTests.java: > >> @@ -114,8 +114,8 @@ private static int testWorstExp() { > > {+0x1.A8EAD058BC6B8p3, 0x1.1D71965F516ADp19}, > > {+0x1.1D5C2DAEBE367p4, 0x1.A8C02E974C314p25}, > > {+0x1.C44CE0D716A1Ap4, 0x1.B890CA8637AE1p40}, > > - {+0x4.0p8, Double.POSITIVE_INFINITY}, > > - {+0x2.71p12, Double.POSITIVE_INFINITY}, > > + {+0x4.0p8, Double.POSITIVE_INFINITY}, //bug 8255368 gave 0 > > This is not an appropriate test to update to cover this bug. This test is specifically probing at difficult cases in double arithmetic for the underlying mathematical function as opposed to flaws in a particular implementation. Sure. I reverted the changes in WorstCaseTests.java, and added a new test file ExpCornerCaseTests.java in test/jdk/java/lang/Math ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From vlivanov at openjdk.java.net Wed Nov 11 22:18:58 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 11 Nov 2020 22:18:58 GMT Subject: RFR: 8256056: Deoptimization stub doesn't save vector registers on x86 In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 00:40:53 GMT, Vladimir Kozlov wrote: > We did record that compiled code has wide vectors. > Maybe we need a specialized deopt stub for 512-bit vectors As we discussed with Sandhya, it is doable, but will introduce additional complexity. And the benefits are not clear: deoptimization is far from performance-critical. I made an artificial experiment and composed a micro-benchmark which performs 100k deoptimizations in a row without triggering any recompilations.
The results (on a Skylake server) are:

  -XX:AVX            =3     =2     =0
  save_vectors=OFF   4,6s   4,4s   4,4s
  save_vectors=ON    5,5s   4,4s   4,4s

Saving full ZMM0-31 register state adds ~20% of execution time when continuously deoptimizing a trivial method (just a single call), but when lazy deoptimization behaves normally (deoptimized code is thrown away along the way), there's basically no difference in scores because the time is spent in the interpreter waiting for recompiled code to arrive. ------------- PR: https://git.openjdk.java.net/jdk/pull/1134 From github.com+51720475+kdnilsen at openjdk.java.net Thu Nov 12 02:35:55 2020 From: github.com+51720475+kdnilsen at openjdk.java.net (Kelvin Nilsen) Date: Thu, 12 Nov 2020 02:35:55 GMT Subject: RFR: 8254807: Optimize startsWith() for String.substring() In-Reply-To: References: Message-ID: On Fri, 16 Oct 2020 17:15:16 GMT, Xin Liu wrote: > 8254807: Optimize startsWith() for String.substring() src/hotspot/share/opto/stringopts.cpp line 54: > 52: return false; > 53: } > 54: Was there a reason to shift the order of this function with respect to the class StringConcat declaration? The change in ordering makes it more difficult to spot the differences, if any. src/hotspot/share/opto/stringopts.cpp line 572: > 570: } > 571: > 572: void PhaseStringOpts::eliminate_call(CallNode* call, CallProjections& projs) { Similar comment here. Could the eliminate_call() function stay in the same position after the patch as before it? test/hotspot/jtreg/compiler/c2/TestOptimizeSubstring.java line 77: > 75: String newStringAlloc = /*call ,*/"static wrapper for: _new_array_nozero_Java"; > 76: try { > 77: oa = ProcessTools.executeTestJvm("-XX:+UnlockDiagnosticVMOptions", "-Xbootclasspath/a:.", Is this comment identifying a future TODO item? That's not entirely clear to me.
------------- PR: https://git.openjdk.java.net/jdk/pull/704 From xliu at openjdk.java.net Thu Nov 12 06:07:08 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 12 Nov 2020 06:07:08 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic Message-ID: 8247732: validate user-input intrinsic_ids in ControlIntrinsic ------------- Commit messages: - 8247732: validate user-input intrinsic_ids in ControlIntrinsic Changes: https://git.openjdk.java.net/jdk/pull/1179/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1179&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8247732 Stats: 545 lines in 31 files changed: 522 ins; 2 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/1179.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1179/head:pull/1179 PR: https://git.openjdk.java.net/jdk/pull/1179 From xliu at openjdk.java.net Thu Nov 12 06:32:56 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 12 Nov 2020 06:32:56 GMT Subject: RFR: 8254807: Optimize startsWith() for String.substring() In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 22:57:50 GMT, Kelvin Nilsen wrote: >> 8254807: Optimize startsWith() for String.substring() > > test/hotspot/jtreg/compiler/c2/TestOptimizeSubstring.java line 77: > >> 75: String newStringAlloc = /*call ,*/"static wrapper for: _new_array_nozero_Java"; >> 76: try { >> 77: oa = ProcessTools.executeTestJvm("-XX:+UnlockDiagnosticVMOptions", "-Xbootclasspath/a:.", > > Is this comment identifying a future TODO item? That's not entirely clear to me. It's not a TODO. I was trying to explain why I removed the opcode `call` here: x86_32.ad uses the upper-case opcode CALL, unlike all other architectures. Unfortunately, `oa.shouldNotContain` and `oa.shouldContain` don't support case-insensitive comparison. Without the opcode, newStringAlloc is still a unique and portable pattern to check whether `OptimizeSubstring` takes effect.
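To illustrate the case-sensitivity point, here is a minimal sketch in plain Java. It deliberately does not use the jtreg `OutputAnalyzer` API; the class name, the `containsIgnoreCase` helper and the two sample output lines are hypothetical and only stand in for what a VM might print.

```java
import java.util.Locale;

class OpcodeCaseSketch {
    // Hypothetical helper: case-insensitive substring test on captured output.
    static boolean containsIgnoreCase(String output, String needle) {
        return output.toLowerCase(Locale.ROOT)
                     .contains(needle.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String stub = "static wrapper for: _new_array_nozero_Java";

        // Made-up sample lines that differ only in the case of the opcode.
        String lowerCaseOutput = "call ," + stub;
        String upperCaseOutput = "CALL ," + stub;

        // A case-sensitive pattern that includes the opcode matches only one spelling.
        System.out.println(lowerCaseOutput.contains("call ," + stub));
        System.out.println(upperCaseOutput.contains("call ," + stub));

        // Dropping the opcode from the pattern matches both spellings.
        System.out.println(lowerCaseOutput.contains(stub));
        System.out.println(upperCaseOutput.contains(stub));

        // A case-insensitive comparison would also match both.
        System.out.println(containsIgnoreCase(upperCaseOutput, "call ," + stub));
    }
}
```

Dropping the opcode keeps the check within the existing case-sensitive methods, which is the trade-off described above.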
------------- PR: https://git.openjdk.java.net/jdk/pull/704 From jbhateja at openjdk.java.net Thu Nov 12 10:10:00 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 12 Nov 2020 10:10:00 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: <0_LT7bB5ut9xNX4zaFidW8UBC3lFRMh91qaMtBw21nI=.d0cedf97-d1cb-43c0-bb69-04495f611bb3@github.com> On Wed, 11 Nov 2020 16:09:20 GMT, Nils Eliasson wrote: > Do you have any tests that exercise the different possible versions? > > * dynamic length with both small and long copies > * dynamic length that can be proven always less than PartialInliningSize > * constant size less than PartialInliningSize > > Except for these minor comments, and the tests, I am ready to approve. Hi Nils, Thanks for your comments. The suggested tests have already been added as part of the commit for JDK-8252847: test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyConjoint.java and test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyDisjoint.java. I shall remove the extra spaces before integration. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From thartmann at openjdk.java.net Thu Nov 12 10:17:56 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 12 Nov 2020 10:17:56 GMT Subject: RFR: 8256203: Simplify RegMask::Empty In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 15:22:59 GMT, Claes Redestad wrote: > - Simplify RegMask::Empty to use default constructor. > - Add missing validation in the empty constructor. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1167 From thartmann at openjdk.java.net Thu Nov 12 10:20:00 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 12 Nov 2020 10:20:00 GMT Subject: RFR: 8256238: Remove Matcher::pass_original_key_for_aes In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 20:05:17 GMT, Claes Redestad wrote: > This removes Matcher::pass_original_key_for_aes() and related code. This was added to workaround some limitations with the AES provider on SPARC and is now effectively dead code. Otherwise looks good to me. src/hotspot/share/opto/library_call.cpp line 5785: > 5783: > 5784: // Call the stub, passing src_start, dest_start, k_start, r_start and src_len > 5785: Node* ecbCrypt = make_runtime_call(RC_LEAF | RC_NO_FP, Indentation should be fixed. src/hotspot/share/opto/library_call.cpp line 5864: > 5862: > 5863: // Call the stub, passing src_start, dest_start, k_start, r_start and src_len > 5864: Node* ctrCrypt = make_runtime_call(RC_LEAF|RC_NO_FP, Indentation should be fixed. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1175 From redestad at openjdk.java.net Thu Nov 12 10:35:08 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 12 Nov 2020 10:35:08 GMT Subject: RFR: 8256238: Remove Matcher::pass_original_key_for_aes [v2] In-Reply-To: References: Message-ID: <74AN9UVPEueFvmcWBUSb9kb9fUIkT2zPtek-QrBZnPA=.2bcbbe03-ac0a-49d2-b322-88d2a3fa0a79@github.com> > This removes Matcher::pass_original_key_for_aes() and related code. This was added to workaround some limitations with the AES provider on SPARC and is now effectively dead code. 
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Fix indentation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1175/files - new: https://git.openjdk.java.net/jdk/pull/1175/files/81d84ed8..4d118e01 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1175&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1175&range=00-01 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/1175.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1175/head:pull/1175 PR: https://git.openjdk.java.net/jdk/pull/1175 From shade at openjdk.java.net Thu Nov 12 10:50:03 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 12 Nov 2020 10:50:03 GMT Subject: RFR: 8256220: C1: x86_32 fails with -XX:UseSSE=1 after JDK-8210764 due to mishandled lir_neg Message-ID: <5N5-ptZ70ojb9KknfwHqIjS3_riNYCRL5gcqvgv8tY0=.d5f14d60-3141-4166-8b54-c2dbd0e9cd90@github.com> This failure manifests on many tests in `tier1`: $ CONF=linux-x86-server-fastdebug make images run-test TEST=tier1 TEST_VM_OPTS="-XX:UseSSE=1" # Internal Error (/home/shade/trunks/jdk/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp:794), pid=1484789, tid=1484820 # assert(false) failed: missed a fpu-operation # # JRE version: OpenJDK Runtime Environment (16.0) (fastdebug build 16-internal+0-adhoc.shade.jdk) # Java VM: OpenJDK Server VM (fastdebug 16-internal+0-adhoc.shade.jdk, mixed mode, tiered, g1 gc, linux-x86) # Problematic frame: # V [libjvm.so+0x6d6300] FpuStackAllocator::handle_op2(LIR_Op2*)+0xf0 Amending that assert implies we miss "neg". I believe it was missed when [JDK-8210764](https://bugs.openjdk.java.net/browse/JDK-8210764) changed `lir_neg` from `LIR_Op1` to `LIR_Op2`. At first I just moved the block to appropriate switch that handles `op2`, but then I realized `lir_neg` code is basically the same as for `lir_abs` in the same switch. 
Testing: - [x] Linux x86_32 fastdebug tier1 with `-XX:UseSSE=0` (some leftover failures) - [x] Linux x86_32 fastdebug tier1 with `-XX:UseSSE=1` (some leftover failures) ------------- Commit messages: - 8256220: C1: x86_32 fails with -XX:UseSSE=1 after JDK-8210764 due to mishandled lir_neg Changes: https://git.openjdk.java.net/jdk/pull/1173/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1173&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256220 Stats: 17 lines in 1 file changed: 1 ins; 15 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1173.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1173/head:pull/1173 PR: https://git.openjdk.java.net/jdk/pull/1173 From thartmann at openjdk.java.net Thu Nov 12 11:02:57 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 12 Nov 2020 11:02:57 GMT Subject: RFR: 8256238: Remove Matcher::pass_original_key_for_aes [v2] In-Reply-To: <74AN9UVPEueFvmcWBUSb9kb9fUIkT2zPtek-QrBZnPA=.2bcbbe03-ac0a-49d2-b322-88d2a3fa0a79@github.com> References: <74AN9UVPEueFvmcWBUSb9kb9fUIkT2zPtek-QrBZnPA=.2bcbbe03-ac0a-49d2-b322-88d2a3fa0a79@github.com> Message-ID: On Thu, 12 Nov 2020 10:35:08 GMT, Claes Redestad wrote: >> This removes Matcher::pass_original_key_for_aes() and related code. This was added to workaround some limitations with the AES provider on SPARC and is now effectively dead code. > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation Marked as reviewed by thartmann (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1175 From chagedorn at openjdk.java.net Thu Nov 12 11:21:57 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 12 Nov 2020 11:21:57 GMT Subject: RFR: 8256220: C1: x86_32 fails with -XX:UseSSE=1 after JDK-8210764 due to mishandled lir_neg In-Reply-To: <5N5-ptZ70ojb9KknfwHqIjS3_riNYCRL5gcqvgv8tY0=.d5f14d60-3141-4166-8b54-c2dbd0e9cd90@github.com> References: <5N5-ptZ70ojb9KknfwHqIjS3_riNYCRL5gcqvgv8tY0=.d5f14d60-3141-4166-8b54-c2dbd0e9cd90@github.com> Message-ID: On Wed, 11 Nov 2020 19:32:44 GMT, Aleksey Shipilev wrote: > This failure manifests on many tests in `tier1`: > > $ CONF=linux-x86-server-fastdebug make images run-test TEST=tier1 TEST_VM_OPTS="-XX:UseSSE=1" > > # Internal Error (/home/shade/trunks/jdk/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp:794), pid=1484789, tid=1484820 > # assert(false) failed: missed a fpu-operation > # > # JRE version: OpenJDK Runtime Environment (16.0) (fastdebug build 16-internal+0-adhoc.shade.jdk) > # Java VM: OpenJDK Server VM (fastdebug 16-internal+0-adhoc.shade.jdk, mixed mode, tiered, g1 gc, linux-x86) > # Problematic frame: > # V [libjvm.so+0x6d6300] FpuStackAllocator::handle_op2(LIR_Op2*)+0xf0 > > Amending that assert implies we miss "neg". I believe it was missed when [JDK-8210764](https://bugs.openjdk.java.net/browse/JDK-8210764) changed `lir_neg` from `LIR_Op1` to `LIR_Op2`. At first I just moved the block to appropriate switch that handles `op2`, but then I realized `lir_neg` code is basically the same as for `lir_abs` in the same switch. > > Testing: > - [x] Linux x86_32 fastdebug tier1 with `-XX:UseSSE=0` (some leftover failures) > - [x] Linux x86_32 fastdebug tier1 with `-XX:UseSSE=1` (some leftover failures) Yes, that block seems to do the same - looks good to me! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1173 From chagedorn at openjdk.java.net Thu Nov 12 11:36:56 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 12 Nov 2020 11:36:56 GMT Subject: RFR: 8256238: Remove Matcher::pass_original_key_for_aes [v2] In-Reply-To: <74AN9UVPEueFvmcWBUSb9kb9fUIkT2zPtek-QrBZnPA=.2bcbbe03-ac0a-49d2-b322-88d2a3fa0a79@github.com> References: <74AN9UVPEueFvmcWBUSb9kb9fUIkT2zPtek-QrBZnPA=.2bcbbe03-ac0a-49d2-b322-88d2a3fa0a79@github.com> Message-ID: On Thu, 12 Nov 2020 10:35:08 GMT, Claes Redestad wrote: >> This removes Matcher::pass_original_key_for_aes() and related code. This was added to workaround some limitations with the AES provider on SPARC and is now effectively dead code. > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1175 From chagedorn at openjdk.java.net Thu Nov 12 12:04:56 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 12 Nov 2020 12:04:56 GMT Subject: RFR: 8256203: Simplify RegMask::Empty In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 15:22:59 GMT, Claes Redestad wrote: > - Simplify RegMask::Empty to use default constructor. > - Add missing validation in the empty constructor. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1167 From shade at openjdk.java.net Thu Nov 12 13:11:04 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 12 Nov 2020 13:11:04 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE Message-ID: Reproduces like this: $ CONF=linux-x86-server-fastdebug make images run-test TEST=compiler/floatingpoint/NaNTest.java TEST_VM_OPTS="-XX:UseSSE=1" STDOUT: ### NanTest started Written and read back float values match 0x7F800001 0x7F800001 STDERR: java.lang.RuntimeException: Original and read back double values mismatch 0xFFF0000000000001 0xFFF8000000000001 at compiler.floatingpoint.NaNTest.testDouble(NaNTest.java:56) at compiler.floatingpoint.NaNTest.main(NaNTest.java:69) After reading through [JDK-8076373](https://bugs.openjdk.java.net/browse/JDK-8076373), I think the test cannot be expected to pass without SSE >= 2 for doubles, and SSE >= 1 for floats on x86_32. This change adds the platform and UseSSE sensing to test. 
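The two double values in the log above differ in a single bit, bit 51, the most significant bit of the fraction field, which x86 uses as the quiet-NaN bit. A minimal sketch of that relationship follows; the class and helper names are hypothetical, and the code only does integer arithmetic on the bit patterns, so it does not depend on how a particular FPU treats NaNs when loading or storing them.

```java
class NaNBitsSketch {
    // Bit 51: most significant bit of the 52-bit binary64 fraction field.
    static final long QUIET_BIT = 1L << 51;

    // Hypothetical helper: the bit pattern with the quiet bit forced on.
    static long quieten(long bits) {
        return bits | QUIET_BIT;
    }

    public static void main(String[] args) {
        long original = 0xFFF0000000000001L; // value written, per the log
        long readBack = 0xFFF8000000000001L; // value read back, per the log

        System.out.printf("differing bits: 0x%016X%n", original ^ readBack);
        System.out.println("quieten(original) == readBack: "
                + (quieten(original) == readBack));

        // Both patterns have an all-ones exponent and a non-zero fraction,
        // so both decode to NaN.
        System.out.println("both NaN: "
                + (Double.isNaN(Double.longBitsToDouble(original))
                   && Double.isNaN(Double.longBitsToDouble(readBack))));
    }
}
```

This only shows what changed between the two patterns; whether a given platform and `UseSSE` setting preserves the original payload is what the test has to sense.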
Additional testing: - [x] Affected test on Linux x86_32 with `-XX:UseSSE={0,1,2}` - [x] Affected test on Linux x86_64 with `-XX:UseSSE={0,1,2}` - [x] Affected test on Linux AArch64 ------------- Commit messages: - 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE Changes: https://git.openjdk.java.net/jdk/pull/1187/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1187&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256267 Stats: 59 lines in 1 file changed: 41 ins; 11 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/1187.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1187/head:pull/1187 PR: https://git.openjdk.java.net/jdk/pull/1187 From shade at openjdk.java.net Thu Nov 12 13:17:10 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 12 Nov 2020 13:17:10 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v2] In-Reply-To: References: Message-ID: > Reproduces like this: > > $ CONF=linux-x86-server-fastdebug make images run-test TEST=compiler/floatingpoint/NaNTest.java TEST_VM_OPTS="-XX:UseSSE=1" > > STDOUT: > ### NanTest started > Written and read back float values match > 0x7F800001 0x7F800001 > STDERR: > java.lang.RuntimeException: Original and read back double values mismatch > 0xFFF0000000000001 0xFFF8000000000001 > > at compiler.floatingpoint.NaNTest.testDouble(NaNTest.java:56) > at compiler.floatingpoint.NaNTest.main(NaNTest.java:69) > > After reading through [JDK-8076373](https://bugs.openjdk.java.net/browse/JDK-8076373), I think the test cannot be expected to pass without SSE >= 2 for doubles, and SSE >= 1 for floats on x86_32. This change adds the platform and UseSSE sensing to test. 
> > Additional testing: > - [x] Affected test on Linux x86_32 with `-XX:UseSSE={0,1,2}` > - [x] Affected test on Linux x86_64 with `-XX:UseSSE={0,1,2}` > - [x] Affected test on Linux AArch64 Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Indents and comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1187/files - new: https://git.openjdk.java.net/jdk/pull/1187/files/28dccf10..a47ee04c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1187&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1187&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/1187.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1187/head:pull/1187 PR: https://git.openjdk.java.net/jdk/pull/1187 From roland at openjdk.java.net Thu Nov 12 14:18:56 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 12 Nov 2020 14:18:56 GMT Subject: RFR: 8250607: C2: Filter type in PhiNode::Value() for induction variables of trip-counted integer loops In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 13:15:34 GMT, Tobias Hartmann wrote: >> PhiNode::Value() has special logic to compute the type of an iv phi >> based on the counted loop's init value and limit. The type is >> recomputed from scratch on every call to PhiNode::Value(). As the loop >> is transformed, PhiNode::Value() may return a type for the iv phi that >> widens. For instance, for: >> >> for (int i = 0; i < 100; i++) { >> } >> >> PhiNode::Value() returns accurate bounds for the iv phi. But if the >> loop is transformed to pre/main/post loops, the init and limit no >> longer have types that allow an accurate computation of the >> iv phi bounds. >> >> The fix is to filter with the recorded _type for the Phi on every call >> of PhiNode::Value().
>> >> This change was considered before (by Christian) but was not proposed >> for integration because of a performance regression on a micro >> benchmark. I investigated the performance regression and added my >> findings to the bug report. While I'm not 100% sure I found the root >> cause of the regression, the differences I see in the ideal graph of >> the hottest methods of the micro benchmark with the change are fairly >> small and I don't think that regression should block this fix. > > Looks good to me. @TobiHartmann @chhagedorn thanks for the review ------------- PR: https://git.openjdk.java.net/jdk/pull/1114 From roland at openjdk.java.net Thu Nov 12 14:18:57 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 12 Nov 2020 14:18:57 GMT Subject: Integrated: 8250607: C2: Filter type in PhiNode::Value() for induction variables of trip-counted integer loops In-Reply-To: References: Message-ID: <6J4bbfU3165zEpAeLXwpkDKgUaO57tmhy04BpevS9VY=.a2f55459-8775-4468-b44b-ebd24abf4b76@github.com> On Mon, 9 Nov 2020 08:26:17 GMT, Roland Westrelin wrote: > PhiNode::Value() has special logic to compute the type of an iv phi > based on the counted loop's init value and limit. The type is > recomputed from scratch on every call to PhiNode::Value(). As the loop > is transformed, PhiNode::Value() may return a type for the iv phi that > widens. For instance, for: > > for (int i = 0; i < 100; i++) { > } > > PhiNode::Value() returns accurate bounds for the iv phi. But if the > loop is transformed to pre/main/post loops, the init and limit no > longer have types that allow an accurate computation of the > iv phi bounds. > > The fix is to filter with the recorded _type for the Phi on every call > of PhiNode::Value(). > > This change was considered before (by Christian) but was not proposed > for integration because of a performance regression on a micro > benchmark. I investigated the performance regression and added my > findings to the bug report.
While I'm not 100% sure I found the root > cause of the regression, the differences I see in the ideal graph of > the hottest methods of the micro benchmark with the change are fairly > small and I don't think that regression should block this fix. This pull request has now been integrated. Changeset: 70c7b1d9 Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/70c7b1d9 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8250607: C2: Filter type in PhiNode::Value() for induction variables of trip-counted integer loops Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1114 From redestad at openjdk.java.net Thu Nov 12 14:23:55 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 12 Nov 2020 14:23:55 GMT Subject: RFR: 8256203: Simplify RegMask::Empty In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 10:15:00 GMT, Tobias Hartmann wrote: >> - Simplify RegMask::Empty to use default constructor. >> - Add missing validation in the empty constructor. > > Looks good to me. @TobiHartmann @chhagedorn - thank you for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/1167 From redestad at openjdk.java.net Thu Nov 12 14:23:57 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 12 Nov 2020 14:23:57 GMT Subject: Integrated: 8256203: Simplify RegMask::Empty In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 15:22:59 GMT, Claes Redestad wrote: > - Simplify RegMask::Empty to use default constructor. > - Add missing validation in the empty constructor. This pull request has now been integrated. 
Changeset: f7685a46 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/f7685a46 Stats: 9 lines in 2 files changed: 2 ins; 5 del; 2 mod 8256203: Simplify RegMask::Empty Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/1167 From redestad at openjdk.java.net Thu Nov 12 14:24:59 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 12 Nov 2020 14:24:59 GMT Subject: RFR: 8256238: Remove Matcher::pass_original_key_for_aes [v2] In-Reply-To: References: <74AN9UVPEueFvmcWBUSb9kb9fUIkT2zPtek-QrBZnPA=.2bcbbe03-ac0a-49d2-b322-88d2a3fa0a79@github.com> Message-ID: On Thu, 12 Nov 2020 10:59:53 GMT, Tobias Hartmann wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation > > Marked as reviewed by thartmann (Reviewer). @TobiHartmann @chhagedorn - thank you for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/1175 From redestad at openjdk.java.net Thu Nov 12 14:25:00 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 12 Nov 2020 14:25:00 GMT Subject: Integrated: 8256238: Remove Matcher::pass_original_key_for_aes In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 20:05:17 GMT, Claes Redestad wrote: > This removes Matcher::pass_original_key_for_aes() and related code. This was added to workaround some limitations with the AES provider on SPARC and is now effectively dead code. This pull request has now been integrated. 
Changeset: 19bade02 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/19bade02 Stats: 118 lines in 9 files changed: 0 ins; 101 del; 17 mod 8256238: Remove Matcher::pass_original_key_for_aes Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/1175 From roland at openjdk.java.net Thu Nov 12 14:26:56 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 12 Nov 2020 14:26:56 GMT Subject: RFR: 8254887: C2: assert(cl->trip_count() > 0) failed: peeling a fully unrolled loop In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 13:25:38 GMT, Tobias Hartmann wrote: >> A loop's trip count is computed to have exact trip count 6. Then: >> >> 1- pre/main/post loops are created which brings the trip count from 6 >> to 5 >> 2- main loop is unrolled which brings the trip count to 2 >> 3- main loop is peeled: Trip count is 1 >> 4- pre/main/post loops are created again. Trip count of main loop is 0. >> 5- peeling is attempted again and the assert fires >> >> IdealLoopTree::policy_peeling() doesn't attempt peeling if the trip >> count is 1. I propose that IdealLoopTree::policy_range_check() (that >> causes the pre/main/post loops insertion the second time) performs the >> same check so step 4 doesn't happen. > > Looks good to me. @TobiHartmann thanks for the review ------------- PR: https://git.openjdk.java.net/jdk/pull/1096 From roland at openjdk.java.net Thu Nov 12 14:26:58 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 12 Nov 2020 14:26:58 GMT Subject: RFR: 8254887: C2: assert(cl->trip_count() > 0) failed: peeling a fully unrolled loop In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 08:24:29 GMT, Christian Hagedorn wrote: >> A loop's trip count is computed to have exact trip count 6. 
Then: >> >> 1- pre/main/post loops are created which brings the trip count from 6 >> to 5 >> 2- main loop is unrolled which brings the trip count to 2 >> 3- main loop is peeled: Trip count is 1 >> 4- pre/main/post loops are created again. Trip count of main loop is 0. >> 5- peeling is attempted again and the assert fires >> >> IdealLoopTree::policy_peeling() doesn't attempt peeling if the trip >> count is 1. I propose that IdealLoopTree::policy_range_check() (that >> causes the pre/main/post loops insertion the second time) performs the >> same check so step 4 doesn't happen. > > src/hotspot/share/opto/loopTransform.cpp line 1066: > >> 1064: Node *trip_counter = cl->phi(); >> 1065: >> 1066: // check for vectorized loops, some opts are no longer needed > > Maybe you can update this comment to also include the new check. I will do that before I push it. Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/1096 From redestad at openjdk.java.net Thu Nov 12 14:55:15 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 12 Nov 2020 14:55:15 GMT Subject: RFR: 8256274: C2: Optimize copying of the shared type dictionary [v2] In-Reply-To: References: Message-ID: > - Remove unused (and subtly broken) Dict deep copy constructors > - Add a fixed, deep copy constructor to Dict which also takes the arena to allocate the copy in > - Use this new deep copy constructor in Type::Initialize instead of copying by iterating over the shared dict > > This reduces the overhead of each C2 compilation by ~32k instructions[1], with a noticeable gain on targeted micros such as `trivialMath_repeat_c2`. Baseline: > Benchmark Mode Cnt Score Error Units > SimpleRepeatCompilation.trivialMath_repeat_c2 ss 10 396.004 ± 3.061 ms/op > Patch: > Benchmark Mode Cnt Score Error Units > SimpleRepeatCompilation.trivialMath_repeat_c2 ss 10 354.639 ±
1.929 ms/op > > (Each fork does 2000 repeated compilations, so the improvement is somewhere around 20us/compilation) > > [1] when instrumenting with valgrind+callgrind Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into copy_shared_dict - Clean-up, copyrights - 8256274: C2: Optimize copying of the shared type dictionary on compilation setup ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1186/files - new: https://git.openjdk.java.net/jdk/pull/1186/files/ef930d0b..bcd6adb0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1186&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1186&range=00-01 Stats: 7268 lines in 189 files changed: 2728 ins; 4024 del; 516 mod Patch: https://git.openjdk.java.net/jdk/pull/1186.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1186/head:pull/1186 PR: https://git.openjdk.java.net/jdk/pull/1186 From thartmann at openjdk.java.net Thu Nov 12 15:00:01 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 12 Nov 2020 15:00:01 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah [v2] In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 11:13:04 GMT, Roland Westrelin wrote: >> This is a Shenandoah bug but the proposed fix is in shared code. >> >> In an infinite loop, a barrier is located right after the loop head >> and above the never branch. When the barrier is expanded, control flow >> is added between the loop and the never branch. During loop >> verification the assert fires because it doesn't expect any control >> flow between the never branch and the loop head. 
>> >> While it would have been nice to fix this Shenandoah issue in >> Shenandoah code, I think the cleaner fix is to preserve the invariant >> that the never branch is always right after the loop head in an >> infinite loop. In the proposed patch, this is achieved by moving all >> uses of the loop head to the never branch when it's constructed. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > fix & test Otherwise looks good to me. src/hotspot/share/opto/loopnode.cpp line 4569: > 4567: set_loop(if_t, l); > 4568: > 4569: Node* cfg = NULL; // Find the One True Control User of m Should be removed. src/hotspot/share/opto/loopnode.cpp line 4576: > 4574: int nb = x->replace_edge(m, if_t); > 4575: assert(nb > 0, "use should have been updated"); > 4576: --j, jmax -= nb; Indentation is wrong. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1073 From roland at openjdk.java.net Thu Nov 12 15:22:15 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 12 Nov 2020 15:22:15 GMT Subject: RFR: 8254887: C2: assert(cl->trip_count() > 0) failed: peeling a fully unrolled loop [v2] In-Reply-To: References: Message-ID: > A loop's trip count is computed to have exact trip count 6. Then: > > 1- pre/main/post loops are created which brings the trip count from 6 > to 5 > 2- main loop is unrolled which brings the trip count to 2 > 3- main loop is peeled: Trip count is 1 > 4- pre/main/post loops are created again. Trip count of main loop is 0. > 5- peeling is attempted again and the assert fires > > IdealLoopTree::policy_peeling() doesn't attempt peeling if the trip > count is 1. I propose that IdealLoopTree::policy_range_check() (that > causes the pre/main/post loops insertion the second time) performs the > same check so step 4 doesn't happen. 
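The trip-count sequence in the description above can be followed with a toy model. This is only a sketch of the arithmetic as the description states it (6 to 5, 5 to 2, 2 to 1, 1 to 0); the class and method names are hypothetical and it does not reflect how C2 actually tracks trip counts.

```java
public class TripCountModel {
    // Assumption: creating pre/main/post loops moves one iteration out of the main loop.
    static int afterPreMainPost(int trip) { return trip - 1; }

    // Assumption: unrolling by 2 halves the main loop trip count (remainder runs elsewhere).
    static int afterUnroll(int trip) { return trip / 2; }

    // Peeling removes one iteration from the loop.
    static int afterPeel(int trip) { return trip - 1; }

    // Replays steps 1-4 of the description. When guardRangeCheck is true,
    // the second pre/main/post insertion is skipped for a trip count of 1,
    // which models the check proposed for policy_range_check().
    static int simulate(int trip, boolean guardRangeCheck) {
        trip = afterPreMainPost(trip); // step 1: 6 -> 5
        trip = afterUnroll(trip);      // step 2: 5 -> 2
        trip = afterPeel(trip);        // step 3: 2 -> 1
        if (!(guardRangeCheck && trip == 1)) {
            trip = afterPreMainPost(trip); // step 4: 1 -> 0
        }
        return trip;
    }

    public static void main(String[] args) {
        System.out.println(simulate(6, false)); // 0: step 5 would then trip the assert
        System.out.println(simulate(6, true));  // 1: step 4 is skipped
    }
}
```

Without the guard the model reaches a trip count of 0, the state in which a further peeling attempt fires `assert(cl->trip_count() > 0)`.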
Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - only for normal loops - test - fix ------------- Changes: https://git.openjdk.java.net/jdk/pull/1096/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1096&range=01 Stats: 64 lines in 2 files changed: 63 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1096.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1096/head:pull/1096 PR: https://git.openjdk.java.net/jdk/pull/1096 From roland at openjdk.java.net Thu Nov 12 15:25:55 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 12 Nov 2020 15:25:55 GMT Subject: RFR: 8254887: C2: assert(cl->trip_count() > 0) failed: peeling a fully unrolled loop [v2] In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 14:24:25 GMT, Roland Westrelin wrote: >> Looks good to me. > > @TobiHartmann thanks for the review I added a comment and made a tweak to the fix so it only really applies to normal loops. I suppose there's a chance if pre/main/post loops exist that RCE could help a bit. ------------- PR: https://git.openjdk.java.net/jdk/pull/1096 From roland at openjdk.java.net Thu Nov 12 15:38:07 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 12 Nov 2020 15:38:07 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah [v2] In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 14:55:43 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. > > src/hotspot/share/opto/loopnode.cpp line 4575: > >> 4573: _igvn.rehash_node_delayed(x); >> 4574: int nb = x->replace_edge(m, if_t); >> 4575: assert(nb > 0, "use should have been updated"); > > Indentation is wrong. Updated. Thanks for the review.
------------- PR: https://git.openjdk.java.net/jdk/pull/1073 From roland at openjdk.java.net Thu Nov 12 15:38:06 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 12 Nov 2020 15:38:06 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah [v3] In-Reply-To: References: Message-ID: > This is a Shenandoah bug but the proposed fix is in shared code. > > In an infinite loop, a barrier is located right after the loop head > and above the never branch. When the barrier is expanded, control flow > is added between the loop and the never branch. During loop > verification the assert fires because it doesn't expect any control > flow between the never branch and the loop head. > > While it would have been nice to fix this Shenandoah issue in > Shenandoah code, I think the cleaner fix is to preserve the invariant > that the never branch is always right after the loop head in an > infinite loop. In the proposed patch, this is achieved by moving all > uses of the loop head to the never branch when it's constructed. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - review - fix & test ------------- Changes: https://git.openjdk.java.net/jdk/pull/1073/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1073&range=02 Stats: 14 lines in 2 files changed: 6 ins; 6 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1073.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1073/head:pull/1073 PR: https://git.openjdk.java.net/jdk/pull/1073 From xliu at openjdk.java.net Thu Nov 12 18:45:58 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 12 Nov 2020 18:45:58 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 06:02:34 GMT, Xin Liu wrote: > 8247732: validate user-input intrinsic_ids in ControlIntrinsic Here is the conversation we had a couple months ago. https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039742.html The current behavior conforms to the table that Nils gave to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/1179 From dongbohe at openjdk.java.net Fri Nov 13 04:30:08 2020 From: dongbohe at openjdk.java.net (Dongbo He) Date: Fri, 13 Nov 2020 04:30:08 GMT Subject: RFR: 8253623: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on [v2] In-Reply-To: References: Message-ID: > Backport 8253623 https://github.com/openjdk/panama-vector/pull/8 Dongbo He has updated the pull request incrementally with one additional commit since the last revision: 8255448 Fastdebug JVM crashes with Vector API when PrintAssembly is turned on ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/853/files - new: https://git.openjdk.java.net/jdk/pull/853/files/90918cee..5cd059b5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=853&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=853&range=00-01 Stats: 9 lines in 1 file changed: 8 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/853.diff 
Fetch: git fetch https://git.openjdk.java.net/jdk pull/853/head:pull/853 PR: https://git.openjdk.java.net/jdk/pull/853 From jbhateja at openjdk.java.net Fri Nov 13 05:31:10 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 13 Nov 2020 05:31:10 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. > 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. > 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. > 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
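The dispatch described in points 1) to 3) of the summary can be pictured at the Java level as follows. This is a rough sketch only: the real change emits AVX-512 masked instructions at the call site inside C2, while here a plain loop stands in for the inlined sequence and `System.arraycopy` stands in for the array-copy stub. The class name, the constant, and the choice of `<=` for the threshold comparison are assumptions, and the sketch assumes the source and destination ranges do not overlap.

```java
public class PartialInlineSketch {
    // Mirrors the default of ArrayCopyPartialInlineSize from the summary (in bytes).
    static final int PARTIAL_INLINE_SIZE = 32;

    // Copies length bytes; assumes non-overlapping source and destination ranges.
    static void copy(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        if (length <= PARTIAL_INLINE_SIZE) {
            // Stand-in for the masked vector copy emitted at the call site.
            for (int i = 0; i < length; i++) {
                dst[dstPos + i] = src[srcPos + i];
            }
        } else {
            // Stand-in for the call to the array-copy stub.
            System.arraycopy(src, srcPos, dst, dstPos, length);
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[100];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        byte[] small = new byte[20];
        byte[] large = new byte[70];
        copy(src, 0, small, 0, small.length); // takes the inlined path
        copy(src, 0, large, 0, large.length); // takes the stub path
        System.out.println(small[19] + " " + large[69]);
    }
}
```

The point of the runtime length check is that short copies avoid the call overhead entirely, while longer copies still go through the existing stub.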
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Review comments resolved - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - Merge remote-tracking branch 'upstream' into JDK-8252848 - JDK-8252848 : Review comments resolved - JDK-8252848: Review comments resolution. - JDK-8252848: Review comments addressed. - Merge remote-tracking branch 'origin' into JDK-8252848 - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - ... and 6 more: https://git.openjdk.java.net/jdk/compare/1d3d64f3...edb74db3 ------------- Changes: https://git.openjdk.java.net/jdk/pull/302/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=13 Stats: 532 lines in 25 files changed: 483 ins; 23 del; 26 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Fri Nov 13 05:39:59 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 13 Nov 2020 05:39:59 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 16:09:20 GMT, Nils Eliasson wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - JDK-8252848: Review comments addressed. 
>> - Merge remote-tracking branch 'origin' into JDK-8252848 >> - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - JDK-8252848 : Review comments resolution. >> - Merge remote-tracking branch 'upstream' into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Replacing explicit type checks with existing type checking routines >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - ... and 1 more: https://git.openjdk.java.net/jdk/compare/4031cb41...9e85592a > > Do you have any tests that exercise the different possible versions? > - dynamic length with both small and long copies > - dynamic length that can be proven always less than PartialInliningSize > - constant size less than PartialInliningSize > > Except for these minor comments, and the tests, I am ready to approve. Hi @neliasso I have resolved your outstanding review comments, it will be helpful if you can regress the patch through your test infrastructure once. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From thartmann at openjdk.java.net Fri Nov 13 06:26:00 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 13 Nov 2020 06:26:00 GMT Subject: RFR: 8254887: C2: assert(cl->trip_count() > 0) failed: peeling a fully unrolled loop [v2] In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 15:22:15 GMT, Roland Westrelin wrote: >> A loop's trip count is computed to have exact trip count 6. Then: >> >> 1- pre/main/post loops are created which brings the trip count from 6 >> to 5 >> 2- main loop is unrolled which brings the trip count to 2 >> 3- main loop is peeled: Trip count is 1 >> 4- pre/main/post loops are created again. Trip count of main loop is 0. 
>> 5- peeling is attempted again and the assert fires >> >> IdealLoopTree::policy_peeling() doesn't attempt peeling if the trip >> count is 1. I propose that IdealLoopTree::policy_range_check() (that >> causes the pre/main/post loops insertion the second time) performs the >> same check so step 4 doesn't happen. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - only for normal loops > - test > - fix Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1096 From thartmann at openjdk.java.net Fri Nov 13 06:28:56 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 13 Nov 2020 06:28:56 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah [v3] In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 15:38:06 GMT, Roland Westrelin wrote: >> This is a Shenandoah bug but the proposed fix is in shared code. >> >> In an infinite loop, a barrier is located right after the loop head >> and above the never branch. When the barrier is expanded, control flow >> is added between the loop and the never branch. During loop >> verification the assert fires because it doesn't expect any control >> flow between the never branch and the loop head. >> >> While it would have been nice to fix this Shenandoah issue in >> Shenandoah code, I think the cleaner fix is to preserve the invariant >> that the never branch is always right after the loop head in an >> infinite loop. In the proposed patch, this is achieved by moving all >> uses of the loop head to the never branch when it's constructed. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: > > - review > - fix & test Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1073 From thartmann at openjdk.java.net Fri Nov 13 06:28:57 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 13 Nov 2020 06:28:57 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 08:44:03 GMT, Roland Westrelin wrote: > This is a Shenandoah bug but the proposed fix is in shared code. > > In an infinite loop, a barrier is located right after the loop head > and above the never branch. When the barrier is expanded, control flow > is added between the loop and the never branch. During loop > verification the assert fires because it doesn't expect any control > flow between the never branch and the loop head. > > While it would have been nice to fix this Shenandoah issue in > Shenandoah code, I think the cleaner fix is to preserve the invariant > that the never branch is always right after the loop head in an > infinite loop. In the proposed patch, this is achieved by moving all > uses of the loop head to the never branch when it's constructed. @rwestrel please don't use force-push because it prevents getting incremental changes in the PR ("New changes since you last viewed"). ------------- PR: https://git.openjdk.java.net/jdk/pull/1073 From xliu at openjdk.java.net Fri Nov 13 06:56:55 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 13 Nov 2020 06:56:55 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah In-Reply-To: References: Message-ID: <5gcJ-y5dzLhf1AAx2Hl4I5Lo3EBbu6zvdia5d_qzuAU=.c0fcfef9-6429-4481-bb7d-9a43227476ef@github.com> On Thu, 5 Nov 2020 08:44:03 GMT, Roland Westrelin wrote: > This is a Shenandoah bug but the proposed fix is in shared code. 
> > In an infinite loop, a barrier is located right after the loop head > and above the never branch. When the barrier is expanded, control flow > is added between the loop and the never branch. During loop > verification the assert fires because it doesn't expect any control > flow between the never branch and the loop head. > > While it would have been nice to fix this Shenandoah issue in > Shenandoah code, I think the cleaner fix is to preserve the invariant > that the never branch is always right after the loop head in an > infinite loop. In the proposed patch, this is achieved by moving all > uses of the loop head to the never branch when it's constructed. hi, @rwestrel Can we have an assertion to make this loop invariance more prominent? `I think the cleaner fix is to preserve the invariant that the never branch is always right after the loop head in an infinite loop. ` ------------- PR: https://git.openjdk.java.net/jdk/pull/1073 From chagedorn at openjdk.java.net Fri Nov 13 07:22:56 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 13 Nov 2020 07:22:56 GMT Subject: RFR: 8254887: C2: assert(cl->trip_count() > 0) failed: peeling a fully unrolled loop [v2] In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 15:22:15 GMT, Roland Westrelin wrote: >> A loop's trip count is computed to have exact trip count 6. Then: >> >> 1- pre/main/post loops are created which brings the trip count from 6 >> to 5 >> 2- main loop is unrolled which brings the trip count to 2 >> 3- main loop is peeled: Trip count is 1 >> 4- pre/main/post loops are created again. Trip count of main loop is 0. >> 5- peeling is attempted again and the assert fires >> >> IdealLoopTree::policy_peeling() doesn't attempt peeling if the trip >> count is 1. I propose that IdealLoopTree::policy_range_check() (that >> causes the pre/main/post loops insertion the second time) performs the >> same check so step 4 doesn't happen. 
> > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - only for normal loops > - test > - fix Marked as reviewed by chagedorn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1096 From shade at openjdk.java.net Fri Nov 13 07:42:58 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 13 Nov 2020 07:42:58 GMT Subject: RFR: 8256220: C1: x86_32 fails with -XX:UseSSE=1 after JDK-8210764 due to mishandled lir_neg In-Reply-To: References: <5N5-ptZ70ojb9KknfwHqIjS3_riNYCRL5gcqvgv8tY0=.d5f14d60-3141-4166-8b54-c2dbd0e9cd90@github.com> Message-ID: On Thu, 12 Nov 2020 11:18:49 GMT, Christian Hagedorn wrote: >> This failure manifests on many tests in `tier1`: >> >> $ CONF=linux-x86-server-fastdebug make images run-test TEST=tier1 TEST_VM_OPTS="-XX:UseSSE=1" >> >> # Internal Error (/home/shade/trunks/jdk/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp:794), pid=1484789, tid=1484820 >> # assert(false) failed: missed a fpu-operation >> # >> # JRE version: OpenJDK Runtime Environment (16.0) (fastdebug build 16-internal+0-adhoc.shade.jdk) >> # Java VM: OpenJDK Server VM (fastdebug 16-internal+0-adhoc.shade.jdk, mixed mode, tiered, g1 gc, linux-x86) >> # Problematic frame: >> # V [libjvm.so+0x6d6300] FpuStackAllocator::handle_op2(LIR_Op2*)+0xf0 >> >> Amending that assert implies we miss "neg". I believe it was missed when [JDK-8210764](https://bugs.openjdk.java.net/browse/JDK-8210764) changed `lir_neg` from `LIR_Op1` to `LIR_Op2`. At first I just moved the block to appropriate switch that handles `op2`, but then I realized `lir_neg` code is basically the same as for `lir_abs` in the same switch. >> >> Testing: >> - [x] Linux x86_32 fastdebug tier1 with `-XX:UseSSE=0` (some leftover failures) >> - [x] Linux x86_32 fastdebug tier1 with `-XX:UseSSE=1` (some leftover failures) > > Yes, that block seems to do the same - looks good to me! Thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1173 From shade at openjdk.java.net Fri Nov 13 07:43:00 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 13 Nov 2020 07:43:00 GMT Subject: Integrated: 8256220: C1: x86_32 fails with -XX:UseSSE=1 after JDK-8210764 due to mishandled lir_neg In-Reply-To: <5N5-ptZ70ojb9KknfwHqIjS3_riNYCRL5gcqvgv8tY0=.d5f14d60-3141-4166-8b54-c2dbd0e9cd90@github.com> References: <5N5-ptZ70ojb9KknfwHqIjS3_riNYCRL5gcqvgv8tY0=.d5f14d60-3141-4166-8b54-c2dbd0e9cd90@github.com> Message-ID: <1gdRCXOvWY6njX1F6NBStbtswsTwhvGVTGZngxBKsMo=.adf3c7c8-a36b-49b6-8c34-5075a432a40d@github.com> On Wed, 11 Nov 2020 19:32:44 GMT, Aleksey Shipilev wrote: > This failure manifests on many tests in `tier1`: > > $ CONF=linux-x86-server-fastdebug make images run-test TEST=tier1 TEST_VM_OPTS="-XX:UseSSE=1" > > # Internal Error (/home/shade/trunks/jdk/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp:794), pid=1484789, tid=1484820 > # assert(false) failed: missed a fpu-operation > # > # JRE version: OpenJDK Runtime Environment (16.0) (fastdebug build 16-internal+0-adhoc.shade.jdk) > # Java VM: OpenJDK Server VM (fastdebug 16-internal+0-adhoc.shade.jdk, mixed mode, tiered, g1 gc, linux-x86) > # Problematic frame: > # V [libjvm.so+0x6d6300] FpuStackAllocator::handle_op2(LIR_Op2*)+0xf0 > > Amending that assert implies we miss "neg". I believe it was missed when [JDK-8210764](https://bugs.openjdk.java.net/browse/JDK-8210764) changed `lir_neg` from `LIR_Op1` to `LIR_Op2`. At first I just moved the block to appropriate switch that handles `op2`, but then I realized `lir_neg` code is basically the same as for `lir_abs` in the same switch. > > Testing: > - [x] Linux x86_32 fastdebug tier1 with `-XX:UseSSE=0` (some leftover failures) > - [x] Linux x86_32 fastdebug tier1 with `-XX:UseSSE=1` (some leftover failures) This pull request has now been integrated. 
Changeset: c3139abe Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/c3139abe Stats: 17 lines in 1 file changed: 1 ins; 15 del; 1 mod 8256220: C1: x86_32 fails with -XX:UseSSE=1 after JDK-8210764 due to mishandled lir_neg Reviewed-by: chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/1173 From roland at openjdk.java.net Fri Nov 13 08:22:58 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 13 Nov 2020 08:22:58 GMT Subject: Integrated: 8254887: C2: assert(cl->trip_count() > 0) failed: peeling a fully unrolled loop In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 16:41:28 GMT, Roland Westrelin wrote: > A loop's trip count is computed to have exact trip count 6. Then: > > 1- pre/main/post loops are created which brings the trip count from 6 > to 5 > 2- main loop is unrolled which brings the trip count to 2 > 3- main loop is peeled: Trip count is 1 > 4- pre/main/post loops are created again. Trip count of main loop is 0. > 5- peeling is attempted again and the assert fires > > IdealLoopTree::policy_peeling() doesn't attempt peeling if the trip > count is 1. I propose that IdealLoopTree::policy_range_check() (that > causes the pre/main/post loops insertion the second time) performs the > same check so step 4 doesn't happen. This pull request has now been integrated. 
Changeset: ea576ddb Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/ea576ddb Stats: 64 lines in 2 files changed: 63 ins; 0 del; 1 mod 8254887: C2: assert(cl->trip_count() > 0) failed: peeling a fully unrolled loop Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1096 From xxinliu at amazon.com Fri Nov 13 08:39:00 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Fri, 13 Nov 2020 08:39:00 +0000 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <1596523192072.15354@amazon.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com> <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> <1596523192072.15354@amazon.com> Message-ID: Hi, Nils, Could you take a look at this PR: https://github.com/openjdk/jdk/pull/1179 It should be the last patch of ControlIntrinsic; it adds validation logic and a regression test. I just renewed it from the webrev. Because global.hpp has changed since then, I modified it slightly to make sure it works with the new macros. This is non-trivial because end users have multiple ways to use Disable/ControlIntrinsic. Now it conforms to the spec you gave me. https://bugs.openjdk.java.net/browse/JDK-8247732?focusedCommentId=14362773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14362773 I added a few tests to cover different use cases. The only case they don't cover is invalid input given directly through a VM flag. Here it is: $ java -XX:+UnlockDiagnosticVMOptions -XX:ControlIntrinsic=+_dtan,+_hello,+_max -version Unrecognized intrinsic detected in ControlIntrinsic: _hello Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit.
Thanks, --lx On 8/3/20, 11:41 PM, "hotspot-runtime-dev on behalf of Liu, Xin" wrote: hi, Nils, Tobias would like to keep the parser behavior consistent. I think that means HotSpot needs to suppress the warning if the intrinsic_id doesn't exist in a compiler directive, e.g. -XX:CompileCommand=option,,ControlIntrinsic=-_nonexist. What do you think about it? Here is the latest webrev: http://cr.openjdk.java.net/~xliu/8247732/01/webrev/ thanks, --lx ________________________________________ From: Tobias Hartmann Sent: Friday, July 24, 2020 2:52 AM To: Liu, Xin; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic Hi Liu, On 23.07.20 18:02, Liu, Xin wrote: > That is my intention too, but CompilerOracle doesn't exit JVM when it encounters parsing errors. > It just extracts as much information from CompileCommand as possible. That makes sense because compiler "directives" are supposed to be optional for program execution. > > I do put the error message in parser's errorbuf. I set a flag "exit_on_error" to quit JVM after it dumps parser errors. yes, I treat undefined intrinsics as fatal errors. > This behavior is from Nils's comment: "I want to see an error on startup if the user has specified unknown intrinsic names." It is also consistent with JVM option -XX:ControlIntrinsic=. Okay, thanks for the explanation! I would prefer consistency in error handling of compiler directives, i.e., handle all parser failures the same way. But I leave it to Nils to decide.
Best regards, Tobias From roland at openjdk.java.net Fri Nov 13 08:52:22 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 13 Nov 2020 08:52:22 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v5] In-Reply-To: References: Message-ID: > This change add 3 new methods in Objects: > > public static long checkIndex(long index, long length) > public static long checkFromToIndex(long fromIndex, long toIndex, long length) > public static long checkFromIndexSize(long fromIndex, long size, long length) > > This mirrors the int utility methods that were added by JDK-8135248 > with the same motivations. > > As is the case with the int checkIndex(), the long checkIndex() method > is JIT compiled as an intrinsic. It allows the JIT to compile > checkIndex to an unsigned comparison and properly recognize it as > a range check that then becomes a candidate for the existing range check > optimizations. This has proven to be important for panama's > MemorySegment API and a prototype of this change (with some extra c2 > improvements) showed that panama micro benchmark results improve > significantly. > > This change includes: > > - the API change > - the C2 intrinsic > - tests for the API and the C2 intrinsic > > This is a joint work with Paul who reviewed and reworked the API change > and filled the CSR. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 23 additional commits since the last revision: - exclude compiler test when run with -Xcomp - CastLL should define carry_depency - intrinsic comments - Jorn's comments - Update headers and add intrinsic to Graal test ignore list - move compiler test and add bug to test - non x86_64 arch support - c2 test case - intrinsic - Use overloads of method names. 
Simplify internally to avoid overload resolution issues, leverging List for the exception mapper. - ... and 13 more: https://git.openjdk.java.net/jdk/compare/a9aed82d...e3887a79 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1003/files - new: https://git.openjdk.java.net/jdk/pull/1003/files/692b4298..e3887a79 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1003&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1003&range=03-04 Stats: 48430 lines in 396 files changed: 26604 ins; 14405 del; 7421 mod Patch: https://git.openjdk.java.net/jdk/pull/1003.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1003/head:pull/1003 PR: https://git.openjdk.java.net/jdk/pull/1003 From roland at openjdk.java.net Fri Nov 13 08:52:22 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 13 Nov 2020 08:52:22 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v3] In-Reply-To: <0P2H9rV2KCehYBo_hE-uYriC_S0YAs_1QOePRcRtQjI=.614ae3b3-316b-4bd9-a547-680acb193d3d@github.com> References: <0P2H9rV2KCehYBo_hE-uYriC_S0YAs_1QOePRcRtQjI=.614ae3b3-316b-4bd9-a547-680acb193d3d@github.com> Message-ID: On Sat, 7 Nov 2020 11:38:50 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > Marked as reviewed by vlivanov (Reviewer). I noticed the compiler test would fail with -Xcomp. 
I pushed a fix for that ------------- PR: https://git.openjdk.java.net/jdk/pull/1003 From roland at openjdk.java.net Fri Nov 13 09:04:11 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 13 Nov 2020 09:04:11 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v6] In-Reply-To: References: Message-ID: > This change add 3 new methods in Objects: > > public static long checkIndex(long index, long length) > public static long checkFromToIndex(long fromIndex, long toIndex, long length) > public static long checkFromIndexSize(long fromIndex, long size, long length) > > This mirrors the int utility methods that were added by JDK-8135248 > with the same motivations. > > As is the case with the int checkIndex(), the long checkIndex() method > is JIT compiled as an intrinsic. It allows the JIT to compile > checkIndex to an unsigned comparison and properly recognize it as > a range check that then becomes a candidate for the existing range check > optimizations. This has proven to be important for panama's > MemorySegment API and a prototype of this change (with some extra c2 > improvements) showed that panama micro benchmark results improve > significantly. > > This change includes: > > - the API change > - the C2 intrinsic > - tests for the API and the C2 intrinsic > > This is a joint work with Paul who reviewed and reworked the API change > and filled the CSR. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: - Merge branch 'master' of https://git.openjdk.java.net/jdk into JDK-8255150 - exclude compiler test when run with -Xcomp - CastLL should define carry_depency - intrinsic comments - Jorn's comments - Update headers and add intrinsic to Graal test ignore list - move compiler test and add bug to test - non x86_64 arch support - c2 test case - intrinsic - ... 
and 14 more: https://git.openjdk.java.net/jdk/compare/b4d01867...90493e6e ------------- Changes: https://git.openjdk.java.net/jdk/pull/1003/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1003&range=05 Stats: 897 lines in 30 files changed: 846 ins; 4 del; 47 mod Patch: https://git.openjdk.java.net/jdk/pull/1003.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1003/head:pull/1003 PR: https://git.openjdk.java.net/jdk/pull/1003 From github.com+10482586+erik1iu at openjdk.java.net Fri Nov 13 10:39:03 2020 From: github.com+10482586+erik1iu at openjdk.java.net (Eric Liu) Date: Fri, 13 Nov 2020 10:39:03 GMT Subject: RFR: 8254872: Optimize Rotate on AArch64 Message-ID: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com> This patch is a supplement for https://bugs.openjdk.java.net/browse/JDK-8248830. With the implementation of rotate node in IR, this patch: 1. canonicalizes RotateLeft into RotateRight when shift is a constant, so that GVN could identify the pre-existing node better. 2. implements scalar rotate match rules and removes the original combinations of Or and Shifts on AArch64. This patch doesn't implement vector rotate due to the lack of corresponding vector instructions on AArch64. Test case below is an explanation for this patch. public static int test(int i) { int a = (i >>> 29) | (i << -29); int b = i << 3; int c = i >>> -3; int d = b | c; return a ^ d; } Before: lsl w12, w1, #3 lsr w10, w1, #29 add w11, w10, w12 orr w12, w12, w10 eor w0, w11, w12 After: ror w10, w1, #29 eor w0, w10, w10 Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, jdk::jdk_core, langtools::tier1. 
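As a rough illustration of why the two Or+Shift expressions in the test case above collapse into a single rotate: Java uses only the low five bits of an `int` shift count, so `i << -29` is the same as `i << 3` and `i >>> -3` is the same as `i >>> 29`. The sketch below is not part of the patch; the class name `RotateSketch` and the helper `rotateViaShifts` are made up for illustration. It checks the shift form against `Integer.rotateRight(i, 29)` and shows that `test` always returns 0, which matches the `eor w0, w10, w10` in the "After" listing.

```java
final class RotateSketch {
    // Same body as the test case quoted in the mail above.
    static int test(int i) {
        int a = (i >>> 29) | (i << -29); // -29 & 31 == 3, so this is (i >>> 29) | (i << 3)
        int b = i << 3;
        int c = i >>> -3;                // -3 & 31 == 29, so this is i >>> 29
        int d = b | c;
        return a ^ d;                    // a and d are the same rotate, so the result is 0
    }

    // Hypothetical helper: the Or+Shift form with the shift counts already masked.
    static int rotateViaShifts(int i) {
        return (i >>> 29) | (i << 3);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int i : samples) {
            // The shift form must agree with the library rotate in both directions.
            if (rotateViaShifts(i) != Integer.rotateRight(i, 29)
                    || rotateViaShifts(i) != Integer.rotateLeft(i, 3)) {
                throw new AssertionError("rotate mismatch for " + i);
            }
            if (test(i) != 0) {
                throw new AssertionError("test() should be 0 for " + i);
            }
        }
        System.out.println("ok");
    }
}
```

Because a rotate left by 3 and a rotate right by 29 are the same operation on a 32-bit value, canonicalizing constant-shift RotateLeft into RotateRight lets GVN see `a` and `d` as one node.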
------------- Commit messages: - 8254872: Optimize Rotate on AArch64 Changes: https://git.openjdk.java.net/jdk/pull/1199/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1199&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254872 Stats: 199 lines in 4 files changed: 29 ins; 119 del; 51 mod Patch: https://git.openjdk.java.net/jdk/pull/1199.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1199/head:pull/1199 PR: https://git.openjdk.java.net/jdk/pull/1199 From dongbohe at openjdk.java.net Fri Nov 13 10:57:26 2020 From: dongbohe at openjdk.java.net (Dongbo He) Date: Fri, 13 Nov 2020 10:57:26 GMT Subject: RFR: 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on [v3] In-Reply-To: References: Message-ID: > 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on Dongbo He has updated the pull request incrementally with one additional commit since the last revision: rm 8253623 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/853/files - new: https://git.openjdk.java.net/jdk/pull/853/files/5cd059b5..12318bd4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=853&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=853&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/853.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/853/head:pull/853 PR: https://git.openjdk.java.net/jdk/pull/853 From thartmann at openjdk.java.net Fri Nov 13 11:03:05 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 13 Nov 2020 11:03:05 GMT Subject: RFR: 8256325: Remove duplicate asserts in PhaseMacroExpand::expand_macro_nodes Message-ID: [JDK-8239072](https://bugs.openjdk.java.net/browse/JDK-8239072) added a default case to the switch statement in PhaseMacroExpand::expand_macro_nodes: https://hg.openjdk.java.net/jdk/jdk/rev/cf319f17c647#l3.67 This allows to merge the "expansion must have deleted 
one node from macro list" asserts and remove the duplicate asserts. Thanks, Tobias ------------- Commit messages: - 8256325: Remove duplicate asserts in PhaseMacroExpand::expand_macro_nodes Changes: https://git.openjdk.java.net/jdk/pull/1200/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1200&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256325 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1200/head:pull/1200 PR: https://git.openjdk.java.net/jdk/pull/1200 From shade at openjdk.java.net Fri Nov 13 11:23:03 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 13 Nov 2020 11:23:03 GMT Subject: RFR: 8256325: Remove duplicate asserts in PhaseMacroExpand::expand_macro_nodes In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 10:56:11 GMT, Tobias Hartmann wrote: > [JDK-8239072](https://bugs.openjdk.java.net/browse/JDK-8239072) added a default case to the switch statement in PhaseMacroExpand::expand_macro_nodes: > https://hg.openjdk.java.net/jdk/jdk/rev/cf319f17c647#l3.67 > > This allows to merge the "expansion must have deleted one node from macro list" asserts and remove the duplicate asserts. > > Thanks, > Tobias Looks pretty simple and good. While you are here, you could also capitalize the macro and drop redundant semicolon: debug_only(int old_macro_count = C->macro_count();); to: DEBUG_ONLY(int old_macro_count = C->macro_count();) ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1200 From thartmann at openjdk.java.net Fri Nov 13 12:00:31 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 13 Nov 2020 12:00:31 GMT Subject: RFR: 8256325: Remove duplicate asserts in PhaseMacroExpand::expand_macro_nodes [v2] In-Reply-To: References: Message-ID: <9SK_x3OBfuzoPBmMFeOHOQU4g7knWwrT9rVlvMqPTSE=.f610a335-d425-4d04-a8fa-c633a27005c2@github.com> > [JDK-8239072](https://bugs.openjdk.java.net/browse/JDK-8239072) added a default case to the switch statement in PhaseMacroExpand::expand_macro_nodes: > https://hg.openjdk.java.net/jdk/jdk/rev/cf319f17c647#l3.67 > > This allows to merge the "expansion must have deleted one node from macro list" asserts and remove the duplicate asserts. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Uppercase DEBUG_ONLY ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1200/files - new: https://git.openjdk.java.net/jdk/pull/1200/files/c3dde8c5..db3af879 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1200&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1200&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1200/head:pull/1200 PR: https://git.openjdk.java.net/jdk/pull/1200 From redestad at openjdk.java.net Fri Nov 13 12:00:32 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 13 Nov 2020 12:00:32 GMT Subject: RFR: 8256325: Remove duplicate asserts in PhaseMacroExpand::expand_macro_nodes [v2] In-Reply-To: <9SK_x3OBfuzoPBmMFeOHOQU4g7knWwrT9rVlvMqPTSE=.f610a335-d425-4d04-a8fa-c633a27005c2@github.com> References: <9SK_x3OBfuzoPBmMFeOHOQU4g7knWwrT9rVlvMqPTSE=.f610a335-d425-4d04-a8fa-c633a27005c2@github.com> Message-ID: 
<9eugxq0XYbk3ulDsMtZ_ZwPe8vjCWdsUsleJxDAACIU=.6adcf42e-6249-4ec7-a4e5-8798ad0a2a7c@github.com> On Fri, 13 Nov 2020 11:57:52 GMT, Tobias Hartmann wrote: >> [JDK-8239072](https://bugs.openjdk.java.net/browse/JDK-8239072) added a default case to the switch statement in PhaseMacroExpand::expand_macro_nodes: >> https://hg.openjdk.java.net/jdk/jdk/rev/cf319f17c647#l3.67 >> >> This allows to merge the "expansion must have deleted one node from macro list" asserts and remove the duplicate asserts. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Uppercase DEBUG_ONLY Looks good and trivial. ------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1200 From thartmann at openjdk.java.net Fri Nov 13 12:00:33 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 13 Nov 2020 12:00:33 GMT Subject: RFR: 8256325: Remove duplicate asserts in PhaseMacroExpand::expand_macro_nodes [v2] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 11:20:27 GMT, Aleksey Shipilev wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Uppercase DEBUG_ONLY > > Looks pretty simple and good. > > While you are here, you could also capitalize the macro and drop redundant semicolon: > debug_only(int old_macro_count = C->macro_count();); > to: > DEBUG_ONLY(int old_macro_count = C->macro_count();) @shipilev, @cl4es thanks for the reviews. I've updated `debug_only` to `DEBUG_ONLY`. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1200 From chagedorn at openjdk.java.net Fri Nov 13 12:06:10 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 13 Nov 2020 12:06:10 GMT Subject: RFR: 8255058: C1: assert(is_virtual()) failed: type check Message-ID: The following code for handling phi functions of an exception entry block in the method `LinearScan::resolve_exception_edge` assumes that pinned `Constant` instructions (executing the else case) have a virtual operand and therefore an interval assigned: https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_LinearScan.cpp#L1944-L1952 In the testcase, however, this is not the case: A `Constant` instruction with a constant operand that is part of the long addition chain in the assignment for `iFld` (starting on L52) is pinned in `UseCountComputer::block_do`because it recursed too deeply: https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_IR.cpp#L388-L392 https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_IR.cpp#L414-L423 As a result, the else case is executed in `LinearScan::resolve_exception_edge` which results in this assertion failure because `vreg_num()` only works on virtual operands that belong to an interval. The fix is straight forward to also do a mapping to an interval for pinned `Constant` instructions with constant operands as we already do for non-pinned `Constant` instructions. 
Thanks, Christian ------------- Commit messages: - 8255058: C1: assert(is_virtual()) failed: type check Changes: https://git.openjdk.java.net/jdk/pull/1202/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1202&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255058 Stats: 74 lines in 2 files changed: 71 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1202.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1202/head:pull/1202 PR: https://git.openjdk.java.net/jdk/pull/1202 From neliasso at openjdk.java.net Fri Nov 13 12:47:02 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 13 Nov 2020 12:47:02 GMT Subject: RFR: 8255058: C1: assert(is_virtual()) failed: type check In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 11:59:14 GMT, Christian Hagedorn wrote: > The following code for handling phi functions of an exception entry block in the method `LinearScan::resolve_exception_edge` assumes that pinned `Constant` instructions (executing the else case) have a virtual operand and therefore an interval assigned: > https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_LinearScan.cpp#L1944-L1952 > In the testcase, however, this is not the case: A `Constant` instruction with a constant operand that is part of the long addition chain in the assignment for `iFld` (starting on L52) is pinned in `UseCountComputer::block_do`because it recursed too deeply: > https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_IR.cpp#L388-L392 > https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_IR.cpp#L414-L423 > As a result, the else case is executed in `LinearScan::resolve_exception_edge` which results in this assertion failure because `vreg_num()` only works on virtual operands that belong to an interval. 
> > The fix is straightforward to also do a mapping to an interval for pinned `Constant` instructions with constant operands as we already do for non-pinned `Constant` instructions. > > Thanks, > Christian Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1202 From neliasso at openjdk.java.net Fri Nov 13 13:04:06 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 13 Nov 2020 13:04:06 GMT Subject: RFR: 8256274: C2: Optimize copying of the shared type dictionary [v2] In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 14:55:15 GMT, Claes Redestad wrote: >> - Remove unused (and subtly broken) Dict deep copy constructors >> - Add a fixed, deep copy constructor to Dict which also takes the arena to allocate the copy in >> - Use this new deep copy constructor in Type::Initialize instead of copying by iterating over the shared dict >> >> This reduces the overhead of each C2 compilation by ~32k instructions[1], with a noticeable gain on targeted micros such as `trivialMath_repeat_c2`. Baseline: >> Benchmark Mode Cnt Score Error Units >> SimpleRepeatCompilation.trivialMath_repeat_c2 ss 10 396.004 ± 3.061 ms/op >> Patch: >> Benchmark Mode Cnt Score Error Units >> SimpleRepeatCompilation.trivialMath_repeat_c2 ss 10 354.639 ± 1.929 ms/op >> >> (Each fork does 2000 repeated compilations, so the improvement is somewhere around 20us/compilation) >> >> [1] when instrumenting with valgrind+callgrind > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into copy_shared_dict > - Clean-up, copyrights > - 8256274: C2: Optimize copying of the shared type dictionary on compilation setup Marked as reviewed by neliasso (Reviewer).
src/hotspot/share/libadt/dict.cpp line 82: > 80: > 81: // Deep copy into arena of choice > 82: Dict::Dict(const Dict &d, Arena *arena) Some minor formatting errors in this method: Arena * -> Arena* void * -> void* _cnt*2*sizeof -> _cnt * 2 * sizeof Otherwise looks good! ------------- PR: https://git.openjdk.java.net/jdk/pull/1186 From vlivanov at openjdk.java.net Fri Nov 13 14:04:03 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 13 Nov 2020 14:04:03 GMT Subject: RFR: 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on [v3] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 10:57:26 GMT, Dongbo He wrote: >> 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on > > Dongbo He has updated the pull request incrementally with one additional commit since the last revision: > > rm 8253623 src/hotspot/share/opto/vector.cpp line 245: > 243: > 244: uint first_ind = (sfpt->req() - sfpt->jvms()->scloff()); > 245: ciKlass* cik = vec_box->box_type()->is_oopptr()->klass(); Much better, thanks! 
I suggest simplifying it as follows: diff --git a/src/hotspot/share/opto/vector.cpp b/src/hotspot/share/opto/vector.cpp index 667ca0592e0..e194683be4a 100644 --- a/src/hotspot/share/opto/vector.cpp +++ b/src/hotspot/share/opto/vector.cpp @@ -244,12 +244,16 @@ void PhaseVector::scalarize_vbox_node(VectorBoxNode* vec_box) { while (safepoints.size() > 0) { SafePointNode* sfpt = safepoints.pop()->as_SafePoint(); + ciInstanceKlass* iklass = vec_box->box_type()->klass()->as_instance_klass(); + int n_fields = iklass->nof_nonstatic_fields(); + assert(n_fields == 1, "sanity"); + uint first_ind = (sfpt->req() - sfpt->jvms()->scloff()); Node* sobj = new SafePointScalarObjectNode(vec_box->box_type(), #ifdef ASSERT NULL, #endif // ASSERT - first_ind, /*n_fields=*/1); + first_ind, n_fields); sobj->init_req(0, C->root()); sfpt->add_req(vec_value); ``` ------------- PR: https://git.openjdk.java.net/jdk/pull/853 From redestad at openjdk.java.net Fri Nov 13 14:40:19 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 13 Nov 2020 14:40:19 GMT Subject: RFR: 8256274: C2: Optimize copying of the shared type dictionary [v3] In-Reply-To: References: Message-ID: > - Remove unused (and subtly broken) Dict deep copy constructors > - Add a fixed, deep copy constructor to Dict which also takes the arena to allocate the copy in > - Use this new deep copy constructor in Type::Initialize instead of copying by iterating over the shared dict > > This reduces the overhead of each C2 compilation by ~32k instructions[1], with a noticeable gain on targeted micros such as `trivialMath_repeat_c2`. Baseline: > Benchmark Mode Cnt Score Error Units > SimpleRepeatCompilation.trivialMath_repeat_c2 ss 10 396.004 ± 3.061 ms/op > Patch: > Benchmark Mode Cnt Score Error Units > SimpleRepeatCompilation.trivialMath_repeat_c2 ss 10 354.639 ±
1.929 ms/op > > (Each fork does 2000 repeated compilations, so the improvement is somewhere around 20us/compilation) > > [1] when instrumenting with valgrind+callgrind Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into copy_shared_dict - Dict clean-up, remove dead code - Merge branch 'master' into copy_shared_dict - Clean-up, copyrights - 8256274: C2: Optimize copying of the shared type dictionary on compilation setup ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1186/files - new: https://git.openjdk.java.net/jdk/pull/1186/files/bcd6adb0..d7915366 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1186&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1186&range=01-02 Stats: 9462 lines in 141 files changed: 6048 ins; 1800 del; 1614 mod Patch: https://git.openjdk.java.net/jdk/pull/1186.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1186/head:pull/1186 PR: https://git.openjdk.java.net/jdk/pull/1186 From neliasso at openjdk.java.net Fri Nov 13 14:40:22 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 13 Nov 2020 14:40:22 GMT Subject: RFR: 8256274: C2: Optimize copying of the shared type dictionary [v3] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 14:37:37 GMT, Claes Redestad wrote: >> - Remove unused (and subtly broken) Dict deep copy constructors >> - Add a fixed, deep copy constructor to Dict which also takes the arena to allocate the copy in >> - Use this new deep copy constructor in Type::Initialize instead of copying by iterating over the shared dict >> >> This reduce the overhead of each C2 compilation by ~32k instructions[1], with a noticeable gain on targeted micros such as `trivialMath_repeat_c2`. 
Baseline: >> Benchmark Mode Cnt Score Error Units >> SimpleRepeatCompilation.trivialMath_repeat_c2 ss 10 396.004 ± 3.061 ms/op >> Patch: >> Benchmark Mode Cnt Score Error Units >> SimpleRepeatCompilation.trivialMath_repeat_c2 ss 10 354.639 ± 1.929 ms/op >> >> (Each fork does 2000 repeated compilations, so the improvement is somewhere around 20us/compilation) >> >> [1] when instrumenting with valgrind+callgrind > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into copy_shared_dict > - Dict clean-up, remove dead code > - Merge branch 'master' into copy_shared_dict > - Clean-up, copyrights > - 8256274: C2: Optimize copying of the shared type dictionary on compilation setup Still good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1186 From darcy at openjdk.java.net Fri Nov 13 17:37:08 2020 From: darcy at openjdk.java.net (Joe Darcy) Date: Fri, 13 Nov 2020 17:37:08 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v5] In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 20:39:14 GMT, Xubo Zhang wrote: >> Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number > > Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Marked as reviewed by darcy (Reviewer). test/jdk/java/lang/Math/ExpCornerCaseTests.java line 2: > 1: /* > 2: * Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
Before pushing, please update the copyright year to "2011, 2020" (since this file was based on one from 2011). test/jdk/java/lang/Math/ExpCornerCaseTests.java line 28: > 26: * @bug 8255368 > 27: * @summary Tests corner cases of Math.exp > 28: * @author Xubo Zhang We don't use @author tags on new tests, but haven't removed them from old tests. src/hotspot/cpu/x86/macroAssembler_x86_exp.cpp line 593: > 591: jmp(L_2TAG_PACKET_2_0_2); > 592: cmpl(ecx, INT_MIN); > 593: jcc(Assembler::below, L_2TAG_PACKET_3_0_2); If all the changed instructions are not covered by the two test arguments, please add additional values to the test covering the other instructions. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Fri Nov 13 17:53:16 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Fri, 13 Nov 2020 17:53:16 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v6] In-Reply-To: References: Message-ID: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/894/files - new: https://git.openjdk.java.net/jdk/pull/894/files/b23c8cba..704dfff2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From aph at openjdk.java.net Fri Nov 13 17:56:57 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 13 Nov 2020 17:56:57 GMT Subject: RFR: 8254872: Optimize Rotate on AArch64 In-Reply-To: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com> References: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com> Message-ID: On Fri, 13 Nov 2020 10:33:41 GMT, Eric Liu wrote: > This patch is a supplement for > https://bugs.openjdk.java.net/browse/JDK-8248830. > > With the implementation of rotate node in IR, this patch: > > 1. canonicalizes RotateLeft into RotateRight when shift is a constant, > so that GVN could identify the pre-existing node better. > 2. implements scalar rotate match rules and removes the original > combinations of Or and Shifts on AArch64. > > This patch doesn't implement vector rotate due to the lack of > corresponding vector instructions on AArch64. > > Test case below is an explanation for this patch. 
> > public static int test(int i) { > int a = (i >>> 29) | (i << -29); > int b = i << 3; > int c = i >>> -3; > int d = b | c; > return a ^ d; > } > > Before: > > lsl w12, w1, #3 > lsr w10, w1, #29 > add w11, w10, w12 > orr w12, w12, w10 > eor w0, w11, w12 > > After: > > ror w10, w1, #29 > eor w0, w10, w10 > > Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, > jdk::jdk_core, langtools::tier1. Looks very nice in general, but please add one more case for the EOR (shifted register) instructions with rotation. At present we only do LSR, ASL, and ASR. You could add ROR to define(`ALL_SHIFT_KINDS', `BOTH_SHIFT_INSNS($1, $2, URShift, LSR) BOTH_SHIFT_INSNS($1, $2, RShift, ASR) BOTH_SHIFT_INSNS($1, $2, LShift, LSL)')dnl This is used in, for example, Java code for SHA 3. ------------- Changes requested by aph (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1199 From john.r.rose at oracle.com Fri Nov 13 18:43:40 2020 From: john.r.rose at oracle.com (John Rose) Date: Fri, 13 Nov 2020 10:43:40 -0800 Subject: RFR: 8254872: Optimize Rotate on AArch64 In-Reply-To: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com> References: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com> Message-ID: On Nov 13, 2020, at 2:39 AM, Eric Liu wrote: > > This patch is a supplement for > https://bugs.openjdk.java.net/browse/JDK-8248830. > > With the implementation of rotate node in IR, this patch: > > 1. canonicalizes RotateLeft into RotateRight when shift is a constant, > so that GVN could identify the pre-existing node better. > 2. implements scalar rotate match rules and removes the original > combinations of Or and Shifts on AArch64. > > This patch doesn't implement vector rotate due to the lack of > corresponding vector instructions on AArch64. > > Test case below is an explanation for this patch. 
> > public static int test(int i) { > int a = (i >>> 29) | (i << -29); > int b = i << 3; > int c = i >>> -3; > int d = b | c; > return a ^ d; > } Because of shift-count masking, this parses to nodes equivalent to: public static int test(int i) { int a = (i >>> 29) | (i << 3); int d = (i << 3) | (i >>> 29); // not detected: a == d int r = a ^ d; // not detected: r == 0 return r; } If we were to work a little harder at canonicalizing commutative expressions in IGVN, we could detect that a==d. (See AddNode::Ideal.) It's tempting to pull on this very long string, but it's not clear when to stop, if not now. In this case the better road is to canonicalize both a and d to the same rotate node. But maybe there's some benefit in reordering x|y|z and x^y^z when x and z could combine to a rotate node. (This isn't your problem!) > Before: > > lsl w12, w1, #3 > lsr w10, w1, #29 > add w11, w10, w12 > orr w12, w12, w10 > eor w0, w11, w12 > > After: > > ror w10, w1, #29 > eor w0, w10, w10 Amazingly, w10^w10 does not GVN to zero! Your test appears to rely on that weakness. I think the weakness should be fixed in a separate investigation. Anyway, none of these remarks reflects on your patch. -- John From eric.c.liu at arm.com Mon Nov 16 02:12:31 2020 From: eric.c.liu at arm.com (Eric Liu) Date: Mon, 16 Nov 2020 02:12:31 +0000 Subject: RFR: 8254872: Optimize Rotate on AArch64 In-Reply-To: References: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com>, Message-ID: Hi Andrew, Thanks for your review. I will update those cases soon. B&R Eric ------------------------------------------------------------------------------- From: hotspot-compiler-dev on behalf of Andrew Haley Sent: 14 November 2020 1:56 To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR: 8254872: Optimize Rotate on AArch64 On Fri, 13 Nov 2020 10:33:41 GMT, Eric Liu wrote: > This patch is a supplement for > https://bugs.openjdk.java.net/browse/JDK-8248830.
> > With the implementation of rotate node in IR, this patch: > > 1. canonicalizes RotateLeft into RotateRight when shift is a constant, > so that GVN could identify the pre-existing node better. > 2. implements scalar rotate match rules and removes the original > combinations of Or and Shifts on AArch64. > > This patch doesn't implement vector rotate due to the lack of > corresponding vector instructions on AArch64. > > Test case below is an explanation for this patch. > > public static int test(int i) { > int a = (i >>> 29) | (i << -29); > int b = i << 3; > int c = i >>> -3; > int d = b | c; > return a ^ d; > } > > Before: > > lsl w12, w1, #3 > lsr w10, w1, #29 > add w11, w10, w12 > orr w12, w12, w10 > eor w0, w11, w12 > > After: > > ror w10, w1, #29 > eor w0, w10, w10 > > Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, > jdk::jdk_core, langtools::tier1. Looks very nice in general, but please add one more case for the EOR (shifted register) instructions with rotation. At present we only do LSR, ASL, and ASR. You could add ROR to define(`ALL_SHIFT_KINDS', `BOTH_SHIFT_INSNS($1, $2, URShift, LSR) BOTH_SHIFT_INSNS($1, $2, RShift, ASR) BOTH_SHIFT_INSNS($1, $2, LShift, LSL)')dnl This is used in, for example, Java code for SHA 3. ------------- Changes requested by aph (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1199 From eric.c.liu at arm.com Mon Nov 16 04:14:47 2020 From: eric.c.liu at arm.com (Eric Liu) Date: Mon, 16 Nov 2020 04:14:47 +0000 Subject: RFR: 8254872: Optimize Rotate on AArch64 In-Reply-To: References: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com>, Message-ID: Hi John, Thanks for your review. I'd like to take some time to investigate the weakness.
B&R Eric ----------------------------------------------------------------------------------- From: hotspot-compiler-dev on behalf of John Rose Sent: 14 November 2020 2:43 To: Eric Liu Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR: 8254872: Optimize Rotate on AArch64 On Nov 13, 2020, at 2:39 AM, Eric Liu wrote: > > This patch is a supplement for > https://bugs.openjdk.java.net/browse/JDK-8248830. > > With the implementation of rotate node in IR, this patch: > > 1. canonicalizes RotateLeft into RotateRight when shift is a constant, > so that GVN could identify the pre-existing node better. > 2. implements scalar rotate match rules and removes the original > combinations of Or and Shifts on AArch64. > > This patch doesn't implement vector rotate due to the lack of > corresponding vector instructions on AArch64. > > Test case below is an explanation for this patch. > > public static int test(int i) { > int a = (i >>> 29) | (i << -29); > int b = i << 3; > int c = i >>> -3; > int d = b | c; > return a ^ d; > } Because of shift-count masking, this parses to nodes equivalent to: public static int test(int i) { int a = (i >>> 29) | (i << 3); int d = (i << 3) | (i >>> 29); // not detected: a == d int r = a ^ d; // not detected: r == 0 return r; } If we were to work a little harder at canonicalizing commutative expressions in IGVN, we could detect that a==d. (See AddNode::Ideal.) It's tempting to pull on this very long string, but it's not clear when to stop, if not now. In this case the better road is to canonicalize both a and d to the same rotate node. But maybe there's some benefit in reordering x|y|z and x^y^z when x and z could combine to a rotate node. (This isn't your problem!) > Before: > > lsl w12, w1, #3 > lsr w10, w1, #29 > add w11, w10, w12 >
orr w12, w12, w10 > eor w0, w11, w12 > > After: > > ror w10, w1, #29 > eor w0, w10, w10 Amazingly, w10^w10 does not GVN to zero! Your test appears to rely on that weakness. I think the weakness should be fixed in a separate investigation. Anyway, none of these remarks reflects on your patch. -- John From github.com+10482586+erik1iu at openjdk.java.net Mon Nov 16 06:49:57 2020 From: github.com+10482586+erik1iu at openjdk.java.net (Eric Liu) Date: Mon, 16 Nov 2020 06:49:57 GMT Subject: RFR: 8254872: Optimize Rotate on AArch64 In-Reply-To: References: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com> Message-ID: On Fri, 13 Nov 2020 17:54:34 GMT, Andrew Haley wrote: >> This patch is a supplement for >> https://bugs.openjdk.java.net/browse/JDK-8248830. >> >> With the implementation of rotate node in IR, this patch: >> >> 1. canonicalizes RotateLeft into RotateRight when shift is a constant, >> so that GVN could identify the pre-existing node better. >> 2. implements scalar rotate match rules and removes the original >> combinations of Or and Shifts on AArch64. >> >> This patch doesn't implement vector rotate due to the lack of >> corresponding vector instructions on AArch64. >> >> Test case below is an explanation for this patch. >> >> public static int test(int i) { >> int a = (i >>> 29) | (i << -29); >> int b = i << 3; >> int c = i >>> -3; >> int d = b | c; >> return a ^ d; >> } >> >> Before: >> >> lsl w12, w1, #3 >> lsr w10, w1, #29 >> add w11, w10, w12 >> orr w12, w12, w10 >> eor w0, w11, w12 >> >> After: >> >> ror w10, w1, #29 >> eor w0, w10, w10 >> >> Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, >> jdk::jdk_core, langtools::tier1. > > Looks very nice in general, but please add one more case for the EOR (shifted register) instructions with rotation. At present we only do LSR, ASL, and ASR.
You could add ROR to > > define(`ALL_SHIFT_KINDS', > `BOTH_SHIFT_INSNS($1, $2, URShift, LSR) > BOTH_SHIFT_INSNS($1, $2, RShift, ASR) > BOTH_SHIFT_INSNS($1, $2, LShift, LSL)')dnl > > This is used in, for example, Java code for SHA 3. Hi Andrew, I just thought about this a bit more. > Looks very nice in general, but please add one more case for the EOR (shifted register) instructions with rotation. At present we only do LSR, ASL, and ASR. You could add ROR to > > define(`ALL_SHIFT_KINDS', `BOTH_SHIFT_INSNS($1, $2, URShift, LSR) > BOTH_SHIFT_INSNS($1, $2, RShift, ASR) > BOTH_SHIFT_INSNS($1, $2, LShift, LSL)')dnl > > This is used in, for example, Java code for SHA 3. I prefer to integrate this patch first. - I added those cases for EOR instructions in my local, they work fine in general but I suppose it still needs more strict regressions. - For another rule `dst (AddI (LShiftI src1 lshift) (URShiftI src2 rshift))`, I presume it can be transformed to Rotate in middle-end if `lshift + rshift = 0` but I didn't implement it in this patch. - @vnkozlov (Vladimir Kozlov) left over an issue(https://bugs.openjdk.java.net/browse/JDK-8252776), which asks for refactoring the test cases in TestRotate.java, It's also a good chance to add other new test cases. This patch is basically pure without regressions, and for above tasks I prefer to finish them in the next patch once for all. 
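[Archive note] The shift-count masking that this thread relies on can be checked at the Java source level. The sketch below is only an illustration and is not part of the patch or of TestRotate.java; the class name `RotateSketch` and the helper names are hypothetical. It assumes nothing beyond the Java rule that an `int` shift uses only the low five bits of its count, so `-29` behaves as `3` and `-3` behaves as `29`. It also checks the `AddI` form mentioned above: for counts 1..31 the two shifted halves never share a set bit, so `+` and `|` give the same result.

```java
class RotateSketch {
    // Same expressions as the test case quoted in the PR description.
    static int viaNegativeCounts(int i) {
        int a = (i >>> 29) | (i << -29); // -29 & 31 == 3
        int b = i << 3;
        int c = i >>> -3;                // -3 & 31 == 29
        int d = b | c;
        return a ^ d;                    // a and d are the same rotation
    }

    // Or replaced by Add. Valid for lshift in 1..31 only; a count of 0
    // would compute i + i instead of a rotation.
    static int viaAdd(int i, int lshift) {
        return (i << lshift) + (i >>> -lshift);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int i : samples) {
            int a = (i >>> 29) | (i << -29);
            if (a != Integer.rotateRight(i, 29)) throw new AssertionError("rotateRight " + i);
            if (a != Integer.rotateLeft(i, 3)) throw new AssertionError("rotateLeft " + i);
            if (viaNegativeCounts(i) != 0) throw new AssertionError("xor " + i);
            for (int s = 1; s < 32; s++) {
                if (viaAdd(i, s) != Integer.rotateLeft(i, s)) throw new AssertionError("add " + i + " " + s);
            }
        }
        System.out.println("all rotate identities hold");
    }
}
```

This only demonstrates the arithmetic identities; whether C2 recognizes each form as a Rotate node is a separate question that the patch and its follow-ups address.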
------------- PR: https://git.openjdk.java.net/jdk/pull/1199 From chagedorn at openjdk.java.net Mon Nov 16 08:02:57 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 16 Nov 2020 08:02:57 GMT Subject: RFR: 8255058: C1: assert(is_virtual()) failed: type check In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 12:43:43 GMT, Nils Eliasson wrote: >> The following code for handling phi functions of an exception entry block in the method `LinearScan::resolve_exception_edge` assumes that pinned `Constant` instructions (executing the else case) have a virtual operand and therefore an interval assigned: >> https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_LinearScan.cpp#L1944-L1952 >> In the testcase, however, this is not the case: A `Constant` instruction with a constant operand that is part of the long addition chain in the assignment for `iFld` (starting on L52) is pinned in `UseCountComputer::block_do`because it recursed too deeply: >> https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_IR.cpp#L388-L392 >> https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_IR.cpp#L414-L423 >> As a result, the else case is executed in `LinearScan::resolve_exception_edge` which results in this assertion failure because `vreg_num()` only works on virtual operands that belong to an interval. >> >> The fix is straight forward to also do a mapping to an interval for pinned `Constant` instructions with constant operands as we already do for non-pinned `Constant` instructions. >> >> Thanks, >> Christian > > Looks good! @neliasso Thank you for your review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1202 From shade at openjdk.java.net Mon Nov 16 08:20:58 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 16 Nov 2020 08:20:58 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 13:06:30 GMT, Aleksey Shipilev wrote: > Reproduces like this: > > $ CONF=linux-x86-server-fastdebug make images run-test TEST=compiler/floatingpoint/NaNTest.java TEST_VM_OPTS="-XX:UseSSE=1" > > STDOUT: > ### NanTest started > Written and read back float values match > 0x7F800001 0x7F800001 > STDERR: > java.lang.RuntimeException: Original and read back double values mismatch > 0xFFF0000000000001 0xFFF8000000000001 > > at compiler.floatingpoint.NaNTest.testDouble(NaNTest.java:56) > at compiler.floatingpoint.NaNTest.main(NaNTest.java:69) > > After reading through [JDK-8076373](https://bugs.openjdk.java.net/browse/JDK-8076373), I think the test cannot be expected to pass without SSE >= 2 for doubles, and SSE >= 1 for floats on x86_32. This change adds the platform and UseSSE sensing to test. > > Additional testing: > - [x] Affected test on Linux x86_32 with `-XX:UseSSE={0,1,2}` > - [x] Affected test on Linux x86_64 with `-XX:UseSSE={0,1,2}` > - [x] Affected test on Linux AArch64 Anyone? :) ------------- PR: https://git.openjdk.java.net/jdk/pull/1187 From aph at openjdk.java.net Mon Nov 16 11:55:59 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 16 Nov 2020 11:55:59 GMT Subject: RFR: 8254872: Optimize Rotate on AArch64 In-Reply-To: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com> References: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com> Message-ID: On Fri, 13 Nov 2020 10:33:41 GMT, Eric Liu wrote: > This patch is a supplement for > https://bugs.openjdk.java.net/browse/JDK-8248830. 
> > With the implementation of rotate node in IR, this patch: > > 1. canonicalizes RotateLeft into RotateRight when shift is a constant, > so that GVN could identify the pre-existing node better. > 2. implements scalar rotate match rules and removes the original > combinations of Or and Shifts on AArch64. > > This patch doesn't implement vector rotate due to the lack of > corresponding vector instructions on AArch64. > > Test case below is an explanation for this patch. > > public static int test(int i) { > int a = (i >>> 29) | (i << -29); > int b = i << 3; > int c = i >>> -3; > int d = b | c; > return a ^ d; > } > > Before: > > lsl w12, w1, #3 > lsr w10, w1, #29 > add w11, w10, w12 > orr w12, w12, w10 > eor w0, w11, w12 > > After: > > ror w10, w1, #29 > eor w0, w10, w10 > > Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, > jdk::jdk_core, langtools::tier1. > On Fri, 13 Nov 2020 17:54:34 GMT, Andrew Haley wrote: > >> Looks very nice in general, but please add one more case for the EOR (shifted register) instructions with rotation. At present we only do LSR, ASL, and ASR. You could add ROR to >> >> define(`ALL_SHIFT_KINDS', >> `BOTH_SHIFT_INSNS($1, $2, URShift, LSR) >> BOTH_SHIFT_INSNS($1, $2, RShift, ASR) >> BOTH_SHIFT_INSNS($1, $2, LShift, LSL)')dnl >> >> This is used in, for example, Java code for SHA 3. > I prefer to integrate this patch first. > - I added those cases for EOR instructions in my local, they work fine in general but I suppose it still needs more strict regressions. Perhaps so. > - For another rule `dst (AddI (LShiftI src1 lshift) (URShiftI src2 rshift))`, I presume it can be transformed to Rotate in middle-end if `lshift + rshift = 0` and src1 is the same register as src2. > but I didn't implement it in this patch. > - @vnkozlov (Vladimir Kozlov) left over an issue(https://bugs.openjdk.java.net/browse/JDK-8252776), which asks for refactoring the test cases in TestRotate.java, It's also a good chance to add other new test cases. I see. 
> This patch is basically pure without regressions, and for above tasks I prefer to finish them in the next patch once for all. I suppose that's OK, but it seems to me odd to do half the job. ------------- Marked as reviewed by aph (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1199 From shade at openjdk.java.net Mon Nov 16 12:12:08 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 16 Nov 2020 12:12:08 GMT Subject: RFR: 8256386: ARM32 tests fail with "bad AD file" after JDK-8223051 Message-ID: arm32 fails similarly to x86_32, that was fixed with [JDK-8255224](https://bugs.openjdk.java.net/browse/JDK-8255224), and probably for the same reason: missing `CMovL` rules. There are lots of `tier1` test failures with the similar crash message, see bug for the example. The fix is to add rules matching `flagsReg*U*L`, similarly to as it was done for x86_32. Additional testing: - [ ] Linux arm32 fastdebug tier1 ------------- Commit messages: - 8256386: arm32 tests fail with "bad AD file" after JDK-8223051 Changes: https://git.openjdk.java.net/jdk/pull/1220/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1220&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256386 Stats: 44 lines in 1 file changed: 44 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1220.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1220/head:pull/1220 PR: https://git.openjdk.java.net/jdk/pull/1220 From vladimir.x.ivanov at oracle.com Mon Nov 16 12:19:27 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 16 Nov 2020 15:19:27 +0300 Subject: RFR: 8254807: Optimize startsWith() for String.substring() In-Reply-To: References: Message-ID: <53a217c4-fd28-dc7c-eb9c-78965373b367@oracle.com> Hi Xin, > the optimization transforms code from s=substring(base, beg, end); s.startsWith(prefix) > to substring(base, beg, end) | base.startsWith(prefix, beg). > > it reduces an use of substring. 
hopefully c2 optimizer can remove the used substring() It would be very helpful to see a more elaborate description of intended behavior to understand better what is the desired goal of the enhancement. Though it looks attractive to address the problem in the JIT-compiler, it poses some new challenges which makes proposed approach questionable. I understand your desire to rely on existing String-related optimizations, but coalescing multiple concatenations differs significantly from your case. Some of the concerns/questions I had while briefly looking through the patch: - You introduce a call to a method (String::startsWith(String, int)) which is not present in bytecode. It means (unless the method is called from different places) there won't be any profiling info present, and even if there is, it is unrelated to the call site being compiled. If the code is recompiled at some point (e.g., it hits an uncommon trap in startsWith(String,int) overload), there won't be any re-profiling happening. So, it can hit the very same condition on the consequent recompilations. - "s.startsWith(prefix)" call node is reused and rewired to call unrelated method "base.startsWith(prefix, beg)". Will the new target method be taken into account during inlining? I would be much more comfortable seeing fresh call populated using standard process (involving CallGenerators et al) which then substitutes the original node. That way you make it less fragile longer term. - "hopefully c2 optimizer can remove the used substring()" If everything is inlined in the end, it can happen, but it's fragile. Instead, you could teach C2 that the method is "pure" (no interesting side effects to care about) and cut the call early. It already happens for boxing methods (see LateInlineCallGenerator::_is_pure_call for details). Overall, if you want to keep the enhancement C2-specific, I'd suggest to look into intrinsifying String::startsWith(String, int) and not relying on its bytecodes at all. 
That way you would avoid fighting against the rest of the JVM in some situations. Best regards, Vladimir Ivanov > Commit messages: > - fix a regression test on x86_32 > - 8254807: Optimize startsWith() for String.substring() > - 8254807: Optimize startsWith() for String.substring() > - 8254807: Optimize startsWith() for String.substring() > - 8254807: Optimize startsWith() for String.substring() > - 8254807: Optimize startsWith() for String.substring() > - 8254807: Optimize startsWith() for String.substring() > > Changes: https://git.openjdk.java.net/jdk/pull/974/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=974&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8254807 > Stats: 538 lines in 15 files changed: 472 ins; 56 del; 10 mod > Patch: https://git.openjdk.java.net/jdk/pull/974.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/974/head:pull/974 > > PR: https://git.openjdk.java.net/jdk/pull/974 > From bulasevich at openjdk.java.net Mon Nov 16 13:27:09 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Mon, 16 Nov 2020 13:27:09 GMT Subject: RFR: 8254016: Test8237524 fails with -XX:-CompactStrings option Message-ID: The test compares two strings created from same byte array: Latin vs UTF16. The expected effect (strings are different) is valid only for COMPACT_STRINGS=true case (otherwise both strings are treated as UTF16). Particularly on ARM32 test fails because CompactStrings is off by default. 
------------- Commit messages: - 8254016: Test8237524 requires +CompactStrings option Changes: https://git.openjdk.java.net/jdk/pull/1226/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1226&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254016 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1226.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1226/head:pull/1226 PR: https://git.openjdk.java.net/jdk/pull/1226 From shade at openjdk.java.net Mon Nov 16 13:36:03 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 16 Nov 2020 13:36:03 GMT Subject: RFR: 8254016: Test8237524 fails with -XX:-CompactStrings option In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 13:22:08 GMT, Boris Ulasevich wrote: > The test compares two strings created from same byte array: Latin vs UTF16. The expected effect (strings are different) is valid only for COMPACT_STRINGS=true case (otherwise both strings are treated as UTF16). Particularly on ARM32 test fails because CompactStrings is off by default. Does this work when platform defaults to `-XX:-CompactStrings`? I.e. does it work for ARM32? I would prefer to `@require vm.opt.final.CompactStrings`, but seems like other tests already do `-XX:+CompactStrings` anyway, so this looks okay. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1226 From bulasevich at openjdk.java.net Mon Nov 16 13:41:06 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Mon, 16 Nov 2020 13:41:06 GMT Subject: RFR: 8254016: Test8237524 fails with -XX:-CompactStrings option In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 13:33:00 GMT, Aleksey Shipilev wrote: > Does this work when platform defaults to `-XX:-CompactStrings`? I.e. does it work for ARM32? Yes, I checked it on ARM32. 
thank you ------------- PR: https://git.openjdk.java.net/jdk/pull/1226 From bulasevich at openjdk.java.net Mon Nov 16 15:01:05 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Mon, 16 Nov 2020 15:01:05 GMT Subject: Integrated: 8254016: Test8237524 fails with -XX:-CompactStrings option In-Reply-To: References: Message-ID: <4esQoO1Njp9lZ-7imWr1l0H_fMTyosYMsfAOkoAK78o=.b618f761-2771-4b05-a649-5ffbf64c332d@github.com> On Mon, 16 Nov 2020 13:22:08 GMT, Boris Ulasevich wrote: > The test compares two strings created from same byte array: Latin vs UTF16. The expected effect (strings are different) is valid only for COMPACT_STRINGS=true case (otherwise both strings are treated as UTF16). Particularly on ARM32 test fails because CompactStrings is off by default. This pull request has now been integrated. Changeset: f611fdfe Author: Boris Ulasevich URL: https://git.openjdk.java.net/jdk/commit/f611fdfe Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8254016: Test8237524 fails with -XX:-CompactStrings option Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1226 From azeemj at openjdk.java.net Mon Nov 16 18:09:01 2020 From: azeemj at openjdk.java.net (Azeem Jiva) Date: Mon, 16 Nov 2020 18:09:01 GMT Subject: RFR: 8256386: ARM32 tests fail with "bad AD file" after JDK-8223051 In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 10:12:31 GMT, Aleksey Shipilev wrote: > arm32 fails similarly to x86_32, that was fixed with [JDK-8255224](https://bugs.openjdk.java.net/browse/JDK-8255224), and probably for the same reason: missing `CMovL` rules. There are lots of `tier1` test failures with the similar crash message, see bug for the example. The fix is to add rules matching `flagsReg*U*L`, similarly to as it was done for x86_32. > > Additional testing: > - [ ] Linux arm32 fastdebug tier1 Not a reviewer, but LGTM ------------- Marked as reviewed by azeemj (Author). 
PR: https://git.openjdk.java.net/jdk/pull/1220 From iignatyev at openjdk.java.net Mon Nov 16 18:33:13 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 18:33:13 GMT Subject: RFR: 8256414: add optimized build to submit workflow Message-ID: Hi all, Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? Thanks, -- Igor ------------- Commit messages: - add linux-x64-optimized build Changes: https://git.openjdk.java.net/jdk/pull/1233/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1233&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256414 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1233.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1233/head:pull/1233 PR: https://git.openjdk.java.net/jdk/pull/1233 From vlivanov at openjdk.java.net Mon Nov 16 18:38:10 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 16 Nov 2020 18:38:10 GMT Subject: RFR: 8256414: add optimized build to submit workflow In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:26:18 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? > > Thanks, > -- Igor Thanks a lot, Igor! Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1233 From shade at openjdk.java.net Mon Nov 16 18:43:03 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 16 Nov 2020 18:43:03 GMT Subject: RFR: 8256414: add optimized build to submit workflow In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:26:18 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? > > Thanks, > -- Igor Looks fine, but wouldn't you like to add `--disable-precompiled-headers` as well? ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1233 From kvn at openjdk.java.net Mon Nov 16 18:47:06 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 16 Nov 2020 18:47:06 GMT Subject: RFR: 8256414: add optimized build to submit workflow In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:26:18 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? > > Thanks, > -- Igor Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From iignatyev at openjdk.java.net Mon Nov 16 18:53:20 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 18:53:20 GMT Subject: RFR: 8256414: add optimized build to submit workflow [v2] In-Reply-To: References: Message-ID: > Hi all, > > Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? 
> > Thanks, > -- Igor Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: added --disable-precompiled-headers ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1233/files - new: https://git.openjdk.java.net/jdk/pull/1233/files/bad0582f..f8189b0a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1233&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1233&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1233.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1233/head:pull/1233 PR: https://git.openjdk.java.net/jdk/pull/1233 From iignatyev at openjdk.java.net Mon Nov 16 18:53:21 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 18:53:21 GMT Subject: RFR: 8256414: add optimized build to submit workflow [v2] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:40:27 GMT, Aleksey Shipilev wrote: > Looks fine, but wouldn't you like to add `--disable-precompiled-headers` as well? sure, make sense. ------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From kvn at openjdk.java.net Mon Nov 16 19:00:07 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 16 Nov 2020 19:00:07 GMT Subject: RFR: 8256205: Simplify compiler calling convention handling In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 15:35:16 GMT, Claes Redestad wrote: > Clean up and simplify some of the calling convention handling: > > - Remove Matcher::calling_convention/c_calling_convention and replace select few call sites with direct calls to SharedRuntime > - Remove unused is_outgoing args. Since the SPARC removal the is_outgoing has no effect on the calling_convention or return_value methods. > - Move in_preserved_stack_slots to SharedRuntime to match out_preserved_stack_slots. 
> > This has a tiny positive impact by reducing calls and improving inlining (at least gcc has a hard time inlining anything that goes in the .ad files, even when it's defined to the same class), but is mainly a cleanup effort. > > Testing: Oracle tier1-2 testing; S390, PPC and ARM32 builds Nice clean up. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1168 From shade at openjdk.java.net Mon Nov 16 19:01:07 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 16 Nov 2020 19:01:07 GMT Subject: RFR: 8256414: add optimized build to submit workflow [v2] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:53:20 GMT, Igor Ignatyev wrote: >> Hi all, >> >> Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? >> >> Thanks, >> -- Igor > > Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: > > added --disable-precompiled-headers Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From iignatyev at openjdk.java.net Mon Nov 16 19:34:06 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 19:34:06 GMT Subject: RFR: 8256414: add optimized build to submit workflow [v2] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:58:37 GMT, Aleksey Shipilev wrote: >> Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: >> >> added --disable-precompiled-headers > > Marked as reviewed by shade (Reviewer). Thanks for the reviews, folks. the build looks green, integrating. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From iignatyev at openjdk.java.net Mon Nov 16 19:34:09 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 19:34:09 GMT Subject: Integrated: 8256414: add optimized build to submit workflow In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:26:18 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? > > Thanks, > -- Igor This pull request has now been integrated. Changeset: 68fd71d2 Author: Igor Ignatyev URL: https://git.openjdk.java.net/jdk/commit/68fd71d2 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8256414: add optimized build to submit workflow add linux-x64-optimized to submit workflow Reviewed-by: vlivanov, shade, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From neliasso at openjdk.java.net Mon Nov 16 19:36:09 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 16 Nov 2020 19:36:09 GMT Subject: RFR: 8256205: Simplify compiler calling convention handling In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 15:35:16 GMT, Claes Redestad wrote: > Clean up and simplify some of the calling convention handling: > > - Remove Matcher::calling_convention/c_calling_convention and replace select few call sites with direct calls to SharedRuntime > - Remove unused is_outgoing args. Since the SPARC removal the is_outgoing has no effect on the calling_convention or return_value methods. > - Move in_preserved_stack_slots to SharedRuntime to match out_preserved_stack_slots. > > This has a tiny positive impact by reducing calls and improving inlining (at least gcc has a hard time inlining anything that goes in the .ad files, even when it's defined to the same class), but is mainly a cleanup effort. > > Testing: Oracle tier1-2 testing; S390, PPC and ARM32 builds Looks good! 
------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1168 From redestad at openjdk.java.net Mon Nov 16 19:43:03 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 16 Nov 2020 19:43:03 GMT Subject: RFR: 8256205: Simplify compiler calling convention handling In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:57:38 GMT, Vladimir Kozlov wrote: >> Clean up and simplify some of the calling convention handling: >> >> - Remove Matcher::calling_convention/c_calling_convention and replace select few call sites with direct calls to SharedRuntime >> - Remove unused is_outgoing args. Since the SPARC removal the is_outgoing has no effect on the calling_convention or return_value methods. >> - Move in_preserved_stack_slots to SharedRuntime to match out_preserved_stack_slots. >> >> This has a tiny positive impact by reducing calls and improving inlining (at least gcc has a hard time inlining anything that goes in the .ad files, even when it's defined to the same class), but is mainly a cleanup effort. >> >> Testing: Oracle tier1-2 testing; S390, PPC and ARM32 builds > > Nice clean up. @vnkozlov @neliasso - thank you for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/1168 From redestad at openjdk.java.net Mon Nov 16 19:43:06 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 16 Nov 2020 19:43:06 GMT Subject: Integrated: 8256205: Simplify compiler calling convention handling In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 15:35:16 GMT, Claes Redestad wrote: > Clean up and simplify some of the calling convention handling: > > - Remove Matcher::calling_convention/c_calling_convention and replace select few call sites with direct calls to SharedRuntime > - Remove unused is_outgoing args. Since the SPARC removal the is_outgoing has no effect on the calling_convention or return_value methods. 
> - Move in_preserved_stack_slots to SharedRuntime to match out_preserved_stack_slots. > > This has a tiny positive impact by reducing calls and improving inlining (at least gcc has a hard time inlining anything that goes in the .ad files, even when it's defined to the same class), but is mainly a cleanup effort. > > Testing: Oracle tier1-2 testing; S390, PPC and ARM32 builds This pull request has now been integrated. Changeset: 6e35bcbf Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/6e35bcbf Stats: 359 lines in 31 files changed: 54 ins; 255 del; 50 mod 8256205: Simplify compiler calling convention handling Reviewed-by: kvn, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/1168 From dcubed at openjdk.java.net Mon Nov 16 20:34:59 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 16 Nov 2020 20:34:59 GMT Subject: RFR: 8256414: add optimized build to submit workflow [v2] In-Reply-To: References: Message-ID: <3OXruv8DJOopa0weeyBUGCY8uw244AJFdLL8a4CkQ3M=.cc7ae99e-a8b2-4a9c-95bc-06167d365936@github.com> On Mon, 16 Nov 2020 19:29:59 GMT, Igor Ignatyev wrote: >> Marked as reviewed by shade (Reviewer). > > Thanks for the reviews, folks. the build looks green, integrating. @iignatev - did you also change Mach5? I don't have workflow builds enabled by default since I typically do Mach5 builds... ------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From iignatyev at openjdk.java.net Mon Nov 16 20:41:03 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 20:41:03 GMT Subject: RFR: 8256414: add optimized build to submit workflow [v2] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 19:29:59 GMT, Igor Ignatyev wrote: >> Marked as reviewed by shade (Reviewer). > > Thanks for the reviews, folks. the build looks green, integrating. > @iignatev - did you also change Mach5? I don't have workflow builds enabled > by default since I typically do Mach5 builds... Hi Dan, no, not yet. 
I'm going to change jib profile and tier definitions by a separate RFE. ------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From kvn at openjdk.java.net Mon Nov 16 20:44:04 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 16 Nov 2020 20:44:04 GMT Subject: RFR: 8255058: C1: assert(is_virtual()) failed: type check In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 11:59:14 GMT, Christian Hagedorn wrote: > The following code for handling phi functions of an exception entry block in the method `LinearScan::resolve_exception_edge` assumes that pinned `Constant` instructions (executing the else case) have a virtual operand and therefore an interval assigned: > https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_LinearScan.cpp#L1944-L1952 > In the testcase, however, this is not the case: A `Constant` instruction with a constant operand that is part of the long addition chain in the assignment for `iFld` (starting on L52) is pinned in `UseCountComputer::block_do` because it recursed too deeply: > https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_IR.cpp#L388-L392 > https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_IR.cpp#L414-L423 > As a result, the else case is executed in `LinearScan::resolve_exception_edge` which results in this assertion failure because `vreg_num()` only works on virtual operands that belong to an interval. > > The fix is straight forward to also do a mapping to an interval for pinned `Constant` instructions with constant operands as we already do for non-pinned `Constant` instructions. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1202 From lmesnik at openjdk.java.net Mon Nov 16 20:58:09 2020 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Mon, 16 Nov 2020 20:58:09 GMT Subject: RFR: 8256418: Jittester make build is broken. Message-ID: <06x7ziZhkDpKTPn7i_nJrMejqTDlHtSgv5pX9NLLtZ0=.9ebadf28-effd-4c93-8015-9912d2196b51@github.com> The Utils.java in lib depends on NetworkConfiguration.java now. So NetworkConfiguration.java should be added to the list. Verified locally. ------------- Commit messages: - 8256418: Jittester make build is broken. Changes: https://git.openjdk.java.net/jdk/pull/1238/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1238&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256418 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1238.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1238/head:pull/1238 PR: https://git.openjdk.java.net/jdk/pull/1238 From iignatyev at openjdk.java.net Mon Nov 16 21:02:01 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 21:02:01 GMT Subject: RFR: 8256418: Jittester make build is broken. In-Reply-To: <06x7ziZhkDpKTPn7i_nJrMejqTDlHtSgv5pX9NLLtZ0=.9ebadf28-effd-4c93-8015-9912d2196b51@github.com> References: <06x7ziZhkDpKTPn7i_nJrMejqTDlHtSgv5pX9NLLtZ0=.9ebadf28-effd-4c93-8015-9912d2196b51@github.com> Message-ID: On Mon, 16 Nov 2020 20:53:21 GMT, Leonid Mesnik wrote: > The Utils.java in lib depends on NetworkConfiguration.java now. So NetworkConfiguration.java should be added to the list. > > Verified locally. LGTM, but please update the copyright year before integrating it. -- Igor ------------- Marked as reviewed by iignatyev (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1238 From lmesnik at openjdk.java.net Mon Nov 16 21:41:14 2020 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Mon, 16 Nov 2020 21:41:14 GMT Subject: RFR: 8256418: Jittester make build is broken. 
[v2] In-Reply-To: <06x7ziZhkDpKTPn7i_nJrMejqTDlHtSgv5pX9NLLtZ0=.9ebadf28-effd-4c93-8015-9912d2196b51@github.com> References: <06x7ziZhkDpKTPn7i_nJrMejqTDlHtSgv5pX9NLLtZ0=.9ebadf28-effd-4c93-8015-9912d2196b51@github.com> Message-ID: > The Utils.java in lib depends on NetworkConfiguration.java now. So NetworkConfiguration.java should be added to the list. > > Verified locally. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: Fixed copyrights. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1238/files - new: https://git.openjdk.java.net/jdk/pull/1238/files/2bf55e28..69d1302e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1238&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1238&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1238.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1238/head:pull/1238 PR: https://git.openjdk.java.net/jdk/pull/1238 From dnsimon at openjdk.java.net Mon Nov 16 22:42:09 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Mon, 16 Nov 2020 22:42:09 GMT Subject: RFR: 8253228: [JVMCI] provide more info on fatal JVMCI errors Message-ID: There are a number of places in JVMCI that abort VM execution due to unexpected Java exceptions and other error conditions. Currently, a message and the exception stack (if any) is printed and then the VM is shutdown via the global `vm_exit` function. This PR changes the behavior in this scenario to raise a fatal VM error so that a hs-err log file is produced which provides more context. 
------------- Commit messages: - create hs-err log on fatal JVMCI errors Changes: https://git.openjdk.java.net/jdk/pull/1240/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1240&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253228 Stats: 34 lines in 4 files changed: 0 ins; 15 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/1240.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1240/head:pull/1240 PR: https://git.openjdk.java.net/jdk/pull/1240 From kvn at openjdk.java.net Mon Nov 16 23:21:06 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 16 Nov 2020 23:21:06 GMT Subject: RFR: 8254872: Optimize Rotate on AArch64 In-Reply-To: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com> References: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com> Message-ID: On Fri, 13 Nov 2020 10:33:41 GMT, Eric Liu wrote: > This patch is a supplement for > https://bugs.openjdk.java.net/browse/JDK-8248830. > > With the implementation of rotate node in IR, this patch: > > 1. canonicalizes RotateLeft into RotateRight when shift is a constant, > so that GVN could identify the pre-existing node better. > 2. implements scalar rotate match rules and removes the original > combinations of Or and Shifts on AArch64. > > This patch doesn't implement vector rotate due to the lack of > corresponding vector instructions on AArch64. > > Test case below is an explanation for this patch. > > public static int test(int i) { > int a = (i >>> 29) | (i << -29); > int b = i << 3; > int c = i >>> -3; > int d = b | c; > return a ^ d; > } > > Before: > > lsl w12, w1, #3 > lsr w10, w1, #29 > add w11, w10, w12 > orr w12, w12, w10 > eor w0, w11, w12 > > After: > > ror w10, w1, #29 > eor w0, w10, w10 > > Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, > jdk::jdk_core, langtools::tier1. 
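The quoted test case rests on two identities for a 32-bit `int`: `(i >>> 29) | (i << -29)` is a rotate right by 29, and a rotate left by 3 is that same rotation, so `a ^ d` is always 0. A minimal Java sketch that checks this against `Integer.rotateRight` and `Integer.rotateLeft` follows; the class name `RotateIdentity` and the helper names are illustrative and not part of the patch or of TestRotate.java.

```java
public class RotateIdentity {
    // Same expression as `a` in the quoted test. Java masks an int shift
    // count to its low five bits, so (i << -29) is the same as (i << 3).
    static int rotateViaOr(int i) {
        return (i >>> 29) | (i << -29);
    }

    // Same expressions as `b`, `c` and `d` in the quoted test.
    // (i >>> -3) is the same as (i >>> 29) for the reason given above.
    static int rotateViaShifts(int i) {
        int b = i << 3;
        int c = i >>> -3;
        return b | c;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int i : samples) {
            int a = rotateViaOr(i);
            int d = rotateViaShifts(i);
            // Both forms should agree with the library rotates, and with each other.
            if (a != Integer.rotateRight(i, 29)
                    || d != Integer.rotateLeft(i, 3)
                    || (a ^ d) != 0) {
                throw new AssertionError("rotate identity failed for " + i);
            }
        }
        System.out.println("all rotate identities hold");
    }
}
```

This is consistent with the "After" listing, where both values come from a single `ror` and the final `eor` combines a register with itself.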
opto/ changes look good to me ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1199 From kvn at openjdk.java.net Mon Nov 16 23:24:03 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 16 Nov 2020 23:24:03 GMT Subject: RFR: 8253228: [JVMCI] provide more info on fatal JVMCI errors In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 22:08:20 GMT, Doug Simon wrote: > There are a number of places in JVMCI that abort VM execution due to unexpected Java exceptions and other error conditions. Currently, a message and the exception stack (if any) is printed and then the VM is shutdown via the global `vm_exit` function. > This PR changes the behavior in this scenario to raise a fatal VM error so that a hs-err log file is produced which provides more context. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1240 From kvn at openjdk.java.net Mon Nov 16 23:26:04 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 16 Nov 2020 23:26:04 GMT Subject: RFR: 8256386: ARM32 tests fail with "bad AD file" after JDK-8223051 In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 10:12:31 GMT, Aleksey Shipilev wrote: > arm32 fails similarly to x86_32, that was fixed with [JDK-8255224](https://bugs.openjdk.java.net/browse/JDK-8255224), and probably for the same reason: missing `CMovL` rules. There are lots of `tier1` test failures with the similar crash message, see bug for the example. The fix is to add rules matching `flagsReg*U*L`, similarly to as it was done for x86_32. > > Additional testing: > - [x] Linux arm32 fastdebug, selected failing tests > - [x] Linux arm32 fastdebug, tier1 Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1220 From never at openjdk.java.net Tue Nov 17 00:15:02 2020 From: never at openjdk.java.net (Tom Rodriguez) Date: Tue, 17 Nov 2020 00:15:02 GMT Subject: RFR: 8253228: [JVMCI] provide more info on fatal JVMCI errors In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 22:08:20 GMT, Doug Simon wrote: > There are a number of places in JVMCI that abort VM execution due to unexpected Java exceptions and other error conditions. Currently, a message and the exception stack (if any) is printed and then the VM is shutdown via the global `vm_exit` function. > This PR changes the behavior in this scenario to raise a fatal VM error so that a hs-err log file is produced which provides more context. Marked as reviewed by never (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1240 From iignatyev at openjdk.java.net Tue Nov 17 00:36:10 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 17 Nov 2020 00:36:10 GMT Subject: RFR: 8256430: add linux-x64-optimized to tier1 Message-ID: Hi all, [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? 
Thanks -- Igor cc-ing @dcubed-ojdk ------------- Commit messages: - 8256430: add linux-x64-optimized to tier1 Changes: https://git.openjdk.java.net/jdk/pull/1244/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1244&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256430 Stats: 12 lines in 1 file changed: 12 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1244.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1244/head:pull/1244 PR: https://git.openjdk.java.net/jdk/pull/1244 From kvn at openjdk.java.net Tue Nov 17 00:45:05 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 00:45:05 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v2] In-Reply-To: References: Message-ID: <_6nluorAvosju4tbANDk-v9c4dzfarBXqk7bHNnHm6s=.dc10c8d2-7ef2-4c91-852c-f4f7d2f5caf5@github.com> On Thu, 12 Nov 2020 13:17:10 GMT, Aleksey Shipilev wrote: >> Reproduces like this: >> >> $ CONF=linux-x86-server-fastdebug make images run-test TEST=compiler/floatingpoint/NaNTest.java TEST_VM_OPTS="-XX:UseSSE=1" >> >> STDOUT: >> ### NanTest started >> Written and read back float values match >> 0x7F800001 0x7F800001 >> STDERR: >> java.lang.RuntimeException: Original and read back double values mismatch >> 0xFFF0000000000001 0xFFF8000000000001 >> >> at compiler.floatingpoint.NaNTest.testDouble(NaNTest.java:56) >> at compiler.floatingpoint.NaNTest.main(NaNTest.java:69) >> >> After reading through [JDK-8076373](https://bugs.openjdk.java.net/browse/JDK-8076373), I think the test cannot be expected to pass without SSE >= 2 for doubles, and SSE >= 1 for floats on x86_32. This change adds the platform and UseSSE sensing to test. 
>> >> Additional testing: >> - [x] Affected test on Linux x86_32 with `-XX:UseSSE={0,1,2}` >> - [x] Affected test on Linux x86_64 with `-XX:UseSSE={0,1,2}` >> - [x] Affected test on Linux AArch64 > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Indents and comments Why to run test code if you know the result will not match? Why not just skip testing? I thought about using `@requires vm.cpu.features` checks but your check in main() is simpler and more clear. ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1187 From kvn at openjdk.java.net Tue Nov 17 01:17:02 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 01:17:02 GMT Subject: RFR: 8256430: add linux-x64-optimized to tier1 In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 00:31:24 GMT, Igor Ignatyev wrote: > Hi all, > > > [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? > > Thanks > -- Igor > > cc-ing @dcubed-ojdk Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1244 From kvn at openjdk.java.net Tue Nov 17 00:47:04 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 00:47:04 GMT Subject: RFR: 8256274: C2: Optimize copying of the shared type dictionary [v3] In-Reply-To: References: Message-ID: <8V71AeAjINfWaYMw7j0QGGaRo_6Emx9OMJZIGudKcgU=.ad59885f-ffd0-4b82-a4d6-b95fce97a5c5@github.com> On Fri, 13 Nov 2020 14:40:19 GMT, Claes Redestad wrote: >> - Remove unused (and subtly broken) Dict deep copy constructors >> - Add a fixed, deep copy constructor to Dict which also takes the arena to allocate the copy in >> - Use this new deep copy constructor in Type::Initialize instead of copying by iterating over the shared dict >> >> This reduces the overhead of each C2 compilation by ~32k instructions[1], with a noticeable gain on targeted micros such as `trivialMath_repeat_c2`. Baseline: >> Benchmark Mode Cnt Score Error Units >> SimpleRepeatCompilation.trivialMath_repeat_c2 ss 10 396.004 ± 3.061 ms/op >> Patch: >> Benchmark Mode Cnt Score Error Units >> SimpleRepeatCompilation.trivialMath_repeat_c2 ss 10 354.639 ± 1.929 ms/op >> >> (Each fork does 2000 repeated compilations, so the improvement is somewhere around 20us/compilation) >> >> [1] when instrumenting with valgrind+callgrind > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into copy_shared_dict > - Dict clean-up, remove dead code > - Merge branch 'master' into copy_shared_dict > - Clean-up, copyrights > - 8256274: C2: Optimize copying of the shared type dictionary on compilation setup Good. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1186 From kvn at openjdk.java.net Tue Nov 17 01:21:03 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 01:21:03 GMT Subject: RFR: 8256056: Deoptimization stub doesn't save vector registers on x86 In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 18:59:30 GMT, Vladimir Ivanov wrote: > Deoptimization stub doesn't save vector registers on x86 and it may break vector rematerialization when a vector value (produced through Vector API) ends up in a register (and not spilled on stack). > > The fix unconditionally saves all wide vector registers when going through deopt stub. If there are any performance concerns with that, it's possible to introduce a dedicated variant and choose between 2 versions when patching nmethods (like what happens with safepoint stubs). But deoptimization is already quite expensive to save much on that, so I decided to keep things as is. > > As a cleanup, I made `save_vectors` parameter explicit to make the intentions clearer. > > Testing (with some other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 Okay. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1134 From dcubed at openjdk.java.net Tue Nov 17 01:43:01 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 17 Nov 2020 01:43:01 GMT Subject: RFR: 8256430: add linux-x64-optimized to tier1 In-Reply-To: References: Message-ID: <3XYPJS0vv9aGfb613MCbkJw0VvYIdwvE4T2iwCXhDCg=.602bf1fa-f90b-496a-a9be-ddafb13e153a@github.com> On Tue, 17 Nov 2020 00:31:24 GMT, Igor Ignatyev wrote: > Hi all, > > > [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? 
> > Thanks > -- Igor > > cc-ing @dcubed-ojdk Thumbs up. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1244 From github.com+10482586+erik1iu at openjdk.java.net Tue Nov 17 01:59:07 2020 From: github.com+10482586+erik1iu at openjdk.java.net (Eric Liu) Date: Tue, 17 Nov 2020 01:59:07 GMT Subject: Integrated: 8254872: Optimize Rotate on AArch64 In-Reply-To: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com> References: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com> Message-ID: On Fri, 13 Nov 2020 10:33:41 GMT, Eric Liu wrote: > This patch is a supplement for > https://bugs.openjdk.java.net/browse/JDK-8248830. > > With the implementation of rotate node in IR, this patch: > > 1. canonicalizes RotateLeft into RotateRight when shift is a constant, > so that GVN could identify the pre-existing node better. > 2. implements scalar rotate match rules and removes the original > combinations of Or and Shifts on AArch64. > > This patch doesn't implement vector rotate due to the lack of > corresponding vector instructions on AArch64. > > Test case below is an explanation for this patch. > > public static int test(int i) { > int a = (i >>> 29) | (i << -29); > int b = i << 3; > int c = i >>> -3; > int d = b | c; > return a ^ d; > } > > Before: > > lsl w12, w1, #3 > lsr w10, w1, #29 > add w11, w10, w12 > orr w12, w12, w10 > eor w0, w11, w12 > > After: > > ror w10, w1, #29 > eor w0, w10, w10 > > Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, > jdk::jdk_core, langtools::tier1. This pull request has now been integrated. 
Changeset: 30a2ad55 Author: Eric Liu Committer: Nick Gasson URL: https://git.openjdk.java.net/jdk/commit/30a2ad55 Stats: 199 lines in 4 files changed: 29 ins; 119 del; 51 mod 8254872: Optimize Rotate on AArch64 Reviewed-by: aph, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1199 From darcy at openjdk.java.net Tue Nov 17 02:08:05 2020 From: darcy at openjdk.java.net (Joe Darcy) Date: Tue, 17 Nov 2020 02:08:05 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v6] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 17:53:16 GMT, Xubo Zhang wrote: >> Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number > > Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large test/jdk/java/lang/Math/ExpCornerCaseTests.java line 2: > 1: /* > 2: * Copyright (c) 2011,2020 Oracle and/or its affiliates. All rights reserved. The year of the copyright syntax is "2011, 2020,"; no need for a re-review before pushing after that correction. Thanks. 
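The bug quoted above can be checked from plain Java: the true value of e^10000 is far beyond `Double.MAX_VALUE`, so `Math.exp(10000)` is expected to return positive infinity rather than 0. The second half of the sketch below illustrates, by analogy only, why a signed conditional jump (`jge`) and an unsigned one (`jae`) can disagree: the same bit pattern orders differently under signed and unsigned comparison. The class name `ExpOverflowSketch`, the helper names and the sample bit pattern are illustrative assumptions; they are not taken from the patch or from ExpCornerCaseTests.java.

```java
public class ExpOverflowSketch {
    // True when exp(x) overflows to positive infinity, as expected for large x.
    static boolean overflowsToInfinity(double x) {
        return Math.exp(x) == Double.POSITIVE_INFINITY;
    }

    // Signed ordering, analogous to what a jge-style branch tests.
    static boolean signedGreaterOrEqual(int a, int b) {
        return Integer.compare(a, b) >= 0;
    }

    // Unsigned ordering, analogous to what a jae-style branch tests.
    static boolean unsignedGreaterOrEqual(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0;
    }

    public static void main(String[] args) {
        System.out.println("exp(10000) is +Inf: " + overflowsToInfinity(10000.0));

        // A pattern with the top bit set is negative as a signed int
        // but very large as an unsigned one, so the two orderings differ.
        int highBitSet = 0x80000000;
        System.out.println("signed   >= 1: " + signedGreaterOrEqual(highBitSet, 1));
        System.out.println("unsigned >= 1: " + unsignedGreaterOrEqual(highBitSet, 1));
    }
}
```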
------------- PR: https://git.openjdk.java.net/jdk/pull/894 From kvn at openjdk.java.net Tue Nov 17 02:25:06 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 02:25:06 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v6] In-Reply-To: References: Message-ID: <-FdDOTDgxvTFXLBjs4ZCO3xHA5NM0yLXaPyMgrebbQA=.f533c3f1-0199-40b0-b507-104d7997c7cf@github.com> On Fri, 13 Nov 2020 17:53:16 GMT, Xubo Zhang wrote: >> Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number > > Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large Changes requested by kvn (Reviewer). src/hotspot/cpu/x86/macroAssembler_x86_exp.cpp line 496: > 494: movl(Address(rsp, 64), tmp); > 495: lea(tmp, ExternalAddress(static_const_table)); > 496: movsd(xmm0, Address(rsp, 128)); Can you explain this change? 
------------- PR: https://git.openjdk.java.net/jdk/pull/894 From kvn at openjdk.java.net Tue Nov 17 02:25:08 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 02:25:08 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v6] In-Reply-To: <-FdDOTDgxvTFXLBjs4ZCO3xHA5NM0yLXaPyMgrebbQA=.f533c3f1-0199-40b0-b507-104d7997c7cf@github.com> References: <-FdDOTDgxvTFXLBjs4ZCO3xHA5NM0yLXaPyMgrebbQA=.f533c3f1-0199-40b0-b507-104d7997c7cf@github.com> Message-ID: <7jUr9izZQ1h44rV3OCRR-16J2bN3eAJKN559LIvjb1M=.48f7411b-3306-4c71-92e6-208fa18da860@github.com> On Tue, 17 Nov 2020 02:19:05 GMT, Vladimir Kozlov wrote: >> Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large > > src/hotspot/cpu/x86/macroAssembler_x86_exp.cpp line 496: > >> 494: movl(Address(rsp, 64), tmp); >> 495: lea(tmp, ExternalAddress(static_const_table)); >> 496: movsd(xmm0, Address(rsp, 128)); > > Can you explain this change? Would be nice to add comment about what values are on stack. 
------------- PR: https://git.openjdk.java.net/jdk/pull/894 From dongbohe at openjdk.java.net Tue Nov 17 04:38:19 2020 From: dongbohe at openjdk.java.net (Dongbo He) Date: Tue, 17 Nov 2020 04:38:19 GMT Subject: RFR: 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on [v4] In-Reply-To: References: Message-ID: > 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on Dongbo He has updated the pull request incrementally with one additional commit since the last revision: Simplified code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/853/files - new: https://git.openjdk.java.net/jdk/pull/853/files/12318bd4..e72ffde4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=853&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=853&range=02-03 Stats: 13 lines in 1 file changed: 4 ins; 8 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/853.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/853/head:pull/853 PR: https://git.openjdk.java.net/jdk/pull/853 From dongbohe at openjdk.java.net Tue Nov 17 04:48:04 2020 From: dongbohe at openjdk.java.net (Dongbo He) Date: Tue, 17 Nov 2020 04:48:04 GMT Subject: RFR: 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on In-Reply-To: References: Message-ID: On Mon, 26 Oct 2020 15:13:39 GMT, Tobias Hartmann wrote: >> 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on > > We shouldn't do "backports" from project specific issues to mainline. Please file a separate issue for this. Thanks! 
tier1 and tier2 passed

-------------

PR: https://git.openjdk.java.net/jdk/pull/853

From github.com+10482586+erik1iu at openjdk.java.net  Tue Nov 17 06:19:10 2020
From: github.com+10482586+erik1iu at openjdk.java.net (Eric Liu)
Date: Tue, 17 Nov 2020 06:19:10 GMT
Subject: RFR: 8256387: Unexpected result if patching an entire instruction on AArch64
Message-ID: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com>

This patch fixes a potential problem in assembler_aarch64.hpp.

According to the C standard, the behavior of a shift operation is undefined if the shift count is greater than or equal to the width in bits of the promoted left operand.

In assembler_aarch64.hpp, there are some utility functions for operating on encoded instructions, e.g.
Instruction_aarch64::patch(address, int, int, uint64_t)

All of those functions use `(1U << nbits) - 1` to calculate a mask, and that shift is undefined if `nbits` equals 32. This can produce an unexpected result if someone intends to patch an entire instruction.

To fix this issue, this patch simply replaces `1U` with `1ULL`.
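
The mask problem described above can be sketched in isolation. This is only an illustration under stated assumptions, not the HotSpot code: `low_mask` and `patch_field` are hypothetical helpers, and `patch_field` works on a plain 32-bit value instead of an instruction address. It assumes a 32-bit `unsigned int`, which is the case where `1U << 32` is undefined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from assembler_aarch64.hpp): mask of the low
// `nbits` bits. Using a 64-bit literal keeps the shift well defined even
// when nbits == 32; with 1U the same expression would be undefined there.
static inline uint64_t low_mask(unsigned nbits) {
  assert(nbits >= 1 && nbits <= 32);
  return (1ULL << nbits) - 1;
}

// Hypothetical stand-in for a patch routine: replaces bits [msb:lsb] of a
// 32-bit instruction word with `val` and returns the new word.
static inline uint32_t patch_field(uint32_t insn, int msb, int lsb, uint64_t val) {
  unsigned nbits = static_cast<unsigned>(msb - lsb + 1);
  uint64_t mask = low_mask(nbits);
  assert((val & ~mask) == 0);  // value must fit in the field
  uint32_t field_mask = static_cast<uint32_t>(mask << lsb);
  return (insn & ~field_mask) | static_cast<uint32_t>(val << lsb);
}
```

With this shape, patching the whole word (`msb = 31`, `lsb = 0`) goes through the same path as patching a narrow field, which is the case the report is concerned with.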
------------- Commit messages: - 8256387: Unexpected result if patching an entire instruction on AArch64 Changes: https://git.openjdk.java.net/jdk/pull/1248/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1248&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256387 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/1248.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1248/head:pull/1248 PR: https://git.openjdk.java.net/jdk/pull/1248 From shade at openjdk.java.net Tue Nov 17 07:12:07 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 17 Nov 2020 07:12:07 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v2] In-Reply-To: <_6nluorAvosju4tbANDk-v9c4dzfarBXqk7bHNnHm6s=.dc10c8d2-7ef2-4c91-852c-f4f7d2f5caf5@github.com> References: <_6nluorAvosju4tbANDk-v9c4dzfarBXqk7bHNnHm6s=.dc10c8d2-7ef2-4c91-852c-f4f7d2f5caf5@github.com> Message-ID: On Tue, 17 Nov 2020 00:42:20 GMT, Vladimir Kozlov wrote: > Why to run test code if you know the result will not match? Why not just skip testing? > I thought about using `@requires vm.cpu.features` checks but your check in main() is simpler and more clear. Yes, I thought about just skipping the test initially. But then I realized I would like to know how exactly the results do not match for the excluded cases. ------------- PR: https://git.openjdk.java.net/jdk/pull/1187 From shade at openjdk.java.net Tue Nov 17 07:14:01 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 17 Nov 2020 07:14:01 GMT Subject: RFR: 8256386: ARM32 tests fail with "bad AD file" after JDK-8223051 In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 23:23:34 GMT, Vladimir Kozlov wrote: >> arm32 fails similarly to x86_32, that was fixed with [JDK-8255224](https://bugs.openjdk.java.net/browse/JDK-8255224), and probably for the same reason: missing `CMovL` rules. 
There are lots of `tier1` test failures with the similar crash message, see bug for the example. The fix is to add rules matching `flagsReg*U*L`, similarly to as it was done for x86_32. >> >> Additional testing: >> - [x] Linux arm32 fastdebug, selected failing tests >> - [x] Linux arm32 fastdebug, tier1 > > Good. @rwestrel, are you good with this change? ------------- PR: https://git.openjdk.java.net/jdk/pull/1220 From redestad at openjdk.java.net Tue Nov 17 07:19:03 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 17 Nov 2020 07:19:03 GMT Subject: RFR: 8256274: C2: Optimize copying of the shared type dictionary [v3] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 14:36:33 GMT, Nils Eliasson wrote: >> Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into copy_shared_dict >> - Dict clean-up, remove dead code >> - Merge branch 'master' into copy_shared_dict >> - Clean-up, copyrights >> - 8256274: C2: Optimize copying of the shared type dictionary on compilation setup > > Still good! @neliasso @vnkozlov - thank you for reviewing! 
-------------

PR: https://git.openjdk.java.net/jdk/pull/1186

From redestad at openjdk.java.net  Tue Nov 17 07:19:04 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Tue, 17 Nov 2020 07:19:04 GMT
Subject: Integrated: 8256274: C2: Optimize copying of the shared type dictionary
In-Reply-To:
References:
Message-ID:

On Thu, 12 Nov 2020 13:01:35 GMT, Claes Redestad wrote:

> - Remove unused (and subtly broken) Dict deep copy constructors
> - Add a fixed, deep copy constructor to Dict which also takes the arena to allocate the copy in
> - Use this new deep copy constructor in Type::Initialize instead of copying by iterating over the shared dict
>
> This reduces the overhead of each C2 compilation by ~32k instructions[1], with a noticeable gain on targeted micros such as `trivialMath_repeat_c2`.
>
> Baseline:
> Benchmark                                      Mode  Cnt    Score   Error  Units
> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10  396.004 ± 3.061  ms/op
>
> Patch:
> Benchmark                                      Mode  Cnt    Score   Error  Units
> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   10  354.639 ± 1.929  ms/op
>
> (Each fork does 2000 repeated compilations, so the improvement is somewhere around 20us/compilation)
>
> [1] when instrumenting with valgrind+callgrind

This pull request has now been integrated.
Changeset: 12285172 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/12285172 Stats: 187 lines in 3 files changed: 10 ins; 75 del; 102 mod 8256274: C2: Optimize copying of the shared type dictionary Reviewed-by: neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1186 From neliasso at openjdk.java.net Tue Nov 17 08:20:12 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 17 Nov 2020 08:20:12 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 05:31:10 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
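
The runtime dispatch in point 2 of the summary can be pictured with a scalar sketch. This is an illustration only: the actual patch makes C2 emit AVX-512 masked loads and stores at the call site, which is not reproduced here. The names `kPartialInlineSize`, `copy_stub` and `partial_inline_copy` are hypothetical, and treating the threshold as inclusive (`<=`) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical constant mirroring the default ArrayCopyPartialInlineSize (32 bytes).
constexpr std::size_t kPartialInlineSize = 32;

// Stand-in for the out-of-line arraycopy stub used for larger copies.
static void copy_stub(uint8_t* dst, const uint8_t* src, std::size_t len) {
  std::memmove(dst, src, len);
}

// Sketch of the call-site shape: small copies are handled inline, larger
// ones fall back to the stub call.
static void partial_inline_copy(uint8_t* dst, const uint8_t* src, std::size_t len) {
  if (len <= kPartialInlineSize) {
    // Scalar emulation of "masked load, then masked store": all source
    // bytes are read into a temporary before any destination byte is
    // written, as a single vector register would do.
    uint8_t tmp[kPartialInlineSize];
    for (std::size_t i = 0; i < len; i++) {
      tmp[i] = src[i];
    }
    for (std::size_t i = 0; i < len; i++) {
      dst[i] = tmp[i];
    }
  } else {
    copy_stub(dst, src, len);
  }
}
```

The point of the shape is that the short path avoids the call overhead entirely, which matches where the gains in the results for small block sizes come from.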
>>
>> Performance Results:
>> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>> ArrayCopyPartialInlineSize : 32
>>
>> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
>> -- | -- | -- | -- | --
>> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
>> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
>> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
>> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
>> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
>> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
>> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
>> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
>> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
>> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
>> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
>> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
>> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
>> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
>> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
>> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
>> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
>> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
>> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
>> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
>> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
>> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
>> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
>> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
>> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
>> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
>> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
>> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
>> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
>> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
>> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
>> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
>> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
>> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
>> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
>> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
>> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
>> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
>> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
>> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
>> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
>> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
>> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
>> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
>> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
>> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
>> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
>> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
>> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
>> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
>> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
>> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
>> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
>> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
>> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
>> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
>> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
>> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
>> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
>> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
>> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
>> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
>> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
>> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
>> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
>> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
>> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
>> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
>> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
>> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
>> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
>> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
>> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
>> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
>> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
>> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
>> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
>> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>>
>> Detailed Reports:
>> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
>> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits:
>
> - Review comments resolved
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
> - Merge remote-tracking branch 'upstream' into JDK-8252848
> - JDK-8252848 : Review comments resolved
> - JDK-8252848: Review comments resolution.
> - JDK-8252848: Review comments addressed.
> - Merge remote-tracking branch 'origin' into JDK-8252848
> - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions.
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
> - ... and 6 more: https://git.openjdk.java.net/jdk/compare/1d3d64f3...edb74db3

ok - looks good!

-------------

Marked as reviewed by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/302

From roland at openjdk.java.net  Tue Nov 17 08:33:20 2020
From: roland at openjdk.java.net (Roland Westrelin)
Date: Tue, 17 Nov 2020 08:33:20 GMT
Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v7]
In-Reply-To:
References:
Message-ID:

> This change adds 3 new methods in Objects:
>
> public static long checkIndex(long index, long length)
> public static long checkFromToIndex(long fromIndex, long toIndex, long length)
> public static long checkFromIndexSize(long fromIndex, long size, long length)
>
> This mirrors the int utility methods that were added by JDK-8135248
> with the same motivations.
>
> As is the case with the int checkIndex(), the long checkIndex() method
> is JIT compiled as an intrinsic.
It allows the JIT to compile > checkIndex to an unsigned comparison and properly recognize it as > a range check that then becomes a candidate for the existing range check > optimizations. This has proven to be important for panama's > MemorySegment API and a prototype of this change (with some extra c2 > improvements) showed that panama micro benchmark results improve > significantly. > > This change includes: > > - the API change > - the C2 intrinsic > - tests for the API and the C2 intrinsic > > This is a joint work with Paul who reviewed and reworked the API change > and filled the CSR. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 27 commits: - Merge branch 'master' of https://git.openjdk.java.net/jdk into JDK-8255150 - Merge branch 'master' into JDK-8255150 - Merge branch 'master' into JDK-8255150 - Merge branch 'master' of https://git.openjdk.java.net/jdk into JDK-8255150 - exclude compiler test when run with -Xcomp - CastLL should define carry_depency - intrinsic comments - Jorn's comments - Update headers and add intrinsic to Graal test ignore list - move compiler test and add bug to test - ... 
and 17 more: https://git.openjdk.java.net/jdk/compare/4553fa0b...feb32943 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1003/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1003&range=06 Stats: 897 lines in 30 files changed: 846 ins; 4 del; 47 mod Patch: https://git.openjdk.java.net/jdk/pull/1003.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1003/head:pull/1003 PR: https://git.openjdk.java.net/jdk/pull/1003 From dlong at openjdk.java.net Tue Nov 17 08:43:09 2020 From: dlong at openjdk.java.net (Dean Long) Date: Tue, 17 Nov 2020 08:43:09 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v7] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 08:33:20 GMT, Roland Westrelin wrote: >> This change add 3 new methods in Objects: >> >> public static long checkIndex(long index, long length) >> public static long checkFromToIndex(long fromIndex, long toIndex, long length) >> public static long checkFromIndexSize(long fromIndex, long size, long length) >> >> This mirrors the int utility methods that were added by JDK-8135248 >> with the same motivations. >> >> As is the case with the int checkIndex(), the long checkIndex() method >> is JIT compiled as an intrinsic. It allows the JIT to compile >> checkIndex to an unsigned comparison and properly recognize it as >> a range check that then becomes a candidate for the existing range check >> optimizations. This has proven to be important for panama's >> MemorySegment API and a prototype of this change (with some extra c2 >> improvements) showed that panama micro benchmark results improve >> significantly. >> >> This change includes: >> >> - the API change >> - the C2 intrinsic >> - tests for the API and the C2 intrinsic >> >> This is a joint work with Paul who reviewed and reworked the API change >> and filled the CSR. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 27 commits: > > - Merge branch 'master' of https://git.openjdk.java.net/jdk into JDK-8255150 > - Merge branch 'master' into JDK-8255150 > - Merge branch 'master' into JDK-8255150 > - Merge branch 'master' of https://git.openjdk.java.net/jdk into JDK-8255150 > - exclude compiler test when run with -Xcomp > - CastLL should define carry_depency > - intrinsic comments > - Jorn's comments > - Update headers and add intrinsic to Graal test ignore list > - move compiler test and add bug to test > - ... and 17 more: https://git.openjdk.java.net/jdk/compare/4553fa0b...feb32943 Marked as reviewed by dlong (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1003 From thartmann at openjdk.java.net Tue Nov 17 08:54:19 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 17 Nov 2020 08:54:19 GMT Subject: RFR: 8256385: C2: fatal error: modified node is not on IGVN._worklist Message-ID: `PhaseIdealLoop::find_safepoint` creates a temporary MergeMemNode that is not removed if we bail out from the optimization early (see `return NULL` statements). The fix is to simply add the MergeMem to the worklist to make sure it is always reclaimed by IGVN. Interestingly this code path was not triggered by any of our tests but only with a test case generated by the Java Fuzzer. I've added a simplified version of that test case. 
Thanks, Tobias ------------- Commit messages: - 8256385: C2: fatal error: modified node is not on IGVN._worklist Changes: https://git.openjdk.java.net/jdk/pull/1252/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1252&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256385 Stats: 55 lines in 2 files changed: 55 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1252.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1252/head:pull/1252 PR: https://git.openjdk.java.net/jdk/pull/1252 From chagedorn at openjdk.java.net Tue Nov 17 09:04:02 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 17 Nov 2020 09:04:02 GMT Subject: RFR: 8256385: C2: fatal error: modified node is not on IGVN._worklist In-Reply-To: References: Message-ID: <-nradKvt3P7nEjArYyNZ7nJe43RININkMNkId-7RCXc=.b6a891d4-b62f-4eea-bff3-8e1eb0ecf783@github.com> On Tue, 17 Nov 2020 08:49:32 GMT, Tobias Hartmann wrote: > `PhaseIdealLoop::find_safepoint` creates a temporary MergeMemNode that is not removed if we bail out from the optimization early (see `return NULL` statements). The fix is to simply add the MergeMem to the worklist to make sure it is always reclaimed by IGVN. > > Interestingly this code path was not triggered by any of our tests but only with a test case generated by the Java Fuzzer. I've added a simplified version of that test case. > > Thanks, > Tobias Looks good to me. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1252 From chagedorn at openjdk.java.net Tue Nov 17 09:05:10 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 17 Nov 2020 09:05:10 GMT Subject: RFR: 8255058: C1: assert(is_virtual()) failed: type check In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 20:41:16 GMT, Vladimir Kozlov wrote: >> The following code for handling phi functions of an exception entry block in the method `LinearScan::resolve_exception_edge` assumes that pinned `Constant` instructions (executing the else case) have a virtual operand and therefore an interval assigned: >> https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_LinearScan.cpp#L1944-L1952 >> In the testcase, however, this is not the case: A `Constant` instruction with a constant operand that is part of the long addition chain in the assignment for `iFld` (starting on L52) is pinned in `UseCountComputer::block_do`because it recursed too deeply: >> https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_IR.cpp#L388-L392 >> https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_IR.cpp#L414-L423 >> As a result, the else case is executed in `LinearScan::resolve_exception_edge` which results in this assertion failure because `vreg_num()` only works on virtual operands that belong to an interval. >> >> The fix is straight forward to also do a mapping to an interval for pinned `Constant` instructions with constant operands as we already do for non-pinned `Constant` instructions. >> >> Thanks, >> Christian > > Good. @vnkozlov Thank you for your review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1202 From roland at openjdk.java.net Tue Nov 17 09:13:03 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 17 Nov 2020 09:13:03 GMT Subject: RFR: 8256385: C2: fatal error: modified node is not on IGVN._worklist In-Reply-To: <-nradKvt3P7nEjArYyNZ7nJe43RININkMNkId-7RCXc=.b6a891d4-b62f-4eea-bff3-8e1eb0ecf783@github.com> References: <-nradKvt3P7nEjArYyNZ7nJe43RININkMNkId-7RCXc=.b6a891d4-b62f-4eea-bff3-8e1eb0ecf783@github.com> Message-ID: On Tue, 17 Nov 2020 09:00:50 GMT, Christian Hagedorn wrote: >> `PhaseIdealLoop::find_safepoint` creates a temporary MergeMemNode that is not removed if we bail out from the optimization early (see `return NULL` statements). The fix is to simply add the MergeMem to the worklist to make sure it is always reclaimed by IGVN. >> >> Interestingly this code path was not triggered by any of our tests but only with a test case generated by the Java Fuzzer. I've added a simplified version of that test case. >> >> Thanks, >> Tobias > > Looks good to me. There's a call to _igvn.remove_dead_node(mm) below but the problem is that it's not always reached. 
I hit a similar problem while reworking the long counted loop support and fixed it by refactoring the code to:

  MergeMemNode* mm = NULL;
#ifdef ASSERT
  if (mem->is_MergeMem()) {
    mm = mem->clone()->as_MergeMem();
    for (MergeMemStream mms(mem->as_MergeMem()); mms.next_non_empty(); ) {
      if (mms.alias_idx() != Compile::AliasIdxBot && loop != get_loop(ctrl_or_self(mms.memory()))) {
        mm->set_memory_at(mms.alias_idx(), mem->as_MergeMem()->base_memory());
      }
    }
  }
#endif
  if (!no_side_effect_since_safepoint(C, x, mem, mm)) {
    safepoint = NULL;
  } else {
    assert(mm == NULL || _igvn.transform(mm) == mem->as_MergeMem()->base_memory(), "all memory state should have been processed");
  }
#ifdef ASSERT
  if (mm != NULL) {
    _igvn.remove_dead_node(mm);
  }
#endif

with a new method:

static bool no_side_effect_since_safepoint(Compile* C, Node* x, Node* mem, MergeMemNode* mm) {
  SafePointNode* safepoint = NULL;
  for (DUIterator_Fast imax, i = x->fast_outs(imax); i < imax; i++) {
    Node* u = x->fast_out(i);
    if (u->is_Phi() && u->bottom_type() == Type::MEMORY) {
      Node* m = u->in(LoopNode::LoopBackControl);
      if (u->adr_type() == TypePtr::BOTTOM) {
        if (m->is_MergeMem() && mem->is_MergeMem()) {
          if (m != mem DEBUG_ONLY(|| true)) {
            for (MergeMemStream mms(m->as_MergeMem(), mem->as_MergeMem()); mms.next_non_empty2(); ) {
              if (!mms.is_empty()) {
                if (mms.memory() != mms.memory2()) {
                  return false;
                }
#ifdef ASSERT
                if (mms.alias_idx() != Compile::AliasIdxBot) {
                  mm->set_memory_at(mms.alias_idx(), mem->as_MergeMem()->base_memory());
                }
#endif
              }
            }
          }
        } else if (mem->is_MergeMem()) {
          if (m != mem->as_MergeMem()->base_memory()) {
            return false;
          }
        } else {
          return false;
        }
      } else {
        if (mem->is_MergeMem()) {
          if (m != mem->as_MergeMem()->memory_at(C->get_alias_index(u->adr_type()))) {
            return false;
          }
#ifdef ASSERT
          mm->set_memory_at(C->get_alias_index(u->adr_type()), mem->as_MergeMem()->base_memory());
#endif
        } else {
          if (m != mem) {
            return false;
          }
        }
      }
    }
  }
  return true;
}

-------------

PR: https://git.openjdk.java.net/jdk/pull/1252
From chagedorn at openjdk.java.net Tue Nov 17 09:19:10 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 17 Nov 2020 09:19:10 GMT Subject: Integrated: 8255058: C1: assert(is_virtual()) failed: type check In-Reply-To: References: Message-ID: <9BHRxxXGMv473UsIrqKqAT10bu2XCYx2uIimagXf8Bw=.a1ad007c-83ef-40e9-be24-eee171cfb84f@github.com> On Fri, 13 Nov 2020 11:59:14 GMT, Christian Hagedorn wrote: > The following code for handling phi functions of an exception entry block in the method `LinearScan::resolve_exception_edge` assumes that pinned `Constant` instructions (executing the else case) have a virtual operand and therefore an interval assigned: > https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_LinearScan.cpp#L1944-L1952 > In the testcase, however, this is not the case: A `Constant` instruction with a constant operand that is part of the long addition chain in the assignment for `iFld` (starting on L52) is pinned in `UseCountComputer::block_do`because it recursed too deeply: > https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_IR.cpp#L388-L392 > https://github.com/openjdk/jdk/blob/05b824567c346a7d136c01d23f56a908e7efc6d7/src/hotspot/share/c1/c1_IR.cpp#L414-L423 > As a result, the else case is executed in `LinearScan::resolve_exception_edge` which results in this assertion failure because `vreg_num()` only works on virtual operands that belong to an interval. > > The fix is straight forward to also do a mapping to an interval for pinned `Constant` instructions with constant operands as we already do for non-pinned `Constant` instructions. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 5dbfae01 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/5dbfae01 Stats: 74 lines in 2 files changed: 71 ins; 1 del; 2 mod 8255058: C1: assert(is_virtual()) failed: type check Reviewed-by: neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1202 From redestad at openjdk.java.net Tue Nov 17 09:20:13 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 17 Nov 2020 09:20:13 GMT Subject: RFR: 8256392: C2: Various Node cleanups Message-ID: - Unused methods: Node::walk, walk_, nop, lookup, rpop Node_Array::reset, sort - Disallowing 0 as values to OptoNodeListSize and OptoBlockListSize can simplify some logic: e.g., Node_Array::_max > 0 becomes an invariant (with reset unused and removed) ------------- Commit messages: - Merge branch 'master' into node_cleanup - Merge branch 'master' into node_cleanup - Remove unused code - More cleanups - Merge branch 'master' into node_cleanup - Remove unused code in node Changes: https://git.openjdk.java.net/jdk/pull/1223/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1223&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256392 Stats: 116 lines in 3 files changed: 11 ins; 65 del; 40 mod Patch: https://git.openjdk.java.net/jdk/pull/1223.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1223/head:pull/1223 PR: https://git.openjdk.java.net/jdk/pull/1223 From neliasso at openjdk.java.net Tue Nov 17 09:20:14 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 17 Nov 2020 09:20:14 GMT Subject: RFR: 8256392: C2: Various Node cleanups In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 12:22:26 GMT, Claes Redestad wrote: > - Unused methods: > Node::walk, walk_, nop, lookup, rpop > Node_Array::reset, sort > - Disallowing 0 as values to OptoNodeListSize and OptoBlockListSize can simplify some logic: e.g., Node_Array::_max > 0 becomes an invariant (with reset unused and removed) Looks good! 
------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1223 From redestad at openjdk.java.net Tue Nov 17 09:28:19 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 17 Nov 2020 09:28:19 GMT Subject: RFR: 8256392: C2: Various Node cleanups [v2] In-Reply-To: References: Message-ID: > - Unused methods: > Node::walk, walk_, nop, lookup, rpop > Node_Array::reset, sort > - Disallowing 0 as values to OptoNodeListSize and OptoBlockListSize can simplify some logic: e.g., Node_Array::_max > 0 becomes an invariant (with reset unused and removed) Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Add back Node::lookup: used by ppc ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1223/files - new: https://git.openjdk.java.net/jdk/pull/1223/files/4bd17404..c5f54790 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1223&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1223&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1223.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1223/head:pull/1223 PR: https://git.openjdk.java.net/jdk/pull/1223 From magnus.ihse.bursie at oracle.com Tue Nov 17 09:37:31 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 17 Nov 2020 10:37:31 +0100 Subject: RFR: 8256414: add optimized build to submit workflow In-Reply-To: References: Message-ID: <9ffae379-72ea-f830-c6da-763522e0ed5f@oracle.com> Hi Igor, There is a long-standing bug with the intent to remove optimized builds (https://bugs.openjdk.java.net/browse/JDK-8183287). Given that it does not seem that popular, I wonder if it really is necessary to burden the submit workflow (and tier1 testing, as requested in https://bugs.openjdk.java.net/browse/JDK-8256430) with this. 
At the very least, I'd like to get some input from more Hotspot developers to hear if they think it is a worthy cause to spend our resources at. Otherwise, I believe a better way forward is to follow through on JDK-8183287, viz. to split up optimized builds into the two extra components it actually provides: enable diagnostic code in normal release builds (https://bugs.openjdk.java.net/browse/JDK-8183288) and enable tracing with INCLUDE_PRINT (https://bugs.openjdk.java.net/browse/JDK-8202283). /Magnus On 2020-11-16 19:33, Igor Ignatyev wrote: > Hi all, > > Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? > > Thanks, > -- Igor > > ------------- > > Commit messages: > - add linux-x64-optimized build > > Changes: https://git.openjdk.java.net/jdk/pull/1233/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1233&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8256414 > Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod > Patch: https://git.openjdk.java.net/jdk/pull/1233.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/1233/head:pull/1233 > > PR: https://git.openjdk.java.net/jdk/pull/1233 From thartmann at openjdk.java.net Tue Nov 17 09:40:08 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 17 Nov 2020 09:40:08 GMT Subject: Integrated: 8256325: Remove duplicate asserts in PhaseMacroExpand::expand_macro_nodes In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 10:56:11 GMT, Tobias Hartmann wrote: > [JDK-8239072](https://bugs.openjdk.java.net/browse/JDK-8239072) added a default case to the switch statement in PhaseMacroExpand::expand_macro_nodes: > https://hg.openjdk.java.net/jdk/jdk/rev/cf319f17c647#l3.67 > > This allows to merge the "expansion must have deleted one node from macro list" asserts and remove the duplicate asserts. 
> > Thanks, > Tobias This pull request has now been integrated. Changeset: 6d878565 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/6d878565 Stats: 9 lines in 1 file changed: 0 ins; 4 del; 5 mod 8256325: Remove duplicate asserts in PhaseMacroExpand::expand_macro_nodes Reviewed-by: shade, redestad ------------- PR: https://git.openjdk.java.net/jdk/pull/1200 From thartmann at openjdk.java.net Tue Nov 17 09:44:05 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 17 Nov 2020 09:44:05 GMT Subject: RFR: 8256385: C2: fatal error: modified node is not on IGVN._worklist In-Reply-To: References: <-nradKvt3P7nEjArYyNZ7nJe43RININkMNkId-7RCXc=.b6a891d4-b62f-4eea-bff3-8e1eb0ecf783@github.com> Message-ID: On Tue, 17 Nov 2020 09:09:59 GMT, Roland Westrelin wrote: >> Looks good to me. > > There's a call to _igvn.remove_dead_node(mm) below but the problem is that it's not always reached. I hit a similar problem while reworking the long counted loop support and fixed it by refactoring the code to: > > > MergeMemNode* mm = NULL; > #ifdef ASSERT > if (mem->is_MergeMem()) { > mm = mem->clone()->as_MergeMem(); > for (MergeMemStream mms(mem->as_MergeMem()); mms.next_non_empty(); ) { > if (mms.alias_idx() != Compile::AliasIdxBot && loop != get_loop(ctrl_or_self(mms.memory()))) { > mm->set_memory_at(mms.alias_idx(), mem->as_MergeMem()->base_memory()); > } > } > } > #endif > if (!no_side_effect_since_safepoint(C, x, mem, mm)) { > safepoint = NULL; > } else { > assert(mm == NULL|| _igvn.transform(mm) == mem->as_MergeMem()->base_memory(), "all memory state should have been processed"); > } > #ifdef ASSERT > if (mm != NULL) { > _igvn.remove_dead_node(mm); > } > #endif > with a new method: > static bool no_side_effect_since_safepoint(Compile* C, Node* x, Node* mem, MergeMemNode* mm) { > SafePointNode* safepoint = NULL; > for (DUIterator_Fast imax, i = x->fast_outs(imax); i < imax; i++) { > Node* u = x->fast_out(i); > if (u->is_Phi() && u->bottom_type() == 
Type::MEMORY) { > Node* m = u->in(LoopNode::LoopBackControl); > if (u->adr_type() == TypePtr::BOTTOM) { > if (m->is_MergeMem() && mem->is_MergeMem()) { > if (m != mem DEBUG_ONLY(|| true)) { > for (MergeMemStream mms(m->as_MergeMem(), mem->as_MergeMem()); mms.next_non_empty2(); ) { > if (!mms.is_empty()) { > if (mms.memory() != mms.memory2()) { > return false; > } > #ifdef ASSERT > if (mms.alias_idx() != Compile::AliasIdxBot) { > mm->set_memory_at(mms.alias_idx(), mem->as_MergeMem()->base_memory()); > } > #endif > } > } > } > } else if (mem->is_MergeMem()) { > if (m != mem->as_MergeMem()->base_memory()) { > return false; > } > } else { > return false; > } > } else { > if (mem->is_MergeMem()) { > if (m != mem->as_MergeMem()->memory_at(C->get_alias_index(u->adr_type()))) { > return false; > } > #ifdef ASSERT > mm->set_memory_at(C->get_alias_index(u->adr_type()), mem->as_MergeMem()->base_memory()); > #endif > } else { > if (m != mem) { > return false; > } > } > } > } > } > return true; > } Thanks for the reviews! @rwestrel Yes, exactly. I was thinking about such a refactoring as well but then decided to go with the simple fix because the code is guarded by `#ifdef ASSERT` anyway. Do you intent to integrate that refactoring with another change? In that case I would like to use the simple fix here. ------------- PR: https://git.openjdk.java.net/jdk/pull/1252 From magnus.ihse.bursie at oracle.com Tue Nov 17 09:47:26 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 17 Nov 2020 10:47:26 +0100 Subject: RFR: 8256414: add optimized build to submit workflow In-Reply-To: <9ffae379-72ea-f830-c6da-763522e0ed5f@oracle.com> References: <9ffae379-72ea-f830-c6da-763522e0ed5f@oracle.com> Message-ID: <282bed4c-56b5-3da2-1bbe-a9ff54edf91b@oracle.com> I see now that this PR was already integrated. I think that any change to the submit workflow (if they add additional testing, and not just fix bugs) is a non-trivial change which needs careful consideration. 
We have already had a huge influx of additional build platforms in a very short time. Each additional platform is subject to all kinds of build issues, not all of which might be related to the actual patch, and we therefore need to weigh the benefits of getting additional testing of build platforms against the risk that this might cause unnecessary roadblocks for developers. Also, I believe it is good practice when changing build code to make sure that at least one reviewer is a member of the Build Group (https://openjdk.java.net/census#build). This is not something Skara can enforce, so it is dependent on the good will of committers (who should notify the correct set of reviewers), and of JDK Reviewers to specify if they believe additional reviewers from any particular area are needed. /Magnus On 2020-11-17 10:37, Magnus Ihse Bursie wrote: > Hi Igor, > > There is a long-standing bug with the intent to remove optimized > builds (https://bugs.openjdk.java.net/browse/JDK-8183287). Given that > it does not seem that popular, I wonder if it really is necessary to > burden the submit workflow (and tier1 testing, as requested in > https://bugs.openjdk.java.net/browse/JDK-8256430) with this. > > At the very least, I'd like to get some input from more Hotspot > developers to hear if they think it is a worthy cause to spend our > resources at. > > Otherwise, I believe a better way forward is to follow through on > JDK-8183287, viz. to split up optimized builds into the two extra > components it actually provides: enable diagnostic code in normal > release builds (https://bugs.openjdk.java.net/browse/JDK-8183288) and > enable tracing with INCLUDE_PRINT > (https://bugs.openjdk.java.net/browse/JDK-8202283). > > /Magnus > > On 2020-11-16 19:33, Igor Ignatyev wrote: >> Hi all, >> >> Could you please review this small and trivial patch which adds >> `linux-x64-optimized` build to submit workflow so breakages of this >> build flavor would be easier to spot?
>> >> Thanks, >> -- Igor >> >> ------------- >> >> Commit messages: >> - add linux-x64-optimized build >> >> Changes: https://git.openjdk.java.net/jdk/pull/1233/files >> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1233&range=00 >> Issue: https://bugs.openjdk.java.net/browse/JDK-8256414 >> Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod >> Patch: https://git.openjdk.java.net/jdk/pull/1233.diff >> Fetch: git fetch https://git.openjdk.java.net/jdk >> pull/1233/head:pull/1233 >> >> PR: https://git.openjdk.java.net/jdk/pull/1233 > From roland at openjdk.java.net Tue Nov 17 09:48:07 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 17 Nov 2020 09:48:07 GMT Subject: RFR: 8256385: C2: fatal error: modified node is not on IGVN._worklist In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 08:49:32 GMT, Tobias Hartmann wrote: > `PhaseIdealLoop::find_safepoint` creates a temporary MergeMemNode that is not removed if we bail out from the optimization early (see `return NULL` statements). The fix is to simply add the MergeMem to the worklist to make sure it is always reclaimed by IGVN. > > Interestingly this code path was not triggered by any of our tests but only with a test case generated by the Java Fuzzer. I've added a simplified version of that test case. > > Thanks, > Tobias Marked as reviewed by roland (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1252 From roland at openjdk.java.net Tue Nov 17 09:48:08 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 17 Nov 2020 09:48:08 GMT Subject: RFR: 8256385: C2: fatal error: modified node is not on IGVN._worklist In-Reply-To: References: <-nradKvt3P7nEjArYyNZ7nJe43RININkMNkId-7RCXc=.b6a891d4-b62f-4eea-bff3-8e1eb0ecf783@github.com> Message-ID: On Tue, 17 Nov 2020 09:09:59 GMT, Roland Westrelin wrote: >> Looks good to me. > > There's a call to _igvn.remove_dead_node(mm) below but the problem is that it's not always reached.
I hit a similar problem while reworking the long counted loop support and fixed it by refactoring the code to: > > > MergeMemNode* mm = NULL; > #ifdef ASSERT > if (mem->is_MergeMem()) { > mm = mem->clone()->as_MergeMem(); > for (MergeMemStream mms(mem->as_MergeMem()); mms.next_non_empty(); ) { > if (mms.alias_idx() != Compile::AliasIdxBot && loop != get_loop(ctrl_or_self(mms.memory()))) { > mm->set_memory_at(mms.alias_idx(), mem->as_MergeMem()->base_memory()); > } > } > } > #endif > if (!no_side_effect_since_safepoint(C, x, mem, mm)) { > safepoint = NULL; > } else { > assert(mm == NULL|| _igvn.transform(mm) == mem->as_MergeMem()->base_memory(), "all memory state should have been processed"); > } > #ifdef ASSERT > if (mm != NULL) { > _igvn.remove_dead_node(mm); > } > #endif > with a new method: > static bool no_side_effect_since_safepoint(Compile* C, Node* x, Node* mem, MergeMemNode* mm) { > SafePointNode* safepoint = NULL; > for (DUIterator_Fast imax, i = x->fast_outs(imax); i < imax; i++) { > Node* u = x->fast_out(i); > if (u->is_Phi() && u->bottom_type() == Type::MEMORY) { > Node* m = u->in(LoopNode::LoopBackControl); > if (u->adr_type() == TypePtr::BOTTOM) { > if (m->is_MergeMem() && mem->is_MergeMem()) { > if (m != mem DEBUG_ONLY(|| true)) { > for (MergeMemStream mms(m->as_MergeMem(), mem->as_MergeMem()); mms.next_non_empty2(); ) { > if (!mms.is_empty()) { > if (mms.memory() != mms.memory2()) { > return false; > } > #ifdef ASSERT > if (mms.alias_idx() != Compile::AliasIdxBot) { > mm->set_memory_at(mms.alias_idx(), mem->as_MergeMem()->base_memory()); > } > #endif > } > } > } > } else if (mem->is_MergeMem()) { > if (m != mem->as_MergeMem()->base_memory()) { > return false; > } > } else { > return false; > } > } else { > if (mem->is_MergeMem()) { > if (m != mem->as_MergeMem()->memory_at(C->get_alias_index(u->adr_type()))) { > return false; > } > #ifdef ASSERT > mm->set_memory_at(C->get_alias_index(u->adr_type()), mem->as_MergeMem()->base_memory()); > #endif > } 
else { > if (m != mem) { > return false; > } > } > } > } > } > return true; > } > @rwestrel Yes, exactly. I was thinking about such a refactoring as well but then decided to go with the simple fix because the code is guarded by `#ifdef ASSERT` anyway. Do you intent to integrate that refactoring with another change? In that case I would like to use the simple fix here. Using the simple fix is fine. I'll refactor the code in a subsequent change. ------------- PR: https://git.openjdk.java.net/jdk/pull/1252 From thartmann at openjdk.java.net Tue Nov 17 09:49:05 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 17 Nov 2020 09:49:05 GMT Subject: RFR: 8256392: C2: Various Node cleanups [v2] In-Reply-To: References: Message-ID: <0V-19SQBnVxoMAzrFrgzvzR7yTlqVfo6xk7cQherz8E=.1566273b-9def-4c31-8890-222ae99bce1a@github.com> On Tue, 17 Nov 2020 09:28:19 GMT, Claes Redestad wrote: >> - Unused methods: >> Node::walk, walk_, nop, lookup, rpop >> Node_Array::reset, sort >> - Disallowing 0 as values to OptoNodeListSize and OptoBlockListSize can simplify some logic: e.g., Node_Array::_max > 0 becomes an invariant (with reset unused and removed) > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Add back Node::lookup: used by ppc Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1223 From thartmann at openjdk.java.net Tue Nov 17 09:53:05 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 17 Nov 2020 09:53:05 GMT Subject: RFR: 8256385: C2: fatal error: modified node is not on IGVN._worklist In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 09:44:54 GMT, Roland Westrelin wrote: >> `PhaseIdealLoop::find_safepoint` creates a temporary MergeMemNode that is not removed if we bail out from the optimization early (see `return NULL` statements).
The fix is to simply add the MergeMem to the worklist to make sure it is always reclaimed by IGVN. >> >> Interestingly this code path was not triggered by any of our tests but only with a test case generated by the Java Fuzzer. I've added a simplified version of that test case. >> >> Thanks, >> Tobias > > Marked as reviewed by roland (Reviewer). Sounds good, thanks for the review! ------------- PR: https://git.openjdk.java.net/jdk/pull/1252 From jvernee at openjdk.java.net Tue Nov 17 10:11:06 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 17 Nov 2020 10:11:06 GMT Subject: RFR: 8255150: Add utility methods to check long indexes and ranges [v7] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 08:33:20 GMT, Roland Westrelin wrote: >> This change add 3 new methods in Objects: >> >> public static long checkIndex(long index, long length) >> public static long checkFromToIndex(long fromIndex, long toIndex, long length) >> public static long checkFromIndexSize(long fromIndex, long size, long length) >> >> This mirrors the int utility methods that were added by JDK-8135248 >> with the same motivations. >> >> As is the case with the int checkIndex(), the long checkIndex() method >> is JIT compiled as an intrinsic. It allows the JIT to compile >> checkIndex to an unsigned comparison and properly recognize it as >> a range check that then becomes a candidate for the existing range check >> optimizations. This has proven to be important for panama's >> MemorySegment API and a prototype of this change (with some extra c2 >> improvements) showed that panama micro benchmark results improve >> significantly. >> >> This change includes: >> >> - the API change >> - the C2 intrinsic >> - tests for the API and the C2 intrinsic >> >> This is a joint work with Paul who reviewed and reworked the API change >> and filled the CSR. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 27 commits: > > - Merge branch 'master' of https://git.openjdk.java.net/jdk into JDK-8255150 > - Merge branch 'master' into JDK-8255150 > - Merge branch 'master' into JDK-8255150 > - Merge branch 'master' of https://git.openjdk.java.net/jdk into JDK-8255150 > - exclude compiler test when run with -Xcomp > - CastLL should define carry_depency > - intrinsic comments > - Jorn's comments > - Update headers and add intrinsic to Graal test ignore list > - move compiler test and add bug to test > - ... and 17 more: https://git.openjdk.java.net/jdk/compare/4553fa0b...feb32943 Marked as reviewed by jvernee (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1003 From thartmann at openjdk.java.net Tue Nov 17 10:14:06 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 17 Nov 2020 10:14:06 GMT Subject: RFR: 8255367: C2: Deoptimization during vector box construction is broken In-Reply-To: <5zCMxKvPC0ssR26TpxkHQ6r1OXBVr9k7eE8W8nCpuSM=.a4f867e5-5619-48ee-ba57-07c77e1ee550@github.com> References: <5zCMxKvPC0ssR26TpxkHQ6r1OXBVr9k7eE8W8nCpuSM=.a4f867e5-5619-48ee-ba57-07c77e1ee550@github.com> Message-ID: On Mon, 9 Nov 2020 13:35:35 GMT, Vladimir Ivanov wrote: > Vector box allocation is a multi-step process: it involves 2 allocations (typed vector instance + primitive array) and 2 initializing stores (vector value store into primitive array and field initialization in typed vector instance). If deoptimization happens at any of the aforementioned allocation points the result is broken: either wrong instance is put on stack (primitive array instead of typed vector) or vector initializing store is missing. > > There are 2 ways to fix the problem: > - piggy-back on rematerialization; > - reexecute the bytecode which allocates the instance. > > I chose the latter option because it's simpler to implement. (Rematerialization requires some adjustment of JVM state associated with each allocation to record vector type and value.) 
> > The downside is there shouldn't be any side effects present. It's not a problem right now, because boxing happens only at vector intrinsics use sites and the only intrinsic which has any side effects is vector store operation (it doesn't produce vectors, hence, no boxing needed). > > The actual fix is small: adding `PreserveReexecuteState` in `LibraryCallKit::box_vector` is enough for the problem to go away. The rest is cleanups/refactorings. > > Testing: > - [x] jdk/incubator/vector tests w/ -XX:+DeoptimizeALot & -XX:UseAVX={3,2,1,0} > - [ ] hs-precheckin-comp, hs-tier1, hs-tier2 > > Thanks! Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1126 From neliasso at openjdk.java.net Tue Nov 17 10:29:08 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 17 Nov 2020 10:29:08 GMT Subject: RFR: 8256392: C2: Various Node cleanups [v2] In-Reply-To: References: Message-ID: <6OejIIUWgX_UThU7lyhSJtjfnklYKsxKlopEd18pLpQ=.3ddf0b95-abdb-4e2d-a0bb-9b39ff94d3ce@github.com> On Tue, 17 Nov 2020 09:28:19 GMT, Claes Redestad wrote: >> - Unused methods: >> Node::walk, walk_, nop, lookup, rpop >> Node_Array::reset, sort >> - Disallowing 0 as values to OptoNodeListSize and OptoBlockListSize can simplify some logic: e.g., Node_Array::_max > 0 becomes an invariant (with reset unused and removed) > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Add back Node::lookup: used by ppc Looks good. ------------- Marked as reviewed by neliasso (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1223 From roland at openjdk.java.net Tue Nov 17 10:41:07 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 17 Nov 2020 10:41:07 GMT Subject: Integrated: 8255150: Add utility methods to check long indexes and ranges In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 10:47:18 GMT, Roland Westrelin wrote: > This change add 3 new methods in Objects: > > public static long checkIndex(long index, long length) > public static long checkFromToIndex(long fromIndex, long toIndex, long length) > public static long checkFromIndexSize(long fromIndex, long size, long length) > > This mirrors the int utility methods that were added by JDK-8135248 > with the same motivations. > > As is the case with the int checkIndex(), the long checkIndex() method > is JIT compiled as an intrinsic. It allows the JIT to compile > checkIndex to an unsigned comparison and properly recognize it as > a range check that then becomes a candidate for the existing range check > optimizations. This has proven to be important for panama's > MemorySegment API and a prototype of this change (with some extra c2 > improvements) showed that panama micro benchmark results improve > significantly. > > This change includes: > > - the API change > - the C2 intrinsic > - tests for the API and the C2 intrinsic > > This is a joint work with Paul who reviewed and reworked the API change > and filled the CSR. This pull request has now been integrated. 
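The unsigned-comparison point in the description above can be illustrated with a minimal C++ sketch. This is not the C2 intrinsic or the JDK source: the helper names `index_in_bounds_naive` and `index_in_bounds_unsigned` are hypothetical, and the sketch assumes `length` is non-negative, since the single-compare form is only equivalent under that assumption.

```cpp
#include <cstdint>

// Hypothetical helper: the straightforward two-comparison bounds test,
// i.e. 0 <= index < length.
constexpr bool index_in_bounds_naive(int64_t index, int64_t length) {
  return index >= 0 && index < length;
}

// Hypothetical helper: the same test as a single unsigned comparison.
// Assumption: length >= 0. A negative index converts to a very large
// unsigned value, so it fails the compare just like index >= length does.
constexpr bool index_in_bounds_unsigned(int64_t index, int64_t length) {
  return static_cast<uint64_t>(index) < static_cast<uint64_t>(length);
}
```

With a non-negative `length`, both helpers agree for every `index`, which is why a single unsigned compare is enough for a compiler to treat the check as an ordinary range check.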
Changeset: a7422ac2 Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/a7422ac2 Stats: 897 lines in 30 files changed: 846 ins; 4 del; 47 mod 8255150: Add utility methods to check long indexes and ranges Co-authored-by: Paul Sandoz Reviewed-by: jvernee, dlong, vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/1003 From vlivanov at openjdk.java.net Tue Nov 17 10:49:04 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 17 Nov 2020 10:49:04 GMT Subject: RFR: 8256430: add linux-x64-optimized to tier1 In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 00:31:24 GMT, Igor Ignatyev wrote: > Hi all, > > > [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? > > Thanks > -- Igor > > cc-ing @dcubed-ojdk Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1244 From vlivanov at openjdk.java.net Tue Nov 17 10:50:03 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 17 Nov 2020 10:50:03 GMT Subject: RFR: 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on [v4] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 04:38:19 GMT, Dongbo He wrote: >> 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on > > Dongbo He has updated the pull request incrementally with one additional commit since the last revision: > > Simplified code Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/853 From vladimir.x.ivanov at oracle.com Tue Nov 17 11:02:18 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 17 Nov 2020 14:02:18 +0300 Subject: RFR: 8256414: add optimized build to submit workflow In-Reply-To: <9ffae379-72ea-f830-c6da-763522e0ed5f@oracle.com> References: <9ffae379-72ea-f830-c6da-763522e0ed5f@oracle.com> Message-ID: <58ab29bb-9170-78cd-97d1-87505ecb36bb@oracle.com> > There is a long-standing bug with the intent to remove optimized builds > (https://bugs.openjdk.java.net/browse/JDK-8183287). Given that it does > not seem that popular, I wonder if it really is necessary to burden the > submit workflow (and tier1 testing, as requested in > https://bugs.openjdk.java.net/browse/JDK-8256430) with this. > > At the very least, I'd like to get some input from more Hotspot > developers to hear if they think it is a worthy cause to spend our > resources at. > > Otherwise, I believe a better way forward is to follow through on > JDK-8183287, viz. to split up optimized builds into the two extra > components it actually provides: enable diagnostic code in normal > release builds (https://bugs.openjdk.java.net/browse/JDK-8183288) and > enable tracing with INCLUDE_PRINT > (https://bugs.openjdk.java.net/browse/JDK-8202283). I find !PRODUCT vs ASSERT distinction confusing, but irrespective of the way the relevant code is guarded (!PRODUCT or INCLUDE_PRINT), it has to be built regularly to avoid the rot. So, once the bugs you mentioned are addressed, optimized build can be replaced with release build + tracing configuration. Regarding the most appropriate tier to put it, I don't think it has to be part of tier1. IMO later tiers are fine as well. But having it in tier1 doesn't look like a significant waste of resources. I don't think there's a notion of tiers in submit workflow, so I'm strongly in favor of having optimized configuration built there. 
Best regards, Vladimir Ivanov > On 2020-11-16 19:33, Igor Ignatyev wrote: >> Hi all, >> >> Could you please review this small and trivial patch which adds >> `linux-x64-optimized` build to submit workflow so breakages of this >> build flavor would be easier to spot? >> >> Thanks, >> -- Igor >> >> ------------- >> >> Commit messages: >> - add linux-x64-optimized build >> >> Changes: https://git.openjdk.java.net/jdk/pull/1233/files >> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1233&range=00 >> Issue: https://bugs.openjdk.java.net/browse/JDK-8256414 >> Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod >> Patch: https://git.openjdk.java.net/jdk/pull/1233.diff >> Fetch: git fetch https://git.openjdk.java.net/jdk >> pull/1233/head:pull/1233 >> >> PR: https://git.openjdk.java.net/jdk/pull/1233 > From dnsimon at openjdk.java.net Tue Nov 17 11:24:05 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 17 Nov 2020 11:24:05 GMT Subject: RFR: 8253228: [JVMCI] provide more info on fatal JVMCI errors In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 00:12:04 GMT, Tom Rodriguez wrote: >> There are a number of places in JVMCI that abort VM execution due to unexpected Java exceptions and other error conditions. Currently, a message and the exception stack (if any) is printed and then the VM is shutdown via the global `vm_exit` function. >> This PR changes the behavior in this scenario to raise a fatal VM error so that a hs-err log file is produced which provides more context. > > Marked as reviewed by never (Reviewer). Thanks @tkrodriguez and @vnkozlov for the reviews.
------------- PR: https://git.openjdk.java.net/jdk/pull/1240 From dnsimon at openjdk.java.net Tue Nov 17 11:24:06 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 17 Nov 2020 11:24:06 GMT Subject: Integrated: 8253228: [JVMCI] provide more info on fatal JVMCI errors In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 22:08:20 GMT, Doug Simon wrote: > There are a number of places in JVMCI that abort VM execution due to unexpected Java exceptions and other error conditions. Currently, a message and the exception stack (if any) is printed and then the VM is shutdown via the global `vm_exit` function. > This PR changes the behavior in this scenario to raise a fatal VM error so that a hs-err log file is produced which provides more context. This pull request has now been integrated. Changeset: adb8561a Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/adb8561a Stats: 34 lines in 4 files changed: 0 ins; 15 del; 19 mod 8253228: [JVMCI] provide more info on fatal JVMCI errors Reviewed-by: kvn, never ------------- PR: https://git.openjdk.java.net/jdk/pull/1240 From redestad at openjdk.java.net Tue Nov 17 11:33:14 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 17 Nov 2020 11:33:14 GMT Subject: RFR: 8256453: C2: Reduce State footprint Message-ID: State objects are pretty heavy-weight. On linux-x64, sizeof(State) = 2344. Much of this is due to the two 32-bit int arrays _cost and _rule. The number of rules varies by platform, but seems to hover around 1k. A 16-bit integer seems more than enough to map all rules, so making _rule a uint16_t array saves significant space. Also it seems we can fold the _valid bit vector into _rule and reduce the size even further. This also considerably simplifies the common validity checks, which more than makes up for needing to initialize more memory when setting up the State object. Those two optimizations reduce sizeof(State) down to 1736.
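The two optimizations described above can be sketched in C++ roughly as follows. This is an illustration under stated assumptions, not the code from the pull request: `WideState`, `PackedState`, `kNumOperands` and the "low bit means valid" encoding are all hypothetical choices made for the example, and the real array size is platform-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical operand count, chosen only for illustration.
constexpr std::size_t kNumOperands = 256;

// "Before" layout: 32-bit rule entries plus a separate validity bit vector.
struct WideState {
  uint32_t cost[kNumOperands];
  uint32_t rule[kNumOperands];
  uint32_t valid[(kNumOperands + 31) / 32];

  // Only the small bit vector has to be cleared on construction.
  WideState() { std::memset(valid, 0, sizeof(valid)); }

  bool is_valid(std::size_t i) const { return ((valid[i >> 5] >> (i & 31)) & 1u) != 0; }
  void set(std::size_t i, uint32_t c, uint32_t r) {
    cost[i] = c;
    rule[i] = r;
    valid[i >> 5] |= (1u << (i & 31));
  }
};

// "After" layout: 16-bit rule entries with the valid flag folded into bit 0.
// Assumed encoding: entry == (rule << 1) | 1 when valid, 0 when not.
struct PackedState {
  uint32_t cost[kNumOperands];
  uint16_t rule[kNumOperands];

  // The whole rule array is cleared, which is more memory to initialize
  // than the bit vector above, but the validity check becomes one mask.
  PackedState() { std::memset(rule, 0, sizeof(rule)); }

  bool is_valid(std::size_t i) const { return (rule[i] & 1u) != 0; }
  uint16_t rule_at(std::size_t i) const { return static_cast<uint16_t>(rule[i] >> 1); }
  void set(std::size_t i, uint32_t c, uint16_t r) {
    assert(r < (1u << 15));  // one bit is reserved for the valid flag
    cost[i] = c;
    rule[i] = static_cast<uint16_t>((r << 1) | 1u);
  }
};
```

With these hypothetical sizes the packed layout is smaller than the wide one (1536 versus 2080 bytes on a typical platform), at the cost of leaving 15 bits for the rule number.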
Profiling compilations show a tiny win overall and instrumenting some simple compilations I see a sharp decline in Arena::grow events caused by ResourceObj allocation in Matcher::match_tree and Matcher::Label_Root ------------- Commit messages: - Fix 'comparison is always true' build issue on newer compiler - More cleanup - Clean up State::dump - Add sanity assertion - Clean up - Merge branch 'master' into state_opts - Merge branch 'master' into state_opts - Merge _rule and _valid, encapsulate _rule and _cost, simplify DFA productions - Merge branch 'master' into state_opts - uint16_t _rule Changes: https://git.openjdk.java.net/jdk/pull/1253/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1253&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256453 Stats: 111 lines in 3 files changed: 18 ins; 37 del; 56 mod Patch: https://git.openjdk.java.net/jdk/pull/1253.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1253/head:pull/1253 PR: https://git.openjdk.java.net/jdk/pull/1253 From vlivanov at openjdk.java.net Tue Nov 17 12:06:08 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 17 Nov 2020 12:06:08 GMT Subject: RFR: 8256392: C2: Various Node cleanups [v2] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 09:28:19 GMT, Claes Redestad wrote: >> - Unused methods: >> Node::walk, walk_, nop, lookup, rpop >> Node_Array::reset, sort >> - Disallowing 0 as values to OptoNodeListSize and OptoBlockListSize can simplify some logic: e.g., Node_Array::_max > 0 becomes an invariant (with reset unused and removed) > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Add back Node::lookup: used by ppc Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1223 From redestad at openjdk.java.net Tue Nov 17 12:09:04 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 17 Nov 2020 12:09:04 GMT Subject: RFR: 8256392: C2: Various Node cleanups [v2] In-Reply-To: <6OejIIUWgX_UThU7lyhSJtjfnklYKsxKlopEd18pLpQ=.3ddf0b95-abdb-4e2d-a0bb-9b39ff94d3ce@github.com> References: <6OejIIUWgX_UThU7lyhSJtjfnklYKsxKlopEd18pLpQ=.3ddf0b95-abdb-4e2d-a0bb-9b39ff94d3ce@github.com> Message-ID: On Tue, 17 Nov 2020 10:25:59 GMT, Nils Eliasson wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Add back Node::lookup: used by ppc > > Looks good. @neliasso @TobiHartmann @iwanowww - thank you for reviewing! Just waiting for the PPC build to turn green ------------- PR: https://git.openjdk.java.net/jdk/pull/1223 From shade at openjdk.java.net Tue Nov 17 12:14:03 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 17 Nov 2020 12:14:03 GMT Subject: Integrated: 8256386: ARM32 tests fail with "bad AD file" after JDK-8223051 In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 10:12:31 GMT, Aleksey Shipilev wrote: > arm32 fails similarly to x86_32, that was fixed with [JDK-8255224](https://bugs.openjdk.java.net/browse/JDK-8255224), and probably for the same reason: missing `CMovL` rules. There are lots of `tier1` test failures with the similar crash message, see bug for the example. The fix is to add rules matching `flagsReg*U*L`, similarly to as it was done for x86_32. > > Additional testing: > - [x] Linux arm32 fastdebug, selected failing tests > - [x] Linux arm32 fastdebug, tier1 This pull request has now been integrated. 
Changeset: 3dcde557 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/3dcde557 Stats: 44 lines in 1 file changed: 44 ins; 0 del; 0 mod 8256386: ARM32 tests fail with "bad AD file" after JDK-8223051 Reviewed-by: azeemj, kvn, roland ------------- PR: https://git.openjdk.java.net/jdk/pull/1220 From shade at openjdk.java.net Tue Nov 17 12:14:02 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 17 Nov 2020 12:14:02 GMT Subject: RFR: 8256386: ARM32 tests fail with "bad AD file" after JDK-8223051 In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 12:09:25 GMT, Roland Westrelin wrote: >> arm32 fails similarly to x86_32, that was fixed with [JDK-8255224](https://bugs.openjdk.java.net/browse/JDK-8255224), and probably for the same reason: missing `CMovL` rules. There are lots of `tier1` test failures with the similar crash message, see bug for the example. The fix is to add rules matching `flagsReg*U*L`, similarly to as it was done for x86_32. >> >> Additional testing: >> - [x] Linux arm32 fastdebug, selected failing tests >> - [x] Linux arm32 fastdebug, tier1 > > Looks good to me Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/1220 From roland at openjdk.java.net Tue Nov 17 12:14:01 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 17 Nov 2020 12:14:01 GMT Subject: RFR: 8256386: ARM32 tests fail with "bad AD file" after JDK-8223051 In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 10:12:31 GMT, Aleksey Shipilev wrote: > arm32 fails similarly to x86_32, that was fixed with [JDK-8255224](https://bugs.openjdk.java.net/browse/JDK-8255224), and probably for the same reason: missing `CMovL` rules. There are lots of `tier1` test failures with the similar crash message, see bug for the example. The fix is to add rules matching `flagsReg*U*L`, similarly to as it was done for x86_32. 
> > Additional testing: > - [x] Linux arm32 fastdebug, selected failing tests > - [x] Linux arm32 fastdebug, tier1 Looks good to me ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1220 From redestad at openjdk.java.net Tue Nov 17 12:28:05 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 17 Nov 2020 12:28:05 GMT Subject: Integrated: 8256392: C2: Various Node cleanups In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 12:22:26 GMT, Claes Redestad wrote: > - Unused methods: > Node::walk, walk_, nop, lookup, rpop > Node_Array::reset, sort > - Disallowing 0 as values to OptoNodeListSize and OptoBlockListSize can simplify some logic: e.g., Node_Array::_max > 0 becomes an invariant (with reset unused and removed) This pull request has now been integrated. Changeset: 654ad274 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/654ad274 Stats: 114 lines in 3 files changed: 11 ins; 63 del; 40 mod 8256392: C2: Various Node cleanups Reviewed-by: neliasso, thartmann, vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/1223 From roland at openjdk.java.net Tue Nov 17 12:38:23 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 17 Nov 2020 12:38:23 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah [v4] In-Reply-To: References: Message-ID: > This is a Shenandoah bug but the proposed fix is in shared code. > > In an infinite loop, a barrier is located right after the loop head > and above the never branch. When the barrier is expanded, control flow > is added between the loop and the never branch. During loop > verification the assert fires because it doesn't expect any control > flow between the never branch and the loop head. 
> > While it would have been nice to fix this Shenandoah issue in > Shenandoah code, I think the cleaner fix is to preserve the invariant > that the never branch is always right after the loop head in an > infinite loop. In the proposed patch, this is achieved by moving all > uses of the loop head to the never branch when it's constructed. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - fix - test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1073/files - new: https://git.openjdk.java.net/jdk/pull/1073/files/bd85cdfc..15fbccd8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1073&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1073&range=02-03 Stats: 13157 lines in 238 files changed: 8705 ins; 2419 del; 2033 mod Patch: https://git.openjdk.java.net/jdk/pull/1073.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1073/head:pull/1073 PR: https://git.openjdk.java.net/jdk/pull/1073 From neliasso at openjdk.java.net Tue Nov 17 12:40:01 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 17 Nov 2020 12:40:01 GMT Subject: RFR: 8256453: C2: Reduce State footprint In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 09:00:46 GMT, Claes Redestad wrote: > State objects are pretty heavy-weight. On linux-x64 sizeof(State) = 2344 > > Much of this is due to the two 32-bit int arrays _cost and _rule. > > The number of rules vary by platform, but seem to hover around 1k. A 16-bit integer seem more than enough to map all rules, so making _rule an uint16_t array saves significant space. > > Also it seems we can fold the _valid bit vector into _rule and reduce the size even further. 
This also considerably simplify the common validity checks, which more than makes up for needing to initialize more memory when setting up the State object. > > Those two optimizations reduce sizeof(State) down to 1736. Profiling compilations show a tiny win overall and instrumenting some simple compilations I see a sharp decline in Arena::grow events caused by ResourceObj allocation in Matcher::match_tree and Matcher::Label_Root Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1253 From roland at openjdk.java.net Tue Nov 17 13:52:08 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 17 Nov 2020 13:52:08 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah In-Reply-To: References: <5gcJ-y5dzLhf1AAx2Hl4I5Lo3EBbu6zvdia5d_qzuAU=.c0fcfef9-6429-4481-bb7d-9a43227476ef@github.com> Message-ID: <7xRg4xNeDVIQI8KOYK_-7nnwBGBZSKaxu3KL-rpECHs=.bce2af90-9de9-4908-821d-30691b610695@github.com> On Tue, 17 Nov 2020 13:47:25 GMT, Roland Westrelin wrote: >> hi, @rwestrel >> >> Can we have an assertion to make this loop invariant more prominent? >> `I think the cleaner fix is to preserve the invariant that the never branch is always right after the loop head in an infinite loop. ` > >> Can we have an assertion to make this loop invariant more prominent? >> `I think the cleaner fix is to preserve the invariant that the never branch is always right after the loop head in an infinite loop. ` > > Thanks for the suggestion. Actually I reworked the fix so it has nothing to do with the previous one. I pushed a new fix that's different from the previous one. I realized that NeverBranchNode::Ideal() should remove the NeverBranch once the barrier is expanded but we don't run igvn after barrier expansion which is a mistake. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1073 From roland at openjdk.java.net Tue Nov 17 13:52:08 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 17 Nov 2020 13:52:08 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah In-Reply-To: <5gcJ-y5dzLhf1AAx2Hl4I5Lo3EBbu6zvdia5d_qzuAU=.c0fcfef9-6429-4481-bb7d-9a43227476ef@github.com> References: <5gcJ-y5dzLhf1AAx2Hl4I5Lo3EBbu6zvdia5d_qzuAU=.c0fcfef9-6429-4481-bb7d-9a43227476ef@github.com> Message-ID: On Fri, 13 Nov 2020 06:54:18 GMT, Xin Liu wrote: > Can we have an assertion to make this loop invariant more prominent? > `I think the cleaner fix is to preserve the invariant that the never branch is always right after the loop head in an infinite loop. ` Thanks for the suggestion. Actually I reworked the fix so it has nothing to do with the previous one. ------------- PR: https://git.openjdk.java.net/jdk/pull/1073 From thartmann at openjdk.java.net Tue Nov 17 15:17:09 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 17 Nov 2020 15:17:09 GMT Subject: RFR: 8256478: C2 compilation fails with assert(t1->isa_long()) failed: Type must be a long Message-ID: The `RotateLeftNode::Ideal` transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) should ignore a TOP `in(1)` input which will then be handled by `RotateLeftNode::Value`. 
Thanks, Tobias ------------- Commit messages: - 8256478: C2 compilation fails with assert(t1->isa_long()) failed: Type must be a long Changes: https://git.openjdk.java.net/jdk/pull/1260/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1260&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256478 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1260.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1260/head:pull/1260 PR: https://git.openjdk.java.net/jdk/pull/1260 From roland at openjdk.java.net Tue Nov 17 15:43:04 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 17 Nov 2020 15:43:04 GMT Subject: RFR: 8256478: C2 compilation fails with assert(t1->isa_long()) failed: Type must be a long In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 15:11:36 GMT, Tobias Hartmann wrote: > The `RotateLeftNode::Ideal` transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) should ignore a TOP `in(1)` input which will then be handled by `RotateLeftNode::Value`. > > Thanks, > Tobias Looks good to me ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1260 From chagedorn at openjdk.java.net Tue Nov 17 15:43:05 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 17 Nov 2020 15:43:05 GMT Subject: RFR: 8256478: C2 compilation fails with assert(t1->isa_long()) failed: Type must be a long In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 15:11:36 GMT, Tobias Hartmann wrote: > The `RotateLeftNode::Ideal` transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) should ignore a TOP `in(1)` input which will then be handled by `RotateLeftNode::Value`. > > Thanks, > Tobias Looks good to me. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1260 From thartmann at openjdk.java.net Tue Nov 17 15:55:04 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 17 Nov 2020 15:55:04 GMT Subject: RFR: 8256478: C2 compilation fails with assert(t1->isa_long()) failed: Type must be a long In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 15:37:02 GMT, Roland Westrelin wrote: >> The `RotateLeftNode::Ideal` transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) should ignore a TOP `in(1)` input which will then be handled by `RotateLeftNode::Value`. >> >> Thanks, >> Tobias > > Looks good to me @rwestrel, @chhagedorn thanks for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/1260 From kvn at openjdk.java.net Tue Nov 17 16:59:09 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 16:59:09 GMT Subject: RFR: 8256478: C2 compilation fails with assert(t1->isa_long()) failed: Type must be a long In-Reply-To: References: Message-ID: <-wbkNUqI5H7p6_vngi66Mpmc3FfAU1_Q057tqLIiPyc=.3cd41585-3517-4e1c-8aca-def0fc224fd7@github.com> On Tue, 17 Nov 2020 15:11:36 GMT, Tobias Hartmann wrote: > The `RotateLeftNode::Ideal` transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) should ignore a TOP `in(1)` input which will then be handled by `RotateLeftNode::Value`. > > Thanks, > Tobias Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1260 From kvn at openjdk.java.net Tue Nov 17 17:14:05 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 17:14:05 GMT Subject: RFR: 8256453: C2: Reduce State footprint In-Reply-To: References: Message-ID: <_uUe0M4_3yIHLztTAwMbWnpMiSAcFGku9xF5uzPzfCc=.56bda859-222d-4577-b52c-03cfc4ed6aeb@github.com> On Tue, 17 Nov 2020 09:00:46 GMT, Claes Redestad wrote: > State objects are pretty heavy-weight. 
On linux-x64 sizeof(State) = 2344 > > Much of this is due to the two 32-bit int arrays _cost and _rule. > > The number of rules vary by platform, but seem to hover around 1k. A 16-bit integer seem more than enough to map all rules, so making _rule an uint16_t array saves significant space. > > Also it seems we can fold the _valid bit vector into _rule and reduce the size even further. This also considerably simplify the common validity checks, which more than makes up for needing to initialize more memory when setting up the State object. > > Those two optimizations reduce sizeof(State) down to 1736. Profiling compilations show a tiny win overall and instrumenting some simple compilations I see a sharp decline in Arena::grow events caused by ResourceObj allocation in Matcher::match_tree and Matcher::Label_Root Nice. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1253 From thartmann at openjdk.java.net Tue Nov 17 17:16:09 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 17 Nov 2020 17:16:09 GMT Subject: RFR: 8256478: C2 compilation fails with assert(t1->isa_long()) failed: Type must be a long In-Reply-To: <-wbkNUqI5H7p6_vngi66Mpmc3FfAU1_Q057tqLIiPyc=.3cd41585-3517-4e1c-8aca-def0fc224fd7@github.com> References: <-wbkNUqI5H7p6_vngi66Mpmc3FfAU1_Q057tqLIiPyc=.3cd41585-3517-4e1c-8aca-def0fc224fd7@github.com> Message-ID: On Tue, 17 Nov 2020 16:56:05 GMT, Vladimir Kozlov wrote: >> The `RotateLeftNode::Ideal` transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) should ignore a TOP `in(1)` input which will then be handled by `RotateLeftNode::Value`. >> >> Thanks, >> Tobias > > Marked as reviewed by kvn (Reviewer). Thanks @vnkozlov ! 
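Editorial sketch for the 8256453 description quoted above: one way a 16-bit `_rule` array with the `_valid` bits folded in could be laid out. This is only an illustration under stated assumptions, not the patch itself. The operand count `kNumOperands`, the struct names, and the choice of the low bit as the validity flag are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical operand count; the real value is platform specific.
constexpr std::size_t kNumOperands = 256;

// Layout in the spirit of the "before" description: two 32-bit arrays
// plus a separate validity bit vector.
struct StateBefore {
  uint32_t cost[kNumOperands];
  uint32_t rule[kNumOperands];
  uint32_t valid[(kNumOperands + 31) / 32];
};

// Layout in the spirit of the "after" description: 16-bit rule entries,
// with validity stored in the low bit of each entry (an assumption).
struct StateAfter {
  uint32_t cost[kNumOperands] = {};
  uint16_t rule[kNumOperands] = {};  // all zero means "nothing valid yet"

  bool valid(std::size_t i) const { return (rule[i] & 0x1) != 0; }

  uint16_t rule_of(std::size_t i) const {
    assert(valid(i));
    return static_cast<uint16_t>(rule[i] >> 1);
  }

  void set(std::size_t i, uint32_t c, uint16_t r) {
    assert(r < (1u << 15));  // one bit is reserved for the valid flag
    cost[i] = c;
    rule[i] = static_cast<uint16_t>((r << 1) | 0x1);
  }
};
```

With this layout the validity check is a single mask on the entry that is about to be read anyway, at the price of zero-initializing the whole rule array up front, which matches the trade-off the description mentions.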
------------- PR: https://git.openjdk.java.net/jdk/pull/1260 From jbhateja at openjdk.java.net Tue Nov 17 17:25:07 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 17 Nov 2020 17:25:07 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 08:17:44 GMT, Nils Eliasson wrote: > ok - looks good! Hi @neliasso, thanks for your comments and review approval. Hi @vnkozlov, kindly let me know if there are any other comments from your end. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From kvn at openjdk.java.net Tue Nov 17 18:12:13 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 18:12:13 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 05:31:10 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
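Editorial sketch of the runtime check described in points 1) to 3) of the summary above: a length test selects either a short inline sequence or a call to the copy stub. It is a plain C++ model, not the C2 or assembler code from the patch. The names `kPartialInlineSize`, `copy_stub` and `partial_inline_copy` are hypothetical, and treating the 32-byte limit as inclusive is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical threshold mirroring the quoted default of 32 bytes.
constexpr std::size_t kPartialInlineSize = 32;

// Stand-in for the out-of-line array-copy stub (hypothetical).
inline void copy_stub(unsigned char* dst, const unsigned char* src, std::size_t n) {
  std::memmove(dst, src, n);
}

// Copies n bytes and reports whether the short inline path was taken.
inline bool partial_inline_copy(unsigned char* dst, const unsigned char* src, std::size_t n) {
  assert(dst != nullptr && src != nullptr);
  if (n <= kPartialInlineSize) {
    // Load everything first, then store, the way one masked vector
    // load/store pair would; this keeps overlapping ranges correct.
    unsigned char tmp[kPartialInlineSize];
    std::memcpy(tmp, src, n);
    std::memcpy(dst, tmp, n);
    return true;
  }
  copy_stub(dst, src, n);  // larger copies still pay the call overhead
  return false;
}
```

The benchmark rows that follow are consistent with this shape: the gain shows up for the small block sizes, while larger sizes go through the stub and stay close to the baseline.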
>>
>> Performance Results:
>> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>> ArrayCopyPartialInlineSize : 32
>>
>> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
>> -- | -- | -- | -- | --
>> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
>> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
>> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
>> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
>> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
>> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
>> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
>> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
>> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
>> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
>> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
>> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
>> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
>> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
>> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
>> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
>> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
>> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
>> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
>> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
>> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
>> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
>> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
>> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
>> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
>> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
>> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
>> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
>> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
>> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
>> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
>> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
>> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
>> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
>> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
>> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
>> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
>> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
>> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
>> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
>> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
>> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
>> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
>> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
>> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
>> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
>> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
>> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
>> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
>> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
>> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
>> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
>> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
>> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
>> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
>> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
>> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
>> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
>> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
>> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
>> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
>> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
>> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
>> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
>> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
>> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
>> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
>> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
>> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
>> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
>> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
>> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
>> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
>> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
>> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
>> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
>> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
>> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>>
>> Detailed Reports:
>> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) >>
WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - Review comments resolved > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Merge remote-tracking branch 'upstream' into JDK-8252848 > - JDK-8252848 : Review comments resolved > - JDK-8252848: Review comments resolution. > - JDK-8252848: Review comments addressed. > - Merge remote-tracking branch 'origin' into JDK-8252848 > - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - ... and 6 more: https://git.openjdk.java.net/jdk/compare/1d3d64f3...edb74db3 Changes requested by kvn (Reviewer). src/hotspot/share/opto/macroArrayCopy.cpp line 215: > 213: const_len = lty->get_con() << shift; > 214: } else if ((lty = _igvn.type(length)->isa_int()) && lty->is_con()) { > 215: const_len = lty->get_con() << shift; isa_int() may return NULL (the common case if the input is TOP). I suggest refactoring this code to check for that. Also, please don't use assignments inside checks. src/hotspot/share/opto/macroArrayCopy.cpp line 207: > 205: Node* orig_mem = *mem; > 206: Node* is_lt64bytes_tp = NULL; > 207: Node* is_lt64bytes_fp = NULL; It is difficult to distinguish between _tp and _fp. Please use whole words: false, true.
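Editorial sketch of the refactoring requested in the first comment above: keep the assignment out of the condition and test the `isa_int()` result for null on its own line. The `Type` and `TypeInt` classes below are small hypothetical stand-ins, not the HotSpot declarations, and `constant_length` is an invented helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for C2's type classes.
struct TypeInt;

struct Type {
  virtual ~Type() = default;
  // Returns nullptr when this is not an int type (for example TOP).
  virtual const TypeInt* isa_int() const { return nullptr; }
};

struct TypeInt : Type {
  int32_t lo;
  int32_t hi;
  TypeInt(int32_t l, int32_t h) : lo(l), hi(h) {}
  const TypeInt* isa_int() const override { return this; }
  bool is_con() const { return lo == hi; }
  int32_t get_con() const { return lo; }
};

// Returns the constant length scaled by 2^shift, or nothing if the
// length is not a compile-time constant int.
inline std::optional<int64_t> constant_length(const Type* t, int shift) {
  assert(t != nullptr && shift >= 0 && shift < 31);
  const TypeInt* lty = t->isa_int();  // assignment stays out of the check
  if (lty == nullptr) {
    return std::nullopt;              // e.g. the input was TOP
  }
  if (!lty->is_con()) {
    return std::nullopt;
  }
  return static_cast<int64_t>(lty->get_con()) * (int64_t{1} << shift);
}
```

Separating the steps this way makes the null case an explicit early exit, so a later edit cannot accidentally dereference the result before it has been checked.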
------------- PR: https://git.openjdk.java.net/jdk/pull/302 From iignatyev at openjdk.java.net Tue Nov 17 19:31:16 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 17 Nov 2020 19:31:16 GMT Subject: RFR: 8256430: add linux-x64-optimized to regular testing [v2] In-Reply-To: References: Message-ID: > Hi all, > > > [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? > > Thanks > -- Igor > > cc-ing @dcubed-ojdk Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: build only hotspot for optimized ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1244/files - new: https://git.openjdk.java.net/jdk/pull/1244/files/05c735be..2d861d14 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1244&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1244&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1244.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1244/head:pull/1244 PR: https://git.openjdk.java.net/jdk/pull/1244 From github.com+70893615+jasontatton-aws at openjdk.java.net Tue Nov 17 19:37:14 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Tue, 17 Nov 2020 19:37:14 GMT Subject: RFR: 8033441: print line numbers with -XX:+PrintOptoAssembly Message-ID: Hi all, Here is a patch which adds line number information to the `-XX:+PrintOptoAssembly` cmd line option output. A new unit test is provided for this functionality. Note that the build must have debugging enabled for the test to be relevant and enabled. 
Here is an example output:- before: # Tryme::foo @ bci:3 L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff340089c0 after: # Tryme::foo @ bci:3 (line 12) L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff1008fd80 Testing: I have run tier 1/2 on linux on x86 and aarch64. With a `--enable-debug` build. ------------- Commit messages: - 8033441 Changes: https://git.openjdk.java.net/jdk/pull/1272/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1272&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8033441 Stats: 83 lines in 2 files changed: 81 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1272.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1272/head:pull/1272 PR: https://git.openjdk.java.net/jdk/pull/1272 From erikj at openjdk.java.net Tue Nov 17 19:55:12 2020 From: erikj at openjdk.java.net (Erik Joelsson) Date: Tue, 17 Nov 2020 19:55:12 GMT Subject: RFR: 8256430: add linux-x64-optimized to regular testing [v2] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 19:31:16 GMT, Igor Ignatyev wrote: >> Hi all, >> >> >> [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? >> >> Thanks >> -- Igor >> >> cc-ing @dcubed-ojdk > > Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: > > build only hotspot for optimized Marked as reviewed by erikj (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1244 From iignatyev at openjdk.java.net Tue Nov 17 20:04:08 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 17 Nov 2020 20:04:08 GMT Subject: RFR: 8256430: add linux-x64-optimized to regular testing [v2] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 19:52:49 GMT, Erik Joelsson wrote: >> Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: >> >> build only hotspot for optimized > > Marked as reviewed by erikj (Reviewer). folks, thanks for your reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/1244 From iignatyev at openjdk.java.net Tue Nov 17 20:04:09 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 17 Nov 2020 20:04:09 GMT Subject: Integrated: 8256430: add linux-x64-optimized to regular testing In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 00:31:24 GMT, Igor Ignatyev wrote: > Hi all, > > > [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? > > Thanks > -- Igor > > cc-ing @dcubed-ojdk This pull request has now been integrated. 
Changeset: d9dbd5de Author: Igor Ignatyev URL: https://git.openjdk.java.net/jdk/commit/d9dbd5de Stats: 13 lines in 1 file changed: 13 ins; 0 del; 0 mod 8256430: add linux-x64-optimized to regular testing Reviewed-by: kvn, dcubed, vlivanov, erikj ------------- PR: https://git.openjdk.java.net/jdk/pull/1244 From xliu at openjdk.java.net Tue Nov 17 20:06:13 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 17 Nov 2020 20:06:13 GMT Subject: RFR: 8033441: print line numbers with -XX:+PrintOptoAssembly In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 19:31:25 GMT, Jason Tatton wrote: > Hi all, > > Here is a patch which adds line number information to the `-XX:+PrintOptoAssembly` cmd line option output. > > A new unit test is provided for this functionality. Note that the build must have debugging enabled for the test to be relevant and enabled. > > Here is an example output:- > before: > # Tryme::foo @ bci:3 L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff340089c0 > > after: > # Tryme::foo @ bci:3 (line 12) L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff1008fd80 > > Testing: > I have run tier 1/2 on linux on x86 and aarch64. With a `--enable-debug` build. test/hotspot/jtreg/compiler/arguments/TestPrintOptoAssemblyLineNumbers.java line 29: > 27: * @summary Test to ensure that line numbers are now present with the -XX:+PrintOptoAssembly command line option > 28: * > 29: * @requires vm.debug == true the flag `PrintOptoAssembly` comes from c2. IMHO, you need `@requires vm.compiler2.enabled` instead of vm.debug == true. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1272 From kvn at openjdk.java.net Tue Nov 17 20:24:07 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 20:24:07 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 18:08:58 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Review comments resolved >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Merge remote-tracking branch 'upstream' into JDK-8252848 >> - JDK-8252848 : Review comments resolved >> - JDK-8252848: Review comments resolution. >> - JDK-8252848: Review comments addressed. >> - Merge remote-tracking branch 'origin' into JDK-8252848 >> - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - ... and 6 more: https://git.openjdk.java.net/jdk/compare/1d3d64f3...edb74db3 > > Changes requested by kvn (Reviewer). 
I ran tier1-tier4 with latest changes and got failures in TestArrayCopyDisjoint.java and TestArrayCopyConjoint.java tests: java -XX:-TieredCompilation -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:ArrayCopyPartialInlineSize=64 -XX:MaxVectorSize=64 compiler.arraycopy.TestArrayCopyDisjoint Test Byte constant length 45 [class [B] Result mismtach at i = 13 expected = 57 actual = 70 fromPos = 1324 toPos = 1353 java.lang.Error: Fail at compiler.arraycopy.TestArrayCopyDisjoint.validate(TestArrayCopyDisjoint.java:95) at compiler.arraycopy.TestArrayCopyDisjoint.testByte_constant_LT64B(TestArrayCopyDisjoint.java:162) at compiler.arraycopy.TestArrayCopyDisjoint.main(TestArrayCopyDisjoint.java:207) java -XX:-TieredCompilation -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:ArrayCopyPartialInlineSize=64 -XX:MaxVectorSize=64 compiler.arraycopy.TestArrayCopyConjoint elapsed time (seconds): 7.464 Test Byte constant length 45 [class [B] Result mismtach at i = 13 expected = 109 actual = 111 fromPos = 1120 toPos = 1122 java.lang.Error: Fail at compiler.arraycopy.TestArrayCopyConjoint.validate(TestArrayCopyConjoint.java:124) at compiler.arraycopy.TestArrayCopyConjoint.testByte_constant_LT64B(TestArrayCopyConjoint.java:192) at compiler.arraycopy.TestArrayCopyConjoint.main(TestArrayCopyConjoint.java:240) ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From github.com+58006833+xbzhang99 at openjdk.java.net Tue Nov 17 20:49:07 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Tue, 17 Nov 2020 20:49:07 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v6] In-Reply-To: <7jUr9izZQ1h44rV3OCRR-16J2bN3eAJKN559LIvjb1M=.48f7411b-3306-4c71-92e6-208fa18da860@github.com> References: <-FdDOTDgxvTFXLBjs4ZCO3xHA5NM0yLXaPyMgrebbQA=.f533c3f1-0199-40b0-b507-104d7997c7cf@github.com> 
<7jUr9izZQ1h44rV3OCRR-16J2bN3eAJKN559LIvjb1M=.48f7411b-3306-4c71-92e6-208fa18da860@github.com> Message-ID: On Tue, 17 Nov 2020 02:21:49 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/macroAssembler_x86_exp.cpp line 496: >> >>> 494: movl(Address(rsp, 64), tmp); >>> 495: lea(tmp, ExternalAddress(static_const_table)); >>> 496: movsd(xmm0, Address(rsp, 128)); >> >> Can you explain this change? > > Would be nice to add comment about what values are on stack. movdqu moves 128 bits from memory, while movsd moves 64 bits. movsd is what is needed for a double-precision calculation. In this case, however, no harm was done by using movdqu, as the subsequent vunpcklpd would broadcast only the lower 64 bits. Still, it is safe to change to movsd to begin with. An example of the stack contents is c05ec00000000000; movdqu would move 0x00000000e719ee40c05ec00000000000 ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Tue Nov 17 20:49:10 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Tue, 17 Nov 2020 20:49:10 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v6] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 02:05:14 GMT, Joe Darcy wrote: >> Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large > > test/jdk/java/lang/Math/ExpCornerCaseTests.java line 2: > >> 1: /* >> 2: * Copyright (c) 2011,2020 Oracle and/or its affiliates. All rights reserved. > > The year of the copyright syntax is "2011, 2020,"; no need for a re-review before pushing after that correction. Thanks.
changed ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Tue Nov 17 21:04:22 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Tue, 17 Nov 2020 21:04:22 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v7] In-Reply-To: References: Message-ID: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platforms. The cause was that some jump instructions used jge instead of jae. Also changed movdqu to movsd, as the instruction is supposed to load a 64-bit number Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: fixed copyright syntax ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/894/files - new: https://git.openjdk.java.net/jdk/pull/894/files/704dfff2..5932c732 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From kvn at openjdk.java.net Tue Nov 17 21:14:07 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 21:14:07 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v7] In-Reply-To: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> References: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> Message-ID: <_MwgvrmtrJQFg549T6F9pgFPdiSHv0Sp44-bv1DPWfA=.749be78c-3551-4bbd-ba6f-3bee0dd81f4a@github.com> On Tue, 17 Nov 2020 21:04:22 GMT, Xubo Zhang wrote: >> Math.exp(10000) produces
0 instead of positive infinity on x86 32-bit platforms. The cause was that some jump instructions used jge instead of jae. Also changed movdqu to movsd, as the instruction is supposed to load a 64-bit number > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > fixed copyright syntax Okay ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/894 From sviswanathan at openjdk.java.net Tue Nov 17 21:42:06 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 17 Nov 2020 21:42:06 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v7] In-Reply-To: <_MwgvrmtrJQFg549T6F9pgFPdiSHv0Sp44-bv1DPWfA=.749be78c-3551-4bbd-ba6f-3bee0dd81f4a@github.com> References: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> <_MwgvrmtrJQFg549T6F9pgFPdiSHv0Sp44-bv1DPWfA=.749be78c-3551-4bbd-ba6f-3bee0dd81f4a@github.com> Message-ID: On Tue, 17 Nov 2020 21:10:49 GMT, Vladimir Kozlov wrote: >> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed copyright syntax > > Okay Hi Vladimir, Please let me know the next steps on this. It looks like running the tests needs approval. Xubo is a first-time contributor.
Best Regards, Sandhya ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From kvn at openjdk.java.net Tue Nov 17 23:25:07 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 23:25:07 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v7] In-Reply-To: References: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> <_MwgvrmtrJQFg549T6F9pgFPdiSHv0Sp44-bv1DPWfA=.749be78c-3551-4bbd-ba6f-3bee0dd81f4a@github.com> Message-ID: On Tue, 17 Nov 2020 21:39:10 GMT, Sandhya Viswanathan wrote: >> Okay > > Hi Vladimir, > Please let me know the next steps on this. Looks like running tests need approval. > Xubo is a first time contributor. > Best Regards, > Sandhya I asked Skara team question about testing approval. Meanwhile I submitted our internal tier1 testing. Note, we test only 64-bit VM. You need to run tests with 32-bit VM to verify fix. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From kvn at openjdk.java.net Wed Nov 18 00:09:06 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 18 Nov 2020 00:09:06 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 20:21:04 GMT, Vladimir Kozlov wrote: >> Changes requested by kvn (Reviewer). 
> > I ran tier1-tier4 with latest changes and got failures in TestArrayCopyDisjoint.java and TestArrayCopyConjoint.java tests: > java -XX:-TieredCompilation -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:ArrayCopyPartialInlineSize=64 -XX:MaxVectorSize=64 compiler.arraycopy.TestArrayCopyDisjoint > Test Byte constant length 45 [class [B] Result mismtach at i = 13 expected = 57 actual = 70 fromPos = 1324 toPos = 1353 > java.lang.Error: Fail > at compiler.arraycopy.TestArrayCopyDisjoint.validate(TestArrayCopyDisjoint.java:95) > at compiler.arraycopy.TestArrayCopyDisjoint.testByte_constant_LT64B(TestArrayCopyDisjoint.java:162) > at compiler.arraycopy.TestArrayCopyDisjoint.main(TestArrayCopyDisjoint.java:207) > > java -XX:-TieredCompilation -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:ArrayCopyPartialInlineSize=64 -XX:MaxVectorSize=64 compiler.arraycopy.TestArrayCopyConjoint > > elapsed time (seconds): 7.464 > Test Byte constant length 45 [class [B] Result mismtach at i = 13 expected = 109 actual = 111 fromPos = 1120 toPos = 1122 > java.lang.Error: Fail > at compiler.arraycopy.TestArrayCopyConjoint.validate(TestArrayCopyConjoint.java:124) > at compiler.arraycopy.TestArrayCopyConjoint.testByte_constant_LT64B(TestArrayCopyConjoint.java:192) > at compiler.arraycopy.TestArrayCopyConjoint.main(TestArrayCopyConjoint.java:240) Forgot to say that failure was on Windows with only avx512f, avx512cd. 
------------- PR: https://git.openjdk.java.net/jdk/pull/302 From dongbohe at openjdk.java.net Wed Nov 18 01:13:06 2020 From: dongbohe at openjdk.java.net (Dongbo He) Date: Wed, 18 Nov 2020 01:13:06 GMT Subject: RFR: 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on [v4] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 10:47:43 GMT, Vladimir Ivanov wrote: >> Dongbo He has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplified code > > Looks good. > We shouldn't do "backports" from project specific issues to mainline. Please file a separate issue for this. Thanks! thank you! I have created a new issue https://bugs.openjdk.java.net/browse/JDK-8255448 ------------- PR: https://git.openjdk.java.net/jdk/pull/853 From dongbohe at openjdk.java.net Wed Nov 18 01:13:07 2020 From: dongbohe at openjdk.java.net (Dongbo He) Date: Wed, 18 Nov 2020 01:13:07 GMT Subject: Integrated: 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on In-Reply-To: References: Message-ID: On Sun, 25 Oct 2020 06:43:16 GMT, Dongbo He wrote: > 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on This pull request has now been integrated. 
Changeset: ef3ddb1d Author: Dongbo He Committer: Fei Yang URL: https://git.openjdk.java.net/jdk/commit/ef3ddb1d Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod 8255448: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on Co-authored-by: Huang Wang Reviewed-by: vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/853 From kvn at openjdk.java.net Wed Nov 18 01:32:03 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 18 Nov 2020 01:32:03 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v7] In-Reply-To: References: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> <_MwgvrmtrJQFg549T6F9pgFPdiSHv0Sp44-bv1DPWfA=.749be78c-3551-4bbd-ba6f-3bee0dd81f4a@github.com> Message-ID: On Tue, 17 Nov 2020 23:22:46 GMT, Vladimir Kozlov wrote: >> Hi Vladimir, >> Please let me know the next steps on this. Looks like running tests need approval. >> Xubo is a first time contributor. >> Best Regards, >> Sandhya > > I asked Skara team question about testing approval. Meanwhile I submitted our internal tier1 testing. > Note, we test only 64-bit VM. You need to run tests with 32-bit VM to verify fix. My testing passed. But it did not test 32-bit as I said. 
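For readers following the 8255368 thread, here is a minimal Java sketch of the two things the fix description mentions: the expected overflow result of Math.exp(10000), and the general difference between a signed comparison (what a jge-style branch tests) and an unsigned one (what jae tests). The class and method names are hypothetical, and the operands compared in the actual x86_32 stub are not shown in this thread, so the bit pattern below is only an illustration.

```java
public class ExpOverflowSketch {
    // Signed "greater or equal", analogous to the condition a jge branch tests.
    public static boolean signedAtLeast(int a, int b) {
        return a >= b;
    }

    // Unsigned "greater or equal", analogous to the condition a jae branch tests.
    public static boolean unsignedAtLeast(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0;
    }

    // e^10000 is far beyond the largest finite double, so the result
    // should overflow to positive infinity.
    public static boolean overflowsToInfinity() {
        return Math.exp(10000) == Double.POSITIVE_INFINITY;
    }

    public static void main(String[] args) {
        // Illustrative bit pattern with the high bit set (an assumption,
        // not taken from the stub): negative when read as signed,
        // large when read as unsigned.
        int highBitSet = 0x80000000;
        System.out.println("signed   >= 1: " + signedAtLeast(highBitSet, 1));
        System.out.println("unsigned >= 1: " + unsignedAtLeast(highBitSet, 1));
        System.out.println("exp(10000) is +Infinity: " + overflowsToInfinity());
    }
}
```

As noted above, a check like this only says something about the 32-bit stub when it is run on a 32-bit VM.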
------------- PR: https://git.openjdk.java.net/jdk/pull/894 From sviswanathan at openjdk.java.net Wed Nov 18 01:36:06 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 18 Nov 2020 01:36:06 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v7] In-Reply-To: References: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> <_MwgvrmtrJQFg549T6F9pgFPdiSHv0Sp44-bv1DPWfA=.749be78c-3551-4bbd-ba6f-3bee0dd81f4a@github.com> Message-ID: <3HWl7aNxx6xI8qsAXl2C6txeBewZtnVjKr5fq-AtDLc=.51cf5ea4-6c48-45df-a855-bab2f115997f@github.com> On Wed, 18 Nov 2020 01:29:13 GMT, Vladimir Kozlov wrote: >> I asked Skara team question about testing approval. Meanwhile I submitted our internal tier1 testing. >> Note, we test only 64-bit VM. You need to run tests with 32-bit VM to verify fix. > > My testing passed. But it did not test 32-bit as I said. Thanks a lot. I am doing the 32-bit tier1 testing. I will integrate the patch once the testing completes. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From xliu at openjdk.java.net Wed Nov 18 02:11:07 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 18 Nov 2020 02:11:07 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah In-Reply-To: <7xRg4xNeDVIQI8KOYK_-7nnwBGBZSKaxu3KL-rpECHs=.bce2af90-9de9-4908-821d-30691b610695@github.com> References: <5gcJ-y5dzLhf1AAx2Hl4I5Lo3EBbu6zvdia5d_qzuAU=.c0fcfef9-6429-4481-bb7d-9a43227476ef@github.com> <7xRg4xNeDVIQI8KOYK_-7nnwBGBZSKaxu3KL-rpECHs=.bce2af90-9de9-4908-821d-30691b610695@github.com> Message-ID: On Tue, 17 Nov 2020 13:49:34 GMT, Roland Westrelin wrote: >>> Can we have an assertion to make this loop invariant more prominent? >>> `I think the cleaner fix is to preserve the invariant that the never branch is always right after the loop head in an infinite loop. ` >> >> Thanks for the suggestion. 
Actually I reworked the fix so it has nothing to do with the previous one. > > I pushed a new fix that's different from the previous one. I realized that NeverBranchNode::Ideal() should remove the NeverBranch once the barrier is expanded but we don't run igvn after barrier expansion which is a mistake. yes, looks good to me. Happy to see a more elegant way to solve this problem. ------------- PR: https://git.openjdk.java.net/jdk/pull/1073 From sviswanathan at openjdk.java.net Wed Nov 18 04:37:04 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 18 Nov 2020 04:37:04 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms In-Reply-To: References: Message-ID: <-6l9LeNyjb28a9nsoZ5VclAw2JfIQNePlgUXm32tXd8=.e894b00e-aa24-4088-bcf6-6ae913d3da8c@github.com> On Wed, 28 Oct 2020 04:32:41 GMT, Xubo Zhang wrote: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number @xbzhang99 32-bit tier1 testing passed. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Wed Nov 18 04:52:07 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Wed, 18 Nov 2020 04:52:07 GMT Subject: Integrated: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 04:32:41 GMT, Xubo Zhang wrote: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number This pull request has now been integrated. 
Changeset: c0892148 Author: Xubo Zhang Committer: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/c0892148 Stats: 72 lines in 2 files changed: 64 ins; 0 del; 8 mod 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms Reviewed-by: darcy, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From thartmann at openjdk.java.net Wed Nov 18 08:05:08 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 18 Nov 2020 08:05:08 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah [v4] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 12:38:23 GMT, Roland Westrelin wrote: >> This is a Shenandoah bug but the proposed fix is in shared code. >> >> In an infinite loop, a barrier is located right after the loop head >> and above the never branch. When the barrier is expanded, control flow >> is added between the loop and the never branch. During loop >> verification the assert fires because it doesn't expect any control >> flow between the never branch and the loop head. >> >> While it would have been nice to fix this Shenandoah issue in >> Shenandoah code, I think the cleaner fix is to preserve the invariant >> that the never branch is always right after the loop head in an >> infinite loop. In the proposed patch, this is achieved by moving all >> uses of the loop head to the never branch when it's constructed. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - fix > - test The reworked fix looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1073 From rkennke at openjdk.java.net Wed Nov 18 08:59:09 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 18 Nov 2020 08:59:09 GMT Subject: RFR: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah [v4] In-Reply-To: References: Message-ID: <0Z4Kw5n0Db87jhFM59XmeOMEOcIY6PO2WzqI6Vp-Tlg=.4651e1c6-5b2f-4135-9250-f12c6e65f7bf@github.com> On Tue, 17 Nov 2020 12:38:23 GMT, Roland Westrelin wrote: >> This is a Shenandoah bug but the proposed fix is in shared code. >> >> In an infinite loop, a barrier is located right after the loop head >> and above the never branch. When the barrier is expanded, control flow >> is added between the loop and the never branch. During loop >> verification the assert fires because it doesn't expect any control >> flow between the never branch and the loop head. >> >> While it would have been nice to fix this Shenandoah issue in >> Shenandoah code, I think the cleaner fix is to preserve the invariant >> that the never branch is always right after the loop head in an >> infinite loop. In the proposed patch, this is achieved by moving all >> uses of the loop head to the never branch when it's constructed. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - fix > - test Looks good to me! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1073 From roland at openjdk.java.net Wed Nov 18 09:28:04 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 18 Nov 2020 09:28:04 GMT Subject: Integrated: 8255936: "parsing found no loops but there are some" assertion failure with Shenandoah In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 08:44:03 GMT, Roland Westrelin wrote: > This is a Shenandoah bug but the proposed fix is in shared code. 
>
> In an infinite loop, a barrier is located right after the loop head
> and above the never branch. When the barrier is expanded, control flow
> is added between the loop and the never branch. During loop
> verification the assert fires because it doesn't expect any control
> flow between the never branch and the loop head.
>
> While it would have been nice to fix this Shenandoah issue in
> Shenandoah code, I think the cleaner fix is to preserve the invariant
> that the never branch is always right after the loop head in an
> infinite loop. In the proposed patch, this is achieved by moving all
> uses of the loop head to the never branch when it's constructed.

This pull request has now been integrated.

Changeset: 655bb619
Author:    Roland Westrelin
URL:       https://git.openjdk.java.net/jdk/commit/655bb619
Stats:     3 lines in 2 files changed: 2 ins; 0 del; 1 mod

8255936: "parsing found no loops but there are some" assertion failure with Shenandoah

Reviewed-by: thartmann, rkennke

-------------

PR: https://git.openjdk.java.net/jdk/pull/1073

From xliu at openjdk.java.net Wed Nov 18 10:14:05 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Wed, 18 Nov 2020 10:14:05 GMT
Subject: RFR: 8254807: Optimize startsWith() for String.substring() [v2]
In-Reply-To:
References: <5XrnxBftqMeq-7XmbKlLDjhdBCZVolKCDPc9POdPubs=.f5591f89-4d87-4124-8cee-1f116a14a38f@github.com>
Message-ID:

On Wed, 11 Nov 2020 08:07:20 GMT, Xin Liu wrote:

>> Here is the result of the updated microbenchmark.
>> The DoubleBytes group uses UTF-16 strings; the SingleByte group uses ASCII strings.
>> Without the optimization, the throughput is just 1/4 to 1/5 of that of hand-crafted code.
>>
>> With the optimization, the throughput can reach over 80% of hand-crafted code. It can't reach 100% of the hand-crafted equivalent's throughput because it has to generate stricter boundary checks.
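To make the "stricter boundary checks" concrete, here is a small hand-written Java sketch of the two shapes being compared. It is not the C2 transformation from the PR, only an illustration at the source level; the class and method names are hypothetical. The point is that base.startsWith(prefix, beg) alone is not equivalent to the substring form: the prefix must also fit inside [beg, end), and the substring bounds must still be validated.

```java
public class SubstringStartsWithSketch {
    // Original shape: allocates a temporary String for the substring.
    public static boolean viaSubstring(String base, int beg, int end, String prefix) {
        return base.substring(beg, end).startsWith(prefix);
    }

    // Allocation-free shape. The explicit checks keep the behavior of
    // substring(beg, end): same exception for bad bounds, and the prefix
    // may not extend past 'end'.
    public static boolean viaOffset(String base, int beg, int end, String prefix) {
        if (beg < 0 || end > base.length() || beg > end) {
            throw new StringIndexOutOfBoundsException(
                "begin " + beg + ", end " + end + ", length " + base.length());
        }
        return prefix.length() <= end - beg && base.startsWith(prefix, beg);
    }

    public static void main(String[] args) {
        String base = "hello world";
        // Both forms agree when the prefix fits inside the window.
        System.out.println(viaSubstring(base, 6, 11, "wor") + " " + viaOffset(base, 6, 11, "wor"));
        // The prefix is longer than the window, so both forms must say false,
        // even though base.startsWith("hello", 0) on its own would be true.
        System.out.println(viaSubstring(base, 0, 3, "hello") + " " + viaOffset(base, 0, 3, "hello"));
    }
}
```

The second call in main is the case that a naive rewrite without the length check would get wrong.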
>>
>> // without OptimizeSubstring
>> Benchmark                                                  (substrLength)   Mode  Cnt       Score      Error   Units
>> SubstringStartsWith.substr2StartsWith_doubleBytes                       4  thrpt  125   40847.644 ± 1269.274  ops/ms
>> SubstringStartsWith.substr2StartsWith_doubleBytes                      24  thrpt  125   39284.961 ±  155.109  ops/ms
>> SubstringStartsWith.substr2StartsWith_doubleBytes                     256  thrpt  125   12370.385 ±  622.280  ops/ms
>> SubstringStartsWith.substr2StartsWith_singleByte                        4  thrpt  125   60633.544 ± 1989.512  ops/ms
>> SubstringStartsWith.substr2StartsWith_singleByte                       24  thrpt  125   60613.490 ± 1024.846  ops/ms
>> SubstringStartsWith.substr2StartsWith_singleByte                      256  thrpt  125   30212.033 ± 1106.752  ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes               4  thrpt  125  154666.457 ±    7.132  ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes              24  thrpt  125  154659.583 ±    7.663  ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes             256  thrpt  125  154665.357 ±    6.414  ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_singleByte                4  thrpt  125  162833.972 ±    8.170  ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_singleByte               24  thrpt  125  162834.059 ±    7.862  ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_singleByte              256  thrpt  125  162456.046 ±  217.304  ops/ms
>>
>>
>> // with OptimizeSubstring
>> Benchmark                                                  (substrLength)   Mode  Cnt       Score      Error   Units
>> SubstringStartsWith.substr2StartsWith_doubleBytes                       4  thrpt  125  119789.181 ± 3374.123  ops/ms
>> SubstringStartsWith.substr2StartsWith_doubleBytes                      24  thrpt  125  123740.637 ±   31.982  ops/ms
>> SubstringStartsWith.substr2StartsWith_doubleBytes                     256  thrpt  125  123701.525 ±   68.741  ops/ms
>> SubstringStartsWith.substr2StartsWith_singleByte                        4  thrpt  125  134529.257 ±    6.331  ops/ms
>> SubstringStartsWith.substr2StartsWith_singleByte                       24  thrpt  125  134517.222 ±    6.373  ops/ms
>> SubstringStartsWith.substr2StartsWith_singleByte                      256  thrpt  125  134527.929 ±    4.660  ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes               4  thrpt  125  154668.784 ±    6.990  ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes              24  thrpt  125  154567.971 ±   69.286  ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes             256  thrpt  125  154630.115 ±   36.157  ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_singleByte                4  thrpt  125  162852.894 ±    5.933  ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_singleByte               24  thrpt  125  162850.630 ±    5.990  ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_singleByte              256  thrpt  125  162841.513 ±    6.441  ops/ms
>
> Hello,
>
> May I ping for a review of this patch?
>
> Many business-oriented Java applications manipulate strings a lot. Compared to scalar arithmetic, a string as an oop is more expensive. Furthermore, many string objects need to be allocated on the heap, so they increase the GC workload.
>
> By analyzing the construction of string objects, I found that one source is String.substring(). My target is to eliminate as many substrings as possible. This is my first attempt to enable substring optimization. If it works out, I will apply the API-substitution approach to other APIs such as String.charAt, StringBuilder::append, and even String.split().
>
> Thank you in advance.

Hi Vladimir,
Thank you for taking the time to review it.

I would like to call out that I plan to optimize substring(). My single aim in this PR is to cut off one use of substring. This is a gateway patch. I plan to apply a series of API substitutions for substrings, e.g. StringBuilder::append, String.split and String.charAt.

> _Mailing list message from [Vladimir Ivanov](mailto:vladimir.x.ivanov at oracle.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
>
> Hi Xin,
>
> > the optimization transforms code from s=substring(base, beg, end); s.startsWith(prefix)
> > to substring(base, beg, end) | base.startsWith(prefix, beg).
> > it reduces an use of substring.
> > hopefully c2 optimizer can remove the used substring()
>
> It would be very helpful to see a more elaborate description of intended
> behavior to understand better what is the desired goal of the enhancement.
>
> Though it looks attractive to address the problem in the JIT-compiler,
> it poses some new challenges which makes proposed approach questionable.
> I understand your desire to rely on existing String-related
> optimizations, but coalescing multiple concatenations differs
> significantly from your case.
>
> Some of the concerns/questions I had while briefly looking through the
> patch:
>
> - You introduce a call to a method (String::startsWith(String, int))
> which is not present in bytecode. It means (unless the method is called
> from different places) there won't be any profiling info present, and
> even if there is, it is unrelated to the call site being compiled. If
> the code is recompiled at some point (e.g., it hits an uncommon trap in
> startsWith(String,int) overload), there won't be any re-profiling
> happening. So, it can hit the very same condition on the consequent
> recompilations.
>

Thank you for raising this! I hadn't considered the profiling data before. It is an issue. Maybe it explains why I observe only 80% of the performance of hand-crafted code. TBH, I don't have a clear and solid answer for it yet. The transformation actually won't alter control flow. Perhaps I can find a way to amend the MethodData from the old method. Or is it possible to request reprofiling after I modify some bytecodes?

> - "s.startsWith(prefix)" call node is reused and rewired to call
> unrelated method "base.startsWith(prefix, beg)". Will the new target
> method be taken into account during inlining? I would be much more
> comfortable seeing fresh call populated using standard process
> (involving CallGenerators et al) which then substitutes the original
> node. That way you make it less fragile longer term.
>

I can make that happen.
Actually, I generated a brand-new call generator in an early revision of this patch. I dropped it because it required more code.

> - "hopefully c2 optimizer can remove the used substring()"
> If everything is inlined in the end, it can happen, but it's fragile.
> Instead, you could teach C2 that the method is "pure" (no interesting
> side effects to care about) and cut the call early. It already happens
> for boxing methods (see LateInlineCallGenerator::_is_pure_call for
> details).
>

I did have exactly that logic in an early revision. I marked substring() as a late-inlining candidate and as pure. I deleted it because I found that substring() is "always" inlined and the C2 optimizer can do the dead-code elimination if nobody uses the result. I understand your concern. I can improve it in the future.

> Overall, if you want to keep the enhancement C2-specific, I'd suggest to
> look into intrinsifying String::startsWith(String, int) and not relying
> on its bytecodes at all. That way you would avoid fighting against the
> rest of the JVM in some situations.
>

Because this optimization isn't a one-off thing for startsWith, I can't intrinsify it. I'd like to explore an extensible approach.

> Best regards,
> Vladimir Ivanov

-------------

PR: https://git.openjdk.java.net/jdk/pull/974

From redestad at openjdk.java.net Wed Nov 18 10:35:03 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Wed, 18 Nov 2020 10:35:03 GMT
Subject: RFR: 8256453: C2: Reduce State footprint
In-Reply-To:
References:
Message-ID:

On Tue, 17 Nov 2020 12:37:27 GMT, Nils Eliasson wrote:

>> State objects are pretty heavy-weight. On linux-x64 sizeof(State) = 2344
>>
>> Much of this is due to the two 32-bit int arrays _cost and _rule.
>>
>> The number of rules varies by platform, but seems to hover around 1k. A 16-bit integer seems more than enough to map all rules, so making _rule a uint16_t array saves significant space.
>>
>> Also it seems we can fold the _valid bit vector into _rule and reduce the size even further.
>> This also considerably simplifies the common validity checks, which more than makes up for needing to initialize more memory when setting up the State object.
>>
>> Those two optimizations reduce sizeof(State) down to 1736. Profiling compilations show a tiny win overall and instrumenting some simple compilations I see a sharp decline in Arena::grow events caused by ResourceObj allocation in Matcher::match_tree and Matcher::Label_Root
>
> Looks good.

@neliasso @vnkozlov - thank you for reviewing!

-------------

PR: https://git.openjdk.java.net/jdk/pull/1253

From redestad at openjdk.java.net Wed Nov 18 10:35:04 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Wed, 18 Nov 2020 10:35:04 GMT
Subject: Integrated: 8256453: C2: Reduce State footprint
In-Reply-To:
References:
Message-ID:

On Tue, 17 Nov 2020 09:00:46 GMT, Claes Redestad wrote:

> State objects are pretty heavy-weight. On linux-x64 sizeof(State) = 2344
>
> Much of this is due to the two 32-bit int arrays _cost and _rule.
>
> The number of rules varies by platform, but seems to hover around 1k. A 16-bit integer seems more than enough to map all rules, so making _rule a uint16_t array saves significant space.
>
> Also it seems we can fold the _valid bit vector into _rule and reduce the size even further. This also considerably simplifies the common validity checks, which more than makes up for needing to initialize more memory when setting up the State object.
>
> Those two optimizations reduce sizeof(State) down to 1736. Profiling compilations show a tiny win overall and instrumenting some simple compilations I see a sharp decline in Arena::grow events caused by ResourceObj allocation in Matcher::match_tree and Matcher::Label_Root

This pull request has now been integrated.
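As a rough illustration of the idea described above (16-bit rule slots with the validity flag folded in), here is a small Java model. C2's State is C++ and its actual layout is not shown in this thread; the encoding below, which uses the low bit as the "valid" flag so that a zero-initialized slot reads as invalid, is an assumption made for the sketch, and the class and method names are hypothetical.

```java
public class PackedRuleSketch {
    // One 16-bit slot per operand: (ruleNumber << 1) | validBit.
    // A freshly allocated array is all zeros, so every slot starts invalid.
    private final short[] rule;

    public PackedRuleSketch(int operands) {
        this.rule = new short[operands];
    }

    public void set(int index, int ruleNumber) {
        // With one bit spent on validity, 15 bits remain for the rule number.
        if (ruleNumber < 0 || ruleNumber > 0x7FFF) {
            throw new IllegalArgumentException("rule number out of range: " + ruleNumber);
        }
        rule[index] = (short) ((ruleNumber << 1) | 1);
    }

    // The validity check is a single mask on the slot; no separate bit vector.
    public boolean valid(int index) {
        return (rule[index] & 1) != 0;
    }

    public int ruleAt(int index) {
        // Mask to 16 bits first so a set high bit is not sign-extended.
        return (rule[index] & 0xFFFF) >>> 1;
    }
}
```

With roughly 1k rules per platform, as stated in the description, 15 bits of rule number leave ample headroom in this model.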
Changeset: f7f34474 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/f7f34474 Stats: 111 lines in 3 files changed: 18 ins; 37 del; 56 mod 8256453: C2: Reduce State footprint Reviewed-by: neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1253 From thartmann at openjdk.java.net Wed Nov 18 11:56:04 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 18 Nov 2020 11:56:04 GMT Subject: Integrated: 8256478: C2 compilation fails with assert(t1->isa_long()) failed: Type must be a long In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 15:11:36 GMT, Tobias Hartmann wrote: > The `RotateLeftNode::Ideal` transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) should ignore a TOP `in(1)` input which will then be handled by `RotateLeftNode::Value`. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 5bcf898b Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/5bcf898b Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8256478: C2 compilation fails with assert(t1->isa_long()) failed: Type must be a long Reviewed-by: roland, chagedorn, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1260 From thartmann at openjdk.java.net Wed Nov 18 11:57:08 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 18 Nov 2020 11:57:08 GMT Subject: Integrated: 8256385: C2: fatal error: modified node is not on IGVN._worklist In-Reply-To: References: Message-ID: <6zHA6tNK0ymonU1eCcEJocqf2KxNWSSsS91W1VQKhsI=.4aaff3db-6576-49ca-a12d-26f8337082d2@github.com> On Tue, 17 Nov 2020 08:49:32 GMT, Tobias Hartmann wrote: > `PhaseIdealLoop::find_safepoint` creates a temporary MergeMemNode that is not removed if we bail out from the optimization early (see `return NULL` statements). The fix is to simply add the MergeMem to the worklist to make sure it is always reclaimed by IGVN. 
> > Interestingly this code path was not triggered by any of our tests but only with a test case generated by the Java Fuzzer. I've added a simplified version of that test case. > > Thanks, > Tobias This pull request has now been integrated. Changeset: f504f419 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/f504f419 Stats: 55 lines in 2 files changed: 55 ins; 0 del; 0 mod 8256385: C2: fatal error: modified node is not on IGVN._worklist Reviewed-by: chagedorn, roland ------------- PR: https://git.openjdk.java.net/jdk/pull/1252 From nils.eliasson at oracle.com Wed Nov 18 13:33:45 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 18 Nov 2020 14:33:45 +0100 Subject: Fwd: RFR: 8256552: Let ReplayCompiles set UseDebuggerErgo In-Reply-To: References: Message-ID: Hi, The first mail ended up on the wrong list. I have fixed the labels, but I still need a review. Regards, Nils -------- Forwarded Message -------- Subject: RFR: 8256552: Let ReplayCompiles set UseDebuggerErgo Date: Wed, 18 Nov 2020 12:55:10 GMT From: Nils Eliasson To: hotspot-runtime-dev at openjdk.java.net When replaying compiles a lot of unnecessary threads are started. The new excellent UseDebuggerErgo flag is a perfect match for replay. 
------------- Commit messages: - ReplayCompiles sets UseDebuggerErgo Changes: https://git.openjdk.java.net/jdk/pull/1289/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1289&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256552 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1289.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1289/head:pull/1289 PR: https://git.openjdk.java.net/jdk/pull/1289 From vlivanov at openjdk.java.net Wed Nov 18 13:44:03 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 18 Nov 2020 13:44:03 GMT Subject: RFR: 8256552: Let ReplayCompiles set UseDebuggerErgo In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 12:48:45 GMT, Nils Eliasson wrote: > When replaying compiles a lot of unnecessary threads are started. The new excellent UseDebuggerErgo flag is a perfect match for replay. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1289 From redestad at openjdk.java.net Wed Nov 18 14:26:05 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 18 Nov 2020 14:26:05 GMT Subject: RFR: 8256552: Let ReplayCompiles set UseDebuggerErgo In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 12:48:45 GMT, Nils Eliasson wrote: > When replaying compiles a lot of unnecessary threads are started. The new excellent UseDebuggerErgo flag is a perfect match for replay. Marked as reviewed by redestad (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1289 From shade at openjdk.java.net Wed Nov 18 17:27:59 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 18 Nov 2020 17:27:59 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v2] In-Reply-To: <_6nluorAvosju4tbANDk-v9c4dzfarBXqk7bHNnHm6s=.dc10c8d2-7ef2-4c91-852c-f4f7d2f5caf5@github.com> References: <_6nluorAvosju4tbANDk-v9c4dzfarBXqk7bHNnHm6s=.dc10c8d2-7ef2-4c91-852c-f4f7d2f5caf5@github.com> Message-ID: On Tue, 17 Nov 2020 00:42:20 GMT, Vladimir Kozlov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Indents and comments > > Why to run test code if you know the result will not match? Why not just skip testing? > I thought about using `@requires vm.cpu.features` checks but your check in main() is simpler and more clear. Do you still want me to make changes, @vnkozlov? I can throw `SkippedException` instead. ------------- PR: https://git.openjdk.java.net/jdk/pull/1187 From dcubed at openjdk.java.net Wed Nov 18 17:40:12 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 18 Nov 2020 17:40:12 GMT Subject: RFR: 8256567: ProblemList java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java for Xcomp testing In-Reply-To: <1b6N3fJAqTuO1F78o8eGTl53rQMDwErM4LOmDp6iQCA=.b03e719f-bcde-4be9-afef-1b4533f500d7@github.com> References: <1b6N3fJAqTuO1F78o8eGTl53rQMDwErM4LOmDp6iQCA=.b03e719f-bcde-4be9-afef-1b4533f500d7@github.com> Message-ID: On Wed, 18 Nov 2020 17:33:23 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java for Xcomp testing. @mcimadamore and Vladimir Ivanov, this ProblemListing should interest you folks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1297 From dcubed at openjdk.java.net Wed Nov 18 17:40:11 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 18 Nov 2020 17:40:11 GMT Subject: RFR: 8256567: ProblemList java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java for Xcomp testing Message-ID: <1b6N3fJAqTuO1F78o8eGTl53rQMDwErM4LOmDp6iQCA=.b03e719f-bcde-4be9-afef-1b4533f500d7@github.com> A trivial fix to ProblemList java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java for Xcomp testing. ------------- Commit messages: - 8256567: ProblemList java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java for Xcomp testing Changes: https://git.openjdk.java.net/jdk/pull/1297/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1297&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256567 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1297.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1297/head:pull/1297 PR: https://git.openjdk.java.net/jdk/pull/1297 From mcimadamore at openjdk.java.net Wed Nov 18 17:55:00 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Wed, 18 Nov 2020 17:55:00 GMT Subject: RFR: 8256567: ProblemList java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java for Xcomp testing In-Reply-To: <1b6N3fJAqTuO1F78o8eGTl53rQMDwErM4LOmDp6iQCA=.b03e719f-bcde-4be9-afef-1b4533f500d7@github.com> References: <1b6N3fJAqTuO1F78o8eGTl53rQMDwErM4LOmDp6iQCA=.b03e719f-bcde-4be9-afef-1b4533f500d7@github.com> Message-ID: <3KSak6Y3nJsdiueB19VdzgY-ZKZ0iC2ZPFAC59r8OGE=.1a42f488-8d4c-4249-a61d-ad0ab6396c6e@github.com> On Wed, 18 Nov 2020 17:33:23 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java for Xcomp testing. 
Looks good to me - we discovered some deep issues involving var handles which were only accidentally triggered by the recent Foreign Memory Access push. ------------- Marked as reviewed by mcimadamore (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1297 From dcubed at openjdk.java.net Wed Nov 18 18:06:06 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 18 Nov 2020 18:06:06 GMT Subject: Integrated: 8256567: ProblemList java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java for Xcomp testing In-Reply-To: <1b6N3fJAqTuO1F78o8eGTl53rQMDwErM4LOmDp6iQCA=.b03e719f-bcde-4be9-afef-1b4533f500d7@github.com> References: <1b6N3fJAqTuO1F78o8eGTl53rQMDwErM4LOmDp6iQCA=.b03e719f-bcde-4be9-afef-1b4533f500d7@github.com> Message-ID: On Wed, 18 Nov 2020 17:33:23 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java for Xcomp testing. This pull request has now been integrated. Changeset: c9c15733 Author: Daniel D. 
Daugherty URL: https://git.openjdk.java.net/jdk/commit/c9c15733 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8256567: ProblemList java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java for Xcomp testing Reviewed-by: mcimadamore ------------- PR: https://git.openjdk.java.net/jdk/pull/1297 From dcubed at openjdk.java.net Wed Nov 18 18:06:05 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 18 Nov 2020 18:06:05 GMT Subject: RFR: 8256567: ProblemList java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java for Xcomp testing In-Reply-To: <3KSak6Y3nJsdiueB19VdzgY-ZKZ0iC2ZPFAC59r8OGE=.1a42f488-8d4c-4249-a61d-ad0ab6396c6e@github.com> References: <1b6N3fJAqTuO1F78o8eGTl53rQMDwErM4LOmDp6iQCA=.b03e719f-bcde-4be9-afef-1b4533f500d7@github.com> <3KSak6Y3nJsdiueB19VdzgY-ZKZ0iC2ZPFAC59r8OGE=.1a42f488-8d4c-4249-a61d-ad0ab6396c6e@github.com> Message-ID: On Wed, 18 Nov 2020 17:52:31 GMT, Maurizio Cimadamore wrote: >> A trivial fix to ProblemList java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java for Xcomp testing. > > Looks good to me - we discovered some deep issues involving var handles which were only accidentally triggered by the recent Foreign Memory Access push. @mcimadamore - Thanks for the fast review! ------------- PR: https://git.openjdk.java.net/jdk/pull/1297 From lmesnik at openjdk.java.net Wed Nov 18 18:09:05 2020 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Wed, 18 Nov 2020 18:09:05 GMT Subject: Integrated: 8256418: Jittester make build is broken. 
In-Reply-To: <06x7ziZhkDpKTPn7i_nJrMejqTDlHtSgv5pX9NLLtZ0=.9ebadf28-effd-4c93-8015-9912d2196b51@github.com> References: <06x7ziZhkDpKTPn7i_nJrMejqTDlHtSgv5pX9NLLtZ0=.9ebadf28-effd-4c93-8015-9912d2196b51@github.com> Message-ID: <8YGRtRoa4kP9NThj3zR5JdosV0Wv3If2RMw0N1KiO_I=.3995e496-194a-4d63-9034-5fca79f73397@github.com> On Mon, 16 Nov 2020 20:53:21 GMT, Leonid Mesnik wrote: > The Utils.java in lib depends on NetworkConfiguration.java now. So NetworkConfiguration.java should be added to the list. > > Verified locally. This pull request has now been integrated. Changeset: 300cbaa6 Author: Leonid Mesnik URL: https://git.openjdk.java.net/jdk/commit/300cbaa6 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8256418: Jittester make build is broken. Reviewed-by: iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/1238 From github.com+70893615+jasontatton-aws at openjdk.java.net Wed Nov 18 21:42:23 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Wed, 18 Nov 2020 21:42:23 GMT Subject: RFR: 8033441: print line numbers with -XX:+PrintOptoAssembly [v2] In-Reply-To: References: Message-ID: <88Lw-mWf9G1KjNFQKAffwehUIiP99WG7Ly6y2PTMS9k=.75a73bce-43ef-47f2-b467-64a1a1c68300@github.com> > Hi all, > > Here is a patch which adds line number information to the `-XX:+PrintOptoAssembly` cmd line option output. > > A new unit test is provided for this functionality. Note that the build must have debugging enabled for the test to be relevant and enabled. > > Here is an example output:- > before: > # Tryme::foo @ bci:3 L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff340089c0 > > after: > # Tryme::foo @ bci:3 (line 12) L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff1008fd80 > > Testing: > I have run tier 1/2 on linux on x86 and aarch64. With a `--enable-debug` build. 
Jason Tatton has updated the pull request incrementally with one additional commit since the last revision: refactored linenumber output to print_method_with_lineno method and tweaked unit test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1272/files - new: https://git.openjdk.java.net/jdk/pull/1272/files/8a842e27..468f1cbf Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1272&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1272&range=00-01 Stats: 29 lines in 3 files changed: 13 ins; 13 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1272.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1272/head:pull/1272 PR: https://git.openjdk.java.net/jdk/pull/1272 From github.com+70893615+jasontatton-aws at openjdk.java.net Wed Nov 18 21:42:23 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Wed, 18 Nov 2020 21:42:23 GMT Subject: RFR: 8033441: print line numbers with -XX:+PrintOptoAssembly In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 19:31:25 GMT, Jason Tatton wrote: > Hi all, > > Here is a patch which adds line number information to the `-XX:+PrintOptoAssembly` cmd line option output. > > A new unit test is provided for this functionality. Note that the build must have debugging enabled for the test to be relevant and enabled. > > Here is an example output:- > before: > # Tryme::foo @ bci:3 L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff340089c0 > > after: > # Tryme::foo @ bci:3 (line 12) L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff1008fd80 > > Testing: > I have run tier 1/2 on linux on x86 and aarch64. With a `--enable-debug` build. > Two code snippets are same logic. Maybe it's a good idea to have a private member function. > you can treat this as an optional suggestion. 
> Thanks Xin, I have refactored the code to use a private member function as advised ------------- PR: https://git.openjdk.java.net/jdk/pull/1272 From github.com+70893615+jasontatton-aws at openjdk.java.net Wed Nov 18 21:42:24 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Wed, 18 Nov 2020 21:42:24 GMT Subject: RFR: 8033441: print line numbers with -XX:+PrintOptoAssembly [v2] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 20:03:10 GMT, Xin Liu wrote: >> Jason Tatton has updated the pull request incrementally with one additional commit since the last revision: >> >> refactored linenumber output to print_method_with_lineno method and tweaked unit test > > test/hotspot/jtreg/compiler/arguments/TestPrintOptoAssemblyLineNumbers.java line 29: > >> 27: * @summary Test to ensure that line numbers are now present with the -XX:+PrintOptoAssembly command line option >> 28: * >> 29: * @requires vm.debug == true > > the flag `PrintOptoAssembly` comes from c2. > IMHO, you need `@requires vm.compiler2.enabled` instead of vm.debug == true. Thanks for raising this, I found that without `vm.debug == true` the test will fail if run on a non debug build (as the expected output is only available on the debug build). But I have adjusted the `@requires` to include `vm.compiler2.enabled` ------------- PR: https://git.openjdk.java.net/jdk/pull/1272 From neliasso at openjdk.java.net Wed Nov 18 21:50:01 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 18 Nov 2020 21:50:01 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: References: Message-ID: <7t8UOVLtgcnt5mZpLgqEJGBDefo9eYj1ESU7LRLREPE=.89523f6b-da8b-4733-bd2f-267dbd1254ef@github.com> On Thu, 12 Nov 2020 06:02:34 GMT, Xin Liu wrote: > 8247732: validate user-input intrinsic_ids in ControlIntrinsic In general - I like it a lot. I will take it for a spin and go through the PR. 
A heads up is that this PR will clash a bit with https://github.com/openjdk/jdk/pull/1276 which adds validation of the compile commands. Your change to compilerOracle.*pp seems well contained so there should be no problem to merge them. ------------- PR: https://git.openjdk.java.net/jdk/pull/1179 From xliu at openjdk.java.net Wed Nov 18 23:57:01 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 18 Nov 2020 23:57:01 GMT Subject: RFR: 8033441: print line numbers with -XX:+PrintOptoAssembly In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 21:39:10 GMT, Jason Tatton wrote: >> Hi all, >> >> Here is a patch which adds line number information to the `-XX:+PrintOptoAssembly` cmd line option output. >> >> A new unit test is provided for this functionality. Note that the build must have debugging enabled for the test to be relevant and enabled. >> >> Here is an example output:- >> before: >> # Tryme::foo @ bci:3 L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff340089c0 >> >> after: >> # Tryme::foo @ bci:3 (line 12) L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff1008fd80 >> >> Testing: >> I have run tier 1/2 on linux on x86 and aarch64. With a `--enable-debug` build. > >> Two code snippets are same logic. Maybe it's a good idea to have a private member function. >> you can treat this as an optional suggestion. >> > > Thanks Xin, I have refactored the code to use a private member function as advised We do need vm.debug == true in the test because `PrintOptoAseembly` prints nothing in release build. Your patch looks good to me, but we still need a review to approve it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1272 From xliu at openjdk.java.net Thu Nov 19 00:22:05 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 19 Nov 2020 00:22:05 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <7t8UOVLtgcnt5mZpLgqEJGBDefo9eYj1ESU7LRLREPE=.89523f6b-da8b-4733-bd2f-267dbd1254ef@github.com> References: <7t8UOVLtgcnt5mZpLgqEJGBDefo9eYj1ESU7LRLREPE=.89523f6b-da8b-4733-bd2f-267dbd1254ef@github.com> Message-ID: On Wed, 18 Nov 2020 21:47:40 GMT, Nils Eliasson wrote: >> 8247732: validate user-input intrinsic_ids in ControlIntrinsic > > In general - I like it a lot. I will take it for a spin and go through the PR. > > A heads up is that this PR will clash a bit with https://github.com/openjdk/jdk/pull/1276 which adds validation of the compile commands. Your change to compilerOracle.*pp seems well contained so there should be no problem to merge them. Hi, Nils, Thank you for reviewing the lengthy PR! The major part is to extend the testing framework to include ControlIntrinsic. It may be also useful to test other compiler directives whose arguments are ccstr/ccstrlist. I am watching JDK-8256508. I will update this PR accordingly if it needs. thanks, --lx ------------- PR: https://git.openjdk.java.net/jdk/pull/1179 From jiefu at openjdk.java.net Thu Nov 19 02:32:02 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 19 Nov 2020 02:32:02 GMT Subject: RFR: 8033441: print line numbers with -XX:+PrintOptoAssembly [v2] In-Reply-To: <88Lw-mWf9G1KjNFQKAffwehUIiP99WG7Ly6y2PTMS9k=.75a73bce-43ef-47f2-b467-64a1a1c68300@github.com> References: <88Lw-mWf9G1KjNFQKAffwehUIiP99WG7Ly6y2PTMS9k=.75a73bce-43ef-47f2-b467-64a1a1c68300@github.com> Message-ID: On Wed, 18 Nov 2020 21:42:23 GMT, Jason Tatton wrote: >> Hi all, >> >> Here is a patch which adds line number information to the `-XX:+PrintOptoAssembly` cmd line option output. >> >> A new unit test is provided for this functionality. 
Note that the build must have debugging enabled for the test to be relevant and enabled. >> >> Here is an example output:- >> before: >> # Tryme::foo @ bci:3 L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff340089c0 >> >> after: >> # Tryme::foo @ bci:3 (line 12) L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff1008fd80 >> >> Testing: >> I have run tier 1/2 on linux on x86 and aarch64. With a `--enable-debug` build. > > Jason Tatton has updated the pull request incrementally with one additional commit since the last revision: > > refactored linenumber output to print_method_with_lineno method and tweaked unit test It's helpful for debugging. LGTM Thanks. ------------- Marked as reviewed by jiefu (Committer). PR: https://git.openjdk.java.net/jdk/pull/1272 From dongbo at openjdk.java.net Thu Nov 19 02:46:15 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Thu, 19 Nov 2020 02:46:15 GMT Subject: RFR: 8256375: AArch64: aarch64-asmtest.py may generate undefined register r18 Message-ID: In aarch64-asmtest.py, register `r18` is not generated for all the GeneralRegister instructions, except `LoadStoreOp` and `LoadStorePairOp`. So when adding new instructions to the script before `LoadStoreOp` or `LoadStorePairOp`, assembler smoke test may fail with undefined register `r18`, like: Creating javadoc element list .../jdk/src/hotspot/cpu/aarch64/assembler_aarch64.cpp: In function 'void entry(CodeBuffer*)': .../jdk/src/hotspot/cpu/aarch64/assembler_aarch64.cpp:580:37: error: 'r18' was not declared in this scope; did you mean 'z18'? 580 | __ ldp(r28, r22, Address(__ pre(r18, 16))); // ldp x28, x22, [x18, #16]! | ^~~ | z18 gmake[3]: *** [lib/CompileJvm.gmk:143: .../jdk/build/linux-aarch64-server-fastdebug/hotspot/variant-server/libjvm/objs/assembler_aarch64.o] Error 1 gmake[3]: *** Waiting for unfinished jobs.... Compiling 89 properties into resource bundles for java.desktop This excludes `r18` from the random list of `LoadStoreOp` and `LoadStorePairOp`. 
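The fix described above amounts to drawing registers from a pool that simply does not contain `r18`. The real generator is the Python script `aarch64-asmtest.py`; what follows is only a C++ sketch of the same idea. The names `make_register_pool` and `pick_register`, and the assumed numbering 0..30, are hypothetical and not taken from the patch.

```cpp
#include <array>
#include <cstddef>
#include <random>

// Hypothetical register pool: numbers 0..30 with 18 left out, so a random
// draw can never produce the register name the assembler does not declare.
inline std::array<int, 30> make_register_pool() {
    std::array<int, 30> pool{};
    std::size_t next = 0;
    for (int reg = 0; reg <= 30; ++reg) {
        if (reg == 18) {
            continue;  // skip r18
        }
        pool[next++] = reg;
    }
    return pool;
}

// Hypothetical helper: pick one register number uniformly from the pool.
template <typename Rng>
int pick_register(Rng& rng) {
    static const std::array<int, 30> pool = make_register_pool();
    std::uniform_int_distribution<std::size_t> index(0, pool.size() - 1);
    return pool[index(rng)];
}
```

Filtering the pool once, rather than re-rolling whenever 18 comes up, keeps every draw a single uniform pick.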
------------- Commit messages: - 8256375: AArch64: aarch64-asmtest.py may generate undefined register r18 Changes: https://git.openjdk.java.net/jdk/pull/1304/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1304&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256375 Stats: 32 lines in 2 files changed: 0 ins; 0 del; 32 mod Patch: https://git.openjdk.java.net/jdk/pull/1304.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1304/head:pull/1304 PR: https://git.openjdk.java.net/jdk/pull/1304 From github.com+670087+jrziviani at openjdk.java.net Thu Nov 19 03:48:01 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Thu, 19 Nov 2020 03:48:01 GMT Subject: RFR: 8248191: [PPC64] Replace lxvd2x/stxvd2x with lxvx/stxvx for Power10 In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 13:57:29 GMT, Ziviani wrote: > Doesn't this mean we use a different Byte order on Power 9 than on Power 10 on little endian? Unfortunately, the person is on vacation this week. But I read the docs and I didn't find any change on the byte order. ------------- PR: https://git.openjdk.java.net/jdk/pull/1086 From xgong at openjdk.java.net Thu Nov 19 05:27:08 2020 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Thu, 19 Nov 2020 05:27:08 GMT Subject: RFR: 8256436: AArch64: Fix undefined behavior for signed right shift in assembler Message-ID: Right shift a signed negative value is implementation-defined in C++ (see [1]). It's better to avoid the signed right shift operations, and use the unsigned right shift instead. [1] https://docs.microsoft.com/en-us/cpp/cpp/left-shift-and-right-shift-operators-input-and-output?view=msvc-160&viewFallbackFrom=vs-2019 Tested jtreg langtools:tier1, hotspot:hotspot_all_no_apps and jdk:jdk_core, and all tests pass without new failures. 
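A minimal sketch of the technique described above, not the code from the PR: before C++20, right-shifting a negative signed value is implementation-defined, so the value is converted to an unsigned type first and shifted there. The helper names `logical_shift_right` and `extract_bits` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: shift the bit pattern of a signed 64-bit value right
// as an unsigned quantity, so zeros are always shifted in from the left.
inline uint64_t logical_shift_right(int64_t value, unsigned amount) {
    assert(amount < 64 && "shift amount must be smaller than the bit width");
    return static_cast<uint64_t>(value) >> amount;
}

// Hypothetical helper: extract bits [msb:lsb] of a possibly negative value,
// the kind of operation an instruction encoder performs on immediates.
inline uint64_t extract_bits(int64_t value, unsigned msb, unsigned lsb) {
    assert(lsb <= msb && msb < 64);
    const unsigned width = msb - lsb + 1;
    const uint64_t mask =
        (width == 64) ? ~uint64_t{0} : ((uint64_t{1} << width) - 1);
    return logical_shift_right(value, lsb) & mask;
}
```

For example, `extract_bits(-4, 3, 0)` yields `0xC`, the low four bits of the two's-complement pattern, whichever way the compiler implements `>>` on signed operands.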
-------------

Commit messages:
 - 8256436: AArch64: Fix undefined behavior for signed right shift in assembler

Changes: https://git.openjdk.java.net/jdk/pull/1307/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1307&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8256436
  Stats: 78 lines in 3 files changed: 8 ins; 0 del; 70 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1307.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1307/head:pull/1307

PR: https://git.openjdk.java.net/jdk/pull/1307

From neliasso at openjdk.java.net Thu Nov 19 07:48:01 2020
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Thu, 19 Nov 2020 07:48:01 GMT
Subject: Integrated: 8256552: Let ReplayCompiles set UseDebuggerErgo
In-Reply-To:
References:
Message-ID:

On Wed, 18 Nov 2020 12:48:45 GMT, Nils Eliasson wrote:

> When replaying compiles a lot of unnecessary threads are started. The new excellent UseDebuggerErgo flag is a perfect match for replay.

This pull request has now been integrated.

Changeset: 8e241b52
Author:    Nils Eliasson
URL:       https://git.openjdk.java.net/jdk/commit/8e241b52
Stats:     4 lines in 1 file changed: 4 ins; 0 del; 0 mod

8256552: Let ReplayCompiles set UseDebuggerErgo

Reviewed-by: vlivanov, redestad

-------------

PR: https://git.openjdk.java.net/jdk/pull/1289

From rcastanedalo at openjdk.java.net Thu Nov 19 09:13:17 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Thu, 19 Nov 2020 09:13:17 GMT
Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node")
Message-ID: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com>

Check for nodes missed by `remove_useless_nodes()` only if PhaseRemoveUseless has actually been run. This makes it possible to use `-XX:-UseLoopSafepoints` without crashing trivially, although implicit assumptions in other parts of C2 about the existence of loop safepoints might lead to more subtle failures for more complex methods.

-------------

Commit messages:
 - 6232281: -XX:-UseLoopSafepoints causes assert(v_false,"Parse::remove_useless_nodes missed this node")

Changes: https://git.openjdk.java.net/jdk/pull/1311/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1311&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-6232281
  Stats: 46 lines in 2 files changed: 45 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1311.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1311/head:pull/1311

PR: https://git.openjdk.java.net/jdk/pull/1311

From rcastanedalo at openjdk.java.net Thu Nov 19 09:13:17 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Thu, 19 Nov 2020 09:13:17 GMT
Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node")
In-Reply-To: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com>
References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com>
Message-ID:

On Thu, 19 Nov 2020 09:05:06 GMT, Roberto Castañeda Lozano wrote:

> Check for nodes missed by `remove_useless_nodes()` only if PhaseRemoveUseless has actually been run. This makes it possible to use `-XX:-UseLoopSafepoints` without crashing trivially, although implicit assumptions in other parts of C2 about the existence of loop safepoints might lead to more subtle failures for more complex methods.

Check for nodes missed by remove_useless_nodes() only if PhaseRemoveUseless has
actually been run. This makes it possible to use -XX:-UseLoopSafepoints without
crashing trivially, although implicit assumptions in other parts of C2 about the
existence of loop safepoints might lead to more subtle failures for more complex
methods.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1311

From rcastanedalo at openjdk.java.net Thu Nov 19 09:13:17 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Thu, 19 Nov 2020 09:13:17 GMT
Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node")
In-Reply-To:
References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com>
Message-ID:

On Thu, 19 Nov 2020 09:07:30 GMT, Roberto Castañeda Lozano wrote:

>> Check for nodes missed by `remove_useless_nodes()` only if PhaseRemoveUseless has actually been run. This makes it possible to use `-XX:-UseLoopSafepoints` without crashing trivially, although implicit assumptions in other parts of C2 about the existence of loop safepoints might lead to more subtle failures for more complex methods.
>
> Check for nodes missed by remove_useless_nodes() only if PhaseRemoveUseless has
> actually been run. This makes it possible to use -XX:-UseLoopSafepoints without
> crashing trivially, although implicit assumptions in other parts of C2 about the
> existence of loop safepoints might lead to more subtle failures for more complex
> methods.

Tested on hs-tier{1,2,3}.
------------- PR: https://git.openjdk.java.net/jdk/pull/1311 From aph at openjdk.java.net Thu Nov 19 09:16:05 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 19 Nov 2020 09:16:05 GMT Subject: RFR: 8256375: AArch64: aarch64-asmtest.py may generate undefined register r18 In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 02:40:47 GMT, Dong Bo wrote: > In aarch64-asmtest.py, register `r18` is not generated for all the GeneralRegister instructions, except `LoadStoreOp` and `LoadStorePairOp`. > So when adding new instructions to the script before `LoadStoreOp` or `LoadStorePairOp`, > assembler smoke test may fail with undefined register `r18`, like: > Creating javadoc element list > .../jdk/src/hotspot/cpu/aarch64/assembler_aarch64.cpp: In function 'void entry(CodeBuffer*)': > .../jdk/src/hotspot/cpu/aarch64/assembler_aarch64.cpp:580:37: error: 'r18' was not declared in this scope; did you mean 'z18'? > 580 | __ ldp(r28, r22, Address(__ pre(r18, 16))); // ldp x28, x22, [x18, #16]! > | ^~~ > | z18 > gmake[3]: *** [lib/CompileJvm.gmk:143: .../jdk/build/linux-aarch64-server-fastdebug/hotspot/variant-server/libjvm/objs/assembler_aarch64.o] Error 1 > gmake[3]: *** Waiting for unfinished jobs.... > Compiling 89 properties into resource bundles for java.desktop > > This excludes `r18` from the random list of `LoadStoreOp` and `LoadStorePairOp`. Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1304 From burban at openjdk.java.net Thu Nov 19 09:23:07 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Thu, 19 Nov 2020 09:23:07 GMT Subject: RFR: 8256633: Fix product build on Windows+Arm64 Message-ID: Fix this warning: C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): error C2220: the following warning is treated as an error C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): warning C4390: ';': empty controlled statement found; is this the intent? 
Thanks to @magicus to bring that to my attention. ------------- Commit messages: - 8256633: Fix product build on Windows+Arm64 Changes: https://git.openjdk.java.net/jdk/pull/1312/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1312&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256633 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1312.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1312/head:pull/1312 PR: https://git.openjdk.java.net/jdk/pull/1312 From ihse at openjdk.java.net Thu Nov 19 09:23:07 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Thu, 19 Nov 2020 09:23:07 GMT Subject: RFR: 8256633: Fix product build on Windows+Arm64 In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 09:17:58 GMT, Bernhard Urban-Forster wrote: > Fix this warning: > C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): error C2220: the following warning is treated as an error > C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): warning C4390: ';': empty controlled statement found; is this the intent? > > Thanks to @magicus to bring that to my attention. Looks good and trivial. (And a good reminder why if statements without curly braces seldom are a good thing...) ------------- PR: https://git.openjdk.java.net/jdk/pull/1312 From shade at openjdk.java.net Thu Nov 19 09:33:05 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 19 Nov 2020 09:33:05 GMT Subject: RFR: 8256633: Fix product build on Windows+Arm64 In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 09:17:58 GMT, Bernhard Urban-Forster wrote: > Fix this warning: > C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): error C2220: the following warning is treated as an error > C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): warning C4390: ';': empty controlled statement found; is this the intent? 
>
> Thanks to @magicus to bring that to my attention.

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 509:

> 507:         if (_ext.shift() > 0) {
> 508:           assert(_ext.shift() == (int)size, "bad shift");
> 509:         }

D'oh. So this is semantically the same as:

    assert(_ext.shift() <= 0 || _ext.shift() == (int)size, "bad shift");

...or, if we expect shift to be non-negative:

    assert(_ext.shift() == 0 || _ext.shift() == (int)size, "bad shift");

-------------

PR: https://git.openjdk.java.net/jdk/pull/1312

From neliasso at openjdk.java.net Thu Nov 19 09:48:07 2020
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Thu, 19 Nov 2020 09:48:07 GMT
Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node")
In-Reply-To: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com>
References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com>
Message-ID:

On Thu, 19 Nov 2020 09:05:06 GMT, Roberto Castañeda Lozano wrote:

> Check for nodes missed by `remove_useless_nodes()` only if PhaseRemoveUseless has actually been run. This makes it possible to use `-XX:-UseLoopSafepoints` without crashing trivially, although implicit assumptions in other parts of C2 about the existence of loop safepoints might lead to more subtle failures for more complex methods.

Looks good!

-------------

Marked as reviewed by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1311

From dongbo at openjdk.java.net Thu Nov 19 09:51:05 2020
From: dongbo at openjdk.java.net (Dong Bo)
Date: Thu, 19 Nov 2020 09:51:05 GMT
Subject: RFR: 8256375: AArch64: aarch64-asmtest.py may generate undefined register r18
In-Reply-To:
References:
Message-ID:

On Thu, 19 Nov 2020 09:13:42 GMT, Andrew Haley wrote:

>> In aarch64-asmtest.py, register `r18` is not generated for all the GeneralRegister instructions, except `LoadStoreOp` and `LoadStorePairOp`.
>> So when adding new instructions to the script before `LoadStoreOp` or `LoadStorePairOp`,
>> assembler smoke test may fail with undefined register `r18`, like:
>> Creating javadoc element list
>> .../jdk/src/hotspot/cpu/aarch64/assembler_aarch64.cpp: In function 'void entry(CodeBuffer*)':
>> .../jdk/src/hotspot/cpu/aarch64/assembler_aarch64.cpp:580:37: error: 'r18' was not declared in this scope; did you mean 'z18'?
>> 580 | __ ldp(r28, r22, Address(__ pre(r18, 16))); // ldp x28, x22, [x18, #16]!
>> | ^~~
>> | z18
>> gmake[3]: *** [lib/CompileJvm.gmk:143: .../jdk/build/linux-aarch64-server-fastdebug/hotspot/variant-server/libjvm/objs/assembler_aarch64.o] Error 1
>> gmake[3]: *** Waiting for unfinished jobs....
>> Compiling 89 properties into resource bundles for java.desktop
>>
>> This excludes `r18` from the random list of `LoadStoreOp` and `LoadStorePairOp`.
>
> Marked as reviewed by aph (Reviewer).

Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1304

From dongbo at openjdk.java.net Thu Nov 19 09:51:06 2020
From: dongbo at openjdk.java.net (Dong Bo)
Date: Thu, 19 Nov 2020 09:51:06 GMT
Subject: Integrated: 8256375: AArch64: aarch64-asmtest.py may generate undefined register r18
In-Reply-To:
References:
Message-ID:

On Thu, 19 Nov 2020 02:40:47 GMT, Dong Bo wrote:

> In aarch64-asmtest.py, register `r18` is not generated for all the GeneralRegister instructions, except `LoadStoreOp` and `LoadStorePairOp`.
> So when adding new instructions to the script before `LoadStoreOp` or `LoadStorePairOp`,
> assembler smoke test may fail with undefined register `r18`, like:
> Creating javadoc element list
> .../jdk/src/hotspot/cpu/aarch64/assembler_aarch64.cpp: In function 'void entry(CodeBuffer*)':
> .../jdk/src/hotspot/cpu/aarch64/assembler_aarch64.cpp:580:37: error: 'r18' was not declared in this scope; did you mean 'z18'?
> 580 | __ ldp(r28, r22, Address(__ pre(r18, 16))); // ldp x28, x22, [x18, #16]!
> | ^~~ > | z18 > gmake[3]: *** [lib/CompileJvm.gmk:143: .../jdk/build/linux-aarch64-server-fastdebug/hotspot/variant-server/libjvm/objs/assembler_aarch64.o] Error 1 > gmake[3]: *** Waiting for unfinished jobs.... > Compiling 89 properties into resource bundles for java.desktop > > This excludes `r18` from the random list of `LoadStoreOp` and `LoadStorePairOp`. This pull request has now been integrated. Changeset: 6702910b Author: Dong Bo Committer: Fei Yang URL: https://git.openjdk.java.net/jdk/commit/6702910b Stats: 32 lines in 2 files changed: 0 ins; 0 del; 32 mod 8256375: AArch64: aarch64-asmtest.py may generate undefined register r18 Reviewed-by: aph ------------- PR: https://git.openjdk.java.net/jdk/pull/1304 From eric.c.liu at arm.com Thu Nov 19 09:50:40 2020 From: eric.c.liu at arm.com (Eric Liu) Date: Thu, 19 Nov 2020 09:50:40 +0000 Subject: RFR: 8254872: Optimize Rotate on AArch64 In-Reply-To: References: <2GaTFehVFqrasp9rGmhjCWuGoZuRc9pl3jLki-Pt6_o=.bdccf9be-7c20-4f07-856f-0999b4ecf7e0@github.com>, Message-ID: Hi John, I thought about this a bit more. > From: hotspot-compiler-dev on behalf of John Rose > Sent: 14 November 2020 2:43 > To: Eric Liu > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR: 8254872: Optimize Rotate on AArch64 > ? > On Nov 13, 2020, at 2:39 AM, Eric Liu wrote: > > > > This patch is a supplement for > > https://bugs.openjdk.java.net/browse/JDK-8248830. > > > > With the implementation of rotate node in IR, this patch: > > > > 1. canonicalizes RotateLeft into RotateRight when shift is a constant, > >?? so that GVN could identify the pre-existing node better. > > 2. implements scalar rotate match rules and removes the original > >?? combinations of Or and Shifts on AArch64. > > > > This patch doesn't implement vector rotate due to the lack of > > corresponding vector instructions on AArch64. > > > > Test case below is an explanation for this patch. > > > >??????? public static int test(int i) { > >??????????? int a =? 
(i >>> 29) | (i << -29); > > int b = i << 3; > > int c = i >>> -3; > > int d = b | c; > > return a ^ d; > > } > > Because of shift-count masking, this parses to nodes > equivalent to: > > public static int test(int i) { > int a = (i >>> 29) | (i << 3); > int d = (i << 3) | (i >>> 29); > // not detected: a == d? > int r = a ^ d; > // not detected: r == 0 > return r; > } > > If we were to work a little harder at canonicalizing > commutative expressions in IGVN, we could detect > that a==d. (See AddNode::Ideal.) It's tempting to > pull on this very long string, but it's not clear when > to stop, if not now. > > In this case the better road is to canonicalize both > a and d to the same rotate node. But maybe there's > some benefit in reordering x|y|z and x^y^z when > x and z could combine to a rotate node. (This isn't > your problem!) > > > Before: > > > > lsl w12, w1, #3 > > lsr w10, w1, #29 > > add w11, w10, w12 > > orr w12, w12, w10 > > eor w0, w11, w12 > > > > After: > > > > ror w10, w1, #29 > > eor w0, w10, w10 > > Amazingly, w10^w10 does not GVN to zero! I think this was caused by the lack of Ideal on Xor node. Perhaps we can add some rules for it: 1). x ^ x ==> 0, this can solve the above issue. 2). Const ^ x ==> x ^ Const, so that GVN could replace with the pre-existed node. 3). x ^ y ==> x, if y is constant zero. 4). x ^ y ==> ~x, if y is constant bit mask value. > > Your test appears to rely on that weakness. > I think the weakness should be fixed in a > separate investigation. > > Anyway, none of these remarks reflects > on your patch. > > 
John Thanks, Eric From thartmann at openjdk.java.net Thu Nov 19 10:36:05 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 19 Nov 2020 10:36:05 GMT Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node") In-Reply-To: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> Message-ID: On Thu, 19 Nov 2020 09:05:06 GMT, Roberto Castañeda Lozano wrote: > Check for nodes missed by `remove_useless_nodes()` only if PhaseRemoveUseless has actually been run. This makes it possible to use `-XX:-UseLoopSafepoints` without crashing trivially, although implicit assumptions in other parts of C2 about the existence of loop safepoints might lead to more subtle failures for more complex methods. Looks good to me. test/hotspot/jtreg/compiler/arguments/TestDisableUseLoopSafepoints.java line 32: > 30: * that other parts of C2 assume that this option is enabled and might > 31: * fail in more subtle ways otherwise. > 32: * @run main/othervm -Xcomp -Xbatch -XX:-TieredCompilation `-Xcomp` implies `-Xbatch` test/hotspot/jtreg/compiler/arguments/TestDisableUseLoopSafepoints.java line 31: > 29: * missed this node" message when UseLoopSafepoints is disabled. Note > 30: * that other parts of C2 assume that this option is enabled and might > 31: * fail in more subtle ways otherwise. Please make sure that this test does not trigger any of these other failures when executed in the CI. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1311 From thartmann at openjdk.java.net Thu Nov 19 11:07:05 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 19 Nov 2020 11:07:05 GMT Subject: RFR: 8256073: Improve vector rematerialization support In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 21:46:00 GMT, Vladimir Ivanov wrote: > Having #1131, #1132, and #1134 in place, the only missing piece left to have vector rematerialization fully working is support of non-contiguous vector values in vector rematerialization logic. This patch covers that. > > Current version makes the assumption that vector values are contiguously laid in memory. It's the case for on-stack locations, but for in-register values it's not the case (at least, on x86). Rewritten version doesn't make such assumption for in-register case anymore and processes every vector element independently. > > (Along the way, the refactoring fixes a bug when handling a corner case: the case when a vector instance is scalarized by EA and the primitive array field (VectorPayload.payload) has a constant value (NULL) is erroneously treated as requiring custom rematerialization and it hits an assert.) > > Testing (with other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX={3,2,1,0} on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 Looks reasonable to me. src/hotspot/share/prims/vectorSupport.cpp line 96: > 94: case T_DOUBLE: arr->bool_at_put(index, (*(jlong*)addr) != 0); break; > 95: > 96: default: assert(false, "unsupported: %s", type2name(elem_bt)); Why did you replace `fatal` by `assert`? ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1136 From thartmann at openjdk.java.net Thu Nov 19 11:39:03 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 19 Nov 2020 11:39:03 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v3] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 13:31:06 GMT, Christian Hagedorn wrote: >> The dominance failures start to occur after the fix for [JDK-8249749](https://bugs.openjdk.java.net/browse/JDK-8249749) which enabled the method `SWPointer::scaled_iv_plus_offset` to call itself recursively and walk the graph to match more instead of stopping immediately (no recursion): >> https://github.com/openjdk/jdk/commit/092389e3c91822b1e3f56f203cb7b90e84673f8e#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81L3789-R3812 >> >> We check in `SWPointer::offset_plus_k` if a node is invariant and if so then we choose it as invariant. However, we now have cases in the Renaissance benchmarks where we select an invariant that is pinned to a `CastIINode` between the main and pre loop. An example is shown in the attached image. 5913 SubI is found as an invariant with the improved recursive search enabled by JDK-8249749. The control of 5913 SubI (with `get_ctrl`) is 5298 CastII. The problem is now that we use the invariant 5913 SubI in the pre loop limit check of 5281 CountedLoopEnd (done in `SuperWord::align_initial_loop_index`) because we assume that since the invariant is not part of the main loop, it can float into the pre loop. But this is prevented by 5298 CastII. This results in the dominance assertion failure when checking if the earliest control of 5270 Bool in the pre loop (5297 IfTrue because of 5913 SubI that is used by 5270 Bool) dominates the LCA of 5270 Bool (the pre loop header node). >> >> My suggestion is to improve the invariant check in `SWPointer::offset_plus_k` to also check if the found invariant is not dominated by the pre loop end node. 
Repeated testing of the RenaissanceStressTest has not resulted in any dominance failures anymore. >> ![dominance_failure](https://user-images.githubusercontent.com/17833009/97696669-3752d200-1aa6-11eb-9a42-2e36550e2b8b.png) > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update comments and invariant selection in offset_plus_k Otherwise looks good to me. src/hotspot/share/opto/superword.cpp line 73: > 71: _lpt(NULL), // loop tree node > 72: _lp(NULL), // CountedLoopNode > 73: _pre_loop_head(NULL), // Pre loop CountedLoopNode Is this field really needed? Can't we just retrieve the pre-loop head via `_pre_loop_end->loopnode()`? ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/954 From rcastanedalo at openjdk.java.net Thu Nov 19 12:44:18 2020 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 19 Nov 2020 12:44:18 GMT Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node") [v2] In-Reply-To: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> Message-ID: > Check for nodes missed by `remove_useless_nodes()` only if PhaseRemoveUseless has actually been run. This makes it possible to use `-XX:-UseLoopSafepoints` without crashing trivially, although implicit assumptions in other parts of C2 about the existence of loop safepoints might lead to more subtle failures for more complex methods. 
Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Remove redundant -Xbatch option in test case - Remove confusing comment in test case ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1311/files - new: https://git.openjdk.java.net/jdk/pull/1311/files/5074f818..4129529c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1311&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1311&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1311.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1311/head:pull/1311 PR: https://git.openjdk.java.net/jdk/pull/1311 From rcastanedalo at openjdk.java.net Thu Nov 19 12:44:19 2020 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 19 Nov 2020 12:44:19 GMT Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node") [v2] In-Reply-To: References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> Message-ID: On Thu, 19 Nov 2020 10:30:12 GMT, Tobias Hartmann wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove redundant -Xbatch option in test case >> - Remove confusing comment in test case > > test/hotspot/jtreg/compiler/arguments/TestDisableUseLoopSafepoints.java line 32: > >> 30: * that other parts of C2 assume that this option is enabled and might >> 31: * fail in more subtle ways otherwise. >> 32: * @run main/othervm -Xcomp -Xbatch -XX:-TieredCompilation > > `-Xcomp` implies `-Xbatch` Thanks, updated! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1311 From rcastanedalo at openjdk.java.net Thu Nov 19 12:58:11 2020 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 19 Nov 2020 12:58:11 GMT Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node") [v2] In-Reply-To: References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> Message-ID: On Thu, 19 Nov 2020 10:33:30 GMT, Tobias Hartmann wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove redundant -Xbatch option in test case >> - Remove confusing comment in test case > > Looks good to me. Thanks for reviewing Nils and Tobias! I have updated the test with the feedback from Tobias. > test/hotspot/jtreg/compiler/arguments/TestDisableUseLoopSafepoints.java line 31: > >> 29: * missed this node" message when UseLoopSafepoints is disabled. Note >> 30: * that other parts of C2 assume that this option is enabled and might >> 31: * fail in more subtle ways otherwise. > > Please make sure that this test does not trigger any of these other failures when executed in the CI. The last part of the comment referred to the case `-XX:-UseLoopSafepoints` is used on more complex methods than the tested one. I have just removed it, as it was confusing and probably doing more harm than good. The test succeeds for all CI debug configurations and is correctly skipped for all CI release configurations. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1311 From thartmann at openjdk.java.net Thu Nov 19 13:09:08 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 19 Nov 2020 13:09:08 GMT Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node") [v2] In-Reply-To: References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> Message-ID: On Thu, 19 Nov 2020 12:44:18 GMT, Roberto Castañeda Lozano wrote: >> Check for nodes missed by `remove_useless_nodes()` only if PhaseRemoveUseless has actually been run. This makes it possible to use `-XX:-UseLoopSafepoints` without crashing trivially, although implicit assumptions in other parts of C2 about the existence of loop safepoints might lead to more subtle failures for more complex methods. > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Remove redundant -Xbatch option in test case > - Remove confusing comment in test case Marked as reviewed by thartmann (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1311 From roland at openjdk.java.net Thu Nov 19 13:58:15 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 19 Nov 2020 13:58:15 GMT Subject: RFR: 8256655: rework long counted loop handling Message-ID: Currently the transformation of a long counted loop into a loop nest with an inner int counted loop is performed in 2 steps: 1- recognize the counted loop shape and build the loop nest 2- transform the inner loop into a counted loop I propose changing this to a 3 steps process: 1- recognize the counted loop shape (transform the loop into a LongCountedLoop) 2- build the loop nest 3- transform the inner loop into a counted loop The benefits are: - the logic is cleaner because step 1 and 2 are now separated - Simple optimizations (loop iv type, empty loop elimination, parallel iv) can be implemented for LongCountedLoop by refactoring existing code 1- above is achieved by refactoring the PhaseIdealLoop::is_counted_loop() so it takes an extra parameter (the kind of counted loop, int or long). 2- is the existing loop nest construction logic. But now that it takes a LongCountedLoop as input, the shape of the loop is known to be that of a canonicalized counted loop. As a result, the loop nest construction is simpler. This change also refactors PhiNode::Value() so that it works for both CountedLoop and LongCountedLoop. I also changed ConvL2INode to be a TypeNode (ConvI2LNode is a type node) and: jlong init_p = (jlong)init_t->_lo + stride_con; if (init_p > (jlong)max_jint || init_p > (jlong)limit_t->_hi) return false; // cyclic loop or this loop trips only once to: if (init_t->lo_as_long() > max_signed_integer(iv_bt) - stride_con) { because if the loop has a single iteration transforming it to a CountedLoop should allow the backedge to be optimized out. 
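As a reading aid, the loop nest described above can be pictured at the Java source level roughly as follows. This is only a sketch under stated assumptions, not what C2 actually emits: the real transformation works on the IR, and the class name, method names and the `CHUNK` bound here are hypothetical, chosen purely for illustration. It also assumes `limit - init` is small enough that none of the `long` arithmetic overflows.

```java
// Hypothetical illustration: a long counted loop and an equivalent loop nest
// whose inner loop uses an int induction variable.
class LongLoopNestSketch {
    // Assumed chunk size for the inner int loop; C2 chooses its own bound.
    static final int CHUNK = 1 << 20;

    // A long counted loop: long induction variable with a constant stride of 1.
    static long sumDirect(long init, long limit) {
        long sum = 0;
        for (long i = init; i < limit; i++) {
            sum += i;
        }
        return sum;
    }

    // Same iteration space, split into an outer long loop over chunks and an
    // inner int counted loop that covers at most CHUNK iterations per chunk.
    static long sumNested(long init, long limit) {
        long sum = 0;
        for (long base = init; base < limit; base += CHUNK) {
            // Number of iterations left in this chunk; always fits in an int.
            int inner = (int) Math.min((long) CHUNK, limit - base);
            for (int j = 0; j < inner; j++) {
                sum += base + j; // original induction value is base + j
            }
        }
        return sum;
    }
}
```

Both methods visit the same values of the original induction variable in the same order; only the inner loop of `sumNested` has the `int` counted-loop shape that the existing int-only loop optimizations expect.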
------------- Commit messages: - fix trailing whitespace - fix comment - long counted loop refactoring Changes: https://git.openjdk.java.net/jdk/pull/1316/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1316&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256655 Stats: 967 lines in 25 files changed: 508 ins; 211 del; 248 mod Patch: https://git.openjdk.java.net/jdk/pull/1316.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1316/head:pull/1316 PR: https://git.openjdk.java.net/jdk/pull/1316 From thartmann at openjdk.java.net Thu Nov 19 14:32:08 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 19 Nov 2020 14:32:08 GMT Subject: RFR: 8033441: print line numbers with -XX:+PrintOptoAssembly [v2] In-Reply-To: <88Lw-mWf9G1KjNFQKAffwehUIiP99WG7Ly6y2PTMS9k=.75a73bce-43ef-47f2-b467-64a1a1c68300@github.com> References: <88Lw-mWf9G1KjNFQKAffwehUIiP99WG7Ly6y2PTMS9k=.75a73bce-43ef-47f2-b467-64a1a1c68300@github.com> Message-ID: On Wed, 18 Nov 2020 21:42:23 GMT, Jason Tatton wrote: >> Hi all, >> >> Here is a patch which adds line number information to the `-XX:+PrintOptoAssembly` cmd line option output. >> >> A new unit test is provided for this functionality. Note that the build must have debugging enabled for the test to be relevant and enabled. >> >> Here is an example output:- >> before: >> # Tryme::foo @ bci:3 L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff340089c0 >> >> after: >> # Tryme::foo @ bci:3 (line 12) L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff1008fd80 >> >> Testing: >> I have run tier 1/2 on linux on x86 and aarch64. With a `--enable-debug` build. > > Jason Tatton has updated the pull request incrementally with one additional commit since the last revision: > > refactored linenumber output to print_method_with_lineno method and tweaked unit test I've submitted some testing and `TestPrintOptoAssemblyLineNumbers` fails when being executed with `-XX:CompileThreshold=100`. 
------------- Changes requested by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1272 From chagedorn at openjdk.java.net Thu Nov 19 17:06:20 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 19 Nov 2020 17:06:20 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v4] In-Reply-To: References: Message-ID: > The dominance failures start to occur after the fix for [JDK-8249749](https://bugs.openjdk.java.net/browse/JDK-8249749) which enabled the method `SWPointer::scaled_iv_plus_offset` to call itself recursively and walk the graph to match more instead of stopping immediately (no recursion): > https://github.com/openjdk/jdk/commit/092389e3c91822b1e3f56f203cb7b90e84673f8e#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81L3789-R3812 > > We check in `SWPointer::offset_plus_k` if a node is invariant and if so then we choose it as invariant. However, we now have cases in the Renaissance benchmarks where we select an invariant that is pinned to a `CastIINode` between the main and pre loop. An example is shown in the attached image. 5913 SubI is found as an invariant with the improved recursive search enabled by JDK-8249749. The control of 5913 SubI (with `get_ctrl`) is 5298 CastII. The problem is now that we use the invariant 5913 SubI in the pre loop limit check of 5281 CountedLoopEnd (done in `SuperWord::align_initial_loop_index`) because we assume that since the invariant is not part of the main loop, it can float into the pre loop. But this is prevented by 5298 CastII. This results in the dominance assertion failure when checking if the earliest control of 5270 Bool in the pre loop (5297 IfTrue because of 5913 SubI that is used by 5270 Bool) dominates the LCA of 5270 Bool (the pre loop header node). > > My suggestion is to improve the invariant check in `SWPointer::offset_plus_k` to also check if the found invariant is not dominated by the pre loop end node. 
Repeated testing of the RenaissanceStressTest has not resulted in any dominance failures anymore. > ![dominance_failure](https://user-images.githubusercontent.com/17833009/97696669-3752d200-1aa6-11eb-9a42-2e36550e2b8b.png) Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Remove _pre_loop_head field - Update comments and invariant selection in offset_plus_k - Check dominance with pre loop head instead of tail - 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance ------------- Changes: https://git.openjdk.java.net/jdk/pull/954/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=954&range=03 Stats: 102 lines in 2 files changed: 51 ins; 7 del; 44 mod Patch: https://git.openjdk.java.net/jdk/pull/954.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/954/head:pull/954 PR: https://git.openjdk.java.net/jdk/pull/954 From chagedorn at openjdk.java.net Thu Nov 19 17:06:21 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 19 Nov 2020 17:06:21 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v3] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 11:35:49 GMT, Tobias Hartmann wrote: >> Christian Hagedorn has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > Otherwise looks good to me. @TobiHartmann Thanks for your review! > src/hotspot/share/opto/superword.cpp line 73: > >> 71: _lpt(NULL), // loop tree node >> 72: _lp(NULL), // CountedLoopNode >> 73: _pre_loop_head(NULL), // Pre loop CountedLoopNode > > Is this field really needed? Can't we just retrieve the pre-loop head via `_pre_loop_end->loopnode()`? You're right, it is not really needed. I removed it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/954 From kvn at openjdk.java.net Thu Nov 19 17:49:07 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 19 Nov 2020 17:49:07 GMT Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node") [v2] In-Reply-To: References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> Message-ID: On Thu, 19 Nov 2020 12:44:18 GMT, Roberto Castañeda Lozano wrote: >> Check for nodes missed by `remove_useless_nodes()` only if PhaseRemoveUseless has actually been run. This makes it possible to use `-XX:-UseLoopSafepoints` without crashing trivially, although implicit assumptions in other parts of C2 about the existence of loop safepoints might lead to more subtle failures for more complex methods. > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Remove redundant -Xbatch option in test case > - Remove confusing comment in test case Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1311 From kvn at openjdk.java.net Thu Nov 19 18:02:15 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 19 Nov 2020 18:02:15 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v4] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 17:06:20 GMT, Christian Hagedorn wrote: >> The dominance failures start to occur after the fix for [JDK-8249749](https://bugs.openjdk.java.net/browse/JDK-8249749) which enabled the method `SWPointer::scaled_iv_plus_offset` to call itself recursively and walk the graph to match more instead of stopping immediately (no recursion): >> https://github.com/openjdk/jdk/commit/092389e3c91822b1e3f56f203cb7b90e84673f8e#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81L3789-R3812 >> >> We check in `SWPointer::offset_plus_k` if a node is invariant and if so then we choose it as invariant. However, we now have cases in the Renaissance benchmarks where we select an invariant that is pinned to a `CastIINode` between the main and pre loop. An example is shown in the attached image. 5913 SubI is found as an invariant with the improved recursive search enabled by JDK-8249749. The control of 5913 SubI (with `get_ctrl`) is 5298 CastII. The problem is now that we use the invariant 5913 SubI in the pre loop limit check of 5281 CountedLoopEnd (done in `SuperWord::align_initial_loop_index`) because we assume that since the invariant is not part of the main loop, it can float into the pre loop. But this is prevented by 5298 CastII. This results in the dominance assertion failure when checking if the earliest control of 5270 Bool in the pre loop (5297 IfTrue because of 5913 SubI that is used by 5270 Bool) dominates the LCA of 5270 Bool (the pre loop header node). >> >> My suggestion is to improve the invariant check in `SWPointer::offset_plus_k` to also check if the found invariant is not dominated by the pre loop end node. 
Repeated testing of the RenaissanceStressTest has not resulted in any dominance failures anymore. >> ![dominance_failure](https://user-images.githubusercontent.com/17833009/97696669-3752d200-1aa6-11eb-9a42-2e36550e2b8b.png) > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Remove _pre_loop_head field > - Update comments and invariant selection in offset_plus_k > - Check dominance with pre loop head instead of tail > - 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance Unfortunately with forced push I can't see incremental update. Based on last update comments changes are reasonable. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/954 From kvn at openjdk.java.net Thu Nov 19 18:13:01 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 19 Nov 2020 18:13:01 GMT Subject: RFR: 8256073: Improve vector rematerialization support In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 21:46:00 GMT, Vladimir Ivanov wrote: > Having #1131, #1132, and #1134 in place, the only missing piece left to have vector rematerialization fully working is support of non-contiguous vector values in vector rematerialization logic. This patch covers that. > > Current version makes the assumption that vector values are contiguously laid in memory. It's the case for on-stack locations, but for in-register values it's not the case (at least, on x86). Rewritten version doesn't make such assumption for in-register case anymore and processes every vector element independently. 
> > (Along the way, the refactoring fixes a bug when handling a corner case: the case when a vector instance is scalarized by EA and the primitive array field (VectorPayload.payload) has a constant value (NULL) is erroneously treated as requiring custom rematerialization and it hits an assert.) > > Testing (with other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX={3,2,1,0} on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 How this passed testing with obvious bug in changes? src/hotspot/share/prims/vectorSupport.cpp line 94: > 92: case T_FLOAT: arr->bool_at_put(index, (*(jint*)addr) != 0); break; > 93: case T_LONG: // fall-through > 94: case T_DOUBLE: arr->bool_at_put(index, (*(jlong*)addr) != 0); break; Why you push `bool` value for everything?!!! ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1136 From kvn at openjdk.java.net Thu Nov 19 18:19:10 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 19 Nov 2020 18:19:10 GMT Subject: RFR: 8256073: Improve vector rematerialization support In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 21:46:00 GMT, Vladimir Ivanov wrote: > Having #1131, #1132, and #1134 in place, the only missing piece left to have vector rematerialization fully working is support of non-contiguous vector values in vector rematerialization logic. This patch covers that. > > Current version makes the assumption that vector values are contiguously laid in memory. It's the case for on-stack locations, but for in-register values it's not the case (at least, on x86). Rewritten version doesn't make such assumption for in-register case anymore and processes every vector element independently. 
> > (Along the way, the refactoring fixes a bug when handling a corner case: the case when a vector instance is scalarized by EA and the primitive array field (VectorPayload.payload) has a constant value (NULL) is erroneously treated as requiring custom rematerialization and it hits an assert.) > > Testing (with other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX={3,2,1,0} on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 Changes requested by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1136 From kvn at openjdk.java.net Thu Nov 19 18:19:13 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 19 Nov 2020 18:19:13 GMT Subject: RFR: 8256073: Improve vector rematerialization support In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 18:07:58 GMT, Vladimir Kozlov wrote: >> Having #1131, #1132, and #1134 in place, the only missing piece left to have vector rematerialization fully working is support of non-contiguous vector values in vector rematerialization logic. This patch covers that. >> >> Current version makes the assumption that vector values are contiguously laid in memory. It's the case for on-stack locations, but for in-register values it's not the case (at least, on x86). Rewritten version doesn't make such assumption for in-register case anymore and processes every vector element independently. >> >> (Along the way, the refactoring fixes a bug when handling a corner case: the case when a vector instance is scalarized by EA and the primitive array field (VectorPayload.payload) has a constant value (NULL) is erroneously treated as requiring custom rematerialization and it hits an assert.) 
>> >> Testing (with other relevant patches): >> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX={3,2,1,0} on AVX512-capable hardware >> - [x] hs-precheckin-comp, hs-tier1, hs-tier2 > > src/hotspot/share/prims/vectorSupport.cpp line 94: > >> 92: case T_FLOAT: arr->bool_at_put(index, (*(jint*)addr) != 0); break; >> 93: case T_LONG: // fall-through >> 94: case T_DOUBLE: arr->bool_at_put(index, (*(jlong*)addr) != 0); break; > > Why you push `bool` value for everything?!!! Okay. After reading code more I understand that array is boolean type for case when `is_mask` is `true`. So changes are correct. ------------- PR: https://git.openjdk.java.net/jdk/pull/1136 From kvn at openjdk.java.net Thu Nov 19 18:19:13 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 19 Nov 2020 18:19:13 GMT Subject: RFR: 8256073: Improve vector rematerialization support In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 18:14:36 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/prims/vectorSupport.cpp line 94: >> >>> 92: case T_FLOAT: arr->bool_at_put(index, (*(jint*)addr) != 0); break; >>> 93: case T_LONG: // fall-through >>> 94: case T_DOUBLE: arr->bool_at_put(index, (*(jlong*)addr) != 0); break; >> >> Why you push `bool` value for everything?!!! > > Okay. After reading code more I understand that array is boolean type for case when `is_mask` is `true`. So changes are correct. May be you need separate method to avoid this confusion. The code is separate anyway. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1136 From kvn at openjdk.java.net Thu Nov 19 19:42:06 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 19 Nov 2020 19:42:06 GMT Subject: RFR: 8256655: rework long counted loop handling In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 13:38:20 GMT, Roland Westrelin wrote: > Currently the transformation of a long counted loop into a loop nest > with an inner int counted loop is performed in 2 steps: > > 1- recognize the counted loop shape and build the loop nest > 2- transform the inner loop into a counted loop > > I propose changing this to a 3 steps process: > > 1- recognize the counted loop shape (transform the loop into a LongCountedLoop) > 2- build the loop nest > 3- transform the inner loop into a counted loop > > The benefits are: > > - the logic is cleaner because step 1 and 2 are now separated > - Simple optimizations (loop iv type, empty loop elimination, parallel > iv) can be implemented for LongCountedLoop by refactoring existing > code > > 1- above is achieved by refactoring the > PhaseIdealLoop::is_counted_loop() so it takes an extra parameter (the > kind of counted loop, int or long). > > 2- is the existing loop nest construction logic. But now that it takes > a LongCountedLoop as input, the shape of the loop is known to be that > of a canonicalized counted loop. As a result, the loop nest > construction is simpler. > > This change also refactors PhiNode::Value() so that it works for both > CountedLoop and LongCountedLoop. 
> > I also changed ConvL2INode to be a TypeNode (ConvI2LNode is a type > node) and: > > jlong init_p = (jlong)init_t->_lo + stride_con; > if (init_p > (jlong)max_jint || init_p > (jlong)limit_t->_hi) > return false; // cyclic loop or this loop trips only once > > to: > > if (init_t->lo_as_long() > max_signed_integer(iv_bt) - stride_con) { > > because if the loop has a single iteration transforming it to a > CountedLoop should allow the backedge to be optimized out. Good. src/hotspot/share/opto/cfgnode.cpp line 1097: > 1095: const TypeInteger* lo = phase->type(init)->isa_integer(l->bt()); > 1096: const TypeInteger* hi = phase->type(limit)->isa_integer(l->bt()); > 1097: const TypeInteger* stride_t = phase->type(stride)->isa_integer(l->bt()); Do we have assert somewhere which checks that all loop's values (init, limit, stride) have the same type? ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1316 From xliu at openjdk.java.net Thu Nov 19 21:23:06 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 19 Nov 2020 21:23:06 GMT Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node") [v2] In-Reply-To: References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> Message-ID: On Thu, 19 Nov 2020 17:46:06 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove redundant -Xbatch option in test case >> - Remove confusing comment in test case > > Good. LGTM. Thanks for fixing it.
------------- PR: https://git.openjdk.java.net/jdk/pull/1311 From github.com+70893615+jasontatton-aws at openjdk.java.net Thu Nov 19 21:50:20 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Thu, 19 Nov 2020 21:50:20 GMT Subject: RFR: 8033441: print line numbers with -XX:+PrintOptoAssembly [v3] In-Reply-To: References: Message-ID: > Hi all, > > Here is a patch which adds line number information to the `-XX:+PrintOptoAssembly` cmd line option output. > > A new unit test is provided for this functionality. Note that the build must have debugging enabled for the test to be relevant and enabled. > > Here is an example output:- > before: > # Tryme::foo @ bci:3 L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff340089c0 > > after: > # Tryme::foo @ bci:3 (line 12) L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff1008fd80 > > Testing: > I have run tier 1/2 on linux on x86 and aarch64. With a `--enable-debug` build. Jason Tatton has updated the pull request incrementally with two additional commits since the last revision: - fix whitespace error - adjusted test to check for line number information only if c2 optimizer output is present on console output ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1272/files - new: https://git.openjdk.java.net/jdk/pull/1272/files/468f1cbf..40aa2473 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1272&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1272&range=01-02 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1272.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1272/head:pull/1272 PR: https://git.openjdk.java.net/jdk/pull/1272 From github.com+70893615+jasontatton-aws at openjdk.java.net Thu Nov 19 21:50:20 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Thu, 19 Nov 2020 21:50:20 GMT Subject: RFR: 8033441: print line numbers with -XX:+PrintOptoAssembly [v2] 
In-Reply-To: References: <88Lw-mWf9G1KjNFQKAffwehUIiP99WG7Ly6y2PTMS9k=.75a73bce-43ef-47f2-b467-64a1a1c68300@github.com> Message-ID: On Thu, 19 Nov 2020 14:29:00 GMT, Tobias Hartmann wrote: > I've submitted some testing and TestPrintOptoAssemblyLineNumbers fails when being executed with -XX:CompileThreshold=100. Thanks for raising this. I have adjusted the test case such that it will only check for line number information on c2 optimizer output (from -XX:+PrintOptoAssembly) if that output is present on the console output. This way the test will be effectively skipped in cases where command line options such as `-XX:CompileThreshold=100` are used ------------- PR: https://git.openjdk.java.net/jdk/pull/1272 From iignatyev at openjdk.java.net Fri Nov 20 00:09:11 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 20 Nov 2020 00:09:11 GMT Subject: RFR: 8256569: Add C2 compiler stress flags to CTW Message-ID: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> Hi all, Could you please review this small patch which adds `-XX:+StressLCM`, `-XX:+StressGCM`, and `-XX:+StressIGVN` flags to the CTW test library? these flags have been found instrumental in increasing the probability to find C2 bugs. The patch also specifies `-XX:StressSeed` flag to keep CTW testing deterministic and reproducible. 
testing: * [x] `test/hotspot/jtreg/applications/ctw/modules/java_base.java` on `macos-x64` * [x] `test/hotspot/jtreg/testlibrary_tests/ctw/` on `macos-x64` Cheers, -- Igor ------------- Commit messages: - 8256569: Add C2 compiler stress flags to CTW Changes: https://git.openjdk.java.net/jdk/pull/1332/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1332&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256569 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1332.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1332/head:pull/1332 PR: https://git.openjdk.java.net/jdk/pull/1332 From kvn at openjdk.java.net Fri Nov 20 00:29:02 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 20 Nov 2020 00:29:02 GMT Subject: RFR: 8256569: Add C2 compiler stress flags to CTW In-Reply-To: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> References: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> Message-ID: On Fri, 20 Nov 2020 00:04:31 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small patch which adds `-XX:+StressLCM`, `-XX:+StressGCM`, and `-XX:+StressIGVN` flags to the CTW test library? these flags have been found instrumental in increasing the probability to find C2 bugs. The patch also specifies `-XX:StressSeed` flag to keep CTW testing deterministic and reproducible. > > testing: > * [x] `test/hotspot/jtreg/applications/ctw/modules/java_base.java` on `macos-x64` > * [x] `test/hotspot/jtreg/testlibrary_tests/ctw/` on `macos-x64` > > Cheers, > -- Igor Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1332 From xgong at openjdk.java.net Fri Nov 20 06:42:03 2020 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Fri, 20 Nov 2020 06:42:03 GMT Subject: RFR: 8256436: AArch64: Fix undefined behavior for signed right shift in assembler In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 05:21:39 GMT, Xiaohong Gong wrote: > Right shift a signed negative value is implementation-defined in C++ (see [1]). It's better to avoid the signed right shift operations, > and use the unsigned right shift instead. > > [1] https://docs.microsoft.com/en-us/cpp/cpp/left-shift-and-right-shift-operators-input-and-output?view=msvc-160&viewFallbackFrom=vs-2019 > > Tested jtreg langtools:tier1, hotspot:hotspot_all_no_apps and jdk:jdk_core, and all tests pass without new failures. Hi, could anyone please take a look at this small PR? Thanks very much! ------------- PR: https://git.openjdk.java.net/jdk/pull/1307 From chagedorn at openjdk.java.net Fri Nov 20 07:16:06 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 20 Nov 2020 07:16:06 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v4] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 17:59:27 GMT, Vladimir Kozlov wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Remove _pre_loop_head field >> - Update comments and invariant selection in offset_plus_k >> - Check dominance with pre loop head instead of tail >> - 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance > > Unfortunately with forced push I can't see incremental update. > Based on last update comments changes are reasonable. @vnkozlov Thanks for reviewing again! 
Yes, that's unfortunate, I wasn't aware of that. I will try to do a merge next time. ------------- PR: https://git.openjdk.java.net/jdk/pull/954 From xgong at openjdk.java.net Fri Nov 20 07:19:09 2020 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Fri, 20 Nov 2020 07:19:09 GMT Subject: RFR: 8256614: AArch64: Add SVE backend implementation for integer min/max Message-ID: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> Currently the Arm SVE implementation for integer (byte,short,int,long) vector min/max is missing. This is needed for VectorAPI. We need to add them all to avoid the "bad AD file". ------------- Commit messages: - 8256614: AArch64: Add SVE backend implementation for integer min/max Changes: https://git.openjdk.java.net/jdk/pull/1337/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1337&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256614 Stats: 135 lines in 3 files changed: 76 ins; 31 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/1337.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1337/head:pull/1337 PR: https://git.openjdk.java.net/jdk/pull/1337 From chagedorn at openjdk.java.net Fri Nov 20 07:22:03 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 20 Nov 2020 07:22:03 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v4] In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 07:13:23 GMT, Christian Hagedorn wrote: >> Unfortunately with forced push I can't see incremental update. >> Based on last update comments changes are reasonable. > > @vnkozlov Thanks for reviewing again! Yes, that's unfortunate, I wasn't aware of that. I will try to do a merge next time. This was the new commit: https://github.com/openjdk/jdk/pull/954/commits/bcab62592ecb032b0ff7754576d3a5b6125a82d3, the other commits are identical. 
------------- PR: https://git.openjdk.java.net/jdk/pull/954 From shade at openjdk.java.net Fri Nov 20 07:25:06 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 20 Nov 2020 07:25:06 GMT Subject: RFR: 8256569: Add C2 compiler stress flags to CTW In-Reply-To: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> References: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> Message-ID: On Fri, 20 Nov 2020 00:04:31 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small patch which adds `-XX:+StressLCM`, `-XX:+StressGCM`, and `-XX:+StressIGVN` flags to the CTW test library? these flags have been found instrumental in increasing the probability to find C2 bugs. The patch also specifies `-XX:StressSeed` flag to keep CTW testing deterministic and reproducible. > > testing: > * [x] `test/hotspot/jtreg/applications/ctw/modules/java_base.java` on `macos-x64` > * [x] `test/hotspot/jtreg/testlibrary_tests/ctw/` on `macos-x64` > > Cheers, > -- Igor Looks good to me! ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1332 From thartmann at openjdk.java.net Fri Nov 20 07:58:04 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 20 Nov 2020 07:58:04 GMT Subject: RFR: 8256569: Add C2 compiler stress flags to CTW In-Reply-To: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> References: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> Message-ID: On Fri, 20 Nov 2020 00:04:31 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small patch which adds `-XX:+StressLCM`, `-XX:+StressGCM`, and `-XX:+StressIGVN` flags to the CTW test library? these flags have been found instrumental in increasing the probability to find C2 bugs. 
The patch also specifies `-XX:StressSeed` flag to keep CTW testing deterministic and reproducible. > > testing: > * [x] `test/hotspot/jtreg/applications/ctw/modules/java_base.java` on `macos-x64` > * [x] `test/hotspot/jtreg/testlibrary_tests/ctw/` on `macos-x64` > > Cheers, > -- Igor Looks good! ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1332 From thartmann at openjdk.java.net Fri Nov 20 08:00:04 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 20 Nov 2020 08:00:04 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v4] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 17:06:20 GMT, Christian Hagedorn wrote: >> The dominance failures start to occur after the fix for [JDK-8249749](https://bugs.openjdk.java.net/browse/JDK-8249749) which enabled the method `SWPointer::scaled_iv_plus_offset` to call itself recursively and walk the graph to match more instead of stopping immediately (no recursion): >> https://github.com/openjdk/jdk/commit/092389e3c91822b1e3f56f203cb7b90e84673f8e#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81L3789-R3812 >> >> We check in `SWPointer::offset_plus_k` if a node is invariant and if so then we choose it as invariant. However, we now have cases in the Renaissance benchmarks where we select an invariant that is pinned to a `CastIINode` between the main and pre loop. An example is shown in the attached image. 5913 SubI is found as an invariant with the improved recursive search enabled by JDK-8249749. The control of 5913 SubI (with `get_ctrl`) is 5298 CastII. The problem is now that we use the invariant 5913 SubI in the pre loop limit check of 5281 CountedLoopEnd (done in `SuperWord::align_initial_loop_index`) because we assume that since the invariant is not part of the main loop, it can float into the pre loop. But this is prevented by 5298 CastII. 
This results in the dominance assertion failure when checking if the earliest control of 5270 Bool in the pre loop (5297 IfTrue because of 5913 SubI that is used by 5270 Bool) dominates the LCA of 5270 Bool (the pre loop header node). >> >> My suggestion is to improve the invariant check in `SWPointer::offset_plus_k` to also check if the found invariant is not dominated by the pre loop end node. Repeated testing of the RenaissanceStressTest has not resulted in any dominance failures anymore. >> ![dominance_failure](https://user-images.githubusercontent.com/17833009/97696669-3752d200-1aa6-11eb-9a42-2e36550e2b8b.png) > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Remove _pre_loop_head field > - Update comments and invariant selection in offset_plus_k > - Check dominance with pre loop head instead of tail > - 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance Thanks for making these changes. Looks good to me! ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/954 From thartmann at openjdk.java.net Fri Nov 20 08:17:03 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 20 Nov 2020 08:17:03 GMT Subject: RFR: 8256655: rework long counted loop handling In-Reply-To: References: Message-ID: <3VkYxRjUxrbs1sUDk3P8p04iBK7OU5JrDYdzA7MqRnY=.f985d028-4c0e-4ed5-9fe3-b6ea5c0fe098@github.com> On Thu, 19 Nov 2020 13:38:20 GMT, Roland Westrelin wrote: > Currently the transformation of a long counted loop into a loop nest > with an inner int counted loop is performed in 2 steps: > > 1- recognize the counted loop shape and build the loop nest > 2- transform the inner loop into a counted loop > > I propose changing this to a 3 steps process: > > 1- recognize the counted loop shape (transform the loop into a LongCountedLoop) > 2- build the loop nest > 3- transform the inner loop into a counted loop > > The benefits are: > > - the logic is cleaner because step 1 and 2 are now separated > - Simple optimizations (loop iv type, empty loop elimination, parallel > iv) can be implemented for LongCountedLoop by refactoring existing > code > > 1- above is achieved by refactoring the > PhaseIdealLoop::is_counted_loop() so it takes an extra parameter (the > kind of counted loop, int or long). > > 2- is the existing loop nest construction logic. But now that it takes > a LongCountedLoop as input, the shape of the loop is known to be that > of a canonicalized counted loop. As a result, the loop nest > construction is simpler. > > This change also refactors PhiNode::Value() so that it works for both > CountedLoop and LongCountedLoop. 
> > I also changed ConvL2INode to be a TypeNode (ConvI2LNode is a type > node) and: > > jlong init_p = (jlong)init_t->_lo + stride_con; > if (init_p > (jlong)max_jint || init_p > (jlong)limit_t->_hi) > return false; // cyclic loop or this loop trips only once > > to: > > if (init_t->lo_as_long() > max_signed_integer(iv_bt) - stride_con) { > > because if the loop has a single iteration transforming it to a > CountedLoop should allow the backedge to be optimized out. This fails to build on Windows: ` [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2065: 'ulong': undeclared identifier [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2146: syntax error: missing ')' before identifier 'iters_limit' [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2146: syntax error: missing ';' before identifier 'iters_limit' [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2065: 'ulong': undeclared identifier [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2059: syntax error: ')'` ------------- Changes requested by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1316 From thartmann at openjdk.java.net Fri Nov 20 08:24:12 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 20 Nov 2020 08:24:12 GMT Subject: RFR: 8256719: C1 flags that should have expired are still present Message-ID: <3nTaNaFby0VpF6WEE7pvNwm55g4XeScPjfHGo6wZyts=.3980be88-8ef0-4f9f-a9ab-36d125954db7@github.com> [JDK-8235673](https://bugs.openjdk.java.net/browse/JDK-8235673) removed some flags from C1 and made them C2 only. They expired in JDK 16 and should simply be removed. 
Thanks, Tobias ------------- Commit messages: - 8256719: C1 flags that should have expired are still present Changes: https://git.openjdk.java.net/jdk/pull/1338/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1338&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256719 Stats: 10 lines in 1 file changed: 0 ins; 10 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1338.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1338/head:pull/1338 PR: https://git.openjdk.java.net/jdk/pull/1338 From neliasso at openjdk.java.net Fri Nov 20 08:28:11 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 20 Nov 2020 08:28:11 GMT Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node") [v2] In-Reply-To: References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> Message-ID: On Thu, 19 Nov 2020 12:44:18 GMT, Roberto Castañeda Lozano wrote: >> Check for nodes missed by `remove_useless_nodes()` only if PhaseRemoveUseless has actually been run. This makes it possible to use `-XX:-UseLoopSafepoints` without crashing trivially, although implicit assumptions in other parts of C2 about the existence of loop safepoints might lead to more subtle failures for more complex methods. > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Remove redundant -Xbatch option in test case > - Remove confusing comment in test case Marked as reviewed by neliasso (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/1311 From shade at openjdk.java.net Fri Nov 20 08:29:03 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 20 Nov 2020 08:29:03 GMT Subject: RFR: 8256719: C1 flags that should have expired are still present In-Reply-To: <3nTaNaFby0VpF6WEE7pvNwm55g4XeScPjfHGo6wZyts=.3980be88-8ef0-4f9f-a9ab-36d125954db7@github.com> References: <3nTaNaFby0VpF6WEE7pvNwm55g4XeScPjfHGo6wZyts=.3980be88-8ef0-4f9f-a9ab-36d125954db7@github.com> Message-ID: On Fri, 20 Nov 2020 08:18:38 GMT, Tobias Hartmann wrote: > [JDK-8235673](https://bugs.openjdk.java.net/browse/JDK-8235673) removed some flags from C1 and made them C2 only. They expired in JDK 16 and should simply be removed. > > Thanks, > Tobias Looks fine. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1338 From rcastanedalo at openjdk.java.net Fri Nov 20 08:30:04 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 20 Nov 2020 08:30:04 GMT Subject: RFR: 8256569: Add C2 compiler stress flags to CTW In-Reply-To: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> References: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> Message-ID: <-jKb9GbQqIbHcSJCs5LXLYUSFVs1DZWa818tNANHFBw=.9a8711e1-aa5c-4227-92a8-b34ada4e6ff7@github.com> On Fri, 20 Nov 2020 00:04:31 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small patch which adds `-XX:+StressLCM`, `-XX:+StressGCM`, and `-XX:+StressIGVN` flags to the CTW test library? these flags have been found instrumental in increasing the probability to find C2 bugs. The patch also specifies `-XX:StressSeed` flag to keep CTW testing deterministic and reproducible.
> > testing: > * [x] `test/hotspot/jtreg/applications/ctw/modules/java_base.java` on `macos-x64` > * [x] `test/hotspot/jtreg/testlibrary_tests/ctw/` on `macos-x64` > > Cheers, > -- Igor Thanks for doing this Igor, not a reviewer but looks good to me! ------------- PR: https://git.openjdk.java.net/jdk/pull/1332 From rcastanedalo at openjdk.java.net Fri Nov 20 08:35:04 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 20 Nov 2020 08:35:04 GMT Subject: RFR: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node") [v2] In-Reply-To: References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> Message-ID: On Fri, 20 Nov 2020 08:25:10 GMT, Nils Eliasson wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove redundant -Xbatch option in test case >> - Remove confusing comment in test case > > Marked as reviewed by neliasso (Reviewer). Thanks for reviewing Vladimir and Xin! ------------- PR: https://git.openjdk.java.net/jdk/pull/1311 From neliasso at openjdk.java.net Fri Nov 20 08:35:05 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 20 Nov 2020 08:35:05 GMT Subject: RFR: 8256719: C1 flags that should have expired are still present In-Reply-To: <3nTaNaFby0VpF6WEE7pvNwm55g4XeScPjfHGo6wZyts=.3980be88-8ef0-4f9f-a9ab-36d125954db7@github.com> References: <3nTaNaFby0VpF6WEE7pvNwm55g4XeScPjfHGo6wZyts=.3980be88-8ef0-4f9f-a9ab-36d125954db7@github.com> Message-ID: On Fri, 20 Nov 2020 08:18:38 GMT, Tobias Hartmann wrote: > [JDK-8235673](https://bugs.openjdk.java.net/browse/JDK-8235673) removed some flags from C1 and made them C2 only. They expired in JDK 16 and should simply be removed. > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by neliasso (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1338 From neliasso at openjdk.java.net Fri Nov 20 08:51:06 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 20 Nov 2020 08:51:06 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 06:02:34 GMT, Xin Liu wrote: > 8247732: validate user-input intrinsic_ids in ControlIntrinsic src/hotspot/share/compiler/compilerDirectives.hpp line 198: > 196: if (vmIntrinsics::_none == vmIntrinsics::find_id(*iter)) { > 197: _bad = NEW_C_HEAP_ARRAY(char, strlen(*iter) + 1, mtCompiler); > 198: strncpy(_bad, *iter, strlen(*iter) + 1); This doesn't compile. Using strlen as an argument to strncpy is disallowed. > "warning: 'char* __builtin_strncpy(char*, const char*, long unsigned int)' specified bound depends on the length of the source argument [-Wstringop-overflow=]" Do a min between strlen and the maximum allowed length. Fix this for both uses of the string length (row 197 and 198). 
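The "min between strlen and the maximum allowed length" suggestion above could be sketched roughly as follows. This is a standalone illustration, not the HotSpot fix: `kMaxNameLen` and `copy_bad_name` are hypothetical names, plain `malloc` stands in for `NEW_C_HEAP_ARRAY`, and `memcpy` plus an explicit terminator is swapped in for `strncpy`. The clamped length is computed once and used for both the allocation and the copy, which is the point of "fix this for both uses". Whether a particular GCC version stays silent on this form is not something the sketch verifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical upper bound on an intrinsic name; not a HotSpot constant.
constexpr std::size_t kMaxNameLen = 255;

// Returns a heap-allocated, NUL-terminated copy of at most kMaxNameLen
// characters of name, or nullptr if allocation fails. The caller frees it.
char* copy_bad_name(const char* name) {
  assert(name != nullptr);
  std::size_t len = std::strlen(name);
  if (len > kMaxNameLen) {
    len = kMaxNameLen;  // clamp once, reuse for allocation and copy
  }
  char* copy = static_cast<char*>(std::malloc(len + 1));
  if (copy == nullptr) {
    return nullptr;
  }
  std::memcpy(copy, name, len);
  copy[len] = '\0';  // terminate explicitly; memcpy does not add a NUL
  return copy;
}
```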
------------- PR: https://git.openjdk.java.net/jdk/pull/1179 From roland at openjdk.java.net Fri Nov 20 08:59:21 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 20 Nov 2020 08:59:21 GMT Subject: RFR: 8256655: rework long counted loop handling [v2] In-Reply-To: References: Message-ID: > Currently the transformation of a long counted loop into a loop nest > with an inner int counted loop is performed in 2 steps: > > 1- recognize the counted loop shape and build the loop nest > 2- transform the inner loop into a counted loop > > I propose changing this to a 3 steps process: > > 1- recognize the counted loop shape (transform the loop into a LongCountedLoop) > 2- build the loop nest > 3- transform the inner loop into a counted loop > > The benefits are: > > - the logic is cleaner because step 1 and 2 are now separated > - Simple optimizations (loop iv type, empty loop elimination, parallel > iv) can be implemented for LongCountedLoop by refactoring existing > code > > 1- above is achieved by refactoring the > PhaseIdealLoop::is_counted_loop() so it takes an extra parameter (the > kind of counted loop, int or long). > > 2- is the existing loop nest construction logic. But now that it takes > a LongCountedLoop as input, the shape of the loop is known to be that > of a canonicalized counted loop. As a result, the loop nest > construction is simpler. > > This change also refactors PhiNode::Value() so that it works for both > CountedLoop and LongCountedLoop. > > I also changed ConvL2INode to be a TypeNode (ConvI2LNode is a type > node) and: > > jlong init_p = (jlong)init_t->_lo + stride_con; > if (init_p > (jlong)max_jint || init_p > (jlong)limit_t->_hi) > return false; // cyclic loop or this loop trips only once > > to: > > if (init_t->lo_as_long() > max_signed_integer(iv_bt) - stride_con) { > > because if the loop has a single iteration transforming it to a > CountedLoop should allow the backedge to be optimized out. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: build fixes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1316/files - new: https://git.openjdk.java.net/jdk/pull/1316/files/fa2bedf0..dd6943b6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1316&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1316&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1316.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1316/head:pull/1316 PR: https://git.openjdk.java.net/jdk/pull/1316 From roland at openjdk.java.net Fri Nov 20 08:59:22 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 20 Nov 2020 08:59:22 GMT Subject: RFR: 8256655: rework long counted loop handling [v2] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 19:26:41 GMT, Vladimir Kozlov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> build fixes > > src/hotspot/share/opto/cfgnode.cpp line 1097: > >> 1095: const TypeInteger* lo = phase->type(init)->isa_integer(l->bt()); >> 1096: const TypeInteger* hi = phase->type(limit)->isa_integer(l->bt()); >> 1097: const TypeInteger* stride_t = phase->type(stride)->isa_integer(l->bt()); > > Do we have assert somewhere which checks that all loop's values (init, limit, stride) have the same type? Thanks for reviewing it. > Do we have assert somewhere which checks that all loop's values (init, limit, stride) have the same type? isa_integer() is passed l->bt() which is T_INT for a CountedLoop and T_LONG for LongCountedLoop. This causes isa_integer() to either call isa_int() or isa_long(). So yes, they should all be the same type and the type that matches the loop's type. 
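To make the point about one code path serving both loop kinds concrete, here is a small standalone sketch of the rewritten overflow check quoted in the PR description (`init > max_signed_integer(iv_bt) - stride_con`). It is an illustration under assumptions, not the HotSpot code: `IvType` is a hypothetical stand-in for `T_INT`/`T_LONG`, `max_signed_integer` is re-implemented locally, `first_step_overflows` is an invented name, and only a positive stride is handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the loop's basic type (T_INT or T_LONG).
enum class IvType { Int, Long };

// Local re-implementation for illustration: largest signed value of the
// induction variable's type, widened to 64 bits.
constexpr int64_t max_signed_integer(IvType bt) {
  return bt == IvType::Int ? INT32_MAX : INT64_MAX;
}

// True if init_lo + stride would exceed the largest value of the iv type.
// Written as init_lo > max - stride so the comparison itself cannot
// overflow when stride is positive (the only case handled here).
inline bool first_step_overflows(IvType bt, int64_t init_lo, int64_t stride) {
  assert(stride > 0);
  return init_lo > max_signed_integer(bt) - stride;
}
```

Parameterizing on the type tag is what lets the same comparison serve an int loop and a long loop, whereas the older form compared against `max_jint` directly.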
------------- PR: https://git.openjdk.java.net/jdk/pull/1316 From aph at openjdk.java.net Fri Nov 20 09:03:03 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 20 Nov 2020 09:03:03 GMT Subject: RFR: 8256436: AArch64: Fix undefined behavior for signed right shift in assembler In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 06:39:02 GMT, Xiaohong Gong wrote: >> Right shift a signed negative value is implementation-defined in C++ (see [1]). It's better to avoid the signed right shift operations, >> and use the unsigned right shift instead. >> >> [1] https://docs.microsoft.com/en-us/cpp/cpp/left-shift-and-right-shift-operators-input-and-output?view=msvc-160&viewFallbackFrom=vs-2019 >> >> Tested jtreg langtools:tier1, hotspot:hotspot_all_no_apps and jdk:jdk_core, and all tests pass without new failures. > > Hi, could anyone please take a look at this small PR? Thanks very much! I'm very skeptical about the need for this. Every AArch64 compiler of which I'm aware treats signed right shift as well defined, and any incoming AArch64 compiler would need to do so in order to be compatible. ------------- PR: https://git.openjdk.java.net/jdk/pull/1307 From aph at openjdk.java.net Fri Nov 20 09:06:08 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 20 Nov 2020 09:06:08 GMT Subject: RFR: 8256436: AArch64: Fix undefined behavior for signed right shift in assembler In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 05:21:39 GMT, Xiaohong Gong wrote: > Right shift a signed negative value is implementation-defined in C++ (see [1]). It's better to avoid the signed right shift operations, > and use the unsigned right shift instead. > > [1] https://docs.microsoft.com/en-us/cpp/cpp/left-shift-and-right-shift-operators-input-and-output?view=msvc-160&viewFallbackFrom=vs-2019 > > Tested jtreg langtools:tier1, hotspot:hotspot_all_no_apps and jdk:jdk_core, and all tests pass without new failures. Changes requested by aph (Reviewer). 
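For readers following the signed-shift discussion, a small standalone sketch of the two ways an 8-bit immediate field can be extracted from a 16-bit value that is a multiple of 256. Both function names are hypothetical and neither is taken from the patch or from `assembler_aarch64.hpp`. The signed variant relies on arithmetic right shift of a negative value, which is implementation-defined before C++20 and defined as arithmetic from C++20 on; the unsigned variant converts first, so it does not depend on that. After masking to 8 bits the two agree on compilers that shift arithmetically, which as far as I know covers mainstream AArch64 and x86 toolchains.

```cpp
#include <cassert>
#include <cstdint>

// Signed variant: shift first, then mask. Depends on >> being an arithmetic
// shift for negative values (implementation-defined before C++20).
inline uint32_t imm8_field_signed(int32_t imm16) {
  assert(imm16 >= -32768 && imm16 <= 32512 && (imm16 & 0xff) == 0);
  return static_cast<uint32_t>(imm16 >> 8) & 0xffu;
}

// Unsigned variant: convert to unsigned before shifting, so the shift is a
// logical one with fully specified behavior, then mask.
inline uint32_t imm8_field_unsigned(int32_t imm16) {
  assert(imm16 >= -32768 && imm16 <= 32512 && (imm16 & 0xff) == 0);
  return (static_cast<uint32_t>(imm16) >> 8) & 0xffu;
}
```

Because the result is masked to the width of the field, the choice only affects which intermediate value is held, not the encoded bits.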
src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3212: > 3210: } else if (T != B && imm16 <= 32512 && imm16 >= -32768 && (imm16 & 0xff) == 0) { > 3211: sh = 1; > 3212: imm = (imm >> 8); I'm very skeptical about the need for this. Every AArch64 compiler of which I'm aware treats signed right shift as well defined, and any incoming AArch64 compiler would need to do so in order to be compatible. src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3219: > 3217: imm &= mask; > 3218: f(0b00100101, 31, 24), f(T, 23, 22), f(0b11100011, 21, 14); > 3219: f(sh, 13), f(imm, 12, 5), rf(Zd, 0); imm here is a "signed immediate in the range -128 to 127," and it would be perverse to treat it as an unsigned value in the assembler. ------------- PR: https://git.openjdk.java.net/jdk/pull/1307 From roland at openjdk.java.net Fri Nov 20 09:48:09 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 20 Nov 2020 09:48:09 GMT Subject: RFR: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well Message-ID: This was reported by Paul with the vector API. There are 2 issues: - CastII nodes (added by Objects.checkIndex()) gets in the way of the pattern matching performed by range check elimination - By transforming (CastII (AddI x y)) into (AddI (CastII x) (CastII y)) some CastII can be eliminated which improves address computation code. 
------------- Commit messages: - step over CastII in range checks & push CastII thru add Changes: https://git.openjdk.java.net/jdk/pull/1342/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1342&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256730 Stats: 181 lines in 5 files changed: 116 ins; 55 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/1342.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1342/head:pull/1342 PR: https://git.openjdk.java.net/jdk/pull/1342 From neliasso at openjdk.java.net Fri Nov 20 09:51:13 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 20 Nov 2020 09:51:13 GMT Subject: RFR: 8256508: Improve CompileCommand flag Message-ID: The current implementation of compile command has two types of options: typed, pre-defined options like "compileonly" and the general 'option' type. 'option'-type options are not defined, they can accidentally be used with the wrong value type, and the syntax is prone to error. By pre-defining all compile commands used and giving them types the parsing can be simplified, proper parsing errors can be given and a reasonable syntax can be used. This: -XX:CompileCommand=option,java/util/String.toString,int,RepeatCompilation,5 Is superseded by: -XX:CompileCommand=RepeatCompilation,java/util/String.toString,5 Attention check: Did you spot the error in the old command? In order not to break anything - the old syntax is kept for now. But even the old command format is improved with verification for the option name and the type of the value. ------------- Commit messages: - Make tests debug only - Fix typos - remove typo - Fix CompilerConfigFileWarning test - Fix test - Merge branch 'master' of https://github.com/openjdk/jdk into improve_compile_command - Fixed messages and help text - Clean up error reporting - fix CheckCompileCommandOption test - Fixing tests - ...
and 5 more: https://git.openjdk.java.net/jdk/compare/f7517386...92eec9d6 Changes: https://git.openjdk.java.net/jdk/pull/1276/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1276&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256508 Stats: 943 lines in 22 files changed: 462 ins; 151 del; 330 mod Patch: https://git.openjdk.java.net/jdk/pull/1276.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1276/head:pull/1276 PR: https://git.openjdk.java.net/jdk/pull/1276 From mdoerr at openjdk.java.net Fri Nov 20 09:52:06 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 20 Nov 2020 09:52:06 GMT Subject: RFR: 8255959: Timeouts in VectorConversion tests In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 15:14:39 GMT, Martin Doerr wrote: >> Perhaps the following patch might help. >> >> Still for say 512 conversion test on my mac that has no AVX512 support the test runs (including compilation) in about 60s. With the patch it reduces to about 40s. >> >> If you run jtreg in verbose mode, `-va` it should output individual test times. Perhaps some are taking longer than others? 
>> >> diff --git a/test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java b/test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java >> index d1303bfd295..1754af2110a 100644 >> --- a/test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java >> +++ b/test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java >> @@ -551,7 +551,8 @@ abstract class AbstractVectorConversionTest { >> int m = Math.max(dst_species_len,src_species_len) / Math.min(src_species_len,dst_species_len); >> >> int [] parts = getPartsArray(m, is_contracting_conv); >> - for (int ic = 0; ic < INVOC_COUNT; ic++) { >> + int count = invocationCount(INVOC_COUNT, SPECIES, OSPECIES); >> + for (int ic = 0; ic < count; ic++) { >> for (int i=0, j=0; i < in_len; i += src_species_len, j+= dst_species_len) { >> int part = parts[i % parts.length]; >> var av = Vector64ConversionTests.vectorFactory(unboxed_a, i, SPECIES); >> @@ -592,7 +593,8 @@ abstract class AbstractVectorConversionTest { >> int m = Math.max(dst_vector_size,src_vector_size) / Math.min(dst_vector_size, src_vector_size); >> >> int [] parts = getPartsArray(m, is_contracting_conv); >> - for (int ic = 0; ic < INVOC_COUNT; ic++) { >> + int count = invocationCount(INVOC_COUNT, SPECIES, OSPECIES); >> + for (int ic = 0; ic < count; ic++) { >> for (int i = 0, j=0; i < in_len; i += src_vector_lane_cnt, j+= dst_vector_lane_cnt) { >> int part = parts[i % parts.length]; >> var av = Vector64ConversionTests.vectorFactory(unboxed_a, i, SPECIES); >> @@ -609,4 +611,15 @@ abstract class AbstractVectorConversionTest { >> } >> assertResultsEquals(boxed_res, boxed_ref, dst_vector_lane_cnt); >> } >> + >> + static int invocationCount(int c, VectorSpecies... species) { >> + return Arrays.asList(species).stream().allMatch(AbstractVectorConversionTest::leqPreferred) >> + ? 
c >> + : Math.min(c, c / 100); >> + } >> + >> + static boolean leqPreferred(VectorSpecies species) { >> + VectorSpecies preferred = VectorSpecies.ofPreferred(species.elementType()); >> + return species.length() <= preferred.length(); >> + } >> } > > Hi Paul, > your proposal helps a bit. Test has passed on some machines, but not on all ones. > "contracting_conversion_scalar" is still very prominent on PPC. Do we need to implement intrinsics to get this fast? > > In addition, there's another timeout in AddTest.java on x86: > https://bugs.openjdk.java.net/browse/JDK-8255915 > > So it seems like there's more work to do. > Suggestions? Situation has improved with 8256581: Refactor vector conversion tests. I'm closing this PR. We may need a new one to increase some timeout values for specific platforms. ------------- PR: https://git.openjdk.java.net/jdk/pull/1079 From mdoerr at openjdk.java.net Fri Nov 20 09:52:08 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 20 Nov 2020 09:52:08 GMT Subject: Withdrawn: 8255959: Timeouts in VectorConversion tests In-Reply-To: References: Message-ID: <951GU-986f_hPFzZ7yYH3LmMzDil1Gw-pz3OcrQ2mFo=.0c6dd54a-a5c4-4923-95ce-25768e400a91@github.com> On Thu, 5 Nov 2020 16:59:57 GMT, Martin Doerr wrote: > We observed many timeouts in the following test/jdk/jdk/incubator/vector tests: > Vector128ConversionTests.java > Vector256ConversionTests.java > Vector512ConversionTests.java > Vector64ConversionTests.java > VectorMaxConversionTests.java > Some machines don't support vector instructions or fewer of them and C2 uses slower alternatives. > > Maybe there are options to make the tests faster, but I just propose to use a larger timeout value for now. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1079 From adinn at openjdk.java.net Fri Nov 20 10:03:03 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Fri, 20 Nov 2020 10:03:03 GMT Subject: RFR: 8256614: AArch64: Add SVE backend implementation for integer min/max In-Reply-To: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> References: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> Message-ID: On Fri, 20 Nov 2020 07:13:18 GMT, Xiaohong Gong wrote: > We need to add them all to avoid the "bad AD file" crash. I'm not clear how this came about. Is this fix needed as a response to another change that caused the crash? Or is the problem that the incomplete implementation was pushed without testing some cases properly? Is there a before and after test case for the problem this fixes? ------------- PR: https://git.openjdk.java.net/jdk/pull/1337 From burban at openjdk.java.net Fri Nov 20 10:17:06 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Fri, 20 Nov 2020 10:17:06 GMT Subject: RFR: 8256633: Fix product build on Windows+Arm64 In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 09:30:11 GMT, Aleksey Shipilev wrote: >> Fix this warning: >> C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): error C2220: the following warning is treated as an error >> C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): warning C4390: ';': empty controlled statement found; is this the intent? >> >> Thanks to @magicus to bring that to my attention. > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 509: > >> 507: if (_ext.shift() > 0) { >> 508: assert(_ext.shift() == (int)size, "bad shift"); >> 509: } > > D'oh. 
So this is semantically the same as: > > assert(_ext.shift() <= 0 || _ext.shift() == (int)size, "bad shift"); > > ...or, if we expect shift to be non-negative: > > assert(_ext.shift() == 0 || _ext.shift() == (int)size, "bad shift"); Indeed. Are you suggesting to change it? I usually try to avoid changing code style when doing a fix of such a nature. ------------- PR: https://git.openjdk.java.net/jdk/pull/1312 From shade at openjdk.java.net Fri Nov 20 10:17:06 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 20 Nov 2020 10:17:06 GMT Subject: RFR: 8256633: Fix product build on Windows+Arm64 In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 10:11:09 GMT, Bernhard Urban-Forster wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 509: >> >>> 507: if (_ext.shift() > 0) { >>> 508: assert(_ext.shift() == (int)size, "bad shift"); >>> 509: } >> >> D'oh. So this is semantically the same as: >> >> assert(_ext.shift() <= 0 || _ext.shift() == (int)size, "bad shift"); >> >> ...or, if we expect shift to be non-negative: >> >> assert(_ext.shift() == 0 || _ext.shift() == (int)size, "bad shift"); > > Indeed. Are you suggesting to change it? I usually try to avoid changing code style when doing a fix of such a nature. It feels odd to have an empty method body when asserts are disabled. So yes, I suggest subsuming the check into the assert itself.
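The equivalence claimed above (an assert guarded by `if (shift > 0)` versus a single assert on `shift <= 0 || shift == size`) can be checked mechanically. The sketch below models both forms as boolean predicates in Java purely for illustration; the real code is HotSpot C++ using the `assert` macro, and the names `guardedForm` and `foldedForm` are hypothetical.

```java
class AssertFolding {
    // Models the original shape: the check only runs inside the if,
    // so the if body is empty once asserts are compiled out.
    static boolean guardedForm(int shift, int size) {
        if (shift > 0) {
            return shift == size;
        }
        return true; // nothing is asserted when shift <= 0
    }

    // Models the suggested shape: one condition, no surrounding if.
    static boolean foldedForm(int shift, int size) {
        return shift <= 0 || shift == size;
    }

    public static void main(String[] args) {
        // Compare the two predicates over a small range, including negative shifts.
        for (int shift = -4; shift <= 4; shift++) {
            for (int size = 0; size <= 4; size++) {
                if (guardedForm(shift, size) != foldedForm(shift, size)) {
                    throw new AssertionError("forms differ at shift=" + shift + ", size=" + size);
                }
            }
        }
        System.out.println("both forms agree");
    }
}
```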
------------- PR: https://git.openjdk.java.net/jdk/pull/1312 From xgong at openjdk.java.net Fri Nov 20 10:22:05 2020 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Fri, 20 Nov 2020 10:22:05 GMT Subject: RFR: 8256614: AArch64: Add SVE backend implementation for integer min/max In-Reply-To: References: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> Message-ID: <0f82WYJGWD1I3f_bhNVD6c_FuL5yzLkzt3XCbHHt2dw=.1baab1b6-7d92-4277-93c3-08bf93d55e3e@github.com> On Fri, 20 Nov 2020 10:00:27 GMT, Andrew Dinn wrote: > > We need to add them all to avoid the "bad AD file" crash. > > I'm not clear how this came about. Is this fix needed as a response to another change that caused the crash? Or is the problem that the incomplete implementation was pushed without testing some cases properly? > > Is there a before and after test case for the problem this fixes? Hi @adinn , thanks for looking at this PR. Currently only the VectorAPI can generate the `MinVNode/MaxVNode` for integer types. I found this issue when I tried to use the related API to do some other investigation work. And actually there is the jtreg tests (eg: https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/IntMaxVectorTests.java#L2403) for it. I think the reason that we didn't find the issue is the loop count `INVOC_COUNT` , which is too small that didn't make the method hot enough to be compiled with C2. Currently the value is set to 100: static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 100); When I use a larger loop count like 10000, the test can crash with `bad AD file`. So this issue can be reproduced by adding `-Djdk.incubator.vector.test.loop-iterations=10000` when running the tests. 
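As a small illustration of the mechanism described above, `Integer.getInteger(name, default)` returns the default unless the named system property is set to a parseable integer, which is why passing `-Djdk.incubator.vector.test.loop-iterations=10000` changes the loop count. The property name and the default of 100 are quoted from the message; the class and method names below are hypothetical, and this is only a sketch of the lookup, not the test code itself.

```java
class LoopIterations {
    // Property name as quoted in the message above.
    static final String PROP = "jdk.incubator.vector.test.loop-iterations";

    // Reads the property on each call so that the effect of setting it is visible;
    // the tests themselves store the value in a static final field instead.
    static int invocationCount(int defaultCount) {
        return Integer.getInteger(PROP, defaultCount);
    }

    public static void main(String[] args) {
        // Prints 100 unless the JVM was started with -D<PROP>=<n>.
        System.out.println(invocationCount(100));
    }
}
```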
------------- PR: https://git.openjdk.java.net/jdk/pull/1337 From xgong at openjdk.java.net Fri Nov 20 10:35:06 2020 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Fri, 20 Nov 2020 10:35:06 GMT Subject: RFR: 8256436: AArch64: Fix undefined behavior for signed right shift in assembler In-Reply-To: References: Message-ID: <_c9PhWNGhNcq0rqLT11A_KVXKv7iVzfBXvt9HG9pr24=.be9a9c5b-1380-4735-b3e7-7f8460cf8b20@github.com> On Thu, 19 Nov 2020 09:03:08 GMT, Andrew Haley wrote: >> Right shift a signed negative value is implementation-defined in C++ (see [1]). It's better to avoid the signed right shift operations, >> and use the unsigned right shift instead. >> >> [1] https://docs.microsoft.com/en-us/cpp/cpp/left-shift-and-right-shift-operators-input-and-output?view=msvc-160&viewFallbackFrom=vs-2019 >> >> Tested jtreg langtools:tier1, hotspot:hotspot_all_no_apps and jdk:jdk_core, and all tests pass without new failures. > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3212: > >> 3210: } else if (T != B && imm16 <= 32512 && imm16 >= -32768 && (imm16 & 0xff) == 0) { >> 3211: sh = 1; >> 3212: imm = (imm >> 8); > > I'm very skeptical about the need for this. Every AArch64 compiler of which I'm aware treats signed right shift as well defined, and any incoming AArch64 compiler would need to do so in order to be compatible. Hi @theRealAph , thanks for looking at this PR, and thanks for your comment here. Yes, I agree that the compilers we know like GCC/LLVM can make sure the behavior is defined on AArch64. And actually we didn't meet any issues here. However, I'm not quite sure whether other compilers can guarantee it. This is just used to avoid the undefined behavior in future. So do you think we need to fix it here? I can abandon this patch if this is not valuable. Thanks!
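To make the two shift flavours under discussion concrete, here is a minimal sketch in Java, chosen only because Java fully specifies both `>>` (sign-extending) and `>>>` (zero-filling); it says nothing about what any particular C++ compiler does. It models the quoted `imm >> 8` followed by a mask with `0xff`, using hypothetical method names, and checks that for a 32-bit int the low byte after masking is the same whichever shift is used.

```java
class ShiftMask {
    // Sign-extending shift, then keep the low 8 bits.
    static int arithmeticThenMask(int imm16) {
        return (imm16 >> 8) & 0xff;
    }

    // Zero-filling shift, then keep the low 8 bits.
    static int logicalThenMask(int imm16) {
        return (imm16 >>> 8) & 0xff;
    }

    public static void main(String[] args) {
        // Walk the multiples of 256 in the range accepted by the quoted condition.
        for (int imm16 = -32768; imm16 <= 32512; imm16 += 256) {
            if (arithmeticThenMask(imm16) != logicalThenMask(imm16)) {
                throw new AssertionError("mismatch at " + imm16);
            }
        }
        // -256 >> 8 is -1 (all ones); masking leaves 0xff.
        System.out.println(arithmeticThenMask(-256)); // 255
    }
}
```

Before the mask is applied the two shifts do differ for negative inputs (-256 >> 8 is -1, while -256 >>> 8 is 0x00ffffff), which is the distinction the review comments are arguing about.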
> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3219: > >> 3217: imm &= mask; >> 3218: f(0b00100101, 31, 24), f(T, 23, 22), f(0b11100011, 21, 14); >> 3219: f(sh, 13), f(imm, 12, 5), rf(Zd, 0); > > imm here is a "signed immediate in the range -128 to 127," and it would be perverse to treat it as an unsigned value in the assembler. Hi @theRealAph , thanks for the comment. `imm` here is an unsigned immediate and it has been masked with `0xff` before, which is the same with the implementation of `sf`. ------------- PR: https://git.openjdk.java.net/jdk/pull/1307 From luhenry at microsoft.com Fri Nov 20 11:01:17 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Fri, 20 Nov 2020 11:01:17 +0000 Subject: Some questions on intrinsic for UTF8 to UTF16 decoding Message-ID: Hi, I've started to implement an intrinsic to vectorize the decoding (and soon encoding) of UTF-8 to UTF-16. My current work in progress is at [1]. However, I'm running into limitations in my knowledge of Hotspot, and I am seeking your advice and know-how. The first thing I'm running into is how to pass parameters to the intrinsic by reference. AFAIK there is no way to do such a thing in Java code. My hope is then that when creating the call to the intrinsic in library_call.cpp, we can pass the address of the variable instead of the value. But I don't know if it's even possible, and if it is, how to do so. The second thing I'm running into is not so much a technical limitation, but a question that, I am sure, is going to be raised during the review. This vectorization depends on a lookup table, but this lookup table can grow quite big (32768 elements, each of a size of 64-72 bytes, so ~2MB). I understand that this is much bigger than anything currently existing for any of the intrinsics, so I'm currently trying to figure out how I can reduce drastically the size of this table (compaction, lazy building, etc.). 
But first, I would like to hear your ideas as it may be an issue that was already faced in the past and for which a better solution was found. Thank you, -- Ludovic [1] https://github.com/openjdk/jdk/compare/master...luhenry:vectorUTF8 From burban at openjdk.java.net Fri Nov 20 11:04:15 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Fri, 20 Nov 2020 11:04:15 GMT Subject: RFR: 8256633: Fix product build on Windows+Arm64 [v2] In-Reply-To: References: Message-ID: > Fix this warning: > C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): error C2220: the following warning is treated as an error > C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): warning C4390: ';': empty controlled statement found; is this the intent? > > Thanks to @magicus to bring that to my attention. Bernhard Urban-Forster has updated the pull request incrementally with one additional commit since the last revision: review feedback ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1312/files - new: https://git.openjdk.java.net/jdk/pull/1312/files/d2d85317..4927d0d0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1312&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1312&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1312.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1312/head:pull/1312 PR: https://git.openjdk.java.net/jdk/pull/1312 From shade at openjdk.java.net Fri Nov 20 11:04:16 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 20 Nov 2020 11:04:16 GMT Subject: RFR: 8256633: Fix product build on Windows+Arm64 [v2] In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 11:01:45 GMT, Bernhard Urban-Forster wrote: >> Fix this warning: >> C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): error C2220: the following warning is treated as an error >> 
C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): warning C4390: ';': empty controlled statement found; is this the intent? >> >> Thanks to @magicus to bring that to my attention. > > Bernhard Urban-Forster has updated the pull request incrementally with one additional commit since the last revision: > > review feedback Looks fine. Not sure how sensible are negative shifts, but new code follows the current semantics exactly. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1312 From burban at openjdk.java.net Fri Nov 20 11:04:17 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Fri, 20 Nov 2020 11:04:17 GMT Subject: Integrated: 8256633: Fix product build on Windows+Arm64 In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 09:17:58 GMT, Bernhard Urban-Forster wrote: > Fix this warning: > C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): error C2220: the following warning is treated as an error > C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): warning C4390: ';': empty controlled statement found; is this the intent? > > Thanks to @magicus to bring that to my attention. This pull request has now been integrated. Changeset: f5766287 Author: Bernhard Urban-Forster Committer: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/f5766287 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod 8256633: Fix product build on Windows+Arm64 Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1312 From burban at openjdk.java.net Fri Nov 20 11:04:16 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Fri, 20 Nov 2020 11:04:16 GMT Subject: RFR: 8256633: Fix product build on Windows+Arm64 [v2] In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 10:14:03 GMT, Aleksey Shipilev wrote: >> Indeed. Are you suggesting to change it? 
I usually try to avoid changing code style when doing a fix of such a nature. > > It feels odd to have an empty method body when asserts are disabled. So yes, I suggest subsuming the check into the assert itself. Fair, I updated the PR. Thanks for the suggestion. ------------- PR: https://git.openjdk.java.net/jdk/pull/1312 From aph at openjdk.java.net Fri Nov 20 10:47:06 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 20 Nov 2020 10:47:06 GMT Subject: RFR: 8256436: AArch64: Fix undefined behavior for signed right shift in assembler In-Reply-To: <_c9PhWNGhNcq0rqLT11A_KVXKv7iVzfBXvt9HG9pr24=.be9a9c5b-1380-4735-b3e7-7f8460cf8b20@github.com> References: <_c9PhWNGhNcq0rqLT11A_KVXKv7iVzfBXvt9HG9pr24=.be9a9c5b-1380-4735-b3e7-7f8460cf8b20@github.com> Message-ID: On Fri, 20 Nov 2020 10:30:50 GMT, Xiaohong Gong wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3212: >> >>> 3210: } else if (T != B && imm16 <= 32512 && imm16 >= -32768 && (imm16 & 0xff) == 0) { >>> 3211: sh = 1; >>> 3212: imm = (imm >> 8); >> >> I'm very skeptical about the need for this. Every AArch64 compiler of which I'm aware treats signed right shift as well defined, and any incoming AArch64 compiler would need to do so in order to be compatible. > > Hi @theRealAph , thanks for looking at this PR, and thanks for your comment here. Yes, I agree that the compilers we know like GCC/LLVM can make sure the behavior is defined on AArch64. And actually we didn't meet any issues here. However, I'm not quite sure whether other compilers can guarantee it. This is just used to avoid the undefined behavior in future. So do you think we need to fix it here? I can abandon this patch if this is not valuable. Thanks! The compilers do guarantee it. Here's GCC, for example: 4.5 Integers GCC supports only two's complement integer types, and all bit patterns are ordinary values.
Bitwise operators act on the representation of the value including both the sign and value bits, where the sign bit is considered immediately above the highest-value value bit. Signed '>>' acts on negative numbers by sign extension. >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3219: >> >>> 3217: imm &= mask; >>> 3218: f(0b00100101, 31, 24), f(T, 23, 22), f(0b11100011, 21, 14); >>> 3219: f(sh, 13), f(imm, 12, 5), rf(Zd, 0); >> >> imm here is a "signed immediate in the range -128 to 127," and it would be perverse to treat it as an unsigned value in the assembler. > > Hi @theRealAph , thanks for the comment. `imm` here is an unsigned immediate and it has been masked with `0xff` before, which is the same with the implementation of `sf`. Yes, I'm saying don't do that. ------------- PR: https://git.openjdk.java.net/jdk/pull/1307 From roland at openjdk.java.net Fri Nov 20 12:00:10 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 20 Nov 2020 12:00:10 GMT Subject: RFR: 8256655: rework long counted loop handling [v2] In-Reply-To: <3VkYxRjUxrbs1sUDk3P8p04iBK7OU5JrDYdzA7MqRnY=.f985d028-4c0e-4ed5-9fe3-b6ea5c0fe098@github.com> References: <3VkYxRjUxrbs1sUDk3P8p04iBK7OU5JrDYdzA7MqRnY=.f985d028-4c0e-4ed5-9fe3-b6ea5c0fe098@github.com> Message-ID: On Fri, 20 Nov 2020 08:13:55 GMT, Tobias Hartmann wrote: > This fails to build on Windows: > ` [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2065: 'ulong': undeclared identifier [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2146: syntax error: missing ')' before identifier 'iters_limit' [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2146: syntax error: missing ';' before identifier 'iters_limit' [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2065: 'ulong': undeclared identifier [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2059:
syntax error: ')'` Build issues should be fixed now. ------------- PR: https://git.openjdk.java.net/jdk/pull/1316 From adinn at openjdk.java.net Fri Nov 20 12:05:03 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Fri, 20 Nov 2020 12:05:03 GMT Subject: RFR: 8256614: AArch64: Add SVE backend implementation for integer min/max In-Reply-To: <0f82WYJGWD1I3f_bhNVD6c_FuL5yzLkzt3XCbHHt2dw=.1baab1b6-7d92-4277-93c3-08bf93d55e3e@github.com> References: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> <0f82WYJGWD1I3f_bhNVD6c_FuL5yzLkzt3XCbHHt2dw=.1baab1b6-7d92-4277-93c3-08bf93d55e3e@github.com> Message-ID: On Fri, 20 Nov 2020 10:19:40 GMT, Xiaohong Gong wrote: >>> We need to add them all to avoid the "bad AD file" crash. >> >> I'm not clear how this came about. Is this fix needed as a response to another change that caused the crash? Or is the problem that the incomplete implementation was pushed without testing some cases properly? >> >> Is there a before and after test case for the problem this fixes? > >> > We need to add them all to avoid the "bad AD file" crash. >> >> I'm not clear how this came about. Is this fix needed as a response to another change that caused the crash? Or is the problem that the incomplete implementation was pushed without testing some cases properly? >> >> Is there a before and after test case for the problem this fixes? > > Hi @adinn , thanks for looking at this PR. Currently only the VectorAPI can generate the `MinVNode/MaxVNode` for integer types. I found this issue when I tried to use the related API to do some other investigation work. And actually there is the jtreg tests (eg: https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/IntMaxVectorTests.java#L2403) for it. I think the reason that we didn't find the issue is the loop count `INVOC_COUNT` , which is too small that didn't make the method hot enough to be compiled with C2. 
Currently the value is set to 100: > static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 100); > When I use a larger loop count like 10000, the test can crash with `bad AD file`. So this issue can be reproduced by adding `-Djdk.incubator.vector.test.loop-iterations=10000` when running the tests. Hi @XiaohongGong . Thanks for clarifying that. Your code changes look ok to me but it would be good if we could also change the default setting for this loop count to ensure that this case gets tested when SVE hw is present. iterations=100 appears to be defaulted in file config.sh in the same dir as the test program. Do you know if using iterations=100 is actually triggering C2 compilation for any case (SVE or other, including on Intel)? If not then we really need to increase the default count to a higher value for all cases. If this is just an SVE-specific thing then we could maybe add some special case processing to the config script to detect an arch where SVE is present and set a higher value. Perhaps @iwanowww might be able to comment on what would be a suitable setting for this counter. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1337 From thartmann at openjdk.java.net Fri Nov 20 12:05:05 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 20 Nov 2020 12:05:05 GMT Subject: RFR: 8256655: rework long counted loop handling [v2] In-Reply-To: References: <3VkYxRjUxrbs1sUDk3P8p04iBK7OU5JrDYdzA7MqRnY=.f985d028-4c0e-4ed5-9fe3-b6ea5c0fe098@github.com> Message-ID: <9m4nY3umpkpmP8bsWAnCWs6GJmkH3QLo0T6oVURrTCg=.2b0ec719-d311-426d-8130-f866cc078fdd@github.com> On Fri, 20 Nov 2020 11:57:13 GMT, Roland Westrelin wrote: >> This fails to build on Windows: >> ` >> [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2065: 'ulong': undeclared identifier >> [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2146: syntax error: missing ')' before identifier 'iters_limit' >> [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2146: syntax error: missing ';' before identifier 'iters_limit' >> [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2065: 'ulong': undeclared identifier >> [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2059: syntax error: ')'` > >> This fails to build on Windows: >> ` [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2065: 'ulong': undeclared identifier [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2146: syntax error: missing ')' before identifier 'iters_limit' [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2146: syntax error: missing ';' before identifier 'iters_limit' [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2065: 'ulong': undeclared identifier [2020-11-20T08:11:03,704Z] ./open/src/hotspot/share/opto/loopnode.cpp(821): error C2059: syntax error: ')'` > > Build issues should be fixed now. 
Did some quick sanity testing and `compiler/regalloc/TestC1OverlappingRegisterHint` fails with `assert(init_n->get_int() + cl->stride_con() >= cl->limit()->get_int()) failed: should be one iteration`. ------------- PR: https://git.openjdk.java.net/jdk/pull/1316 From thartmann at openjdk.java.net Fri Nov 20 12:08:07 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 20 Nov 2020 12:08:07 GMT Subject: RFR: 8033441: print line numbers with -XX:+PrintOptoAssembly [v3] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 21:50:20 GMT, Jason Tatton wrote: >> Hi all, >> >> Here is a patch which adds line number information to the `-XX:+PrintOptoAssembly` cmd line option output. >> >> A new unit test is provided for this functionality. Note that the build must have debugging enabled for the test to be relevant and enabled. >> >> Here is an example output:- >> before: >> # Tryme::foo @ bci:3 L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff340089c0 >> >> after: >> # Tryme::foo @ bci:3 (line 12) L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff1008fd80 >> >> Testing: >> I have run tier 1/2 on linux on x86 and aarch64. With a `--enable-debug` build. > > Jason Tatton has updated the pull request incrementally with two additional commits since the last revision: > > - fix whitespace error > - adjusted test to check for line number information only if c2 optimizer output is present on console output Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1272 From rcastanedalo at openjdk.java.net Fri Nov 20 12:11:03 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 20 Nov 2020 12:11:03 GMT Subject: Integrated: 6232281: -XX:-UseLoopSafepoints causes assert(v_false, "Parse::remove_useless_nodes missed this node") In-Reply-To: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> References: <-4RDn-dVEQLmW_k5K-wg3mmY7MTW80LRXKIAyxAqQtQ=.6cc6eb9c-7aaf-4b74-bda5-2a4fa29eb757@github.com> Message-ID: <8ubH1nYXfwH_5mZ5wtm55tfsHiyZcAQBmB-fwXooayY=.6276621b-9845-4edf-9346-200fc260a1f6@github.com> On Thu, 19 Nov 2020 09:05:06 GMT, Roberto Castañeda Lozano wrote: > Check for nodes missed by `remove_useless_nodes()` only if PhaseRemoveUseless has actually been run. This makes it possible to use `-XX:-UseLoopSafepoints` without crashing trivially, although implicit assumptions in other parts of C2 about the existence of loop safepoints might lead to more subtle failures for more complex methods. This pull request has now been integrated. Changeset: eb35ade9 Author: Roberto Castañeda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/eb35ade9 Stats: 44 lines in 2 files changed: 43 ins; 0 del; 1 mod 6232281: -XX:-UseLoopSafepoints causes assert(v_false,"Parse::remove_useless_nodes missed this node") Check for nodes missed by remove_useless_nodes() only if PhaseRemoveUseless has actually been run. This makes it possible to use -XX:-UseLoopSafepoints without crashing trivially, although implicit assumptions in other parts of C2 about the existence of loop safepoints might lead to more subtle failures for more complex methods.
Reviewed-by: neliasso, thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1311 From shade at openjdk.java.net Fri Nov 20 12:31:29 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 20 Nov 2020 12:31:29 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes Message-ID: JMH uses the [`Blackhole::consume`](https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#l153) methods to avoid dead-code elimination of the code that produces benchmark values. It now relies on producing opaque side-effects and breaking inlining. While it was proved useful for many years, it unfortunately comes with several major drawbacks: 1. Call costs dominate nanobenchmarks. On TR 3970X, the call cost is several nanoseconds. 2. The work spent in Blackhole.consume dominates nanobenchmarks too. It takes about a nanosecond on TR 3970X. 3. Argument preparation for call makes different argument types behave differently. This is prominent on architectures where calling conventions for passing e.g. floating-point arguments require elaborate dance. Supporting this directly in compilers would improve nanobenchmark fidelity. Instead of introducing public APIs or special-casing JMH methods in JVM, we can hook a new command to compiler control, and let JMH sign up its Blackhole methods for it with `-XX:CompileCommand=blackhole,org.openjdk.jmh.infra.Blackhole::consume`. This is being prototyped as [CODETOOLS-7902762](https://bugs.openjdk.java.net/browse/CODETOOLS-7902762). It makes Blackholes behave [substantially better](http://cr.openjdk.java.net/~shade/8252505/bh-old-vs-new.png). Current prototype is for initial approach review and early testing. I am open for suggestions how to make it simpler; not that I haven't tried, but it is likely there is something I am overlooking here. C1 code is platform-independent, and it adds the new node which is then lowered to nothing. C2 code is more complicated. 
I tried to introduce new node and hook arguments there, but failed. There seems to be no way to model the effects we are after: consume the value, but have no observable side effects. Roland suggested we instead put the boolean flag onto `CallJavaNode`, and then match it to nothing in `.ad`. This drags the blackhole through C2 as if it has call-like side effects, and then emits nothing. On the downside, it requires fiddling with arch-specific code in every .ad. ------------- Commit messages: - 8252505: C1/C2 compiler support for blackholes Changes: https://git.openjdk.java.net/jdk/pull/1203/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1203&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252505 Stats: 1428 lines in 38 files changed: 1425 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1203.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1203/head:pull/1203 PR: https://git.openjdk.java.net/jdk/pull/1203 From redestad at openjdk.java.net Fri Nov 20 12:35:02 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 20 Nov 2020 12:35:02 GMT Subject: RFR: 8256508: Improve CompileCommand flag In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 21:41:23 GMT, Nils Eliasson wrote: > The current implementation of compile command has two types of options. Types pre-defined options like "compileonly" and the general 'option' type. > > 'option'-type are not defined, they can accidentally be used with the wrong value type, and the syntax is prone to error. > > By pre-defining all compile commands used and giving them types the parsing can be simplified, proper parsing errors can be given and and reasonable syntax can be used. > > This: > -XX:CompileCommand=option,java/util/String.toString,int,RepeatCompilation,5 > > Is superseded by: > -XX:CompileCommand=RepeatCompilation,java/util/String.toString,5 > > Attention check: Did you spot the error in the old command? 
> > In order not to break anything - the old syntax is kept for now. But even the old command format is improved with verification for the option name and the type of the value. A good usability improvement! I've messed up some minor details in these commands in one way or another more times than I've gotten things right on the first try and been left wondering why my commands have no effect. This should help avoid a significant portion of easy mistakes. I think the error in the old compile command is that int needs to be intx (and of course there's no java/util/String)? ------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1276 From github.com+70893615+jasontatton-aws at openjdk.java.net Fri Nov 20 12:54:05 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Fri, 20 Nov 2020 12:54:05 GMT Subject: Integrated: 8033441: print line numbers with -XX:+PrintOptoAssembly In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 19:31:25 GMT, Jason Tatton wrote: > Hi all, > > Here is a patch which adds line number information to the `-XX:+PrintOptoAssembly` cmd line option output. > > A new unit test is provided for this functionality. Note that the build must have debugging enabled for the test to be relevant and enabled. > > Here is an example output:- > before: > # Tryme::foo @ bci:3 L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff340089c0 > > after: > # Tryme::foo @ bci:3 (line 12) L[0]=_ STK[0]=#NULL STK[1]=#Ptr0x0000ffff1008fd80 > > Testing: > I have run tier 1/2 on linux on x86 and aarch64. With a `--enable-debug` build. This pull request has now been integrated. 
Changeset: b99fd4c7 Author: jasontatton-aws Committer: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/b99fd4c7 Stats: 93 lines in 3 files changed: 88 ins; 3 del; 2 mod 8033441: print line numbers with -XX:+PrintOptoAssembly Reviewed-by: jiefu, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1272 From redestad at openjdk.java.net Fri Nov 20 13:03:01 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 20 Nov 2020 13:03:01 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 13:23:44 GMT, Aleksey Shipilev wrote: > JMH uses the [`Blackhole::consume`](https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#l153) methods to avoid dead-code elimination of the code that produces benchmark values. It now relies on producing opaque side-effects and breaking inlining. While it was proved useful for many years, it unfortunately comes with several major drawbacks: > > 1. Call costs dominate nanobenchmarks. On TR 3970X, the call cost is several nanoseconds. > 2. The work spent in Blackhole.consume dominates nanobenchmarks too. It takes about a nanosecond on TR 3970X. > 3. Argument preparation for call makes different argument types behave differently. This is prominent on architectures where calling conventions for passing e.g. floating-point arguments require elaborate dance. > > Supporting this directly in compilers would improve nanobenchmark fidelity. > > Instead of introducing public APIs or special-casing JMH methods in JVM, we can hook a new command to compiler control, and let JMH sign up its Blackhole methods for it with `-XX:CompileCommand=blackhole,org.openjdk.jmh.infra.Blackhole::consume`. This is being prototyped as [CODETOOLS-7902762](https://bugs.openjdk.java.net/browse/CODETOOLS-7902762). It makes Blackholes behave [substantially better](http://cr.openjdk.java.net/~shade/8252505/bh-old-vs-new.png). 
> > Current prototype is for initial approach review and early testing. I am open for suggestions how to make it simpler; not that I haven't tried, but it is likely there is something I am overlooking here. > > C1 code is platform-independent, and it adds the new node which is then lowered to nothing. > > C2 code is more complicated. I tried to introduce new node and hook arguments there, but failed. There seems to be no way to model the effects we are after: consume the value, but have no observable side effects. Roland suggested we instead put the boolean flag onto `CallJavaNode`, and then match it to nothing in `.ad`. This drags the blackhole through C2 as if it has call-like side effects, and then emits nothing. On the downside, it requires fiddling with arch-specific code in every .ad. Looks like a reasonable enhancement. Should the `BlackholeCommand` be a new option instead of a new top-level command? #1276 is about to improve the structure and usability of `CompileCommand=option` a lot so I suspect it'll be about as straightforward implementation-wise and not that much worse to use (`-XX:CompileCommand=Blackhole,,true`). When possible I think predicates such as `supports_blackhole` should be modelled as `static const bool` fields. Most compilers can't inline methods defined in `Matcher`, even from code in `matcher.cpp`. See `Matcher::misaligned_doubles_ok` etc.
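A minimal sketch of the field-versus-method distinction suggested above. `MatcherSketch`, `supports_blackhole_fn` and `calls_to_emit` are hypothetical names used only for illustration; this is not HotSpot's actual `Matcher`, and the remark about where HotSpot places such definitions is an assumption.

```cpp
// Hypothetical stand-in for a platform matcher class; NOT HotSpot's Matcher.
struct MatcherSketch {
  // Predicate as a compile-time constant: its value is visible in every
  // translation unit that includes this header, so a branch on it can be
  // folded away by the compiler.
  static const bool supports_blackhole = true;

  // Predicate as a method: when the definition lives in a different .cpp
  // file, callers see only this declaration and generally have to emit a
  // real call (unless link-time optimization is used).
  static bool supports_blackhole_fn();
};

// Assumption for the sketch: in a real build this definition would sit in a
// separate, platform-specific source file.
bool MatcherSketch::supports_blackhole_fn() { return true; }

// Hypothetical caller: number of call instructions to emit for a call site.
inline int calls_to_emit(bool is_blackhole) {
  return (MatcherSketch::supports_blackhole && is_blackhole) ? 0 : 1;
}

// The field form is usable in constant expressions; the method form is not.
static_assert(MatcherSketch::supports_blackhole, "field is a constant expression");
```

With the field form, `calls_to_emit(true)` can be reduced to a constant at compile time; with the method form the same decision would normally involve an out-of-line call.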
------------- PR: https://git.openjdk.java.net/jdk/pull/1203 From roland at openjdk.java.net Fri Nov 20 13:32:25 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 20 Nov 2020 13:32:25 GMT Subject: RFR: 8256655: rework long counted loop handling [v3] In-Reply-To: References: Message-ID: > Currently the transformation of a long counted loop into a loop nest > with an inner int counted loop is performed in 2 steps: > > 1- recognize the counted loop shape and build the loop nest > 2- transform the inner loop into a counted loop > > I propose changing this to a 3 steps process: > > 1- recognize the counted loop shape (transform the loop into a LongCountedLoop) > 2- build the loop nest > 3- transform the inner loop into a counted loop > > The benefits are: > > - the logic is cleaner because step 1 and 2 are now separated > - Simple optimizations (loop iv type, empty loop elimination, parallel > iv) can be implemented for LongCountedLoop by refactoring existing > code > > 1- above is achieved by refactoring the > PhaseIdealLoop::is_counted_loop() so it takes an extra parameter (the > kind of counted loop, int or long). > > 2- is the existing loop nest construction logic. But now that it takes > a LongCountedLoop as input, the shape of the loop is known to be that > of a canonicalized counted loop. As a result, the loop nest > construction is simpler. > > This change also refactors PhiNode::Value() so that it works for both > CountedLoop and LongCountedLoop. > > I also changed ConvL2INode to be a TypeNode (ConvI2LNode is a type > node) and: > > jlong init_p = (jlong)init_t->_lo + stride_con; > if (init_p > (jlong)max_jint || init_p > (jlong)limit_t->_hi) > return false; // cyclic loop or this loop trips only once > > to: > > if (init_t->lo_as_long() > max_signed_integer(iv_bt) - stride_con) { > > because if the loop has a single iteration transforming it to a > CountedLoop should allow the backedge to be optimized out. 
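The reworked check quoted above can be read as an overflow-safe test of whether the first increment leaves the range of the induction-variable type. A minimal sketch, assuming a positive constant stride; `IvType`, `max_signed` and `first_increment_overflows` are hypothetical names that only mirror the shape of the quoted condition, not the HotSpot code itself.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the induction-variable basic type (int or long).
enum class IvType { Int, Long };

// Largest value representable in the given induction-variable type.
constexpr int64_t max_signed(IvType t) {
  return t == IvType::Int
             ? static_cast<int64_t>(std::numeric_limits<int32_t>::max())
             : std::numeric_limits<int64_t>::max();
}

// True if init_lo + stride would exceed the maximum of the IV type.
// Assumes stride > 0, so (max - stride) itself cannot overflow; comparing
// against it avoids ever computing init_lo + stride.
constexpr bool first_increment_overflows(int64_t init_lo, int64_t stride, IvType t) {
  return init_lo > max_signed(t) - stride;
}

// Compile-time usage examples.
static_assert(!first_increment_overflows(0, 1, IvType::Int), "plenty of headroom");
static_assert(first_increment_overflows(2147483647, 1, IvType::Int), "int IV would wrap");
static_assert(!first_increment_overflows(2147483647, 1, IvType::Long), "fits in a long IV");
```

Unlike the older form, this test has no `init_p > limit_t->_hi` term, so a loop that trips only once is not rejected here, which matches the stated intent of letting the backedge be optimized out later.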
Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - assert fix - Merge branch 'master' into JDK-8256655 - build fixes - fix trailing whitespace - fix comment - long counted loop refactoring ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1316/files - new: https://git.openjdk.java.net/jdk/pull/1316/files/dd6943b6..be681d51 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1316&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1316&range=01-02 Stats: 99224 lines in 359 files changed: 45138 ins; 42982 del; 11104 mod Patch: https://git.openjdk.java.net/jdk/pull/1316.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1316/head:pull/1316 PR: https://git.openjdk.java.net/jdk/pull/1316 From roland at openjdk.java.net Fri Nov 20 13:32:25 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 20 Nov 2020 13:32:25 GMT Subject: RFR: 8256655: rework long counted loop handling [v3] In-Reply-To: <9m4nY3umpkpmP8bsWAnCWs6GJmkH3QLo0T6oVURrTCg=.2b0ec719-d311-426d-8130-f866cc078fdd@github.com> References: <3VkYxRjUxrbs1sUDk3P8p04iBK7OU5JrDYdzA7MqRnY=.f985d028-4c0e-4ed5-9fe3-b6ea5c0fe098@github.com> <9m4nY3umpkpmP8bsWAnCWs6GJmkH3QLo0T6oVURrTCg=.2b0ec719-d311-426d-8130-f866cc078fdd@github.com> Message-ID: On Fri, 20 Nov 2020 12:02:35 GMT, Tobias Hartmann wrote: > Did some quick sanity testing and `compiler/regalloc/TestC1OverlappingRegisterHint` fails with `assert(init_n->get_int() + cl->stride_con() >= cl->limit()->get_int()) failed: should be one iteration`. Thanks. That assert is broken AFAICT. It doesn't seem to account for a downward loop (which would seem to indicate that IdealLoopTree::do_one_iteration_loop() never triggered). I pushed a fix. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1316 From roland at openjdk.java.net Fri Nov 20 13:35:24 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 20 Nov 2020 13:35:24 GMT Subject: RFR: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well [v2] In-Reply-To: References: Message-ID: > This was reported by Paul with the vector API. There are 2 issues: > > - CastII nodes (added by Objects.checkIndex()) gets in the way of the > pattern matching performed by range check elimination > > - By transforming (CastII (AddI x y)) into (AddI (CastII x) (CastII y)) > some CastII can be eliminated which improves address computation code. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - build fix - Merge branch 'master' into JDK-8256730 - step over CastII in range checks & push CastII thru add ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1342/files - new: https://git.openjdk.java.net/jdk/pull/1342/files/e70ea12b..76e6348d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1342&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1342&range=00-01 Stats: 71 lines in 9 files changed: 51 ins; 7 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/1342.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1342/head:pull/1342 PR: https://git.openjdk.java.net/jdk/pull/1342 From redestad at openjdk.java.net Fri Nov 20 13:53:13 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 20 Nov 2020 13:53:13 GMT Subject: RFR: 8256738: Compiler interface clean-up Message-ID: This patch removes dead code in the ci area. 
There's also a trivial micro-optimization to is_boxing_method/is_unboxing_method ------------- Commit messages: - instanceOop include needed for assert - Compiler interface clean-up Changes: https://git.openjdk.java.net/jdk/pull/1345/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1345&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256738 Stats: 216 lines in 21 files changed: 0 ins; 211 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/1345.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1345/head:pull/1345 PR: https://git.openjdk.java.net/jdk/pull/1345 From burban at openjdk.java.net Fri Nov 20 14:04:09 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Fri, 20 Nov 2020 14:04:09 GMT Subject: RFR: 8256633: Fix product build on Windows+Arm64 [v2] In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 10:52:53 GMT, Aleksey Shipilev wrote: >> Bernhard Urban-Forster has updated the pull request incrementally with one additional commit since the last revision: >> >> review feedback > > Looks fine. Not sure how sensible are negative shifts, but new code follows the current semantics exactly. Thank you Aleksey and Magnus. ------------- PR: https://git.openjdk.java.net/jdk/pull/1312 From aph at openjdk.java.net Fri Nov 20 14:29:06 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 20 Nov 2020 14:29:06 GMT Subject: RFR: 8256633: Fix product build on Windows+Arm64 [v2] In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 11:04:15 GMT, Bernhard Urban-Forster wrote: >> Fix this warning: >> C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): error C2220: the following warning is treated as an error >> C:\work\openjdk-jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(508): warning C4390: ';': empty controlled statement found; is this the intent? >> >> Thanks to @magicus to bring that to my attention. 
> > Bernhard Urban-Forster has updated the pull request incrementally with one additional commit since the last revision: > > review feedback src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 510: > 508: assert(_ext.shift() == (int)size, "bad shift"); > 509: } > 510: i->f(_ext.shift() > 0, 12); IMO, if this is to be fixed it should be fixed properly, viz: --- a/src/hotspot/share/utilities/debug.hpp +++ b/src/hotspot/share/utilities/debug.hpp @@ -45,7 +45,7 @@ bool handle_assert_poison_fault(const void* ucVoid, const void* faulting_address // assertions #ifndef ASSERT -#define vmassert(p, ...) +#define vmassert(p, ...) do { } while (0) #else ------------- PR: https://git.openjdk.java.net/jdk/pull/1312 From neliasso at openjdk.java.net Fri Nov 20 14:51:05 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 20 Nov 2020 14:51:05 GMT Subject: RFR: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 09:42:03 GMT, Roland Westrelin wrote: > By transforming (CastII (AddI x y)) into (AddI (CastII x) (CastII y)) some CastII can be eliminated which improves address computation code. Wasn't there a recent bugfix that did the opposite? I think it was in a different context but I will try to find out. ------------- PR: https://git.openjdk.java.net/jdk/pull/1342 From zgu at openjdk.java.net Fri Nov 20 14:51:13 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 20 Nov 2020 14:51:13 GMT Subject: RFR: 8249144: Potential memory leak in TypedMethodOptionMatcher Message-ID: TypedMethodOptionMatcher owns ccstr value (vs. os::strdup_check_oom()), but never frees it in the destructor. It does not appear to be a real issue so far, because TypedMethodOptionMatcher seems immortal. Given it releases the _option string, it should also release the ccstr value.
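The ownership rule described above can be sketched as follows. `OptionMatcherSketch` and `dup_cstr` are hypothetical names chosen for illustration; this is not the actual `TypedMethodOptionMatcher` code, and plain `malloc`/`free` stand in for HotSpot's own allocation helpers.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper standing in for a checked strdup: copies the string
// and aborts if the allocation fails.
static char* dup_cstr(const char* s) {
  char* copy = static_cast<char*>(std::malloc(std::strlen(s) + 1));
  if (copy == nullptr) std::abort();
  std::strcpy(copy, s);
  return copy;
}

// Hypothetical matcher that owns both its option name and a string value.
class OptionMatcherSketch {
  char* _option;     // owned copy of the option name
  char* _str_value;  // owned copy of the string value

 public:
  OptionMatcherSketch(const char* option, const char* value)
      : _option(dup_cstr(option)), _str_value(dup_cstr(value)) {}

  // Every string the object duplicated is released here, not just the name.
  ~OptionMatcherSketch() {
    std::free(_option);
    std::free(_str_value);
  }

  // Copying is disabled so that two objects never free the same buffers.
  OptionMatcherSketch(const OptionMatcherSketch&) = delete;
  OptionMatcherSketch& operator=(const OptionMatcherSketch&) = delete;

  const char* option() const { return _option; }
  const char* value() const { return _str_value; }
};
```

Even if such objects currently live for the whole process, freeing both buffers keeps the destructor consistent should shorter-lived instances appear later.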
This patch has been reviewed https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-July/042356.html, but I lost track of it during repo migration. ------------- Commit messages: - 8249144: Potential memory leak in TypedMethodOptionMatcher Changes: https://git.openjdk.java.net/jdk/pull/1353/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1353&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8249144 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1353.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1353/head:pull/1353 PR: https://git.openjdk.java.net/jdk/pull/1353 From iignatyev at openjdk.java.net Fri Nov 20 15:05:17 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 20 Nov 2020 15:05:17 GMT Subject: RFR: 8256569: Add C2 compiler stress flags to CTW [v2] In-Reply-To: <-jKb9GbQqIbHcSJCs5LXLYUSFVs1DZWa818tNANHFBw=.9a8711e1-aa5c-4227-92a8-b34ada4e6ff7@github.com> References: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> <-jKb9GbQqIbHcSJCs5LXLYUSFVs1DZWa818tNANHFBw=.9a8711e1-aa5c-4227-92a8-b34ada4e6ff7@github.com> Message-ID: On Fri, 20 Nov 2020 08:26:52 GMT, Roberto Castañeda Lozano wrote: >> Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: >> >> updated copyright year > > Thanks for doing this Igor, not a reviewer but looks good to me! thanks for the reviews, folks.
I've updated the copyright year before `/integrate`-ing ------------- PR: https://git.openjdk.java.net/jdk/pull/1332 From iignatyev at openjdk.java.net Fri Nov 20 15:05:16 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 20 Nov 2020 15:05:16 GMT Subject: RFR: 8256569: Add C2 compiler stress flags to CTW [v2] In-Reply-To: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> References: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> Message-ID: > Hi all, > > Could you please review this small patch which adds `-XX:+StressLCM`, `-XX:+StressGCM`, and `-XX:+StressIGVN` flags to the CTW test library? these flags have been found instrumental in increasing the probability to find C2 bugs. The patch also specifies `-XX:StressSeed` flag to keep CTW testing deterministic and reproducible. > > testing: > * [x] `test/hotspot/jtreg/applications/ctw/modules/java_base.java` on `macos-x64` > * [x] `test/hotspot/jtreg/testlibrary_tests/ctw/` on `macos-x64` > > Cheers, > -- Igor Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: updated copyright year ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1332/files - new: https://git.openjdk.java.net/jdk/pull/1332/files/9201384b..f470020f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1332&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1332&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1332.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1332/head:pull/1332 PR: https://git.openjdk.java.net/jdk/pull/1332 From iignatyev at openjdk.java.net Fri Nov 20 15:05:17 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 20 Nov 2020 15:05:17 GMT Subject: Integrated: 8256569: Add C2 compiler stress flags to CTW In-Reply-To: 
<7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> References: <7ed2QocmAIZMdttM5UXKqxpwUJ6K1VVZpydgm3Byc8E=.e7e66a25-4ad0-4a4c-a788-641473bbb15a@github.com> Message-ID: <3zk7Wpem63QE1MRdoR014LUdkmyUOy8_q5oUlAsTJlY=.b23e735a-49ad-4796-8abe-8d810a96e4ff@github.com> On Fri, 20 Nov 2020 00:04:31 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small patch which adds `-XX:+StressLCM`, `-XX:+StressGCM`, and `-XX:+StressIGVN` flags to the CTW test library? these flags have been found instrumental in increasing the probability to find C2 bugs. The patch also specifies `-XX:StressSeed` flag to keep CTW testing deterministic and reproducible. > > testing: > * [x] `test/hotspot/jtreg/applications/ctw/modules/java_base.java` on `macos-x64` > * [x] `test/hotspot/jtreg/testlibrary_tests/ctw/` on `macos-x64` > > Cheers, > -- Igor This pull request has now been integrated. Changeset: ff00c591 Author: Igor Ignatyev URL: https://git.openjdk.java.net/jdk/commit/ff00c591 Stats: 9 lines in 1 file changed: 8 ins; 0 del; 1 mod 8256569: Add C2 compiler stress flags to CTW Reviewed-by: kvn, shade, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1332 From neliasso at openjdk.java.net Fri Nov 20 15:08:07 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 20 Nov 2020 15:08:07 GMT Subject: RFR: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 14:48:29 GMT, Nils Eliasson wrote: >> By transforming (CastII (AddI x y)) into (AddI (CastII x) (CastII y)) some CastII can be eliminated which improves address computation code. > Wasn't there a recent bugfix that did the opposite? I think it was in a different context but I will try to find out. False alarm. It was a completely different problem. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1342 From neliasso at openjdk.java.net Fri Nov 20 15:18:06 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 20 Nov 2020 15:18:06 GMT Subject: RFR: 8256738: Compiler interface clean-up In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 11:24:51 GMT, Claes Redestad wrote: > This patch removes dead code in the ci area. > > There's also a trivial micro-optimization to is_boxing_method/is_unboxing_method Nice clean up! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1345 From rcastanedalo at openjdk.java.net Fri Nov 20 15:20:09 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 20 Nov 2020 15:20:09 GMT Subject: RFR: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well [v2] In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 13:35:24 GMT, Roland Westrelin wrote: >> This was reported by Paul with the vector API. There are 2 issues: >> >> - CastII nodes (added by Objects.checkIndex()) gets in the way of the >> pattern matching performed by range check elimination >> >> - By transforming (CastII (AddI x y)) into (AddI (CastII x) (CastII y)) >> some CastII can be eliminated which improves address computation code. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: > > - build fix > - Merge branch 'master' into JDK-8256730 > - step over CastII in range checks & push CastII thru add src/hotspot/share/opto/castnode.cpp line 269: > 267: _carry_dependency, _range_check_dependency); > 268: cx->set_req(0, in(0)); > 269: cx = phase->transform(cx); Hi Roland, have you tested that these calls to `transform()` do not create an explosion of recursive `Ideal()` calls in the face of long AddI chains (see https://github.com/openjdk/jdk/pull/727). ------------- PR: https://git.openjdk.java.net/jdk/pull/1342 From kvn at openjdk.java.net Fri Nov 20 15:28:05 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 20 Nov 2020 15:28:05 GMT Subject: RFR: 8256073: Improve vector rematerialization support In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 18:16:06 GMT, Vladimir Kozlov wrote: >> Having #1131, #1132, and #1134 in place, the only missing piece left to have vector rematerialization fully working is support of non-contiguous vector values in vector rematerialization logic. This patch covers that. >> >> Current version makes the assumption that vector values are contiguously laid in memory. It's the case for on-stack locations, but for in-register values it's not the case (at least, on x86). Rewritten version doesn't make such assumption for in-register case anymore and processes every vector element independently. >> >> (Along the way, the refactoring fixes a bug when handling a corner case: the case when a vector instance is scalarized by EA and the primitive array field (VectorPayload.payload) has a constant value (NULL) is erroneously treated as requiring custom rematerialization and it hits an assert.) >> >> Testing (with other relevant patches): >> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX={3,2,1,0} on AVX512-capable hardware >> - [x] hs-precheckin-comp, hs-tier1, hs-tier2 > > Changes requested by kvn (Reviewer). 
After thinking more on this, a separate method could complicate changes. At least add a comment. ------------- PR: https://git.openjdk.java.net/jdk/pull/1136 From roland at openjdk.java.net Fri Nov 20 15:29:05 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 20 Nov 2020 15:29:05 GMT Subject: RFR: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well [v2] In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 15:17:23 GMT, Roberto Castañeda Lozano wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - build fix >> - Merge branch 'master' into JDK-8256730 >> - step over CastII in range checks & push CastII thru add > > src/hotspot/share/opto/castnode.cpp line 269: > >> 267: _carry_dependency, _range_check_dependency); >> 268: cx->set_req(0, in(0)); >> 269: cx = phase->transform(cx); > > Hi Roland, have you tested that these calls to `transform()` do not create an explosion of recursive `Ideal()` calls in the face of long AddI chains (see https://github.com/openjdk/jdk/pull/727). I wondered about that but had no issue during testing. Let me see if I can change your test case to have recursive Ideal() calls. ------------- PR: https://git.openjdk.java.net/jdk/pull/1342 From vlivanov at openjdk.java.net Fri Nov 20 16:29:21 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 16:29:21 GMT Subject: RFR: 8256073: Improve vector rematerialization support [v2] In-Reply-To: References: Message-ID: <-HXMY4vlwmYGptC55Q-beOemiNZP02AqHvSotP0nokc=.4348fc09-e0ec-4eba-a316-911afc93142b@github.com> > Having #1131, #1132, and #1134 in place, the only missing piece left to have vector rematerialization fully working is support of non-contiguous vector values in vector rematerialization logic.
This patch covers that. > > Current version makes the assumption that vector values are contiguously laid in memory. It's the case for on-stack locations, but for in-register values it's not the case (at least, on x86). Rewritten version doesn't make such assumption for in-register case anymore and processes every vector element independently. > > (Along the way, the refactoring fixes a bug when handling a corner case: the case when a vector instance is scalarized by EA and the primitive array field (VectorPayload.payload) has a constant value (NULL) is erroneously treated as requiring custom rematerialization and it hits an assert.) > > Testing (with other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX={3,2,1,0} on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Address review comments. 
- Merge branch 'master' into 8256073.rematerialization - Improve vector rematerialization support ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1136/files - new: https://git.openjdk.java.net/jdk/pull/1136/files/df58c86c..0f751960 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1136&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1136&range=00-01 Stats: 31298 lines in 688 files changed: 18523 ins; 8201 del; 4574 mod Patch: https://git.openjdk.java.net/jdk/pull/1136.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1136/head:pull/1136 PR: https://git.openjdk.java.net/jdk/pull/1136 From vlivanov at openjdk.java.net Fri Nov 20 16:29:22 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 16:29:22 GMT Subject: RFR: 8256073: Improve vector rematerialization support [v2] In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 15:24:50 GMT, Vladimir Kozlov wrote: > After thinking more on this a separate method could complicate changes. At least add a comment. Added. Frankly speaking, I was on a fence when touching that code, but found it cleaner to have masks and vectors initialized in the same place. ------------- PR: https://git.openjdk.java.net/jdk/pull/1136 From vlivanov at openjdk.java.net Fri Nov 20 16:29:24 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 16:29:24 GMT Subject: RFR: 8256073: Improve vector rematerialization support [v2] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 10:52:06 GMT, Tobias Hartmann wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Address review comments. 
>> - Merge branch 'master' into 8256073.rematerialization >> - Improve vector rematerialization support > > src/hotspot/share/prims/vectorSupport.cpp line 96: > >> 94: case T_DOUBLE: arr->bool_at_put(index, (*(jlong*)addr) != 0); break; >> 95: >> 96: default: assert(false, "unsupported: %s", type2name(elem_bt)); > > Why did you replace `fatal` by `assert`? Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/1136 From psandoz at openjdk.java.net Fri Nov 20 16:39:06 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Fri, 20 Nov 2020 16:39:06 GMT Subject: RFR: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 15:05:35 GMT, Nils Eliasson wrote: >>> By transforming (CastII (AddI x y)) into (AddI (CastII x) (CastII y)) >> some CastII can be eliminated which improves address computation code. >> >> Wasn't there a recent bugfix that did the opposite? I think it was in a different context but I will try to find out. > >>> By transforming (CastII (AddI x y)) into (AddI (CastII x) (CastII y)) > some CastII can be eliminated which improves address computation code. > >> Wasn't there a recent bugfix that did the opposite? I think it was in a different context but I will try to find out. > > False alarm. It was a completely different problem. Code generation on x86 is good. 
It also improves other cases, for example https://github.com/richardstartin/vectorbenchmarks/blob/master/src/main/java/com/openkappa/panama/vectorbenchmarks/BitmapLogicals.java#L69 ------------- PR: https://git.openjdk.java.net/jdk/pull/1342 From vlivanov at openjdk.java.net Fri Nov 20 17:16:06 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 17:16:06 GMT Subject: RFR: 8256056: Deoptimization stub doesn't save vector registers on x86 In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 01:17:58 GMT, Vladimir Kozlov wrote: >> Deoptimization stub doesn't save vector registers on x86 and it may break vector rematerialization when a vector value (produced through Vector API) ends up in a register (and not spilled on stack). >> >> The fix unconditionally saves all wide vector registers when going through deopt stub. If there are any performance concerns with that, it's possible to introduce a dedicated variant and choose between 2 versions when patching nmethods (like what happens with safepoint stubs). But deoptimization is already quite expensive to save much on that, so I decided to keep things as is. >> >> As a cleanup, I made `save_vectors` parameter explicit to make the intentions clearer. >> >> Testing (with some other relevant patches): >> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware >> - [x] hs-precheckin-comp, hs-tier1, hs-tier2 > > Okay. Thanks for the reviews, Claes and Vladimir.
------------- PR: https://git.openjdk.java.net/jdk/pull/1134 From vlivanov at openjdk.java.net Fri Nov 20 17:16:28 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 17:16:28 GMT Subject: RFR: 8255367: C2: Deoptimization during vector box construction is broken [v2] In-Reply-To: <5zCMxKvPC0ssR26TpxkHQ6r1OXBVr9k7eE8W8nCpuSM=.a4f867e5-5619-48ee-ba57-07c77e1ee550@github.com> References: <5zCMxKvPC0ssR26TpxkHQ6r1OXBVr9k7eE8W8nCpuSM=.a4f867e5-5619-48ee-ba57-07c77e1ee550@github.com> Message-ID: > Vector box allocation is a multi-step process: it involves 2 allocations (typed vector instance + primitive array) and 2 initializing stores (vector value store into primitive array and field initialization in typed vector instance). If deoptimization happens at any of the aforementioned allocation points the result is broken: either wrong instance is put on stack (primitive array instead of typed vector) or vector initializing store is missing. > > There are 2 ways to fix the problem: > - piggy-back on rematerialization; > - reexecute the bytecode which allocates the instance. > > I chose the latter option because it's simpler to implement. (Rematerialization requires some adjustment of JVM state associated with each allocation to record vector type and value.) > > The downside is there shouldn't be any side effects present. It's not a problem right now, because boxing happens only at vector intrinsics use sites and the only intrinsic which has any side effects is vector store operation (it doesn't produce vectors, hence, no boxing needed). > > The actual fix is small: adding `PreserveReexecuteState` in `LibraryCallKit::box_vector` is enough for the problem to go away. The rest is cleanups/refactorings. > > Testing: > - [x] jdk/incubator/vector tests w/ -XX:+DeoptimizeALot & -XX:UseAVX={3,2,1,0} > - [ ] hs-precheckin-comp, hs-tier1, hs-tier2 > > Thanks! Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into 8255367.vector_box - Trivial cleanup - Fix deoptimization during vector boxing ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1126/files - new: https://git.openjdk.java.net/jdk/pull/1126/files/f13f7285..46e81f73 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1126&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1126&range=00-01 Stats: 31289 lines in 687 files changed: 18516 ins; 8201 del; 4572 mod Patch: https://git.openjdk.java.net/jdk/pull/1126.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1126/head:pull/1126 PR: https://git.openjdk.java.net/jdk/pull/1126 From vlivanov at openjdk.java.net Fri Nov 20 17:16:30 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 17:16:30 GMT Subject: RFR: 8255367: C2: Deoptimization during vector box construction is broken [v2] In-Reply-To: References: <5zCMxKvPC0ssR26TpxkHQ6r1OXBVr9k7eE8W8nCpuSM=.a4f867e5-5619-48ee-ba57-07c77e1ee550@github.com> Message-ID: <43T1hqKrAriG9SewTb78Nay-3cKzx71IT5mqzmYu_e4=.e378a1dd-d61f-4d08-9152-4935a017a3b8@github.com> On Tue, 17 Nov 2020 10:11:39 GMT, Tobias Hartmann wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into 8255367.vector_box >> - Trivial cleanup >> - Fix deoptimization during vector boxing > > Looks good to me. Thanks for the review, Tobias. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1126 From vlivanov at openjdk.java.net Fri Nov 20 17:17:07 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 17:17:07 GMT Subject: RFR: 8256058: Improve vector register handling in RegisterMap::pd_location() on x86 In-Reply-To: References: <24Ohbh-GTkS0HUALP3xJi3RPILa63LFMPV_N6IJ9s8o=.34f1b8f0-08d2-4a39-bdbc-2b06ba6df1e0@github.com> Message-ID: On Tue, 10 Nov 2020 00:14:37 GMT, Vladimir Kozlov wrote: >> `RegisterMap` handles registers on a per-slot basis: every register is split into slot-sized (32-bit) parts that are tracked independently. On x86 vector registers are up to 512-bit in size and occupy up to 16 slots. In order to save on constructing RegisterMaps, `RegisterMap`s are sparsely populated: location of the first slot is recorded and the rest is derived from it and `RegisterMap::pd_location()` is used to compute the address of a particular slot if it is missing in the map. >> >> As I mentioned in #1131, frame layout for vector registers is quite complicated: ZMM0-15 are split in 3 parts (2 x 128-bit + 1 x 256-bit) when saved while ZMM16-31 are stored in full. >> >> Proposed patch reifies those details in `RegisterMap::pd_location()` logic and it becomes possible to initialize just 3 slot locations for ZMM0-15 to be able to recover every slot location inside the register while for ZMM16-31 initializing a single (base) slot is enough. >> >> Testing (along with some other relevant patches): >> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware >> - [x] hs-precheckin-comp, hs-tier1, hs-tier2 > > Good Thanks for the review, Vladimir. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1132 From vlivanov at openjdk.java.net Fri Nov 20 17:17:08 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 17:17:08 GMT Subject: RFR: 8256061: RegisterSaver::save_live_registers() omits upper halves of ZMM0-15 registers In-Reply-To: <-F5bYNjrIidP-2Jeln2nLg-Vr9VJNKqDk2PUUNMCNGA=.20bfdfea-65cf-4f32-8b7c-c8a6621cf6e0@github.com> References: <3mXn6Hal7wiro0H9-_iK5qI_FCnQlTw0k051aqouKZU=.f2ec8278-df12-4157-ab5e-a18b716e56df@github.com> <-F5bYNjrIidP-2Jeln2nLg-Vr9VJNKqDk2PUUNMCNGA=.20bfdfea-65cf-4f32-8b7c-c8a6621cf6e0@github.com> Message-ID: On Tue, 10 Nov 2020 00:21:20 GMT, Vladimir Kozlov wrote: >> `YMM0-15` registers are handled specially when CPU registers are saved. They are split in 2 parts (128-bit each) and put in different parts of the frame (see `RegisterSaver::layout` for details). AVX512 adds 16 more vector registers (ZMM16-31) and those are saved full-sized in a separate region. But `RegisterSaver::save_live_registers()` doesn't do anything special for `ZMM0-15` and their upper halves are lost (though there's space reserved for them in the frame). >> >> The fix adds missing logic which saves upper halves (256-bit in size) of ZMM0-15 registers. Thus every ZMM0-15 register ends up split into 3 parts which are stored independently in the frame. >> >> Testing (with some other relevant patches): >> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware >> - [x] hs-precheckin-comp, hs-tier1, hs-tier2 > > Good Thanks for the reviews, Sandhya and Vladimir. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1131 From dcubed at openjdk.java.net Fri Nov 20 17:36:13 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 20 Nov 2020 17:36:13 GMT Subject: RFR: 8256803: ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64 Message-ID: A trivial fix to ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64. ------------- Commit messages: - 8256803: ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64 Changes: https://git.openjdk.java.net/jdk/pull/1356/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1356&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256803 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1356.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1356/head:pull/1356 PR: https://git.openjdk.java.net/jdk/pull/1356 From mikael at openjdk.java.net Fri Nov 20 17:49:07 2020 From: mikael at openjdk.java.net (Mikael Vidstedt) Date: Fri, 20 Nov 2020 17:49:07 GMT Subject: RFR: 8256803: ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64 In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 17:29:51 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64. Marked as reviewed by mikael (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1356 From dcubed at openjdk.java.net Fri Nov 20 18:02:08 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 20 Nov 2020 18:02:08 GMT Subject: Integrated: 8256803: ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64 In-Reply-To: References: Message-ID: <0ZBaWJErw5EYMePhS4uuydx5CnFSNI5lEDtJFOM-oyc=.e8d6ec9e-8328-4d67-a8e2-fedca3f0e7eb@github.com> On Fri, 20 Nov 2020 17:29:51 GMT, Daniel D. 
Daugherty wrote: > A trivial fix to ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64. This pull request has now been integrated. Changeset: 4dd71ae1 Author: Daniel D. Daugherty URL: https://git.openjdk.java.net/jdk/commit/4dd71ae1 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8256803: ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64 Reviewed-by: mikael ------------- PR: https://git.openjdk.java.net/jdk/pull/1356 From dcubed at openjdk.java.net Fri Nov 20 18:02:06 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 20 Nov 2020 18:02:06 GMT Subject: RFR: 8256803: ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64 In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 17:45:55 GMT, Mikael Vidstedt wrote: >> A trivial fix to ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64. > > Marked as reviewed by mikael (Reviewer). @vidmik - Thanks for the fast review! ------------- PR: https://git.openjdk.java.net/jdk/pull/1356 From kvn at openjdk.java.net Fri Nov 20 18:39:04 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 20 Nov 2020 18:39:04 GMT Subject: RFR: 8256738: Compiler interface clean-up In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 11:24:51 GMT, Claes Redestad wrote: > This patch removes dead code in the ci area. > > There's also a trivial micro-optimization to is_boxing_method/is_unboxing_method Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1345 From kvn at openjdk.java.net Fri Nov 20 18:43:07 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 20 Nov 2020 18:43:07 GMT Subject: RFR: 8256073: Improve vector rematerialization support [v2] In-Reply-To: <-HXMY4vlwmYGptC55Q-beOemiNZP02AqHvSotP0nokc=.4348fc09-e0ec-4eba-a316-911afc93142b@github.com> References: <-HXMY4vlwmYGptC55Q-beOemiNZP02AqHvSotP0nokc=.4348fc09-e0ec-4eba-a316-911afc93142b@github.com> Message-ID: On Fri, 20 Nov 2020 16:29:21 GMT, Vladimir Ivanov wrote: >> Having #1131, #1132, and #1134 in place, the only missing piece left to have vector rematerialization fully working is support of non-contiguous vector values in vector rematerialization logic. This patch covers that. >> >> Current version makes the assumption that vector values are contiguously laid in memory. It's the case for on-stack locations, but for in-register values it's not the case (at least, on x86). Rewritten version doesn't make such assumption for in-register case anymore and processes every vector element independently. >> >> (Along the way, the refactoring fixes a bug when handling a corner case: the case when a vector instance is scalarized by EA and the primitive array field (VectorPayload.payload) has a constant value (NULL) is erroneously treated as requiring custom rematerialization and it hits an assert.) >> >> Testing (with other relevant patches): >> - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX={3,2,1,0} on AVX512-capable hardware >> - [x] hs-precheckin-comp, hs-tier1, hs-tier2 > > Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Address review comments. 
> - Merge branch 'master' into 8256073.rematerialization > - Improve vector rematerialization support Marked as reviewed by kvn (Reviewer). src/hotspot/share/prims/vectorSupport.cpp line 94: > 92: // (In generated code, the conversion is performed by VectorStoreMask.) > 93: // > 94: // TODO: revisit when predicate registers are fully supported. Good and thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/1136 From kvn at openjdk.java.net Fri Nov 20 19:03:11 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 20 Nov 2020 19:03:11 GMT Subject: RFR: 8256508: Improve CompileCommand flag In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 21:41:23 GMT, Nils Eliasson wrote: > The current implementation of compile command has two types of options. Types pre-defined options like "compileonly" and the general 'option' type. > > 'option'-type are not defined, they can accidentally be used with the wrong value type, and the syntax is prone to error. > > By pre-defining all compile commands used and giving them types the parsing can be simplified, proper parsing errors can be given and a reasonable syntax can be used. > > This: > -XX:CompileCommand=option,java/util/String.toString,int,RepeatCompilation,5 > > Is superseded by: > -XX:CompileCommand=RepeatCompilation,java/util/String.toString,5 > > Attention check: Did you spot the error in the old command? > > In order not to break anything - the old syntax is kept for now. But even the old command format is improved with verification for the option name and the type of the value. Good. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1276 From kvn at openjdk.java.net Fri Nov 20 19:14:11 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 20 Nov 2020 19:14:11 GMT Subject: RFR: 8255367: C2: Deoptimization during vector box construction is broken [v2] In-Reply-To: References: <5zCMxKvPC0ssR26TpxkHQ6r1OXBVr9k7eE8W8nCpuSM=.a4f867e5-5619-48ee-ba57-07c77e1ee550@github.com> Message-ID: On Fri, 20 Nov 2020 17:16:28 GMT, Vladimir Ivanov wrote: >> Vector box allocation is a multi-step process: it involves 2 allocations (typed vector instance + primitive array) and 2 initializing stores (vector value store into primitive array and field initialization in typed vector instance). If deoptimization happens at any of the aforementioned allocation points the result is broken: either wrong instance is put on stack (primitive array instead of typed vector) or vector initializing store is missing. >> >> There are 2 ways to fix the problem: >> - piggy-back on rematerialization; >> - reexecute the bytecode which allocates the instance. >> >> I chose the latter option because it's simpler to implement. (Rematerialization requires some adjustment of JVM state associated with each allocation to record vector type and value.) >> >> The downside is there shouldn't be any side effects present. It's not a problem right now, because boxing happens only at vector intrinsics use sites and the only intrinsic which has any side effects is vector store operation (it doesn't produce vectors, hence, no boxing needed). >> >> The actual fix is small: adding `PreserveReexecuteState` in `LibraryCallKit::box_vector` is enough for the problem to go away. The rest is cleanups/refactorings. >> >> Testing: >> - [x] jdk/incubator/vector tests w/ -XX:+DeoptimizeALot & -XX:UseAVX={3,2,1,0} >> - [ ] hs-precheckin-comp, hs-tier1, hs-tier2 >> >> Thanks! > > Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into 8255367.vector_box > - Trivial cleanup > - Fix deoptimization during vector boxing Good. The only concern with such approach is that array could be allocated twice if deoptimization happens during Vector object allocation (after compiled code allocated array). We saw some JVMTI issues with Graal which too does reallocation in some cases during deoptimization. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1126 From vlivanov at openjdk.java.net Fri Nov 20 20:04:06 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 20:04:06 GMT Subject: RFR: 8256073: Improve vector rematerialization support [v2] In-Reply-To: References: <-HXMY4vlwmYGptC55Q-beOemiNZP02AqHvSotP0nokc=.4348fc09-e0ec-4eba-a316-911afc93142b@github.com> Message-ID: <6c95SWJXmJ-gkfQPNdLcSgj4lstn6rc7bh1qyPs8_Vo=.a7c888c1-330f-4804-b726-39fb73f769c7@github.com> On Fri, 20 Nov 2020 18:40:28 GMT, Vladimir Kozlov wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Address review comments. >> - Merge branch 'master' into 8256073.rematerialization >> - Improve vector rematerialization support > > Marked as reviewed by kvn (Reviewer). Thanks for the reviews, Tobias and Vladimir. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1136 From vlivanov at openjdk.java.net Fri Nov 20 21:18:06 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 21:18:06 GMT Subject: Integrated: 8256058: Improve vector register handling in RegisterMap::pd_location() on x86 In-Reply-To: <24Ohbh-GTkS0HUALP3xJi3RPILa63LFMPV_N6IJ9s8o=.34f1b8f0-08d2-4a39-bdbc-2b06ba6df1e0@github.com> References: <24Ohbh-GTkS0HUALP3xJi3RPILa63LFMPV_N6IJ9s8o=.34f1b8f0-08d2-4a39-bdbc-2b06ba6df1e0@github.com> Message-ID: On Mon, 9 Nov 2020 16:54:56 GMT, Vladimir Ivanov wrote: > `RegisterMap` handles registers on a per-slot basis: every register is split into slot-sized (32-bit) parts that are tracked independently. On x86 vector registers are up to 512-bit in size and occupy up to 16 slots. In order to save on constructing RegisterMaps, `RegisterMap`s are sparsely populated: location of the first slot is recorded and the rest is derived from it and `RegisterMap::pd_location()` is used to compute the address of a particular slot if it is missing in the map. > > As I mentioned in #1131, frame layout for vector registers is quite complicated: ZMM0-15 are split in 3 parts (2 x 128-bit + 1 x 256-bit) when saved while ZMM16-31 are stored in full. > > Proposed patch reifies those details in `RegisterMap::pd_location()` logic and it becomes possible to initialize just 3 slot locations for ZMM0-15 to be able to recover every slot location inside the register while for ZMM16-31 initializing a single (base) slot is enough. > > Testing (along with some other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 This pull request has now been integrated. 
Changeset: e6fa85b4 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/e6fa85b4 Stats: 35 lines in 1 file changed: 19 ins; 7 del; 9 mod 8256058: Improve vector register handling in RegisterMap::pd_location() on x86 Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1132 From vlivanov at openjdk.java.net Fri Nov 20 21:19:05 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 21:19:05 GMT Subject: Integrated: 8256056: Deoptimization stub doesn't save vector registers on x86 In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 18:59:30 GMT, Vladimir Ivanov wrote: > Deoptimization stub doesn't save vector registers on x86 and it may break vector rematerialization when a vector value (produced through Vector API) ends up in a register (and not spilled on stack). > > The fix unconditionally saves all wide vector registers when going through deopt stub. If there are any performance concerns with that, it's possible to introduce a dedicated variant and choose between 2 versions when patching nmethods (like what happens with safepoint stubs). But deoptimization is already quite expensive to save much on that, so I decided to keep things as is. > > As a cleanup, I made `save_vectors` parameter explicit to make the intentions clearer. > > Testing (with some other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 This pull request has now been integrated. 
Changeset: 503590f6 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/503590f6 Stats: 16 lines in 1 file changed: 5 ins; 1 del; 10 mod 8256056: Deoptimization stub doesn't save vector registers on x86 Reviewed-by: redestad, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1134 From vlivanov at openjdk.java.net Fri Nov 20 21:19:06 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 21:19:06 GMT Subject: Integrated: 8255367: C2: Deoptimization during vector box construction is broken In-Reply-To: <5zCMxKvPC0ssR26TpxkHQ6r1OXBVr9k7eE8W8nCpuSM=.a4f867e5-5619-48ee-ba57-07c77e1ee550@github.com> References: <5zCMxKvPC0ssR26TpxkHQ6r1OXBVr9k7eE8W8nCpuSM=.a4f867e5-5619-48ee-ba57-07c77e1ee550@github.com> Message-ID: On Mon, 9 Nov 2020 13:35:35 GMT, Vladimir Ivanov wrote: > Vector box allocation is a multi-step process: it involves 2 allocations (typed vector instance + primitive array) and 2 initializing stores (vector value store into primitive array and field initialization in typed vector instance). If deoptimization happens at any of the aforementioned allocation points the result is broken: either wrong instance is put on stack (primitive array instead of typed vector) or vector initializing store is missing. > > There are 2 ways to fix the problem: > - piggy-back on rematerialization; > - reexecute the bytecode which allocates the instance. > > I chose the latter option because it's simpler to implement. (Rematerialization requires some adjustment of JVM state associated with each allocation to record vector type and value.) > > The downside is there shouldn't be any side effects present. It's not a problem right now, because boxing happens only at vector intrinsics use sites and the only intrinsic which has any side effects is vector store operation (it doesn't produce vectors, hence, no boxing needed). 
> > The actual fix is small: adding `PreserveReexecuteState` in `LibraryCallKit::box_vector` is enough for the problem to go away. The rest is cleanups/refactorings. > > Testing: > - [x] jdk/incubator/vector tests w/ -XX:+DeoptimizeALot & -XX:UseAVX={3,2,1,0} > - [ ] hs-precheckin-comp, hs-tier1, hs-tier2 > > Thanks! This pull request has now been integrated. Changeset: 41c05876 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/41c05876 Stats: 171 lines in 4 files changed: 79 ins; 84 del; 8 mod 8255367: C2: Deoptimization during vector box construction is broken Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1126 From vlivanov at openjdk.java.net Fri Nov 20 21:19:07 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 21:19:07 GMT Subject: Integrated: 8256061: RegisterSaver::save_live_registers() omits upper halves of ZMM0-15 registers In-Reply-To: <3mXn6Hal7wiro0H9-_iK5qI_FCnQlTw0k051aqouKZU=.f2ec8278-df12-4157-ab5e-a18b716e56df@github.com> References: <3mXn6Hal7wiro0H9-_iK5qI_FCnQlTw0k051aqouKZU=.f2ec8278-df12-4157-ab5e-a18b716e56df@github.com> Message-ID: On Mon, 9 Nov 2020 16:44:23 GMT, Vladimir Ivanov wrote: > `YMM0-15` registers are handled specially when CPU registers are saved. They are split in 2 parts (128-bit each) and put in different parts of the frame (see `RegisterSaver::layout` for details). AVX512 adds 16 more vector registers (ZMM16-31) and those are saved full-sized in a separate region. But `RegisterSaver::save_live_registers()` doesn't do anything special for `ZMM0-15` and their upper halves are lost (though there's space reserved for them in the frame). > > The fix adds missing logic which saves upper halves (256-bit in size) of ZMM0-15 registers. Thus every ZMM0-15 register ends up split into 3 parts which are stored independently in the frame. 
> > Testing (with some other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX=3 on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 This pull request has now been integrated. Changeset: f79e9d45 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/f79e9d45 Stats: 23 lines in 1 file changed: 14 ins; 0 del; 9 mod 8256061: RegisterSaver::save_live_registers() omits upper halves of ZMM0-15 registers Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1131 From vlivanov at openjdk.java.net Fri Nov 20 21:21:12 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 20 Nov 2020 21:21:12 GMT Subject: Integrated: 8256073: Improve vector rematerialization support In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 21:46:00 GMT, Vladimir Ivanov wrote: > Having #1131, #1132, and #1134 in place, the only missing piece left to have vector rematerialization fully working is support of non-contiguous vector values in vector rematerialization logic. This patch covers that. > > Current version makes the assumption that vector values are contiguously laid in memory. It's the case for on-stack locations, but for in-register values it's not the case (at least, on x86). Rewritten version doesn't make such assumption for in-register case anymore and processes every vector element independently. > > (Along the way, the refactoring fixes a bug when handling a corner case: the case when a vector instance is scalarized by EA and the primitive array field (VectorPayload.payload) has a constant value (NULL) is erroneously treated as requiring custom rematerialization and it hits an assert.) > > Testing (with other relevant patches): > - [x] jdk/incubator/vector w/ -XX:+DeoptimizeALot and -XX:UseAVX={3,2,1,0} on AVX512-capable hardware > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 This pull request has now been integrated. 
Changeset: 57025e65 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/57025e65 Stats: 142 lines in 2 files changed: 38 ins; 69 del; 35 mod 8256073: Improve vector rematerialization support Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1136 From vladimir.x.ivanov at oracle.com Fri Nov 20 21:50:51 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 21 Nov 2020 00:50:51 +0300 Subject: RFR: 8255367: C2: Deoptimization during vector box construction is broken [v2] In-Reply-To: References: <5zCMxKvPC0ssR26TpxkHQ6r1OXBVr9k7eE8W8nCpuSM=.a4f867e5-5619-48ee-ba57-07c77e1ee550@github.com> Message-ID: <80048edd-414e-e08b-a2a8-4ec2cbf5a862@oracle.com> Thanks for the review, Vladimir. > The only concern with such approach is that array could be allocated twice if deoptimization happens during Vector object allocation (after compiled code allocated array). We saw some JVMTI issues with Graal which too does reallocation in some cases during deoptimization. I assume it caused some test failures where tests verify that exact number of allocations performed. Considering vectors are aggressively scalarized and reboxed, it's already hard to predict exact boxing behavior. In other words, the number of allocations already drastically differs between execution modes (C2 vs C1/interpreter).
Best regards, Vladimir Ivanov From xliu at openjdk.java.net Sat Nov 21 09:30:15 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 21 Nov 2020 09:30:15 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 08:48:28 GMT, Nils Eliasson wrote: >> 8247732: validate user-input intrinsic_ids in ControlIntrinsic > > src/hotspot/share/compiler/compilerDirectives.hpp line 198: > >> 196: if (vmIntrinsics::_none == vmIntrinsics::find_id(*iter)) { >> 197: _bad = NEW_C_HEAP_ARRAY(char, strlen(*iter) + 1, mtCompiler); >> 198: strncpy(_bad, *iter, strlen(*iter) + 1); > > This doesn't compile. Using strlen as an argument to strncpy is disallowed. > >> "warning: 'char* __builtin_strncpy(char*, const char*, long unsigned int)' specified bound depends on the length of the source argument [-Wstringop-overflow=]" > > Do a min between strlen and the maximum allowed length. > > Fix this for both uses of the string length (row 197 and 198). Out of curiosity, what kind of gcc do you use? I am using gcc/g++-8.4.0 and I do append `--with-extra-cxxflags='-Wstringop-overflow -Wstringop-truncation'`, why can't I trigger this warning? I read this. https://developers.redhat.com/blog/2018/05/24/detecting-string-truncation-with-gcc-8/ let me try to fix it by replacing strncpy with strcpy. ------------- PR: https://git.openjdk.java.net/jdk/pull/1179 From aph at redhat.com Sat Nov 21 09:40:29 2020 From: aph at redhat.com (Andrew Haley) Date: Sat, 21 Nov 2020 09:40:29 +0000 Subject: RFR: 8256803: ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64 In-Reply-To: References: Message-ID: <6c3210d7-b318-6473-0e85-d1ef7db38277@redhat.com> On 11/20/20 5:36 PM, Daniel D.Daugherty wrote: > A trivial fix to ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64. Please post the output of the test failure. What was the hardware? 
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From xliu at openjdk.java.net Sat Nov 21 10:03:40 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 21 Nov 2020 10:03:40 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic [v2] In-Reply-To: References: Message-ID: > 8247732: validate user-input intrinsic_ids in ControlIntrinsic Xin Liu has updated the pull request incrementally with 174 additional commits since the last revision: - 8247732: validate user-input intrinsic_ids in ControlIntrinsic avoid a warning of stringop-overflow - 8254105: allow static nested declarations Reviewed-by: mcimadamore - 8255934: JConsole 14 and greater fails to connect to older JVM Reviewed-by: cjplummer, sspitsyn - 8256806: Shenandoah: optimize shenandoah/jni/TestPinnedGarbage.java test Reviewed-by: rkennke - 8256073: Improve vector rematerialization support Reviewed-by: thartmann, kvn - 8255367: C2: Deoptimization during vector box construction is broken Reviewed-by: thartmann, kvn - 8256061: RegisterSaver::save_live_registers() omits upper halves of ZMM0-15 registers Reviewed-by: kvn - 8256056: Deoptimization stub doesn't save vector registers on x86 Reviewed-by: redestad, kvn - 8256058: Improve vector register handling in RegisterMap::pd_location() on x86 Reviewed-by: kvn - 8256183: InputStream.skipNBytes is missing @since 12 Reviewed-by: dfuchs, lancea, bpb - ... 
and 164 more: https://git.openjdk.java.net/jdk/compare/22390787...eb0d4c18 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1179/files - new: https://git.openjdk.java.net/jdk/pull/1179/files/22390787..eb0d4c18 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1179&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1179&range=00-01 Stats: 128687 lines in 989 files changed: 62138 ins; 51176 del; 15373 mod Patch: https://git.openjdk.java.net/jdk/pull/1179.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1179/head:pull/1179 PR: https://git.openjdk.java.net/jdk/pull/1179 From xliu at openjdk.java.net Sat Nov 21 10:04:36 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 21 Nov 2020 10:04:36 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic [v3] In-Reply-To: References: Message-ID: <_xC9_bq4ZQJOywx2HbhJ3srrsneg4r-g4ZOc3Zb3KIw=.8b7afae9-537b-4274-8753-0c781650b047@github.com> > 8247732: validate user-input intrinsic_ids in ControlIntrinsic Xin Liu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits:

 - 8247732: validate user-input intrinsic_ids in ControlIntrinsic
   avoid a warning of stringop-overflow
 - 8247732: validate user-input intrinsic_ids in ControlIntrinsic

-------------

Changes: https://git.openjdk.java.net/jdk/pull/1179/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1179&range=02
  Stats: 545 lines in 31 files changed: 522 ins; 2 del; 21 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1179.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1179/head:pull/1179

PR: https://git.openjdk.java.net/jdk/pull/1179

From vladimir.kozlov at oracle.com Sat Nov 21 17:13:02 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Sat, 21 Nov 2020 09:13:02 -0800
Subject: Some questions on intrinsic for UTF8 to UTF16 decoding
In-Reply-To:
References:
Message-ID: <59858a23-8ab0-ead7-4d15-757103e07d51@oracle.com>

Hi Ludovic,

On 11/20/20 3:01 AM, Ludovic Henry wrote:
> Hi,
>
> I've started to implement an intrinsic to vectorize the decoding (and soon encoding) of UTF-8 to UTF-16. My current work in progress is at [1]. However, I'm running into limitations in my knowledge of Hotspot, and I am seeking your advice and know-how.
>
> The first thing I'm running into is how to pass parameters to the intrinsic by reference. AFAIK there is no way to do such a thing in Java code. My hope is then that when creating the call to the intrinsic in library_call.cpp, we can pass the address of the variable instead of the value. But I don't know if it's even possible, and if it is, how to do so.

The simplest solution is a new int[] array which holds the local variable values, which the intrinsic can update and return.

Another solution (with more complex code in library_call.cpp) is to pass `src` and `dst` to the intrinsic and read/update their fields inside:

    private CoderResult decodeArrayLoop(ByteBuffer src, CharBuffer dst) {
        decodeArrayVectorized(src, dst);
        // This method is optimized for ASCII input.
        byte[] sa = src.array();
        int sp = src.arrayOffset() + src.position();
        int sl = src.arrayOffset() + src.limit();
        char[] da = dst.array();
        int dp = dst.arrayOffset() + dst.position();
        int dl = dst.arrayOffset() + dst.limit();

> The second thing I'm running into is not so much a technical limitation, but a question that, I am sure, is going to be raised during the review. This vectorization depends on a lookup table, but this lookup table can grow quite big (32768 elements, each of a size of 64-72 bytes, so ~2MB). I understand that this is much bigger than anything currently existing for any of the intrinsics, so I'm currently trying to figure out how I can reduce drastically the size of this table (compaction, lazy building, etc.). But first, I would like to hear your ideas as it may be an issue that was already faced in the past and for which a better solution was found.

Yes, 2MB is too much. And the problem is not the size but the effect on startup time: the table is calculated dynamically.

Another issue I see with such an intrinsic is that the decodeArrayLoop() code has a lot of checks for malformed strings which the intrinsic does not have. Most likely it will not pass JCK testing.

It would be interesting to see the performance if you vectorize only the ASCII copy loop, which seems to be the most common case and does not need the table:

    // ASCII only loop
    while (dp < dlASCII && sa[sp] >= 0)
        da[dp++] = (char) sa[sp++];

I don't think C2 can auto-vectorize it because of the sa[sp] >= 0 check.

The intrinsic can return the number of elements copied, which can be used to update `sp` and `dp`.
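To make the ASCII-only suggestion concrete, here is a minimal plain-Java sketch of a copy loop that reports how many elements it copied. The class and method names (`AsciiPrefixCopy`, `copyAsciiPrefix`) are hypothetical, and this is scalar reference code showing the contract such an intrinsic might have, not the intrinsic itself; the variable names follow the decodeArrayLoop snippet above.

```java
import java.nio.charset.StandardCharsets;

public class AsciiPrefixCopy {
    /**
     * Copies the leading run of ASCII bytes from sa[sp..sl) into da[dp..dl)
     * and returns the number of elements copied.
     * Hypothetical helper for illustration only, not a JDK method.
     */
    static int copyAsciiPrefix(byte[] sa, int sp, int sl, char[] da, int dp, int dl) {
        // Never read past the source limit or write past the destination limit.
        int n = Math.min(sl - sp, dl - dp);
        int i = 0;
        // A non-negative byte is a 7-bit ASCII value; stop at the first other byte.
        while (i < n && sa[sp + i] >= 0) {
            da[dp + i] = (char) sa[sp + i];
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // 'é' encodes as two negative bytes in UTF-8, so the ASCII prefix is "abc".
        byte[] src = "abc\u00e9def".getBytes(StandardCharsets.UTF_8);
        char[] dst = new char[src.length];
        int sp = 0;
        int dp = 0;
        int copied = copyAsciiPrefix(src, sp, src.length, dst, dp, dst.length);
        // The caller advances both positions by the returned count.
        sp += copied;
        dp += copied;
        System.out.println(copied + " " + new String(dst, 0, dp));
    }
}
```

The caller then falls back to the existing scalar decoding loop starting at the updated `sp` and `dp`, so all malformed-input checks stay in Java code.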
Regards, Vladimir > > Thank you, > > -- > Ludovic > > [1] https://github.com/openjdk/jdk/compare/master...luhenry:vectorUTF8 > From jbhateja at openjdk.java.net Sat Nov 21 18:59:00 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sat, 21 Nov 2020 18:59:00 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v15] In-Reply-To: References: Message-ID: <0i0tjPQs0FixJzm1sk3a6CqTHQodF-U55lM__ePTI2c=.d6070473-9aa4-41cf-8e35-f280f5d4836d@github.com> > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. > 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. > 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. > 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
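As a rough illustration of the dispatch described in points 1) to 3) of the summary, the following Java sketch models the length check at the call site. It is only a conceptual model: the real change is made inside C2 and emits AVX-512 masked instructions, whereas here a temporary array stands in for the vector register. The class name, the constant, and the exact boundary (`<=`) are assumptions made for illustration.

```java
import java.util.Arrays;

public class PartialInlineSketch {
    // Assumed threshold, mirroring the default ArrayCopyPartialInlineSize of 32 bytes.
    static final int PARTIAL_INLINE_SIZE = 32;

    /** Copies length bytes; assumes the arguments are already range-checked. */
    static void copy(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        if (length <= PARTIAL_INLINE_SIZE) {
            // "Inlined" path: load everything first, then store, as a single
            // masked vector load followed by a masked vector store would.
            byte[] vector = Arrays.copyOfRange(src, srcPos, srcPos + length);
            for (int i = 0; i < length; i++) {
                dst[dstPos + i] = vector[i];
            }
        } else {
            // Large copies still go through the regular array-copy stub.
            System.arraycopy(src, srcPos, dst, dstPos, length);
        }
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        copy(a, 0, a, 2, 5); // overlapping copy within one array
        System.out.println(Arrays.toString(a));
    }
}
```

Loading the whole block before storing it keeps overlapping copies within one array consistent with `System.arraycopy`, which is why the sketch does not use a simple forward element-by-element loop.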
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/302/files - new: https://git.openjdk.java.net/jdk/pull/302/files/edb74db3..b83808a8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=14 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=13-14 Stats: 14 lines in 3 files changed: 3 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Sat Nov 21 19:31:13 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sat, 21 Nov 2020 19:31:13 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v16] In-Reply-To: References: Message-ID: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. > 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. > 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. > 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request incrementally with three additional commits since the last revision: - Reverting configure file - Merge branch 'JDK-8252848' of http://github.com/jatin-bhateja/jdk into JDK-8252848 - Review comments resolution. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/302/files - new: https://git.openjdk.java.net/jdk/pull/302/files/b83808a8..4a2a7897 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=15 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=14-15 Stats: 0 lines in 1 file changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Sat Nov 21 19:40:12 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sat, 21 Nov 2020 19:40:12 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 00:06:31 GMT, Vladimir Kozlov wrote: > Forgot to say that failure was on Windows with only avx512f, avx512cd Thanks Vladimir, I have resolved your review comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From kvn at openjdk.java.net Sun Nov 22 02:22:28 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 22 Nov 2020 02:22:28 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Sat, 21 Nov 2020 19:36:54 GMT, Jatin Bhateja wrote: >> Forgot to say that failure was on Windows with only avx512f, avx512cd. 
>
>> Forgot to say that failure was on Windows with only avx512f, avx512cd
>
> Thanks Vladimir, I have resolved your review comments.

Version 15 failed the following tests on linux-x64 with the -XX:+UseParallelGC -XX:+UseNUMA flags:

vmTestbase/metaspace/stressHierarchy/stressHierarchy015/TestDescription.java
vmTestbase/metaspace/stressHierarchy/stressHierarchy006/TestDescription.java
vmTestbase/metaspace/stressHierarchy/stressHierarchy005/TestDescription.java

# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/workspace/open/src/hotspot/share/opto/macroArrayCopy.cpp:861), pid=8205, tid=8216
# assert(ArrayCopyNode::may_modify(dest_t, (*ctrl)->in(0)->as_MemBar(), &_igvn, ac)) failed: dependency on arraycopy lost
#
# Problematic frame:
# V [libjvm.so+0x134fbfc] PhaseMacroExpand::generate_arraycopy(ArrayCopyNode*, AllocateArrayNode*, Node**, MergeMemNode*, Node**, TypePtr const*, BasicType, Node*, Node*, Node*, Node*, Node*, bool, bool, RegionNode*)+0x30bc
#

Host: Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz, 8 cores, 58G, Oracle Linux Server release 7.9

Current CompileTask:
C2: 27392 5458 4 package_level34_num50.Dummy::composeString (10 bytes)

Stack: [0x00007f4c5a024000,0x00007f4c5a125000], sp=0x00007f4c5a120420, free space=1009k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x134fbfc] PhaseMacroExpand::generate_arraycopy(ArrayCopyNode*, AllocateArrayNode*, Node**, MergeMemNode*, Node**, TypePtr const*, BasicType, Node*, Node*, Node*, Node*, Node*, bool, bool, RegionNode*)+0x30bc
V [libjvm.so+0x1350741] PhaseMacroExpand::expand_arraycopy_node(ArrayCopyNode*)+0x641
V [libjvm.so+0x1340d7b] PhaseMacroExpand::expand_macro_nodes()+0xfdb
V [libjvm.so+0x9fe79b] Compile::Optimize()+0x177b
V [libjvm.so+0xa00268] Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x17e8
V [libjvm.so+0x8322ae] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1ce
V [libjvm.so+0xa103f8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe08
V [libjvm.so+0xa10f48] CompileBroker::compiler_thread_loop()+0x5a8

-------------

PR: https://git.openjdk.java.net/jdk/pull/302

From redestad at openjdk.java.net Sun Nov 22 15:39:34 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Sun, 22 Nov 2020 15:39:34 GMT
Subject: RFR: 8256827: C2: Avoid reallocations by pre-sizing lists in post_allocate_copy_removal
Message-ID:

By pre-sizing the Node_Lists created in PhaseChaitin::post_allocate_copy_removal we avoid all re-allocations.

As lists are allocated to the size we need, we might also reduce memory waste, which reduces the memory used by these structures by up to 50%.

Throughput-wise this is also a gain and saves about 10,000 instructions per C2 compilation in SimpleRepeatCompilation.trivialMath. By using bulk clearing, avoiding redundant clearing of newly created lists and adding a shallow copy routine to Node_List we save another ~9,500 instructions per compilation in the same benchmark. This corresponds with a small but statistically significant gain:

Benchmark                                      Mode  Cnt    Score   Error  Units
SimpleRepeatCompilation.trivialMath_repeat_c2    ss   50  355.342 ± 2.069  ms/op  # baseline
SimpleRepeatCompilation.trivialMath_repeat_c2    ss   50  350.705 ± 1.577  ms/op  # patch

Testing: tier1-4

-------------

Commit messages:
 - Minor cleanup
 - Merge branch 'master' of https://github.com/openjdk/jdk into postaloc_presize
 - Bulk clear/copy, avoid unnecessary clearing
 - C2: Avoid reallocations by pre-sizing lists in post_allocate_copy_removal

Changes: https://git.openjdk.java.net/jdk/pull/1370/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1370&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8256827
  Stats: 51 lines in 3 files changed: 13 ins; 14 del; 24 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1370.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1370/head:pull/1370

PR: https://git.openjdk.java.net/jdk/pull/1370

From kvn at openjdk.java.net Sun Nov 22 17:09:23 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Sun, 22 Nov 2020 17:09:23 GMT
Subject: RFR: 8256827: C2: Avoid reallocations by pre-sizing lists in post_allocate_copy_removal
In-Reply-To:
References:
Message-ID: <_HR1_owKV55GHTb3yNKI73e1g-99Cy3_8J5lTDPaZj0=.980b9d8e-6680-4751-bc44-60b4a7733cc8@github.com>

On Sun, 22 Nov 2020 01:05:28 GMT, Claes Redestad wrote:

> By pre-sizing the Node_Lists created in PhaseChaitin::post_allocate_copy_removal we avoid all re-allocations.
>
> As lists are allocated to the size we need, we might also reduce memory waste, which reduces the memory used by these structures by up to 50%.
>
> Throughput-wise this is also a gain and saves about 10,000 instructions per C2 compilation in SimpleRepeatCompilation.trivialMath. By using bulk clearing, avoiding redundant clearing of newly created lists and adding a shallow copy routine to Node_List we save another ~9,500 instructions per compilation in the same benchmark. This corresponds with a small but statistically significant gain:
>
> Benchmark                                      Mode  Cnt    Score   Error  Units
> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   50  355.342 ± 2.069  ms/op  # baseline
> SimpleRepeatCompilation.trivialMath_repeat_c2    ss   50  350.705 ± 1.577  ms/op  # patch
>
> Testing: tier1-4

Nice.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1370

From jbhateja at openjdk.java.net Sun Nov 22 21:04:56 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Sun, 22 Nov 2020 21:04:56 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v17]
In-Reply-To:
References:
Message-ID:

> Summary:
>
> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes.
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Removing special handling for constant length, GVN will remove dead stub blocks in case constant length is less than partial inline size. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/302/files - new: https://git.openjdk.java.net/jdk/pull/302/files/4a2a7897..465c5f54 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=16 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=15-16 Stats: 68 lines in 2 files changed: 1 ins; 39 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From xgong at openjdk.java.net Mon Nov 23 02:36:06 2020 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Mon, 23 Nov 2020 02:36:06 GMT Subject: RFR: 8256436: AArch64: Fix undefined behavior for signed right shift in assembler In-Reply-To: References: <_c9PhWNGhNcq0rqLT11A_KVXKv7iVzfBXvt9HG9pr24=.be9a9c5b-1380-4735-b3e7-7f8460cf8b20@github.com> Message-ID: On Fri, 20 Nov 2020 10:44:37 GMT, Andrew Haley wrote: >> Hi @theRealAph , thanks for the comment. `imm` here is an unsigned immediate and it has been masked with `0xff` before, which is the same with the implementation of `sf`. > > Yes, I'm saying don't do that. Thanks a lot for the clarify ! I got it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1307 From xgong at openjdk.java.net Mon Nov 23 03:08:59 2020 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Mon, 23 Nov 2020 03:08:59 GMT Subject: RFR: 8256436: AArch64: Fix undefined behavior for signed right shift in assembler In-Reply-To: References: <_c9PhWNGhNcq0rqLT11A_KVXKv7iVzfBXvt9HG9pr24=.be9a9c5b-1380-4735-b3e7-7f8460cf8b20@github.com> Message-ID: On Fri, 20 Nov 2020 10:44:02 GMT, Andrew Haley wrote: >> Hi @theRealAph , thanks for looking at this PR, and thanks for your comment here. Yes, I agree that the compilers we know, like GCC/LLVM, can make sure the behavior is defined on AArch64. And actually we didn't meet any issues here. However, I'm not quite sure whether other compilers can guarantee it. This is just used to avoid the undefined behavior in future. So do you think we need to fix it here? I can abandon this patch if this is not valuable. Thanks! > > The compilers do guarantee it. Here's GCC, for example: > > 4.5 Integers > > GCC supports only two's complement integer types, and all bit patterns are ordinary values. > > Bitwise operators act on the representation of the value including both the sign and value bits, where the sign bit is considered immediately above the highest-value value bit. Signed '>>' acts on negative numbers by sign extension. Thanks a lot for the explanation. I will close this PR as there is no need to modify it. ------------- PR: https://git.openjdk.java.net/jdk/pull/1307 From xgong at openjdk.java.net Mon Nov 23 03:09:00 2020 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Mon, 23 Nov 2020 03:09:00 GMT Subject: Withdrawn: 8256436: AArch64: Fix undefined behavior for signed right shift in assembler In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 05:21:39 GMT, Xiaohong Gong wrote: > Right-shifting a signed negative value is implementation-defined in C++ (see [1]).
It's better to avoid the signed right shift operations, > and use the unsigned right shift instead. > > [1] https://docs.microsoft.com/en-us/cpp/cpp/left-shift-and-right-shift-operators-input-and-output?view=msvc-160&viewFallbackFrom=vs-2019 > > Tested jtreg langtools:tier1, hotspot:hotspot_all_no_apps and jdk:jdk_core, and all tests pass without new failures. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/1307 From github.com+10482586+erik1iu at openjdk.java.net Mon Nov 23 03:55:02 2020 From: github.com+10482586+erik1iu at openjdk.java.net (Eric Liu) Date: Mon, 23 Nov 2020 03:55:02 GMT Subject: RFR: 8256387: Unexpected result if patching an entire instruction on AArch64 In-Reply-To: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> References: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> Message-ID: On Tue, 17 Nov 2020 06:14:37 GMT, Eric Liu wrote: > This patch fixed some potential risks in assembler_aarch64.hpp. > > According to the C standard, shift operation is undefined if the shift > count greater than or equals to the length in bits of the promoted left > operand. > > In assembler_aarch64.hpp, there are some utility functions for easily > operating the encoded instructions. E.g. > > Instruction_aarch64::patch(address, int, int, uint64_t) > > All those functions use `(1U << nbits) - 1` to calculate mask which may > have some potential risks if `nbits` equals 32. That would be an > unexpected result if someone intends to deal with an entire instruction. > > To fix this issue, this patch simply uses `1ULL` to replace `1U`. Hi, could anyone please take a look at this trivial PR? Thanks very much! 
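The hazard described in the message above can be sketched roughly as follows. This is a minimal illustration, not code from assembler_aarch64.hpp: the helper names `low_bits_mask` and `patch_field` are hypothetical, and the sketch assumes a platform where `unsigned int` is 32 bits wide, so that `1U << 32` would be undefined while `1ULL << 32` is not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from assembler_aarch64.hpp): returns a mask with
// the low `nbits` bits set, for 1 <= nbits <= 32.
static inline uint32_t low_bits_mask(int nbits) {
  assert(nbits >= 1 && nbits <= 32);
  // 1ULL is at least 64 bits wide, so a shift count of 32 is well defined.
  // With 1U the same expression would be undefined for nbits == 32 when
  // unsigned int is 32 bits wide.
  return static_cast<uint32_t>((1ULL << nbits) - 1);
}

// Hypothetical stand-in for a patch routine: replaces bits [msb:lsb] of a
// 32-bit instruction word with `val`.
static inline uint32_t patch_field(uint32_t insn, int msb, int lsb, uint32_t val) {
  assert(lsb >= 0 && msb >= lsb && msb <= 31);
  uint32_t mask = low_bits_mask(msb - lsb + 1);
  assert((val & ~mask) == 0);  // the new value must fit in the field
  return (insn & ~(mask << lsb)) | (val << lsb);
}
```

With a helper of this shape, patching the full word (msb = 31, lsb = 0) uses a mask of all ones and simply replaces the instruction, which appears to be the case the patch is concerned with.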
------------- PR: https://git.openjdk.java.net/jdk/pull/1248 From xgong at openjdk.java.net Mon Nov 23 04:58:59 2020 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Mon, 23 Nov 2020 04:58:59 GMT Subject: RFR: 8256614: AArch64: Add SVE backend implementation for integer min/max In-Reply-To: <0f82WYJGWD1I3f_bhNVD6c_FuL5yzLkzt3XCbHHt2dw=.1baab1b6-7d92-4277-93c3-08bf93d55e3e@github.com> References: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> <0f82WYJGWD1I3f_bhNVD6c_FuL5yzLkzt3XCbHHt2dw=.1baab1b6-7d92-4277-93c3-08bf93d55e3e@github.com> Message-ID: On Fri, 20 Nov 2020 10:19:40 GMT, Xiaohong Gong wrote: >>> We need to add them all to avoid the "bad AD file" crash. >> >> I'm not clear how this came about. Is this fix needed as a response to another change that caused the crash? Or is the problem that the incomplete implementation was pushed without testing some cases properly? >> >> Is there a before and after test case for the problem this fixes? > >> > We need to add them all to avoid the "bad AD file" crash. >> >> I'm not clear how this came about. Is this fix needed as a response to another change that caused the crash? Or is the problem that the incomplete implementation was pushed without testing some cases properly? >> >> Is there a before and after test case for the problem this fixes? > > Hi @adinn , thanks for looking at this PR. Currently only the VectorAPI can generate the `MinVNode/MaxVNode` for integer types. I found this issue when I tried to use the related API to do some other investigation work. And actually there is the jtreg tests (eg: https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/IntMaxVectorTests.java#L2403) for it. I think the reason that we didn't find the issue is the loop count `INVOC_COUNT` , which is too small that didn't make the method hot enough to be compiled with C2. 
Currently the value is set to 100: > static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 100); > When I use a larger loop count like 10000, the test can crash with `bad AD file`. So this issue can be reproduced by adding `-Djdk.incubator.vector.test.loop-iterations=10000` when running the tests. > Hi @XiaohongGong . Thanks for clarifying that. Your code changes look ok to me but it would be good if we could also change the default setting for this loop count to ensure that this case gets tested when SVE hw is present. > > iterations=100 appears to be defaulted in file config.sh in the same dir as the test program. Do you know if using iterations=100 is actually triggering C2 compilation for any case (SVE or other, including on Intel)? If not then we really need to increase the default count to a higher value for all cases. > > If this is just an SVE-specific thing then we could maybe add some special case processing to the config script to detect an arch where SVE is present and set a higher value. Yeah, I agree that it's better to adjust the loop count for jtreg tests if necessary. Do you think it's better to change it with another patch, since I think it's not an SVE-specific case (NEON and X86 have the same issue)? Changing the default loop count for all tests means the tests need more time to run and are more likely to time out. And I've no idea how to choose a suitable value that balances time and effectiveness. I think some cases can actually trigger the C2 compilation with the default iterations=100. However, simple cases like `min/max` cannot.
------------- PR: https://git.openjdk.java.net/jdk/pull/1337 From thartmann at openjdk.java.net Mon Nov 23 07:11:57 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 23 Nov 2020 07:11:57 GMT Subject: Integrated: 8256719: C1 flags that should have expired are still present In-Reply-To: <3nTaNaFby0VpF6WEE7pvNwm55g4XeScPjfHGo6wZyts=.3980be88-8ef0-4f9f-a9ab-36d125954db7@github.com> References: <3nTaNaFby0VpF6WEE7pvNwm55g4XeScPjfHGo6wZyts=.3980be88-8ef0-4f9f-a9ab-36d125954db7@github.com> Message-ID: On Fri, 20 Nov 2020 08:18:38 GMT, Tobias Hartmann wrote: > [JDK-8235673](https://bugs.openjdk.java.net/browse/JDK-8235673) removed some flags from C1 and made them C2 only. They expired in JDK 16 and should simply be removed. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 659aec80 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/659aec80 Stats: 10 lines in 1 file changed: 0 ins; 10 del; 0 mod 8256719: C1 flags that should have expired are still present Reviewed-by: shade, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/1338 From thartmann at openjdk.java.net Mon Nov 23 07:11:55 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 23 Nov 2020 07:11:55 GMT Subject: RFR: 8256719: C1 flags that should have expired are still present In-Reply-To: References: <3nTaNaFby0VpF6WEE7pvNwm55g4XeScPjfHGo6wZyts=.3980be88-8ef0-4f9f-a9ab-36d125954db7@github.com> Message-ID: On Fri, 20 Nov 2020 08:25:59 GMT, Aleksey Shipilev wrote: >> [JDK-8235673](https://bugs.openjdk.java.net/browse/JDK-8235673) removed some flags from C1 and made them C2 only. They expired in JDK 16 and should simply be removed. >> >> Thanks, >> Tobias > > Looks fine. @shipilev, @neliasso thanks for the reviews! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1338 From thartmann at openjdk.java.net Mon Nov 23 07:25:54 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 23 Nov 2020 07:25:54 GMT Subject: RFR: 8256738: Compiler interface clean-up In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 11:24:51 GMT, Claes Redestad wrote: > This patch removes dead code in the ci area. > > There's also a trivial micro-optimization to is_boxing_method/is_unboxing_method Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1345 From thartmann at openjdk.java.net Mon Nov 23 07:38:00 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 23 Nov 2020 07:38:00 GMT Subject: RFR: 8249144: Potential memory leak in TypedMethodOptionMatcher In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 14:45:02 GMT, Zhengyu Gu wrote: > TypedMethodOptionMatcher owns ccstr value (vs. os::strdup_check_oom()), but never frees it in destructor. > > It does not appear to a real issue so far, because TypedMethodOptionMatcher seems immortal. Given it releases _option > string, it should also release ccstr value. > > This patch has been reviewed https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-July/042356.html, but I lost track of it during repo migration. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1353 From shade at openjdk.java.net Mon Nov 23 07:41:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 07:41:57 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v2] In-Reply-To: <_6nluorAvosju4tbANDk-v9c4dzfarBXqk7bHNnHm6s=.dc10c8d2-7ef2-4c91-852c-f4f7d2f5caf5@github.com> References: <_6nluorAvosju4tbANDk-v9c4dzfarBXqk7bHNnHm6s=.dc10c8d2-7ef2-4c91-852c-f4f7d2f5caf5@github.com> Message-ID: On Tue, 17 Nov 2020 00:42:20 GMT, Vladimir Kozlov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Indents and comments > > Why to run test code if you know the result will not match? Why not just skip testing? > I thought about using `@requires vm.cpu.features` checks but your check in main() is simpler and more clear. Which way you'd like me to move with this, @vnkozlov? :) I am fine with either way, leaning towards what we already have. ------------- PR: https://git.openjdk.java.net/jdk/pull/1187 From xgong at openjdk.java.net Mon Nov 23 08:23:17 2020 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Mon, 23 Nov 2020 08:23:17 GMT Subject: RFR: 8256614: AArch64: Add SVE backend implementation for integer min/max [v2] In-Reply-To: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> References: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> Message-ID: > Currently the Arm SVE implementation for integer (byte,short,int,long) vector min/max is missing. This is needed for VectorAPI. We need to add them all to avoid the "bad AD file" crash. 
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Minor fine-tune the basic type check ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1337/files - new: https://git.openjdk.java.net/jdk/pull/1337/files/fa91bc25..3eb99c2a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1337&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1337&range=00-01 Stats: 8 lines in 2 files changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1337.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1337/head:pull/1337 PR: https://git.openjdk.java.net/jdk/pull/1337 From chagedorn at openjdk.java.net Mon Nov 23 08:39:57 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 23 Nov 2020 08:39:57 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance [v4] In-Reply-To: References: Message-ID: <_GAY4xJFmeO_jVlem9tLK8BOAfSf13YdZ11EHOhn3S4=.59fc93a0-9d66-45b6-b60d-b4adc0bddceb@github.com> On Fri, 20 Nov 2020 07:56:57 GMT, Tobias Hartmann wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Remove _pre_loop_head field >> - Update comments and invariant selection in offset_plus_k >> - Check dominance with pre loop head instead of tail >> - 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance > > Thanks for making these changes. Looks good to me! @TobiHartmann Thanks for your review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/954 From chagedorn at openjdk.java.net Mon Nov 23 08:39:58 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 23 Nov 2020 08:39:58 GMT Subject: Integrated: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 11:28:10 GMT, Christian Hagedorn wrote: > The dominance failures start to occur after the fix for [JDK-8249749](https://bugs.openjdk.java.net/browse/JDK-8249749) which enabled the method `SWPointer::scaled_iv_plus_offset` to call itself recursively and walk the graph to match more instead of stopping immediately (no recursion): > https://github.com/openjdk/jdk/commit/092389e3c91822b1e3f56f203cb7b90e84673f8e#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81L3789-R3812 > > We check in `SWPointer::offset_plus_k` if a node is invariant and if so then we choose it as invariant. However, we now have cases in the Renaissance benchmarks where we select an invariant that is pinned to a `CastIINode` between the main and pre loop. An example is shown in the attached image. 5913 SubI is found as an invariant with the improved recursive search enabled by JDK-8249749. The control of 5913 SubI (with `get_ctrl`) is 5298 CastII. The problem is now that we use the invariant 5913 SubI in the pre loop limit check of 5281 CountedLoopEnd (done in `SuperWord::align_initial_loop_index`) because we assume that since the invariant is not part of the main loop, it can float into the pre loop. But this is prevented by 5298 CastII. This results in the dominance assertion failure when checking if the earliest control of 5270 Bool in the pre loop (5297 IfTrue because of 5913 SubI that is used by 5270 Bool) dominates the LCA of 5270 Bool (the pre loop header node). 
> > My suggestion is to improve the invariant check in `SWPointer::offset_plus_k` to also check if the found invariant is not dominated by the pre loop end node. Repeated testing of the RenaissanceStressTest has not resulted in any dominance failures anymore. > ![dominance_failure](https://user-images.githubusercontent.com/17833009/97696669-3752d200-1aa6-11eb-9a42-2e36550e2b8b.png) This pull request has now been integrated. Changeset: e4a32bea Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/e4a32bea Stats: 102 lines in 2 files changed: 51 ins; 7 del; 44 mod 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/954 From adinn at openjdk.java.net Mon Nov 23 08:58:59 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Mon, 23 Nov 2020 08:58:59 GMT Subject: RFR: 8256614: AArch64: Add SVE backend implementation for integer min/max [v2] In-Reply-To: References: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> Message-ID: <-NVqmmdNEW2xTUQ8IPOS9qYoqhPbNEoIrFdoCSBnoaQ=.d75a3708-221b-4078-bdf0-d359d5d8387e@github.com> On Mon, 23 Nov 2020 08:23:17 GMT, Xiaohong Gong wrote: >> Currently the Arm SVE implementation for integer (byte,short,int,long) vector min/max is missing. This is needed for VectorAPI. We need to add them all to avoid the "bad AD file" crash. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Minor fine-tune the basic type check Marked as reviewed by adinn (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1337 From adinn at openjdk.java.net Mon Nov 23 08:59:00 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Mon, 23 Nov 2020 08:59:00 GMT Subject: RFR: 8256614: AArch64: Add SVE backend implementation for integer min/max In-Reply-To: References: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> <0f82WYJGWD1I3f_bhNVD6c_FuL5yzLkzt3XCbHHt2dw=.1baab1b6-7d92-4277-93c3-08bf93d55e3e@github.com> Message-ID: On Mon, 23 Nov 2020 04:56:32 GMT, Xiaohong Gong wrote: >>> > We need to add them all to avoid the "bad AD file" crash. >>> >>> I'm not clear how this came about. Is this fix needed as a response to another change that caused the crash? Or is the problem that the incomplete implementation was pushed without testing some cases properly? >>> >>> Is there a before and after test case for the problem this fixes? >> >> Hi @adinn , thanks for looking at this PR. Currently only the VectorAPI can generate the `MinVNode/MaxVNode` for integer types. I found this issue when I tried to use the related API to do some other investigation work. And actually there is the jtreg tests (eg: https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/IntMaxVectorTests.java#L2403) for it. I think the reason that we didn't find the issue is the loop count `INVOC_COUNT` , which is too small that didn't make the method hot enough to be compiled with C2. Currently the value is set to 100: >> static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 100); >> When I use a larger loop count like 10000, the test can crash with `bad AD file`. So this issue can be reproduced by adding `-Djdk.incubator.vector.test.loop-iterations=10000` when running the tests. > >> Hi @XiaohongGong . Thanks for clarifying that. 
Your code changes look ok to me but it would be good if we could also change the default setting for this loop count to ensure that this case gets tested when SVE hw is present. >> >> iterations=100 appears to be defaulted in file config.sh in the same dir as the test program. Do you know if using iterations=100 is actually triggering C2 compilation for any case (SVE or other, including on Intel)? If not then we really need to increase the default count to a higher value for all cases. >> >> If this is just an SVE-specific thing then we could maybe add some special case processing to the config script to detect an arch where SVE is present and set a higher value. > > Yeah, I agree that it's better to adjust the loop count for jtreg tests if necessary. Do you think it's better to change it with another patch since I think it's not a SVE special case (NEON and X86 have the same issue)? Changing the default loop count for all tests needs more time to run the tests and make the tests easier to time out. And I'v no idea about how to set the suitable value to balance the time and effectiveness. I think some cases can actually trigger the C2 compilation if using default iterations=100. However, it cannot for simple cases like `min/max`. >Yeah, I agree that it's better to adjust the loop count for jtreg tests if necessary. Do you think it's better to change it with another patch since I think it's not a SVE special case (NEON and X86 have the same issue)? Changing the default loop count for all tests needs more time to run the tests and make the tests easier to time out. And I've no idea about how to set the suitable value to balance the time and effectiveness. I think some cases can actually trigger the C2 compilation if using default iterations=100. However, it cannot for simple cases like min/max. Yes, please raise a separate patch for the iteration count change. 
We can discuss the timing vs time-out issue there, including the option of splitting out simple tests that need higher counts. If you can do that then I'm happy with this patch as it is. ------------- PR: https://git.openjdk.java.net/jdk/pull/1337 From xgong at openjdk.java.net Mon Nov 23 09:05:57 2020 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Mon, 23 Nov 2020 09:05:57 GMT Subject: RFR: 8256614: AArch64: Add SVE backend implementation for integer min/max [v2] In-Reply-To: <-NVqmmdNEW2xTUQ8IPOS9qYoqhPbNEoIrFdoCSBnoaQ=.d75a3708-221b-4078-bdf0-d359d5d8387e@github.com> References: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> <-NVqmmdNEW2xTUQ8IPOS9qYoqhPbNEoIrFdoCSBnoaQ=.d75a3708-221b-4078-bdf0-d359d5d8387e@github.com> Message-ID: On Mon, 23 Nov 2020 08:56:34 GMT, Andrew Dinn wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor fine-tune the basic type check > > Marked as reviewed by adinn (Reviewer). > > Yeah, I agree that it's better to adjust the loop count for jtreg tests if necessary. Do you think it's better to change it with another patch since I think it's not a SVE special case (NEON and X86 have the same issue)? Changing the default loop count for all tests needs more time to run the tests and make the tests easier to time out. And I've no idea about how to set the suitable value to balance the time and effectiveness. I think some cases can actually trigger the C2 compilation if using default iterations=100. However, it cannot for simple cases like min/max. > > Yes, please raise a separate patch for the iteration count change. We can discuss the timing vs time-out issue there, including the option of splitting out simple tests that need higher counts. If you can do that then I'm happy with this patch as it is. Sure, I will do it later then. Thanks a lot! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1337 From thartmann at openjdk.java.net Mon Nov 23 09:08:57 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 23 Nov 2020 09:08:57 GMT Subject: RFR: 8256827: C2: Avoid reallocations by pre-sizing lists in post_allocate_copy_removal In-Reply-To: References: Message-ID: <2MVq_ChfG8pKPeGOonggmhqO2vgRbVij9Uj2aRXUP0U=.52c2fb67-575c-46b8-8e17-16a03f09055b@github.com> On Sun, 22 Nov 2020 01:05:28 GMT, Claes Redestad wrote: > By pre-sizing the Node_Lists created in PhaseChaitin::post_allocate_copy_removal we avoid all re-allocations. > > As lists are allocated to the size we need we might also reduce memory waste, which reduces the memory used by these structures by up to 50% > > Throughput-wise this is also a gain and saves about 10,000 instructions per C2 compilation in SimpleRepeatCompilation.trivialMath. By using bulk clearing, avoiding redundant clearing of newly created lists and adding a shallow copy routine to Node_List we save another ~9,500 instructions per compilation in the same benchmark. This corresponds with a small but statistically significant gain: > > Benchmark Mode Cnt Score Error Units > SimpleRepeatCompilation.trivialMath_repeat_c2 ss 50 355.342 ± 2.069 ms/op # baseline > SimpleRepeatCompilation.trivialMath_repeat_c2 ss 50 350.705 ± 1.577 ms/op # patch > > Testing: tier1-4 Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1370 From roland at openjdk.java.net Mon Nov 23 09:40:18 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 23 Nov 2020 09:40:18 GMT Subject: RFR: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well [v3] In-Reply-To: References: Message-ID: > This was reported by Paul with the vector API.
There are 2 issues: > > - CastII nodes (added by Objects.checkIndex()) gets in the way of the > pattern matching performed by range check elimination > > - By transforming (CastII (AddI x y)) into (AddI (CastII x) (CastII y)) > some CastII can be eliminated which improves address computation code. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: CastII pushed thru chain of AddIs ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1342/files - new: https://git.openjdk.java.net/jdk/pull/1342/files/76e6348d..faf980ab Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1342&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1342&range=01-02 Stats: 386 lines in 3 files changed: 214 ins; 170 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1342.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1342/head:pull/1342 PR: https://git.openjdk.java.net/jdk/pull/1342 From roland at openjdk.java.net Mon Nov 23 09:40:19 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 23 Nov 2020 09:40:19 GMT Subject: RFR: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well [v2] In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 15:26:11 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/castnode.cpp line 269: >> >>> 267: _carry_dependency, _range_check_dependency); >>> 268: cx->set_req(0, in(0)); >>> 269: cx = phase->transform(cx); >> >> Hi Roland, have you tested that these calls to `transform()` do not create an explosion of recursive `Ideal()` calls in the face of long AddI chains (see https://github.com/openjdk/jdk/pull/727). > > I wondered about that but had no issue during testing. Let me see if I can change your test case to have recursive Ideal() calls. I managed to reproduce a similar issue by tweaking your test cases. I just pushed the new test cases + a change that mirrors your fix. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1342 From mdoerr at openjdk.java.net Mon Nov 23 09:48:57 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 23 Nov 2020 09:48:57 GMT Subject: RFR: 8256719: C1 flags that should have expired are still present In-Reply-To: References: <3nTaNaFby0VpF6WEE7pvNwm55g4XeScPjfHGo6wZyts=.3980be88-8ef0-4f9f-a9ab-36d125954db7@github.com> Message-ID: On Mon, 23 Nov 2020 07:09:03 GMT, Tobias Hartmann wrote: >> Looks fine. > > @shipilev, @neliasso thanks for the reviews! Thanks for removing. ------------- PR: https://git.openjdk.java.net/jdk/pull/1338 From shade at openjdk.java.net Mon Nov 23 10:14:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 10:14:57 GMT Subject: RFR: 8256387: Unexpected result if patching an entire instruction on AArch64 In-Reply-To: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> References: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> Message-ID: On Tue, 17 Nov 2020 06:14:37 GMT, Eric Liu wrote: > This patch fixed some potential risks in assembler_aarch64.hpp. > > According to the C standard, shift operation is undefined if the shift > count greater than or equals to the length in bits of the promoted left > operand. > > In assembler_aarch64.hpp, there are some utility functions for easily > operating the encoded instructions. E.g. > > Instruction_aarch64::patch(address, int, int, uint64_t) > > All those functions use `(1U << nbits) - 1` to calculate mask which may > have some potential risks if `nbits` equals 32. That would be an > unexpected result if someone intends to deal with an entire instruction. > > To fix this issue, this patch simply uses `1ULL` to replace `1U`. So there is a global `right_n_bits` macro that apparently does what we want? @theRealAph might want to look at this issue (probably in disgust). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1248 From redestad at openjdk.java.net Mon Nov 23 10:19:58 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 23 Nov 2020 10:19:58 GMT Subject: RFR: 8256738: Compiler interface clean-up In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 15:15:18 GMT, Nils Eliasson wrote: >> This patch removes dead code in the ci area. >> >> There's also a trivial micro-optimization to is_boxing_method/is_unboxing_method > > Nice clean up! @neliasso @vnkozlov @TobiHartmann - thank you for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/1345 From redestad at openjdk.java.net Mon Nov 23 10:19:59 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 23 Nov 2020 10:19:59 GMT Subject: Integrated: 8256738: Compiler interface clean-up In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 11:24:51 GMT, Claes Redestad wrote: > This patch removes dead code in the ci area. > > There's also a trivial micro-optimization to is_boxing_method/is_unboxing_method This pull request has now been integrated. 
Changeset: 65b77d59 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/65b77d59 Stats: 216 lines in 21 files changed: 0 ins; 211 del; 5 mod 8256738: Compiler interface clean-up Reviewed-by: neliasso, kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1345 From redestad at openjdk.java.net Mon Nov 23 10:21:56 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 23 Nov 2020 10:21:56 GMT Subject: RFR: 8256827: C2: Avoid reallocations by pre-sizing lists in post_allocate_copy_removal In-Reply-To: <_HR1_owKV55GHTb3yNKI73e1g-99Cy3_8J5lTDPaZj0=.980b9d8e-6680-4751-bc44-60b4a7733cc8@github.com> References: <_HR1_owKV55GHTb3yNKI73e1g-99Cy3_8J5lTDPaZj0=.980b9d8e-6680-4751-bc44-60b4a7733cc8@github.com> Message-ID: On Sun, 22 Nov 2020 17:06:13 GMT, Vladimir Kozlov wrote: >> By pre-sizing the Node_Lists created in PhaseChaitin::post_allocate_copy_removal we avoid all re-allocations. >> >> As lists are allocated to the size we need we might also reduce memory waste, which reduces the memory used by these structures by up to 50% >> >> Throughput-wise this is also a gain and saves about 10,000 instructions per C2 compilation in SimpleRepeatCompilation.trivialMath. By using bulk clearing, avoiding redundant clearing of newly created lists and adding a shallow copy routine to Node_List we save another ~9,500 instructions per compilation in the same benchmark. This corresponds with a small but statistically significant gain: >> >> Benchmark Mode Cnt Score Error Units >> SimpleRepeatCompilation.trivialMath_repeat_c2 ss 50 355.342 ± 2.069 ms/op # baseline >> SimpleRepeatCompilation.trivialMath_repeat_c2 ss 50 350.705 ± 1.577 ms/op # patch >> >> Testing: tier1-4 > > Nice. @vnkozlov @TobiHartmann - thank you for reviewing!
------------- PR: https://git.openjdk.java.net/jdk/pull/1370 From redestad at openjdk.java.net Mon Nov 23 10:21:59 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 23 Nov 2020 10:21:59 GMT Subject: Integrated: 8256827: C2: Avoid reallocations by pre-sizing lists in post_allocate_copy_removal In-Reply-To: References: Message-ID: On Sun, 22 Nov 2020 01:05:28 GMT, Claes Redestad wrote: > By pre-sizing the Node_Lists created in PhaseChaitin::post_allocate_copy_removal we avoid all re-allocations. > > As lists are allocated to the size we need we might also reduce memory waste, which reduce memory used by these structures by up to 50% > > Throughput wise this is also a gain and saves about 10,000 instructions per C2 compilation in SimpleRepeatCompilation.trivialMath. By using bulk clearing, avoiding redundant clearing of newly created lists and adding a shallow copy routine to Node_List we save another ~9,500 instructions per compilation in the same benchmark. This corresponds with a small but statistically significant gain: > > Benchmark Mode Cnt Score Error Units > SimpleRepeatCompilation.trivialMath_repeat_c2 ss 50 355.342 ± 2.069 ms/op # baseline > SimpleRepeatCompilation.trivialMath_repeat_c2 ss 50 350.705 ± 1.577 ms/op # patch > > Testing: tier1-4 This pull request has now been integrated.
Changeset: b450e7c1 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/b450e7c1 Stats: 51 lines in 3 files changed: 13 ins; 14 del; 24 mod 8256827: C2: Avoid reallocations by pre-sizing lists in post_allocate_copy_removal Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1370 From aph at openjdk.java.net Mon Nov 23 10:25:00 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 23 Nov 2020 10:25:00 GMT Subject: RFR: 8256387: Unexpected result if patching an entire instruction on AArch64 In-Reply-To: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> References: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> Message-ID: <3sEK6sctF4UMTUl8dGS8hDsf1cjlziZbOKyv8fLc-mQ=.47f839ed-5d7b-491f-936a-9c70fcbb231d@github.com> On Tue, 17 Nov 2020 06:14:37 GMT, Eric Liu wrote: > This patch fixed some potential risks in assembler_aarch64.hpp. > > According to the C standard, shift operation is undefined if the shift > count greater than or equals to the length in bits of the promoted left > operand. > > In assembler_aarch64.hpp, there are some utility functions for easily > operating the encoded instructions. E.g. > > Instruction_aarch64::patch(address, int, int, uint64_t) > > All those functions use `(1U << nbits) - 1` to calculate mask which may > have some potential risks if `nbits` equals 32. That would be an > unexpected result if someone intends to deal with an entire instruction. > > To fix this issue, this patch simply uses `1ULL` to replace `1U`. src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 207: > 205: int nbits = msb - lsb + 1; > 206: assert_cond(msb >= lsb); > 207: uint32_t mask = (1ULL << nbits) - 1; Please use checked_cast<uint32_t>((1ULL << nbits) - 1) here. If we don't cast to the shorter type at this point, some compilers will give a warning.
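The mask computation under review can be sketched in standalone C++. This is a minimal illustration, not the HotSpot code: the names `low_bits_mask` and `patch_bits` are hypothetical, and a plain `static_cast` is used where the review asks for HotSpot's `checked_cast`, which is not available outside the JVM sources. The point it shows is that widening to 64 bits before shifting keeps the shift count below the operand width even when `nbits` is 32.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `nbits` bits set, for 1 <= nbits <= 32.
// Shifting a 64-bit one keeps the count below the operand width even when
// nbits == 32; `1U << 32` would be undefined behaviour.
inline uint32_t low_bits_mask(int nbits) {
    assert(nbits >= 1 && nbits <= 32);
    uint64_t wide = (UINT64_C(1) << nbits) - 1;
    return static_cast<uint32_t>(wide);  // the value always fits in 32 bits
}

// Replace bits [msb:lsb] of a 32-bit instruction word with `val`.
// Hypothetical helper, loosely modelled on the patch() utilities mentioned
// in the quoted description.
inline uint32_t patch_bits(uint32_t insn, int msb, int lsb, uint32_t val) {
    assert(lsb >= 0 && msb >= lsb && msb <= 31);
    uint32_t mask = low_bits_mask(msb - lsb + 1);
    assert((val & ~mask) == 0);  // val must fit in the field
    return (insn & ~(mask << lsb)) | (val << lsb);
}
```

Calling `patch_bits(insn, 31, 0, val)` replaces the whole instruction word, which is the `nbits == 32` case the original description is concerned about.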
------------- PR: https://git.openjdk.java.net/jdk/pull/1248 From aph at openjdk.java.net Mon Nov 23 10:50:02 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 23 Nov 2020 10:50:02 GMT Subject: RFR: 8256387: Unexpected result if patching an entire instruction on AArch64 In-Reply-To: References: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> Message-ID: On Mon, 23 Nov 2020 10:12:06 GMT, Aleksey Shipilev wrote: > So there is a global `right_n_bits` macro that apparently does what we want? @theRealAph might want to look at this issue (probably in disgust). True, it looks like what we want, written using some horrible macros. Fine by me. ------------- PR: https://git.openjdk.java.net/jdk/pull/1248 From rwestrel at redhat.com Mon Nov 23 11:04:12 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 23 Nov 2020 12:04:12 +0100 Subject: [11u] 8229495: SIGILL in C2 generated OSR compilation Message-ID: <873610l4gj.fsf@redhat.com> Original bug: https://bugs.openjdk.java.net/browse/JDK-8229495 https://hg.openjdk.java.net/jdk/jdk15/rev/c973b5ec934d Original patch does not apply cleanly to 11u (different context in loopTransform.cpp, loopnode.cpp, node.hpp) but the 11u patch is identical to the original patch (except for a change related to node budget handling which doesn't exist in 11). 11u webrev: http://cr.openjdk.java.net/~roland/8229495.11u/webrev.00/ Testing: x86_64 build, tier1. The test case passes with and without the patch (I noticed when working on the upstream fix that that test case is fragile so it's not unexpected). Roland. 
From luhenry at microsoft.com Mon Nov 23 13:20:57 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Mon, 23 Nov 2020 13:20:57 +0000 Subject: Some questions on intrinsic for UTF8 to UTF16 decoding In-Reply-To: <59858a23-8ab0-ead7-4d15-757103e07d51@oracle.com> References: <59858a23-8ab0-ead7-4d15-757103e07d51@oracle.com> Message-ID: Hi Vladimir, > The simpliest solution is a new int[] array which holds locals var values which intrinsic can update and return. Good idea, I'll try that and report back. > An other solution (and more complex code in library_call.cpp) is to pass `src' and 'dst' to intrinsic and read/update their fields inside: I was thinking about that, but I'm not sure what it would look like. Could you just please point me to a place where something similar is already done? > Yes, 2Mb is too much. And the problem is not size but affect on startup time - it is calculated dynamically. I agree, the impact on the startup time would be the most impactful. I was thinking over the weekend whether I could generate this data statically at compile-time (with something along the line of template metaprogramming). More generally, what is the thought process in this kind of optimization, like what metrics do we favor? I am going to spend some time on this intrinsic this week to get some numbers. I'll come back to you on this discussion. > An other issue with such intrinsic I see is that decodeArrayLoop() code has a lot of checks for malformed strings which intrinsic does not have. Most likely it will not pass JCK testing. Yes, that is simply because I didn't implement it yet. The plan is to decode as much as possible in the intrinsic, and if it runs into any case that it doesn't know how to handle, it backs off and falls back to the current implementation. > Would be interesting to see performance if you vectorize only ASCII copy loop which seems most common case and you don't need table: That could definitely be possible. 
However, I'm not sure whether ASCII only loop is really the most common case. Moreover, the way the intrinsic is done here, if at any point a non-ASCII character is encountered, it backs off and falls back to the loop-based algorithm. We could however mix and match the ASCII-only case with the more general loop with something along the line of: ``` while (sp < sl) { if (sp + 7 < sl && dp + 7 < dl) { long b = unsafe.getLong(sa, Unsafe.ARRAY_BYTE_BASE_OFFSET + sp * Unsafe.ARRAY_BYTE_INDEX_SCALE); if ((b & 0xC0C0C0C0C0C0C0C0) != 0x8080808080808080) { da[dp + 0] = (char)sa[sp+0]; da[dp + 1] = (char)sa[sp+1]; da[dp + 2] = (char)sa[sp+2]; da[dp + 3] = (char)sa[sp+3]; da[dp + 4] = (char)sa[sp+4]; da[dp + 5] = (char)sa[sp+5]; da[dp + 6] = (char)sa[sp+6]; da[dp + 7] = (char)sa[sp+7]; sp += 8; dp += 8; continue; } } int b1 = sa[sp]; if (b1 >= 0) { ... ``` From preliminary results, the gain is not substantial on ASCII-only text, and is minimal on `"H?llo World!".repeat(1000 * 1000)`. I'll check whether we can get some gains with the Vector API on that case. Thank you for your feedback. Ludovic From thartmann at openjdk.java.net Mon Nov 23 13:32:09 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 23 Nov 2020 13:32:09 GMT Subject: RFR: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" Message-ID: The ideal transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) converts `RotateLeftNode(val, shift)` into `RotateRightNode(val, 32/64 - (shift & 31/63))`. If `shift` later becomes zero, we end up trying to emit a rotate with a 32/64 shift value which triggers an assert. I've added an identity transformation similar to what is implemented for ShiftNodes that takes care of this. I've also noticed that the corresponding assert is not strong enough for `roll` and `rorl` (probably the author used the assert corresponding to the 64-bit version by accident). 
The patch also includes some refactoring. Thanks, Tobias ------------- Commit messages: - 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" Changes: https://git.openjdk.java.net/jdk/pull/1384/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1384&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256823 Stats: 111 lines in 5 files changed: 88 ins; 3 del; 20 mod Patch: https://git.openjdk.java.net/jdk/pull/1384.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1384/head:pull/1384 PR: https://git.openjdk.java.net/jdk/pull/1384 From tobias.hartmann at oracle.com Mon Nov 23 13:33:50 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 23 Nov 2020 14:33:50 +0100 Subject: [11u] 8229495: SIGILL in C2 generated OSR compilation In-Reply-To: <873610l4gj.fsf@redhat.com> References: <873610l4gj.fsf@redhat.com> Message-ID: Looks good to me. Best regards, Tobias On 23.11.20 12:04, Roland Westrelin wrote: > > Original bug: > https://bugs.openjdk.java.net/browse/JDK-8229495 > https://hg.openjdk.java.net/jdk/jdk15/rev/c973b5ec934d > > Original patch does not apply cleanly to 11u (different context in > loopTransform.cpp, loopnode.cpp, node.hpp) but the 11u patch is > identical to the original patch (except for a change related to node > budget handling which doesn't exist in 11). > > 11u webrev: > http://cr.openjdk.java.net/~roland/8229495.11u/webrev.00/ > > Testing: x86_64 build, tier1. The test case passes with and without the > patch (I noticed when working on the upstream fix that that test case is > fragile so it's not unexpected). > > Roland. 
> From zgu at openjdk.java.net Mon Nov 23 13:48:00 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 23 Nov 2020 13:48:00 GMT Subject: RFR: 8249144: Potential memory leak in TypedMethodOptionMatcher In-Reply-To: References: Message-ID: <12cjG6I32j8UEKLFm6GrLxqGTX2R4YBtTyyFsmzCqTo=.5ce95025-c088-4f79-aea1-bd0838ae8257@github.com> On Mon, 23 Nov 2020 07:34:53 GMT, Tobias Hartmann wrote: > Looks good to me. Thanks, Tobias. ------------- PR: https://git.openjdk.java.net/jdk/pull/1353 From zgu at openjdk.java.net Mon Nov 23 13:48:01 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 23 Nov 2020 13:48:01 GMT Subject: Integrated: 8249144: Potential memory leak in TypedMethodOptionMatcher In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 14:45:02 GMT, Zhengyu Gu wrote: > TypedMethodOptionMatcher owns ccstr value (vs. os::strdup_check_oom()), but never frees it in destructor. > > It does not appear to a real issue so far, because TypedMethodOptionMatcher seems immortal. Given it releases _option > string, it should also release ccstr value. > > This patch has been reviewed https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-July/042356.html, but I lost track of it during repo migration. This pull request has now been integrated. Changeset: 84429cd9 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/84429cd9 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8249144: Potential memory leak in TypedMethodOptionMatcher Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1353 From rwestrel at redhat.com Mon Nov 23 14:13:34 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 23 Nov 2020 15:13:34 +0100 Subject: [11u] 8229495: SIGILL in C2 generated OSR compilation In-Reply-To: References: <873610l4gj.fsf@redhat.com> Message-ID: <87zh38jh4h.fsf@redhat.com> > Looks good to me. Thanks for the review. Roland. 
From vlivanov at openjdk.java.net Mon Nov 23 14:14:59 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 23 Nov 2020 14:14:59 GMT Subject: RFR: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 13:26:30 GMT, Tobias Hartmann wrote: > The ideal transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) converts `RotateLeftNode(val, shift)` into `RotateRightNode(val, 32/64 - (shift & 31/63))`. If `shift` later becomes zero, we end up trying to emit a rotate with a 32/64 shift value which triggers an assert. > > I've added an identity transformation similar to what is implemented for ShiftNodes that takes care of this. I've also noticed that the corresponding assert is not strong enough for `roll` and `rorl` (probably the author used the assert corresponding to the 64-bit version by accident). The patch also includes some refactoring. > > Thanks, > Tobias Overall, looks good. Some minor comments follow. src/hotspot/share/opto/mulnode.cpp line 1450: > 1448: } > 1449: // Rotate by a multiple of 32/64 does nothing > 1450: int mask = (t1->isa_int() ? BitsPerJavaInteger : BitsPerJavaLong) - 1; Would be nice to assert that t1 is either TypeInt or TypeLong. src/hotspot/share/opto/mulnode.cpp line 1451: > 1449: // Rotate by a multiple of 32/64 does nothing > 1450: int mask = (t1->isa_int() ? BitsPerJavaInteger : BitsPerJavaLong) - 1; > 1451: if ((getShiftCon(phase, this, -1) & mask) == 0) { I find the interaction between `in(2)`, `-1`, and `& mask == 0` intricate (it covers both cases: when shift constant is successfully extracted and when it is not). Maybe refactor `getShiftCon()` to return a `bool` which signals that the constant was found and then return the value as an out argument? 
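The refactoring suggested in the comment above (return a `bool` and hand the constant back through an out argument) could look roughly like the following. This is a sketch under stated assumptions: `ShiftInput` is a hypothetical stand-in for a C2 node input, and `get_shift_con` and `rotate_is_identity` are illustrative names, not the functions in mulnode.cpp.

```cpp
#include <cassert>

// Hypothetical stand-in for a C2 node input: either a known constant or not.
struct ShiftInput {
    bool is_constant;
    int  value;
};

// Returns true and stores the constant in *count if the shift amount is a
// compile-time constant; returns false and leaves *count untouched otherwise.
bool get_shift_con(const ShiftInput& in, int* count) {
    assert(count != nullptr);
    if (!in.is_constant) {
        return false;
    }
    *count = in.value;
    return true;
}

// A rotate is the identity when its constant count is a multiple of the
// operand width (32 for int, 64 for long).
bool rotate_is_identity(const ShiftInput& in, int bits) {
    assert(bits == 32 || bits == 64);
    int count = 0;
    if (!get_shift_con(in, &count)) {
        return false;  // unknown count: keep the rotate node
    }
    int mask = bits - 1;
    return (count & mask) == 0;
}
```

Separating "was a constant found" from "what is its value" removes the need for a sentinel such as `-1`, so the caller's mask test only ever sees a real shift count.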
------------- PR: https://git.openjdk.java.net/jdk/pull/1384 From redestad at openjdk.java.net Mon Nov 23 15:11:04 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 23 Nov 2020 15:11:04 GMT Subject: RFR: 8256858: C2: Devirtualize PhaseIterGVN-specific methods Message-ID: PhaseValues define the virtual method is_IterGVN, which is trivially returning 0(!) for all types except those derived from PhaseIterGVN. Similarly there's igvn_rehash_node_delayed which is virtual and a no-op in base types, but implemented to call rehash_node_delayed in PhaseIterGVN. By devirtualizing we allow for more aggressive inlining and slightly better code generation in a few places. This increases sizeof(PhaseValues) from 2480 to 2488 on x64. Since we only go through a limited number of phases per compilation this seems acceptable. ------------- Commit messages: - Simplify a bit - 8256858: C2: Devirtualize PhaseIterGVN-specific methods Changes: https://git.openjdk.java.net/jdk/pull/1385/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1385&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256858 Stats: 55 lines in 5 files changed: 12 ins; 12 del; 31 mod Patch: https://git.openjdk.java.net/jdk/pull/1385.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1385/head:pull/1385 PR: https://git.openjdk.java.net/jdk/pull/1385 From thartmann at openjdk.java.net Mon Nov 23 15:18:12 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 23 Nov 2020 15:18:12 GMT Subject: RFR: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" [v2] In-Reply-To: References: Message-ID: > The ideal transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) converts `RotateLeftNode(val, shift)` into `RotateRightNode(val, 32/64 - (shift & 31/63))`. If `shift` later becomes zero, we end up trying to emit a rotate with a 32/64 shift value which triggers an assert. 
> > I've added an identity transformation similar to what is implemented for ShiftNodes that takes care of this. I've also noticed that the corresponding assert is not strong enough for `roll` and `rorl` (probably the author used the assert corresponding to the 64-bit version by accident). The patch also includes some refactoring. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Refactored getShiftCon method ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1384/files - new: https://git.openjdk.java.net/jdk/pull/1384/files/f12cccb1..9c7234b0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1384&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1384&range=00-01 Stats: 74 lines in 1 file changed: 30 ins; 2 del; 42 mod Patch: https://git.openjdk.java.net/jdk/pull/1384.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1384/head:pull/1384 PR: https://git.openjdk.java.net/jdk/pull/1384 From thartmann at openjdk.java.net Mon Nov 23 15:18:13 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 23 Nov 2020 15:18:13 GMT Subject: RFR: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" [v2] In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 14:12:35 GMT, Vladimir Ivanov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactored getShiftCon method > > Overall, looks good. > > Some minor comments follow. Thanks for the review Vladimir. I've updated the fix accordingly. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1384 From shade at openjdk.java.net Mon Nov 23 15:21:16 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 15:21:16 GMT Subject: RFR: 8256857: ARM32 builds broken after JDK-8254231 [v3] In-Reply-To: References: Message-ID: > Foreign linker broke ARM32 builds. This change stubs out the new entry points, without implementing the actual support yet. > > Additional testing: > - [x] Linux arm fastdebug cross-compilation Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into JDK-8256857-arm-foreign - Use nullptr instead - Add debug.hpp include as well - 8256857: ARM32 builds broken after JDK-8254231 - 8256857: ARM32 builds broken after JDK-8254231 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1383/files - new: https://git.openjdk.java.net/jdk/pull/1383/files/01918d87..ff639741 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1383&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1383&range=01-02 Stats: 220 lines in 14 files changed: 196 ins; 14 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/1383.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1383/head:pull/1383 PR: https://git.openjdk.java.net/jdk/pull/1383 From jvernee at openjdk.java.net Mon Nov 23 15:21:17 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Mon, 23 Nov 2020 15:21:17 GMT Subject: RFR: 8256857: ARM32 builds broken after JDK-8254231 [v3] In-Reply-To: References: Message-ID: <1L9xggeBg0l8h7Dujr7UYtJ_Yj2-QC-iTZNVKGSScZM=.23123e6f-f4d0-4438-86f5-6a9db635020f@github.com> On Mon, 23 Nov 2020 15:06:33 GMT, Aleksey Shipilev wrote: >> LGTM. 
>> >> Though, note that the style guide asks to prefer `nullptr` to `NULL`: https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#nullptr >> >> nullptr >> >> Prefer nullptr (n2431) to NULL. Don't use (constexpr or literal) 0 for pointers. >> >> For historical reasons there are widespread uses of both NULL and of integer 0 as a pointer value. > >> Though, note that the style guide asks to prefer `nullptr` to `NULL`: https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#nullptr > > Fixed. Please push your Zero integration first, and then I rebase this one? @shipilev Done. Though I'm not a `jdk` Reviewer, so another review is needed for this PR. But, as the person most familiar with the code, it looks good to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/1383 From vlivanov at openjdk.java.net Mon Nov 23 15:25:58 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 23 Nov 2020 15:25:58 GMT Subject: RFR: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" [v2] In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 15:18:12 GMT, Tobias Hartmann wrote: >> The ideal transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) converts `RotateLeftNode(val, shift)` into `RotateRightNode(val, 32/64 - (shift & 31/63))`. If `shift` later becomes zero, we end up trying to emit a rotate with a 32/64 shift value which triggers an assert. >> >> I've added an identity transformation similar to what is implemented for ShiftNodes that takes care of this. I've also noticed that the corresponding assert is not strong enough for `roll` and `rorl` (probably the author used the assert corresponding to the 64-bit version by accident). The patch also includes some refactoring. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Refactored getShiftCon method Looks good. 
------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1384 From shade at openjdk.java.net Mon Nov 23 15:44:13 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 15:44:13 GMT Subject: RFR: 8256860: S390 builds broken after JDK-8254231 Message-ID: Foreign linker integration broke S390 builds. This change stubs out the new entry points, without implementing the actual support yet. Additional testing: - [x] Linux s390x fastdebug cross-compilation ------------- Commit messages: - 8256860: S390 builds broken after JDK-8254231 Changes: https://git.openjdk.java.net/jdk/pull/1392/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1392&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256860 Stats: 146 lines in 7 files changed: 146 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1392.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1392/head:pull/1392 PR: https://git.openjdk.java.net/jdk/pull/1392 From thomas.stuefe at gmail.com Mon Nov 23 15:49:43 2020 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 23 Nov 2020 16:49:43 +0100 Subject: RFR: 8256860: S390 builds broken after JDK-8254231 In-Reply-To: References: Message-ID: Hmm. Seems to have broken Linux x64 too, at least my GH actions fail: https://github.com/tstuefe/jdk/runs/1442845292 2020-11-23T15:36:46.4011768Z Compiling 2781 files for java.desktop 2020-11-23T15:37:00.9905872Z /home/runner/work/jdk/jdk/jdk/src/hotspot/cpu/x86/universalNativeInvoker_x86.cpp: In member function 'void ProgrammableInvoker::Generator::generate()': 2020-11-23T15:37:00.9908797Z /home/runner/work/jdk/jdk/jdk/src/hotspot/cpu/x86/universalNativeInvoker_x86.cpp:53:23: error: 'c_rarg0' was not declared in this scope 2020-11-23T15:37:00.9910237Z 53 | __ movptr(ctxt_reg, c_rarg0); // FIXME c args? or java? 
2020-11-23T15:37:00.9910995Z | ^~~~~~~ 2020-11-23T15:37:01.2114779Z make[3]: *** [lib/CompileJvm.gmk:143: /home/runner/work/jdk/jdk/jdk/build/linux-x86/hotspot/variant-server/libjvm/objs/universalNativeInvoker_x86.o] Error 1 2020-11-23T15:37:01.2148227Z make[2]: *** [make/Main.gmk:252: hotspot-server-libs] Error 2 2020-11-23T15:37:01.2149412Z make[2]: *** Waiting for unfinished jobs.... On Mon, Nov 23, 2020 at 4:44 PM Aleksey Shipilev wrote: > Foreign linker integration broke S390 builds. This change stubs out the > new entry points, without implementing the actual support yet. > > Additional testing: > - [x] Linux s390x fastdebug cross-compilation > > ------------- > > Commit messages: > - 8256860: S390 builds broken after JDK-8254231 > > Changes: https://git.openjdk.java.net/jdk/pull/1392/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1392&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8256860 > Stats: 146 lines in 7 files changed: 146 ins; 0 del; 0 mod > Patch: https://git.openjdk.java.net/jdk/pull/1392.diff > Fetch: git fetch https://git.openjdk.java.net/jdk > pull/1392/head:pull/1392 > > PR: https://git.openjdk.java.net/jdk/pull/1392 > From shade at openjdk.java.net Mon Nov 23 15:56:58 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 15:56:58 GMT Subject: RFR: 8256860: S390 builds broken after JDK-8254231 In-Reply-To: References: Message-ID: <4NON8Z2fTXVMvV2fnfPgYueYrEHORlCn1Ul9MBVcHRU=.809f6d27-6f42-413a-9e6c-84b19a50ac0c@github.com> On Mon, 23 Nov 2020 15:38:10 GMT, Aleksey Shipilev wrote: > Foreign linker integration broke S390 builds. This change stubs out the new entry points, without implementing the actual support yet. > > Additional testing: > - [x] Linux s390x fastdebug cross-compilation > 2020-11-23T15:37:00.9910237Z 53 | __ movptr(ctxt_reg, c_rarg0); > // FIXME c args? or java? That's different, x86_32 failure. I believe Jorn fixes it in JDK-8256486. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1392 From ihse at openjdk.java.net Mon Nov 23 15:59:07 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 23 Nov 2020 15:59:07 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: <9oXnHULCd76_J69CKMVVZl3FfDte1pnt38y06LVV4Sg=.26a4ab2c-5ff7-4e2f-9428-0d8cd931d243@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <8Eqswd7tsVaGEXHdKDncXqKpW2tBsSeuY0PV6aTB9_c=.a6cf4957-9d31-4e89-bf44-e7b7852205d5@github.com> <2S00ucaPGiAQLeLOejt1kfXeYEc7ctEPeRCIcq1N0N8=.dbf1ea7a-8de4-48a5-8759-03495e3e3c08@github.com> <9oXnHULCd76_J69CKMVVZl3FfDte1pnt38y06LVV4Sg=.26a4ab2c-5ff7-4e2f-9428-0d8cd931d243@github.com> Message-ID: On Mon, 26 Oct 2020 11:41:16 GMT, Magnus Ihse Bursie wrote: >> Some notes (perhaps most to myself) about how this ties into the existing hsdis implementation, and with JDK-8188073 (Capstone porting). >> >> When printing disassembly from hotspot, the current solution tries to locate and load the hsdis library, which prints disassembly using bfd. The reason for using the separate library approach is, as far as I can understand, perhaps a mix of both incompatible licensing for bfd, and a wish to not burden the jvm library with additional bloat which is needed only for debugging. >> >> The Capstone approach, in the prototype patch presented by Jorn in the issue, is to create a new capstonedis library, and dispatch to it instead of hsdis. >> >> The approach used in this patch is to refactor the existing hsdis library into an abstract base class for hsdis backends, with two concrete implementations, one for bfd and one for llvm. >> >> Unfortunately, I think the resulting code in hsdis.cpp in this patch is hard to read and understand. 
> > I think a proper solution to both this and the Capstone implementation is to provide a proper framework for selecting the hsdis backend as a first step, and refactor the existing bfd implementation as the first such backend. After that, we can add llvm and capstone as alternative hsdis backend implementations. FWIW, I started working on a framework which would add support for selectable backends for hsdis. Unfortunately it was not as simple as I initially thought, so I had to put it on hold while directing my time to working on the winenv patch instead. I believe the proper way forward is to get the "selectable hsdis backend" framework in place, and then retrofit this patch to add LLVM support in that framework. If this means that this PR should be closed, or kept open until this is done, I don't know. ------------- PR: https://git.openjdk.java.net/jdk/pull/392 From kvn at openjdk.java.net Mon Nov 23 16:03:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 23 Nov 2020 16:03:57 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v2] In-Reply-To: References: <_6nluorAvosju4tbANDk-v9c4dzfarBXqk7bHNnHm6s=.dc10c8d2-7ef2-4c91-852c-f4f7d2f5caf5@github.com> Message-ID: <2pW5Q5sSG_2g_6Ai0Mz0t3bHlylYHzIbJ3mH9liKT80=.0de5ef5e-3586-4118-ae20-d5a432c081a0@github.com> On Mon, 23 Nov 2020 07:39:28 GMT, Aleksey Shipilev wrote: >> Why to run test code if you know the result will not match? Why not just skip testing? >> I thought about using `@requires vm.cpu.features` checks but your check in main() is simpler and more clear. > > Which way you'd like me to move with this, @vnkozlov? :) I am fine with either way, leaning towards what we already have. 
What I meant is to run tests only if SSE is used: if (expectStableFloats) { testFloat(); } if (expectStableDoubles) { testDouble(); } ------------- PR: https://git.openjdk.java.net/jdk/pull/1187 From stuefe at openjdk.java.net Mon Nov 23 16:10:01 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 23 Nov 2020 16:10:01 GMT Subject: RFR: 8256860: S390 builds broken after JDK-8254231 In-Reply-To: <4NON8Z2fTXVMvV2fnfPgYueYrEHORlCn1Ul9MBVcHRU=.809f6d27-6f42-413a-9e6c-84b19a50ac0c@github.com> References: <4NON8Z2fTXVMvV2fnfPgYueYrEHORlCn1Ul9MBVcHRU=.809f6d27-6f42-413a-9e6c-84b19a50ac0c@github.com> Message-ID: On Mon, 23 Nov 2020 15:54:28 GMT, Aleksey Shipilev wrote: > it Right sorry, I got it mixed up with the build error on linux x64 zero. ------------- PR: https://git.openjdk.java.net/jdk/pull/1392 From stuefe at openjdk.java.net Mon Nov 23 16:24:00 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 23 Nov 2020 16:24:00 GMT Subject: RFR: 8256860: S390 builds broken after JDK-8254231 In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 15:38:10 GMT, Aleksey Shipilev wrote: > Foreign linker integration broke S390 builds. This change stubs out the new entry points, without implementing the actual support yet. > > Additional testing: > - [x] Linux s390x fastdebug cross-compilation IIUC JDK-8254231 implements support for aarch and x86, which leaves all other architectures (s390, arm, ppc) broken? That's... :-( Thanks a lot for fixing! If your patch fixes the build without having to stub out `ABIDescriptor::is_volatile_reg`, feel free to push. Cheers, Thomas src/hotspot/cpu/s390/foreign_globals_s390.cpp line 36: > 34: Unimplemented(); > 35: return {}; > 36: } Won't we need stubs for `ABIDescriptor::is_volatile_reg` too? ------------- Marked as reviewed by stuefe (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1392 From vladimir.kozlov at oracle.com Mon Nov 23 16:28:43 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 23 Nov 2020 08:28:43 -0800 Subject: [11u] 8229495: SIGILL in C2 generated OSR compilation In-Reply-To: <873610l4gj.fsf@redhat.com> References: <873610l4gj.fsf@redhat.com> Message-ID: <3dc7c40f-5b16-4018-be98-cb1a6718c141@oracle.com> New test TestRCEAfterUnrolling.java is missing. Otherwise backport looks fine. Thanks, Vladimir On 11/23/20 3:04 AM, Roland Westrelin wrote: > > Original bug: > https://bugs.openjdk.java.net/browse/JDK-8229495 > https://hg.openjdk.java.net/jdk/jdk15/rev/c973b5ec934d > > Original patch does not apply cleanly to 11u (different context in > loopTransform.cpp, loopnode.cpp, node.hpp) but the 11u patch is > identical to the original patch (except for a change related to node > budget handling which doesn't exist in 11). > > 11u webrev: > http://cr.openjdk.java.net/~roland/8229495.11u/webrev.00/ > > Testing: x86_64 build, tier1. The test case passes with and without the > patch (I noticed when working on the upstream fix that that test case is > fragile so it's not unexpected). > > Roland. > From rwestrel at redhat.com Mon Nov 23 16:38:15 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 23 Nov 2020 17:38:15 +0100 Subject: [11u] 8229495: SIGILL in C2 generated OSR compilation In-Reply-To: <3dc7c40f-5b16-4018-be98-cb1a6718c141@oracle.com> References: <873610l4gj.fsf@redhat.com> <3dc7c40f-5b16-4018-be98-cb1a6718c141@oracle.com> Message-ID: <87wnycjafc.fsf@redhat.com> > New test TestRCEAfterUnrolling.java is missing. Otherwise backport looks fine. Thanks for the review. Here with the test case: http://cr.openjdk.java.net/~roland/8229495.11u/webrev.01/ Roland. 
From vladimir.kozlov at oracle.com Mon Nov 23 17:01:35 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 23 Nov 2020 09:01:35 -0800 Subject: Some questions on intrinsic for UTF8 to UTF16 decoding In-Reply-To: References: <59858a23-8ab0-ead7-4d15-757103e07d51@oracle.com> Message-ID: On 11/23/20 5:20 AM, Ludovic Henry wrote: > Hi Vladimir, > >> The simpliest solution is a new int[] array which holds locals var values which intrinsic can update and return. > > Good idea, I'll try that and report back. > >> An other solution (and more complex code in library_call.cpp) is to pass `src' and 'dst' to intrinsic and read/update their fields inside: > > I was thinking about that, but I'm not sure what it would look like. Could you just please point me to a place where something similar is already done? There is method in library_call.cpp to load fields: load_field_from_object(). It can be used to load fields from 'ByteBuffer src' and 'CharBuffer dst' to calculate start and length as it is done at the beginning of decodeArrayLoop(). But unfortunately we don't have similar methods for stores. But you can look on code in LibraryCallKit::inline_unsafe_access() to create one - this is why I said it is more complicated. > >> Yes, 2Mb is too much. And the problem is not size but affect on startup time - it is calculated dynamically. > > I agree, the impact on the startup time would be the most impactful. I was thinking over the weekend whether I could generate this data statically at compile-time (with something along the line of template metaprogramming). > > More generally, what is the thought process in this kind of optimization, like what metrics do we favor? I am going to spend some time on this intrinsic this week to get some numbers. I'll come back to you on this discussion. It is all about cost vs benefits. 
If we can get, say, > 10% performance improvement across a broad set of applications then we can accept an increase in memory size and an effect on startup. But if it is in range < 2% performance - it is hard to accept. That is why we always first investigate what Java method can be intrinsified with simple changes to get big improvements which will help a lot of applications. And also look if we can add a new general optimization to the JIT compiler instead and get improvement not only in a particular method but in others too. That is why we ask to show performance numbers first before we accept such new code. > >> Another issue with such intrinsic I see is that decodeArrayLoop() code has a lot of checks for malformed strings which intrinsic does not have. Most likely it will not pass JCK testing. > > Yes, that is simply because I didn't implement it yet. The plan is to decode as much as possible in the intrinsic, and if it runs into any case that it doesn't know how to handle, it backs off and falls back to the current implementation. > >> Would be interesting to see performance if you vectorize only ASCII copy loop which seems most common case and you don't need table: > > That could definitely be possible. However, I'm not sure whether ASCII only loop is really the most common case. Moreover, the way the intrinsic is done here, if at any point a non-ASCII character is encountered, it backs off and falls back to the loop-based algorithm. We could however mix and match the ASCII-only case with the more general loop with something along the line of: Did you see that it is auto-vectorised? Manually unrolled Java loops are not good for JIT compilation. I am not surprised you don't see any benefits with it. I was thinking to make it a separate method and write an intrinsic for it with vector compare instructions. Or use Vector API as you suggested.
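For illustration only, the "separate method" idea could look roughly like the sketch below: a plain, non-unrolled loop that copies the leading ASCII bytes and reports how many it copied, so the caller can continue with the general decoder from that point. The class and method names are hypothetical and are not taken from the JDK sources or from the patch under discussion; whether C2 auto-vectorizes such a loop is not something this sketch demonstrates.

```java
// Hypothetical helper: names and signature are assumptions for illustration,
// not part of sun.nio.cs.UTF_8 or of any proposed intrinsic.
final class AsciiPrefixCopy {

    /**
     * Copies bytes from sa[sp..sl) to da[dp..dl) while they are ASCII
     * (high bit clear) and returns the number of bytes copied.
     */
    static int copyAsciiPrefix(byte[] sa, int sp, int sl, char[] da, int dp, int dl) {
        int n = Math.min(sl - sp, dl - dp);
        int i = 0;
        while (i < n) {
            byte b = sa[sp + i];
            if (b < 0) {
                break; // high bit set: start of a multi-byte UTF-8 sequence
            }
            da[dp + i] = (char) b;
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] src = {'H', 'i', (byte) 0xC3, (byte) 0xA9, '!'};
        char[] dst = new char[src.length];
        int copied = copyAsciiPrefix(src, 0, src.length, dst, 0, dst.length);
        System.out.println("ASCII prefix length: " + copied);
    }
}
```

The caller would advance `sp` and `dp` by the returned count and hand the remainder to the existing loop, which keeps all malformed-input checks in one place.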
> > ``` > while (sp < sl) { > if (sp + 7 < sl && dp + 7 < dl) { > long b = unsafe.getLong(sa, Unsafe.ARRAY_BYTE_BASE_OFFSET + sp * Unsafe.ARRAY_BYTE_INDEX_SCALE); > if ((b & 0xC0C0C0C0C0C0C0C0) != 0x8080808080808080) { > da[dp + 0] = (char)sa[sp+0]; > da[dp + 1] = (char)sa[sp+1]; > da[dp + 2] = (char)sa[sp+2]; > da[dp + 3] = (char)sa[sp+3]; > da[dp + 4] = (char)sa[sp+4]; > da[dp + 5] = (char)sa[sp+5]; > da[dp + 6] = (char)sa[sp+6]; > da[dp + 7] = (char)sa[sp+7]; > sp += 8; > dp += 8; > continue; > } > } > int b1 = sa[sp]; > if (b1 >= 0) { > ... > ``` > > From preliminary results, the gain is not substantial on ASCII-only text, and is minimal on `"H?llo World!".repeat(1000 * 1000)`. I will bet for this case your intrinsic will not show benefit too - string is too short. Thanks, Vladimir > > I'll check whether we can get some gains with the Vector API on that case. > > Thank you for your feedback. > Ludovic > > From shade at openjdk.java.net Mon Nov 23 17:05:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 17:05:57 GMT Subject: RFR: 8256860: S390 builds broken after JDK-8254231 In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 16:15:50 GMT, Thomas Stuefe wrote: >> Foreign linker integration broke S390 builds. This change stubs out the new entry points, without implementing the actual support yet. >> >> Additional testing: >> - [x] Linux s390x fastdebug cross-compilation > > src/hotspot/cpu/s390/foreign_globals_s390.cpp line 36: > >> 34: Unimplemented(); >> 35: return {}; >> 36: } > > Wont we need stubs for `ABIDescriptor::is_volatile_reg` too? No other arch needs it, and `s390x` builds fine for me without. AFAICS from the current code, `ABIDescriptor::is_volatile_reg` is only used from the arch-specific implementations of `universalNativeInvoker` and `universalUpcallHandler`. Which are stubbed out for `s390x`. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1392 From vladimir.kozlov at oracle.com Mon Nov 23 17:53:24 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 23 Nov 2020 09:53:24 -0800 Subject: [11u] 8229495: SIGILL in C2 generated OSR compilation In-Reply-To: <87wnycjafc.fsf@redhat.com> References: <873610l4gj.fsf@redhat.com> <3dc7c40f-5b16-4018-be98-cb1a6718c141@oracle.com> <87wnycjafc.fsf@redhat.com> Message-ID: Good. Thanks, Vladimir On 11/23/20 8:38 AM, Roland Westrelin wrote: > >> New test TestRCEAfterUnrolling.java is missing. Otherwise backport looks fine. > > Thanks for the review. Here with the test case: > > http://cr.openjdk.java.net/~roland/8229495.11u/webrev.01/ > > Roland. > From shade at openjdk.java.net Mon Nov 23 17:59:04 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 17:59:04 GMT Subject: RFR: 8256860: S390 builds broken after JDK-8254231 In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 16:21:04 GMT, Thomas Stuefe wrote: >> Foreign linker integration broke S390 builds. This change stubs out the new entry points, without implementing the actual support yet. >> >> Additional testing: >> - [x] Linux s390x fastdebug cross-compilation > > IIUC JDK-8254231 implements support for aarch64 and x86, which leaves all other architectures (s390, arm, ppc) broken? That's... :-( > > Thanks a lot for fixing! If your patch fixes the build without having to stub out `ABIDescriptor::is_volatile_reg`, feel free to push. > > Cheers, Thomas Thanks @tstuefe! ------------- PR: https://git.openjdk.java.net/jdk/pull/1392 From shade at openjdk.java.net Mon Nov 23 17:59:05 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 17:59:05 GMT Subject: Integrated: 8256860: S390 builds broken after JDK-8254231 In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 15:38:10 GMT, Aleksey Shipilev wrote: > Foreign linker integration broke S390 builds.
This change stubs out the new entry points, without implementing the actual support yet. > > Additional testing: > - [x] Linux s390x fastdebug cross-compilation This pull request has now been integrated. Changeset: 18e85064 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/18e85064 Stats: 146 lines in 7 files changed: 146 ins; 0 del; 0 mod 8256860: S390 builds broken after JDK-8254231 Reviewed-by: stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/1392 From kvn at openjdk.java.net Mon Nov 23 18:00:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 23 Nov 2020 18:00:59 GMT Subject: RFR: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" [v2] In-Reply-To: References: Message-ID: <5vX9kYCqJI_JXDtvludEMmM1iUJAFfQdT970cfIbl6s=.52725416-8775-41d7-8e53-f4f3635bc291@github.com> On Mon, 23 Nov 2020 15:18:12 GMT, Tobias Hartmann wrote: >> The ideal transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) converts `RotateLeftNode(val, shift)` into `RotateRightNode(val, 32/64 - (shift & 31/63))`. If `shift` later becomes zero, we end up trying to emit a rotate with a 32/64 shift value which triggers an assert. >> >> I've added an identity transformation similar to what is implemented for ShiftNodes that takes care of this. I've also noticed that the corresponding assert is not strong enough for `roll` and `rorl` (probably the author used the assert corresponding to the 64-bit version by accident). The patch also includes some refactoring. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Refactored getShiftCon method Good. ------------- Marked as reviewed by kvn (Reviewer). 
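The arithmetic behind the 8256823 assert can be checked with a few lines of Java. This is only a sketch of the 32-bit case under the formula quoted above, `32 - (shift & 31)`; the helper names are made up for illustration, and the actual fix is an identity transformation inside C2, which is not reproduced here.

```java
// Illustration of the rotate-left to rotate-right conversion for 32-bit values.
// Helper names are hypothetical; this is not HotSpot code.
final class RotateIdentity {

    /** Right-rotate amount derived from a left-rotate amount, as quoted in the PR text. */
    static int rightAmountFor(int shift) {
        return 32 - (shift & 31);
    }

    /** The same amount reduced modulo 32, i.e. what fits a 0..31 immediate. */
    static int normalized(int shift) {
        return rightAmountFor(shift) & 31;
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        // shift == 0 yields 32, which is outside the 0..31 range of a rotate immediate.
        System.out.println("right amount for shift 0: " + rightAmountFor(0));
        // Java's rotate methods mask the distance, so the values still agree.
        System.out.println(Integer.rotateLeft(x, 0) == Integer.rotateRight(x, rightAmountFor(0)));
    }
}
```

A rotate by zero leaves the value unchanged, which is why returning the input node directly, as is already done for shifts, avoids ever emitting the out-of-range count.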
PR: https://git.openjdk.java.net/jdk/pull/1384 From shade at openjdk.java.net Mon Nov 23 18:25:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 18:25:57 GMT Subject: RFR: 8256857: ARM32 builds broken after JDK-8254231 [v3] In-Reply-To: <1L9xggeBg0l8h7Dujr7UYtJ_Yj2-QC-iTZNVKGSScZM=.23123e6f-f4d0-4438-86f5-6a9db635020f@github.com> References: <1L9xggeBg0l8h7Dujr7UYtJ_Yj2-QC-iTZNVKGSScZM=.23123e6f-f4d0-4438-86f5-6a9db635020f@github.com> Message-ID: On Mon, 23 Nov 2020 15:17:30 GMT, Jorn Vernee wrote: >>> Though, note that the style guide asks to prefer `nullptr` to `NULL`: https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#nullptr >> >> Fixed. Please push your Zero integration first, and then I rebase this one? > > @shipilev Done. Though I'm not a `jdk` Reviewer, so another review is needed for this PR. > > But, as the person most familiar with the code, it looks good to me. Thanks @JornVernee, no problem. Anyone to formally ack this? Let's unbreak more builds! ------------- PR: https://git.openjdk.java.net/jdk/pull/1383 From redestad at openjdk.java.net Mon Nov 23 19:40:06 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 23 Nov 2020 19:40:06 GMT Subject: RFR: 8256883: C2: Add a RegMask iterator Message-ID: By implementing a simple RegMaskIterator we can speed this up and possibly make the code a bit clearer by doing so. As a data point, this reduces the `C2Compiler::initialize` overhead from 8.82M instructions to 8.58M instructions, from the improvement in `PhaseChaitin::post_allocate_copy_removal` (~16k insns/compilation). The gain varies with type of compilation, so on the naive tests in `SimpleRepeatCompilation` it's in the noise (~2k insns/compilation on `trivialMath`, for example).
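The message does not show the iterator itself, so the following is only a Java analogue of the general technique: walk a multi-word bit mask by jumping straight to the next set bit instead of testing every bit position. The class name and layout are assumptions made for illustration and do not reflect the C++ `RegMaskIterator` in the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Java analogue of a set-bit iterator over a multi-word mask.
// Not HotSpot code; names and word size are illustrative assumptions.
final class BitMaskIterator {
    private final int[] words;
    private int wordIndex = 0;
    private int current; // remaining, not yet visited bits of words[wordIndex]

    BitMaskIterator(int[] words) {
        this.words = words;
        this.current = words.length > 0 ? words[0] : 0;
    }

    boolean hasNext() {
        // Skip over empty words without touching individual bits.
        while (current == 0 && wordIndex + 1 < words.length) {
            current = words[++wordIndex];
        }
        return current != 0;
    }

    int next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int bit = Integer.numberOfTrailingZeros(current);
        current &= current - 1; // clear the lowest set bit
        return wordIndex * 32 + bit;
    }

    static List<Integer> collect(int[] words) {
        List<Integer> out = new ArrayList<>();
        BitMaskIterator it = new BitMaskIterator(words);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

The work done is proportional to the number of words plus the number of set bits, which is where savings over a bit-by-bit scan of a sparse mask would come from.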
------------- Commit messages: - Remove tricky assert - Optimize away _current_bit, improve assertions - 8256883: C2: Add a RegMask iterator Changes: https://git.openjdk.java.net/jdk/pull/1397/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1397&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256883 Stats: 82 lines in 4 files changed: 47 ins; 10 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/1397.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1397/head:pull/1397 PR: https://git.openjdk.java.net/jdk/pull/1397 From kvn at openjdk.java.net Mon Nov 23 19:56:07 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 23 Nov 2020 19:56:07 GMT Subject: RFR: 8256858: C2: Devirtualize PhaseIterGVN-specific methods In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 13:48:21 GMT, Claes Redestad wrote: > PhaseValues define the virtual method is_IterGVN, which is trivially returning 0(!) for all types except those derived from PhaseIterGVN. Similarly there's igvn_rehash_node_delayed which is virtual and a no-op in base types, but implemented to call rehash_node_delayed in PhaseIterGVN. > > By devirtualizing we allow for more aggressive inlining and slightly better code generation in a few places. > > This increases sizeof(PhaseValues) from 2480 to 2488 on x64. Since we only go through a limited number of phases per compilation this seems acceptable. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1385 From kvn at openjdk.java.net Mon Nov 23 20:07:06 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 23 Nov 2020 20:07:06 GMT Subject: RFR: 8256883: C2: Add a RegMask iterator In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 18:01:01 GMT, Claes Redestad wrote: > By implementing a simple RegMaskIterator we can speed this up and possibly make the code a bit clearer by doing so. 
> > As a data point, this reduce the `C2Compiler::initialize` overhead from 8.82M instructions to 8.58M instructions, from the improvement in `PhaseChaitin::post_allocate_copy_removal` (~16k insns/compilation). The gain varies with type of compilation, so on the naive tests in `SimpleRepeatCompilation` it's in the noise (~2k insns/compilation on `trivialMath, for example). There are 2 places in ZGC code (x86 and aarch64) which could use new iterator too: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L482 ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1397 From github.com+51754783+coreyashford at openjdk.java.net Mon Nov 23 21:04:57 2020 From: github.com+51754783+coreyashford at openjdk.java.net (Corey Ashford) Date: Mon, 23 Nov 2020 21:04:57 GMT Subject: RFR: 8256431: [PPC64] Implement Base64 encodeBlock() for Power64-LE In-Reply-To: References: Message-ID: <5Cj5b7mh1v63JNMx6P_gFdPJDF8mWRMrvjfpVc8acpA=.b5a1de59-8e15-45c7-a144-8a348e3bd332@github.com> On Tue, 17 Nov 2020 01:35:05 GMT, Corey Ashford wrote: > Add a vector-based implementation of the Base64 encodeBlock intrinsic for Power9 and Power10, little-endian Linux only. > > This implementation is based upon a paper (linked in comments) describing an Intel SSE vector-based implementation of Base64 encoding. Although the Intel SSE instruction set and the Power VMX/VSX instruction sets are different, the method used in the paper is adaptable to Power. In addition there are a few places in the algorithm where it's possible to gain some performance by using more optimal instruction sequences for VMX/VSX, and some additional benefit is gained from the ISA 3.1 additions available in Power10. > > There is one controversial method I used in this implementation: I defined a macro to emit the instruction sequence for encoding 12 bytes in a vector to 16 bytes, because this sequence is needed in three places. 
Turning it into a function would have been possible, but I would have needed to pass quite a few register numbers into the function. I would have liked to have used a nested function, to give the function visibility to the register numbers declared in the outer scope, but alas nested functions are not possible in C++. > > The overall performance advantage on Power9 is about 4.0X, based on the main/java/org/openjdk/micro/bench/java/util/Base64VarLenEncode.java benchmark. This benchmark covers random buffer lengths from 8 to 20007 bytes. Buffers that are short won't perform as well, approaching the performance of the pure Java code (or slightly worse for very short buffers). Buffers that are consistently long will perform a little better than 4.0X. Since there was no email sent to the hotspot-compiler-dev mailing list upon the creation of this PR, I am making this comment in hopes that a posting will be produced. ------------- PR: https://git.openjdk.java.net/jdk/pull/1245 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:04 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:04 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory Message-ID: This patch fixes 27%-48% performance regressions of small arraycopies on Graviton2 (Neoverse N1) when UseSIMDForMemoryOps is enabled. For such copies ldpq/stpq are used instead of ld4/st4. This follows what the Arm Optimization Guide, including for Neoverse N1, recommends: Use discrete, non-writeback forms of load and store instructions while interleaving them. The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build and UseSIMDForMemoryOps enabled.
------------- Commit messages: - 8256488: Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory Changes: https://git.openjdk.java.net/jdk/pull/1293/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1293&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256488 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1293.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1293/head:pull/1293 PR: https://git.openjdk.java.net/jdk/pull/1293 From simonis at openjdk.java.net Mon Nov 23 21:07:04 2020 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 23 Nov 2020 21:07:04 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 14:10:48 GMT, Eugene Astigeevich wrote: > This patch fixes 27%-48% performance regressions of small arraycopies on Graviton2 (Neoverse N1) when UseSIMDForMemoryOps is enabled. For such copies ldpq/stpq are used instead of ld4/st4. > This follows what the Arm Optimization Guide, including for Neoverse N1, recommends: Use discrete, non-writeback forms of load and store instructions while interleaving them. > > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build and UseSIMDForMemoryOps enabled. Thanks for the detailed performance numbers. Looks good to me. ------------- Marked as reviewed by simonis (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1293 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:09 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:09 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs Message-ID: This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. 
The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. ------------- Commit messages: - 8255351: Add detection for Graviton 1 & 2 CPUs Changes: https://git.openjdk.java.net/jdk/pull/1315/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1315&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255351 Stats: 14 lines in 1 file changed: 14 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1315.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1315/head:pull/1315 PR: https://git.openjdk.java.net/jdk/pull/1315 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:04 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:04 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 14:10:48 GMT, Eugene Astigeevich wrote: > This patch fixes 27%-48% performance regressions of small arraycopies on Graviton2 (Neoverse N1) when UseSIMDForMemoryOps is enabled. For such copies ldpq/stpq are used instead of ld4/st4. > This follows what the Arm Optimization Guide, including for Neoverse N1, recommends: Use discrete, non-writeback forms of load and store instructions while interleaving them. > > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build and UseSIMDForMemoryOps enabled. Here is the demonstration why ldpq/stpq is faster than ld4/st4 on Graviton2: >From Arm Neoverse N1 Optimization Guide (Graviton 2): | instr | exec lat | thr | pipelines | |------|--------|-----|----------| | ldp | 7 | 1 | L | | stp | 3 | 1/2 | V/L | | ld4 | 10 | 1/5 | V/L | | st4 | 9 | 1/6 | V/L | There are two L and two V. Estimated execution time for: ld4 ldpq st4 stpq | cycle | instr issued | |------|---------| | 0 | ld4 (L0, V0), ldpq (L1) | | 1 | . | | 2 | . | | 3 | . | | 4 | . | | 5 | . | | 6 | . 
| | 7 | stpq (L0, V0) | | 8 | . | | 9 | . | | 10 | st4 (L0, V0) | | 11 | . | | 12 | . | | 13 | . | | 14 | . | | 15 | . | | 16 | . | | 17 | . | | 18 | . | Estimated execution time for: ldpq ldpq ldpq stpq stpq stpq | cycle | instr issued | |------|---------| | 0 | ldpq (L0), ldpq (L1) | | 1 | ldpq (L0) | | 2 | . | | 3 | . | | 4 | . | | 5 | . | | 6 | . | | 7 | stpq (L0), stpq (L1) | | 8 | . | | 9 | stpq (L0) | | 10 | . | | 11 | . | So it is 19 vs 12. ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From simonis at openjdk.java.net Mon Nov 23 21:07:09 2020 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 23 Nov 2020 21:07:09 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 12:39:47 GMT, Eugene Astigeevich wrote: > This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. > > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. Yes, that's good. Handling potential regressions/improvements in #1293 is fine for me. Evgeny works for the Amazon Corretto team and is covered by the Amazon OCA. ------------- Marked as reviewed by simonis (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1315 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:04 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:04 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 19:18:55 GMT, Eugene Astigeevich wrote: >> This patch fixes 27%-48% performance regressions of small arraycopies on Graviton2 (Neoverse N1) when UseSIMDForMemoryOps is enabled. For such copies ldpq/stpq are used instead of ld4/st4. 
>> This follows what the Arm Optimization Guide, including for Neoverse N1, recommends: Use discrete, non-writeback forms of load and store instructions while interleaving them. >> >> The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build and UseSIMDForMemoryOps enabled. > > Here is the demonstration why ldpq/stpq is faster than ld4/st4 on Graviton2: > From Arm Neoverse N1 Optimization Guide (Graviton 2): > | instr | exec lat | thr | pipelines | > |------|--------|-----|----------| > | ldp | 7 | 1 | L | > | stp | 3 | 1/2 | V/L | > | ld4 | 10 | 1/5 | V/L | > | st4 | 9 | 1/6 | V/L | > > There are two L and two V. > Estimated execution time for: > ld4 > ldpq > st4 > stpq > | cycle | instr issued | > |------|---------| > | 0 | ld4 (L0, V0), ldpq (L1) | > | 1 | . | > | 2 | . | > | 3 | . | > | 4 | . | > | 5 | . | > | 6 | . | > | 7 | stpq (L0, V0) | > | 8 | . | > | 9 | . | > | 10 | st4 (L0, V0) | > | 11 | . | > | 12 | . | > | 13 | . | > | 14 | . | > | 15 | . | > | 16 | . | > | 17 | . | > | 18 | . | > > Estimated execution time for: > ldpq > ldpq > ldpq > stpq > stpq > stpq > | cycle | instr issued | > |------|---------| > | 0 | ldpq (L0), ldpq (L1) | > | 1 | ldpq (L0) | > | 2 | . | > | 3 | . | > | 4 | . | > | 5 | . | > | 6 | . | > | 7 | stpq (L0), stpq (L1) | > | 8 | . | > | 9 | stpq (L0) | > | 10 | . | > | 11 | . | > > So it is 19 vs 12. Here is the demonstration why ldpq/stpq is slightly faster than ld4/st4 on Graviton1: >From Arm Cortex A72 Optimization Guide (Graviton 1): | instr | exec lat | thr | pipelines | |------|--------|-----|----------| | ldp | 6 | 1/2 | L | | stp | 4 | 1/4 | I/S | | ld4 | 11 | 1/4 | V/L | | st4 | 8 | 1/8 | V/S | There are one L, one S and two V. Estimated execution time for: ld4 ldpq st4 stpq | cycle | instr issued | |------|---------| | 0 | ld4 (L, V0) | | 1 | . | | 2 | . | | 3 | . | | 4 | ldpq (L) | | 5 | . | | 6 | stpq (S, I0) | | 7 | . | | 8 | . | | 9 | . | | 10 | . | | 11 | st4 (S, V0) | | 12 | . 
| | 13 | . | | 14 | . | | 15 | . | | 16 | . | | 17 | . | | 18 | . | Estimated execution time for: ldpq ldpq ldpq stpq stpq stpq | cycle | instr issued | |------|---------| | 0 | ldpq (L) | | 1 | . | | 2 | ldpq (L) | | 3 | . | | 4 | ldpq (L) | | 5 | . | | 6 | stpq (S, I0) | | 7 | . | | 8 | . | | 9 | . | | 10 | stpq (S, I0) | | 11 | . | | 12 | . | | 13 | . | | 14 | stpq (S, I0) | | 15 | . | | 16 | . | | 17 | . | So it is 19 vs 18. ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From simonis at openjdk.java.net Mon Nov 23 21:07:05 2020 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 23 Nov 2020 21:07:05 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 19:46:54 GMT, Eugene Astigeevich wrote: >> Here is the demonstration why ldpq/stpq is faster than ld4/st4 on Graviton2: >> From Arm Neoverse N1 Optimization Guide (Graviton 2): >> | instr | exec lat | thr | pipelines | >> |------|--------|-----|----------| >> | ldp | 7 | 1 | L | >> | stp | 3 | 1/2 | V/L | >> | ld4 | 10 | 1/5 | V/L | >> | st4 | 9 | 1/6 | V/L | >> >> There are two L and two V. >> Estimated execution time for: >> ld4 >> ldpq >> st4 >> stpq >> | cycle | instr issued | >> |------|---------| >> | 0 | ld4 (L0, V0), ldpq (L1) | >> | 1 | . | >> | 2 | . | >> | 3 | . | >> | 4 | . | >> | 5 | . | >> | 6 | . | >> | 7 | stpq (L0, V0) | >> | 8 | . | >> | 9 | . | >> | 10 | st4 (L0, V0) | >> | 11 | . | >> | 12 | . | >> | 13 | . | >> | 14 | . | >> | 15 | . | >> | 16 | . | >> | 17 | . | >> | 18 | . | >> >> Estimated execution time for: >> ldpq >> ldpq >> ldpq >> stpq >> stpq >> stpq >> | cycle | instr issued | >> |------|---------| >> | 0 | ldpq (L0), ldpq (L1) | >> | 1 | ldpq (L0) | >> | 2 | . | >> | 3 | . | >> | 4 | . | >> | 5 | . | >> | 6 | . | >> | 7 | stpq (L0), stpq (L1) | >> | 8 | . | >> | 9 | stpq (L0) | >> | 10 | . | >> | 11 | . | >> >> So it is 19 vs 12. 
> > Here is the demonstration why ldpq/stpq is slightly faster than ld4/st4 on Graviton1: > From Arm Cortex A72 Optimization Guide (Graviton 1): > | instr | exec lat | thr | pipelines | > |------|--------|-----|----------| > | ldp | 6 | 1/2 | L | > | stp | 4 | 1/4 | I/S | > | ld4 | 11 | 1/4 | V/L | > | st4 | 8 | 1/8 | V/S | > > There are one L, one S and two V. > Estimated execution time for: > ld4 > ldpq > st4 > stpq > | cycle | instr issued | > |------|---------| > | 0 | ld4 (L, V0) | > | 1 | . | > | 2 | . | > | 3 | . | > | 4 | ldpq (L) | > | 5 | . | > | 6 | stpq (S, I0) | > | 7 | . | > | 8 | . | > | 9 | . | > | 10 | . | > | 11 | st4 (S, V0) | > | 12 | . | > | 13 | . | > | 14 | . | > | 15 | . | > | 16 | . | > | 17 | . | > | 18 | . | > > Estimated execution time for: > ldpq > ldpq > ldpq > stpq > stpq > stpq > | cycle | instr issued | > |------|---------| > | 0 | ldpq (L) | > | 1 | . | > | 2 | ldpq (L) | > | 3 | . | > | 4 | ldpq (L) | > | 5 | . | > | 6 | stpq (S, I0) | > | 7 | . | > | 8 | . | > | 9 | . | > | 10 | stpq (S, I0) | > | 11 | . | > | 12 | . | > | 13 | . | > | 14 | stpq (S, I0) | > | 15 | . | > | 16 | . | > | 17 | . | > > So it is 19 vs 18. Hi Evegeny, thanks for fixing this and for the detailed explanation. The change looks good to me and I will sponsor it. Can you please also post some performance numbers before and after your change? @adinn, @theRealAph, @mo-beck as this was only tested on Graviton until now and we don't have access to other aarch64 implementations, could you please be so kind to check this on your hardware to make sure we don't introduce any regression? 
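The totals in the cycle tables quoted above (19 vs 12 on Neoverse N1, 19 vs 18 on Cortex-A72) can be reproduced as the latest value of issue cycle plus execution latency over the instructions in each sequence. The sketch below only redoes that arithmetic: the issue cycles and latencies are copied from the quoted tables rather than derived from a pipeline model, and the class and method names are made up for illustration.

```java
// Recomputes the estimated totals from the quoted tables.
// Each row is {issue cycle, execution latency}; values are copied from the
// tables above, not derived. Names are hypothetical.
final class IssueSchedule {

    /** Cycle count of a sequence: latest (issue cycle + latency) over all instructions. */
    static int finishCycle(int[][] issueAndLatency) {
        int finish = 0;
        for (int[] row : issueAndLatency) {
            finish = Math.max(finish, row[0] + row[1]);
        }
        return finish;
    }

    // Neoverse N1: ld4, ldpq, stpq, st4
    static final int[][] N1_LD4_ST4 = {{0, 10}, {0, 7}, {7, 3}, {10, 9}};
    // Neoverse N1: ldpq x3, stpq x3
    static final int[][] N1_LDPQ_STPQ = {{0, 7}, {0, 7}, {1, 7}, {7, 3}, {7, 3}, {9, 3}};
    // Cortex-A72: ld4, ldpq, stpq, st4
    static final int[][] A72_LD4_ST4 = {{0, 11}, {4, 6}, {6, 4}, {11, 8}};
    // Cortex-A72: ldpq x3, stpq x3
    static final int[][] A72_LDPQ_STPQ = {{0, 6}, {2, 6}, {4, 6}, {6, 4}, {10, 4}, {14, 4}};

    public static void main(String[] args) {
        System.out.println("N1:  " + finishCycle(N1_LD4_ST4) + " vs " + finishCycle(N1_LDPQ_STPQ));
        System.out.println("A72: " + finishCycle(A72_LD4_ST4) + " vs " + finishCycle(A72_LDPQ_STPQ));
    }
}
```

In both ld4/st4 sequences the final st4 determines the total, since it cannot issue until the ld4 result is available and then has the longest store latency.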
------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:10 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:10 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 16:47:01 GMT, Volker Simonis wrote: >> Evgeny works for the Amazon Corretto team and is covered by the Amazon OCA. > > Hi Evegeny, > > in general, your changes look good to me. > > You've disabled SIMD instructions for copying byte arrays <= 96 because you say there are no benefits from using them. > Have you seen regression in the microbenchmarks? As far as I can see, you've only pasted the results for char, int and long (where the improvements are quite nice, by the way :) > > Could you please also post some results for byte arrays (with and without SIMD). > > Thank you and best regards, > Volker I work for the Amazon Corretto team and am covered by the Amazon OCA. See the [comment](https://github.com/openjdk/jdk/pull/1315#issuecomment-731262807) from Volker above. > Hi Evegeny, > > in general, your changes look good to me. > > You've disabled SIMD instructions for copying byte arrays <= 96 because you say there are no benefits from using them. > Have you seen regression in the microbenchmarks? As far as I can see, you've only pasted the results for char, int and long (where the improvements are quite nice, by the way :) > > Could you please also post some results for byte arrays (with and without SIMD). > > Thank you and best regards, > Volker Thank you, Volker. I did more runs of testByte microbenchmarks with PR https://github.com/openjdk/jdk/pull/1293. They show that only ArrayCopyUnalignedDst.testByte has some regressions. I am running full range of copying from 65 to 96 to see which are more affected. 
I decided to enable UseSIMDForMemoryOps for all types of copying because overall it with PR https://github.com/openjdk/jdk/pull/1293 brings good improvements. I'll address ArrayCopyUnalignedDst.testByte regressions in a separate PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From simonis at openjdk.java.net Mon Nov 23 21:07:09 2020 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 23 Nov 2020 21:07:09 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 16:16:19 GMT, Volker Simonis wrote: >> This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. >> >> The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. > > Evgeny works for the Amazon Corretto team and is covered by the Amazon OCA. Hi Evegeny, in general, your changes look good to me. You've disabled SIMD instructions for copying byte arrays <= 96 because you say there are no benefits from using them. Have you seen regression in the microbenchmarks? As far as I can see, you've only pasted the results for char, int and long (where the improvements are quite nice, by the way :) Could you please also post some results for byte arrays (with and without SIMD). 
Thank you and best regards, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From simonis at openjdk.java.net Mon Nov 23 21:07:05 2020 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 23 Nov 2020 21:07:05 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 17:57:03 GMT, Volker Simonis wrote: >> Here is the demonstration why ldpq/stpq is slightly faster than ld4/st4 on Graviton1: >> From Arm Cortex A72 Optimization Guide (Graviton 1): >> | instr | exec lat | thr | pipelines | >> |------|--------|-----|----------| >> | ldp | 6 | 1/2 | L | >> | stp | 4 | 1/4 | I/S | >> | ld4 | 11 | 1/4 | V/L | >> | st4 | 8 | 1/8 | V/S | >> >> There are one L, one S and two V. >> Estimated execution time for: >> ld4 >> ldpq >> st4 >> stpq >> | cycle | instr issued | >> |------|---------| >> | 0 | ld4 (L, V0) | >> | 1 | . | >> | 2 | . | >> | 3 | . | >> | 4 | ldpq (L) | >> | 5 | . | >> | 6 | stpq (S, I0) | >> | 7 | . | >> | 8 | . | >> | 9 | . | >> | 10 | . | >> | 11 | st4 (S, V0) | >> | 12 | . | >> | 13 | . | >> | 14 | . | >> | 15 | . | >> | 16 | . | >> | 17 | . | >> | 18 | . | >> >> Estimated execution time for: >> ldpq >> ldpq >> ldpq >> stpq >> stpq >> stpq >> | cycle | instr issued | >> |------|---------| >> | 0 | ldpq (L) | >> | 1 | . | >> | 2 | ldpq (L) | >> | 3 | . | >> | 4 | ldpq (L) | >> | 5 | . | >> | 6 | stpq (S, I0) | >> | 7 | . | >> | 8 | . | >> | 9 | . | >> | 10 | stpq (S, I0) | >> | 11 | . | >> | 12 | . | >> | 13 | . | >> | 14 | stpq (S, I0) | >> | 15 | . | >> | 16 | . | >> | 17 | . | >> >> So it is 19 vs 18. > > Hi Evegeny, > > thanks for fixing this and for the detailed explanation. > > The change looks good to me and I will sponsor it. > Can you please also post some performance numbers before and after your change? 
> > @adinn, @theRealAph, @mo-beck, as this has only been tested on Graviton so far and we don't have access to other aarch64 implementations, could you please be so kind as to check this on your hardware to make sure we don't introduce any regressions? > Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated! Evgeny is part of the Amazon Corretto team and covered by Amazon's OCA. ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:05 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:05 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 17:58:39 GMT, Volker Simonis wrote: >> Hi Evgeny, >> >> thanks for fixing this and for the detailed explanation. >> >> The change looks good to me and I will sponsor it. >> Can you please also post some performance numbers before and after your change? >> >> @adinn, @theRealAph, @mo-beck, as this has only been tested on Graviton so far and we don't have access to other aarch64 implementations, could you please be so kind as to check this on your hardware to make sure we don't introduce any regressions? > >> Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated! > > Evgeny is part of the Amazon Corretto team and covered by Amazon's OCA.
JMH microbenchmark results for testByte: |Benchmark|Length|Count|Units|ld4 vs simd_off|ldpq vs simd_off|ldpq vs ld4|Maximum Relative Error| |-|-|-|-|-|-|-|-| |ArrayCopyAligned.testByte|65|25|ns/op|49.34%|-1.91%|-34.32%|0.37%| |ArrayCopyAligned.testByte|66|25|ns/op|49.18%|-1.95%|-34.28%|0.27%| |ArrayCopyAligned.testByte|67|25|ns/op|49.29%|-1.82%|-34.24%|0.38%| |ArrayCopyAligned.testByte|68|25|ns/op|51.10%|-2.61%|-35.55%|0.59%| |ArrayCopyAligned.testByte|69|25|ns/op|49.22%|-1.82%|-34.21%|0.36%| |ArrayCopyAligned.testByte|70|25|ns/op|49.38%|-1.72%|-34.21%|0.38%| |ArrayCopyAligned.testByte|71|25|ns/op|49.34%|-2.06%|-34.42%|0.30%| |ArrayCopyAligned.testByte|72|25|ns/op|50.97%|-2.78%|-35.60%|0.65%| |ArrayCopyAligned.testByte|73|25|ns/op|50.01%|-1.62%|-34.42%|0.37%| |ArrayCopyAligned.testByte|74|25|ns/op|49.81%|-1.84%|-34.48%|0.35%| |ArrayCopyAligned.testByte|75|25|ns/op|49.85%|-1.86%|-34.51%|0.35%| |ArrayCopyAligned.testByte|76|25|ns/op|51.33%|-2.54%|-35.60%|0.59%| |ArrayCopyAligned.testByte|77|25|ns/op|49.72%|-1.81%|-34.42%|0.41%| |ArrayCopyAligned.testByte|78|25|ns/op|49.87%|-1.74%|-34.44%|0.37%| |ArrayCopyAligned.testByte|79|25|ns/op|49.67%|-1.91%|-34.47%|0.46%| |ArrayCopyAligned.testByte|80|25|ns/op|51.35%|-2.77%|-35.76%|0.65%| |ArrayCopyAligned.testByte|81|25|ns/op|8.70%|-29.07%|-34.75%|0.35%| |ArrayCopyAligned.testByte|82|25|ns/op|13.64%|-25.96%|-34.85%|0.44%| |ArrayCopyAligned.testByte|83|25|ns/op|12.04%|-26.80%|-34.66%|0.37%| |ArrayCopyAligned.testByte|84|25|ns/op|13.63%|-26.54%|-35.35%|0.46%| |ArrayCopyAligned.testByte|85|25|ns/op|11.52%|-27.18%|-34.71%|0.52%| |ArrayCopyAligned.testByte|86|25|ns/op|11.59%|-27.15%|-34.71%|0.29%| |ArrayCopyAligned.testByte|87|25|ns/op|10.47%|-27.82%|-34.66%|0.29%| |ArrayCopyAligned.testByte|88|25|ns/op|8.69%|-29.65%|-35.27%|0.20%| |ArrayCopyAligned.testByte|89|25|ns/op|8.70%|-28.86%|-34.56%|0.66%| |ArrayCopyAligned.testByte|90|25|ns/op|13.01%|-26.28%|-34.77%|0.28%| 
|ArrayCopyAligned.testByte|91|25|ns/op|10.96%|-27.62%|-34.77%|0.34%| |ArrayCopyAligned.testByte|92|25|ns/op|13.26%|-26.76%|-35.33%|0.32%| |ArrayCopyAligned.testByte|93|25|ns/op|10.67%|-27.61%|-34.59%|0.63%| |ArrayCopyAligned.testByte|94|25|ns/op|11.05%|-27.62%|-34.83%|0.33%| |ArrayCopyAligned.testByte|95|25|ns/op|6.69%|-30.16%|-34.54%|0.61%| |ArrayCopyAligned.testByte|96|25|ns/op|8.70%|-30.14%|-35.73%|0.23%| |ArrayCopyUnalignedBoth.testByte|65|25|ns/op|37.93%|2.64%|-25.59%|0.92%| |ArrayCopyUnalignedBoth.testByte|66|25|ns/op|37.58%|-1.15%|-28.15%|0.57%| |ArrayCopyUnalignedBoth.testByte|67|25|ns/op|39.73%|7.31%|-23.20%|1.03%| |ArrayCopyUnalignedBoth.testByte|68|25|ns/op|37.07%|3.08%|-24.80%|0.88%| |ArrayCopyUnalignedBoth.testByte|69|25|ns/op|37.80%|3.15%|-25.15%|1.16%| |ArrayCopyUnalignedBoth.testByte|70|25|ns/op|37.48%|-1.18%|-28.12%|0.74%| |ArrayCopyUnalignedBoth.testByte|71|25|ns/op|39.83%|7.74%|-22.95%|1.00%| |ArrayCopyUnalignedBoth.testByte|72|25|ns/op|37.29%|3.87%|-24.34%|1.03%| |ArrayCopyUnalignedBoth.testByte|73|25|ns/op|37.71%|3.00%|-25.21%|0.89%| |ArrayCopyUnalignedBoth.testByte|74|25|ns/op|37.51%|-1.04%|-28.03%|0.79%| |ArrayCopyUnalignedBoth.testByte|75|25|ns/op|39.83%|7.33%|-23.24%|1.05%| |ArrayCopyUnalignedBoth.testByte|76|25|ns/op|37.47%|3.41%|-24.78%|0.97%| |ArrayCopyUnalignedBoth.testByte|77|25|ns/op|37.59%|3.71%|-24.63%|0.96%| |ArrayCopyUnalignedBoth.testByte|78|25|ns/op|39.23%|-5.11%|-31.84%|0.18%| |ArrayCopyUnalignedBoth.testByte|79|25|ns/op|40.30%|-5.81%|-32.86%|0.19%| |ArrayCopyUnalignedBoth.testByte|80|25|ns/op|37.41%|-4.85%|-30.75%|0.22%| |ArrayCopyUnalignedBoth.testByte|81|25|ns/op|-3.82%|-33.50%|-30.86%|0.17%| |ArrayCopyUnalignedBoth.testByte|82|25|ns/op|-4.27%|-34.19%|-31.26%|0.23%| |ArrayCopyUnalignedBoth.testByte|83|25|ns/op|-3.83%|-34.43%|-31.82%|0.23%| |ArrayCopyUnalignedBoth.testByte|84|25|ns/op|-4.29%|-33.78%|-30.81%|0.14%| |ArrayCopyUnalignedBoth.testByte|85|25|ns/op|-4.13%|-33.67%|-30.82%|0.15%| 
|ArrayCopyUnalignedBoth.testByte|86|25|ns/op|-7.46%|-36.44%|-31.31%|0.28%| |ArrayCopyUnalignedBoth.testByte|87|25|ns/op|-3.85%|-34.39%|-31.76%|0.18%| |ArrayCopyUnalignedBoth.testByte|88|25|ns/op|-4.30%|-33.77%|-30.79%|0.19%| |ArrayCopyUnalignedBoth.testByte|89|25|ns/op|-4.12%|-33.74%|-30.90%|0.16%| |ArrayCopyUnalignedBoth.testByte|90|25|ns/op|-7.51%|-36.41%|-31.25%|0.63%| |ArrayCopyUnalignedBoth.testByte|91|25|ns/op|-4.19%|-34.64%|-31.77%|0.16%| |ArrayCopyUnalignedBoth.testByte|92|25|ns/op|-7.45%|-35.98%|-30.83%|0.42%| |ArrayCopyUnalignedBoth.testByte|93|25|ns/op|-7.21%|-35.76%|-30.76%|0.47%| |ArrayCopyUnalignedBoth.testByte|94|25|ns/op|-9.69%|-38.64%|-32.05%|0.38%| |ArrayCopyUnalignedBoth.testByte|95|25|ns/op|-3.85%|-35.64%|-33.06%|0.37%| |ArrayCopyUnalignedBoth.testByte|96|25|ns/op|-4.89%|-34.30%|-30.93%|0.25%| |ArrayCopyUnalignedDst.testByte|65|25|ns/op|48.48%|18.07%|-20.48%|1.29%| |ArrayCopyUnalignedDst.testByte|66|25|ns/op|48.79%|17.99%|-20.70%|1.58%| |ArrayCopyUnalignedDst.testByte|67|25|ns/op|49.03%|16.96%|-21.52%|3.64%| |ArrayCopyUnalignedDst.testByte|68|25|ns/op|49.99%|23.55%|-17.63%|0.97%| |ArrayCopyUnalignedDst.testByte|69|25|ns/op|49.03%|22.42%|-17.86%|1.33%| |ArrayCopyUnalignedDst.testByte|70|25|ns/op|49.19%|22.68%|-17.77%|1.23%| |ArrayCopyUnalignedDst.testByte|71|25|ns/op|48.99%|16.72%|-21.66%|3.30%| |ArrayCopyUnalignedDst.testByte|72|25|ns/op|50.08%|24.67%|-16.93%|1.02%| |ArrayCopyUnalignedDst.testByte|73|25|ns/op|49.69%|22.92%|-17.88%|1.29%| |ArrayCopyUnalignedDst.testByte|74|25|ns/op|49.57%|23.24%|-17.60%|1.14%| |ArrayCopyUnalignedDst.testByte|75|25|ns/op|49.84%|18.77%|-20.74%|3.32%| |ArrayCopyUnalignedDst.testByte|76|25|ns/op|50.06%|24.72%|-16.89%|1.09%| |ArrayCopyUnalignedDst.testByte|77|25|ns/op|49.70%|23.13%|-17.75%|1.24%| |ArrayCopyUnalignedDst.testByte|78|25|ns/op|49.70%|23.31%|-17.63%|1.37%| |ArrayCopyUnalignedDst.testByte|79|25|ns/op|49.83%|-2.56%|-34.97%|0.55%| |ArrayCopyUnalignedDst.testByte|80|25|ns/op|49.84%|-3.07%|-35.31%|0.27%| 
|ArrayCopyUnalignedDst.testByte|81|25|ns/op|8.70%|-28.50%|-34.22%|0.20%| |ArrayCopyUnalignedDst.testByte|82|25|ns/op|13.63%|-24.95%|-33.95%|0.48%| |ArrayCopyUnalignedDst.testByte|83|25|ns/op|12.38%|-26.46%|-34.56%|0.25%| |ArrayCopyUnalignedDst.testByte|84|25|ns/op|13.63%|-26.45%|-35.27%|0.39%| |ArrayCopyUnalignedDst.testByte|85|25|ns/op|10.67%|-27.24%|-34.26%|0.23%| |ArrayCopyUnalignedDst.testByte|86|25|ns/op|11.70%|-26.56%|-34.25%|0.20%| |ArrayCopyUnalignedDst.testByte|87|25|ns/op|10.51%|-27.65%|-34.53%|0.27%| |ArrayCopyUnalignedDst.testByte|88|25|ns/op|8.69%|-29.76%|-35.38%|0.17%| |ArrayCopyUnalignedDst.testByte|89|25|ns/op|8.69%|-28.64%|-34.35%|0.24%| |ArrayCopyUnalignedDst.testByte|90|25|ns/op|13.03%|-25.69%|-34.25%|0.26%| |ArrayCopyUnalignedDst.testByte|91|25|ns/op|11.09%|-27.20%|-34.47%|0.26%| |ArrayCopyUnalignedDst.testByte|92|25|ns/op|13.46%|-26.68%|-35.38%|0.20%| |ArrayCopyUnalignedDst.testByte|93|25|ns/op|10.75%|-27.34%|-34.39%|0.22%| |ArrayCopyUnalignedDst.testByte|94|25|ns/op|11.07%|-27.00%|-34.27%|0.27%| |ArrayCopyUnalignedDst.testByte|95|25|ns/op|6.67%|-30.77%|-35.11%|0.25%| |ArrayCopyUnalignedDst.testByte|96|25|ns/op|8.70%|-30.01%|-35.61%|0.17%| |ArrayCopyUnalignedSrc.testByte|65|25|ns/op|38.80%|-4.97%|-31.53%|0.15%| |ArrayCopyUnalignedSrc.testByte|66|25|ns/op|38.86%|-4.86%|-31.49%|0.16%| |ArrayCopyUnalignedSrc.testByte|67|25|ns/op|41.44%|-5.85%|-33.44%|0.48%| |ArrayCopyUnalignedSrc.testByte|68|25|ns/op|40.06%|-4.59%|-31.88%|0.16%| |ArrayCopyUnalignedSrc.testByte|69|25|ns/op|38.98%|-4.64%|-31.39%|0.29%| |ArrayCopyUnalignedSrc.testByte|70|25|ns/op|39.00%|-4.60%|-31.37%|0.26%| |ArrayCopyUnalignedSrc.testByte|71|25|ns/op|41.20%|-5.49%|-33.07%|0.27%| |ArrayCopyUnalignedSrc.testByte|72|25|ns/op|40.06%|-4.56%|-31.86%|0.21%| |ArrayCopyUnalignedSrc.testByte|73|25|ns/op|38.57%|-4.92%|-31.38%|0.19%| |ArrayCopyUnalignedSrc.testByte|74|25|ns/op|38.70%|-4.83%|-31.38%|0.25%| |ArrayCopyUnalignedSrc.testByte|75|25|ns/op|41.26%|-5.52%|-33.12%|0.18%| 
|ArrayCopyUnalignedSrc.testByte|76|25|ns/op|39.51%|-4.77%|-31.74%|0.20%| |ArrayCopyUnalignedSrc.testByte|77|25|ns/op|38.54%|-4.81%|-31.29%|0.32%| |ArrayCopyUnalignedSrc.testByte|78|25|ns/op|38.29%|-5.12%|-31.39%|0.22%| |ArrayCopyUnalignedSrc.testByte|79|25|ns/op|40.90%|-5.56%|-32.97%|0.33%| |ArrayCopyUnalignedSrc.testByte|80|25|ns/op|40.10%|-4.82%|-32.06%|0.22%| |ArrayCopyUnalignedSrc.testByte|81|25|ns/op|-3.84%|-34.15%|-31.52%|0.18%| |ArrayCopyUnalignedSrc.testByte|82|25|ns/op|-3.89%|-34.12%|-31.45%|0.28%| |ArrayCopyUnalignedSrc.testByte|83|25|ns/op|-3.85%|-36.20%|-33.64%|0.35%| |ArrayCopyUnalignedSrc.testByte|84|25|ns/op|-3.85%|-34.71%|-32.09%|0.34%| |ArrayCopyUnalignedSrc.testByte|85|25|ns/op|-3.83%|-34.11%|-31.49%|0.29%| |ArrayCopyUnalignedSrc.testByte|86|25|ns/op|-3.83%|-34.18%|-31.56%|0.38%| |ArrayCopyUnalignedSrc.testByte|87|25|ns/op|-3.84%|-36.04%|-33.48%|0.20%| |ArrayCopyUnalignedSrc.testByte|88|25|ns/op|-3.84%|-34.65%|-32.04%|0.15%| |ArrayCopyUnalignedSrc.testByte|89|25|ns/op|-3.84%|-34.03%|-31.39%|0.16%| |ArrayCopyUnalignedSrc.testByte|90|25|ns/op|-4.32%|-34.37%|-31.40%|0.19%| |ArrayCopyUnalignedSrc.testByte|91|25|ns/op|-3.84%|-36.08%|-33.52%|0.36%| |ArrayCopyUnalignedSrc.testByte|92|25|ns/op|-3.84%|-34.41%|-31.79%|0.38%| |ArrayCopyUnalignedSrc.testByte|93|25|ns/op|-3.85%|-34.04%|-31.40%|0.19%| |ArrayCopyUnalignedSrc.testByte|94|25|ns/op|-3.82%|-34.07%|-31.45%|0.20%| |ArrayCopyUnalignedSrc.testByte|95|25|ns/op|-3.84%|-36.01%|-33.45%|0.32%| |ArrayCopyUnalignedSrc.testByte|96|25|ns/op|-3.88%|-34.93%|-32.30%|0.32%| ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:10 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:10 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: References: Message-ID: <_XSbz8D-TnDy1N1JfJC07OgEa6tfQ2GhRGdaJSkUAb4=.941ec975-9ca7-46fe-94ac-15b5b7c87ecb@github.com> On Fri, 
20 Nov 2020 17:47:13 GMT, Eugene Astigeevich wrote: >> Hi Evgeny, >> >> in general, your changes look good to me. >> >> You've disabled SIMD instructions for copying byte arrays <= 96 because you say there are no benefits from using them. >> Have you seen regressions in the microbenchmarks? As far as I can see, you've only pasted the results for char, int and long (where the improvements are quite nice, by the way :) >> >> Could you please also post some results for byte arrays (with and without SIMD)? >> >> Thank you and best regards, >> Volker > > I work for the Amazon Corretto team and am covered by the Amazon OCA. See the [comment](https://github.com/openjdk/jdk/pull/1315#issuecomment-731262807) from Volker above. These are the JMH microbenchmark results with UseSIMDForMemoryOps enabled for all types of copying. The regressions are due to the use of ld4/st4 and are fixed in PR https://github.com/openjdk/jdk/pull/1293 |Benchmark|Length|Count|Units|Diff|Maximum Relative Error| |-|-|-|-|-|-| |ArrayCopy.arrayCopyChar|46|25|ns/op|5.26%🔴|0.01%| |ArrayCopy.arrayCopyCharNonConst|46|25|ns/op|15.79%🔴|0.01%| |ArrayCopy.arrayCopyObject|200|25|ns/op|-5.91%|1.76%| |ArrayCopy.arrayCopyObjectNonConst|200|25|ns/op|-5.78%|0.73%| |ArrayCopy.arrayCopyObjectSameArraysBackwa|200|25|ns/op|-1.27%|0.71%| |ArrayCopy.arrayCopyObjectSameArraysForwar|200|25|ns/op|-2.05%|0.76%| |ArrayCopyAligned.testByte|70|25|ns/op|48.59%🔴|0.48%| |ArrayCopyAligned.testByte|150|25|ns/op|1.72%|4.19%| |ArrayCopyAligned.testByte|300|25|ns/op|-3.20%|4.78%| |ArrayCopyAligned.testByte|600|25|ns/op|-8.24%|2.62%| |ArrayCopyAligned.testByte|1200|25|ns/op|-13.33%|3.13%| |ArrayCopyAligned.testChar|20|25|ns/op|-5.57%|0.01%| |ArrayCopyAligned.testChar|70|25|ns/op|-5.42%|3.64%| |ArrayCopyAligned.testChar|150|25|ns/op|-4.96%|1.58%| |ArrayCopyAligned.testChar|300|25|ns/op|-12.06%|0.83%| |ArrayCopyAligned.testChar|600|25|ns/op|-16.13%|0.37%| |ArrayCopyAligned.testChar|1200|25|ns/op|-16.12%|0.80%|
|ArrayCopyAligned.testInt|10|25|ns/op|-5.55%|0.04%| |ArrayCopyAligned.testInt|20|25|ns/op|34.75%🔴|1.30%| |ArrayCopyAligned.testInt|70|25|ns/op|-8.75%|2.12%| |ArrayCopyAligned.testInt|150|25|ns/op|-11.74%|1.13%| |ArrayCopyAligned.testInt|300|25|ns/op|-14.38%|0.69%| |ArrayCopyAligned.testInt|600|25|ns/op|-17.87%|1.08%| |ArrayCopyAligned.testInt|1200|25|ns/op|-18.01%|0.90%| |ArrayCopyAligned.testLong|5|25|ns/op|-4.37%|0.77%| |ArrayCopyAligned.testLong|10|25|ns/op|27.45%🔴|6.59%| |ArrayCopyAligned.testLong|20|25|ns/op|-1.95%|2.76%| |ArrayCopyAligned.testLong|70|25|ns/op|-11.46%|1.42%| |ArrayCopyAligned.testLong|150|25|ns/op|-16.28%|0.68%| |ArrayCopyAligned.testLong|300|25|ns/op|-18.02%|1.90%| |ArrayCopyAligned.testLong|600|25|ns/op|-18.08%|0.90%| |ArrayCopyAligned.testLong|1200|25|ns/op|-18.67%|1.16%| |ArrayCopyUnalignedBoth.testByte|70|25|ns/op|38.98%🔴|0.47%| |ArrayCopyUnalignedBoth.testByte|150|25|ns/op|0.32%|1.15%| |ArrayCopyUnalignedBoth.testByte|300|25|ns/op|-1.94%|1.72%| |ArrayCopyUnalignedBoth.testByte|600|25|ns/op|-4.96%|1.05%| |ArrayCopyUnalignedBoth.testByte|1200|25|ns/op|-11.10%|1.11%| |ArrayCopyUnalignedBoth.testChar|20|25|ns/op|-5.56%|0.07%| |ArrayCopyUnalignedBoth.testChar|70|25|ns/op|-2.05%|1.43%| |ArrayCopyUnalignedBoth.testChar|150|25|ns/op|-5.62%|1.32%| |ArrayCopyUnalignedBoth.testChar|300|25|ns/op|-10.33%|0.70%| |ArrayCopyUnalignedBoth.testChar|600|25|ns/op|-14.68%|0.39%| |ArrayCopyUnalignedBoth.testChar|1200|25|ns/op|-16.43%|0.44%| |ArrayCopyUnalignedBoth.testInt|10|25|ns/op|-5.55%|0.01%| |ArrayCopyUnalignedBoth.testInt|20|25|ns/op|33.55%🔴|0.36%| |ArrayCopyUnalignedBoth.testInt|70|25|ns/op|-5.35%|2.38%| |ArrayCopyUnalignedBoth.testInt|150|25|ns/op|-11.15%|1.75%| |ArrayCopyUnalignedBoth.testInt|300|25|ns/op|-14.18%|1.27%| |ArrayCopyUnalignedBoth.testInt|600|25|ns/op|-15.84%|0.68%| |ArrayCopyUnalignedBoth.testInt|1200|25|ns/op|-16.42%|0.45%| |ArrayCopyUnalignedBoth.testLong|5|25|ns/op|-5.54%|0.01%| 
|ArrayCopyUnalignedBoth.testLong|10|25|ns/op|35.60%🔴|1.69%| |ArrayCopyUnalignedBoth.testLong|20|25|ns/op|-3.18%|2.26%| |ArrayCopyUnalignedBoth.testLong|70|25|ns/op|-10.66%|0.99%| |ArrayCopyUnalignedBoth.testLong|150|25|ns/op|-15.68%|1.01%| |ArrayCopyUnalignedBoth.testLong|300|25|ns/op|-15.57%|0.47%| |ArrayCopyUnalignedBoth.testLong|600|25|ns/op|-17.11%|0.23%| |ArrayCopyUnalignedBoth.testLong|1200|25|ns/op|-17.00%|0.55%| |ArrayCopyUnalignedDst.testByte|70|25|ns/op|48.32%🔴|0.30%| |ArrayCopyUnalignedDst.testByte|150|25|ns/op|-0.68%|4.05%| |ArrayCopyUnalignedDst.testByte|300|25|ns/op|-7.50%|1.35%| |ArrayCopyUnalignedDst.testByte|600|25|ns/op|-10.04%|1.51%| |ArrayCopyUnalignedDst.testByte|1200|25|ns/op|-14.07%|0.98%| |ArrayCopyUnalignedDst.testChar|20|25|ns/op|-5.54%|0.01%| |ArrayCopyUnalignedDst.testChar|70|25|ns/op|-5.70%|3.79%| |ArrayCopyUnalignedDst.testChar|150|25|ns/op|-6.26%|1.59%| |ArrayCopyUnalignedDst.testChar|300|25|ns/op|-12.78%|0.86%| |ArrayCopyUnalignedDst.testChar|600|25|ns/op|-14.29%|0.54%| |ArrayCopyUnalignedDst.testChar|1200|25|ns/op|-17.37%|1.18%| |ArrayCopyUnalignedDst.testInt|10|25|ns/op|-5.55%|0.01%| |ArrayCopyUnalignedDst.testInt|20|25|ns/op|34.84%🔴|1.46%| |ArrayCopyUnalignedDst.testInt|70|25|ns/op|-8.32%|1.33%| |ArrayCopyUnalignedDst.testInt|150|25|ns/op|-11.82%|0.61%| |ArrayCopyUnalignedDst.testInt|300|25|ns/op|-13.78%|0.59%| |ArrayCopyUnalignedDst.testInt|600|25|ns/op|-16.61%|0.94%| |ArrayCopyUnalignedDst.testInt|1200|25|ns/op|-17.63%|0.78%| |ArrayCopyUnalignedDst.testLong|5|25|ns/op|-5.51%|0.07%| |ArrayCopyUnalignedDst.testLong|10|25|ns/op|37.01%🔴|1.03%| |ArrayCopyUnalignedDst.testLong|20|25|ns/op|-3.52%|2.72%| |ArrayCopyUnalignedDst.testLong|70|25|ns/op|-10.93%|1.16%| |ArrayCopyUnalignedDst.testLong|150|25|ns/op|-16.99%|0.61%| |ArrayCopyUnalignedDst.testLong|300|25|ns/op|-16.65%|0.45%| |ArrayCopyUnalignedDst.testLong|600|25|ns/op|-16.55%|0.60%| |ArrayCopyUnalignedDst.testLong|1200|25|ns/op|-17.37%|0.76%| 
|ArrayCopyUnalignedSrc.testByte|70|25|ns/op|38.66%🔴|0.22%| |ArrayCopyUnalignedSrc.testByte|150|25|ns/op|1.64%|1.69%| |ArrayCopyUnalignedSrc.testByte|300|25|ns/op|-5.86%|0.64%| |ArrayCopyUnalignedSrc.testByte|600|25|ns/op|-10.30%|1.71%| |ArrayCopyUnalignedSrc.testByte|1200|25|ns/op|-14.25%|0.91%| |ArrayCopyUnalignedSrc.testChar|20|25|ns/op|-5.73%|0.10%| |ArrayCopyUnalignedSrc.testChar|70|25|ns/op|-3.69%|1.68%| |ArrayCopyUnalignedSrc.testChar|150|25|ns/op|-8.36%|2.28%| |ArrayCopyUnalignedSrc.testChar|300|25|ns/op|-9.90%|0.49%| |ArrayCopyUnalignedSrc.testChar|600|25|ns/op|-15.08%|0.55%| |ArrayCopyUnalignedSrc.testChar|1200|25|ns/op|-17.08%|0.49%| |ArrayCopyUnalignedSrc.testInt|10|25|ns/op|-5.55%|0.01%| |ArrayCopyUnalignedSrc.testInt|20|25|ns/op|33.53%🔴|0.28%| |ArrayCopyUnalignedSrc.testInt|70|25|ns/op|-8.23%|2.06%| |ArrayCopyUnalignedSrc.testInt|150|25|ns/op|-12.65%|1.27%| |ArrayCopyUnalignedSrc.testInt|300|25|ns/op|-14.22%|0.41%| |ArrayCopyUnalignedSrc.testInt|600|25|ns/op|-16.20%|0.37%| |ArrayCopyUnalignedSrc.testInt|1200|25|ns/op|-15.81%|1.09%| |ArrayCopyUnalignedSrc.testLong|5|25|ns/op|-5.54%|0.01%| |ArrayCopyUnalignedSrc.testLong|10|25|ns/op|35.81%🔴|0.54%| |ArrayCopyUnalignedSrc.testLong|20|25|ns/op|-3.96%|2.10%| |ArrayCopyUnalignedSrc.testLong|70|25|ns/op|-10.90%|0.79%| |ArrayCopyUnalignedSrc.testLong|150|25|ns/op|-15.83%|0.64%| |ArrayCopyUnalignedSrc.testLong|300|25|ns/op|-17.88%|1.61%| |ArrayCopyUnalignedSrc.testLong|600|25|ns/op|-18.03%|0.88%| |ArrayCopyUnalignedSrc.testLong|1200|25|ns/op|-18.93%|0.04%| ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:05 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:05 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: Message-ID: On Sun, 22 Nov 2020 19:56:50 GMT, Eugene Astigeevich wrote: >>> 
Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated! >> >> Evgeny is part of the Amazon Corretto team and covered by Amazon's OCA. > > JMH microbenchmark results for testByte: > |Benchmark|Length|Count|Units|ld4 vs simd_off|ldpq vs simd_off|ldpq vs ld4|Maximum Relative Error| > |-|-|-|-|-|-|-|-| > |ArrayCopyAligned.testByte|65|25|ns/op|49.34%|-1.91%|-34.32%|0.37%| > |ArrayCopyAligned.testByte|66|25|ns/op|49.18%|-1.95%|-34.28%|0.27%| > |ArrayCopyAligned.testByte|67|25|ns/op|49.29%|-1.82%|-34.24%|0.38%| > |ArrayCopyAligned.testByte|68|25|ns/op|51.10%|-2.61%|-35.55%|0.59%| > |ArrayCopyAligned.testByte|69|25|ns/op|49.22%|-1.82%|-34.21%|0.36%| > |ArrayCopyAligned.testByte|70|25|ns/op|49.38%|-1.72%|-34.21%|0.38%| > |ArrayCopyAligned.testByte|71|25|ns/op|49.34%|-2.06%|-34.42%|0.30%| > |ArrayCopyAligned.testByte|72|25|ns/op|50.97%|-2.78%|-35.60%|0.65%| > |ArrayCopyAligned.testByte|73|25|ns/op|50.01%|-1.62%|-34.42%|0.37%| > |ArrayCopyAligned.testByte|74|25|ns/op|49.81%|-1.84%|-34.48%|0.35%| > |ArrayCopyAligned.testByte|75|25|ns/op|49.85%|-1.86%|-34.51%|0.35%| > |ArrayCopyAligned.testByte|76|25|ns/op|51.33%|-2.54%|-35.60%|0.59%| > |ArrayCopyAligned.testByte|77|25|ns/op|49.72%|-1.81%|-34.42%|0.41%| > |ArrayCopyAligned.testByte|78|25|ns/op|49.87%|-1.74%|-34.44%|0.37%| > |ArrayCopyAligned.testByte|79|25|ns/op|49.67%|-1.91%|-34.47%|0.46%| > |ArrayCopyAligned.testByte|80|25|ns/op|51.35%|-2.77%|-35.76%|0.65%| > |ArrayCopyAligned.testByte|81|25|ns/op|8.70%|-29.07%|-34.75%|0.35%| > |ArrayCopyAligned.testByte|82|25|ns/op|13.64%|-25.96%|-34.85%|0.44%| > |ArrayCopyAligned.testByte|83|25|ns/op|12.04%|-26.80%|-34.66%|0.37%| > |ArrayCopyAligned.testByte|84|25|ns/op|13.63%|-26.54%|-35.35%|0.46%| > |ArrayCopyAligned.testByte|85|25|ns/op|11.52%|-27.18%|-34.71%|0.52%| >
|ArrayCopyAligned.testByte|86|25|ns/op|11.59%|-27.15%|-34.71%|0.29%| > |ArrayCopyAligned.testByte|87|25|ns/op|10.47%|-27.82%|-34.66%|0.29%| > |ArrayCopyAligned.testByte|88|25|ns/op|8.69%|-29.65%|-35.27%|0.20%| > |ArrayCopyAligned.testByte|89|25|ns/op|8.70%|-28.86%|-34.56%|0.66%| > |ArrayCopyAligned.testByte|90|25|ns/op|13.01%|-26.28%|-34.77%|0.28%| > |ArrayCopyAligned.testByte|91|25|ns/op|10.96%|-27.62%|-34.77%|0.34%| > |ArrayCopyAligned.testByte|92|25|ns/op|13.26%|-26.76%|-35.33%|0.32%| > |ArrayCopyAligned.testByte|93|25|ns/op|10.67%|-27.61%|-34.59%|0.63%| > |ArrayCopyAligned.testByte|94|25|ns/op|11.05%|-27.62%|-34.83%|0.33%| > |ArrayCopyAligned.testByte|95|25|ns/op|6.69%|-30.16%|-34.54%|0.61%| > |ArrayCopyAligned.testByte|96|25|ns/op|8.70%|-30.14%|-35.73%|0.23%| > |ArrayCopyUnalignedBoth.testByte|65|25|ns/op|37.93%|2.64%|-25.59%|0.92%| > |ArrayCopyUnalignedBoth.testByte|66|25|ns/op|37.58%|-1.15%|-28.15%|0.57%| > |ArrayCopyUnalignedBoth.testByte|67|25|ns/op|39.73%|7.31%|-23.20%|1.03%| > |ArrayCopyUnalignedBoth.testByte|68|25|ns/op|37.07%|3.08%|-24.80%|0.88%| > |ArrayCopyUnalignedBoth.testByte|69|25|ns/op|37.80%|3.15%|-25.15%|1.16%| > |ArrayCopyUnalignedBoth.testByte|70|25|ns/op|37.48%|-1.18%|-28.12%|0.74%| > |ArrayCopyUnalignedBoth.testByte|71|25|ns/op|39.83%|7.74%|-22.95%|1.00%| > |ArrayCopyUnalignedBoth.testByte|72|25|ns/op|37.29%|3.87%|-24.34%|1.03%| > |ArrayCopyUnalignedBoth.testByte|73|25|ns/op|37.71%|3.00%|-25.21%|0.89%| > |ArrayCopyUnalignedBoth.testByte|74|25|ns/op|37.51%|-1.04%|-28.03%|0.79%| > |ArrayCopyUnalignedBoth.testByte|75|25|ns/op|39.83%|7.33%|-23.24%|1.05%| > |ArrayCopyUnalignedBoth.testByte|76|25|ns/op|37.47%|3.41%|-24.78%|0.97%| > |ArrayCopyUnalignedBoth.testByte|77|25|ns/op|37.59%|3.71%|-24.63%|0.96%| > |ArrayCopyUnalignedBoth.testByte|78|25|ns/op|39.23%|-5.11%|-31.84%|0.18%| > |ArrayCopyUnalignedBoth.testByte|79|25|ns/op|40.30%|-5.81%|-32.86%|0.19%| > |ArrayCopyUnalignedBoth.testByte|80|25|ns/op|37.41%|-4.85%|-30.75%|0.22%| > 
|ArrayCopyUnalignedBoth.testByte|81|25|ns/op|-3.82%|-33.50%|-30.86%|0.17%| > |ArrayCopyUnalignedBoth.testByte|82|25|ns/op|-4.27%|-34.19%|-31.26%|0.23%| > |ArrayCopyUnalignedBoth.testByte|83|25|ns/op|-3.83%|-34.43%|-31.82%|0.23%| > |ArrayCopyUnalignedBoth.testByte|84|25|ns/op|-4.29%|-33.78%|-30.81%|0.14%| > |ArrayCopyUnalignedBoth.testByte|85|25|ns/op|-4.13%|-33.67%|-30.82%|0.15%| > |ArrayCopyUnalignedBoth.testByte|86|25|ns/op|-7.46%|-36.44%|-31.31%|0.28%| > |ArrayCopyUnalignedBoth.testByte|87|25|ns/op|-3.85%|-34.39%|-31.76%|0.18%| > |ArrayCopyUnalignedBoth.testByte|88|25|ns/op|-4.30%|-33.77%|-30.79%|0.19%| > |ArrayCopyUnalignedBoth.testByte|89|25|ns/op|-4.12%|-33.74%|-30.90%|0.16%| > |ArrayCopyUnalignedBoth.testByte|90|25|ns/op|-7.51%|-36.41%|-31.25%|0.63%| > |ArrayCopyUnalignedBoth.testByte|91|25|ns/op|-4.19%|-34.64%|-31.77%|0.16%| > |ArrayCopyUnalignedBoth.testByte|92|25|ns/op|-7.45%|-35.98%|-30.83%|0.42%| > |ArrayCopyUnalignedBoth.testByte|93|25|ns/op|-7.21%|-35.76%|-30.76%|0.47%| > |ArrayCopyUnalignedBoth.testByte|94|25|ns/op|-9.69%|-38.64%|-32.05%|0.38%| > |ArrayCopyUnalignedBoth.testByte|95|25|ns/op|-3.85%|-35.64%|-33.06%|0.37%| > |ArrayCopyUnalignedBoth.testByte|96|25|ns/op|-4.89%|-34.30%|-30.93%|0.25%| > |ArrayCopyUnalignedDst.testByte|65|25|ns/op|48.48%|18.07%|-20.48%|1.29%| > |ArrayCopyUnalignedDst.testByte|66|25|ns/op|48.79%|17.99%|-20.70%|1.58%| > |ArrayCopyUnalignedDst.testByte|67|25|ns/op|49.03%|16.96%|-21.52%|3.64%| > |ArrayCopyUnalignedDst.testByte|68|25|ns/op|49.99%|23.55%|-17.63%|0.97%| > |ArrayCopyUnalignedDst.testByte|69|25|ns/op|49.03%|22.42%|-17.86%|1.33%| > |ArrayCopyUnalignedDst.testByte|70|25|ns/op|49.19%|22.68%|-17.77%|1.23%| > |ArrayCopyUnalignedDst.testByte|71|25|ns/op|48.99%|16.72%|-21.66%|3.30%| > |ArrayCopyUnalignedDst.testByte|72|25|ns/op|50.08%|24.67%|-16.93%|1.02%| > |ArrayCopyUnalignedDst.testByte|73|25|ns/op|49.69%|22.92%|-17.88%|1.29%| > |ArrayCopyUnalignedDst.testByte|74|25|ns/op|49.57%|23.24%|-17.60%|1.14%| > 
|ArrayCopyUnalignedDst.testByte|75|25|ns/op|49.84%|18.77%|-20.74%|3.32%| > |ArrayCopyUnalignedDst.testByte|76|25|ns/op|50.06%|24.72%|-16.89%|1.09%| > |ArrayCopyUnalignedDst.testByte|77|25|ns/op|49.70%|23.13%|-17.75%|1.24%| > |ArrayCopyUnalignedDst.testByte|78|25|ns/op|49.70%|23.31%|-17.63%|1.37%| > |ArrayCopyUnalignedDst.testByte|79|25|ns/op|49.83%|-2.56%|-34.97%|0.55%| > |ArrayCopyUnalignedDst.testByte|80|25|ns/op|49.84%|-3.07%|-35.31%|0.27%| > |ArrayCopyUnalignedDst.testByte|81|25|ns/op|8.70%|-28.50%|-34.22%|0.20%| > |ArrayCopyUnalignedDst.testByte|82|25|ns/op|13.63%|-24.95%|-33.95%|0.48%| > |ArrayCopyUnalignedDst.testByte|83|25|ns/op|12.38%|-26.46%|-34.56%|0.25%| > |ArrayCopyUnalignedDst.testByte|84|25|ns/op|13.63%|-26.45%|-35.27%|0.39%| > |ArrayCopyUnalignedDst.testByte|85|25|ns/op|10.67%|-27.24%|-34.26%|0.23%| > |ArrayCopyUnalignedDst.testByte|86|25|ns/op|11.70%|-26.56%|-34.25%|0.20%| > |ArrayCopyUnalignedDst.testByte|87|25|ns/op|10.51%|-27.65%|-34.53%|0.27%| > |ArrayCopyUnalignedDst.testByte|88|25|ns/op|8.69%|-29.76%|-35.38%|0.17%| > |ArrayCopyUnalignedDst.testByte|89|25|ns/op|8.69%|-28.64%|-34.35%|0.24%| > |ArrayCopyUnalignedDst.testByte|90|25|ns/op|13.03%|-25.69%|-34.25%|0.26%| > |ArrayCopyUnalignedDst.testByte|91|25|ns/op|11.09%|-27.20%|-34.47%|0.26%| > |ArrayCopyUnalignedDst.testByte|92|25|ns/op|13.46%|-26.68%|-35.38%|0.20%| > |ArrayCopyUnalignedDst.testByte|93|25|ns/op|10.75%|-27.34%|-34.39%|0.22%| > |ArrayCopyUnalignedDst.testByte|94|25|ns/op|11.07%|-27.00%|-34.27%|0.27%| > |ArrayCopyUnalignedDst.testByte|95|25|ns/op|6.67%|-30.77%|-35.11%|0.25%| > |ArrayCopyUnalignedDst.testByte|96|25|ns/op|8.70%|-30.01%|-35.61%|0.17%| > |ArrayCopyUnalignedSrc.testByte|65|25|ns/op|38.80%|-4.97%|-31.53%|0.15%| > |ArrayCopyUnalignedSrc.testByte|66|25|ns/op|38.86%|-4.86%|-31.49%|0.16%| > |ArrayCopyUnalignedSrc.testByte|67|25|ns/op|41.44%|-5.85%|-33.44%|0.48%| > |ArrayCopyUnalignedSrc.testByte|68|25|ns/op|40.06%|-4.59%|-31.88%|0.16%| > 
|ArrayCopyUnalignedSrc.testByte|69|25|ns/op|38.98%|-4.64%|-31.39%|0.29%| > |ArrayCopyUnalignedSrc.testByte|70|25|ns/op|39.00%|-4.60%|-31.37%|0.26%| > |ArrayCopyUnalignedSrc.testByte|71|25|ns/op|41.20%|-5.49%|-33.07%|0.27%| > |ArrayCopyUnalignedSrc.testByte|72|25|ns/op|40.06%|-4.56%|-31.86%|0.21%| > |ArrayCopyUnalignedSrc.testByte|73|25|ns/op|38.57%|-4.92%|-31.38%|0.19%| > |ArrayCopyUnalignedSrc.testByte|74|25|ns/op|38.70%|-4.83%|-31.38%|0.25%| > |ArrayCopyUnalignedSrc.testByte|75|25|ns/op|41.26%|-5.52%|-33.12%|0.18%| > |ArrayCopyUnalignedSrc.testByte|76|25|ns/op|39.51%|-4.77%|-31.74%|0.20%| > |ArrayCopyUnalignedSrc.testByte|77|25|ns/op|38.54%|-4.81%|-31.29%|0.32%| > |ArrayCopyUnalignedSrc.testByte|78|25|ns/op|38.29%|-5.12%|-31.39%|0.22%| > |ArrayCopyUnalignedSrc.testByte|79|25|ns/op|40.90%|-5.56%|-32.97%|0.33%| > |ArrayCopyUnalignedSrc.testByte|80|25|ns/op|40.10%|-4.82%|-32.06%|0.22%| > |ArrayCopyUnalignedSrc.testByte|81|25|ns/op|-3.84%|-34.15%|-31.52%|0.18%| > |ArrayCopyUnalignedSrc.testByte|82|25|ns/op|-3.89%|-34.12%|-31.45%|0.28%| > |ArrayCopyUnalignedSrc.testByte|83|25|ns/op|-3.85%|-36.20%|-33.64%|0.35%| > |ArrayCopyUnalignedSrc.testByte|84|25|ns/op|-3.85%|-34.71%|-32.09%|0.34%| > |ArrayCopyUnalignedSrc.testByte|85|25|ns/op|-3.83%|-34.11%|-31.49%|0.29%| > |ArrayCopyUnalignedSrc.testByte|86|25|ns/op|-3.83%|-34.18%|-31.56%|0.38%| > |ArrayCopyUnalignedSrc.testByte|87|25|ns/op|-3.84%|-36.04%|-33.48%|0.20%| > |ArrayCopyUnalignedSrc.testByte|88|25|ns/op|-3.84%|-34.65%|-32.04%|0.15%| > |ArrayCopyUnalignedSrc.testByte|89|25|ns/op|-3.84%|-34.03%|-31.39%|0.16%| > |ArrayCopyUnalignedSrc.testByte|90|25|ns/op|-4.32%|-34.37%|-31.40%|0.19%| > |ArrayCopyUnalignedSrc.testByte|91|25|ns/op|-3.84%|-36.08%|-33.52%|0.36%| > |ArrayCopyUnalignedSrc.testByte|92|25|ns/op|-3.84%|-34.41%|-31.79%|0.38%| > |ArrayCopyUnalignedSrc.testByte|93|25|ns/op|-3.85%|-34.04%|-31.40%|0.19%| > |ArrayCopyUnalignedSrc.testByte|94|25|ns/op|-3.82%|-34.07%|-31.45%|0.20%| > 
|ArrayCopyUnalignedSrc.testByte|95|25|ns/op|-3.84%|-36.01%|-33.45%|0.32%| > |ArrayCopyUnalignedSrc.testByte|96|25|ns/op|-3.88%|-34.93%|-32.30%|0.32%| JMH microbenchmark results for testChar: |Benchmark|Length|Count|Units|ldpq vs ld4|Maximum Relative Error| |-|-|-|-|-|-| |ArrayCopyAligned.testChar|33|25|ns/op|-29.41%|0.73%| |ArrayCopyAligned.testChar|34|25|ns/op|-30.14%|0.99%| |ArrayCopyAligned.testChar|35|25|ns/op|-29.37%|0.44%| |ArrayCopyAligned.testChar|36|25|ns/op|-29.85%|0.70%| |ArrayCopyAligned.testChar|37|25|ns/op|-29.33%|0.65%| |ArrayCopyAligned.testChar|38|25|ns/op|-29.69%|0.52%| |ArrayCopyAligned.testChar|39|25|ns/op|-29.44%|0.79%| |ArrayCopyAligned.testChar|40|25|ns/op|-29.82%|0.82%| |ArrayCopyAligned.testChar|41|25|ns/op|-29.62%|0.74%| |ArrayCopyAligned.testChar|42|25|ns/op|-29.88%|0.61%| |ArrayCopyAligned.testChar|43|25|ns/op|-29.19%|0.64%| |ArrayCopyAligned.testChar|44|25|ns/op|-29.89%|0.71%| |ArrayCopyAligned.testChar|45|25|ns/op|-29.52%|0.80%| |ArrayCopyAligned.testChar|46|25|ns/op|-29.71%|0.58%| |ArrayCopyAligned.testChar|47|25|ns/op|-29.49%|0.71%| |ArrayCopyAligned.testChar|48|25|ns/op|-29.89%|0.91%| |ArrayCopyUnalignedBoth.testChar|33|25|ns/op|-29.04%|0.87%| |ArrayCopyUnalignedBoth.testChar|34|25|ns/op|-29.21%|0.70%| |ArrayCopyUnalignedBoth.testChar|35|25|ns/op|-27.70%|1.22%| |ArrayCopyUnalignedBoth.testChar|36|25|ns/op|-28.68%|1.86%| |ArrayCopyUnalignedBoth.testChar|37|25|ns/op|-27.81%|1.43%| |ArrayCopyUnalignedBoth.testChar|38|25|ns/op|-29.54%|0.61%| |ArrayCopyUnalignedBoth.testChar|39|25|ns/op|-29.89%|0.85%| |ArrayCopyUnalignedBoth.testChar|40|25|ns/op|-30.97%|0.68%| |ArrayCopyUnalignedBoth.testChar|41|25|ns/op|-29.96%|0.78%| |ArrayCopyUnalignedBoth.testChar|42|25|ns/op|-30.79%|0.81%| |ArrayCopyUnalignedBoth.testChar|43|25|ns/op|-29.57%|0.58%| |ArrayCopyUnalignedBoth.testChar|44|25|ns/op|-31.02%|0.34%| |ArrayCopyUnalignedBoth.testChar|45|25|ns/op|-30.05%|0.75%| |ArrayCopyUnalignedBoth.testChar|46|25|ns/op|-30.56%|0.55%| 
|ArrayCopyUnalignedBoth.testChar|47|25|ns/op|-30.39%|0.52%| |ArrayCopyUnalignedBoth.testChar|48|25|ns/op|-30.94%|0.38%| |ArrayCopyUnalignedDst.testChar|33|25|ns/op|-19.97%|1.08%| |ArrayCopyUnalignedDst.testChar|34|25|ns/op|-16.05%|0.89%| |ArrayCopyUnalignedDst.testChar|35|25|ns/op|-20.83%|1.26%| |ArrayCopyUnalignedDst.testChar|36|25|ns/op|-16.09%|0.77%| |ArrayCopyUnalignedDst.testChar|37|25|ns/op|-20.11%|1.24%| |ArrayCopyUnalignedDst.testChar|38|25|ns/op|-15.26%|0.91%| |ArrayCopyUnalignedDst.testChar|39|25|ns/op|-29.54%|0.56%| |ArrayCopyUnalignedDst.testChar|40|25|ns/op|-29.53%|0.77%| |ArrayCopyUnalignedDst.testChar|41|25|ns/op|-29.52%|0.87%| |ArrayCopyUnalignedDst.testChar|42|25|ns/op|-29.45%|0.77%| |ArrayCopyUnalignedDst.testChar|43|25|ns/op|-29.57%|1.06%| |ArrayCopyUnalignedDst.testChar|44|25|ns/op|-29.69%|0.61%| |ArrayCopyUnalignedDst.testChar|45|25|ns/op|-29.52%|0.83%| |ArrayCopyUnalignedDst.testChar|46|25|ns/op|-29.31%|0.48%| |ArrayCopyUnalignedDst.testChar|47|25|ns/op|-29.64%|0.50%| |ArrayCopyUnalignedDst.testChar|48|25|ns/op|-29.75%|0.22%| |ArrayCopyUnalignedSrc.testChar|33|25|ns/op|-29.33%|0.76%| |ArrayCopyUnalignedSrc.testChar|34|25|ns/op|-30.11%|0.39%| |ArrayCopyUnalignedSrc.testChar|35|25|ns/op|-29.54%|0.80%| |ArrayCopyUnalignedSrc.testChar|36|25|ns/op|-30.07%|0.36%| |ArrayCopyUnalignedSrc.testChar|37|25|ns/op|-29.41%|0.40%| |ArrayCopyUnalignedSrc.testChar|38|25|ns/op|-29.95%|0.32%| |ArrayCopyUnalignedSrc.testChar|39|25|ns/op|-29.39%|0.82%| |ArrayCopyUnalignedSrc.testChar|40|25|ns/op|-29.85%|0.69%| |ArrayCopyUnalignedSrc.testChar|41|25|ns/op|-28.93%|0.67%| |ArrayCopyUnalignedSrc.testChar|42|25|ns/op|-29.50%|0.70%| |ArrayCopyUnalignedSrc.testChar|43|25|ns/op|-28.95%|0.71%| |ArrayCopyUnalignedSrc.testChar|44|25|ns/op|-29.75%|0.66%| |ArrayCopyUnalignedSrc.testChar|45|25|ns/op|-29.02%|0.87%| |ArrayCopyUnalignedSrc.testChar|46|25|ns/op|-29.76%|0.69%| |ArrayCopyUnalignedSrc.testChar|47|25|ns/op|-29.37%|0.50%| 
|ArrayCopyUnalignedSrc.testChar|48|25|ns/op|-29.71%|0.73%| ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:05 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:05 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: Message-ID: On Sun, 22 Nov 2020 20:17:16 GMT, Eugene Astigeevich wrote: >> JMH microbenchmark results for testByte: >> |Benchmark|Length|Count|Units|ld4 vs simd_off|ldpq vs simd_off|ldpq vs ld4|Maximum Relative Error| >> |-|-|-|-|-|-|-|-| >> |ArrayCopyAligned.testByte|65|25|ns/op|49.34%|-1.91%|-34.32%|0.37%| >> |ArrayCopyAligned.testByte|66|25|ns/op|49.18%|-1.95%|-34.28%|0.27%| >> |ArrayCopyAligned.testByte|67|25|ns/op|49.29%|-1.82%|-34.24%|0.38%| >> |ArrayCopyAligned.testByte|68|25|ns/op|51.10%|-2.61%|-35.55%|0.59%| >> |ArrayCopyAligned.testByte|69|25|ns/op|49.22%|-1.82%|-34.21%|0.36%| >> |ArrayCopyAligned.testByte|70|25|ns/op|49.38%|-1.72%|-34.21%|0.38%| >> |ArrayCopyAligned.testByte|71|25|ns/op|49.34%|-2.06%|-34.42%|0.30%| >> |ArrayCopyAligned.testByte|72|25|ns/op|50.97%|-2.78%|-35.60%|0.65%| >> |ArrayCopyAligned.testByte|73|25|ns/op|50.01%|-1.62%|-34.42%|0.37%| >> |ArrayCopyAligned.testByte|74|25|ns/op|49.81%|-1.84%|-34.48%|0.35%| >> |ArrayCopyAligned.testByte|75|25|ns/op|49.85%|-1.86%|-34.51%|0.35%| >> |ArrayCopyAligned.testByte|76|25|ns/op|51.33%|-2.54%|-35.60%|0.59%| >> |ArrayCopyAligned.testByte|77|25|ns/op|49.72%|-1.81%|-34.42%|0.41%| >> |ArrayCopyAligned.testByte|78|25|ns/op|49.87%|-1.74%|-34.44%|0.37%| >> |ArrayCopyAligned.testByte|79|25|ns/op|49.67%|-1.91%|-34.47%|0.46%| >> |ArrayCopyAligned.testByte|80|25|ns/op|51.35%|-2.77%|-35.76%|0.65%| >> |ArrayCopyAligned.testByte|81|25|ns/op|8.70%|-29.07%|-34.75%|0.35%| >> |ArrayCopyAligned.testByte|82|25|ns/op|13.64%|-25.96%|-34.85%|0.44%| >> 
|ArrayCopyAligned.testByte|83|25|ns/op|12.04%|-26.80%|-34.66%|0.37%| >> |ArrayCopyAligned.testByte|84|25|ns/op|13.63%|-26.54%|-35.35%|0.46%| >> |ArrayCopyAligned.testByte|85|25|ns/op|11.52%|-27.18%|-34.71%|0.52%| >> |ArrayCopyAligned.testByte|86|25|ns/op|11.59%|-27.15%|-34.71%|0.29%| >> |ArrayCopyAligned.testByte|87|25|ns/op|10.47%|-27.82%|-34.66%|0.29%| >> |ArrayCopyAligned.testByte|88|25|ns/op|8.69%|-29.65%|-35.27%|0.20%| >> |ArrayCopyAligned.testByte|89|25|ns/op|8.70%|-28.86%|-34.56%|0.66%| >> |ArrayCopyAligned.testByte|90|25|ns/op|13.01%|-26.28%|-34.77%|0.28%| >> |ArrayCopyAligned.testByte|91|25|ns/op|10.96%|-27.62%|-34.77%|0.34%| >> |ArrayCopyAligned.testByte|92|25|ns/op|13.26%|-26.76%|-35.33%|0.32%| >> |ArrayCopyAligned.testByte|93|25|ns/op|10.67%|-27.61%|-34.59%|0.63%| >> |ArrayCopyAligned.testByte|94|25|ns/op|11.05%|-27.62%|-34.83%|0.33%| >> |ArrayCopyAligned.testByte|95|25|ns/op|6.69%|-30.16%|-34.54%|0.61%| >> |ArrayCopyAligned.testByte|96|25|ns/op|8.70%|-30.14%|-35.73%|0.23%| >> |ArrayCopyUnalignedBoth.testByte|65|25|ns/op|37.93%|2.64%|-25.59%|0.92%| >> |ArrayCopyUnalignedBoth.testByte|66|25|ns/op|37.58%|-1.15%|-28.15%|0.57%| >> |ArrayCopyUnalignedBoth.testByte|67|25|ns/op|39.73%|7.31%|-23.20%|1.03%| >> |ArrayCopyUnalignedBoth.testByte|68|25|ns/op|37.07%|3.08%|-24.80%|0.88%| >> |ArrayCopyUnalignedBoth.testByte|69|25|ns/op|37.80%|3.15%|-25.15%|1.16%| >> |ArrayCopyUnalignedBoth.testByte|70|25|ns/op|37.48%|-1.18%|-28.12%|0.74%| >> |ArrayCopyUnalignedBoth.testByte|71|25|ns/op|39.83%|7.74%|-22.95%|1.00%| >> |ArrayCopyUnalignedBoth.testByte|72|25|ns/op|37.29%|3.87%|-24.34%|1.03%| >> |ArrayCopyUnalignedBoth.testByte|73|25|ns/op|37.71%|3.00%|-25.21%|0.89%| >> |ArrayCopyUnalignedBoth.testByte|74|25|ns/op|37.51%|-1.04%|-28.03%|0.79%| >> |ArrayCopyUnalignedBoth.testByte|75|25|ns/op|39.83%|7.33%|-23.24%|1.05%| >> |ArrayCopyUnalignedBoth.testByte|76|25|ns/op|37.47%|3.41%|-24.78%|0.97%| >> |ArrayCopyUnalignedBoth.testByte|77|25|ns/op|37.59%|3.71%|-24.63%|0.96%| >> 
|ArrayCopyUnalignedBoth.testByte|78|25|ns/op|39.23%|-5.11%|-31.84%|0.18%| >> |ArrayCopyUnalignedBoth.testByte|79|25|ns/op|40.30%|-5.81%|-32.86%|0.19%| >> |ArrayCopyUnalignedBoth.testByte|80|25|ns/op|37.41%|-4.85%|-30.75%|0.22%| >> |ArrayCopyUnalignedBoth.testByte|81|25|ns/op|-3.82%|-33.50%|-30.86%|0.17%| >> |ArrayCopyUnalignedBoth.testByte|82|25|ns/op|-4.27%|-34.19%|-31.26%|0.23%| >> |ArrayCopyUnalignedBoth.testByte|83|25|ns/op|-3.83%|-34.43%|-31.82%|0.23%| >> |ArrayCopyUnalignedBoth.testByte|84|25|ns/op|-4.29%|-33.78%|-30.81%|0.14%| >> |ArrayCopyUnalignedBoth.testByte|85|25|ns/op|-4.13%|-33.67%|-30.82%|0.15%| >> |ArrayCopyUnalignedBoth.testByte|86|25|ns/op|-7.46%|-36.44%|-31.31%|0.28%| >> |ArrayCopyUnalignedBoth.testByte|87|25|ns/op|-3.85%|-34.39%|-31.76%|0.18%| >> |ArrayCopyUnalignedBoth.testByte|88|25|ns/op|-4.30%|-33.77%|-30.79%|0.19%| >> |ArrayCopyUnalignedBoth.testByte|89|25|ns/op|-4.12%|-33.74%|-30.90%|0.16%| >> |ArrayCopyUnalignedBoth.testByte|90|25|ns/op|-7.51%|-36.41%|-31.25%|0.63%| >> |ArrayCopyUnalignedBoth.testByte|91|25|ns/op|-4.19%|-34.64%|-31.77%|0.16%| >> |ArrayCopyUnalignedBoth.testByte|92|25|ns/op|-7.45%|-35.98%|-30.83%|0.42%| >> |ArrayCopyUnalignedBoth.testByte|93|25|ns/op|-7.21%|-35.76%|-30.76%|0.47%| >> |ArrayCopyUnalignedBoth.testByte|94|25|ns/op|-9.69%|-38.64%|-32.05%|0.38%| >> |ArrayCopyUnalignedBoth.testByte|95|25|ns/op|-3.85%|-35.64%|-33.06%|0.37%| >> |ArrayCopyUnalignedBoth.testByte|96|25|ns/op|-4.89%|-34.30%|-30.93%|0.25%| >> |ArrayCopyUnalignedDst.testByte|65|25|ns/op|48.48%|18.07%|-20.48%|1.29%| >> |ArrayCopyUnalignedDst.testByte|66|25|ns/op|48.79%|17.99%|-20.70%|1.58%| >> |ArrayCopyUnalignedDst.testByte|67|25|ns/op|49.03%|16.96%|-21.52%|3.64%| >> |ArrayCopyUnalignedDst.testByte|68|25|ns/op|49.99%|23.55%|-17.63%|0.97%| >> |ArrayCopyUnalignedDst.testByte|69|25|ns/op|49.03%|22.42%|-17.86%|1.33%| >> |ArrayCopyUnalignedDst.testByte|70|25|ns/op|49.19%|22.68%|-17.77%|1.23%| >> 
|ArrayCopyUnalignedDst.testByte|71|25|ns/op|48.99%|16.72%|-21.66%|3.30%| >> |ArrayCopyUnalignedDst.testByte|72|25|ns/op|50.08%|24.67%|-16.93%|1.02%| >> |ArrayCopyUnalignedDst.testByte|73|25|ns/op|49.69%|22.92%|-17.88%|1.29%| >> |ArrayCopyUnalignedDst.testByte|74|25|ns/op|49.57%|23.24%|-17.60%|1.14%| >> |ArrayCopyUnalignedDst.testByte|75|25|ns/op|49.84%|18.77%|-20.74%|3.32%| >> |ArrayCopyUnalignedDst.testByte|76|25|ns/op|50.06%|24.72%|-16.89%|1.09%| >> |ArrayCopyUnalignedDst.testByte|77|25|ns/op|49.70%|23.13%|-17.75%|1.24%| >> |ArrayCopyUnalignedDst.testByte|78|25|ns/op|49.70%|23.31%|-17.63%|1.37%| >> |ArrayCopyUnalignedDst.testByte|79|25|ns/op|49.83%|-2.56%|-34.97%|0.55%| >> |ArrayCopyUnalignedDst.testByte|80|25|ns/op|49.84%|-3.07%|-35.31%|0.27%| >> |ArrayCopyUnalignedDst.testByte|81|25|ns/op|8.70%|-28.50%|-34.22%|0.20%| >> |ArrayCopyUnalignedDst.testByte|82|25|ns/op|13.63%|-24.95%|-33.95%|0.48%| >> |ArrayCopyUnalignedDst.testByte|83|25|ns/op|12.38%|-26.46%|-34.56%|0.25%| >> |ArrayCopyUnalignedDst.testByte|84|25|ns/op|13.63%|-26.45%|-35.27%|0.39%| >> |ArrayCopyUnalignedDst.testByte|85|25|ns/op|10.67%|-27.24%|-34.26%|0.23%| >> |ArrayCopyUnalignedDst.testByte|86|25|ns/op|11.70%|-26.56%|-34.25%|0.20%| >> |ArrayCopyUnalignedDst.testByte|87|25|ns/op|10.51%|-27.65%|-34.53%|0.27%| >> |ArrayCopyUnalignedDst.testByte|88|25|ns/op|8.69%|-29.76%|-35.38%|0.17%| >> |ArrayCopyUnalignedDst.testByte|89|25|ns/op|8.69%|-28.64%|-34.35%|0.24%| >> |ArrayCopyUnalignedDst.testByte|90|25|ns/op|13.03%|-25.69%|-34.25%|0.26%| >> |ArrayCopyUnalignedDst.testByte|91|25|ns/op|11.09%|-27.20%|-34.47%|0.26%| >> |ArrayCopyUnalignedDst.testByte|92|25|ns/op|13.46%|-26.68%|-35.38%|0.20%| >> |ArrayCopyUnalignedDst.testByte|93|25|ns/op|10.75%|-27.34%|-34.39%|0.22%| >> |ArrayCopyUnalignedDst.testByte|94|25|ns/op|11.07%|-27.00%|-34.27%|0.27%| >> |ArrayCopyUnalignedDst.testByte|95|25|ns/op|6.67%|-30.77%|-35.11%|0.25%| >> |ArrayCopyUnalignedDst.testByte|96|25|ns/op|8.70%|-30.01%|-35.61%|0.17%| >> 
|ArrayCopyUnalignedSrc.testByte|65|25|ns/op|38.80%|-4.97%|-31.53%|0.15%| >> |ArrayCopyUnalignedSrc.testByte|66|25|ns/op|38.86%|-4.86%|-31.49%|0.16%| >> |ArrayCopyUnalignedSrc.testByte|67|25|ns/op|41.44%|-5.85%|-33.44%|0.48%| >> |ArrayCopyUnalignedSrc.testByte|68|25|ns/op|40.06%|-4.59%|-31.88%|0.16%| >> |ArrayCopyUnalignedSrc.testByte|69|25|ns/op|38.98%|-4.64%|-31.39%|0.29%| >> |ArrayCopyUnalignedSrc.testByte|70|25|ns/op|39.00%|-4.60%|-31.37%|0.26%| >> |ArrayCopyUnalignedSrc.testByte|71|25|ns/op|41.20%|-5.49%|-33.07%|0.27%| >> |ArrayCopyUnalignedSrc.testByte|72|25|ns/op|40.06%|-4.56%|-31.86%|0.21%| >> |ArrayCopyUnalignedSrc.testByte|73|25|ns/op|38.57%|-4.92%|-31.38%|0.19%| >> |ArrayCopyUnalignedSrc.testByte|74|25|ns/op|38.70%|-4.83%|-31.38%|0.25%| >> |ArrayCopyUnalignedSrc.testByte|75|25|ns/op|41.26%|-5.52%|-33.12%|0.18%| >> |ArrayCopyUnalignedSrc.testByte|76|25|ns/op|39.51%|-4.77%|-31.74%|0.20%| >> |ArrayCopyUnalignedSrc.testByte|77|25|ns/op|38.54%|-4.81%|-31.29%|0.32%| >> |ArrayCopyUnalignedSrc.testByte|78|25|ns/op|38.29%|-5.12%|-31.39%|0.22%| >> |ArrayCopyUnalignedSrc.testByte|79|25|ns/op|40.90%|-5.56%|-32.97%|0.33%| >> |ArrayCopyUnalignedSrc.testByte|80|25|ns/op|40.10%|-4.82%|-32.06%|0.22%| >> |ArrayCopyUnalignedSrc.testByte|81|25|ns/op|-3.84%|-34.15%|-31.52%|0.18%| >> |ArrayCopyUnalignedSrc.testByte|82|25|ns/op|-3.89%|-34.12%|-31.45%|0.28%| >> |ArrayCopyUnalignedSrc.testByte|83|25|ns/op|-3.85%|-36.20%|-33.64%|0.35%| >> |ArrayCopyUnalignedSrc.testByte|84|25|ns/op|-3.85%|-34.71%|-32.09%|0.34%| >> |ArrayCopyUnalignedSrc.testByte|85|25|ns/op|-3.83%|-34.11%|-31.49%|0.29%| >> |ArrayCopyUnalignedSrc.testByte|86|25|ns/op|-3.83%|-34.18%|-31.56%|0.38%| >> |ArrayCopyUnalignedSrc.testByte|87|25|ns/op|-3.84%|-36.04%|-33.48%|0.20%| >> |ArrayCopyUnalignedSrc.testByte|88|25|ns/op|-3.84%|-34.65%|-32.04%|0.15%| >> |ArrayCopyUnalignedSrc.testByte|89|25|ns/op|-3.84%|-34.03%|-31.39%|0.16%| >> |ArrayCopyUnalignedSrc.testByte|90|25|ns/op|-4.32%|-34.37%|-31.40%|0.19%| >> 
|ArrayCopyUnalignedSrc.testByte|91|25|ns/op|-3.84%|-36.08%|-33.52%|0.36%| >> |ArrayCopyUnalignedSrc.testByte|92|25|ns/op|-3.84%|-34.41%|-31.79%|0.38%| >> |ArrayCopyUnalignedSrc.testByte|93|25|ns/op|-3.85%|-34.04%|-31.40%|0.19%| >> |ArrayCopyUnalignedSrc.testByte|94|25|ns/op|-3.82%|-34.07%|-31.45%|0.20%| >> |ArrayCopyUnalignedSrc.testByte|95|25|ns/op|-3.84%|-36.01%|-33.45%|0.32%| >> |ArrayCopyUnalignedSrc.testByte|96|25|ns/op|-3.88%|-34.93%|-32.30%|0.32%| > > JMH microbenchmark results for testChar: > |Benchmark|Length|Count|Units|ldpq vs ld4|Maximum Relative Error| > |-|-|-|-|-|-| > |ArrayCopyAligned.testChar|33|25|ns/op|-29.41%|0.73%| > |ArrayCopyAligned.testChar|34|25|ns/op|-30.14%|0.99%| > |ArrayCopyAligned.testChar|35|25|ns/op|-29.37%|0.44%| > |ArrayCopyAligned.testChar|36|25|ns/op|-29.85%|0.70%| > |ArrayCopyAligned.testChar|37|25|ns/op|-29.33%|0.65%| > |ArrayCopyAligned.testChar|38|25|ns/op|-29.69%|0.52%| > |ArrayCopyAligned.testChar|39|25|ns/op|-29.44%|0.79%| > |ArrayCopyAligned.testChar|40|25|ns/op|-29.82%|0.82%| > |ArrayCopyAligned.testChar|41|25|ns/op|-29.62%|0.74%| > |ArrayCopyAligned.testChar|42|25|ns/op|-29.88%|0.61%| > |ArrayCopyAligned.testChar|43|25|ns/op|-29.19%|0.64%| > |ArrayCopyAligned.testChar|44|25|ns/op|-29.89%|0.71%| > |ArrayCopyAligned.testChar|45|25|ns/op|-29.52%|0.80%| > |ArrayCopyAligned.testChar|46|25|ns/op|-29.71%|0.58%| > |ArrayCopyAligned.testChar|47|25|ns/op|-29.49%|0.71%| > |ArrayCopyAligned.testChar|48|25|ns/op|-29.89%|0.91%| > |ArrayCopyUnalignedBoth.testChar|33|25|ns/op|-29.04%|0.87%| > |ArrayCopyUnalignedBoth.testChar|34|25|ns/op|-29.21%|0.70%| > |ArrayCopyUnalignedBoth.testChar|35|25|ns/op|-27.70%|1.22%| > |ArrayCopyUnalignedBoth.testChar|36|25|ns/op|-28.68%|1.86%| > |ArrayCopyUnalignedBoth.testChar|37|25|ns/op|-27.81%|1.43%| > |ArrayCopyUnalignedBoth.testChar|38|25|ns/op|-29.54%|0.61%| > |ArrayCopyUnalignedBoth.testChar|39|25|ns/op|-29.89%|0.85%| > |ArrayCopyUnalignedBoth.testChar|40|25|ns/op|-30.97%|0.68%| > 
|ArrayCopyUnalignedBoth.testChar|41|25|ns/op|-29.96%|0.78%| > |ArrayCopyUnalignedBoth.testChar|42|25|ns/op|-30.79%|0.81%| > |ArrayCopyUnalignedBoth.testChar|43|25|ns/op|-29.57%|0.58%| > |ArrayCopyUnalignedBoth.testChar|44|25|ns/op|-31.02%|0.34%| > |ArrayCopyUnalignedBoth.testChar|45|25|ns/op|-30.05%|0.75%| > |ArrayCopyUnalignedBoth.testChar|46|25|ns/op|-30.56%|0.55%| > |ArrayCopyUnalignedBoth.testChar|47|25|ns/op|-30.39%|0.52%| > |ArrayCopyUnalignedBoth.testChar|48|25|ns/op|-30.94%|0.38%| > |ArrayCopyUnalignedDst.testChar|33|25|ns/op|-19.97%|1.08%| > |ArrayCopyUnalignedDst.testChar|34|25|ns/op|-16.05%|0.89%| > |ArrayCopyUnalignedDst.testChar|35|25|ns/op|-20.83%|1.26%| > |ArrayCopyUnalignedDst.testChar|36|25|ns/op|-16.09%|0.77%| > |ArrayCopyUnalignedDst.testChar|37|25|ns/op|-20.11%|1.24%| > |ArrayCopyUnalignedDst.testChar|38|25|ns/op|-15.26%|0.91%| > |ArrayCopyUnalignedDst.testChar|39|25|ns/op|-29.54%|0.56%| > |ArrayCopyUnalignedDst.testChar|40|25|ns/op|-29.53%|0.77%| > |ArrayCopyUnalignedDst.testChar|41|25|ns/op|-29.52%|0.87%| > |ArrayCopyUnalignedDst.testChar|42|25|ns/op|-29.45%|0.77%| > |ArrayCopyUnalignedDst.testChar|43|25|ns/op|-29.57%|1.06%| > |ArrayCopyUnalignedDst.testChar|44|25|ns/op|-29.69%|0.61%| > |ArrayCopyUnalignedDst.testChar|45|25|ns/op|-29.52%|0.83%| > |ArrayCopyUnalignedDst.testChar|46|25|ns/op|-29.31%|0.48%| > |ArrayCopyUnalignedDst.testChar|47|25|ns/op|-29.64%|0.50%| > |ArrayCopyUnalignedDst.testChar|48|25|ns/op|-29.75%|0.22%| > |ArrayCopyUnalignedSrc.testChar|33|25|ns/op|-29.33%|0.76%| > |ArrayCopyUnalignedSrc.testChar|34|25|ns/op|-30.11%|0.39%| > |ArrayCopyUnalignedSrc.testChar|35|25|ns/op|-29.54%|0.80%| > |ArrayCopyUnalignedSrc.testChar|36|25|ns/op|-30.07%|0.36%| > |ArrayCopyUnalignedSrc.testChar|37|25|ns/op|-29.41%|0.40%| > |ArrayCopyUnalignedSrc.testChar|38|25|ns/op|-29.95%|0.32%| > |ArrayCopyUnalignedSrc.testChar|39|25|ns/op|-29.39%|0.82%| > |ArrayCopyUnalignedSrc.testChar|40|25|ns/op|-29.85%|0.69%| > 
|ArrayCopyUnalignedSrc.testChar|41|25|ns/op|-28.93%|0.67%| > |ArrayCopyUnalignedSrc.testChar|42|25|ns/op|-29.50%|0.70%| > |ArrayCopyUnalignedSrc.testChar|43|25|ns/op|-28.95%|0.71%| > |ArrayCopyUnalignedSrc.testChar|44|25|ns/op|-29.75%|0.66%| > |ArrayCopyUnalignedSrc.testChar|45|25|ns/op|-29.02%|0.87%| > |ArrayCopyUnalignedSrc.testChar|46|25|ns/op|-29.76%|0.69%| > |ArrayCopyUnalignedSrc.testChar|47|25|ns/op|-29.37%|0.50%| > |ArrayCopyUnalignedSrc.testChar|48|25|ns/op|-29.71%|0.73%| JMH microbenchmark results for testInt: |Benchmark|Length|Count|Units|ldpq vs ld4|Maximum Relative Error| |-|-|-|-|-|-| |ArrayCopyAligned.testInt|17|25|ns/op|-26.25%|2.08%| |ArrayCopyAligned.testInt|18|25|ns/op|-29.10%|0.04%| |ArrayCopyAligned.testInt|19|25|ns/op|-28.93%|0.19%| |ArrayCopyAligned.testInt|20|25|ns/op|-29.11%|0.06%| |ArrayCopyAligned.testInt|21|25|ns/op|-29.06%|0.06%| |ArrayCopyAligned.testInt|22|25|ns/op|-29.12%|0.13%| |ArrayCopyAligned.testInt|23|25|ns/op|-29.10%|0.04%| |ArrayCopyAligned.testInt|24|25|ns/op|-28.96%|0.16%| |ArrayCopyUnalignedBoth.testInt|17|25|ns/op|-25.34%|2.05%| |ArrayCopyUnalignedBoth.testInt|18|25|ns/op|-28.96%|0.07%| |ArrayCopyUnalignedBoth.testInt|19|25|ns/op|-29.01%|0.09%| |ArrayCopyUnalignedBoth.testInt|20|25|ns/op|-28.95%|0.10%| |ArrayCopyUnalignedBoth.testInt|21|25|ns/op|-29.01%|0.07%| |ArrayCopyUnalignedBoth.testInt|22|25|ns/op|-29.04%|0.12%| |ArrayCopyUnalignedBoth.testInt|23|25|ns/op|-29.01%|0.10%| |ArrayCopyUnalignedBoth.testInt|24|25|ns/op|-29.05%|0.04%| |ArrayCopyUnalignedDst.testInt|17|25|ns/op|-27.63%|3.12%| |ArrayCopyUnalignedDst.testInt|18|25|ns/op|-25.75%|3.44%| |ArrayCopyUnalignedDst.testInt|19|25|ns/op|-29.06%|0.06%| |ArrayCopyUnalignedDst.testInt|20|25|ns/op|-29.07%|0.04%| |ArrayCopyUnalignedDst.testInt|21|25|ns/op|-29.02%|0.07%| |ArrayCopyUnalignedDst.testInt|22|25|ns/op|-29.03%|0.06%| |ArrayCopyUnalignedDst.testInt|23|25|ns/op|-29.01%|0.07%| |ArrayCopyUnalignedDst.testInt|24|25|ns/op|-29.05%|0.07%| 
|ArrayCopyUnalignedSrc.testInt|17|25|ns/op|-27.76%|1.35%| |ArrayCopyUnalignedSrc.testInt|18|25|ns/op|-28.91%|0.10%| |ArrayCopyUnalignedSrc.testInt|19|25|ns/op|-28.92%|0.12%| |ArrayCopyUnalignedSrc.testInt|20|25|ns/op|-28.91%|0.09%| |ArrayCopyUnalignedSrc.testInt|21|25|ns/op|-28.97%|0.06%| |ArrayCopyUnalignedSrc.testInt|22|25|ns/op|-28.95%|0.29%| |ArrayCopyUnalignedSrc.testInt|23|25|ns/op|-29.01%|0.04%| |ArrayCopyUnalignedSrc.testInt|24|25|ns/op|-28.93%|0.23%| ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:05 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:05 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: Message-ID: On Sun, 22 Nov 2020 20:57:51 GMT, Eugene Astigeevich wrote: >> JMH microbenchmark results for testChar: >> |Benchmark|Length|Count|Units|ldpq vs ld4|Maximum Relative Error| >> |-|-|-|-|-|-| >> |ArrayCopyAligned.testChar|33|25|ns/op|-29.41%|0.73%| >> |ArrayCopyAligned.testChar|34|25|ns/op|-30.14%|0.99%| >> |ArrayCopyAligned.testChar|35|25|ns/op|-29.37%|0.44%| >> |ArrayCopyAligned.testChar|36|25|ns/op|-29.85%|0.70%| >> |ArrayCopyAligned.testChar|37|25|ns/op|-29.33%|0.65%| >> |ArrayCopyAligned.testChar|38|25|ns/op|-29.69%|0.52%| >> |ArrayCopyAligned.testChar|39|25|ns/op|-29.44%|0.79%| >> |ArrayCopyAligned.testChar|40|25|ns/op|-29.82%|0.82%| >> |ArrayCopyAligned.testChar|41|25|ns/op|-29.62%|0.74%| >> |ArrayCopyAligned.testChar|42|25|ns/op|-29.88%|0.61%| >> |ArrayCopyAligned.testChar|43|25|ns/op|-29.19%|0.64%| >> |ArrayCopyAligned.testChar|44|25|ns/op|-29.89%|0.71%| >> |ArrayCopyAligned.testChar|45|25|ns/op|-29.52%|0.80%| >> |ArrayCopyAligned.testChar|46|25|ns/op|-29.71%|0.58%| >> |ArrayCopyAligned.testChar|47|25|ns/op|-29.49%|0.71%| >> |ArrayCopyAligned.testChar|48|25|ns/op|-29.89%|0.91%| >> 
|ArrayCopyUnalignedBoth.testChar|33|25|ns/op|-29.04%|0.87%| >> |ArrayCopyUnalignedBoth.testChar|34|25|ns/op|-29.21%|0.70%| >> |ArrayCopyUnalignedBoth.testChar|35|25|ns/op|-27.70%|1.22%| >> |ArrayCopyUnalignedBoth.testChar|36|25|ns/op|-28.68%|1.86%| >> |ArrayCopyUnalignedBoth.testChar|37|25|ns/op|-27.81%|1.43%| >> |ArrayCopyUnalignedBoth.testChar|38|25|ns/op|-29.54%|0.61%| >> |ArrayCopyUnalignedBoth.testChar|39|25|ns/op|-29.89%|0.85%| >> |ArrayCopyUnalignedBoth.testChar|40|25|ns/op|-30.97%|0.68%| >> |ArrayCopyUnalignedBoth.testChar|41|25|ns/op|-29.96%|0.78%| >> |ArrayCopyUnalignedBoth.testChar|42|25|ns/op|-30.79%|0.81%| >> |ArrayCopyUnalignedBoth.testChar|43|25|ns/op|-29.57%|0.58%| >> |ArrayCopyUnalignedBoth.testChar|44|25|ns/op|-31.02%|0.34%| >> |ArrayCopyUnalignedBoth.testChar|45|25|ns/op|-30.05%|0.75%| >> |ArrayCopyUnalignedBoth.testChar|46|25|ns/op|-30.56%|0.55%| >> |ArrayCopyUnalignedBoth.testChar|47|25|ns/op|-30.39%|0.52%| >> |ArrayCopyUnalignedBoth.testChar|48|25|ns/op|-30.94%|0.38%| >> |ArrayCopyUnalignedDst.testChar|33|25|ns/op|-19.97%|1.08%| >> |ArrayCopyUnalignedDst.testChar|34|25|ns/op|-16.05%|0.89%| >> |ArrayCopyUnalignedDst.testChar|35|25|ns/op|-20.83%|1.26%| >> |ArrayCopyUnalignedDst.testChar|36|25|ns/op|-16.09%|0.77%| >> |ArrayCopyUnalignedDst.testChar|37|25|ns/op|-20.11%|1.24%| >> |ArrayCopyUnalignedDst.testChar|38|25|ns/op|-15.26%|0.91%| >> |ArrayCopyUnalignedDst.testChar|39|25|ns/op|-29.54%|0.56%| >> |ArrayCopyUnalignedDst.testChar|40|25|ns/op|-29.53%|0.77%| >> |ArrayCopyUnalignedDst.testChar|41|25|ns/op|-29.52%|0.87%| >> |ArrayCopyUnalignedDst.testChar|42|25|ns/op|-29.45%|0.77%| >> |ArrayCopyUnalignedDst.testChar|43|25|ns/op|-29.57%|1.06%| >> |ArrayCopyUnalignedDst.testChar|44|25|ns/op|-29.69%|0.61%| >> |ArrayCopyUnalignedDst.testChar|45|25|ns/op|-29.52%|0.83%| >> |ArrayCopyUnalignedDst.testChar|46|25|ns/op|-29.31%|0.48%| >> |ArrayCopyUnalignedDst.testChar|47|25|ns/op|-29.64%|0.50%| >> |ArrayCopyUnalignedDst.testChar|48|25|ns/op|-29.75%|0.22%| >> 
|ArrayCopyUnalignedSrc.testChar|33|25|ns/op|-29.33%|0.76%| >> |ArrayCopyUnalignedSrc.testChar|34|25|ns/op|-30.11%|0.39%| >> |ArrayCopyUnalignedSrc.testChar|35|25|ns/op|-29.54%|0.80%| >> |ArrayCopyUnalignedSrc.testChar|36|25|ns/op|-30.07%|0.36%| >> |ArrayCopyUnalignedSrc.testChar|37|25|ns/op|-29.41%|0.40%| >> |ArrayCopyUnalignedSrc.testChar|38|25|ns/op|-29.95%|0.32%| >> |ArrayCopyUnalignedSrc.testChar|39|25|ns/op|-29.39%|0.82%| >> |ArrayCopyUnalignedSrc.testChar|40|25|ns/op|-29.85%|0.69%| >> |ArrayCopyUnalignedSrc.testChar|41|25|ns/op|-28.93%|0.67%| >> |ArrayCopyUnalignedSrc.testChar|42|25|ns/op|-29.50%|0.70%| >> |ArrayCopyUnalignedSrc.testChar|43|25|ns/op|-28.95%|0.71%| >> |ArrayCopyUnalignedSrc.testChar|44|25|ns/op|-29.75%|0.66%| >> |ArrayCopyUnalignedSrc.testChar|45|25|ns/op|-29.02%|0.87%| >> |ArrayCopyUnalignedSrc.testChar|46|25|ns/op|-29.76%|0.69%| >> |ArrayCopyUnalignedSrc.testChar|47|25|ns/op|-29.37%|0.50%| >> |ArrayCopyUnalignedSrc.testChar|48|25|ns/op|-29.71%|0.73%| > > JMH microbenchmark results for testInt: > |Benchmark|Length|Count|Units|ldpq vs ld4|Maximum Relative Error| > |-|-|-|-|-|-| > |ArrayCopyAligned.testInt|17|25|ns/op|-26.25%|2.08%| > |ArrayCopyAligned.testInt|18|25|ns/op|-29.10%|0.04%| > |ArrayCopyAligned.testInt|19|25|ns/op|-28.93%|0.19%| > |ArrayCopyAligned.testInt|20|25|ns/op|-29.11%|0.06%| > |ArrayCopyAligned.testInt|21|25|ns/op|-29.06%|0.06%| > |ArrayCopyAligned.testInt|22|25|ns/op|-29.12%|0.13%| > |ArrayCopyAligned.testInt|23|25|ns/op|-29.10%|0.04%| > |ArrayCopyAligned.testInt|24|25|ns/op|-28.96%|0.16%| > |ArrayCopyUnalignedBoth.testInt|17|25|ns/op|-25.34%|2.05%| > |ArrayCopyUnalignedBoth.testInt|18|25|ns/op|-28.96%|0.07%| > |ArrayCopyUnalignedBoth.testInt|19|25|ns/op|-29.01%|0.09%| > |ArrayCopyUnalignedBoth.testInt|20|25|ns/op|-28.95%|0.10%| > |ArrayCopyUnalignedBoth.testInt|21|25|ns/op|-29.01%|0.07%| > |ArrayCopyUnalignedBoth.testInt|22|25|ns/op|-29.04%|0.12%| > |ArrayCopyUnalignedBoth.testInt|23|25|ns/op|-29.01%|0.10%| > 
|ArrayCopyUnalignedBoth.testInt|24|25|ns/op|-29.05%|0.04%| > |ArrayCopyUnalignedDst.testInt|17|25|ns/op|-27.63%|3.12%| > |ArrayCopyUnalignedDst.testInt|18|25|ns/op|-25.75%|3.44%| > |ArrayCopyUnalignedDst.testInt|19|25|ns/op|-29.06%|0.06%| > |ArrayCopyUnalignedDst.testInt|20|25|ns/op|-29.07%|0.04%| > |ArrayCopyUnalignedDst.testInt|21|25|ns/op|-29.02%|0.07%| > |ArrayCopyUnalignedDst.testInt|22|25|ns/op|-29.03%|0.06%| > |ArrayCopyUnalignedDst.testInt|23|25|ns/op|-29.01%|0.07%| > |ArrayCopyUnalignedDst.testInt|24|25|ns/op|-29.05%|0.07%| > |ArrayCopyUnalignedSrc.testInt|17|25|ns/op|-27.76%|1.35%| > |ArrayCopyUnalignedSrc.testInt|18|25|ns/op|-28.91%|0.10%| > |ArrayCopyUnalignedSrc.testInt|19|25|ns/op|-28.92%|0.12%| > |ArrayCopyUnalignedSrc.testInt|20|25|ns/op|-28.91%|0.09%| > |ArrayCopyUnalignedSrc.testInt|21|25|ns/op|-28.97%|0.06%| > |ArrayCopyUnalignedSrc.testInt|22|25|ns/op|-28.95%|0.29%| > |ArrayCopyUnalignedSrc.testInt|23|25|ns/op|-29.01%|0.04%| > |ArrayCopyUnalignedSrc.testInt|24|25|ns/op|-28.93%|0.23%| JMH microbenchmark results for testLong: |Benchmark|Length|Count|Units|ldpq vs ld4|Maximum Relative Error| |-|-|-|-|-|-| |ArrayCopyAligned.testLong|9|25|ns/op|-29.05%|0.04%| |ArrayCopyAligned.testLong|10|25|ns/op|-28.91%|0.06%| |ArrayCopyAligned.testLong|11|25|ns/op|-29.08%|0.06%| |ArrayCopyAligned.testLong|12|25|ns/op|-29.07%|0.04%| |ArrayCopyUnalignedBoth.testLong|9|25|ns/op|-29.08%|0.06%| |ArrayCopyUnalignedBoth.testLong|10|25|ns/op|-2.83%|0.56%| |ArrayCopyUnalignedBoth.testLong|11|25|ns/op|-29.13%|0.04%| |ArrayCopyUnalignedBoth.testLong|12|25|ns/op|-16.06%|0.45%| |ArrayCopyUnalignedDst.testLong|9|25|ns/op|-29.03%|0.06%| |ArrayCopyUnalignedDst.testLong|10|25|ns/op|-28.88%|0.04%| |ArrayCopyUnalignedDst.testLong|11|25|ns/op|-29.02%|0.07%| |ArrayCopyUnalignedDst.testLong|12|25|ns/op|-28.92%|0.07%| |ArrayCopyUnalignedSrc.testLong|9|25|ns/op|-29.11%|0.04%| |ArrayCopyUnalignedSrc.testLong|10|25|ns/op|-29.10%|0.06%| 
|ArrayCopyUnalignedSrc.testLong|11|25|ns/op|-29.11%|0.04%| |ArrayCopyUnalignedSrc.testLong|12|25|ns/op|-29.12%|0.03%| ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From kvn at openjdk.java.net Mon Nov 23 21:22:04 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 23 Nov 2020 21:22:04 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: References: Message-ID: <75WHr84LNShtrbKsnvqAEL6RnHKiJk9uEx4xK3n42ms=.ad0cf5d4-2373-422f-a45e-bf0b9a8e1350@github.com> On Thu, 19 Nov 2020 12:39:47 GMT, Evgeny Astigeevich wrote: > This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. > > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 169: > 167: if (_cpu == CPU_ARM && (_model == 0xd08 || _model2 == 0xd08)) { > 168: if (FLAG_IS_DEFAULT(UseSIMDForMemoryOps)) { > 169: FLAG_SET_DEFAULT(UseSIMDForMemoryOps, true); What about A73? Should this flag be true for it too? ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 22:08:59 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Mon, 23 Nov 2020 22:08:59 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: <75WHr84LNShtrbKsnvqAEL6RnHKiJk9uEx4xK3n42ms=.ad0cf5d4-2373-422f-a45e-bf0b9a8e1350@github.com> References: <75WHr84LNShtrbKsnvqAEL6RnHKiJk9uEx4xK3n42ms=.ad0cf5d4-2373-422f-a45e-bf0b9a8e1350@github.com> Message-ID: <8f_FAcyezL2GReTgxekHms_xlTTKK7aJUNGkcCRwbbs=.2cda3d72-55b8-487c-bf8a-b31ad555f212@github.com> On Mon, 23 Nov 2020 21:19:34 GMT, Vladimir Kozlov wrote: >> This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. 
>> >> The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. > > src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 169: > >> 167: if (_cpu == CPU_ARM && (_model == 0xd08 || _model2 == 0xd08)) { >> 168: if (FLAG_IS_DEFAULT(UseSIMDForMemoryOps)) { >> 169: FLAG_SET_DEFAULT(UseSIMDForMemoryOps, true); > > What about A73? Should this flag be true for it too? Hi Vladimir, Thank you for reviewing the changes. Yes, it can be enabled. However, I found that only HiKey960/970 by Linaro use A73. Other devices are phones. I can do this if Linaro engineers agree. ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From kvn at openjdk.java.net Mon Nov 23 23:40:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 23 Nov 2020 23:40:58 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: <8f_FAcyezL2GReTgxekHms_xlTTKK7aJUNGkcCRwbbs=.2cda3d72-55b8-487c-bf8a-b31ad555f212@github.com> References: <75WHr84LNShtrbKsnvqAEL6RnHKiJk9uEx4xK3n42ms=.ad0cf5d4-2373-422f-a45e-bf0b9a8e1350@github.com> <8f_FAcyezL2GReTgxekHms_xlTTKK7aJUNGkcCRwbbs=.2cda3d72-55b8-487c-bf8a-b31ad555f212@github.com> Message-ID: On Mon, 23 Nov 2020 22:06:39 GMT, Evgeny Astigeevich wrote: >> src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 169: >> >>> 167: if (_cpu == CPU_ARM && (_model == 0xd08 || _model2 == 0xd08)) { >>> 168: if (FLAG_IS_DEFAULT(UseSIMDForMemoryOps)) { >>> 169: FLAG_SET_DEFAULT(UseSIMDForMemoryOps, true); >> >> What about A73? Should this flag be true for it too? > > Hi Vladimir, > Thank you for reviewing the changes. > Yes, it can be enabled. However, I found that only HiKey960/970 by Linaro use A73. Other devices are phones. > I can do this if Linaro engineers agree. The ARM ecosystem is really strange :) In this case keep these changes as they are and let Linaro engineers add it later if they want.
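For readers following the thread, the kind of check under discussion can be sketched roughly as below. This is a standalone illustration, not the HotSpot code: `CpuInfo`, `has_part` and `default_use_simd_for_memory_ops` are hypothetical names, and only the `0xd08` comparison and the `_model`/`_model2` pair come from the quoted snippet. The Neoverse N1 part number `0xd0c` is taken from Arm's published part numbers, and folding both CPUs into one function is an assumption made for brevity.

```cpp
// Hypothetical stand-in for the values the VM reads from the CPU
// identification data; not HotSpot's actual data structure.
struct CpuInfo {
  int implementer;  // 0x41 ('A') identifies Arm Ltd. designs
  int model;        // primary part number
  int model2;       // second part number on mixed-core systems, 0 if none
};

constexpr int kImplementerArm = 0x41;
constexpr int kPartCortexA72  = 0xd08;  // Graviton 1 is detected as Cortex-A72
constexpr int kPartNeoverseN1 = 0xd0c;  // Graviton 2 is detected as Neoverse N1

// True if either reported part number matches the requested one.
bool has_part(const CpuInfo& cpu, int part) {
  return cpu.model == part || cpu.model2 == part;
}

// Returns the value the flag should end up with. A value set explicitly by
// the user (flag_is_default == false) is never overridden.
bool default_use_simd_for_memory_ops(const CpuInfo& cpu,
                                     bool flag_is_default,
                                     bool current_value) {
  if (!flag_is_default) return current_value;
  if (cpu.implementer != kImplementerArm) return current_value;
  if (has_part(cpu, kPartCortexA72) || has_part(cpu, kPartNeoverseN1)) {
    return true;
  }
  return current_value;
}
```

Under this sketch a Cortex-A73 (part number `0xd09`) keeps the existing default, which matches the outcome agreed above; adding it later would be a one-line change to the part-number test.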
------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From kvn at openjdk.java.net Mon Nov 23 23:40:56 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 23 Nov 2020 23:40:56 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: References: Message-ID: <-Oiu2Cimcy4ZQpPIZIhVjsmKlDaSKC15fLdq-X3I7LU=.60531c0c-b1e9-4e22-9bae-d25d85a1144c@github.com> On Thu, 19 Nov 2020 12:39:47 GMT, Evgeny Astigeevich wrote: > This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. > > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From dean.long at oracle.com Mon Nov 23 23:56:47 2020 From: dean.long at oracle.com (Dean Long) Date: Mon, 23 Nov 2020 15:56:47 -0800 Subject: RFR: 8256803: ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64 In-Reply-To: <6c3210d7-b318-6473-0e85-d1ef7db38277@redhat.com> References: <6c3210d7-b318-6473-0e85-d1ef7db38277@redhat.com> Message-ID: <3b3f942c-8209-495e-8c45-1a6d579d1241@oracle.com> On 11/21/20 1:40 AM, Andrew Haley wrote: > On 11/20/20 5:36 PM, Daniel D.Daugherty wrote: >> A trivial fix to ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64. > Please post the output of the test failure. What was the hardware? > The details are in https://bugs.openjdk.java.net/browse/JDK-8256359. I tagged you in my evaluation. It doesn't seem to be hardware-related. dl From redestad at openjdk.java.net Tue Nov 24 00:02:09 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 24 Nov 2020 00:02:09 GMT Subject: RFR: 8256883: C2: Add a RegMask iterator [v3] In-Reply-To: References: Message-ID: > By implementing a simple RegMaskIterator we can speed this up and possibly make the code a bit clearer by doing so.
> > As a data point, this reduces the `C2Compiler::initialize` overhead from 8.82M instructions to 8.58M instructions, from the improvement in `PhaseChaitin::post_allocate_copy_removal` (~16k insns/compilation). The gain varies with the type of compilation, so on the naive tests in `SimpleRepeatCompilation` it's in the noise (~2k insns/compilation on `trivialMath`, for example). Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Rename iterator variables in ZGC code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1397/files - new: https://git.openjdk.java.net/jdk/pull/1397/files/b5345e9e..b928fcc1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1397&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1397&range=01-02 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/1397.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1397/head:pull/1397 PR: https://git.openjdk.java.net/jdk/pull/1397 From kvn at openjdk.java.net Tue Nov 24 00:10:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 24 Nov 2020 00:10:57 GMT Subject: RFR: 8256883: C2: Add a RegMask iterator [v3] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 00:02:09 GMT, Claes Redestad wrote: >> By implementing a simple RegMaskIterator we can speed this up and possibly make the code a bit clearer by doing so. >> >> As a data point, this reduces the `C2Compiler::initialize` overhead from 8.82M instructions to 8.58M instructions, from the improvement in `PhaseChaitin::post_allocate_copy_removal` (~16k insns/compilation). The gain varies with the type of compilation, so on the naive tests in `SimpleRepeatCompilation` it's in the noise (~2k insns/compilation on `trivialMath`, for example).
> > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Rename iterator variables in ZGC code Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1397 From redestad at openjdk.java.net Tue Nov 24 00:14:11 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 24 Nov 2020 00:14:11 GMT Subject: RFR: 8256883: C2: Add a RegMask iterator [v4] In-Reply-To: References: Message-ID: > By implementing a simple RegMaskIterator we can speed this up and possibly make the code a bit clearer by doing so. > > As a data point, this reduces the `C2Compiler::initialize` overhead from 8.82M instructions to 8.58M instructions, from the improvement in `PhaseChaitin::post_allocate_copy_removal` (~16k insns/compilation). The gain varies with the type of compilation, so on the naive tests in `SimpleRepeatCompilation` it's in the noise (~2k insns/compilation on `trivialMath`, for example). Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Remove extra commentary in ZGC aarch code (requested by Per Liden) ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1397/files - new: https://git.openjdk.java.net/jdk/pull/1397/files/b928fcc1..f259dd3b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1397&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1397&range=02-03 Stats: 4 lines in 1 file changed: 1 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1397.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1397/head:pull/1397 PR: https://git.openjdk.java.net/jdk/pull/1397 From pliden at openjdk.java.net Tue Nov 24 00:19:01 2020 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 24 Nov 2020 00:19:01 GMT Subject: RFR: 8256883: C2: Add a RegMask iterator [v4] In-Reply-To: References: Message-ID:
<7G6D8ICHXqs14sv46g36CCqdxKY79bzxqPvfXbmNko4=.5248159a-bb24-4d2a-9c15-17972488443d@github.com> On Tue, 24 Nov 2020 00:14:11 GMT, Claes Redestad wrote: >> By implementing a simple RegMaskIterator we can speed this up and possibly make the code a bit clearer by doing so. >> >> As a data point, this reduce the `C2Compiler::initialize` overhead from 8.82M instructions to 8.58M instructions, from the improvement in `PhaseChaitin::post_allocate_copy_removal` (~16k insns/compilation). The gain varies with type of compilation, so on the naive tests in `SimpleRepeatCompilation` it's in the noise (~2k insns/compilation on `trivialMath, for example). > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Remove extra commentary in ZGC aarch code (requested by Per Liden) ZGC changes look good. ------------- Marked as reviewed by pliden (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1397 From xgong at openjdk.java.net Tue Nov 24 02:08:55 2020 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Tue, 24 Nov 2020 02:08:55 GMT Subject: Integrated: 8256614: AArch64: Add SVE backend implementation for integer min/max In-Reply-To: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> References: <7tPkd0KX94vB4PF7473CCUATMpYarJQrhKMtl4l01OY=.d36eefe4-3935-499c-af85-edfdfaa63f2d@github.com> Message-ID: On Fri, 20 Nov 2020 07:13:18 GMT, Xiaohong Gong wrote: > Currently the Arm SVE implementation for integer (byte,short,int,long) vector min/max is missing. This is needed for VectorAPI. We need to add them all to avoid the "bad AD file" crash. This pull request has now been integrated. 
Changeset: 67a95900 Author: Xiaohong Gong Committer: Ningsheng Jian URL: https://git.openjdk.java.net/jdk/commit/67a95900 Stats: 139 lines in 3 files changed: 80 ins; 31 del; 28 mod 8256614: AArch64: Add SVE backend implementation for integer min/max Reviewed-by: adinn ------------- PR: https://git.openjdk.java.net/jdk/pull/1337 From jbhateja at openjdk.java.net Tue Nov 24 02:30:58 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 24 Nov 2020 02:30:58 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Sun, 22 Nov 2020 02:19:53 GMT, Vladimir Kozlov wrote: >>> Forgot to say that failure was on Windows with only avx512f, avx512cd >> >> Thanks Vladimir, I have resolved your review comments. > > Version 15 failed next tests on linux-x64 with -XX:+UseParallelGC -XX:+UseNUMA flags: > vmTestbase/metaspace/stressHierarchy/stressHierarchy015/TestDescription.java > vmTestbase/metaspace/stressHierarchy/stressHierarchy006/TestDescription.java > vmTestbase/metaspace/stressHierarchy/stressHierarchy005/TestDescription.java > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/workspace/open/src/hotspot/share/opto/macroArrayCopy.cpp:861), pid=8205, tid=8216 > # assert(ArrayCopyNode::may_modify(dest_t, (*ctrl)->in(0)->as_MemBar(), &_igvn, ac)) failed: dependency on arraycopy lost > # > # Problematic frame: > # V [libjvm.so+0x134fbfc] PhaseMacroExpand::generate_arraycopy(ArrayCopyNode*, AllocateArrayNode*, Node**, MergeMemNode*, Node**, TypePtr const*, BasicType, Node*, Node*, Node*, Node*, Node*, bool, bool, RegionNode*)+0x30bc > # > Host: Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz, 8 cores, 58G, Oracle Linux Server release 7.9 > > Current CompileTask: > C2: 27392 5458 4 package_level34_num50.Dummy::composeString (10 bytes) > > Stack: [0x00007f4c5a024000,0x00007f4c5a125000], sp=0x00007f4c5a120420, 
free space=1009k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x134fbfc] PhaseMacroExpand::generate_arraycopy(ArrayCopyNode*, AllocateArrayNode*, Node**, MergeMemNode*, Node**, TypePtr const*, BasicType, Node*, Node*, Node*, Node*, Node*, bool, bool, RegionNode*)+0x30bc > V [libjvm.so+0x1350741] PhaseMacroExpand::expand_arraycopy_node(ArrayCopyNode*)+0x641 > V [libjvm.so+0x1340d7b] PhaseMacroExpand::expand_macro_nodes()+0xfdb > V [libjvm.so+0x9fe79b] Compile::Optimize()+0x177b > V [libjvm.so+0xa00268] Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x17e8 > V [libjvm.so+0x8322ae] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1ce > V [libjvm.so+0xa103f8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe08 > V [libjvm.so+0xa10f48] CompileBroker::compiler_thread_loop()+0x5a8 Hi @vnkozlov , Kindly let me know if there are any other comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From xliu at openjdk.java.net Tue Nov 24 07:00:57 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 24 Nov 2020 07:00:57 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic [v3] In-Reply-To: References: Message-ID: <0OUXFdIf2Fo88ivdyPMEgVnBIGK1SELyRiPPZRJZcj8=.a8eb0770-672a-44bb-ad71-f9bef49c9d39@github.com> On Fri, 20 Nov 2020 08:48:28 GMT, Nils Eliasson wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits:
>>
>>  - 8247732: validate user-input intrinsic_ids in ControlIntrinsic
>>
>>    avoid a warning of stringop-overflow
>>  - 8247732: validate user-input intrinsic_ids in ControlIntrinsic
>
> src/hotspot/share/compiler/compilerDirectives.hpp line 198:
>
>> 196:       if (vmIntrinsics::_none == vmIntrinsics::find_id(*iter)) {
>> 197:         _bad = NEW_C_HEAP_ARRAY(char, strlen(*iter) + 1, mtCompiler);
>> 198:         strncpy(_bad, *iter, strlen(*iter) + 1);
>
> This doesn't compile. Using strlen as an argument to strncpy is disallowed.
>
>> "warning: 'char* __builtin_strncpy(char*, const char*, long unsigned int)' specified bound depends on the length of the source argument [-Wstringop-overflow=]"
>
> Do a min between strlen and the maximum allowed length.
>
> Fix this for both uses of the string length (row 197 and 198).

Hi @neliasso,

I updated my toolchain from g++-8 to g++-10 and successfully reproduced this issue. I have updated my code to fix it. Actually, only one line is modified.

    @@ -195,7 +195,7 @@ class ControlIntrinsicValidator {
         for (ControlIntrinsicIter iter(option, disabled_all); *iter != NULL && _valid; ++iter) {
           if (vmIntrinsics::_none == vmIntrinsics::find_id(*iter)) {
             _bad = NEW_C_HEAP_ARRAY(char, strlen(*iter) + 1, mtCompiler);
    -        strncpy(_bad, *iter, strlen(*iter) + 1);
    +        strcpy(_bad, *iter);
             _valid = false;
           }
         }

The change has passed all regression tests. I don't know why the table above doesn't update.
https://github.com/navyxliu/jdk/actions/runs/375913238

Could you take another look at my patch?
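
[Editorial illustration, not part of the original message.] The GCC warning quoted above fires because a `strncpy` bound derived from the length of the source gives no protection for the destination. When the destination has just been allocated with `strlen(src) + 1` bytes, a plain copy is already safe. A minimal sketch of that pattern follows; it uses `malloc` rather than HotSpot's `NEW_C_HEAP_ARRAY`, and `duplicate_c_string` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns a malloc'ed copy of src, or nullptr if allocation fails.
// The caller releases the result with std::free.
char* duplicate_c_string(const char* src) {
  assert(src != nullptr);
  const std::size_t len = std::strlen(src);             // measure the source once
  char* copy = static_cast<char*>(std::malloc(len + 1)); // room for the terminating NUL
  if (copy == nullptr) {
    return nullptr;
  }
  std::memcpy(copy, src, len + 1);                      // copies the terminating NUL as well
  return copy;
}
```

Because the buffer size and the copy length come from the same measurement, `memcpy` here behaves like the `strcpy` in the patch; which of the two reads better is a matter of taste.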
------------- PR: https://git.openjdk.java.net/jdk/pull/1179 From thartmann at openjdk.java.net Tue Nov 24 07:04:59 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 24 Nov 2020 07:04:59 GMT Subject: RFR: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" [v2] In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 15:22:44 GMT, Vladimir Ivanov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactored getShiftCon method > > Looks good. Thanks @iwanowww and @vnkozlov for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/1384 From xliu at openjdk.java.net Tue Nov 24 07:17:22 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 24 Nov 2020 07:17:22 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic [v4] In-Reply-To: References: Message-ID: <90i2kov2bcRVbUDiwhZVCwfdtM5ZlfAZJkyO_IOP0tc=.25459180-fecc-4379-b84e-91d126e4d564@github.com> > 8247732: validate user-input intrinsic_ids in ControlIntrinsic Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision:

 - Merge branch 'JDK-8247732' of https://github.com/navyxliu/jdk into JDK-8247732
 - 8247732: validate user-input intrinsic_ids in ControlIntrinsic

   avoid a warning of stringop-overflow
 - 8247732: validate user-input intrinsic_ids in ControlIntrinsic
 - 8247732: validate user-input intrinsic_ids in ControlIntrinsic

   avoid a warning of stringop-overflow
 - 8247732: validate user-input intrinsic_ids in ControlIntrinsic

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1179/files
  - new: https://git.openjdk.java.net/jdk/pull/1179/files/4e4d727e..4db938e3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1179&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1179&range=02-03

  Stats: 72616 lines in 373 files changed: 70008 ins; 1699 del; 909 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1179.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1179/head:pull/1179

PR: https://git.openjdk.java.net/jdk/pull/1179

From rcastanedalo at openjdk.java.net Tue Nov 24 07:17:57 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Tue, 24 Nov 2020 07:17:57 GMT
Subject: RFR: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well [v2]
In-Reply-To:
References:
Message-ID: <4dJ8mMFyQ8aJWsEXA8QdBOV4U23_EHteIbeMoYN45FE=.34e234b7-4441-43c7-98a0-744dfabb8bb7@github.com>

On Mon, 23 Nov 2020 09:37:25 GMT, Roland Westrelin wrote:

>> I wondered about that but had no issue during testing. Let me see if I can change your test case to have recursive Ideal() calls.
>
> I managed to reproduce a similar issue by tweaking your test cases. I just pushed the new test cases + a change that mirrors your fix.

Great, thanks!
------------- PR: https://git.openjdk.java.net/jdk/pull/1342 From thartmann at openjdk.java.net Tue Nov 24 07:19:00 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 24 Nov 2020 07:19:00 GMT Subject: RFR: 8256858: C2: Devirtualize PhaseIterGVN-specific methods In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 13:48:21 GMT, Claes Redestad wrote: > PhaseValues define the virtual method is_IterGVN, which is trivially returning 0(!) for all types except those derived from PhaseIterGVN. Similarly there's igvn_rehash_node_delayed which is virtual and a no-op in base types, but implemented to call rehash_node_delayed in PhaseIterGVN. > > By devirtualizing we allow for more aggressive inlining and slightly better code generation in a few places. > > This increases sizeof(PhaseValues) from 2480 to 2488 on x64. Since we only go through a limited number of phases per compilation this seems acceptable. Changes requested by thartmann (Reviewer). src/hotspot/share/opto/addnode.cpp line 194: > 192: set_req(2, a22); > 193: progress = this; > 194: if (add2->outcnt() == 0 && phase->is_IterGVN()) { Why did you remove the local variable `igvn`? Performance wise it shouldn't make a difference, right? And it's used below. src/hotspot/share/opto/addnode.cpp line 629: > 627: set_req(Address, phase->transform(new AddPNode(in(Base),in(Address),add->in(1)))); > 628: set_req(Offset, add->in(2)); > 629: if (add->outcnt() == 0 && phase->is_IterGVN()) { Again, I think replacing `igvn` by `phase->is_IterGVN()` makes the code less readable. src/hotspot/share/opto/memnode.cpp line 2752: > 2750: > 2751: PhaseIterGVN* igvn = phase->is_IterGVN(); > 2752: if (result != this && igvn != NULL) { Here you did the reverse (replaced `phase->is_IterGVN()` by `igvn` local). src/hotspot/share/opto/phaseX.hpp line 374: > 372: PhaseValues(PhaseValues* pt); > 373: NOT_PRODUCT(~PhaseValues();) > 374: PhaseIterGVN *is_IterGVN() { return (_iterGVN) ? 
(PhaseIterGVN*)this : NULL; } `PhaseIterGVN *is_IterGVN` -> `PhaseIterGVN* is_IterGVN` src/hotspot/share/opto/phaseX.hpp line 391: > 389: virtual ConNode* uncached_makecon(const Type* t); // override from PhaseTransform > 390: > 391: const Type* saturate(const Type* new_type, const Type* old_type, Indentation should be fixed (in the next line). ------------- PR: https://git.openjdk.java.net/jdk/pull/1385 From chagedorn at openjdk.java.net Tue Nov 24 08:32:05 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 24 Nov 2020 08:32:05 GMT Subject: RFR: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" [v2] In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 15:18:12 GMT, Tobias Hartmann wrote: >> The ideal transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) converts `RotateLeftNode(val, shift)` into `RotateRightNode(val, 32/64 - (shift & 31/63))`. If `shift` later becomes zero, we end up trying to emit a rotate with a 32/64 shift value which triggers an assert. >> >> I've added an identity transformation similar to what is implemented for ShiftNodes that takes care of this. I've also noticed that the corresponding assert is not strong enough for `roll` and `rorl` (probably the author used the assert corresponding to the 64-bit version by accident). The patch also includes some refactoring. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Refactored getShiftCon method Looks good to me! src/hotspot/share/opto/mulnode.cpp line 907: > 905: int hi = ~lo; // 00007FFF > 906: const TypeInt *t11 = phase->type(in(1)->in(1))->isa_int(); > 907: if (!t11) return this; While at it, you could also improve the styling here (spacing, asterisk and explicitly checking against `NULL` for `t11`). ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1384 From xliu at openjdk.java.net Tue Nov 24 08:45:19 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 24 Nov 2020 08:45:19 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic [v5] In-Reply-To: References: Message-ID: > 8247732: validate user-input intrinsic_ids in ControlIntrinsic Xin Liu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1179/files - new: https://git.openjdk.java.net/jdk/pull/1179/files/4db938e3..8f3a0621 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1179&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1179&range=03-04 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1179.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1179/head:pull/1179 PR: https://git.openjdk.java.net/jdk/pull/1179 From shade at openjdk.java.net Tue Nov 24 08:57:56 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 08:57:56 GMT Subject: RFR: 8256857: ARM32 builds broken after JDK-8254231 [v3] In-Reply-To: References: <1L9xggeBg0l8h7Dujr7UYtJ_Yj2-QC-iTZNVKGSScZM=.23123e6f-f4d0-4438-86f5-6a9db635020f@github.com> Message-ID: On Mon, 23 Nov 2020 18:23:18 GMT, Aleksey Shipilev wrote: >> @shipilev Done. Though I'm not a `jdk` Reviewer, so another review is needed for this PR. >> >> But, as the person most familiar with the code, it looks good to me. > > Thanks @JornVernee, no problem. Anyone to formally ack this? Let's unbreak more builds! 
Current patch makes ARM builds pass, but then it fails at runtime with: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/pi/jdk/src/hotspot/cpu/arm/methodHandles_arm.cpp:233), pid=21071, tid=21076 # assert(ref_kind != 0 || iid == vmIntrinsics::_invokeBasic) failed: must be _invokeBasic or a linkTo intrinsic Let me try and fix that too. ------------- PR: https://git.openjdk.java.net/jdk/pull/1383 From shade at openjdk.java.net Tue Nov 24 09:16:14 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 09:16:14 GMT Subject: RFR: 8256857: ARM32 builds broken after JDK-8254231 [v4] In-Reply-To: References: Message-ID: > Foreign linker broke ARM32 builds. This change stubs out the new entry points, without implementing the actual support yet. > > Additional testing: > - [x] Linux arm fastdebug cross-compilation Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Fix the runtime error as well: stub out _linkToNative - Merge branch 'master' into JDK-8256857-arm-foreign - Merge branch 'master' into JDK-8256857-arm-foreign - Use nullptr instead - Add debug.hpp include as well - 8256857: ARM32 builds broken after JDK-8254231 - 8256857: ARM32 builds broken after JDK-8254231 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1383/files - new: https://git.openjdk.java.net/jdk/pull/1383/files/ff639741..fc8f7392 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1383&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1383&range=02-03 Stats: 3208 lines in 86 files changed: 1767 ins; 926 del; 515 mod Patch: https://git.openjdk.java.net/jdk/pull/1383.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1383/head:pull/1383 PR: https://git.openjdk.java.net/jdk/pull/1383 From shade at openjdk.java.net Tue Nov 24 09:16:15 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 09:16:15 GMT Subject: RFR: 8256857: ARM32 builds broken after JDK-8254231 [v4] In-Reply-To: References: <1L9xggeBg0l8h7Dujr7UYtJ_Yj2-QC-iTZNVKGSScZM=.23123e6f-f4d0-4438-86f5-6a9db635020f@github.com> Message-ID: On Tue, 24 Nov 2020 08:55:14 GMT, Aleksey Shipilev wrote: >> Thanks @JornVernee, no problem. Anyone to formally ack this? Let's unbreak more builds! > > Current patch makes ARM builds pass, but then it fails at runtime with: > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/pi/jdk/src/hotspot/cpu/arm/methodHandles_arm.cpp:233), pid=21071, tid=21076 > # assert(ref_kind != 0 || iid == vmIntrinsics::_invokeBasic) failed: must be _invokeBasic or a linkTo intrinsic > > Let me try and fix that too. Runtime error is now resolved the [same way Zero resolves it](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/zero/methodHandles_zero.cpp#L211). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1383 From stuefe at openjdk.java.net Tue Nov 24 09:33:04 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 24 Nov 2020 09:33:04 GMT Subject: RFR: 8256857: ARM32 builds broken after JDK-8254231 [v4] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 09:16:14 GMT, Aleksey Shipilev wrote: >> Foreign linker broke ARM32 builds. This change stubs out the new entry points, without implementing the actual support yet. >> >> Additional testing: >> - [x] Linux arm fastdebug cross-compilation > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Fix the runtime error as well: stub out _linkToNative > - Merge branch 'master' into JDK-8256857-arm-foreign > - Merge branch 'master' into JDK-8256857-arm-foreign > - Use nullptr instead > - Add debug.hpp include as well > - 8256857: ARM32 builds broken after JDK-8254231 > - 8256857: ARM32 builds broken after JDK-8254231 LGTM ------------- Marked as reviewed by stuefe (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1383 From shade at openjdk.java.net Tue Nov 24 09:35:12 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 09:35:12 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v3] In-Reply-To: References: Message-ID: > Reproduces like this: > > $ CONF=linux-x86-server-fastdebug make images run-test TEST=compiler/floatingpoint/NaNTest.java TEST_VM_OPTS="-XX:UseSSE=1" > > STDOUT: > ### NanTest started > Written and read back float values match > 0x7F800001 0x7F800001 > STDERR: > java.lang.RuntimeException: Original and read back double values mismatch > 0xFFF0000000000001 0xFFF8000000000001 > > at compiler.floatingpoint.NaNTest.testDouble(NaNTest.java:56) > at compiler.floatingpoint.NaNTest.main(NaNTest.java:69) > > After reading through [JDK-8076373](https://bugs.openjdk.java.net/browse/JDK-8076373), I think the test cannot be expected to pass without SSE >= 2 for doubles, and SSE >= 1 for floats on x86_32. This change adds the platform and UseSSE sensing to test. 
> > Additional testing: > - [x] Affected test on Linux x86_32 with `-XX:UseSSE={0,1,2}` > - [x] Affected test on Linux x86_64 with `-XX:UseSSE={0,1,2}` > - [x] Affected test on Linux AArch64 Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Do not run the tests when it is known to fail, even for messages ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1187/files - new: https://git.openjdk.java.net/jdk/pull/1187/files/a47ee04c..4e0350f2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1187&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1187&range=01-02 Stats: 39 lines in 1 file changed: 15 ins; 18 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/1187.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1187/head:pull/1187 PR: https://git.openjdk.java.net/jdk/pull/1187 From shade at openjdk.java.net Tue Nov 24 09:35:12 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 09:35:12 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v2] In-Reply-To: <2pW5Q5sSG_2g_6Ai0Mz0t3bHlylYHzIbJ3mH9liKT80=.0de5ef5e-3586-4118-ae20-d5a432c081a0@github.com> References: <_6nluorAvosju4tbANDk-v9c4dzfarBXqk7bHNnHm6s=.dc10c8d2-7ef2-4c91-852c-f4f7d2f5caf5@github.com> <2pW5Q5sSG_2g_6Ai0Mz0t3bHlylYHzIbJ3mH9liKT80=.0de5ef5e-3586-4118-ae20-d5a432c081a0@github.com> Message-ID: On Mon, 23 Nov 2020 16:01:25 GMT, Vladimir Kozlov wrote: > What I meant is to run tests only if SSE is used: > > ``` > if (expectStableFloats) { > testFloat(); > } > if (expectStableDoubles) { > testDouble(); > } > ``` Sure, if you want. See the update. x86_32 is currently broken by Foreign Linker integration, so I have not rebased to current master. New commit still fixes x86_32 failure on this test. 
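
[Editorial illustration, not part of the original message.] The two double values in the failure output quoted earlier in this thread, 0xFFF0000000000001 and 0xFFF8000000000001, differ only in bit 51, the top fraction bit, which on x86 distinguishes a signaling NaN from a quiet NaN. That is consistent with the written signaling NaN having been turned into a quiet NaN somewhere on the way back; the thread itself does not spell out the mechanism, so treat that reading as an assumption. The sketch below works on raw bit patterns only, so it behaves the same on every platform; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// IEEE 754 binary64 layout: 1 sign bit, 11 exponent bits, 52 fraction bits.
const uint64_t kExponentMask = 0x7FF0000000000000ULL;
const uint64_t kFractionMask = 0x000FFFFFFFFFFFFFULL;
const uint64_t kQuietBit     = 0x0008000000000000ULL;  // top fraction bit (bit 51)

// A NaN has all exponent bits set and a non-zero fraction.
bool is_nan_bits(uint64_t bits) {
  return (bits & kExponentMask) == kExponentMask && (bits & kFractionMask) != 0;
}

// A signaling NaN is a NaN whose quiet bit is clear.
bool is_signaling_nan_bits(uint64_t bits) {
  return is_nan_bits(bits) && (bits & kQuietBit) == 0;
}

// Bit pattern obtained when a NaN is "quieted": only the quiet bit changes.
uint64_t quieted(uint64_t bits) {
  assert(is_nan_bits(bits));
  return bits | kQuietBit;
}
```

Applied to the value the test writes, `quieted(0xFFF0000000000001ULL)` yields exactly the value the test read back, which is why a bit-for-bit comparison can only be expected to hold where NaN payloads are preserved.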
-------------

PR: https://git.openjdk.java.net/jdk/pull/1187

From aph at redhat.com Tue Nov 24 10:04:43 2020
From: aph at redhat.com (Andrew Haley)
Date: Tue, 24 Nov 2020 10:04:43 +0000
Subject: RFR: 8256803: ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64
In-Reply-To: <3b3f942c-8209-495e-8c45-1a6d579d1241@oracle.com>
References: <6c3210d7-b318-6473-0e85-d1ef7db38277@redhat.com> <3b3f942c-8209-495e-8c45-1a6d579d1241@oracle.com>
Message-ID: <26fafdd2-1aea-67b3-4da1-7465a06e778d@redhat.com>

On 11/23/20 11:56 PM, Dean Long wrote:
> On 11/21/20 1:40 AM, Andrew Haley wrote:
>> On 11/20/20 5:36 PM, Daniel D.Daugherty wrote:
>>> A trivial fix to ProblemList runtime/ReservedStack/ReservedStackTestCompiler.java on linux-aarch64.
>> Please post the output of the test failure. What was the hardware?
>>
>
> The details are in https://bugs.openjdk.java.net/browse/JDK-8256359. I
> tagged you in my evaluation.
> It doesn't seem to be hardware-related.

OK, I'll try again to reproduce it. Thanks.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From aph at openjdk.java.net Tue Nov 24 10:10:57 2020
From: aph at openjdk.java.net (Andrew Haley)
Date: Tue, 24 Nov 2020 10:10:57 GMT
Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory
In-Reply-To:
References:
Message-ID: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com>

On Wed, 18 Nov 2020 14:10:48 GMT, Evgeny Astigeevich wrote:

> This patch fixes 27%-48% performance regressions of small arraycopies on Graviton2 (Neoverse N1) when UseSIMDForMemoryOps is enabled. For such copies ldpq/stpq are used instead of ld4/st4.
> This follows what the Arm Optimization Guide, including for Neoverse N1, recommends: Use discrete, non-writeback forms of load and store instructions while interleaving them.
> > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build and UseSIMDForMemoryOps enabled. I think we need also some non-Neoverse N1 numbers. We need to keep in mind that this software runs on many implementations. I'll have a look at some others. ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From aph at redhat.com Tue Nov 24 10:10:16 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 24 Nov 2020 10:10:16 +0000 Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: Message-ID: <506aebec-c23f-f555-0d27-af4209636bb8@redhat.com> On 11/23/20 9:07 PM, Volker Simonis wrote: > Thanks for the detailed performance numbers. > Looks good to me. The benchmark is missing from the pull request. We can't do anything without that. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From redestad at openjdk.java.net Tue Nov 24 10:11:14 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 24 Nov 2020 10:11:14 GMT Subject: RFR: 8256858: C2: Devirtualize PhaseIterGVN-specific methods [v2] In-Reply-To: References: Message-ID: > PhaseValues define the virtual method is_IterGVN, which is trivially returning 0(!) for all types except those derived from PhaseIterGVN. Similarly there's igvn_rehash_node_delayed which is virtual and a no-op in base types, but implemented to call rehash_node_delayed in PhaseIterGVN. > > By devirtualizing we allow for more aggressive inlining and slightly better code generation in a few places. > > This increases sizeof(PhaseValues) from 2480 to 2488 on x64. Since we only go through a limited number of phases per compilation this seems acceptable. 
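
[Editorial illustration, not part of the original message.] A minimal sketch of the devirtualization idea described above, assuming it works the way the `is_IterGVN` one-liner quoted in the review comments of this thread suggests: the base class carries a flag that only the iterative subclass sets, so the type check becomes an inlinable field test instead of a virtual call. The `Toy*` classes and `rehash_if_iterative` are hypothetical stand-ins, not the HotSpot classes.

```cpp
#include <cassert>

class ToyIterGVN;  // forward declaration of the one "special" subclass

// Toy stand-in for PhaseValues: the check needs no virtual function.
class ToyPhaseValues {
 public:
  ToyPhaseValues() : _iterGVN(false) {}
  // Non-virtual, so the compiler can inline it as a plain field test.
  inline ToyIterGVN* is_IterGVN();
 protected:
  bool _iterGVN;  // set to true only by the ToyIterGVN constructor
};

// Toy stand-in for PhaseIterGVN.
class ToyIterGVN : public ToyPhaseValues {
 public:
  ToyIterGVN() : rehash_count(0) { _iterGVN = true; }
  void rehash_node_delayed() { ++rehash_count; }
  int rehash_count;
};

inline ToyIterGVN* ToyPhaseValues::is_IterGVN() {
  return _iterGVN ? static_cast<ToyIterGVN*>(this) : nullptr;
}

// Caller pattern discussed in the review: fetch the result once into a local.
bool rehash_if_iterative(ToyPhaseValues* phase) {
  assert(phase != nullptr);
  ToyIterGVN* igvn = phase->is_IterGVN();
  if (igvn == nullptr) {
    return false;  // plain value-numbering phase: nothing to do
  }
  igvn->rehash_node_delayed();
  return true;
}
```

The price of this design is the extra field in the base class, which is presumably where the small `sizeof(PhaseValues)` increase mentioned above comes from.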
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Some cleanups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1385/files - new: https://git.openjdk.java.net/jdk/pull/1385/files/ffae8ad9..eaf81828 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1385&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1385&range=00-01 Stats: 12 lines in 2 files changed: 2 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/1385.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1385/head:pull/1385 PR: https://git.openjdk.java.net/jdk/pull/1385 From github.com+42899633+eastig at openjdk.java.net Tue Nov 24 10:19:59 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 24 Nov 2020 10:19:59 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com> References: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com> Message-ID: On Tue, 24 Nov 2020 10:08:37 GMT, Andrew Haley wrote: >> This patch fixes 27%-48% performance regressions of small arraycopies on Graviton2 (Neoverse N1) when UseSIMDForMemoryOps is enabled. For such copies ldpq/stpq are used instead of ld4/st4. >> This follows what the Arm Optimization Guide, including for Neoverse N1, recommends: Use discrete, non-writeback forms of load and store instructions while interleaving them. >> >> The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build and UseSIMDForMemoryOps enabled. > > I think we need also some non-Neoverse N1 numbers. We need to keep in mind that this software runs on many implementations. I'll have a look at some others. 
> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
>
> On 11/23/20 9:07 PM, Volker Simonis wrote:
>
> > Thanks for the detailed performance numbers.
> > Looks good to me.
>
> The benchmark is missing from the pull request. We can't do anything
> without that.
>
> --
> Andrew Haley (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

The microbenchmarks are the ArrayCopy* microbenchmarks, which are part of OpenJDK: https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/lang

-------------

PR: https://git.openjdk.java.net/jdk/pull/1293

From github.com+42899633+eastig at openjdk.java.net Tue Nov 24 10:42:58 2020
From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich)
Date: Tue, 24 Nov 2020 10:42:58 GMT
Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory
In-Reply-To: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com>
References: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com>
Message-ID: <4jfzYOCU4cvML5ezVxQHHOwsQdQsKQqYNlpERLSYEfg=.c4a4b7ad-d477-4b84-aa64-9c18ca248934@github.com>

On Tue, 24 Nov 2020 10:08:37 GMT, Andrew Haley wrote:

> I think we need also some non-Neoverse N1 numbers. We need to keep in mind that this software runs on many implementations.

For all modern Cortex-A cores, ldpq is either faster than or the same as ld4; e.g., see the calculation for Cortex-A72 above.
I cannot find any optimization guides for Ampere eMAG, ThunderX/ThunderX2 and HiSilicon TSV110 to check what latencies and throughput ld4/ldpq have on them. I would appreciate it if someone could help with this. I don't expect non-Cortex implementations to differ much from Cortex. The main issue with ld4 is its low throughput.
The intent of ld4 as I understand it is to load data and to process it after that. > I'll have a look at some others. Could you please share more information what CPUs you will check? ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From shade at openjdk.java.net Tue Nov 24 11:05:58 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 11:05:58 GMT Subject: RFR: 8256857: ARM32 builds broken after JDK-8254231 [v4] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 09:30:30 GMT, Thomas Stuefe wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Fix the runtime error as well: stub out _linkToNative >> - Merge branch 'master' into JDK-8256857-arm-foreign >> - Merge branch 'master' into JDK-8256857-arm-foreign >> - Use nullptr instead >> - Add debug.hpp include as well >> - 8256857: ARM32 builds broken after JDK-8254231 >> - 8256857: ARM32 builds broken after JDK-8254231 > > LGTM Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/1383 From shade at openjdk.java.net Tue Nov 24 11:06:01 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 11:06:01 GMT Subject: Integrated: 8256857: ARM32 builds broken after JDK-8254231 In-Reply-To: References: Message-ID: <32E_JyV0mJ5DOpUgwMEDYYb-NK3HseBl0xumC9gZe30=.a20fecfc-4d73-49c7-9daa-b6015ffdf689@github.com> On Mon, 23 Nov 2020 13:14:06 GMT, Aleksey Shipilev wrote: > Foreign linker broke ARM32 builds. This change stubs out the new entry points, without implementing the actual support yet. > > Additional testing: > - [x] Linux arm fastdebug cross-compilation This pull request has now been integrated. 
Changeset: 8f7caa43 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/8f7caa43 Stats: 50 lines in 8 files changed: 49 ins; 0 del; 1 mod 8256857: ARM32 builds broken after JDK-8254231 Reviewed-by: jvernee, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/1383 From github.com+10482586+erik1iu at openjdk.java.net Tue Nov 24 11:16:13 2020 From: github.com+10482586+erik1iu at openjdk.java.net (Eric Liu) Date: Tue, 24 Nov 2020 11:16:13 GMT Subject: RFR: 8256387: Unexpected result if patching an entire instruction on AArch64 [v2] In-Reply-To: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> References: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> Message-ID: <5_KlTlo-EutwKdU91sXNus-fuxJI9tvI3ThAig1rIe8=.67988a5e-4b7a-4a1f-b7a7-10f813690ba0@github.com> > This patch fixed some potential risks in assembler_aarch64.hpp. > > According to the C standard, shift operation is undefined if the shift > count greater than or equals to the length in bits of the promoted left > operand. > > In assembler_aarch64.hpp, there are some utility functions for easily > operating the encoded instructions. E.g. > > Instruction_aarch64::patch(address, int, int, uint64_t) > > All those functions use `(1U << nbits) - 1` to calculate mask which may > have some potential risks if `nbits` equals 32. That would be an > unexpected result if someone intends to deal with an entire instruction. > > To fix this issue, this patch simply uses `1ULL` to replace `1U`. Eric Liu has updated the pull request incrementally with one additional commit since the last revision: uses pre-defined macro `right_n_bits` to get the right-most bits set. 
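To illustrate the shift problem described in the message above, here is a minimal sketch in standard C++. The names `low_bits_mask` and `patch_field` are hypothetical and are not the HotSpot functions; the sketch only assumes a 32-bit instruction word and shows why a 32-bit `1U` cannot safely be shifted by 32 while a 64-bit one can. How the `right_n_bits` macro itself is defined is not shown in this thread, so it is not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not HotSpot code): a mask with the low `nbits` bits set.
// `(1U << nbits) - 1` is undefined for nbits == 32 because the shift count
// equals the width of the 32-bit left operand. A 64-bit one keeps the shift
// well defined for every nbits in 1..32.
inline uint32_t low_bits_mask(unsigned nbits) {
  assert(nbits >= 1 && nbits <= 32);
  return static_cast<uint32_t>((UINT64_C(1) << nbits) - 1);
}

// Hypothetical patch routine: replace bits [msb:lsb] of a 32-bit instruction
// word with `val`. With msb == 31 and lsb == 0 the whole word is replaced.
inline uint32_t patch_field(uint32_t insn, int msb, int lsb, uint32_t val) {
  assert(0 <= lsb && lsb <= msb && msb <= 31);
  unsigned nbits = static_cast<unsigned>(msb - lsb + 1);
  uint32_t mask = low_bits_mask(nbits);
  assert((val & ~mask) == 0);  // the new value must fit in the field
  return (insn & ~(mask << lsb)) | (val << lsb);
}
```

Patching the full range `[31:0]` is the case that the message calls "an entire instruction"; in this sketch it simply returns `val`.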
Change-Id: I456bcc883434b04527db912adaccc6a5f2dd96a0 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1248/files - new: https://git.openjdk.java.net/jdk/pull/1248/files/08ce2fba..29ed5d66 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1248&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1248&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/1248.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1248/head:pull/1248 PR: https://git.openjdk.java.net/jdk/pull/1248 From eosterlund at openjdk.java.net Tue Nov 24 11:25:57 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Tue, 24 Nov 2020 11:25:57 GMT Subject: RFR: 8256883: C2: Add a RegMask iterator [v4] In-Reply-To: References: Message-ID: <-LkPyiTdVTc_ycL_ZD8Tx4MBuVTqQ3Q6oI5ftzfMW04=.8ad48725-c0c4-4a49-91d3-40940c6c855e@github.com> On Tue, 24 Nov 2020 00:14:11 GMT, Claes Redestad wrote: >> By implementing a simple RegMaskIterator we can speed this up and possibly make the code a bit clearer by doing so. >> >> As a data point, this reduces the `C2Compiler::initialize` overhead from 8.82M instructions to 8.58M instructions, from the improvement in `PhaseChaitin::post_allocate_copy_removal` (~16k insns/compilation). The gain varies with type of compilation, so on the naive tests in `SimpleRepeatCompilation` it's in the noise (~2k insns/compilation on `trivialMath`, for example). > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Remove extra commentary in ZGC aarch code (requested by Per Liden) Looks good. ------------- Marked as reviewed by eosterlund (Reviewer).
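As a rough illustration of what an iterator over a register mask can look like, the sketch below walks only the set bits of a single 64-bit word instead of probing every bit position. It is not the `RegMaskIterator` from the patch: the class name `BitIterator`, the single-word layout, and the helper `collect_set_bits` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical iterator over the set bits of one 64-bit mask word.
class BitIterator {
  uint64_t _bits;  // bits that have not been visited yet
 public:
  explicit BitIterator(uint64_t bits) : _bits(bits) {}

  bool has_next() const { return _bits != 0; }

  // Returns the index of the lowest remaining set bit and removes it.
  unsigned next() {
    assert(has_next());
    unsigned idx = 0;
    while (((_bits >> idx) & 1) == 0) {
      idx++;  // portable scan; a count-trailing-zeros intrinsic could replace this loop
    }
    _bits &= _bits - 1;  // clear the lowest set bit
    return idx;
  }
};

// Usage example: gather the indices of all set bits in ascending order.
inline std::vector<unsigned> collect_set_bits(uint64_t bits) {
  std::vector<unsigned> result;
  for (BitIterator it(bits); it.has_next();) {
    result.push_back(it.next());
  }
  return result;
}
```

The loop body runs once per set bit, so sparse masks are traversed without testing the positions that are known to be empty.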
PR: https://git.openjdk.java.net/jdk/pull/1397 From aph at redhat.com Tue Nov 24 13:08:15 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 24 Nov 2020 13:08:15 +0000 Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com> Message-ID: On 24/11/2020 10:19, Evgeny Astigeevich wrote: > The microbenchmarks are ArrayCopy* microbenchmarks which are a part of OpenJDK: https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/lang Sorry, my mistake. I'll try this now. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at openjdk.java.net Tue Nov 24 13:11:01 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 24 Nov 2020 13:11:01 GMT Subject: RFR: 8256387: Unexpected result if patching an entire instruction on AArch64 [v2] In-Reply-To: <5_KlTlo-EutwKdU91sXNus-fuxJI9tvI3ThAig1rIe8=.67988a5e-4b7a-4a1f-b7a7-10f813690ba0@github.com> References: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> <5_KlTlo-EutwKdU91sXNus-fuxJI9tvI3ThAig1rIe8=.67988a5e-4b7a-4a1f-b7a7-10f813690ba0@github.com> Message-ID: On Tue, 24 Nov 2020 11:16:13 GMT, Eric Liu wrote: >> This patch fixed some potential risks in assembler_aarch64.hpp. >> >> According to the C standard, shift operation is undefined if the shift >> count greater than or equals to the length in bits of the promoted left >> operand. >> >> In assembler_aarch64.hpp, there are some utility functions for easily >> operating the encoded instructions. E.g. >> >> Instruction_aarch64::patch(address, int, int, uint64_t) >> >> All those functions use `(1U << nbits) - 1` to calculate mask which may >> have some potential risks if `nbits` equals 32. 
That would be an >> unexpected result if someone intends to deal with an entire instruction. >> >> To fix this issue, this patch simply uses `1ULL` to replace `1U`. > > Eric Liu has updated the pull request incrementally with one additional commit since the last revision: > > uses pre-defined macro `right_n_bits` to get the right-most bits set. > > Change-Id: I456bcc883434b04527db912adaccc6a5f2dd96a0 Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1248 From chagedorn at openjdk.java.net Tue Nov 24 13:11:07 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 24 Nov 2020 13:11:07 GMT Subject: RFR: 8256016: Dacapo24H.java failed with "assert(false) failed: unscheduable graph" Message-ID: In the replay compilation, a `CastPP` node belonging to a null-check `If` node floats above its null-check and is then split through a phi. This results in a broken control input for the `CastPP` node which eventually makes the graph unschedulable. The problem can be traced back to the optimization done in `IfNode::simple_subsuming()`: ![missing_data_dependencies](https://user-images.githubusercontent.com/17833009/100085013-cce64500-2e4b-11eb-8773-1364577da8f2.png) The dominating test `8792 If` subsumes the original null-check `6691 If` and its condition is replaced by the constant `230 ConI`. However, we forget to rewire data dependencies, in this case `6694 CastPP`, to the `8794 IfTrue` projection of the dominating test. This is a problem because `6691 If` is not yet removed by igvn but we already apply the Ideal optimization for the newly created `8792 If` node. We find that `8800 If` is an identical test that dominates `8792 If`. All control dependent data nodes are updated to the dominating `8802 IfTrue` projection and the dominated test `8792 If` is removed. 
But in this step, we do not find the `6694 CastPP` at `8794 IfTrue` because we have not updated its control input and `6691 If` was not yet removed such that it could have ended up at `8794 IfTrue`. The fix is to rewire any data dependencies of the always taken projection of the subsumed test in `IfNode::simple_subsuming()` to the corresponding projection of the dominating test. ------------- Commit messages: - 8256016: Dacapo24H.java failed with "assert(false) failed: unscheduable graph" Changes: https://git.openjdk.java.net/jdk/pull/1410/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1410&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256016 Stats: 17 lines in 1 file changed: 16 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1410.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1410/head:pull/1410 PR: https://git.openjdk.java.net/jdk/pull/1410 From neliasso at openjdk.java.net Tue Nov 24 13:22:12 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 24 Nov 2020 13:22:12 GMT Subject: RFR: 8256508: Improve CompileCommand flag [v2] In-Reply-To: References: Message-ID: > The current implementation of compile command has two types of options. Types pre-defined options like "compileonly" and the general 'option' type. > > 'option'-type are not defined, they can accidentally be used with the wrong value type, and the syntax is prone to error. > > By pre-defining all compile commands used and giving them types the parsing can be simplified, proper parsing errors can be given and a reasonable syntax can be used. > > This: > -XX:CompileCommand=option,java/util/String.toString,int,RepeatCompilation,5 > > Is superseded by: > -XX:CompileCommand=RepeatCompilation,java/util/String.toString,5 > > Attention check: Did you spot the error in the old command? > > In order not to break anything - the old syntax is kept for now.
But even the old command format is improved with verification for the option name and the type of the value. Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Exclude option is handled in compilecommand_compatibility_init ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1276/files - new: https://git.openjdk.java.net/jdk/pull/1276/files/92eec9d6..8f97b57b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1276&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1276&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1276.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1276/head:pull/1276 PR: https://git.openjdk.java.net/jdk/pull/1276 From neliasso at openjdk.java.net Tue Nov 24 13:25:01 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 24 Nov 2020 13:25:01 GMT Subject: RFR: 8256508: Improve CompileCommand flag [v2] In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 12:32:41 GMT, Claes Redestad wrote: >> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Exclude option is handled in compilecommand_compatibility_init > > A good usability improvement! > > I've messed up some minor details in these commands in one way or another more times than I've gotten things right on the first try and been left wondering why my commands have no effect. This should help avoid a significant portion of easy mistakes. > > I think the error in the old compile command is that int needs to be intx (and of course there's no java/util/String)? @cl4es @vnkozlov I did a very minor update to compilerDirectives.hpp. The Exclude is already handled separately in init_compilecommand_compatibility. This is back to how it was before this PR. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1276 From github.com+42899633+eastig at openjdk.java.net Tue Nov 24 13:39:57 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 24 Nov 2020 13:39:57 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: <4jfzYOCU4cvML5ezVxQHHOwsQdQsKQqYNlpERLSYEfg=.c4a4b7ad-d477-4b84-aa64-9c18ca248934@github.com> References: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com> <4jfzYOCU4cvML5ezVxQHHOwsQdQsKQqYNlpERLSYEfg=.c4a4b7ad-d477-4b84-aa64-9c18ca248934@github.com> Message-ID: On Tue, 24 Nov 2020 10:40:35 GMT, Evgeny Astigeevich wrote: >> I think we need also some non-Neoverse N1 numbers. We need to keep in mind that this software runs on many implementations. I'll have a look at some others. > >> I think we need also some non-Neoverse N1 numbers. We need to keep in mind that this software runs on many implementations. > > For all modern Cortex-A ldpq is either faster or the same as ld4, e.g see calculation for Cortex-A72 above. I cannot find any optimizations guides for Ampere eMAG, ThunderX/ThunderX2 and HiSilicon TSV110 to check what latencies and throughput ld4/ldpq have on them. I appreciate if someone helps with this. I don't expect non-Cortex implementations differ much from Cortex. > The main issue with ld4 is its low throughput. The intent of ld4 as I understand it is to load data and to process it after that. > >> I'll have a look at some others. > > Could you please share more information what CPUs you will check? 
> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_ > > On 24/11/2020 10:19, Evgeny Astigeevich wrote: > > > The microbenchmarks are ArrayCopy* microbenchmarks which are a part of OpenJDK: https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/lang > > Sorry, my mistake. I'll try this now. > Not a problem. I am new to GitHub reviewing process and the OpenJDK project. I am still learning things. Let me know if I need to run any additional benchmarks. ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From thartmann at openjdk.java.net Tue Nov 24 13:44:09 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 24 Nov 2020 13:44:09 GMT Subject: RFR: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" [v3] In-Reply-To: References: Message-ID: > The ideal transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) converts `RotateLeftNode(val, shift)` into `RotateRightNode(val, 32/64 - (shift & 31/63))`. If `shift` later becomes zero, we end up trying to emit a rotate with a 32/64 shift value which triggers an assert. > > I've added an identity transformation similar to what is implemented for ShiftNodes that takes care of this. I've also noticed that the corresponding assert is not strong enough for `roll` and `rorl` (probably the author used the assert corresponding to the 64-bit version by accident). The patch also includes some refactoring. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Style cleanup in mulnode.cpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1384/files - new: https://git.openjdk.java.net/jdk/pull/1384/files/9c7234b0..a207039e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1384&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1384&range=01-02 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1384.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1384/head:pull/1384 PR: https://git.openjdk.java.net/jdk/pull/1384 From thartmann at openjdk.java.net Tue Nov 24 13:44:10 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 24 Nov 2020 13:44:10 GMT Subject: RFR: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" [v2] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 08:28:55 GMT, Christian Hagedorn wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactored getShiftCon method > > Looks good to me! Thanks Christian, I've updated the code accordingly. ------------- PR: https://git.openjdk.java.net/jdk/pull/1384 From aph at redhat.com Tue Nov 24 13:49:52 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 24 Nov 2020 13:49:52 +0000 Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com> <4jfzYOCU4cvML5ezVxQHHOwsQdQsKQqYNlpERLSYEfg=.c4a4b7ad-d477-4b84-aa64-9c18ca248934@github.com> Message-ID: On 24/11/2020 13:39, Evgeny Astigeevich wrote: > I am new to GitHub reviewing process and the OpenJDK project. I am still learning things. 
> Let me know if I need to run any additional benchmarks. Test output in CSV form would be nice: it's very hard to read the test results you provided, and CSV can make nice graphs. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Nov 24 13:49:54 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 24 Nov 2020 13:49:54 +0000 Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com> Message-ID: <01b02efa-5e85-5dc1-4d4f-66dc754b511f@redhat.com> On 24/11/2020 13:08, Andrew Haley wrote: > On 24/11/2020 10:19, Evgeny Astigeevich wrote: >> The microbenchmarks are ArrayCopy* microbenchmarks which are a part of OpenJDK: https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/lang > > Sorry, my mistake. I'll try this now. Trying on ThunderX2, it seems that the differences between ldpq/stpq and ld4/st4 are too small to be statistically significant before and after this patch. For that reason, I don't object. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From chagedorn at openjdk.java.net Tue Nov 24 14:05:01 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 24 Nov 2020 14:05:01 GMT Subject: RFR: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" [v3] In-Reply-To: References: Message-ID: <_Yf_R2HdxZmebr0DMQNuW6vF03JdckVp3_uhpLj5XQ0=.8dfbc3c1-b690-44cf-b55d-4309448ee3f6@github.com> On Tue, 24 Nov 2020 13:44:09 GMT, Tobias Hartmann wrote: >> The ideal transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) converts `RotateLeftNode(val, shift)` into `RotateRightNode(val, 32/64 - (shift & 31/63))`. If `shift` later becomes zero, we end up trying to emit a rotate with a 32/64 shift value which triggers an assert. >> >> I've added an identity transformation similar to what is implemented for ShiftNodes that takes care of this. I've also noticed that the corresponding assert is not strong enough for `roll` and `rorl` (probably the author used the assert corresponding to the 64-bit version by accident). The patch also includes some refactoring. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Style cleanup in mulnode.cpp Thanks for cleaning this up! ------------- Marked as reviewed by chagedorn (Reviewer). 
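The rotate count problem quoted above can be reproduced with plain integer arithmetic. The sketch below is not C2 code: `rotl32`, `rotr32`, `naive_right_count` and `rotl_via_rotr` are hypothetical helpers for the 32-bit case, and the identity check only models, at the arithmetic level, the idea of treating a rotate by zero as a no-op.

```cpp
#include <cassert>
#include <cstdint>

// Rotate helpers that, like an immediate-encoded rotate, only accept 0..31.
inline uint32_t rotr32(uint32_t v, unsigned count) {
  assert(count < 32);
  return count == 0 ? v : (v >> count) | (v << (32 - count));
}

inline uint32_t rotl32(uint32_t v, unsigned count) {
  assert(count < 32);
  return count == 0 ? v : (v << count) | (v >> (32 - count));
}

// The converted count from the quoted transformation. For any shift that is a
// multiple of 32 (including 0) this yields 32, which is out of range.
inline unsigned naive_right_count(unsigned shift) {
  return 32 - (shift & 31);
}

// Rotate left expressed as rotate right, with a rotate by zero handled as the
// identity before the out-of-range count can be produced.
inline uint32_t rotl_via_rotr(uint32_t v, unsigned shift) {
  if ((shift & 31) == 0) {
    return v;  // identity: nothing to rotate
  }
  return rotr32(v, naive_right_count(shift));
}
```

For every non-zero masked shift the converted count lies in 1..31, so only the zero case needs the separate path.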
PR: https://git.openjdk.java.net/jdk/pull/1384 From github.com+42899633+eastig at openjdk.java.net Tue Nov 24 14:14:58 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 24 Nov 2020 14:14:58 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com> <4jfzYOCU4cvML5ezVxQHHOwsQdQsKQqYNlpERLSYEfg=.c4a4b7ad-d477-4b84-aa64-9c18ca248934@github.com> Message-ID: On Tue, 24 Nov 2020 13:37:05 GMT, Evgeny Astigeevich wrote: >>> I think we need also some non-Neoverse N1 numbers. We need to keep in mind that this software runs on many implementations. >> >> For all modern Cortex-A ldpq is either faster or the same as ld4, e.g see calculation for Cortex-A72 above. I cannot find any optimizations guides for Ampere eMAG, ThunderX/ThunderX2 and HiSilicon TSV110 to check what latencies and throughput ld4/ldpq have on them. I appreciate if someone helps with this. I don't expect non-Cortex implementations differ much from Cortex. >> The main issue with ld4 is its low throughput. The intent of ld4 as I understand it is to load data and to process it after that. >> >>> I'll have a look at some others. >> >> Could you please share more information what CPUs you will check? > >> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_ >> >> On 24/11/2020 10:19, Evgeny Astigeevich wrote: >> >> > The microbenchmarks are ArrayCopy* microbenchmarks which are a part of OpenJDK: https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/lang >> >> Sorry, my mistake. I'll try this now. >> > > Not a problem. I am new to GitHub reviewing process and the OpenJDK project. I am still learning things. > Let me know if I need to run any additional benchmarks. 
> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_ > > On 24/11/2020 13:39, Evgeny Astigeevich wrote: > > > I am new to GitHub reviewing process and the OpenJDK project. I am still learning things. > > Let me know if I need to run any additional benchmarks. > > Test output in CSV form would be nice: it's very hard to read the test > results you provided, and CSV can make nice graphs. Thank you for the feedback. It helped me find out how files can be attached to a PR. Usually you look for a clip on a panel. Here it is a little bit unusual. :) ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From thartmann at openjdk.java.net Tue Nov 24 14:23:07 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 24 Nov 2020 14:23:07 GMT Subject: RFR: 8256858: C2: Devirtualize PhaseIterGVN-specific methods [v2] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 10:11:14 GMT, Claes Redestad wrote: >> PhaseValues defines the virtual method is_IterGVN, which is trivially returning 0(!) for all types except those derived from PhaseIterGVN. Similarly there's igvn_rehash_node_delayed which is virtual and a no-op in base types, but implemented to call rehash_node_delayed in PhaseIterGVN. >> >> By devirtualizing we allow for more aggressive inlining and slightly better code generation in a few places. >> >> This increases sizeof(PhaseValues) from 2480 to 2488 on x64. Since we only go through a limited number of phases per compilation this seems acceptable. > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Some cleanups Marked as reviewed by thartmann (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/1385 From shade at openjdk.java.net Tue Nov 24 14:25:02 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 14:25:02 GMT Subject: RFR: 8256387: Unexpected result if patching an entire instruction on AArch64 [v2] In-Reply-To: References: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> <5_KlTlo-EutwKdU91sXNus-fuxJI9tvI3ThAig1rIe8=.67988a5e-4b7a-4a1f-b7a7-10f813690ba0@github.com> Message-ID: On Tue, 24 Nov 2020 13:07:44 GMT, Andrew Haley wrote: >> Eric Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> uses pre-defined macro `right_n_bits` to get the right-most bits set. >> >> Change-Id: I456bcc883434b04527db912adaccc6a5f2dd96a0 > > Marked as reviewed by aph (Reviewer). I am running `tier1` on `aarch64` here, will sponsor once tests complete. ------------- PR: https://git.openjdk.java.net/jdk/pull/1248 From neliasso at openjdk.java.net Tue Nov 24 14:34:11 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 24 Nov 2020 14:34:11 GMT Subject: RFR: 8256508: Improve CompileCommand flag [v3] In-Reply-To: References: Message-ID: > The current implementation of compile command has two types of options. Types pre-defined options like "compileonly" and the general 'option' type. > > 'option'-type are not defined, they can accidentally be used with the wrong value type, and the syntax is prone to error. > > By pre-defining all compile commands used and giving them types the parsing can be simplified, proper parsing errors can be given and a reasonable syntax can be used. > > This: > -XX:CompileCommand=option,java/util/String.toString,int,RepeatCompilation,5 > > Is superseded by: > -XX:CompileCommand=RepeatCompilation,java/util/String.toString,5 > > Attention check: Did you spot the error in the old command? > > In order not to break anything - the old syntax is kept for now.
But even the old command format is improved with verification for the option name and the type of the value. Nils Eliasson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - Merge branch 'master' into improve_compile_command - Exclude option is handled in compilecommand_compatibility_init - Make tests debug only - Fix typos - remove typo - Fix CompilerConfigFileWarning test - Fix test - Merge branch 'master' of https://github.com/openjdk/jdk into improve_compile_command - Fixed messages and help text - Clean up error reporting - ... and 7 more: https://git.openjdk.java.net/jdk/compare/7b3d0958...1995972f ------------- Changes: https://git.openjdk.java.net/jdk/pull/1276/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1276&range=02 Stats: 939 lines in 22 files changed: 464 ins; 147 del; 328 mod Patch: https://git.openjdk.java.net/jdk/pull/1276.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1276/head:pull/1276 PR: https://git.openjdk.java.net/jdk/pull/1276 From redestad at openjdk.java.net Tue Nov 24 14:37:01 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 24 Nov 2020 14:37:01 GMT Subject: RFR: 8256858: C2: Devirtualize PhaseIterGVN-specific methods [v2] In-Reply-To: References: Message-ID: <5DSqtUmgu3Lr6oTAH_eSjIoHxMv1eN-rBCEinS2h3dU=.fdafc092-d0b5-455e-8270-6822368b74eb@github.com> On Mon, 23 Nov 2020 19:52:58 GMT, Vladimir Kozlov wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Some cleanups > > Good. @vnkozlov @TobiHartmann - thank you for reviewing! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1385 From redestad at openjdk.java.net Tue Nov 24 14:37:02 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 24 Nov 2020 14:37:02 GMT Subject: Integrated: 8256858: C2: Devirtualize PhaseIterGVN-specific methods In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 13:48:21 GMT, Claes Redestad wrote: > PhaseValues define the virtual method is_IterGVN, which is trivially returning 0(!) for all types except those derived from PhaseIterGVN. Similarly there's igvn_rehash_node_delayed which is virtual and a no-op in base types, but implemented to call rehash_node_delayed in PhaseIterGVN. > > By devirtualizing we allow for more aggressive inlining and slightly better code generation in a few places. > > This increases sizeof(PhaseValues) from 2480 to 2488 on x64. Since we only go through a limited number of phases per compilation this seems acceptable. This pull request has now been integrated. Changeset: f55ae959 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/f55ae959 Stats: 56 lines in 5 files changed: 12 ins; 10 del; 34 mod 8256858: C2: Devirtualize PhaseIterGVN-specific methods Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1385 From redestad at openjdk.java.net Tue Nov 24 14:39:03 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 24 Nov 2020 14:39:03 GMT Subject: RFR: 8256508: Improve CompileCommand flag [v3] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 14:34:11 GMT, Nils Eliasson wrote: >> The current implementation of compile command has two types of options. Types pre-defined options like "compileonly" and the general 'option' type. >> >> 'option'-type are not defined, they can accidentally be used with the wrong value type, and the syntax is prone to error. 
>> >> By pre-defining all compile commands used and giving them types the parsing can be simplified, proper parsing errors can be given and a reasonable syntax can be used. >> >> This: >> -XX:CompileCommand=option,java/util/String.toString,int,RepeatCompilation,5 >> >> Is superseded by: >> -XX:CompileCommand=RepeatCompilation,java/util/String.toString,5 >> >> Attention check: Did you spot the error in the old command? >> >> In order not to break anything - the old syntax is kept for now. But even the old command format is improved with verification for the option name and the type of the value. > > Nils Eliasson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into improve_compile_command > - Exclude option is handled in compilecommand_compatibility_init > - Make tests debug only > - Fix typos > - remove typo > - Fix CompilerConfigFileWarning test > - Fix test > - Merge branch 'master' of https://github.com/openjdk/jdk into improve_compile_command > - Fixed messages and help text > - Clean up error reporting > - ... and 7 more: https://git.openjdk.java.net/jdk/compare/7b3d0958...1995972f Marked as reviewed by redestad (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1276 From redestad at openjdk.java.net Tue Nov 24 14:39:01 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 24 Nov 2020 14:39:01 GMT Subject: RFR: 8256883: C2: Add a RegMask iterator [v3] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 00:07:46 GMT, Vladimir Kozlov wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename iterator variables in ZGC code > > Good. @vnkozlov @pliden @fisk - thank you for reviewing! I ran ZGC tier1-7 testing on linux-x64 and linux-aarch64 overnight, all passed.
------------- PR: https://git.openjdk.java.net/jdk/pull/1397 From redestad at openjdk.java.net Tue Nov 24 14:39:03 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 24 Nov 2020 14:39:03 GMT Subject: Integrated: 8256883: C2: Add a RegMask iterator In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 18:01:01 GMT, Claes Redestad wrote: > By implementing a simple RegMaskIterator we can speed this up and possibly make the code a bit clearer by doing so. > > As a data point, this reduces the `C2Compiler::initialize` overhead from 8.82M instructions to 8.58M instructions, from the improvement in `PhaseChaitin::post_allocate_copy_removal` (~16k insns/compilation). The gain varies with type of compilation, so on the naive tests in `SimpleRepeatCompilation` it's in the noise (~2k insns/compilation on `trivialMath`, for example). This pull request has now been integrated. Changeset: fa3cfcd0 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/fa3cfcd0 Stats: 93 lines in 6 files changed: 48 ins; 15 del; 30 mod 8256883: C2: Add a RegMask iterator Reviewed-by: kvn, pliden, eosterlund ------------- PR: https://git.openjdk.java.net/jdk/pull/1397 From mdoerr at openjdk.java.net Tue Nov 24 14:52:10 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 24 Nov 2020 14:52:10 GMT Subject: RFR: 8256924: ppc, ppcle, s390: JVM crashes at VM init after JDK-8254231 Message-ID: Some parts of methodHandles_.cpp and sharedRuntime_.cpp are missing. Note that C2 compiler still has issues on PPC64.
------------- Commit messages: - 8256924: ppc, ppcle, s390: JVM crashes at VM init after JDK-8254231 Changes: https://git.openjdk.java.net/jdk/pull/1411/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1411&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256924 Stats: 25 lines in 4 files changed: 21 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1411.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1411/head:pull/1411 PR: https://git.openjdk.java.net/jdk/pull/1411 From thartmann at openjdk.java.net Tue Nov 24 15:21:05 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 24 Nov 2020 15:21:05 GMT Subject: RFR: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" [v4] In-Reply-To: References: Message-ID: > The ideal transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) converts `RotateLeftNode(val, shift)` into `RotateRightNode(val, 32/64 - (shift & 31/63))`. If `shift` later becomes zero, we end up trying to emit a rotate with a 32/64 shift value which triggers an assert. > > I've added an identity transformation similar to what is implemented for ShiftNodes that takes care of this. I've also noticed that the corresponding assert is not strong enough for `roll` and `rorl` (probably the author used the assert corresponding to the 64-bit version by accident). The patch also includes some refactoring. > > Thanks, > Tobias Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Merged with master - Style cleanup in mulnode.cpp - Refactored getShiftCon method - 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" ------------- Changes: https://git.openjdk.java.net/jdk/pull/1384/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1384&range=03 Stats: 185 lines in 5 files changed: 120 ins; 5 del; 60 mod Patch: https://git.openjdk.java.net/jdk/pull/1384.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1384/head:pull/1384 PR: https://git.openjdk.java.net/jdk/pull/1384 From jiefu at openjdk.java.net Tue Nov 24 15:37:05 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 24 Nov 2020 15:37:05 GMT Subject: RFR: 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 Message-ID: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> Hi all, The bug was found while I was learning @iwanowww 's patch [1]. As we know, the slot-size is 32-bit and the registers are 64-bit on AMD64. So RegisterImpl::max_slots_per_register [2] should be 2 on AMD64. It would be better to fix it. Testing: - tier1~3 on Linux/x64 Thanks. 
Best regards, Jie [1] https://github.com/openjdk/jdk/pull/1132 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/register_x86.hpp#L53 ------------- Commit messages: - 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 Changes: https://git.openjdk.java.net/jdk/pull/1413/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1413&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256956 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1413.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1413/head:pull/1413 PR: https://git.openjdk.java.net/jdk/pull/1413 From shade at openjdk.java.net Tue Nov 24 15:55:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 15:55:57 GMT Subject: RFR: 8256924: ppc, ppcle, s390: JVM crashes at VM init after JDK-8254231 In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 14:46:07 GMT, Martin Doerr wrote: > Some parts of methodHandles_.cpp and sharedRuntime_.cpp are missing. VM is crashing because vmIntrinsics::_invokeBasic is not yet handled. > Note that C2 compiler still has issues on PPC64. This looks okay to me. Consider fixing a minor nit. src/hotspot/cpu/ppc/methodHandles_ppc.cpp line 319: > 317: if (iid == vmIntrinsics::_linkToNative) { > 318: assert(for_compiler_entry, "only compiler entry is supported"); > 319: } I believe it is customary to write conditional asserts like: assert(for_compiler_entry || iid != vmIntrinsics::_linkToNative, "only compiler entry is supported for linkToNative") ...same thing for S390. ------------- Marked as reviewed by shade (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1411 From shade at openjdk.java.net Tue Nov 24 15:59:58 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 15:59:58 GMT Subject: RFR: 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 In-Reply-To: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> References: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> Message-ID: On Tue, 24 Nov 2020 15:31:39 GMT, Jie Fu wrote: > Hi all, > > The bug was found while I was learning @iwanowww 's patch [1]. > > As we know, the slot-size is 32-bit and the registers are 64-bit on AMD64. > So RegisterImpl::max_slots_per_register [2] should be 2 on AMD64. > > It would be better to fix it. > > Testing: > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/pull/1132 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/register_x86.hpp#L53 Please wait for #1266 integration to get this tested on x86_32. (I think current patch does not affect it directly, but better be safe than sorry). ------------- PR: https://git.openjdk.java.net/jdk/pull/1413 From mdoerr at openjdk.java.net Tue Nov 24 16:05:57 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 24 Nov 2020 16:05:57 GMT Subject: RFR: 8256924: ppc, ppcle, s390: JVM crashes at VM init after JDK-8254231 In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 15:52:44 GMT, Aleksey Shipilev wrote: >> Some parts of methodHandles_.cpp and sharedRuntime_.cpp are missing. VM is crashing because vmIntrinsics::_invokeBasic is not yet handled. >> Note that C2 compiler still has issues on PPC64. 
> > src/hotspot/cpu/ppc/methodHandles_ppc.cpp line 319: > >> 317: if (iid == vmIntrinsics::_linkToNative) { >> 318: assert(for_compiler_entry, "only compiler entry is supported"); >> 319: } > > I believe it is customary to write conditional asserts like: > assert(for_compiler_entry || iid != vmIntrinsics::_linkToNative, "only compiler entry is supported for linkToNative") > > ...same thing for S390. Thanks for reviewing it! I've copied the code from x86 and I'd prefer to keep it consistent even though I like your version more. Are you ok with it? ------------- PR: https://git.openjdk.java.net/jdk/pull/1411 From shade at openjdk.java.net Tue Nov 24 16:14:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 16:14:57 GMT Subject: RFR: 8256924: ppc, ppcle, s390: JVM crashes at VM init after JDK-8254231 In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 16:03:33 GMT, Martin Doerr wrote: >> src/hotspot/cpu/ppc/methodHandles_ppc.cpp line 319: >> >>> 317: if (iid == vmIntrinsics::_linkToNative) { >>> 318: assert(for_compiler_entry, "only compiler entry is supported"); >>> 319: } >> >> I believe it is customary to write conditional asserts like: >> assert(for_compiler_entry || iid != vmIntrinsics::_linkToNative, "only compiler entry is supported for linkToNative") >> >> ...same thing for S390. > > Thanks for reviewing it! > I've copied the code from x86 and I'd prefer to keep it consistent even though I like your version more. Are you ok with it? Yeah, that's fine. I don't have a strong opinion here. 
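The two assert shapes discussed in this exchange are logically equivalent: `if (A) assert(B)` fails exactly when `A && !B`, and so does `assert(B || !A)`. A minimal standalone sketch of that equivalence follows. It assumes a stand-in enum and plain booleans in place of the HotSpot `vmIntrinsics` IDs; the names `Intrinsic`, `nested_form_ok`, `single_form_ok`, and `forms_agree` are hypothetical and not HotSpot code.

```cpp
#include <initializer_list>

// Hypothetical stand-in for the intrinsic IDs; not the HotSpot enum.
enum class Intrinsic { InvokeBasic, LinkToNative };

// Nested form: "if (iid == LinkToNative) { assert(for_compiler_entry); }".
// Returns true when that assert would pass (or would not be reached).
bool nested_form_ok(Intrinsic iid, bool for_compiler_entry) {
  if (iid == Intrinsic::LinkToNative) {
    return for_compiler_entry;
  }
  return true;
}

// Single-expression form:
// "assert(for_compiler_entry || iid != LinkToNative)".
bool single_form_ok(Intrinsic iid, bool for_compiler_entry) {
  return for_compiler_entry || iid != Intrinsic::LinkToNative;
}

// Exhaustively compare both forms over all four input combinations.
bool forms_agree() {
  for (Intrinsic iid : {Intrinsic::InvokeBasic, Intrinsic::LinkToNative}) {
    for (bool for_compiler_entry : {false, true}) {
      if (nested_form_ok(iid, for_compiler_entry) !=
          single_form_ok(iid, for_compiler_entry)) {
        return false;
      }
    }
  }
  return true;
}
```

Since both forms accept and reject the same inputs, the choice between them is purely stylistic, which matches the outcome of the review above.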
------------- PR: https://git.openjdk.java.net/jdk/pull/1411 From mdoerr at openjdk.java.net Tue Nov 24 16:27:54 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 24 Nov 2020 16:27:54 GMT Subject: Integrated: 8256924: ppc, ppcle, s390: JVM crashes at VM init after JDK-8254231 In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 14:46:07 GMT, Martin Doerr wrote: > Some parts of methodHandles_.cpp and sharedRuntime_.cpp are missing. VM is crashing because vmIntrinsics::_invokeBasic is not yet handled. > Note that C2 compiler still has issues on PPC64. This pull request has now been integrated. Changeset: 3b3e90ec Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/3b3e90ec Stats: 25 lines in 4 files changed: 21 ins; 0 del; 4 mod 8256924: ppc, ppcle, s390: JVM crashes at VM init after JDK-8254231 Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1411 From thartmann at openjdk.java.net Tue Nov 24 16:55:53 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 24 Nov 2020 16:55:53 GMT Subject: Integrated: 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 13:26:30 GMT, Tobias Hartmann wrote: > The ideal transformation added by [JDK-8254872](https://bugs.openjdk.java.net/browse/JDK-8254872) converts `RotateLeftNode(val, shift)` into `RotateRightNode(val, 32/64 - (shift & 31/63))`. If `shift` later becomes zero, we end up trying to emit a rotate with a 32/64 shift value which triggers an assert. > > I've added an identity transformation similar to what is implemented for ShiftNodes that takes care of this. I've also noticed that the corresponding assert is not strong enough for `roll` and `rorl` (probably the author used the assert corresponding to the 64-bit version by accident). The patch also includes some refactoring. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 1c4c99ea Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/1c4c99ea Stats: 185 lines in 5 files changed: 120 ins; 5 del; 60 mod 8256823: C2 compilation fails with "assert(isShiftCount(imm8 >> 1)) failed: illegal shift count" Reviewed-by: vlivanov, kvn, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/1384 From thartmann at openjdk.java.net Tue Nov 24 17:07:56 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 24 Nov 2020 17:07:56 GMT Subject: RFR: 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 In-Reply-To: References: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> Message-ID: On Tue, 24 Nov 2020 15:56:55 GMT, Aleksey Shipilev wrote: >> Hi all, >> >> The bug was found while I was learning @iwanowww 's patch [1]. >> >> As we know, the slot-size is 32-bit and the registers are 64-bit on AMD64. >> So RegisterImpl::max_slots_per_register [2] should be 2 on AMD64. >> >> It would be better to fix it. >> >> Testing: >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/pull/1132 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/register_x86.hpp#L53 > > Please wait for #1266 integration to get this tested on x86_32. (I think current patch does not affect it directly, but better be safe than sorry). Just wondering, is that value even used? I did a quick search and couldn't find any usages. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1413 From kvn at openjdk.java.net Tue Nov 24 17:10:00 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 24 Nov 2020 17:10:00 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v17] In-Reply-To: References: Message-ID: On Sun, 22 Nov 2020 21:04:56 GMT, Jatin Bhateja wrote:
>> Summary:
>>
>> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes.
>> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
>> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
>> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
>>
>> Performance Results:
>> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>> ArrayCopyPartialInlineSize : 32
>>
>> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
>> -- | -- | -- | -- | --
>> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
>> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
>> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
>> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
>> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
>> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
>> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
>> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
>> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
>> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
>> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
>> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
>> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
>> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
>> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
>> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
>> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
>> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
>> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
>> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
>> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
>> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
>> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
>> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
>> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
>> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
>> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
>> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
>> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
>> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
>> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
>> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
>> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
>> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
>> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
>> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
>> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
>> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
>> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
>> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
>> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
>> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
>> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
>> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
>> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
>> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
>> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
>> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
>> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
>> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
>> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
>> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
>> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
>> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
>> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
>> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
>> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
>> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
>> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
>> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
>> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
>> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
>> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
>> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
>> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
>> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
>> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
>> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
>> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
>> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
>> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
>> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
>> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
>> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
>> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
>> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
>> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
>> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>>
>> Detailed Reports:
>> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
>> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Removing special handling for constant length, GVN will remove dead stub blocks in case constant length is less than partial inline size.

tier1-tier4 passed without new failures

------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/302 From kvn at openjdk.java.net Tue Nov 24 17:17:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 24 Nov 2020 17:17:57 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v3] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 09:35:12 GMT, Aleksey Shipilev wrote: >> Reproduces like this: >> >> $ CONF=linux-x86-server-fastdebug make images run-test TEST=compiler/floatingpoint/NaNTest.java TEST_VM_OPTS="-XX:UseSSE=1" >> >> STDOUT: >> ### NanTest started >> Written and read back float values match >> 0x7F800001 0x7F800001 >> STDERR: >> java.lang.RuntimeException: Original and read back double values mismatch >> 0xFFF0000000000001 0xFFF8000000000001 >> >> at compiler.floatingpoint.NaNTest.testDouble(NaNTest.java:56) >> at compiler.floatingpoint.NaNTest.main(NaNTest.java:69) >> >> After reading through [JDK-8076373](https://bugs.openjdk.java.net/browse/JDK-8076373), I think the test cannot be expected to pass without SSE >= 2 for doubles, and SSE >= 1 for floats on x86_32. This change adds the platform and UseSSE sensing to test.
>> >> Additional testing: >> - [x] Affected test on Linux x86_32 with `-XX:UseSSE={0,1,2}` >> - [x] Affected test on Linux x86_64 with `-XX:UseSSE={0,1,2}` >> - [x] Affected test on Linux AArch64 > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Do not run the tests when it is known to fail, even for messages Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1187 From shade at openjdk.java.net Tue Nov 24 17:31:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 17:31:57 GMT Subject: RFR: 8256387: Unexpected result if patching an entire instruction on AArch64 [v2] In-Reply-To: <5_KlTlo-EutwKdU91sXNus-fuxJI9tvI3ThAig1rIe8=.67988a5e-4b7a-4a1f-b7a7-10f813690ba0@github.com> References: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> <5_KlTlo-EutwKdU91sXNus-fuxJI9tvI3ThAig1rIe8=.67988a5e-4b7a-4a1f-b7a7-10f813690ba0@github.com> Message-ID: <7sGE-rFdNBQ25sodIPHH9b2diEPNTFBthqVJrzw_64A=.ee3b3bb4-6099-4430-a108-ff47b2262c1e@github.com> On Tue, 24 Nov 2020 11:16:13 GMT, Eric Liu wrote: >> This patch fixes some potential risks in assembler_aarch64.hpp. >> >> According to the C standard, a shift operation is undefined if the shift >> count is greater than or equal to the length in bits of the promoted left >> operand. >> >> In assembler_aarch64.hpp, there are some utility functions for easily >> operating on the encoded instructions, e.g. >> >> Instruction_aarch64::patch(address, int, int, uint64_t) >> >> All those functions use `(1U << nbits) - 1` to calculate the mask, which is >> undefined if `nbits` equals 32. That would give an >> unexpected result if someone intends to deal with an entire instruction. >> >> To fix this issue, this patch simply uses `1ULL` to replace `1U`.
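The masking problem described in the quoted text can be illustrated outside HotSpot. The following is a minimal sketch, assuming hypothetical helper names (`low_bits_mask`, `patch_field`) that are not the `Instruction_aarch64` API: with a 32-bit `1U`, `nbits == 32` would shift by the full width of the operand, which is undefined; widening the constant to 64 bits keeps the shift defined for every `nbits` from 1 to 32.

```cpp
#include <cstdint>

// Mask with the low nbits set, valid for 1 <= nbits <= 32.
// The 64-bit constant keeps the shift count below the operand width
// even when nbits == 32 (1U << 32 would be undefined behaviour).
inline uint32_t low_bits_mask(unsigned nbits) {
  return static_cast<uint32_t>((uint64_t{1} << nbits) - 1);
}

// Replace bits [msb:lsb] of a 32-bit instruction word with val.
// Hypothetical helper for illustration only; assumes 0 <= lsb <= msb <= 31.
inline uint32_t patch_field(uint32_t insn, int msb, int lsb, uint32_t val) {
  unsigned nbits = static_cast<unsigned>(msb - lsb + 1);
  uint32_t mask = low_bits_mask(nbits);
  val &= mask;  // drop any bits that do not fit into the field
  return (insn & ~(mask << lsb)) | (val << lsb);
}
```

With this shape, patching the whole word (`msb == 31`, `lsb == 0`) uses a mask of `0xFFFFFFFF` and simply replaces the instruction, which is the case the original expression could not handle reliably.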
> > Eric Liu has updated the pull request incrementally with one additional commit since the last revision: > > uses pre-defined macro `right_n_bits` to get the right-most bits set. > > Change-Id: I456bcc883434b04527db912adaccc6a5f2dd96a0 Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1248 From github.com+10482586+erik1iu at openjdk.java.net Tue Nov 24 17:31:58 2020 From: github.com+10482586+erik1iu at openjdk.java.net (Eric Liu) Date: Tue, 24 Nov 2020 17:31:58 GMT Subject: Integrated: 8256387: Unexpected result if patching an entire instruction on AArch64 In-Reply-To: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> References: <2bCYsl_nX5Mza7YJn1_tPavBAGuBPIm7Pruvr2yi2LM=.ea838014-79ac-4255-b31d-c275f6863060@github.com> Message-ID: On Tue, 17 Nov 2020 06:14:37 GMT, Eric Liu wrote: > This patch fixes some potential risks in assembler_aarch64.hpp. > > According to the C standard, a shift operation is undefined if the shift > count is greater than or equal to the length in bits of the promoted left > operand. > > In assembler_aarch64.hpp, there are some utility functions for easily > operating on the encoded instructions, e.g. > > Instruction_aarch64::patch(address, int, int, uint64_t) > > All those functions use `(1U << nbits) - 1` to calculate the mask, which is > undefined if `nbits` equals 32. That would give an > unexpected result if someone intends to deal with an entire instruction. > > To fix this issue, this patch simply uses `1ULL` to replace `1U`. This pull request has now been integrated.
Changeset: f1d6e8db Author: Eric Liu Committer: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/f1d6e8db Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod 8256387: Unexpected result if patching an entire instruction on AArch64 Reviewed-by: shade, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/1248 From kvn at openjdk.java.net Tue Nov 24 17:35:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 24 Nov 2020 17:35:58 GMT Subject: RFR: 8256016: Dacapo24H.java failed with "assert(false) failed: unscheduable graph" In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 13:05:32 GMT, Christian Hagedorn wrote: > In the replay compilation, a `CastPP` node belonging to a null-check `If` node floats above its null-check and is then split through a phi. This results in a broken control input for the `CastPP` node which eventually makes the graph unschedulable. > > The problem can be traced back to the optimization done in `IfNode::simple_subsuming()`: > > ![missing_data_dependencies](https://user-images.githubusercontent.com/17833009/100085013-cce64500-2e4b-11eb-8773-1364577da8f2.png) > > The dominating test `8792 If` subsumes the original null-check `6691 If` and its condition is replaced by the constant `230 ConI`. However, we forget to rewire data dependencies, in this case `6694 CastPP`, to the `8794 IfTrue` projection of the dominating test. This is a problem because `6691 If` is not yet removed by igvn but we already apply the Ideal optimization for the newly created `8792 If` node. We find that `8800 If` is an identical test that dominates `8792 If`. All control dependent data nodes are updated to the dominating `8802 IfTrue` projection and the dominated test `8792 If` is removed. But in this step, we do not find the `6694 CastPP` at `8794 IfTrue` because we have not updated its control input and `6691 If` was not yet removed such that it could have ended up at `8794 IfTrue`. 
> > The fix is to rewire any data dependencies of the always taken projection of the subsumed test in `IfNode::simple_subsuming()` to the corresponding projection of the dominating test. Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1410 From iignatyev at openjdk.java.net Tue Nov 24 17:35:58 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 24 Nov 2020 17:35:58 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v3] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 17:14:49 GMT, Vladimir Kozlov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Do not run the tests when it is known to fail, even for messages > > Good. if `expectStableFloats` is false, we basically don't do any testing, so I'd prefer us to throw `SkippedException` to have more realistic test results. and for a case `expectStableFloats` being false, it would be nice to at least print a message saying we didn't do that part. ------------- PR: https://git.openjdk.java.net/jdk/pull/1187 From shade at openjdk.java.net Tue Nov 24 17:39:54 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 17:39:54 GMT Subject: RFR: 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 In-Reply-To: References: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> Message-ID: On Tue, 24 Nov 2020 17:05:19 GMT, Tobias Hartmann wrote: > Just wondering, is that value even used? I did a quick search and couldn't find any usages. When I grep for `ConcreteRegisterImpl::number_of_registers`, it seems to have quite a few hits in x86 and shared code. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1413 From thartmann at openjdk.java.net Tue Nov 24 17:46:56 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 24 Nov 2020 17:46:56 GMT Subject: RFR: 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 In-Reply-To: References: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> Message-ID: On Tue, 24 Nov 2020 17:37:26 GMT, Aleksey Shipilev wrote: >> Just wondering, is that value even used? I did a quick search and couldn't find any usages. > >> Just wondering, is that value even used? I did a quick search and couldn't find any usages. > > When I grep for `ConcreteRegisterImpl::number_of_registers`, it seems to have quite a few hits in x86 and shared code. Indeed. I searched for the wrong value. ------------- PR: https://git.openjdk.java.net/jdk/pull/1413 From kvn at openjdk.java.net Tue Nov 24 17:51:00 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 24 Nov 2020 17:51:00 GMT Subject: RFR: 8256508: Improve CompileCommand flag [v3] In-Reply-To: References: Message-ID: <_Bs85thiBsYR8xRZkjm8uwpW1wOWSe6MB-gZLwfoO08=.9ba01ece-48b4-49da-b7b6-1d8d39b92aff@github.com> On Tue, 24 Nov 2020 14:34:11 GMT, Nils Eliasson wrote: >> The current implementation of compile command has two types of options: typed pre-defined options like "compileonly" and the general 'option' type. >> >> 'option'-type options are not defined, so they can accidentally be used with the wrong value type, and the syntax is prone to error. >> >> By pre-defining all compile commands used and giving them types, the parsing can be simplified, proper parsing errors can be given, and a reasonable syntax can be used. >> >> This: >> -XX:CompileCommand=option,java/util/String.toString,int,RepeatCompilation,5 >> >> Is superseded by: >> -XX:CompileCommand=RepeatCompilation,java/util/String.toString,5 >> >> Attention check: Did you spot the error in the old command?
>> >> In order not to break anything, the old syntax is kept for now. But even the old command format is improved with verification for the option name and the type of the value. > > Nils Eliasson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into improve_compile_command > - Exclude option is handled in compilecommand_compatibility_init > - Make tests debug only > - Fix typos > - remove typo > - Fix CompilerConfigFileWarning test > - Fix test > - Merge branch 'master' of https://github.com/openjdk/jdk into improve_compile_command > - Fixed messages and help text > - Clean up error reporting > - ... and 7 more: https://git.openjdk.java.net/jdk/compare/7b3d0958...1995972f Please run pre-integration testing before pushing. The latest update is fine with me, but I am concerned that GIT testing has failures in builds. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1276 From mdoerr at openjdk.java.net Tue Nov 24 17:58:01 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 24 Nov 2020 17:58:01 GMT Subject: RFR: 8256986: [PPC64] C2 crashes when accessing nonexisting jvms of CallLeafDirectNode Message-ID: observe_safepoint() is called with jvms == NULL from fill_buffer with mach = CallLeafDirectNode. That node represents a leaf call and does not safepoint. In addition, MachCallRuntimeNode::ret_addr_offset() needs an update for the new assertion in output.cpp. This was already fixed on some other platforms.
------------- Commit messages: - 8256986: [PPC64] C2 crashes when accessing nonexisting jvms of CallLeafDirectNode Changes: https://git.openjdk.java.net/jdk/pull/1418/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1418&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256986 Stats: 7 lines in 1 file changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1418.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1418/head:pull/1418 PR: https://git.openjdk.java.net/jdk/pull/1418 From shade at openjdk.java.net Tue Nov 24 18:05:08 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 18:05:08 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v4] In-Reply-To: References: Message-ID: > Reproduces like this: > > $ CONF=linux-x86-server-fastdebug make images run-test TEST=compiler/floatingpoint/NaNTest.java TEST_VM_OPTS="-XX:UseSSE=1" > > STDOUT: > ### NanTest started > Written and read back float values match > 0x7F800001 0x7F800001 > STDERR: > java.lang.RuntimeException: Original and read back double values mismatch > 0xFFF0000000000001 0xFFF8000000000001 > > at compiler.floatingpoint.NaNTest.testDouble(NaNTest.java:56) > at compiler.floatingpoint.NaNTest.main(NaNTest.java:69) > > After reading through [JDK-8076373](https://bugs.openjdk.java.net/browse/JDK-8076373), I think the test cannot be expected to pass without SSE >= 2 for doubles, and SSE >= 1 for floats on x86_32. This change adds the platform and UseSSE sensing to test. 
> > Additional testing: > - [x] Affected test on Linux x86_32 with `-XX:UseSSE={0,1,2}` > - [x] Affected test on Linux x86_64 with `-XX:UseSSE={0,1,2}` > - [x] Affected test on Linux AArch64 Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Use SkippedException, print skipped steps ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1187/files - new: https://git.openjdk.java.net/jdk/pull/1187/files/4e0350f2..13cd68fd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1187&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1187&range=02-03 Stats: 11 lines in 1 file changed: 8 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1187.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1187/head:pull/1187 PR: https://git.openjdk.java.net/jdk/pull/1187 From shade at openjdk.java.net Tue Nov 24 18:05:09 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 18:05:09 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v3] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 17:33:35 GMT, Igor Ignatyev wrote: > if `expectStableFloats` is false, we basically don't do any testing, so I'd prefer us to throw `SkippedException` to have more realistic test results. and for a case `expectStableFloats` being false, it would be nice to at least print a message saying we didn't do that part. Well, my previous version used to print out the results even when skipped. `expectStableDoubles` implies `expectStableFloats` due to implementation specifics, so we better not rely on that. See new patch: prints messages when skipping either test, and throws `SkippedException` if no test cases were run. 
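The two double values in the failure output quoted in this thread, `0xFFF0000000000001` and `0xFFF8000000000001`, differ in exactly one bit. A small sketch of that comparison follows; it works on raw bit patterns only, the bit positions follow the IEEE 754 binary64 layout, and the function names are hypothetical (this is not code from NaNTest.java). Bit 51, the most significant fraction bit, is clear in the written value and set in the value read back, which is consistent with a signaling NaN having been turned into a quiet NaN somewhere along the way; the thread itself does not spell out the mechanism beyond pointing at JDK-8076373.

```cpp
#include <cstdint>

// IEEE 754 binary64 field masks.
constexpr uint64_t kExponentMask = 0x7FF0000000000000ULL;
constexpr uint64_t kFractionMask = 0x000FFFFFFFFFFFFFULL;
constexpr uint64_t kQuietBit     = 1ULL << 51;  // top bit of the fraction

// A NaN has all exponent bits set and a non-zero fraction.
bool is_nan_bits(uint64_t bits) {
  return (bits & kExponentMask) == kExponentMask &&
         (bits & kFractionMask) != 0;
}

// On the usual encoding, a signaling NaN is a NaN with the quiet bit clear.
bool is_signaling_nan_bits(uint64_t bits) {
  return is_nan_bits(bits) && (bits & kQuietBit) == 0;
}

// Bit pattern obtained by setting the quiet bit and keeping the payload.
uint64_t quieted(uint64_t bits) {
  return bits | kQuietBit;
}
```

Applying `quieted` to the written pattern yields the read-back pattern from the log, so a test that compares raw bits can only pass on configurations where the value is never routed through a path that sets that bit.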
------------- PR: https://git.openjdk.java.net/jdk/pull/1187 From iignatyev at openjdk.java.net Tue Nov 24 18:14:06 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 24 Nov 2020 18:14:06 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v4] In-Reply-To: References: Message-ID: <1NFNsAU6ldTDdrXAEi48mE8FTU4YvmT3jOMWmDgmW-A=.dab16ae8-bbdb-4003-b07a-225f66fab1c3@github.com> On Tue, 24 Nov 2020 18:05:08 GMT, Aleksey Shipilev wrote: >> Reproduces like this: >> >> $ CONF=linux-x86-server-fastdebug make images run-test TEST=compiler/floatingpoint/NaNTest.java TEST_VM_OPTS="-XX:UseSSE=1" >> >> STDOUT: >> ### NanTest started >> Written and read back float values match >> 0x7F800001 0x7F800001 >> STDERR: >> java.lang.RuntimeException: Original and read back double values mismatch >> 0xFFF0000000000001 0xFFF8000000000001 >> >> at compiler.floatingpoint.NaNTest.testDouble(NaNTest.java:56) >> at compiler.floatingpoint.NaNTest.main(NaNTest.java:69) >> >> After reading through [JDK-8076373](https://bugs.openjdk.java.net/browse/JDK-8076373), I think the test cannot be expected to pass without SSE >= 2 for doubles, and SSE >= 1 for floats on x86_32. This change adds the platform and UseSSE sensing to test. >> >> Additional testing: >> - [x] Affected test on Linux x86_32 with `-XX:UseSSE={0,1,2}` >> - [x] Affected test on Linux x86_64 with `-XX:UseSSE={0,1,2}` >> - [x] Affected test on Linux AArch64 > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Use SkippedException, print skipped steps Marked as reviewed by iignatyev (Reviewer). 
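For reference, the two double bit patterns in the failure log quoted above (0xFFF0000000000001 written, 0xFFF8000000000001 read back) differ only in bit 51, the most significant fraction bit of an IEEE 754 binary64 value, which marks a NaN as quiet. That is consistent with a signaling NaN being quieted on its way through the FPU when SSE2 is not used for doubles. The sketch below only illustrates that bit arithmetic; it is not code from NaNTest.java, and the helper names `is_nan_double_bits` and `quiet_double_bits` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bit 51 is the most significant fraction bit of an IEEE 754 binary64 value.
// When it is set, a NaN is quiet; when it is clear, the NaN is signaling.
const std::uint64_t kDoubleQuietBit = std::uint64_t{1} << 51;

// Hypothetical helper: true when `bits` encodes a NaN, i.e. the exponent
// field is all ones and the fraction field is non-zero.
inline bool is_nan_double_bits(std::uint64_t bits) {
  const std::uint64_t exponent = (bits >> 52) & 0x7FF;
  const std::uint64_t fraction = bits & ((std::uint64_t{1} << 52) - 1);
  return exponent == 0x7FF && fraction != 0;
}

// Hypothetical helper: the bit pattern a NaN has once its quiet bit is set.
// Sign, exponent and the remaining payload bits are left unchanged.
inline std::uint64_t quiet_double_bits(std::uint64_t bits) {
  assert(is_nan_double_bits(bits) && "expected a NaN bit pattern");
  return bits | kDoubleQuietBit;
}
```

Applying `quiet_double_bits` to the value the test wrote yields exactly the value it read back, so the mismatch is a changed quiet bit rather than a lost payload.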
------------- PR: https://git.openjdk.java.net/jdk/pull/1187 From kvn at openjdk.java.net Tue Nov 24 18:29:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 24 Nov 2020 18:29:57 GMT Subject: RFR: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE [v4] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 18:05:08 GMT, Aleksey Shipilev wrote: >> Reproduces like this: >> >> $ CONF=linux-x86-server-fastdebug make images run-test TEST=compiler/floatingpoint/NaNTest.java TEST_VM_OPTS="-XX:UseSSE=1" >> >> STDOUT: >> ### NanTest started >> Written and read back float values match >> 0x7F800001 0x7F800001 >> STDERR: >> java.lang.RuntimeException: Original and read back double values mismatch >> 0xFFF0000000000001 0xFFF8000000000001 >> >> at compiler.floatingpoint.NaNTest.testDouble(NaNTest.java:56) >> at compiler.floatingpoint.NaNTest.main(NaNTest.java:69) >> >> After reading through [JDK-8076373](https://bugs.openjdk.java.net/browse/JDK-8076373), I think the test cannot be expected to pass without SSE >= 2 for doubles, and SSE >= 1 for floats on x86_32. This change adds the platform and UseSSE sensing to test. >> >> Additional testing: >> - [x] Affected test on Linux x86_32 with `-XX:UseSSE={0,1,2}` >> - [x] Affected test on Linux x86_64 with `-XX:UseSSE={0,1,2}` >> - [x] Affected test on Linux AArch64 > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Use SkippedException, print skipped steps Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1187 From neliasso at openjdk.java.net Tue Nov 24 18:33:08 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 24 Nov 2020 18:33:08 GMT Subject: RFR: 8256508: Improve CompileCommand flag [v4] In-Reply-To: References: Message-ID: > The current implementation of compile command has two types of options. 
Typed, pre-defined options like "compileonly" and the general 'option' type. > > 'option'-type options are not defined, they can accidentally be used with the wrong value type, and the syntax is prone to error. > > By pre-defining all compile commands used and giving them types, the parsing can be simplified, proper parsing errors can be given, and reasonable syntax can be used. > > This: > -XX:CompileCommand=option,java/util/String.toString,int,RepeatCompilation,5 > > Is superseded by: > -XX:CompileCommand=RepeatCompilation,java/util/String.toString,5 > > Attention check: Did you spot the error in the old command? > > In order not to break anything - the old syntax is kept for now. But even the old command format is improved with verification for the option name and the type of the value. Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Fix bad merge ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1276/files - new: https://git.openjdk.java.net/jdk/pull/1276/files/1995972f..7e0acea8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1276&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1276&range=02-03 Stats: 6 lines in 1 file changed: 1 ins; 4 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1276.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1276/head:pull/1276 PR: https://git.openjdk.java.net/jdk/pull/1276 From neliasso at openjdk.java.net Tue Nov 24 19:08:59 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 24 Nov 2020 19:08:59 GMT Subject: RFR: 8256508: Improve CompileCommand flag [v3] In-Reply-To: <_Bs85thiBsYR8xRZkjm8uwpW1wOWSe6MB-gZLwfoO08=.9ba01ece-48b4-49da-b7b6-1d8d39b92aff@github.com> References: <_Bs85thiBsYR8xRZkjm8uwpW1wOWSe6MB-gZLwfoO08=.9ba01ece-48b4-49da-b7b6-1d8d39b92aff@github.com> Message-ID: On Tue, 24 Nov 2020 17:48:17 GMT, Vladimir Kozlov wrote: > Please, run pre-integration testing before push.
> Latest update is fine to me but I am concern that GIT testing has failures in builds. Yes - I did a bad merge. Fixed it now. Will do a proper round of testing before pushing. ------------- PR: https://git.openjdk.java.net/jdk/pull/1276 From shade at openjdk.java.net Tue Nov 24 19:36:54 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 19:36:54 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 12:58:29 GMT, Claes Redestad wrote: >> JMH uses the [`Blackhole::consume`](https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#l153) methods to avoid dead-code elimination of the code that produces benchmark values. It now relies on producing opaque side-effects and breaking inlining. While it was proved useful for many years, it unfortunately comes with several major drawbacks: >> >> 1. Call costs dominate nanobenchmarks. On TR 3970X, the call cost is several nanoseconds. >> 2. The work spent in Blackhole.consume dominates nanobenchmarks too. It takes about a nanosecond on TR 3970X. >> 3. Argument preparation for call makes different argument types behave differently. This is prominent on architectures where calling conventions for passing e.g. floating-point arguments require elaborate dance. >> >> Supporting this directly in compilers would improve nanobenchmark fidelity. >> >> Instead of introducing public APIs or special-casing JMH methods in JVM, we can hook a new command to compiler control, and let JMH sign up its Blackhole methods for it with `-XX:CompileCommand=blackhole,org.openjdk.jmh.infra.Blackhole::consume`. This is being prototyped as [CODETOOLS-7902762](https://bugs.openjdk.java.net/browse/CODETOOLS-7902762). It makes Blackholes behave [substantially better](http://cr.openjdk.java.net/~shade/8252505/bh-old-vs-new.png). >> >> Current prototype is for initial approach review and early testing. 
I am open for suggestions how to make it simpler; not that I haven't tried, but it is likely there is something I am overlooking here. >> >> C1 code is platform-independent, and it adds the new node which is then lowered to nothing. >> >> C2 code is more complicated. I tried to introduce new node and hook arguments there, but failed. There seems to be no way to model the effects we are after: consume the value, but have no observable side effects. Roland suggested we instead put the boolean flag onto `CallJavaNode`, and then match it to nothing in `.ad`. This drags the blackhole through C2 as if it has call-like side effects, and then emits nothing. On the downside, it requires fiddling with arch-specific code in every .ad. > > Looks like a reasonable enhancement. > > Should the `BlackholeCommand` be a new option instead of a new top-level command? #1276 is about to improve the structure and usability of `CompileCommand=option` a lot so I suspect it'll be about as straightforward implementation-wise and not that much worse to use (`-XX:CompileCommand=Blackhole,,true`). > > When possible I think predicates such as `supports_blackhole` should be modelled as `static const bool` fields. Most compilers can't inline methods defined in `Matcher`, even from code in `matcher.cpp`. See `Matcher::misaligned_doubles_ok` et.c. Yeah, I can remodel `should_blackhole` as `static const bool` and look at #1276 (which I believe integrates soon), and finally pick up the build fixes after foreign-linker integration. But more importantly, I'd like to see if doing this in `.ad` is even acceptable. That's my major question here. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1203 From xliu at openjdk.java.net Tue Nov 24 21:37:19 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 24 Nov 2020 21:37:19 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic [v6] In-Reply-To: References: Message-ID: > 8247732: validate user-input intrinsic_ids in ControlIntrinsic Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - 8247732: validate user-input intrinsic_ids in ControlIntrinsic avoid a warning of stringop-overflow - 8247732: validate user-input intrinsic_ids in ControlIntrinsic ------------- Changes: https://git.openjdk.java.net/jdk/pull/1179/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1179&range=05 Stats: 545 lines in 31 files changed: 522 ins; 2 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/1179.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1179/head:pull/1179 PR: https://git.openjdk.java.net/jdk/pull/1179 From jiefu at openjdk.java.net Tue Nov 24 23:46:56 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 24 Nov 2020 23:46:56 GMT Subject: RFR: 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 In-Reply-To: References: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> Message-ID: On Tue, 24 Nov 2020 17:43:53 GMT, Tobias Hartmann wrote: >>> Just wondering, is that value even used? I did a quick search and couldn't find any usages. >> >> When I grep for `ConcreteRegisterImpl::number_of_registers`, it seems to have quite a few hits in x86 and shared code. > > Indeed. I've searched for the wrong value. Hi @TobiHartmann and @shipilev , Thanks for looking at this. The wrong value really confused me while I was trying to understand the code. Before this fix, RegisterImpl::max_slots_per_register seems not to be used directly on x86. 
That may be why the bug didn't get exposed before. I've noticed that RegisterImpl::max_slots_per_register is used to calculate the value of ConcreteRegisterImpl::number_of_registers in aarch64 [1]. So I think it is a good idea to use it in x86 to eliminate '#ifdef AMD64' [2], which seems to improve the readability of the code. What do you think? Thanks. Best regards, Jie [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/register_aarch64.hpp#L295 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/register_x86.hpp#L260 ------------- PR: https://git.openjdk.java.net/jdk/pull/1413 From jbhateja at openjdk.java.net Wed Nov 25 06:11:58 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 25 Nov 2020 06:11:58 GMT Subject: Integrated: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 15:24:41 GMT, Jatin Bhateja wrote:
> Summary:
>
> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes.
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
> 4) Based on the perf results seen in benchmarks, partial in-lining is currently performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)
This pull request has now been integrated. Changeset: 0d91f0a1 Author: Jatin Bhateja URL: https://git.openjdk.java.net/jdk/commit/0d91f0a1 Stats: 493 lines in 25 files changed: 448 ins; 23 del; 22 mod 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions Reviewed-by: neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From vlivanov at openjdk.java.net Wed Nov 25 08:13:55 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 25 Nov 2020 08:13:55 GMT Subject: RFR: 8256016: Dacapo24H.java failed with "assert(false) failed: unscheduable graph" In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 13:05:32 GMT, Christian Hagedorn wrote: > In the replay compilation, a `CastPP` node belonging to a null-check `If` node floats above its null-check and is then split through a phi. This results in a broken control input for the `CastPP` node which eventually makes the graph unschedulable. > > The problem can be traced back to the optimization done in `IfNode::simple_subsuming()`: > > ![missing_data_dependencies](https://user-images.githubusercontent.com/17833009/100085013-cce64500-2e4b-11eb-8773-1364577da8f2.png) > > The dominating test `8792 If` subsumes the original null-check `6691 If` and its condition is replaced by the constant `230 ConI`. However, we forget to rewire data dependencies, in this case `6694 CastPP`, to the `8794 IfTrue` projection of the dominating test. This is a problem because `6691 If` is not yet removed by igvn but we already apply the Ideal optimization for the newly created `8792 If` node. We find that `8800 If` is an identical test that dominates `8792 If`. All control dependent data nodes are updated to the dominating `8802 IfTrue` projection and the dominated test `8792 If` is removed.
But in this step, we do not find the `6694 CastPP` at `8794 IfTrue` because we have not updated its control input and `6691 If` was not yet removed such that it could have ended up at `8794 IfTrue`. > > The fix is to rewire any data dependencies of the always taken projection of the subsumed test in `IfNode::simple_subsuming()` to the corresponding projection of the dominating test. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1410 From thartmann at openjdk.java.net Wed Nov 25 08:23:57 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 25 Nov 2020 08:23:57 GMT Subject: RFR: 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 In-Reply-To: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> References: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> Message-ID: On Tue, 24 Nov 2020 15:31:39 GMT, Jie Fu wrote: > Hi all, > > The bug was found while I was learning @iwanowww 's patch [1]. > > As we know, the slot-size is 32-bit and the registers are 64-bit on AMD64. > So RegisterImpl::max_slots_per_register [2] should be 2 on AMD64. > > It would be better to fix it. > > Testing: > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/pull/1132 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/register_x86.hpp#L53 Marked as reviewed by thartmann (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1413 From thartmann at openjdk.java.net Wed Nov 25 08:23:58 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 25 Nov 2020 08:23:58 GMT Subject: RFR: 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 In-Reply-To: References: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> Message-ID: On Tue, 24 Nov 2020 23:43:56 GMT, Jie Fu wrote: >> Indeed. I've searched for the wrong value. > > Hi @TobiHartmann and @shipilev , > > Thanks for looking at this. > > The wrong value really confused me while I was trying to understand the code. > > Before this fix, RegisterImpl::max_slots_per_register seems not to be used directly on x86. > That may be why the bug didn't get exposed before. > > I've notice that RegisterImpl::max_slots_per_register is used to calculate the value of ConcreteRegisterImpl::number_of_registers in aarch64 [1]. > So I think it a good idea to use it in x86 to eliminate '#ifdef AMD64' [2], which seems to improve the readability of the code. > > What do you think? > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/register_aarch64.hpp#L295 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/register_x86.hpp#L260 Right, so `RegisterImpl::max_slots_per_register` which was introduced in JDK 9 by [JDK-8076276](https://bugs.openjdk.java.net/browse/JDK-8076276) does indeed not have any usages in current code. That explains why this was never an issue. Your fix looks good to me. 
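The arithmetic behind the fix discussed in this thread can be written down as a small sketch: with 32-bit slots, a 64-bit AMD64 register spans two slots, while a 32-bit register spans one. The code below is illustrative only and is not taken from register_x86.hpp; `kSlotBits` and `slots_per_register` are hypothetical names.

```cpp
#include <cassert>

// Width of one VM register slot in bits, as stated in the thread.
const int kSlotBits = 32;

// Hypothetical helper: number of slots needed to cover a register of the
// given width. The width is assumed to be a positive multiple of the slot size.
inline int slots_per_register(int register_bits) {
  assert(register_bits > 0 && register_bits % kSlotBits == 0);
  return register_bits / kSlotBits;
}
```

Under these assumptions `slots_per_register(64)` is 2, which is the value the thread argues `RegisterImpl::max_slots_per_register` should have on AMD64.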
------------- PR: https://git.openjdk.java.net/jdk/pull/1413 From chagedorn at openjdk.java.net Wed Nov 25 08:39:54 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 25 Nov 2020 08:39:54 GMT Subject: RFR: 8256016: Dacapo24H.java failed with "assert(false) failed: unscheduable graph" In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 17:33:39 GMT, Vladimir Kozlov wrote: >> In the replay compilation, a `CastPP` node belonging to a null-check `If` node floats above its null-check and is then split through a phi. This results in a broken control input for the `CastPP` node which eventually makes the graph unschedulable. >> >> The problem can be traced back to the optimization done in `IfNode::simple_subsuming()`: >> >> ![missing_data_dependencies](https://user-images.githubusercontent.com/17833009/100085013-cce64500-2e4b-11eb-8773-1364577da8f2.png) >> >> The dominating test `8792 If` subsumes the original null-check `6691 If` and its condition is replaced by the constant `230 ConI`. However, we forget to rewire data dependencies, in this case `6694 CastPP`, to the `8794 IfTrue` projection of the dominating test. This is a problem because `6691 If` is not yet removed by igvn but we already apply the Ideal optimization for the newly created `8792 If` node. We find that `8800 If` is an identical test that dominates `8792 If`. All control dependent data nodes are updated to the dominating `8802 IfTrue` projection and the dominated test `8792 If` is removed. But in this step, we do not find the `6694 CastPP` at `8794 IfTrue` because we have not updated its control input and `6691 If` was not yet removed such that it could have ended up at `8794 IfTrue`. >> >> The fix is to rewire any data dependencies of the always taken projection of the subsumed test in `IfNode::simple_subsuming()` to the corresponding projection of the dominating test. > > Looks good. @vnkozlov @iwanowww Thanks for your reviews! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1410 From adinn at openjdk.java.net Wed Nov 25 10:29:00 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Wed, 25 Nov 2020 10:29:00 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 19:34:06 GMT, Aleksey Shipilev wrote: > But more importantly, I'd like to see if doing this in .ad is even acceptable. That's my major question here. The problem is that this introduces extra call handling logic into the ad file, which clouds the implementation of the arch-specific handling of a `CallJavaNode`. Modulo the supports_blackholes switch that logic is essentially the same on all platforms, isn't it? Why can you not just fold the relevant case handling into the generated matcher code? Wouldn't that just require a special case switch to decide whether to use the AD file rule to reduce a `CallJavaNode` vs reducing it to nothing? ------------- PR: https://git.openjdk.java.net/jdk/pull/1203 From redestad at openjdk.java.net Wed Nov 25 11:05:02 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 25 Nov 2020 11:05:02 GMT Subject: RFR: 8254360: Re-examine use of CodeBuffer::verify_section_allocation Message-ID: - In release builds, only call `verify_section_allocation` in `~CodeBuffer`. This reduces overhead, while retaining most of the verification promises in product builds. - Minor touch-ups to the implementation of `verify_section_allocation`: Use is_aligned. Add a `disjoint` predicate. Only compare two sections once per verification. Remove some redundant checks.
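The idea of a `disjoint` predicate combined with comparing each pair of sections only once can be sketched as follows. This is a minimal illustration, not the code from the patch: the `Section` struct is a hypothetical stand-in for a code section modelled as a half-open address range, and `all_disjoint` is an assumed helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a code section: a half-open address range
// [start, end). The real CodeSection class carries more state than this.
struct Section {
  std::uintptr_t start;
  std::uintptr_t end;
};

// Two half-open ranges are disjoint when one of them ends at or before the
// point where the other one starts.
inline bool disjoint(const Section& a, const Section& b) {
  assert(a.start <= a.end && b.start <= b.end);
  return a.end <= b.start || b.end <= a.start;
}

// Disjointness is symmetric, so each unordered pair (i, j) with i < j is
// checked exactly once rather than once per direction.
inline bool all_disjoint(const Section* sections, std::size_t count) {
  for (std::size_t i = 0; i < count; i++) {
    for (std::size_t j = i + 1; j < count; j++) {
      if (!disjoint(sections[i], sections[j])) {
        return false;
      }
    }
  }
  return true;
}
```

For n sections this performs n(n-1)/2 comparisons instead of n(n-1), which matches the "only compare two sections once per verification" point in the description.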
------------- Commit messages: - Merge branch 'master' into verify_section_alloc - Merge branch 'master' into verify_section_alloc - Ptrs schmtrs - Factor out disjoint property, avoid redundant checks (disjointedness is symmetric) - 8254360: Re-examine use of CodeBuffer::verify_section_allocation Changes: https://git.openjdk.java.net/jdk/pull/1421/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1421&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254360 Stats: 23 lines in 2 files changed: 7 ins; 2 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/1421.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1421/head:pull/1421 PR: https://git.openjdk.java.net/jdk/pull/1421 From shade at openjdk.java.net Wed Nov 25 11:58:55 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 25 Nov 2020 11:58:55 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 10:25:50 GMT, Andrew Dinn wrote: > Modulo the supports_blackholes switch that logic is essentially the same on all platforms, isn't it? Why can you not just fold the relevant case handling into the generated matcher code? Wouldn't that just require a special case switch to decide whether to use the AD file rule to reduce a `CallJavaNode` vs reducing it to nothing? Is there a guide rail for how to do that? Because I cannot see how we can "reduce to nothing" during matching. We still have to match `CallJavaNode` to something, if not to `MachCallJavaNode`. For that, we need `.ad` match rules, AFAICS. I am exploring whether I can instead do `CallBlackholeNode` as a subclass of `CallJavaNode`, and match it directly. It would also allow asking Matcher whether the underlying `.ad` supports blackholes by doing `is_match_rule_supported`. It would still add match rules for blackholes to `.ad`-s, but at least it would not need to inject into current match rules.
------------- PR: https://git.openjdk.java.net/jdk/pull/1203 From vlivanov at openjdk.java.net Wed Nov 25 12:20:53 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 25 Nov 2020 12:20:53 GMT Subject: RFR: 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 In-Reply-To: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> References: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> Message-ID: On Tue, 24 Nov 2020 15:31:39 GMT, Jie Fu wrote: > Hi all, > > The bug was found while I was learning @iwanowww 's patch [1]. > > As we know, the slot-size is 32-bit and the registers are 64-bit on AMD64. > So RegisterImpl::max_slots_per_register [2] should be 2 on AMD64. > > It would be better to fix it. > > Testing: > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/pull/1132 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/register_x86.hpp#L53 Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1413 From vlivanov at openjdk.java.net Wed Nov 25 12:45:59 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 25 Nov 2020 12:45:59 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 11:56:12 GMT, Aleksey Shipilev wrote: >>> But more importantly, I'd like to see if doing this in .ad is even acceptable. That's my major question here. >> >> The problem being that this introduces extra call handling logic into the ad file which clouds the implementation of the arch-specific handling of a `CallJavaNode` >> >> Modulo the supports_blackholes switch that logic is essentially the same on all platforms, isn't it? Why can you not just fold the relevant case handling into the generated matcher code? 
Wouldn't that just require a special case switch to decide whether to use the AD file rule to reduce a `CallJavaNode` vs reducing it to nothing? > >> Modulo the supports_blackholes switch that logic is essentially the same on all platforms, isn't it? Why can you not just fold the relevant case handling into the generated matcher code? Wouldn't that just require a special case switch to decide whether to use the AD file rule to reduce a `CallJavaNode` vs reducing it to nothing? > > Is there a guide rail how to do that? Because I cannot see how we can "reduce to nothing" during matching. We still have to match `CallJavaNode` to something, if not to `MachCallJavaNode`. For that, we need `.ad` match rules, AFAICS. > > I am exploring if we I can instead do `CallBlackholeNode` as the subclass of `CallJavaNode`, and match it directly. It would also allow ask Matcher if underlying `.ad` supports blackholes by doing `is_match_rule_supported`. It would still add match rules for blackholes to `.ad`-s, but at least it would not need to inject into current match rules. Even if there are no instructions issued, some of the unfortunate effects of a call may be still there (e.g., spills around the place where the node is scheduled, memory state is effectively killed). Fixing that would involve overriding calling conventions, in_RegMask()/out_RegMask(), customize memory effects. Can you elaborate on your experiment with introducing custom node you mentioned? Have you tried introducing new control node and just wire data nodes to it? ------------- PR: https://git.openjdk.java.net/jdk/pull/1203 From neliasso at openjdk.java.net Wed Nov 25 13:36:58 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 25 Nov 2020 13:36:58 GMT Subject: RFR: 8254360: Re-examine use of CodeBuffer::verify_section_allocation In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 23:36:22 GMT, Claes Redestad wrote: > - In release builds, only call `verify_section_allocation` in `~CodeBuffer`. 
This reduce overhead, while retaining most of the verification promises in product builds. > - Minor touch-ups the implementation of `verify_section_allocation`: Use is_aligned. Add a `disjoint` predicate. Only compare two sections once per verification. Remove some redundant checks. Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1421 From chagedorn at openjdk.java.net Wed Nov 25 14:03:57 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 25 Nov 2020 14:03:57 GMT Subject: Integrated: 8256016: Dacapo24H.java failed with "assert(false) failed: unscheduable graph" In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 13:05:32 GMT, Christian Hagedorn wrote: > In the replay compilation, a `CastPP` node belonging to a null-check `If` node floats above its null-check and is then split through a phi. This results in a broken control input for the `CastPP` node which eventually makes the graph unschedulable. > > The problem can be traced back to the optimization done in `IfNode::simple_subsuming()`: > > ![missing_data_dependencies](https://user-images.githubusercontent.com/17833009/100085013-cce64500-2e4b-11eb-8773-1364577da8f2.png) > > The dominating test `8792 If` subsumes the original null-check `6691 If` and its condition is replaced by the constant `230 ConI`. However, we forget to rewire data dependencies, in this case `6694 CastPP`, to the `8794 IfTrue` projection of the dominating test. This is a problem because `6691 If` is not yet removed by igvn but we already apply the Ideal optimization for the newly created `8792 If` node. We find that `8800 If` is an identical test that dominates `8792 If`. All control dependent data nodes are updated to the dominating `8802 IfTrue` projection and the dominated test `8792 If` is removed. 
But in this step, we do not find the `6694 CastPP` at `8794 IfTrue` because we have not updated its control input and `6691 If` was not yet removed such that it could have ended up at `8794 IfTrue`. > > The fix is to rewire any data dependencies of the always taken projection of the subsumed test in `IfNode::simple_subsuming()` to the corresponding projection of the dominating test. This pull request has now been integrated. Changeset: 7aed9b65 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/7aed9b65 Stats: 17 lines in 1 file changed: 16 ins; 0 del; 1 mod 8256016: Dacapo24H.java failed with "assert(false) failed: unscheduable graph" Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/1410 From aph at openjdk.java.net Wed Nov 25 14:12:08 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 25 Nov 2020 14:12:08 GMT Subject: RFR: 8256359: AArch64: runtime/ReservedStack/ReservedStackTestCompiler.java fails Message-ID: The problem is that the caller's expression SP is restored before throw_delayed_StackOverflowError(). This is wrong: it leaves ESP pointing into the caller's frame, and this triggers an assert failure. The fix here delays updating ESP until we know we're not going to call throw_delayed_StackOverflowError(). This makes the logic the same as x86. 
------------- Commit messages: - 8256359: AArch64: runtime/ReservedStack/ReservedStackTestCompiler.java fails Changes: https://git.openjdk.java.net/jdk/pull/1435/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1435&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256359 Stats: 8 lines in 2 files changed: 5 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1435.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1435/head:pull/1435 PR: https://git.openjdk.java.net/jdk/pull/1435 From aph at redhat.com Wed Nov 25 14:22:35 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 25 Nov 2020 14:22:35 +0000 Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: References: Message-ID: <306af634-5b29-2bfa-971b-2e894a004da5@redhat.com> On 25/11/2020 12:45, Vladimir Ivanov wrote: > Even if there are no instructions issued, some of the unfortunate effects of a call may be still there (e.g., spills around the place where the node is scheduled, memory state is effectively killed). Fixing that would involve overriding calling conventions, in_RegMask()/out_RegMask(), customize memory effects. Is that even a downside? It does at least allow everything in flight to become visible. It's certainly an improvement over what we have at the moment. But Aleksey, there is an alternative: a store that doesn't do anything. Did you consider that instead? I guess the problem is that there'd be a lot more nodes. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From neliasso at openjdk.java.net Wed Nov 25 14:23:58 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 25 Nov 2020 14:23:58 GMT Subject: Integrated: 8256508: Improve CompileCommand flag In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 21:41:23 GMT, Nils Eliasson wrote: > The current implementation of compile command has two types of options. 
Pre-defined options like "compileonly", and the general 'option' type. > > 'option'-type options are not defined, they can accidentally be used with the wrong value type, and the syntax is prone to error. > > By pre-defining all compile commands used and giving them types, the parsing can be simplified, proper parsing errors can be given and a reasonable syntax can be used. > > This: > -XX:CompileCommand=option,java/util/String.toString,int,RepeatCompilation,5 > > Is superseded by: > -XX:CompileCommand=RepeatCompilation,java/util/String.toString,5 > > Attention check: Did you spot the error in the old command? > > In order not to break anything - the old syntax is kept for now. But even the old command format is improved with verification for the option name and the type of the value. This pull request has now been integrated. Changeset: cfb175df Author: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/cfb175df Stats: 945 lines in 22 files changed: 465 ins; 151 del; 329 mod 8256508: Improve CompileCommand flag Reviewed-by: redestad, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1276 From adinn at openjdk.java.net Wed Nov 25 14:26:59 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Wed, 25 Nov 2020 14:26:59 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 11:56:12 GMT, Aleksey Shipilev wrote: > Is there a guide rail how to do that? Because I cannot see how we can "reduce to nothing" during matching. Sorry, I should have said 'generated code' not 'generated matcher code'. I wasn't actually thinking of doing anything different during matching per se. What I thought was you could do something different at emit time. One way would be to redefine the logic of the call node's (generated) emit method. If that code tested the method it was being asked to call and found it marked as a black hole then it could skip executing the emit statements culled from matching rules.
That would involve changing the code in adlc/output_c/h.cpp. It might perhaps also require tweaking the code in adlc/formssel.cpp/hpp. Anyway, given what @iwanowww has said this may be the wrong way to go about it. ------------- PR: https://git.openjdk.java.net/jdk/pull/1203 From thartmann at openjdk.java.net Wed Nov 25 14:56:00 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 25 Nov 2020 14:56:00 GMT Subject: RFR: 8254360: Re-examine use of CodeBuffer::verify_section_allocation In-Reply-To: References: Message-ID: <4r6EFpaxCAFfVQDyJDMD8ksLZr8fU4sARluuRuQgO24=.59f474ba-435f-4488-9294-f7951b565537@github.com> On Tue, 24 Nov 2020 23:36:22 GMT, Claes Redestad wrote: > - In release builds, only call `verify_section_allocation` in `~CodeBuffer`. This reduce overhead, while retaining most of the verification promises in product builds. > - Minor touch-ups the implementation of `verify_section_allocation`: Use is_aligned. Add a `disjoint` predicate. Only compare two sections once per verification. Remove some redundant checks. Looks good to me. src/hotspot/share/asm/codeBuffer.cpp line 959: > 957: CodeSection* sect = code_section(n); > 958: if (!sect->is_allocated() || sect->is_empty()) { > 959: continue; Indentation is wrong. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1421 From vladimir.x.ivanov at oracle.com Wed Nov 25 15:03:41 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 25 Nov 2020 18:03:41 +0300 Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: <306af634-5b29-2bfa-971b-2e894a004da5@redhat.com> References: <306af634-5b29-2bfa-971b-2e894a004da5@redhat.com> Message-ID: >> Even if there are no instructions issued, some of the unfortunate effects of a call may be still there (e.g., spills around the place where the node is scheduled, memory state is effectively killed). 
Fixing that would involve overriding calling conventions, in_RegMask()/out_RegMask(), customize memory effects. > > Is that even a downside? It does at least allow everything in flight to > become visible. It's certainly an improvement over what we have at the > moment. I'm under the impression that the main driver for this feature is performance, so I consider anything which affects performance a downside. > But Aleksey, there is an alternative: a store that doesn't do anything. > Did you consider that instead? I guess the problem is that there'd be a lot > more nodes. Yes, wiring the node into the memory graph should work as well. I don't see why a single node (covering all basic types) can't do the job. Best regards, Vladimir Ivanov From redestad at openjdk.java.net Wed Nov 25 15:13:13 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 25 Nov 2020 15:13:13 GMT Subject: RFR: 8254360: Re-examine use of CodeBuffer::verify_section_allocation [v2] In-Reply-To: References: Message-ID: > - In release builds, only call `verify_section_allocation` in `~CodeBuffer`. This reduces overhead, while retaining most of the verification promises in product builds. > - Minor touch-ups to the implementation of `verify_section_allocation`: Use is_aligned. Add a `disjoint` predicate. Only compare two sections once per verification. Remove some redundant checks. Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: - Fix copyright - Fix indentation, copyright - Merge branch 'master' into verify_section_alloc - Merge branch 'master' into verify_section_alloc - Merge branch 'master' into verify_section_alloc - Ptrs schmtrs - Factor out disjoint property, avoid redundant checks (disjointedness is symmetric) - 8254360: Re-examine use of CodeBuffer::verify_section_allocation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1421/files - new: https://git.openjdk.java.net/jdk/pull/1421/files/91665791..7a972eb2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1421&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1421&range=00-01 Stats: 2189 lines in 72 files changed: 1478 ins; 339 del; 372 mod Patch: https://git.openjdk.java.net/jdk/pull/1421.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1421/head:pull/1421 PR: https://git.openjdk.java.net/jdk/pull/1421 From redestad at openjdk.java.net Wed Nov 25 15:13:15 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 25 Nov 2020 15:13:15 GMT Subject: RFR: 8254360: Re-examine use of CodeBuffer::verify_section_allocation [v2] In-Reply-To: <4r6EFpaxCAFfVQDyJDMD8ksLZr8fU4sARluuRuQgO24=.59f474ba-435f-4488-9294-f7951b565537@github.com> References: <4r6EFpaxCAFfVQDyJDMD8ksLZr8fU4sARluuRuQgO24=.59f474ba-435f-4488-9294-f7951b565537@github.com> Message-ID: On Wed, 25 Nov 2020 14:51:35 GMT, Tobias Hartmann wrote: >> Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: >> >> - Fix copyright >> - Fix indentation, copyright >> - Merge branch 'master' into verify_section_alloc >> - Merge branch 'master' into verify_section_alloc >> - Merge branch 'master' into verify_section_alloc >> - Ptrs schmtrs >> - Factor out disjoint property, avoid redundant checks (disjointedness is symmetric) >> - 8254360: Re-examine use of CodeBuffer::verify_section_allocation > > src/hotspot/share/asm/codeBuffer.cpp line 959: > >> 957: CodeSection* sect = code_section(n); >> 958: if (!sect->is_allocated() || sect->is_empty()) { >> 959: continue; > > Indentation is wrong. Good catch! ------------- PR: https://git.openjdk.java.net/jdk/pull/1421 From redestad at openjdk.java.net Wed Nov 25 15:28:05 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 25 Nov 2020 15:28:05 GMT Subject: RFR: 8257069: C2: Clarify and sanity test RegMask/RegMaskIterator Message-ID: This patch adds a sanity test to RegMask. It's not (yet) exhaustive, but it covers most common operations and iteration. While implementing this test I noticed that the code has a strong, implied assumption that RM_SIZE is even on 64-bit platforms, and a few things could have broken badly if RM_SIZE was odd (mismatch between set_AllStack/is_AllStack, extra bits after CHUNK_SIZE). That RM_SIZE is even is thankfully an invariant, since the AD preprocessor aligns up RM_SIZE. Add a static assert to this effect, and prefer _RM_SIZE for clarity. The iteration algorithm in RegMaskIterator, which I borrowed from IndexSetIterator, has garnered a few raised eyebrows. I think the quirky scheme of not shifting out the bit of interest but instead clearing it is mainly explained by the need to avoid undefined behavior in the corner case when only the highest bit is set. I've added some commentary to try and clarify the quirks. ------------- Commit messages: - Back-peddle a bit..
- Assert we don't trigger UB by shifting too much - More clarity. More test. - Merge branch 'master' into regmask_clarity - More tests, clean-ups - 8257069: C2: Clarify and sanity test RegMask/RegMaskIterator Changes: https://git.openjdk.java.net/jdk/pull/1437/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1437&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257069 Stats: 197 lines in 3 files changed: 181 ins; 0 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/1437.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1437/head:pull/1437 PR: https://git.openjdk.java.net/jdk/pull/1437 From jvernee at openjdk.java.net Wed Nov 25 18:22:55 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Wed, 25 Nov 2020 18:22:55 GMT Subject: RFR: 8257069: C2: Clarify and sanity test RegMask/RegMaskIterator In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 14:23:53 GMT, Claes Redestad wrote: > This patch adds a sanity test to RegMask. It's not (yet) exhaustive, but it covers most common operations and iteration. > > While implementing this test I noticed that the code have a strong, implied assumption that RM_SIZE is even on 64-bit platforms, and a few things could have broken badly if RM_SIZE was odd (mismatch between set_AllStack/is_AllStack, extra bits after CHUNK_SIZE). That RM_SIZE is even is thankfully an invariant, since the AD preprocessor aligns up RM_SIZE. Add a static assert to this effect, and prefer _RM_SIZE for clarity. > > The iteration algorithm in RegMaskIterator, which I borrowed from IndexSetIterator, has garnered a few raised eyebrows. I think the quirky scheme of not shifting out the bit of interest but instead clear is mainly explained by the need to avoid undefined behavior in the corner case when only the highest bit is set. I've added some commentary to try and clarify the quirks. 
Okay-ing the changes in RegMaskIterator, since we've had some offline discussion about it already :) ------------- Marked as reviewed by jvernee (Committer). PR: https://git.openjdk.java.net/jdk/pull/1437 From kvn at openjdk.java.net Wed Nov 25 19:34:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 19:34:59 GMT Subject: RFR: 8254360: Re-examine use of CodeBuffer::verify_section_allocation [v2] In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 15:13:13 GMT, Claes Redestad wrote: >> - In release builds, only call `verify_section_allocation` in `~CodeBuffer`. This reduce overhead, while retaining most of the verification promises in product builds. >> - Minor touch-ups the implementation of `verify_section_allocation`: Use is_aligned. Add a `disjoint` predicate. Only compare two sections once per verification. Remove some redundant checks. > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Fix copyright > - Fix indentation, copyright > - Merge branch 'master' into verify_section_alloc > - Merge branch 'master' into verify_section_alloc > - Merge branch 'master' into verify_section_alloc > - Ptrs schmtrs > - Factor out disjoint property, avoid redundant checks (disjointedness is symmetric) > - 8254360: Re-examine use of CodeBuffer::verify_section_allocation Changes requested by kvn (Reviewer). src/hotspot/share/asm/codeBuffer.hpp line 494: > 492: initialize_misc("static buffer"); > 493: initialize(code_start, code_size); > 494: debug_only(verify_section_allocation();) You also need #ifdef ASSERT around verify_section_allocation() in codeBuffer.cpp. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1421 From redestad at openjdk.java.net Wed Nov 25 19:43:57 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 25 Nov 2020 19:43:57 GMT Subject: RFR: 8254360: Re-examine use of CodeBuffer::verify_section_allocation [v2] In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 19:32:30 GMT, Vladimir Kozlov wrote: >> Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Fix copyright >> - Fix indentation, copyright >> - Merge branch 'master' into verify_section_alloc >> - Merge branch 'master' into verify_section_alloc >> - Merge branch 'master' into verify_section_alloc >> - Ptrs schmtrs >> - Factor out disjoint property, avoid redundant checks (disjointedness is symmetric) >> - 8254360: Re-examine use of CodeBuffer::verify_section_allocation > > src/hotspot/share/asm/codeBuffer.hpp line 494: > >> 492: initialize_misc("static buffer"); >> 493: initialize(code_start, code_size); >> 494: debug_only(verify_section_allocation();) > > You also need #ifdef ASSERT around verify_section_allocation() in codeBuffer.cpp. There's still a verification in product code in `CodeBuffer::~CodeBuffer()`. Or do we want to take leap and make this ASSERT-only altogether? ------------- PR: https://git.openjdk.java.net/jdk/pull/1421 From kvn at openjdk.java.net Wed Nov 25 19:47:56 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 19:47:56 GMT Subject: RFR: 8257069: C2: Clarify and sanity test RegMask/RegMaskIterator In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 14:23:53 GMT, Claes Redestad wrote: > This patch adds a sanity test to RegMask. It's not (yet) exhaustive, but it covers most common operations and iteration. 
> > While implementing this test I noticed that the code have a strong, implied assumption that RM_SIZE is even on 64-bit platforms, and a few things could have broken badly if RM_SIZE was odd (mismatch between set_AllStack/is_AllStack, extra bits after CHUNK_SIZE). That RM_SIZE is even is thankfully an invariant, since the AD preprocessor aligns up RM_SIZE. Add a static assert to this effect, and prefer _RM_SIZE for clarity. > > The iteration algorithm in RegMaskIterator, which I borrowed from IndexSetIterator, has garnered a few raised eyebrows. I think the quirky scheme of not shifting out the bit of interest but instead clear is mainly explained by the need to avoid undefined behavior in the corner case when only the highest bit is set. I've added some commentary to try and clarify the quirks. src/hotspot/share/opto/regmask.hpp line 170: > 168: void set_AllStack() { > 169: _RM_UP[_RM_SIZE - 1U] |= (uintptr_t(1) << (_WordBits - 1U)); > 170: } `_WordBits - 1U` is used in several places in this file. Should we add and use new _WordBitsM1 value instead? ------------- PR: https://git.openjdk.java.net/jdk/pull/1437 From kvn at openjdk.java.net Wed Nov 25 19:53:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 19:53:57 GMT Subject: RFR: 8254360: Re-examine use of CodeBuffer::verify_section_allocation [v2] In-Reply-To: References: Message-ID: <8RTg9QWFPK3VGIaWgX5EMg6WDKzyMQcorI40b_uPZmE=.f70d1465-1111-473c-b106-953254aa3694@github.com> On Wed, 25 Nov 2020 19:41:13 GMT, Claes Redestad wrote: >> src/hotspot/share/asm/codeBuffer.hpp line 494: >> >>> 492: initialize_misc("static buffer"); >>> 493: initialize(code_start, code_size); >>> 494: debug_only(verify_section_allocation();) >> >> You also need #ifdef ASSERT around verify_section_allocation() in codeBuffer.cpp. > > There's still a verification in product code in `CodeBuffer::~CodeBuffer()`. Or do we want to take leap and make this ASSERT-only altogether? 
But you put debug_only() around declaration here. ------------- PR: https://git.openjdk.java.net/jdk/pull/1421 From shade at openjdk.java.net Wed Nov 25 20:03:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 25 Nov 2020 20:03:57 GMT Subject: Integrated: 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 13:06:30 GMT, Aleksey Shipilev wrote: > Reproduces like this: > > $ CONF=linux-x86-server-fastdebug make images run-test TEST=compiler/floatingpoint/NaNTest.java TEST_VM_OPTS="-XX:UseSSE=1" > > STDOUT: > ### NanTest started > Written and read back float values match > 0x7F800001 0x7F800001 > STDERR: > java.lang.RuntimeException: Original and read back double values mismatch > 0xFFF0000000000001 0xFFF8000000000001 > > at compiler.floatingpoint.NaNTest.testDouble(NaNTest.java:56) > at compiler.floatingpoint.NaNTest.main(NaNTest.java:69) > > After reading through [JDK-8076373](https://bugs.openjdk.java.net/browse/JDK-8076373), I think the test cannot be expected to pass without SSE >= 2 for doubles, and SSE >= 1 for floats on x86_32. This change adds the platform and UseSSE sensing to test. > > Additional testing: > - [x] Affected test on Linux x86_32 with `-XX:UseSSE={0,1,2}` > - [x] Affected test on Linux x86_64 with `-XX:UseSSE={0,1,2}` > - [x] Affected test on Linux AArch64 This pull request has now been integrated. 
Changeset: a14f02d8 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/a14f02d8 Stats: 38 lines in 1 file changed: 33 ins; 0 del; 5 mod 8256267: Relax compiler/floatingpoint/NaNTest.java for x86_32 and lower -XX:+UseSSE Reviewed-by: kvn, iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/1187 From redestad at openjdk.java.net Wed Nov 25 20:22:01 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 25 Nov 2020 20:22:01 GMT Subject: RFR: 8254360: Re-examine use of CodeBuffer::verify_section_allocation [v2] In-Reply-To: <8RTg9QWFPK3VGIaWgX5EMg6WDKzyMQcorI40b_uPZmE=.f70d1465-1111-473c-b106-953254aa3694@github.com> References: <8RTg9QWFPK3VGIaWgX5EMg6WDKzyMQcorI40b_uPZmE=.f70d1465-1111-473c-b106-953254aa3694@github.com> Message-ID: On Wed, 25 Nov 2020 19:51:24 GMT, Vladimir Kozlov wrote: >> There's still a verification in product code in `CodeBuffer::~CodeBuffer()`. Or do we want to take leap and make this ASSERT-only altogether? > > But you put debug_only() around declaration here. The declaration is on [line 466](https://github.com/openjdk/jdk/pull/1421/files/7a972eb203ab61054d789d6e8494d90460498b66#diff-deb8ab083311ba60c0016dc34d6518579bbee4683c81e8d348982bac897fe8aeR466). I see that GitHub folds the diffs in this file a bit oddly, so I can see how you mistook this call for the declaration. ------------- PR: https://git.openjdk.java.net/jdk/pull/1421 From kvn at openjdk.java.net Wed Nov 25 20:42:03 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 20:42:03 GMT Subject: RFR: 8254360: Re-examine use of CodeBuffer::verify_section_allocation [v2] In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 15:13:13 GMT, Claes Redestad wrote: >> - In release builds, only call `verify_section_allocation` in `~CodeBuffer`. This reduce overhead, while retaining most of the verification promises in product builds. >> - Minor touch-ups the implementation of `verify_section_allocation`: Use is_aligned. 
Add a `disjoint` predicate. Only compare two sections once per verification. Remove some redundant checks. > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Fix copyright > - Fix indentation, copyright > - Merge branch 'master' into verify_section_alloc > - Merge branch 'master' into verify_section_alloc > - Merge branch 'master' into verify_section_alloc > - Ptrs schmtrs > - Factor out disjoint property, avoid redundant checks (disjointedness is symmetric) > - 8254360: Re-examine use of CodeBuffer::verify_section_allocation Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1421 From kvn at openjdk.java.net Wed Nov 25 20:42:04 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 20:42:04 GMT Subject: RFR: 8254360: Re-examine use of CodeBuffer::verify_section_allocation [v2] In-Reply-To: References: <8RTg9QWFPK3VGIaWgX5EMg6WDKzyMQcorI40b_uPZmE=.f70d1465-1111-473c-b106-953254aa3694@github.com> Message-ID: <1oiExOxsAgV0ouzyCyO6WfzYEsTC63RoOfYzyNuD5hM=.929b27d8-53e9-4bae-8509-cb6146dabe01@github.com> On Wed, 25 Nov 2020 20:18:38 GMT, Claes Redestad wrote: >> But you put debug_only() around declaration here. > > The declaration is on [line 466](https://github.com/openjdk/jdk/pull/1421/files/7a972eb203ab61054d789d6e8494d90460498b66#diff-deb8ab083311ba60c0016dc34d6518579bbee4683c81e8d348982bac897fe8aeR466). I see that GitHub folds the diffs in this file a bit oddly, so I can see how you mistook this call for the declaration. Shoot! Yes, I mistook it. Changes are good then. 
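For readers following the 8254360 thread, here is a minimal sketch of the two ideas named in the description: a `disjoint` predicate, and comparing each pair of sections only once because disjointness is symmetric. It is not the code from the patch. `Section`, `is_aligned` as written here, and `verify_sections` are hypothetical stand-ins, and the half-open address ranges are an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a code section: a half-open address range.
struct Section {
  std::uintptr_t start;
  std::uintptr_t end;  // one past the last byte
  bool is_empty() const { return start == end; }
};

// Two half-open ranges are disjoint if one ends before the other begins.
static bool disjoint(const Section& a, const Section& b) {
  return a.end <= b.start || b.end <= a.start;
}

// Assumes `alignment` is a power of two.
static bool is_aligned(std::uintptr_t p, std::uintptr_t alignment) {
  return (p & (alignment - 1)) == 0;
}

// Returns true if every non-empty section starts aligned and all non-empty
// sections are pairwise disjoint. Since disjoint(a, b) == disjoint(b, a),
// the inner loop starts at i + 1 and each unordered pair is tested once.
static bool verify_sections(const Section* s, std::size_t n,
                            std::uintptr_t alignment) {
  for (std::size_t i = 0; i < n; i++) {
    if (s[i].is_empty()) continue;
    if (!is_aligned(s[i].start, alignment)) return false;
    for (std::size_t j = i + 1; j < n; j++) {
      if (s[j].is_empty()) continue;
      if (!disjoint(s[i], s[j])) return false;
    }
  }
  return true;
}
```

With n sections this does n(n-1)/2 pair checks instead of n(n-1), which is the saving the "only compare two sections once" bullet appears to describe.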
------------- PR: https://git.openjdk.java.net/jdk/pull/1421 From redestad at openjdk.java.net Wed Nov 25 20:42:11 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 25 Nov 2020 20:42:11 GMT Subject: RFR: 8257069: C2: Clarify and sanity test RegMask/RegMaskIterator [v2] In-Reply-To: References: Message-ID: > This patch adds a sanity test to RegMask. It's not (yet) exhaustive, but it covers most common operations and iteration. > > While implementing this test I noticed that the code have a strong, implied assumption that RM_SIZE is even on 64-bit platforms, and a few things could have broken badly if RM_SIZE was odd (mismatch between set_AllStack/is_AllStack, extra bits after CHUNK_SIZE). That RM_SIZE is even is thankfully an invariant, since the AD preprocessor aligns up RM_SIZE. Add a static assert to this effect, and prefer _RM_SIZE for clarity. > > The iteration algorithm in RegMaskIterator, which I borrowed from IndexSetIterator, has garnered a few raised eyebrows. I think the quirky scheme of not shifting out the bit of interest but instead clear is mainly explained by the need to avoid undefined behavior in the corner case when only the highest bit is set. I've added some commentary to try and clarify the quirks. 
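The "clear instead of shift out" scheme described above can be sketched on a single 64-bit word as follows. This is an illustration of the idea as I read the description, not the RegMaskIterator code itself; `lowest_bit_index` and `set_bits` are hypothetical names, and a plain loop stands in for whatever find-lowest-bit primitive the real code uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Index of the lowest set bit; the caller guarantees w != 0.
static unsigned lowest_bit_index(std::uint64_t w) {
  assert(w != 0);
  unsigned n = 0;
  while ((w & 1u) == 0) {
    w >>= 1;
    ++n;
  }
  return n;
}

// Collects the indices of all set bits in `word`, lowest first.
static std::vector<unsigned> set_bits(std::uint64_t word) {
  std::vector<unsigned> out;
  std::uint64_t current = word;
  unsigned base = 0;  // index in `word` that bit 0 of `current` corresponds to
  while (current != 0) {
    unsigned n = lowest_bit_index(current);  // always in 0..63
    // Shift the bit of interest down to position 0 (a shift by at most 63,
    // which is well defined), then clear it by subtracting 1. Shifting it
    // out with (current >> (n + 1)) would be a shift by 64 when only bit 63
    // is set, which is undefined behavior for a 64-bit operand.
    current = (current >> n) - 1;
    base += n;
    out.push_back(base);
  }
  return out;
}
```

After the shift the lowest bit of `current` is known to be 1, so subtracting 1 clears exactly that bit without borrowing from higher bits.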
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Simplify and use type declared constants, minor cleanups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1437/files - new: https://git.openjdk.java.net/jdk/pull/1437/files/8c226d22..3320c73a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1437&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1437&range=00-01 Stats: 21 lines in 1 file changed: 0 ins; 1 del; 20 mod Patch: https://git.openjdk.java.net/jdk/pull/1437.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1437/head:pull/1437 PR: https://git.openjdk.java.net/jdk/pull/1437 From kvn at openjdk.java.net Wed Nov 25 21:20:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 21:20:58 GMT Subject: RFR: 8257069: C2: Clarify and sanity test RegMask/RegMaskIterator [v2] In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 20:42:11 GMT, Claes Redestad wrote: >> This patch adds a sanity test to RegMask. It's not (yet) exhaustive, but it covers most common operations and iteration. >> >> While implementing this test I noticed that the code have a strong, implied assumption that RM_SIZE is even on 64-bit platforms, and a few things could have broken badly if RM_SIZE was odd (mismatch between set_AllStack/is_AllStack, extra bits after CHUNK_SIZE). That RM_SIZE is even is thankfully an invariant, since the AD preprocessor aligns up RM_SIZE. Add a static assert to this effect, and prefer _RM_SIZE for clarity. >> >> The iteration algorithm in RegMaskIterator, which I borrowed from IndexSetIterator, has garnered a few raised eyebrows. I think the quirky scheme of not shifting out the bit of interest but instead clear is mainly explained by the need to avoid undefined behavior in the corner case when only the highest bit is set. I've added some commentary to try and clarify the quirks. 
> > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Simplify and use type declared constants, minor cleanups Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1437 From redestad at openjdk.java.net Wed Nov 25 21:53:56 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 25 Nov 2020 21:53:56 GMT Subject: Integrated: 8254360: Re-examine use of CodeBuffer::verify_section_allocation In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 23:36:22 GMT, Claes Redestad wrote: > - In release builds, only call `verify_section_allocation` in `~CodeBuffer`. This reduce overhead, while retaining most of the verification promises in product builds. > - Minor touch-ups the implementation of `verify_section_allocation`: Use is_aligned. Add a `disjoint` predicate. Only compare two sections once per verification. Remove some redundant checks. This pull request has now been integrated. Changeset: 20020d15 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/20020d15 Stats: 25 lines in 2 files changed: 7 ins; 2 del; 16 mod 8254360: Re-examine use of CodeBuffer::verify_section_allocation Reviewed-by: neliasso, thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1421 From jiefu at openjdk.java.net Wed Nov 25 23:52:15 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 25 Nov 2020 23:52:15 GMT Subject: RFR: 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 [v2] In-Reply-To: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> References: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> Message-ID: > Hi all, > > The bug was found while I was learning @iwanowww 's patch [1]. > > As we know, the slot-size is 32-bit and the registers are 64-bit on AMD64. 
> So RegisterImpl::max_slots_per_register [2] should be 2 on AMD64. > > It would be better to fix it. > > Testing: > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/pull/1132 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/register_x86.hpp#L53 Jie Fu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8256956 - 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1413/files - new: https://git.openjdk.java.net/jdk/pull/1413/files/915e653a..5de63d0c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1413&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1413&range=00-01 Stats: 9467 lines in 221 files changed: 2934 ins; 1175 del; 5358 mod Patch: https://git.openjdk.java.net/jdk/pull/1413.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1413/head:pull/1413 PR: https://git.openjdk.java.net/jdk/pull/1413 From psandoz at openjdk.java.net Thu Nov 26 00:19:01 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 26 Nov 2020 00:19:01 GMT Subject: RFR: 8256995: [vector] Improve broadcast operations Message-ID: Use the raw FP to bits conversion methods (to avoid NaN checks). Improve the x64 code generation when broadcasting a value. ------------- Commit messages: - Update to support all replicate patterns. - Improve IntVector.broadcast on x64 platforms. - Use raw bit conversions for FP values. 
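The difference between the canonicalizing and the raw FP-to-bits conversions mentioned in the message above can be sketched with the standard `java.lang.Float` methods. This is only an illustration of why the raw form needs no NaN check; the class name `RawBitsDemo` and the chosen NaN payload are assumptions for the example and are not taken from the patch.

```java
// Illustrative only: contrasts the two standard float-to-bits conversions.
// RawBitsDemo and NAN_WITH_PAYLOAD are hypothetical names for this sketch.
final class RawBitsDemo {
    // A quiet NaN whose payload differs from the canonical NaN (0x7fc00000).
    static final int NAN_WITH_PAYLOAD = 0x7fc00001;

    // Canonicalizing conversion: every NaN collapses to 0x7fc00000,
    // so the value has to be checked for NaN first.
    static int canonicalBits(float f) {
        return Float.floatToIntBits(f);
    }

    // Raw conversion: a plain reinterpretation of the 32 bits, no NaN check.
    static int rawBits(float f) {
        return Float.floatToRawIntBits(f);
    }

    public static void main(String[] args) {
        float nan = Float.intBitsToFloat(NAN_WITH_PAYLOAD);
        System.out.printf("canonical: 0x%08x%n", canonicalBits(nan));
        System.out.printf("raw:       0x%08x%n", rawBits(nan));
        System.out.printf("1.5f:      0x%08x / 0x%08x%n", canonicalBits(1.5f), rawBits(1.5f));
    }
}
```

For any non-NaN value the two conversions agree, so a broadcast of an ordinary float produces the same lanes either way; only the handling of NaN bit patterns differs.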
Changes: https://git.openjdk.java.net/jdk/pull/1445/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1445&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256995 Stats: 20 lines in 4 files changed: 17 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1445.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1445/head:pull/1445 PR: https://git.openjdk.java.net/jdk/pull/1445 From kvn at openjdk.java.net Thu Nov 26 00:23:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 26 Nov 2020 00:23:57 GMT Subject: RFR: 8256995: [vector] Improve broadcast operations In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 00:06:23 GMT, Paul Sandoz wrote: > Use the raw FP to bits conversion methods (to avoid NaN checks). Improve the x64 code generation when broadcasting a value. Assembler changes seems fine. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1445 From xliu at openjdk.java.net Thu Nov 26 01:47:15 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 26 Nov 2020 01:47:15 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic [v7] In-Reply-To: References: Message-ID: > 8247732: validate user-input intrinsic_ids in ControlIntrinsic Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - 8247732: validate user-input intrinsic_ids in ControlIntrinsic fix typos in JDK-8256508 - 8247732: validate user-input intrinsic_ids in ControlIntrinsic avoid a warning of stringop-overflow - 8247732: validate user-input intrinsic_ids in ControlIntrinsic rebase chagnes to tip. 
make use of JDK-8256508 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1179/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1179&range=06 Stats: 541 lines in 32 files changed: 516 ins; 2 del; 23 mod Patch: https://git.openjdk.java.net/jdk/pull/1179.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1179/head:pull/1179 PR: https://git.openjdk.java.net/jdk/pull/1179 From xliu at openjdk.java.net Thu Nov 26 01:55:57 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 26 Nov 2020 01:55:57 GMT Subject: RFR: 8247732: validate user-input intrinsic_ids in ControlIntrinsic [v7] In-Reply-To: References: <7t8UOVLtgcnt5mZpLgqEJGBDefo9eYj1ESU7LRLREPE=.89523f6b-da8b-4733-bd2f-267dbd1254ef@github.com> Message-ID: On Thu, 19 Nov 2020 00:19:35 GMT, Xin Liu wrote: >> In general - I like it a lot. I will take it for a spin and go through the PR. >> >> A heads up is that this PR will clash a bit with https://github.com/openjdk/jdk/pull/1276 which adds validation of the compile commands. Your change to compilerOracle.*pp seems well contained so there should be no problem to merge them. > > Hi, Nils, > > Thank you for reviewing the lengthy PR! > The major part is to extend the testing framework to include ControlIntrinsic. It may be also useful to test other compiler directives whose arguments are ccstr/ccstrlist. > > I am watching JDK-8256508. I will update this PR accordingly if it needs. > > thanks, > --lx to adapt JDK-8256508, I change compilerOracle.cpp as follows. I also merge 2 if constructs into one. 
    if (option == CompileCommand::ControlIntrinsic || option == CompileCommand::DisableIntrinsic) {
      ControlIntrinsicValidator validator(value, (option == CompileCommand::DisableIntrinsic));
      if (!validator.is_valid()) {
        jio_snprintf(errorbuf, buf_size, "Unrecognized intrinsic detected in %s: %s", option2name(option), validator.what());
      }
    }

------------- PR: https://git.openjdk.java.net/jdk/pull/1179 From jiefu at openjdk.java.net Thu Nov 26 02:36:57 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 26 Nov 2020 02:36:57 GMT Subject: RFR: 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 [v2] In-Reply-To: References: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> Message-ID: On Wed, 25 Nov 2020 08:21:07 GMT, Tobias Hartmann wrote: >> Jie Fu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8256956 >> - 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 > > Marked as reviewed by thartmann (Reviewer). Thanks @TobiHartmann and @iwanowww for your review. Will push it soon since there seems no regression (tier1~tier3) on Linux/x86 due to this change.
@shipilev ------------- PR: https://git.openjdk.java.net/jdk/pull/1413 From jiefu at openjdk.java.net Thu Nov 26 02:45:55 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 26 Nov 2020 02:45:55 GMT Subject: Integrated: 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 In-Reply-To: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> References: <8mIcADIZgeuTzqA6hHUqmo2FoDmblTA3IXMmWLJfHYQ=.8b2c4a2f-0d20-463d-b743-386e15db3f64@github.com> Message-ID: On Tue, 24 Nov 2020 15:31:39 GMT, Jie Fu wrote: > Hi all, > > The bug was found while I was learning @iwanowww 's patch [1]. > > As we know, the slot-size is 32-bit and the registers are 64-bit on AMD64. > So RegisterImpl::max_slots_per_register [2] should be 2 on AMD64. > > It would be better to fix it. > > Testing: > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/pull/1132 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/register_x86.hpp#L53 This pull request has now been integrated. 
Changeset: b1d14993 Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/b1d14993 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod 8256956: RegisterImpl::max_slots_per_register is incorrect on AMD64 Reviewed-by: thartmann, vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/1413 From thartmann at openjdk.java.net Thu Nov 26 07:29:54 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 26 Nov 2020 07:29:54 GMT Subject: RFR: 8256655: rework long counted loop handling [v3] In-Reply-To: References: <3VkYxRjUxrbs1sUDk3P8p04iBK7OU5JrDYdzA7MqRnY=.f985d028-4c0e-4ed5-9fe3-b6ea5c0fe098@github.com> <9m4nY3umpkpmP8bsWAnCWs6GJmkH3QLo0T6oVURrTCg=.2b0ec719-d311-426d-8130-f866cc078fdd@github.com> Message-ID: On Fri, 20 Nov 2020 13:29:26 GMT, Roland Westrelin wrote: >> Did some quick sanity testing and `compiler/regalloc/TestC1OverlappingRegisterHint` fails with `assert(init_n->get_int() + cl->stride_con() >= cl->limit()->get_int()) failed: should be one iteration`. > >> Did some quick sanity testing and `compiler/regalloc/TestC1OverlappingRegisterHint` fails with `assert(init_n->get_int() + cl->stride_con() >= cl->limit()->get_int()) failed: should be one iteration`. > > Thanks. That assert is broken AFAICT. It doesn't seem to account for a downward loop (which would seem to indicate that IdealLoopTree::do_one_iteration_loop() never triggered). I pushed a fix. I ran this through some more stress testing with different values of `StressLongCountedLoop`. 
It all looks good except for the following crash with `compiler/c2/Test8217359.java`: `assert(!in->is_CFG()) failed: CFG Node with no controlling input?`

Current CompileTask:
C2: 295 2 % compiler.c2.Test8217359::test @ 43 (164 bytes)

Stack: [0x00007f4abcffc000,0x00007f4abd0fd000], sp=0x00007f4abd0f7490, free space=1005k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x1324824] PhaseIdealLoop::build_loop_early(VectorSet&, Node_List&, Node_Stack&)+0x6e4
V [libjvm.so+0x132ccca] PhaseIdealLoop::build_and_optimize(LoopOptsMode)+0x4aa
V [libjvm.so+0xa16698] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x328
V [libjvm.so+0xa12955] Compile::Optimize()+0x13c5
V [libjvm.so+0xa147c8] Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x1838
V [libjvm.so+0x846cec] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1dc
V [libjvm.so+0xa24a28] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe08
V [libjvm.so+0xa25578] CompileBroker::compiler_thread_loop()+0x5a8
V [libjvm.so+0x18ab7c6] JavaThread::thread_main_inner()+0x256
V [libjvm.so+0x18b2150] Thread::call_run()+0x100
V [libjvm.so+0x15950f6] thread_native_entry(Thread*)+0x116

Happens with `-XX:+UseParallelGC -XX:+UseNUMA -XX:StressLongCountedLoop=200000000`. Let me know if you need more info to reproduce. ------------- PR: https://git.openjdk.java.net/jdk/pull/1316 From vlivanov at openjdk.java.net Thu Nov 26 10:04:58 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 26 Nov 2020 10:04:58 GMT Subject: RFR: 8256995: [vector] Improve broadcast operations In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 00:06:23 GMT, Paul Sandoz wrote: > Use the raw FP to bits conversion methods (to avoid NaN checks). Improve the x64 code generation when broadcasting a value. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1445 From shade at openjdk.java.net Thu Nov 26 10:28:55 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 26 Nov 2020 10:28:55 GMT Subject: RFR: 8256359: AArch64: runtime/ReservedStack/ReservedStackTestCompiler.java fails In-Reply-To: References: Message-ID: <031Pr6RkXbR5QRG_qQDc78o60K5mUvXH_GuHpNf-YHk=.987f00e5-54c0-43ab-aba0-715ef5917ef1@github.com> On Wed, 25 Nov 2020 14:06:32 GMT, Andrew Haley wrote: > The problem is that the caller's expression SP is restored before throw_delayed_StackOverflowError(). This is wrong: it leaves ESP pointing into the caller's frame, and this triggers an assert failure. The fix here delays updating ESP until we know we're not going to call throw_delayed_StackOverflowError(). This makes the logic the same as x86. Looks fine to me. But do you know why GH actions are not testing your PR? Does that mean your base revision in master is too old? ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1435 From aph at openjdk.java.net Thu Nov 26 10:55:59 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 26 Nov 2020 10:55:59 GMT Subject: RFR: 8256359: AArch64: runtime/ReservedStack/ReservedStackTestCompiler.java fails In-Reply-To: <031Pr6RkXbR5QRG_qQDc78o60K5mUvXH_GuHpNf-YHk=.987f00e5-54c0-43ab-aba0-715ef5917ef1@github.com> References: <031Pr6RkXbR5QRG_qQDc78o60K5mUvXH_GuHpNf-YHk=.987f00e5-54c0-43ab-aba0-715ef5917ef1@github.com> Message-ID: On Thu, 26 Nov 2020 10:26:04 GMT, Aleksey Shipilev wrote: > Looks fine to me. But do you know why GH actions are not testing your PR? Does that mean your base revision in master is too old? I've no idea. It's not too old: I pulled master into my fork yesterday, and branched from there. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1435 From shade at openjdk.java.net Thu Nov 26 11:00:55 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 26 Nov 2020 11:00:55 GMT Subject: RFR: 8256359: AArch64: runtime/ReservedStack/ReservedStackTestCompiler.java fails In-Reply-To: References: <031Pr6RkXbR5QRG_qQDc78o60K5mUvXH_GuHpNf-YHk=.987f00e5-54c0-43ab-aba0-715ef5917ef1@github.com> Message-ID: On Thu, 26 Nov 2020 10:53:15 GMT, Andrew Haley wrote: > > Looks fine to me. But do you know why GH actions are not testing your PR? Does that mean your base revision in master is too old? > > I've no idea. It's not too old: I pulled master into my fork yesterday, and branched from there. Look at https://github.com/theRealAph/jdk/actions -- does it say anything odd? ------------- PR: https://git.openjdk.java.net/jdk/pull/1435 From aph at openjdk.java.net Thu Nov 26 11:00:56 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 26 Nov 2020 11:00:56 GMT Subject: RFR: 8256359: AArch64: runtime/ReservedStack/ReservedStackTestCompiler.java fails In-Reply-To: References: <031Pr6RkXbR5QRG_qQDc78o60K5mUvXH_GuHpNf-YHk=.987f00e5-54c0-43ab-aba0-715ef5917ef1@github.com> Message-ID: On Thu, 26 Nov 2020 10:56:27 GMT, Aleksey Shipilev wrote: > > > Looks fine to me. But do you know why GH actions are not testing your PR? Does that mean your base revision in master is too old? > > > > > > I've no idea. It's not too old: I pulled master into my fork yesterday, and branched from there. > > Look at https://github.com/theRealAph/jdk/actions -- does it say anything odd? Yes! Workflows aren't being run on this forked repository Because this repository contained workflow files when it was forked, we have disabled them from running on this fork. The error is caused by a file in .github/workflows:: @shipilev shipilev 8257056: Submit workflow should apt-get update to avoid package insta… 22 hours ago .?.
submit.yml ------------- PR: https://git.openjdk.java.net/jdk/pull/1435 From chagedorn at openjdk.java.net Thu Nov 26 11:24:01 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 26 Nov 2020 11:24:01 GMT Subject: RFR: 8256807: C2: Not marking stores correctly as mismatched in string opts Message-ID: <3EaFufXDKEon7WuGqHOTU-rN7sbVLVjxhWRz0fJQrNM=.09989674-89b4-406d-b182-f2da47d1bd04@github.com> The observed assertion failure can be traced back to not marking stores correctly as mismatched when creating them in string opts. [JDK-8140390](https://bugs.openjdk.java.net/browse/JDK-8140390) actually tried to fix that but accidentally set `require_atomic_access` to true instead of `mismatched`. This change fixes that. Thanks, Christian ------------- Commit messages: - 8256807: C2: Not marking stores correctly as mismatched in string opts Changes: https://git.openjdk.java.net/jdk/pull/1450/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1450&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256807 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1450.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1450/head:pull/1450 PR: https://git.openjdk.java.net/jdk/pull/1450 From vlivanov at openjdk.java.net Thu Nov 26 11:29:55 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 26 Nov 2020 11:29:55 GMT Subject: RFR: 8256807: C2: Not marking stores correctly as mismatched in string opts In-Reply-To: <3EaFufXDKEon7WuGqHOTU-rN7sbVLVjxhWRz0fJQrNM=.09989674-89b4-406d-b182-f2da47d1bd04@github.com> References: <3EaFufXDKEon7WuGqHOTU-rN7sbVLVjxhWRz0fJQrNM=.09989674-89b4-406d-b182-f2da47d1bd04@github.com> Message-ID: On Thu, 26 Nov 2020 11:17:11 GMT, Christian Hagedorn wrote: > The observed assertion failure can be traced back to not marking stores correctly as mismatched when creating them in string opts. 
[JDK-8140390](https://bugs.openjdk.java.net/browse/JDK-8140390) actually tried to fix that but accidentally set `require_atomic_access` to true instead of `mismatched`. This change fixes that. > > Thanks, > Christian Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1450 From chagedorn at openjdk.java.net Thu Nov 26 11:34:06 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 26 Nov 2020 11:34:06 GMT Subject: RFR: 8253644: C2: assert(skeleton_predicate_has_opaque(iff)) failed: unexpected Message-ID: <3WOtQrecrvY6WH3zeLq5xpVVzlhWV4hZ741wGWEayLg=.dbb57162-35c2-4bde-91e2-bdc9f29faaf3@github.com> `test1()` fails while creating pre/main/post loops when copying skeleton predicates to the main loop. The problem is that we find phi nodes when following a skeleton `Opaque4` bool node while trying to find the `OpaqueLoopInit` and `OpaqueLoopStride` nodes. This is unexpected and makes the assertion fail. This happens for the following reason: An inner loop of a nested loop is first unswitched and the skeleton predicates are copied to the slow and fast loop by just creating a new `If` node that shares the same `Opaque4` node:
https://github.com/openjdk/jdk/blob/973255c469d794afe8ee74b24ddb5048bfcaadf7/src/hotspot/share/opto/loopPredicate.cpp#L268-L273
The loop tree building algorithm recognizes both loops as children of the parent loop:

Loop: N0/N0 has_sfpt
  Loop: N338/N314 limit_check profile_predicated predicated counted [0,100),+1 (2 iters) has_sfpt
    Loop: N459/N458 profile_predicated predicated counted [0,10000),+1 (5271 iters) has_sfpt (slow loop)
    Loop: N343/N267 profile_predicated predicated counted [0,10000),+1 (5271 iters) has_sfpt (fast loop)

Some additional optimizations are then applied such that the fast loop no longer has any backedge/path to the parent loop while the slow loop still has.
As a result, the loop tree building algorithm only recognizes the slow loop as child while the fast loop is not. The fast loop is treated as a separate loop on the same level as the parent loop:

Loop: N0/N0 has_sfpt
  Loop: N338/N314 limit_check profile_predicated predicated counted [0,100),+1 (2 iters) has_sfpt
    [N459, but the loop could actually be removed in the meantime but the skeleton predicates are still there]
  Loop: N343/N267 profile_predicated predicated counted [0,10000),+1 (5274 iters) has_sfpt

Now the original parent loop (N338) gets peeled. The fast and the slow loop still both share skeleton `Opaque4` bool nodes with all their input nodes up to and including the `OpaqueLoopInit/Stride` nodes. Let's look at one of the skeleton `If` nodes for the fast loop that uses such an `Opaque4` node. The skeleton `If` is no longer part of the original parent loop and is therefore not peeled. But now we need some phi nodes to select the correct nodes either from the peeled iteration or from N338 for this skeleton `If` of the fast loop. This is done in `PhaseIdealLoop::clone_iff()`, which creates a new `Opaque4` node together with new `Bool` and `Cmp` nodes and then inserts some phi nodes to do the selection. When we afterwards create pre/main/post loops for the fast loop (N343), which is no longer a child, we find these phi nodes on the path to the `OpaqueLoopInit/Stride` nodes, which makes the assertion fail. A more detailed explanation of why this happens can be found at `test1()` in [TestUnswitchCloneSkeletonPredicates](https://github.com/chhagedorn/jdk/blob/cfc7c941ddefd716c9f07bf7e8f4824f00bb7e9b/test/hotspot/jtreg/compiler/loopopts/TestUnswitchCloneSkeletonPredicates.java#L51).
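For readers unfamiliar with loop unswitching, the source-level effect of the transformation described above can be sketched roughly as follows. This is a hypothetical illustration and not the `test1()` method from the PR; the class and method names are made up, and the sketch does not show which copy C2 would label the fast or the slow loop.

```java
// Hypothetical source-level picture of loop unswitching on a nested loop.
// UnswitchSketch, original() and unswitched() are names invented for this sketch.
final class UnswitchSketch {
    // Original shape: a loop-invariant test sits inside the inner loop.
    static long original(int[] a, boolean flag) {
        long sum = 0;
        for (int i = 0; i < 100; i++) {            // parent loop
            for (int j = 0; j < a.length; j++) {   // inner loop
                if (flag) {                        // invariant in both loops
                    sum += a[j];
                } else {
                    sum -= a[j];
                }
            }
        }
        return sum;
    }

    // After unswitching the inner loop: the invariant test is hoisted out and
    // the inner loop is duplicated, giving two specialized copies under the
    // same parent loop.
    static long unswitched(int[] a, boolean flag) {
        long sum = 0;
        for (int i = 0; i < 100; i++) {            // parent loop
            if (flag) {
                for (int j = 0; j < a.length; j++) {
                    sum += a[j];                   // copy for flag == true
                }
            } else {
                for (int j = 0; j < a.length; j++) {
                    sum -= a[j];                   // copy for flag == false
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(original(data, true) + " " + unswitched(data, true));
        System.out.println(original(data, false) + " " + unswitched(data, false));
    }
}
```

Both copies of the inner loop start out guarded by the same predicates, which is the situation in which the shared `Opaque4` node described above arises.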
I propose to copy the skeleton predicates to the unswitched loops in the same way as we copy the skeleton predicates to the main loop by cloning all the nodes on the path to the` OpaqueLoopInit/Stride` nodes with a small adaptation: We should also copy the `OpaqueLoopInit/Stride` nodes and we should keep the uncommon traps because we only want to replace them by `Halt` nodes once we create pre/main/post loops. Thanks, Christian ------------- Commit messages: - Fixing whitespaces - 8253644: C2: assert(skeleton_predicate_has_opaque(iff)) failed: unexpected Changes: https://git.openjdk.java.net/jdk/pull/1448/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1448&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253644 Stats: 315 lines in 4 files changed: 263 ins; 7 del; 45 mod Patch: https://git.openjdk.java.net/jdk/pull/1448.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1448/head:pull/1448 PR: https://git.openjdk.java.net/jdk/pull/1448 From chagedorn at openjdk.java.net Thu Nov 26 11:35:57 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 26 Nov 2020 11:35:57 GMT Subject: RFR: 8256807: C2: Not marking stores correctly as mismatched in string opts In-Reply-To: References: <3EaFufXDKEon7WuGqHOTU-rN7sbVLVjxhWRz0fJQrNM=.09989674-89b4-406d-b182-f2da47d1bd04@github.com> Message-ID: On Thu, 26 Nov 2020 11:27:39 GMT, Vladimir Ivanov wrote: >> The observed assertion failure can be traced back to not marking stores correctly as mismatched when creating them in string opts. [JDK-8140390](https://bugs.openjdk.java.net/browse/JDK-8140390) actually tried to fix that but accidentally set `require_atomic_access` to true instead of `mismatched`. This change fixes that. >> >> Thanks, >> Christian > > Looks good. @iwanowww Thank you for your review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1450 From roland at openjdk.java.net Thu Nov 26 11:41:56 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 26 Nov 2020 11:41:56 GMT Subject: RFR: 8256807: C2: Not marking stores correctly as mismatched in string opts In-Reply-To: <3EaFufXDKEon7WuGqHOTU-rN7sbVLVjxhWRz0fJQrNM=.09989674-89b4-406d-b182-f2da47d1bd04@github.com> References: <3EaFufXDKEon7WuGqHOTU-rN7sbVLVjxhWRz0fJQrNM=.09989674-89b4-406d-b182-f2da47d1bd04@github.com> Message-ID: On Thu, 26 Nov 2020 11:17:11 GMT, Christian Hagedorn wrote: > The observed assertion failure can be traced back to not marking stores correctly as mismatched when creating them in string opts. [JDK-8140390](https://bugs.openjdk.java.net/browse/JDK-8140390) actually tried to fix that but accidentally set `require_atomic_access` to true instead of `mismatched`. This change fixes that. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1450 From thartmann at openjdk.java.net Thu Nov 26 12:37:54 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 26 Nov 2020 12:37:54 GMT Subject: RFR: 8256807: C2: Not marking stores correctly as mismatched in string opts In-Reply-To: <3EaFufXDKEon7WuGqHOTU-rN7sbVLVjxhWRz0fJQrNM=.09989674-89b4-406d-b182-f2da47d1bd04@github.com> References: <3EaFufXDKEon7WuGqHOTU-rN7sbVLVjxhWRz0fJQrNM=.09989674-89b4-406d-b182-f2da47d1bd04@github.com> Message-ID: On Thu, 26 Nov 2020 11:17:11 GMT, Christian Hagedorn wrote: > The observed assertion failure can be traced back to not marking stores correctly as mismatched when creating them in string opts. [JDK-8140390](https://bugs.openjdk.java.net/browse/JDK-8140390) actually tried to fix that but accidentally set `require_atomic_access` to true instead of `mismatched`. This change fixes that. > > Thanks, > Christian Looks good, thanks for fixing! 
I assume you were not able to create a regression test? Maybe add the noreg-* label to the bug then. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1450 From roland at openjdk.java.net Thu Nov 26 12:38:58 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 26 Nov 2020 12:38:58 GMT Subject: RFR: 8253644: C2: assert(skeleton_predicate_has_opaque(iff)) failed: unexpected In-Reply-To: <3WOtQrecrvY6WH3zeLq5xpVVzlhWV4hZ741wGWEayLg=.dbb57162-35c2-4bde-91e2-bdc9f29faaf3@github.com> References: <3WOtQrecrvY6WH3zeLq5xpVVzlhWV4hZ741wGWEayLg=.dbb57162-35c2-4bde-91e2-bdc9f29faaf3@github.com> Message-ID: On Thu, 26 Nov 2020 11:13:09 GMT, Christian Hagedorn wrote: > Some additional optimizations are then applied such that the fast loop no longer has any backedge/path to the parent loop while the slow loop still has. Does one round of loop opts stop here... and another starts below? > As a result, the loop tree building algorithm only recognizes the slow loop as child while the fast loop is not. The fast loop is treated as a separate loop on the same level as the parent loop: > [N459, but the loop could actually be removed in the meantime but the skeleton predicates are still there] I don't understand that part. Is N459 no longer a loop? ------------- PR: https://git.openjdk.java.net/jdk/pull/1448 From shade at openjdk.java.net Thu Nov 26 13:04:01 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 26 Nov 2020 13:04:01 GMT Subject: RFR: 8256757: Incorrect MachCallRuntimeNode::ret_addr_offset() for CallLeafNoFP on x86_32 Message-ID: JDK-8254231 added a new assert in `output.cpp`:

    assert(!is_mcall || (call_returns[block->_pre_order] <= (uint) current_offset))

Which verifies that the offset returned by MachCallNode::ret_addr_offset() (and sub-types) matches the emitted code, to avoid potential conflicts between oop maps of different calls.
It caught the failure running `compiler/intrinsics/string/TestStringLatin1IndexOfChar.java` on Linux x86_32, because it forces lower SSE settings. But more tests fail if you run with lower SSE settings. The real issue is `MachCallRuntimeNode::ret_addr_offset()` computing the offset incorrectly for `CallLeafNoFP`: the match rule for it does not include `FFree_Float_Stack_All`. See the definitions:

    // Call runtime without safepoint
    instruct CallLeafDirect(method meth) %{
      match(CallLeaf);
      effect(USE meth);

      ins_cost(300);
      format %{ "CALL_LEAF,runtime " %}
      opcode(0xE8); /* E8 cd */
      ins_encode( pre_call_resets,
                  FFree_Float_Stack_All,
                  Java_To_Runtime( meth ),
                  Verify_FPU_For_Leaf, post_call_FPU );
      ins_pipe( pipe_slow );
    %}

    instruct CallLeafNoFPDirect(method meth) %{
      match(CallLeafNoFP);
      effect(USE meth);

      ins_cost(300);
      format %{ "CALL_LEAF_NOFP,runtime " %}
      opcode(0xE8); /* E8 cd */
      ins_encode(pre_call_resets, Java_To_Runtime(meth));
      ins_pipe( pipe_slow );
    %}

Testing:
- [x] A few known failing tests on Linux x86_32
- [x] Linux x86_32 `tier1`
- [ ] Linux x86_32 `tier1` with `-XX:UseSSE=1`

------------- Commit messages: - Formatting, comments - 8256757: Incorrect MachCallRuntimeNode::ret_addr_offset() for CallLeafNoFP Changes: https://git.openjdk.java.net/jdk/pull/1452/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1452&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256757 Stats: 9 lines in 5 files changed: 7 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1452.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1452/head:pull/1452 PR: https://git.openjdk.java.net/jdk/pull/1452 From github.com+42899633+eastig at openjdk.java.net Thu Nov 26 13:34:14 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 26 Nov 2020 13:34:14 GMT Subject: RFR: 8255351: Add detection for Graviton 2 CPUs [v2] In-Reply-To: References: Message-ID: > This commit adds detection for Graviton 1 (as Cortex-A72) and for
Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. > > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. Evgeny Astigeevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8255351: Add detection for Graviton 2 CPUs This commit adds detection for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for it. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1315/files - new: https://git.openjdk.java.net/jdk/pull/1315/files/204f1f68..60e28f57 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1315&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1315&range=00-01 Stats: 7 lines in 1 file changed: 0 ins; 7 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1315.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1315/head:pull/1315 PR: https://git.openjdk.java.net/jdk/pull/1315 From github.com+42899633+eastig at openjdk.java.net Thu Nov 26 13:42:55 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 26 Nov 2020 13:42:55 GMT Subject: RFR: 8255351: Add detection for Graviton 2 CPUs [v2] In-Reply-To: <-Oiu2Cimcy4ZQpPIZIhVjsmKlDaSKC15fLdq-X3I7LU=.60531c0c-b1e9-4e22-9bae-d25d85a1144c@github.com> References: <-Oiu2Cimcy4ZQpPIZIhVjsmKlDaSKC15fLdq-X3I7LU=.60531c0c-b1e9-4e22-9bae-d25d85a1144c@github.com> Message-ID: On Mon, 23 Nov 2020 23:37:44 GMT, Vladimir Kozlov wrote: >> Evgeny Astigeevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: >> >> 8255351: Add detection for Graviton 2 CPUs >> >> This commit adds detection for Graviton 2 (as Neoverse N1) >> and enables UseSIMDForMemoryOps for it. > > Marked as reviewed by kvn (Reviewer). I've got additional performance data. It shows that Graviton 1 has different behaviour regarding SIMD for memory ops than Graviton 2. Especially array lengths, when it is worth to use SIMD instructions. We will address the issues in a separate PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From redestad at openjdk.java.net Thu Nov 26 13:45:59 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 26 Nov 2020 13:45:59 GMT Subject: Integrated: 8257069: C2: Clarify and sanity test RegMask/RegMaskIterator In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 14:23:53 GMT, Claes Redestad wrote: > This patch adds a sanity test to RegMask. It's not (yet) exhaustive, but it covers most common operations and iteration. > > While implementing this test I noticed that the code have a strong, implied assumption that RM_SIZE is even on 64-bit platforms, and a few things could have broken badly if RM_SIZE was odd (mismatch between set_AllStack/is_AllStack, extra bits after CHUNK_SIZE). That RM_SIZE is even is thankfully an invariant, since the AD preprocessor aligns up RM_SIZE. Add a static assert to this effect, and prefer _RM_SIZE for clarity. > > The iteration algorithm in RegMaskIterator, which I borrowed from IndexSetIterator, has garnered a few raised eyebrows. I think the quirky scheme of not shifting out the bit of interest but instead clear is mainly explained by the need to avoid undefined behavior in the corner case when only the highest bit is set. I've added some commentary to try and clarify the quirks. This pull request has now been integrated. 
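The corner case named in the quoted description (only the highest bit set) can be illustrated with a small sketch. It is a hypothetical Java rendering of the idea of moving the bit of interest to position 0 and clearing it, rather than shifting it out; it is not the HotSpot `RegMaskIterator` code, and the class and method names are assumptions. In C++ a shift by the full word width is undefined behavior; Java instead masks the shift count, which is equally unhelpful for an iterator that expects the word to become zero.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; this is not the HotSpot RegMaskIterator code.
final class BitIterationSketch {
    // Returns the indices of the set bits in mask, lowest first.
    static List<Integer> setBits(long mask) {
        List<Integer> indices = new ArrayList<>();
        long current = mask;
        int index = 0;
        while (current != 0) {
            int tz = Long.numberOfTrailingZeros(current); // 0..63, since current != 0
            index += tz;
            indices.add(index);
            // Shift by tz (never 64) so the found bit sits at position 0,
            // then clear it by subtracting 1.
            current = (current >>> tz) - 1;
        }
        return indices;
    }

    // Shifting the found bit out would need a shift by tz + 1, which is 64
    // when only bit 63 is set. Java masks a long shift count to its low six
    // bits, so a shift by 64 behaves like a shift by 0.
    static long shiftedByWordWidth(long value) {
        int width = Long.SIZE; // 64
        return value >>> width;
    }

    public static void main(String[] args) {
        System.out.println(setBits(0b1010L));
        System.out.println(setBits(Long.MIN_VALUE));
        System.out.println(shiftedByWordWidth(Long.MIN_VALUE) == Long.MIN_VALUE);
    }
}
```

Because the shift count used by `setBits` is at most 63, the highest-bit-only case needs no special handling.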
Changeset: 2d30a101 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/2d30a101 Stats: 214 lines in 3 files changed: 180 ins; 0 del; 34 mod 8257069: C2: Clarify and sanity test RegMask/RegMaskIterator Reviewed-by: jvernee, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1437 From vlivanov at openjdk.java.net Thu Nov 26 13:49:01 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 26 Nov 2020 13:49:01 GMT Subject: RFR: 8256368: Avoid repeated upcalls into Java to re-resolve MH/VH linkers/invokers Message-ID: Method linkage of `invokehandle` instructions involve an upcall into Java (`MethodHandleNatives::linkMethod`), but the result is not cached. Unless the upcall behaves idempotently (which is highly desirable, but not guaranteed), repeated invokehandle resolution attempts enter a vicious cycle in tiered mode: switching to a higher tier involves call site re-resolution in generated code, but re-resolution installs a fresh method which puts execution back into interpreter. (Another prerequisite is no inlining through a `invokehandle` which doesn't normally happen in practice - relevant methods are marked w/ `@ForceInline` - except some testing modes, `-Xcomp` in particular.) Proposed fix is to inspect `ConstantPoolCache` first. Previous resolution attempts from interpreter and C1 records their results there and it stabilises the execution. Testing: - failing test - tier1-4 ------------- Commit messages: - Remove affected test from the problem list - Avoid repeated upcalls into Java to re-resolve MH/VH linkers/invokers. 
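The stabilizing effect of consulting a cache before repeating a non-idempotent resolution can be sketched in plain Java. This is only an analogy for the idea in the message above: the map stands in for the role the message assigns to `ConstantPoolCache`, the `upcall` method stands in for `MethodHandleNatives::linkMethod`, and every name in the sketch is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical analogy; none of these names exist in HotSpot.
final class ResolutionCacheSketch {
    private final Map<Integer, Object> resolved = new ConcurrentHashMap<>();
    private final AtomicInteger upcalls = new AtomicInteger();

    // Stand-in for a non-idempotent upcall: a fresh result on every call.
    private Object upcall(int callSite) {
        upcalls.incrementAndGet();
        return new Object();
    }

    // Re-resolves every time, so each caller may observe a different target.
    Object resolveWithoutCache(int callSite) {
        return upcall(callSite);
    }

    // Looks at earlier resolution results first; the upcall runs at most
    // once per call site, and every later caller sees the same target.
    Object resolveWithCache(int callSite) {
        return resolved.computeIfAbsent(callSite, this::upcall);
    }

    int upcallCount() {
        return upcalls.get();
    }

    public static void main(String[] args) {
        ResolutionCacheSketch sketch = new ResolutionCacheSketch();
        System.out.println(sketch.resolveWithoutCache(0) == sketch.resolveWithoutCache(0));
        System.out.println(sketch.resolveWithCache(0) == sketch.resolveWithCache(0));
        System.out.println(sketch.upcallCount());
    }
}
```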
Changes: https://git.openjdk.java.net/jdk/pull/1453/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1453&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256368 Stats: 51 lines in 5 files changed: 26 ins; 12 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/1453.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1453/head:pull/1453 PR: https://git.openjdk.java.net/jdk/pull/1453 From jiefu at openjdk.java.net Thu Nov 26 14:05:53 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 26 Nov 2020 14:05:53 GMT Subject: RFR: 8256757: Incorrect MachCallRuntimeNode::ret_addr_offset() for CallLeafNoFP on x86_32 In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 12:22:31 GMT, Aleksey Shipilev wrote: > JDK-8254231 added new assert in `output.cpp`: > > assert(!is_mcall || (call_returns[block->_pre_order] <= (uint) current_offset)) > > Which verifies that the offset returned by MachCallNode::ret_addr_offset() (and sub-types) at matches the emitted code, to avoid potential conflicts between oop maps of different calls. > > It caught the failure running `compiler/intrinsics/string/TestStringLatin1IndexOfChar.java` on Linux x86_32. I think that happened because the test forced lower SSE settings. But more tests fail if you force with lower SSE settings with e.g. `TEST_VM_OPTS=-XX:UseSSE=1`. The real issue is `MachCallRuntimeNode::ret_addr_offset()` computing the offset incorrectly for `CallLeafNoFP`: the match rule for it does not include `FFree_Float_Stack_All`. With `UseSSE < 2`, `FFree_Float_Stack_All` becomes non-zero. > > See the definitions in `x86_32.ad`: > > enc_class FFree_Float_Stack_All %{ // Free_Float_Stack_All > MacroAssembler masm(&cbuf); > int start = masm.offset(); > if (UseSSE >= 2) { > if (VerifyFPU) { > masm.verify_FPU(0, "must be empty in SSE2+ mode"); > } > } else { // <---- taking this with UseSSE < 2 > // External c_calling_convention expects the FPU stack to be 'clean'. > // Compiled code leaves it dirty. 
Do cleanup now. > masm.empty_FPU_stack(); > } > if (sizeof_FFree_Float_Stack_All == -1) { > sizeof_FFree_Float_Stack_All = masm.offset() - start; > } else { > assert(masm.offset() - start == sizeof_FFree_Float_Stack_All, "wrong size"); > } > %} > > // Call runtime without safepoint > instruct CallLeafDirect(method meth) %{ > match(CallLeaf); > effect(USE meth); > > ins_cost(300); > format %{ "CALL_LEAF,runtime " %} > opcode(0xE8); /* E8 cd */ > ins_encode( pre_call_resets, > FFree_Float_Stack_All, > Java_To_Runtime( meth ), > Verify_FPU_For_Leaf, post_call_FPU ); > ins_pipe( pipe_slow ); > %} > > instruct CallLeafNoFPDirect(method meth) %{ > match(CallLeafNoFP); > effect(USE meth); > > ins_cost(300); > format %{ "CALL_LEAF_NOFP,runtime " %} > opcode(0xE8); /* E8 cd */ > ins_encode(pre_call_resets, Java_To_Runtime(meth)); > ins_pipe( pipe_slow ); > %} > > This does not affect `x86_64`, AFAICS. > > Additional testing: > - [x] A few known failing tests on Linux x86_32 > - [x] Linux x86_32 `tier1` > - [x] Linux x86_32 `tier1` with `-XX:UseSSE=1` Looks good. And thanks for fixing it. How about replacing '5' with 'NativeCall::instruction_size' and also keeping the code generation order like this: return pre_call_resets_size() + (_leaf_no_fp ? 0 : sizeof_FFree_Float_Stack_All) + NativeCall::instruction_size; ------------- Marked as reviewed by jiefu (Committer). PR: https://git.openjdk.java.net/jdk/pull/1452 From shade at openjdk.java.net Thu Nov 26 14:09:56 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 26 Nov 2020 14:09:56 GMT Subject: RFR: 8256757: Incorrect MachCallRuntimeNode::ret_addr_offset() for CallLeafNoFP on x86_32 In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 14:03:34 GMT, Jie Fu wrote: > How about replacing '5' with 'NativeCall::instruction_size' and also keeping the code generation order like this: > return pre_call_resets_size() + (_leaf_no_fp ? 
0 : sizeof_FFree_Float_Stack_All) + NativeCall::instruction_size; New style follows what is done for `MachCallDynamicJavaNode` and `MachCallStaticJavaNode`. I'd prefer to make this a targeted fix not to risk more regressions. ------------- PR: https://git.openjdk.java.net/jdk/pull/1452 From jiefu at openjdk.java.net Thu Nov 26 14:29:54 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 26 Nov 2020 14:29:54 GMT Subject: RFR: 8256757: Incorrect MachCallRuntimeNode::ret_addr_offset() for CallLeafNoFP on x86_32 In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 14:07:29 GMT, Aleksey Shipilev wrote: > > How about replacing '5' with 'NativeCall::instruction_size' and also keeping the code generation order like this: > > return pre_call_resets_size() + (_leaf_no_fp ? 0 : sizeof_FFree_Float_Stack_All) + NativeCall::instruction_size; > > New style follows what is done for `MachCallDynamicJavaNode` and `MachCallStaticJavaNode`. I'd prefer to make this a targeted fix not to risk more regressions. Okay, I see. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/1452 From github.com+42899633+eastig at openjdk.java.net Thu Nov 26 16:13:58 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 26 Nov 2020 16:13:58 GMT Subject: Integrated: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 14:10:48 GMT, Evgeny Astigeevich wrote: > This patch fixes 27%-48% performance regressions of small arraycopies on Graviton2 (Neoverse N1) when UseSIMDForMemoryOps is enabled. For such copies ldpq/stpq are used instead of ld4/st4. > This follows what the Arm Optimization Guide, including for Neoverse N1, recommends: Use discrete, non-writeback forms of load and store instructions while interleaving them. > > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build and UseSIMDForMemoryOps enabled. 
This pull request has now been integrated. Changeset: 6e006223 Author: Evgeny Astigeevich Committer: Volker Simonis URL: https://git.openjdk.java.net/jdk/commit/6e006223 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory Reviewed-by: simonis ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From adinn at openjdk.java.net Thu Nov 26 16:41:55 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 26 Nov 2020 16:41:55 GMT Subject: RFR: 8256359: AArch64: runtime/ReservedStack/ReservedStackTestCompiler.java fails In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 14:06:32 GMT, Andrew Haley wrote: > The problem is that the caller's expression SP is restored before throw_delayed_StackOverflowError() gets called. This is wrong: it leaves ESP pointing into the caller's frame, and this triggers an assert failure. The fix here delays updating ESP until we know we're not going to call throw_delayed_StackOverflowError(). This makes the logic the same as x86. Looks good. ------------- Marked as reviewed by adinn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1435 From vlivanov at openjdk.java.net Thu Nov 26 17:02:03 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 26 Nov 2020 17:02:03 GMT Subject: RFR: 8257164: Share LambdaForms for VH linkers/invokers. Message-ID: Introduce sharing of `LambdaForms` for `VarHandle` linkers and invokers. It reduces the number of LambdaForms needed at runtime. Testing: tier1-4 ------------- Commit messages: - Share LambdaForms for VH linkers/invokers. 
Changes: https://git.openjdk.java.net/jdk/pull/1455/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1455&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257164 Stats: 48 lines in 3 files changed: 21 ins; 10 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/1455.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1455/head:pull/1455 PR: https://git.openjdk.java.net/jdk/pull/1455 From vlivanov at openjdk.java.net Thu Nov 26 17:39:01 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 26 Nov 2020 17:39:01 GMT Subject: RFR: 8257057: C2: Improve safepoint processing during vector scalarization pass Message-ID: <0ZEXiwhY-ddrUID4J_az-SOz2MMPDjGcduqxD-WZa6c=.be138f38-db20-45fd-b299-22f2023b0645@github.com> Cast nodes (`CheckCastPP`/`CastPP`) hinders scalarization of vectors since they aren't taken into account when affected safepoints are enumerated. Proposed fix implements a reversed variant of `Node::uncast()` to find all safepoints which have vectors referenced from their debug info and then uses `Node::uncast()` when iterating over debug edges. It is safe to ignore cast nodes (even the ones which carry control dependency): `VectorBox` already contains the most precise type information and the vector value it represents is immutable. So, it's safe to replace a fully constructed boxed vector instance with the vector value it contains and rematerialize the equivalent box instance if deoptimization happens. Testing: - `jdk/incubator/vector` tests w/ different flag combinations (no flags, `-Xcomp`, `-XX:+DeoptimizeALot`); - tier1-4 ------------- Commit messages: - Improve safepoint processing during vector scalarization pass. 
Changes: https://git.openjdk.java.net/jdk/pull/1456/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1456&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257057 Stats: 20 lines in 1 file changed: 9 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/1456.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1456/head:pull/1456 PR: https://git.openjdk.java.net/jdk/pull/1456 From aph at openjdk.java.net Thu Nov 26 17:55:59 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 26 Nov 2020 17:55:59 GMT Subject: Integrated: 8256359: AArch64: runtime/ReservedStack/ReservedStackTestCompiler.java fails In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 14:06:32 GMT, Andrew Haley wrote: > The problem is that the caller's expression SP is restored before throw_delayed_StackOverflowError() gets called. This is wrong: it leaves ESP pointing into the caller's frame, and this triggers an assert failure. The fix here delays updating ESP until we know we're not going to call throw_delayed_StackOverflowError(). This makes the logic the same as x86. This pull request has now been integrated. Changeset: 4e43b288 Author: Andrew Haley URL: https://git.openjdk.java.net/jdk/commit/4e43b288 Stats: 8 lines in 2 files changed: 5 ins; 1 del; 2 mod 8256359: AArch64: runtime/ReservedStack/ReservedStackTestCompiler.java fails Reviewed-by: shade, adinn ------------- PR: https://git.openjdk.java.net/jdk/pull/1435 From vlivanov at openjdk.java.net Thu Nov 26 18:20:07 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 26 Nov 2020 18:20:07 GMT Subject: RFR: 8257165: C2: Improve box elimination for vector masks and shuffles Message-ID: <5nh33mpKNP1f9vJgyTuQoP4jCtyGXMwBYxGmard9SzE=.267e0496-ee91-43e8-9b93-832f4c663697@github.com> Introduce VectorMask/VectorShuffle-specific transformations to reduce reboxing by eliminating `VectorBox`/`VectorUnbox` pairs. 
It's a trivial transformation when the types on both ends perfectly match, but when type mismatch occurs there are additional steps needed (see `PhaseVector::expand_vunbox_node()` for more details on vector unboxing) . Testing: - `jdk/incubator/vector` tests w/ different flag combinations (no flags, `-Xcomp`, `-XX:+DeoptimizeALot`); - tier1-4 ------------- Commit messages: - Improve box elimination for vector masks and vector shuffles. Changes: https://git.openjdk.java.net/jdk/pull/1457/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1457&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257165 Stats: 103 lines in 3 files changed: 73 ins; 13 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/1457.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1457/head:pull/1457 PR: https://git.openjdk.java.net/jdk/pull/1457 From kvn at openjdk.java.net Thu Nov 26 19:52:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 26 Nov 2020 19:52:58 GMT Subject: RFR: 8256757: Incorrect MachCallRuntimeNode::ret_addr_offset() for CallLeafNoFP on x86_32 In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 12:22:31 GMT, Aleksey Shipilev wrote: > Importance: fixes one of `tier1` tests, including the runs in GH Actions. > > JDK-8254231 added new assert in `output.cpp`: > > assert(!is_mcall || (call_returns[block->_pre_order] <= (uint) current_offset)) > > Which verifies that the offset returned by MachCallNode::ret_addr_offset() (and sub-types) at matches the emitted code, to avoid potential conflicts between oop maps of different calls. > > It caught the failure running `compiler/intrinsics/string/TestStringLatin1IndexOfChar.java` on Linux x86_32. I think that happened because the test forced lower SSE settings. But more tests fail if you force with lower SSE settings with e.g. `TEST_VM_OPTS=-XX:UseSSE=1`. 
The real issue is `MachCallRuntimeNode::ret_addr_offset()` computing the offset incorrectly for `CallLeafNoFP`: the match rule for it does not include `FFree_Float_Stack_All`. With `UseSSE < 2`, `FFree_Float_Stack_All` becomes non-zero. > > See the definitions in `x86_32.ad`: > > enc_class FFree_Float_Stack_All %{ // Free_Float_Stack_All > MacroAssembler masm(&cbuf); > int start = masm.offset(); > if (UseSSE >= 2) { > if (VerifyFPU) { > masm.verify_FPU(0, "must be empty in SSE2+ mode"); > } > } else { // <---- taking this with UseSSE < 2 > // External c_calling_convention expects the FPU stack to be 'clean'. > // Compiled code leaves it dirty. Do cleanup now. > masm.empty_FPU_stack(); > } > if (sizeof_FFree_Float_Stack_All == -1) { > sizeof_FFree_Float_Stack_All = masm.offset() - start; > } else { > assert(masm.offset() - start == sizeof_FFree_Float_Stack_All, "wrong size"); > } > %} > > // Call runtime without safepoint > instruct CallLeafDirect(method meth) %{ > match(CallLeaf); > effect(USE meth); > > ins_cost(300); > format %{ "CALL_LEAF,runtime " %} > opcode(0xE8); /* E8 cd */ > ins_encode( pre_call_resets, > FFree_Float_Stack_All, > Java_To_Runtime( meth ), > Verify_FPU_For_Leaf, post_call_FPU ); > ins_pipe( pipe_slow ); > %} > > instruct CallLeafNoFPDirect(method meth) %{ > match(CallLeafNoFP); > effect(USE meth); > > ins_cost(300); > format %{ "CALL_LEAF_NOFP,runtime " %} > opcode(0xE8); /* E8 cd */ > ins_encode(pre_call_resets, Java_To_Runtime(meth)); > ins_pipe( pipe_slow ); > %} > > This does not affect `x86_64`, AFAICS. > > Additional testing: > - [x] A few known failing tests on Linux x86_32 > - [x] Linux x86_32 `tier1` > - [x] Linux x86_32 `tier1` with `-XX:UseSSE=1` Okay. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1452 From github.com+42899633+eastig at openjdk.java.net Thu Nov 26 19:58:59 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 26 Nov 2020 19:58:59 GMT Subject: Integrated: 8255351: Add detection for Graviton 2 CPUs In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 12:39:47 GMT, Evgeny Astigeevich wrote: > This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. > > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. This pull request has now been integrated. Changeset: 2215e5a4 Author: Evgeny Astigeevich Committer: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/2215e5a4 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod 8255351: Add detection for Graviton 2 CPUs Reviewed-by: simonis, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From kvn at openjdk.java.net Thu Nov 26 19:58:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 26 Nov 2020 19:58:58 GMT Subject: RFR: 8255351: Add detection for Graviton 2 CPUs [v2] In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 13:34:14 GMT, Evgeny Astigeevich wrote: >> This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. >> >> The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. > > Evgeny Astigeevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Marked as reviewed by kvn (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From kvn at openjdk.java.net Thu Nov 26 19:59:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 26 Nov 2020 19:59:58 GMT Subject: RFR: 8257057: C2: Improve safepoint processing during vector scalarization pass In-Reply-To: <0ZEXiwhY-ddrUID4J_az-SOz2MMPDjGcduqxD-WZa6c=.be138f38-db20-45fd-b299-22f2023b0645@github.com> References: <0ZEXiwhY-ddrUID4J_az-SOz2MMPDjGcduqxD-WZa6c=.be138f38-db20-45fd-b299-22f2023b0645@github.com> Message-ID: On Thu, 26 Nov 2020 13:14:26 GMT, Vladimir Ivanov wrote: > Cast nodes (`CheckCastPP`/`CastPP`) hinders scalarization of vectors since they aren't taken into account when affected safepoints are enumerated. > > Proposed fix implements a reversed variant of `Node::uncast()` to find all safepoints which have vectors referenced from their debug info and then uses `Node::uncast()` when iterating over debug edges. It is safe to ignore cast nodes (even the ones which carry control dependency): `VectorBox` already contains the most precise type information and the vector value it represents is immutable. So, it's safe to replace a fully constructed boxed vector instance with the vector value it contains and rematerialize the equivalent box instance if deoptimization happens. > > Testing: > - `jdk/incubator/vector` tests w/ different flag combinations (no flags, `-Xcomp`, `-XX:+DeoptimizeALot`); > - tier1-4 Marked as reviewed by kvn (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1456 From kvn at openjdk.java.net Thu Nov 26 20:01:54 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 26 Nov 2020 20:01:54 GMT Subject: RFR: 8257165: C2: Improve box elimination for vector masks and shuffles In-Reply-To: <5nh33mpKNP1f9vJgyTuQoP4jCtyGXMwBYxGmard9SzE=.267e0496-ee91-43e8-9b93-832f4c663697@github.com> References: <5nh33mpKNP1f9vJgyTuQoP4jCtyGXMwBYxGmard9SzE=.267e0496-ee91-43e8-9b93-832f4c663697@github.com> Message-ID: On Thu, 26 Nov 2020 13:17:05 GMT, Vladimir Ivanov wrote: > Introduce VectorMask/VectorShuffle-specific transformations to reduce reboxing by eliminating `VectorBox`/`VectorUnbox` pairs. > > It's a trivial transformation when the types on both ends perfectly match, but when type mismatch occurs there are additional steps needed (see `PhaseVector::expand_vunbox_node()` for more details on vector unboxing). > > Testing: > - `jdk/incubator/vector` tests w/ different flag combinations (no flags, `-Xcomp`, `-XX:+DeoptimizeALot`); > - tier1-4 Okay. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1457 From rcastanedalo at openjdk.java.net Thu Nov 26 21:43:00 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Thu, 26 Nov 2020 21:43:00 GMT Subject: RFR: 8257146: C2: extend the scope of StressGCM Message-ID: Extend the scope of StressGCM by allowing it to move instructions into basic blocks with worse frequency. This should improve StressGCM's ability to expose bugs where C2 relies on GCM heuristics for correctness. Tested with StressGCM enabled on `hs-tier1-3` and a single repetition, the following test cases fail on `linux-aarch64-debug` and require further investigation: - `serviceability/sa/ClhsdbJstackXcompStress.java` - `compiler/c2/cr6663848/Tester.java` Also tested with StressGCM disabled (default) on `hs-tier1-3`, all tests pass.
------------- Commit messages: - 8257146: C2: extend the scope of StressGCM Changes: https://git.openjdk.java.net/jdk/pull/1469/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1469&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257146 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1469.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1469/head:pull/1469 PR: https://git.openjdk.java.net/jdk/pull/1469 From rcastanedalo at openjdk.java.net Thu Nov 26 21:43:00 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Thu, 26 Nov 2020 21:43:00 GMT Subject: RFR: 8257146: C2: extend the scope of StressGCM In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 20:36:53 GMT, Roberto Castañeda Lozano wrote: > Extend the scope of StressGCM by allowing it to move instructions into basic blocks with worse frequency. This should improve StressGCM's ability to expose bugs where C2 relies on GCM heuristics for correctness. > > Tested with StressGCM enabled on `hs-tier1-3` and a single repetition, the following test cases fail on `linux-aarch64-debug` and require further investigation: > - `serviceability/sa/ClhsdbJstackXcompStress.java` > - `compiler/c2/cr6663848/Tester.java` > > Also tested with StressGCM disabled (default) on `hs-tier1-3`, all tests pass. Extend the scope of StressGCM by allowing it to move instructions into basic blocks with worse frequency. ------------- PR: https://git.openjdk.java.net/jdk/pull/1469 From vlivanov at openjdk.java.net Thu Nov 26 21:59:03 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 26 Nov 2020 21:59:03 GMT Subject: RFR: 8257189: Handle concurrent updates of MH.form better Message-ID: <1-iH7fvNLXAo2FzbAihuQV8PbDYiZsIHqZwPIcb6HxE=.d89f60fd-aad6-45bb-8704-363e299f96d8@github.com> Concurrent updates may lead to redundant LambdaForms created and unnecessary class loading when those are compiled.
Most notably, it severely affects MethodHandle customization: when a MethodHandle is called from multiple threads, every thread starts customization which takes enough time for other threads to join, but only one of those customizations will be picked. Coordination between threads requesting the updates and letting a single thread proceed avoids the aforementioned problem. Moreover, there's no need to wait until the update in-flight is over: all other threads (except the one performing the update) can just proceed with the invocation using the existing MH.form. Testing: - manually monitored the behavior on a stress test from [JDK-8252049](https://bugs.openjdk.java.net/browse/JDK-8252049) - tier1-4 ------------- Commit messages: - Improve concurrent LF customization Changes: https://git.openjdk.java.net/jdk/pull/1472/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1472&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257189 Stats: 90 lines in 5 files changed: 38 ins; 20 del; 32 mod Patch: https://git.openjdk.java.net/jdk/pull/1472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1472/head:pull/1472 PR: https://git.openjdk.java.net/jdk/pull/1472 From xliu at openjdk.java.net Fri Nov 27 00:14:02 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 27 Nov 2020 00:14:02 GMT Subject: RFR: 8257190: simplify PhaseIdealLoop constructors Message-ID: 8257190: simplify PhaseIdealLoop constructors ------------- Commit messages: - 8257190: simplify PhaseIdealLoop constructors Changes: https://git.openjdk.java.net/jdk/pull/1473/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1473&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257190 Stats: 19 lines in 1 file changed: 4 ins; 11 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1473.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1473/head:pull/1473 PR: https://git.openjdk.java.net/jdk/pull/1473 From ngasson at openjdk.java.net Fri Nov 27 04:01:05 2020 
From: ngasson at openjdk.java.net (Nick Gasson) Date: Fri, 27 Nov 2020 04:01:05 GMT Subject: RFR: 8257143: Enable JVMCI code installation tests on AArch64 Message-ID: This set of jtreg tests test JVMCI code installation independently of Graal. Currently they only run on x86 as the minimal assembler required is only implemented for that platform. This patch implements the TestAssembler for AArch64 to ensure JVMCI test coverage even if the Graal embedded in OpenJDK is disabled/removed. ------------- Commit messages: - 8257143: Enable JVMCI code installation tests on AArch64 Changes: https://git.openjdk.java.net/jdk/pull/1475/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1475&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257143 Stats: 573 lines in 10 files changed: 556 ins; 0 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/1475.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1475/head:pull/1475 PR: https://git.openjdk.java.net/jdk/pull/1475 From ngasson at openjdk.java.net Fri Nov 27 04:36:10 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Fri, 27 Nov 2020 04:36:10 GMT Subject: RFR: 8252684: Move the AArch64 assember test under test/hotspot/gtest Message-ID: There was a question on the SVE review thread on build-dev [1] a few months ago about why there is a Python script and test code under src/hotspot/cpu/aarch64. The script generates code to check the Assembler instruction encodings against those of the system assembler. The test runs every time the debug VM is started. AFAIK there's no precedent in the rest of Hotspot for having functional tests that run on startup, and we have the existing gtest framework for testing internal C++ modules. This patch (perhaps more of an RFC) moves the assembler test under test/hotspot/gtest/aarch64. The test will now run in tier1, including for release builds. The downside is that debug builds won't catch assembler encoding errors immediately on startup. 
Tested by injecting an error in one of the instruction encodings and verifying `make test TEST="gtest"` fails. [1] https://mail.openjdk.java.net/pipermail/build-dev/2020-August/028048.html ------------- Commit messages: - 8252684: Move the AArch64 assember test under test/hotspot/gtest Changes: https://git.openjdk.java.net/jdk/pull/1476/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1476&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252684 Stats: 2427 lines in 5 files changed: 1196 ins; 1227 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1476.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1476/head:pull/1476 PR: https://git.openjdk.java.net/jdk/pull/1476 From xliu at openjdk.java.net Fri Nov 27 05:12:56 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 27 Nov 2020 05:12:56 GMT Subject: RFR: 8257190: simplify PhaseIdealLoop constructors In-Reply-To: References: Message-ID: <6FXaxwyx-wMtW0kfo7JBsACHgoTbwvyeJtE7zqc0IhM=.f94ee017-7526-4383-952e-86b1394b6fa5@github.com> On Fri, 27 Nov 2020 00:07:53 GMT, Xin Liu wrote: > 8257190: simplify PhaseIdealLoop constructors a. PhaseIdealLoop( PhaseIterGVN &igvn) b. PhaseIdealLoop(PhaseIterGVN &igvn, const PhaseIdealLoop *verify_me) c. PhaseIdealLoop(PhaseIterGVN &igvn, LoopOptsMode mode) I propose 3 changes to simplify them. 1. add assertion in the constructor c. C2 shouldn't use mode = LoopOptsVerify for it. 2. merge a and b into one constructor. 3. make the merged verification ctor only for debug builds. ------------- PR: https://git.openjdk.java.net/jdk/pull/1473 From xliu at openjdk.java.net Fri Nov 27 06:15:01 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 27 Nov 2020 06:15:01 GMT Subject: RFR: fake commit: make some change which may conflict with JDK-8256858 Message-ID: please ignore this PR. skara CLI needs to cc a real maillist, even for a draft PR. 
------------- Commit messages: - fake commit: make some change which may conflict with JDK-8256858 Changes: https://git.openjdk.java.net/jdk/pull/1477/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1477&range=00 Stats: 36 lines in 2 files changed: 3 ins; 9 del; 24 mod Patch: https://git.openjdk.java.net/jdk/pull/1477.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1477/head:pull/1477 PR: https://git.openjdk.java.net/jdk/pull/1477 From github.com+10233373+jhe33 at openjdk.java.net Fri Nov 27 06:26:56 2020 From: github.com+10233373+jhe33 at openjdk.java.net (Jie He) Date: Fri, 27 Nov 2020 06:26:56 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com> References: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com> Message-ID: On Tue, 24 Nov 2020 10:08:37 GMT, Andrew Haley wrote: >> This patch fixes 27%-48% performance regressions of small arraycopies on Graviton2 (Neoverse N1) when UseSIMDForMemoryOps is enabled. For such copies ldpq/stpq are used instead of ld4/st4. >> This follows what the Arm Optimization Guide, including for Neoverse N1, recommends: Use discrete, non-writeback forms of load and store instructions while interleaving them. >> >> The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build and UseSIMDForMemoryOps enabled. > > I think we need also some non-Neoverse N1 numbers. We need to keep in mind that this software runs on many implementations. I'll have a look at some others. Hi @theRealAph, I also have a patch to fix the unaligned copy small memory (< 16 bytes) when copy a big chunk of memory (> 96 bytes) in this function copy_memory_small(), but it couldn't impact the performance too much, I'm not sure if it is worth pushing to upstream. please refer to [1]. 1. 
[JBS-8149448](https://bugs.openjdk.java.net/browse/JDK-8149448) ------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From shade at openjdk.java.net Fri Nov 27 06:51:06 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 27 Nov 2020 06:51:06 GMT Subject: Integrated: 8256757: Incorrect MachCallRuntimeNode::ret_addr_offset() for CallLeafNoFP on x86_32 In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 12:22:31 GMT, Aleksey Shipilev wrote: > Importance: fixes one of `tier1` tests, including the runs in GH Actions. > > JDK-8254231 added new assert in `output.cpp`: > > assert(!is_mcall || (call_returns[block->_pre_order] <= (uint) current_offset)) > > Which verifies that the offset returned by MachCallNode::ret_addr_offset() (and sub-types) at matches the emitted code, to avoid potential conflicts between oop maps of different calls. > > It caught the failure running `compiler/intrinsics/string/TestStringLatin1IndexOfChar.java` on Linux x86_32. I think that happened because the test forced lower SSE settings. But more tests fail if you force with lower SSE settings with e.g. `TEST_VM_OPTS=-XX:UseSSE=1`. The real issue is `MachCallRuntimeNode::ret_addr_offset()` computing the offset incorrectly for `CallLeafNoFP`: the match rule for it does not include `FFree_Float_Stack_All`. With `UseSSE < 2`, `FFree_Float_Stack_All` becomes non-zero. > > See the definitions in `x86_32.ad`: > > enc_class FFree_Float_Stack_All %{ // Free_Float_Stack_All > MacroAssembler masm(&cbuf); > int start = masm.offset(); > if (UseSSE >= 2) { > if (VerifyFPU) { > masm.verify_FPU(0, "must be empty in SSE2+ mode"); > } > } else { // <---- taking this with UseSSE < 2 > // External c_calling_convention expects the FPU stack to be 'clean'. > // Compiled code leaves it dirty. Do cleanup now. 
> masm.empty_FPU_stack(); > } > if (sizeof_FFree_Float_Stack_All == -1) { > sizeof_FFree_Float_Stack_All = masm.offset() - start; > } else { > assert(masm.offset() - start == sizeof_FFree_Float_Stack_All, "wrong size"); > } > %} > > // Call runtime without safepoint > instruct CallLeafDirect(method meth) %{ > match(CallLeaf); > effect(USE meth); > > ins_cost(300); > format %{ "CALL_LEAF,runtime " %} > opcode(0xE8); /* E8 cd */ > ins_encode( pre_call_resets, > FFree_Float_Stack_All, > Java_To_Runtime( meth ), > Verify_FPU_For_Leaf, post_call_FPU ); > ins_pipe( pipe_slow ); > %} > > instruct CallLeafNoFPDirect(method meth) %{ > match(CallLeafNoFP); > effect(USE meth); > > ins_cost(300); > format %{ "CALL_LEAF_NOFP,runtime " %} > opcode(0xE8); /* E8 cd */ > ins_encode(pre_call_resets, Java_To_Runtime(meth)); > ins_pipe( pipe_slow ); > %} > > This does not affect `x86_64`, AFAICS. > > Additional testing: > - [x] A few known failing tests on Linux x86_32 > - [x] Linux x86_32 `tier1` > - [x] Linux x86_32 `tier1` with `-XX:UseSSE=1` This pull request has now been integrated. Changeset: 9a468d85 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/9a468d85 Stats: 9 lines in 5 files changed: 7 ins; 0 del; 2 mod 8256757: Incorrect MachCallRuntimeNode::ret_addr_offset() for CallLeafNoFP on x86_32 Reviewed-by: jiefu, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1452 From xliu at openjdk.java.net Fri Nov 27 07:11:09 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 27 Nov 2020 07:11:09 GMT Subject: RFR: fake commit: make some change which may conflict with JDK-8256858 [v2] In-Reply-To: References: Message-ID: > please ignore this PR. skara CLI needs to cc a real maillist, even for a draft PR. 
Xin Liu has updated the pull request incrementally with one additional commit since the last revision: fake commit: make some change which may conflict with JDK-8256858 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1477/files - new: https://git.openjdk.java.net/jdk/pull/1477/files/624978b2..6fd0b03e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1477&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1477&range=00-01 Stats: 10 lines in 2 files changed: 7 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1477.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1477/head:pull/1477 PR: https://git.openjdk.java.net/jdk/pull/1477 From xliu at openjdk.java.net Fri Nov 27 07:17:21 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 27 Nov 2020 07:17:21 GMT Subject: RFR: fake commit: make some change which may conflict with JDK-8256858 [v3] In-Reply-To: References: Message-ID: > please ignore this PR. skara CLI needs to cc a real maillist, even for a draft PR. Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into branch_for_fake_pr - fake commit: make some change which may conflict with JDK-8256858 - fake commit: make some change which may conflict with JDK-8256858 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1477/files - new: https://git.openjdk.java.net/jdk/pull/1477/files/6fd0b03e..66e99e09 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1477&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1477&range=01-02 Stats: 10619 lines in 300 files changed: 3722 ins; 1301 del; 5596 mod Patch: https://git.openjdk.java.net/jdk/pull/1477.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1477/head:pull/1477 PR: https://git.openjdk.java.net/jdk/pull/1477 From xliu at openjdk.java.net Fri Nov 27 07:22:57 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 27 Nov 2020 07:22:57 GMT Subject: Withdrawn: fake commit: make some change which may conflict with JDK-8256858 In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 06:00:13 GMT, Xin Liu wrote: > please ignore this PR. skara CLI needs to cc a real maillist, even for a draft PR. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1477 From roland at openjdk.java.net Fri Nov 27 08:15:16 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 27 Nov 2020 08:15:16 GMT Subject: RFR: 8256655: rework long counted loop handling [v4] In-Reply-To: References: Message-ID: > Currently the transformation of a long counted loop into a loop nest > with an inner int counted loop is performed in 2 steps: > > 1- recognize the counted loop shape and build the loop nest > 2- transform the inner loop into a counted loop > > I propose changing this to a 3 steps process: > > 1- recognize the counted loop shape (transform the loop into a LongCountedLoop) > 2- build the loop nest > 3- transform the inner loop into a counted loop > > The benefits are: > > - the logic is cleaner because step 1 and 2 are now separated > - Simple optimizations (loop iv type, empty loop elimination, parallel > iv) can be implemented for LongCountedLoop by refactoring existing > code > > 1- above is achieved by refactoring the > PhaseIdealLoop::is_counted_loop() so it takes an extra parameter (the > kind of counted loop, int or long). > > 2- is the existing loop nest construction logic. But now that it takes > a LongCountedLoop as input, the shape of the loop is known to be that > of a canonicalized counted loop. As a result, the loop nest > construction is simpler. > > This change also refactors PhiNode::Value() so that it works for both > CountedLoop and LongCountedLoop. > > I also changed ConvL2INode to be a TypeNode (ConvI2LNode is a type > node) and: > > jlong init_p = (jlong)init_t->_lo + stride_con; > if (init_p > (jlong)max_jint || init_p > (jlong)limit_t->_hi) > return false; // cyclic loop or this loop trips only once > > to: > > if (init_t->lo_as_long() > max_signed_integer(iv_bt) - stride_con) { > > because if the loop has a single iteration transforming it to a > CountedLoop should allow the backedge to be optimized out. 
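Editor's sketch of the overflow check quoted above, not HotSpot code. The replacement condition compares the lower bound of the init value against `max - stride` instead of computing `init + stride` and checking the sum, so the test itself cannot overflow. The names `IvType`, `max_signed` and `first_step_overflows` are hypothetical, and the sketch assumes a positive stride, which the quoted snippet does not state explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the induction variable's basic type (iv_bt).
enum class IvType { Int, Long };

// Largest signed value representable in the induction variable type.
// Plays the role of max_signed_integer(iv_bt) in the quoted snippet; this
// definition is an assumption made for the sketch.
constexpr int64_t max_signed(IvType t) {
  return t == IvType::Int ? std::numeric_limits<int32_t>::max()
                          : std::numeric_limits<int64_t>::max();
}

// True when init_lo + stride would exceed the maximum of the iv type.
// Assumes stride > 0, so max_signed(t) - stride cannot overflow int64_t.
constexpr bool first_step_overflows(int64_t init_lo, int64_t stride, IvType t) {
  return init_lo > max_signed(t) - stride;
}
```

Because the comparison is parameterized on the iv type, the same expression can serve both the int and the long flavour of the counted loop, which appears to be the point of the refactoring described in the message.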
Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - test failure - Merge branch 'master' into JDK-8256655 - assert fix - Merge branch 'master' into JDK-8256655 - build fixes - fix trailing whitespace - fix comment - long counted loop refactoring ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1316/files - new: https://git.openjdk.java.net/jdk/pull/1316/files/be681d51..bd4570a7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1316&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1316&range=02-03 Stats: 77806 lines in 565 files changed: 72811 ins; 3179 del; 1816 mod Patch: https://git.openjdk.java.net/jdk/pull/1316.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1316/head:pull/1316 PR: https://git.openjdk.java.net/jdk/pull/1316 From roland at openjdk.java.net Fri Nov 27 08:17:57 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 27 Nov 2020 08:17:57 GMT Subject: RFR: 8256655: rework long counted loop handling [v4] In-Reply-To: References: <3VkYxRjUxrbs1sUDk3P8p04iBK7OU5JrDYdzA7MqRnY=.f985d028-4c0e-4ed5-9fe3-b6ea5c0fe098@github.com> <9m4nY3umpkpmP8bsWAnCWs6GJmkH3QLo0T6oVURrTCg=.2b0ec719-d311-426d-8130-f866cc078fdd@github.com> Message-ID: On Thu, 26 Nov 2020 07:26:59 GMT, Tobias Hartmann wrote: > I ran this through some more stress testing with different values of `StressLongCountedLoop`. 
It all looks good except for the following crash with `compiler/c2/Test8217359.java`: > > `assert(!in->is_CFG()) failed: CFG Node with no controlling input?` > > ``` > Current CompileTask: > C2: 295 2 % compiler.c2.Test8217359::test @ 43 (164 bytes) > > Stack: [0x00007f4abcffc000,0x00007f4abd0fd000], sp=0x00007f4abd0f7490, free space=1005k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x1324824] PhaseIdealLoop::build_loop_early(VectorSet&, Node_List&, Node_Stack&)+0x6e4 > V [libjvm.so+0x132ccca] PhaseIdealLoop::build_and_optimize(LoopOptsMode)+0x4aa > V [libjvm.so+0xa16698] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x328 > V [libjvm.so+0xa12955] Compile::Optimize()+0x13c5 > V [libjvm.so+0xa147c8] Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x1838 > V [libjvm.so+0x846cec] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1dc > V [libjvm.so+0xa24a28] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe08 > V [libjvm.so+0xa25578] CompileBroker::compiler_thread_loop()+0x5a8 > V [libjvm.so+0x18ab7c6] JavaThread::thread_main_inner()+0x256 > V [libjvm.so+0x18b2150] Thread::call_run()+0x100 > V [libjvm.so+0x15950f6] thread_native_entry(Thread*)+0x116 > ``` > > Happens with `-XX:+UseParallelGC -XX:+UseNUMA -XX:StressLongCountedLoop=200000000`. Let me know if you need more info to reproduce. Thanks for running tests. I pushed a fix for that one. ------------- PR: https://git.openjdk.java.net/jdk/pull/1316 From vlivanov at openjdk.java.net Fri Nov 27 08:50:06 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 27 Nov 2020 08:50:06 GMT Subject: RFR: 8252049: Native memory leak in ciMethodData ctor Message-ID: `ciMethodData` embeds `MethodData` to snapshot a state from the original `MethodData` instance. But `MethodData` embeds a `Mutex` which allocates a platform-specific implementation on C-heap. 
The `Mutex` is overwritten with `0`s, but the resources aren't deallocated, so the leak occurs. Proposed fix is to run Mutex destructor right away. Initially, I thought about switching to `Mutex*`, but then I found that Coleen already tried that and observed a performance regression [1]. So, for now I chose the conservative approach. In the longer term, I would consider replacing `MethodData::_extra_data_lock` with a lock-free scheme. Having a lock-per-MDO looks kind of excessive. Testing: - [x] verified that no memory leak observed with the reported test - [x] tier1-4 [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-October/039783.html ------------- Commit messages: - Fix native memory leak in ciMethodData ctor Changes: https://git.openjdk.java.net/jdk/pull/1478/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1478&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252049 Stats: 20 lines in 4 files changed: 15 ins; 3 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1478.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1478/head:pull/1478 PR: https://git.openjdk.java.net/jdk/pull/1478 From chagedorn at openjdk.java.net Fri Nov 27 08:50:56 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 27 Nov 2020 08:50:56 GMT Subject: RFR: 8253644: C2: assert(skeleton_predicate_has_opaque(iff)) failed: unexpected In-Reply-To: References: <3WOtQrecrvY6WH3zeLq5xpVVzlhWV4hZ741wGWEayLg=.dbb57162-35c2-4bde-91e2-bdc9f29faaf3@github.com> Message-ID: On Thu, 26 Nov 2020 12:35:48 GMT, Roland Westrelin wrote: > > Some additional optimizations are then applied such that the fast loop no longer has any backedge/path to the parent loop while the slow loop still has. > > Does one round of loop opts stop here... > and another starts below? > > > As a result, the loop tree building algorithm only recognizes the slow loop as child while the fast loop is not. 
The fast loop is treated as a separate loop on the same level as the parent loop: > > > ``` > > [N459, but the loop could actually be removed in the meantime but the skeleton predicates are still there] > > ``` > > I don't understand that part. Is N459 no longer a loop? Yes, exactly. So, we unswitch in one round A. Then we have the first loop tree shown with the slow and fast loop at the start of the next round B. In this round B, we apply Split If that is then able to remove the [`IfNode` (2)](https://github.com/chhagedorn/jdk/blob/cfc7c941ddefd716c9f07bf7e8f4824f00bb7e9b/test/hotspot/jtreg/compiler/loopopts/TestUnswitchCloneSkeletonPredicates.java#L101). We always take the true projection ( `x == 100` is true) which means that we immediately exit the fast loop N459 on the first iteration. The `CountedLoopNode` is therefore removed. The second loop tree shown is at the start of the next round C where we then finally apply the peeling and pre/main/post steps which lets the assertion fail. This description of N459 is indeed a bit misleading. When afterwards talking about the fast loop I just mean where the fast loop was before to keep things simpler. ------------- PR: https://git.openjdk.java.net/jdk/pull/1448 From chagedorn at openjdk.java.net Fri Nov 27 08:55:10 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 27 Nov 2020 08:55:10 GMT Subject: RFR: 8253644: C2: assert(skeleton_predicate_has_opaque(iff)) failed: unexpected [v2] In-Reply-To: <3WOtQrecrvY6WH3zeLq5xpVVzlhWV4hZ741wGWEayLg=.dbb57162-35c2-4bde-91e2-bdc9f29faaf3@github.com> References: <3WOtQrecrvY6WH3zeLq5xpVVzlhWV4hZ741wGWEayLg=.dbb57162-35c2-4bde-91e2-bdc9f29faaf3@github.com> Message-ID: > `test1()` fails while creating pre/main/post loops when copying skeleton predicates to the main loop. The problem is that we find phi nodes when following a skeleton `Opaque4` bool node, when trying to find the `OpaqueLoopInit` and `OpaqueLoopStride` nodes. 
This is unexpected and lets the assertion fail. This happens due to the following reason: > > An inner loop of a nested loop is first unswitched and the skeleton predicates are copied to the slow and fast loop by just creating a new `If` node that share the same `Opaque4` node: > https://github.com/openjdk/jdk/blob/973255c469d794afe8ee74b24ddb5048bfcaadf7/src/hotspot/share/opto/loopPredicate.cpp#L268-L273 > > The loop tree building algorithm recognizes both loops as children of the parent loop: > Loop: N0/N0 has_sfpt > Loop: N338/N314 limit_check profile_predicated predicated counted [0,100),+1 (2 iters) has_sfpt > Loop: N459/N458 profile_predicated predicated counted [0,10000),+1 (5271 iters) has_sfpt (slow loop) > Loop: N343/N267 profile_predicated predicated counted [0,10000),+1 (5271 iters) has_sfpt (fast loop) > > Some additional optimizations are then applied such that the fast loop no longer has any backedge/path to the parent loop while the slow loop still has. As a result, the loop tree building algorithm only recognizes the slow loop as child while the fast loop is not. The fast loop is treated as a separate loop on the same level as the parent loop: > Loop: N0/N0 has_sfpt > Loop: N338/N314 limit_check profile_predicated predicated counted [0,100),+1 (2 iters) has_sfpt > [N459, but the loop could actually be removed in the meantime but the skeleton predicates are still there] > Loop: N343/N267 profile_predicated predicated counted [0,10000),+1 (5274 iters) has_sfpt > > Now the original parent loop (N338) gets peeled. The fast and the slow loop still both share skeleton `Opaque4` bool nodes with all their inputs nodes up to and including the `OpaqueLoopInit/Stride` nodes. Let's look at one of the skeleton `If` nodes for the fast loop that uses such a `Opaque4` node. The skeleton `If` is no longer part of the original parent loop and is therefore not peeled. 
But now we need some phi nodes to select the correct nodes either from the peeled iteration or from N338 for this skeleton `If` of the fast loop. This is done in `PhaseIdealLoop::clone_iff()` which creates a new `Opaque4` node together with new `Bool` and `Cmp` nodes and then inserts some phi nodes to do the selection. > > When afterwards creating pre/main/post loops for the fast loop (N343), which is no longer a child, we find these phi nodes on the path to the `OpaqueLoopInit/Stride` nodes, which makes the assertion fail. A more detailed explanation of why this happens can be found at `test1()` in [TestUnswitchCloneSkeletonPredicates](https://github.com/chhagedorn/jdk/blob/cfc7c941ddefd716c9f07bf7e8f4824f00bb7e9b/test/hotspot/jtreg/compiler/loopopts/TestUnswitchCloneSkeletonPredicates.java#L51). > > I propose to copy the skeleton predicates to the unswitched loops in the same way as we copy the skeleton predicates to the main loop by cloning all the nodes on the path to the `OpaqueLoopInit/Stride` nodes with a small adaptation: We should also copy the `OpaqueLoopInit/Stride` nodes and we should keep the uncommon traps because we only want to replace them by `Halt` nodes once we create pre/main/post loops. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Add back accidentally removed execution of test2-5.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1448/files - new: https://git.openjdk.java.net/jdk/pull/1448/files/0e7c30b8..c831c121 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1448&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1448&range=00-01 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1448.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1448/head:pull/1448 PR: https://git.openjdk.java.net/jdk/pull/1448 From chagedorn at openjdk.java.net Fri Nov 27 08:59:59 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 27 Nov 2020 08:59:59 GMT Subject: RFR: 8256807: C2: Not marking stores correctly as mismatched in string opts In-Reply-To: References: <3EaFufXDKEon7WuGqHOTU-rN7sbVLVjxhWRz0fJQrNM=.09989674-89b4-406d-b182-f2da47d1bd04@github.com> Message-ID: On Thu, 26 Nov 2020 11:39:30 GMT, Roland Westrelin wrote: >> The observed assertion failure can be traced back to not marking stores correctly as mismatched when creating them in string opts. [JDK-8140390](https://bugs.openjdk.java.net/browse/JDK-8140390) actually tried to fix that but accidentally set `require_atomic_access` to true instead of `mismatched`. This change fixes that. >> >> Thanks, >> Christian > > Looks good to me. @rwestrel @TobiHartmann Thanks for your reviews! I added the label. ------------- PR: https://git.openjdk.java.net/jdk/pull/1450 From roland at openjdk.java.net Fri Nov 27 09:03:00 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 27 Nov 2020 09:03:00 GMT Subject: RFR: 8253644: C2: assert(skeleton_predicate_has_opaque(iff)) failed: unexpected In-Reply-To: References: <3WOtQrecrvY6WH3zeLq5xpVVzlhWV4hZ741wGWEayLg=.dbb57162-35c2-4bde-91e2-bdc9f29faaf3@github.com> Message-ID: On Fri, 27 Nov 2020 08:48:29 GMT, Christian Hagedorn wrote: > Yes, exactly. So, we unswitch in one round A. 
Then we have the first loop tree shown with the slow and fast loop at the start of the next round B. In this round B, we apply Split If that is then able to remove the [`IfNode` (2)](https://github.com/chhagedorn/jdk/blob/cfc7c941ddefd716c9f07bf7e8f4824f00bb7e9b/test/hotspot/jtreg/compiler/loopopts/TestUnswitchCloneSkeletonPredicates.java#L101). We always take the true projection ( `x == 100` is true) which means that we immediately exit the fast loop N459 on the first iteration. The `CountedLoopNode` is therefore removed. The second loop tree shown is at the start of the next round C where we then finally apply the peeling and pre/main/post steps which lets the assertion fail. This description of N459 is indeed a bit misleading. When afterwards talking about the fast loop I just mean where the fast loop was before to keep things simpler. Thanks for the clarification. On round C, N459 is not part of the loop tree anymore then? If that's the case, shouldn't we remove useless skeleton predicates because they don't guard a loop anymore (PhaseIdealLoop::eliminate_useless_predicates())? ------------- PR: https://git.openjdk.java.net/jdk/pull/1448 From dholmes at openjdk.java.net Fri Nov 27 09:14:02 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 27 Nov 2020 09:14:02 GMT Subject: RFR: 8252049: Native memory leak in ciMethodData ctor In-Reply-To: References: Message-ID: <5jmPyMo3L6Z7ESUTxDCH_PMuOhEVePoy2_xn5lFswxc=.b76a5438-1822-4c39-a44b-bbbbe3bcfe52@github.com> On Fri, 27 Nov 2020 08:05:09 GMT, Vladimir Ivanov wrote: > `ciMethodData` embeds `MethodData` to snapshot a state from the original `MethodData` instance. > But `MethodData` embeds a `Mutex` which allocates a platform-specific implementation on C-heap. > The `Mutex` is overwritten with `0`s, but the resources aren't deallocated, so the leak occurs. > > Proposed fix is to run Mutex destructor right away.
> > Initially, I thought about switching to `Mutex*`, but then I found that Coleen already tried that and observed a performance regression [1]. So, for now I chose the conservative approach. > > In the longer term, I would consider replacing `MethodData::_extra_data_lock` with a lock-free scheme. Having a lock-per-MDO looks kind of excessive. > > Testing: > - [x] verified that no memory leak observed with the reported test > - [x] tier1-4 > > [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-October/039783.html I can't say this looks pretty. It is also very hard to be able to determine when the mutex was last used and whether this running of the destructor is guaranteed to be safe. What is the lifecycle of the original MethodData? ------------- PR: https://git.openjdk.java.net/jdk/pull/1478 From luhenry at openjdk.java.net Fri Nov 27 09:30:58 2020 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Fri, 27 Nov 2020 09:30:58 GMT Subject: RFR: 8252684: Move the AArch64 assember test under test/hotspot/gtest In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 04:31:46 GMT, Nick Gasson wrote: > There was a question on the SVE review thread on build-dev [1] a few > months ago about why there is a Python script and test code under > src/hotspot/cpu/aarch64. The script generates code to check the > Assembler instruction encodings against those of the system assembler. > The test runs every time the debug VM is started. > > AFAIK there's no precedent in the rest of Hotspot for having functional > tests that run on startup, and we have the existing gtest framework for > testing internal C++ modules. This patch (perhaps more of an RFC) moves > the assembler test under test/hotspot/gtest/aarch64. > > The test will now run in tier1, including for release builds. The > downside is that debug builds won't catch assembler encoding errors > immediately on startup. 
> > Tested by injecting an error in one of the instruction encodings and > verifying `make test TEST="gtest"` fails. > > [1] https://mail.openjdk.java.net/pipermail/build-dev/2020-August/028048.html From having worked on the Windows and macOS ports to AArch64, I concur that moving the test to the already dedicated gtest test suite makes more sense. ------------- PR: https://git.openjdk.java.net/jdk/pull/1476 From ngasson at openjdk.java.net Fri Nov 27 09:54:56 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Fri, 27 Nov 2020 09:54:56 GMT Subject: RFR: 8252684: Move the AArch64 assember test under test/hotspot/gtest In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 09:28:30 GMT, Ludovic Henry wrote: > > I now wonder if we could even go a step further and have the Python script generate the entirety of `test/hotspot/gtest/aarch64/test_assembler_aarch64.cpp`. It would be about adding just the copyright, the header `#include`s, and some glue code. That would make it easier to regenerate the file instead of depending on a copy-paste. I suppose it doesn't need to generate the whole file: the Python script can just write its output to a file that gets `#include`d into `test_assembler_aarch64.cpp`. ------------- PR: https://git.openjdk.java.net/jdk/pull/1476 From aph at redhat.com Fri Nov 27 10:00:24 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 27 Nov 2020 10:00:24 +0000 Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: References: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com> Message-ID: Hi, On 11/27/20 6:26 AM, Jie He wrote: > I also have a patch to fix the unaligned copy small memory (< 16 bytes) when copy a big chunk of memory (> 96 bytes) in this function copy_memory_small(), but it couldn't impact the performance too much, I'm not sure if it is worth pushing to upstream. please refer to [1]. > > 1.
[JBS-8149448](https://bugs.openjdk.java.net/browse/JDK-8149448) Thank you. From what I remember, that was about optimizing for machines with poor performance for misaligned loads. As far as I understand it, AArch64 manufacturers have seen the error of their ways, understand that if they want to compete with Intel they have to fix unaligned memory performance, and have mostly done so. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at openjdk.java.net Fri Nov 27 10:17:56 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 27 Nov 2020 10:17:56 GMT Subject: RFR: 8252684: Move the AArch64 assember test under test/hotspot/gtest In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 09:51:53 GMT, Nick Gasson wrote: > I suppose it doesn't need to generate the whole file: the Python script can just write its output to a file that gets `#include`d into `test_assembler_aarch64.cpp`. Indeed. I did wonder about regenerating everything by running the Python script at test time, but reasoned that doing so would introduce a dependency on build tools ("as", for example) that I didn't want to have. But moving the output of aarch64_asmtest.py into a separate file is a good idea, if you like. I'm minded to just approve this patch as it is now.
------------- PR: https://git.openjdk.java.net/jdk/pull/1476 From vlivanov at openjdk.java.net Fri Nov 27 10:24:57 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 27 Nov 2020 10:24:57 GMT Subject: RFR: 8252049: Native memory leak in ciMethodData ctor In-Reply-To: <5jmPyMo3L6Z7ESUTxDCH_PMuOhEVePoy2_xn5lFswxc=.b76a5438-1822-4c39-a44b-bbbbe3bcfe52@github.com> References: <5jmPyMo3L6Z7ESUTxDCH_PMuOhEVePoy2_xn5lFswxc=.b76a5438-1822-4c39-a44b-bbbbe3bcfe52@github.com> Message-ID: On Fri, 27 Nov 2020 09:11:23 GMT, David Holmes wrote: >> `ciMethodData` embeds `MethodData` to snapshot a state from the original `MethodData` instance. >> But `MethodData` embeds a `Mutex` which allocates a platform-specific implementation on C-heap. >> The `Mutex` is overwritten with `0`s, but the resources aren't deallocated, so the leak occurs. >> >> Proposed fix is to run Mutex destructor right away. >> >> Initially, I thought about switching to `Mutex*`, but then I found that Coleen already tried that and observed a performance regression [1]. So, for now I chose the conservative approach. >> >> In the longer term, I would consider replacing `MethodData::_extra_data_lock` with a lock-free scheme. Having a lock-per-MDO looks kind of excessive. >> >> Testing: >> - [x] verified that no memory leak observed with the reported test >> - [x] tier1-4 >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-October/039783.html > > I can't say this looks pretty. It is also very hard to be able to determine when the mutex was last used and whether this running of the destructor is guaranteed to be safe. What is the lifecycle of the original MethodData? I agree that it's not pretty. 
I'd like to stress that the destruction happens on the freshly allocated `Mutex` and that it is neither related to nor copied from the original `MethodData`: MethodData::MethodData(ciMethodData& data) : _extra_data_lock(Mutex::leaf, "unused") { _extra_data_lock.~Mutex(); // release allocated resources before zeroing So, I don't see any evident issues related to concurrent usage (since it shouldn't happen). The original `MethodData` lifecycle is tied to the `Method` it relates to. (But I don't see how it can matter here.) `ciMethodData` is allocated in the compiler arena on a per-compilation-task basis and is deallocated en masse when compilation finishes. The alternative fix would be to introduce new ctors on `Mutex` and `os::PlatformMonitor` which omit allocation of the native condition variable, but that is more intrusive IMO. Let me know what you prefer. ------------- PR: https://git.openjdk.java.net/jdk/pull/1478 From mdoerr at openjdk.java.net Fri Nov 27 10:35:08 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 27 Nov 2020 10:35:08 GMT Subject: RFR: 8256986: [PPC64] C2 crashes when accessing nonexisting jvms of CallLeafDirectNode [v2] In-Reply-To: References: Message-ID: > observe_safepoint() is called with jvms == NULL from fill_buffer with mach = CallLeafDirectNode. > That node represents a leaf call and does not safepoint. > > In addition MachCallRuntimeNode::ret_addr_offset() needs an update for the new assertion in output.cpp. > This was already fixed on some other platforms. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Also fix CallDynamicJavaDirectSchedNode. Add assertion.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1418/files - new: https://git.openjdk.java.net/jdk/pull/1418/files/8d247269..8ae15b07 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1418&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1418&range=00-01 Stats: 5 lines in 1 file changed: 4 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1418.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1418/head:pull/1418 PR: https://git.openjdk.java.net/jdk/pull/1418 From chagedorn at openjdk.java.net Fri Nov 27 11:10:01 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 27 Nov 2020 11:10:01 GMT Subject: RFR: 8253644: C2: assert(skeleton_predicate_has_opaque(iff)) failed: unexpected In-Reply-To: References: <3WOtQrecrvY6WH3zeLq5xpVVzlhWV4hZ741wGWEayLg=.dbb57162-35c2-4bde-91e2-bdc9f29faaf3@github.com> Message-ID: On Fri, 27 Nov 2020 09:00:03 GMT, Roland Westrelin wrote: > > Yes, exactly. So, we unswitch in one round A. Then we have the first loop tree shown with the slow and fast loop at the start of the next round B. In this round B, we apply Split If that is then able to remove the [`IfNode` (2)](https://github.com/chhagedorn/jdk/blob/cfc7c941ddefd716c9f07bf7e8f4824f00bb7e9b/test/hotspot/jtreg/compiler/loopopts/TestUnswitchCloneSkeletonPredicates.java#L101). We always take the true projection ( `x == 100` is true) which means that we immediately exit the fast loop N459 on the first iteration. The `CountedLoopNode` is therefore removed. The second loop tree shown is at the start of the next round C where we then finally apply the peeling and pre/main/post steps which lets the assertion fail. This description of N459 is indeed a bit misleading. When afterwards talking about the fast loop I just mean where the fast loop was before to keep things simpler. > > Thanks for the clarification. > On round C, N459 is not part of the loop tree anymore then? 
> If that's the case, shouldn't we remove useless skeleton predicates because they don't guard a loop anymore (PhaseIdealLoop::eliminate_useless_predicates())? I had a look at that method. Apparently, we are only removing `Opaque1` predicates but not `Opaque4` skeleton predicates: https://github.com/openjdk/jdk/blob/973255c469d794afe8ee74b24ddb5048bfcaadf7/src/hotspot/share/opto/loopnode.cpp#L3573-L3579 Could that be improved? But as of the current implementation, the `Opaque4` node with its inputs is shared between a fast and a slow loop. So we could not remove the `Opaque4` node when either the fast or slow loop does not need it anymore while the other one does. ------------- PR: https://git.openjdk.java.net/jdk/pull/1448 From github.com+42899633+eastig at openjdk.java.net Fri Nov 27 11:30:55 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Fri, 27 Nov 2020 11:30:55 GMT Subject: RFR: 8256488: [aarch64] Use ldpq/stpq instead of ld4/st4 for small copies in StubGenerator::copy_memory In-Reply-To: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com> References: <440eHUMlYu9ucCmeL4v4817J6i9K4FU-0e-oS5Av5Xk=.7b4eb66b-4a98-49bc-bf2c-a69d3f54eafc@github.com> Message-ID: On Tue, 24 Nov 2020 10:08:37 GMT, Andrew Haley wrote: >> This patch fixes 27%-48% performance regressions of small arraycopies on Graviton2 (Neoverse N1) when UseSIMDForMemoryOps is enabled. For such copies ldpq/stpq are used instead of ld4/st4. >> This follows what the Arm Optimization Guide, including for Neoverse N1, recommends: Use discrete, non-writeback forms of load and store instructions while interleaving them. >> >> The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build and UseSIMDForMemoryOps enabled. > > I think we need also some non-Neoverse N1 numbers. We need to keep in mind that this software runs on many implementations. I'll have a look at some others.
> Hi @theRealAph, > > I also have a patch to fix the unaligned copy small memory (< 16 bytes) when copy a big chunk of memory (> 96 bytes) in this function copy_memory_small(), but it couldn't impact the performance too much, I'm not sure if it is worth pushing to upstream. please refer to [1]. > > 1. [JBS-8149448](https://bugs.openjdk.java.net/browse/JDK-8149448) Hi Jie, Thank you for the information. As Andrew wrote, nowadays most unaligned memory accesses don't have penalties on Armv8 implementations. However, some accesses have penalties. For example these are the most common: 1. Load operations that cross a cache-line (64-byte) boundary. 2. Store operations that cross a 16B boundary. On some Armv8 implementations quad-word load operations can have penalties if they are not at least 4B aligned. Regarding the unaligned copy small memory, I think getting it aligned improves the function by a few percent (~2-5%). As most of the time is spent in copying big chunks of memory this improvement won't be noticeable. For example if copy_memory_small takes 1% of time and it is improved by 5% then the total improvement will be: 1 / (0.99 + (0.01/1.05)) = 1.000476 or 0.0476% which is almost impossible to detect. BTW, I tried to improve COPY_SMALL, _Copy_conjoint_words and _Copy_disjoint_words based on results of comparison with memcpy from the Arm optimised routines but I did not get any overall performance improvements. See [JDK-8255795](https://bugs.openjdk.java.net/browse/JDK-8255795) for more information.
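Editor's sketch of the arithmetic in the message above, under the assumption that it is the usual Amdahl-style estimate: if a fraction `f` of total time is made `s` times faster, the overall speedup is `1 / ((1 - f) + f / s)`. The function name `overall_speedup` is hypothetical and is not part of HotSpot or of any benchmark mentioned in the thread.

```cpp
#include <cassert>
#include <cmath>

// Overall speedup when `fraction` of the total run time becomes
// `local_speedup` times faster and the rest is unchanged.
double overall_speedup(double fraction, double local_speedup) {
  return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

With the numbers from the message, `overall_speedup(0.01, 1.05)` evaluates to roughly 1.000476, i.e. about a 0.0476% gain, which matches the figure quoted and illustrates why a 5% improvement to a routine taking 1% of the time is hard to measure.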
------------- PR: https://git.openjdk.java.net/jdk/pull/1293 From vlivanov at openjdk.java.net Fri Nov 27 11:34:04 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 27 Nov 2020 11:34:04 GMT Subject: RFR: 8256655: rework long counted loop handling [v4] In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 08:15:16 GMT, Roland Westrelin wrote: >> Currently the transformation of a long counted loop into a loop nest >> with an inner int counted loop is performed in 2 steps: >> >> 1- recognize the counted loop shape and build the loop nest >> 2- transform the inner loop into a counted loop >> >> I propose changing this to a 3 steps process: >> >> 1- recognize the counted loop shape (transform the loop into a LongCountedLoop) >> 2- build the loop nest >> 3- transform the inner loop into a counted loop >> >> The benefits are: >> >> - the logic is cleaner because step 1 and 2 are now separated >> - Simple optimizations (loop iv type, empty loop elimination, parallel >> iv) can be implemented for LongCountedLoop by refactoring existing >> code >> >> 1- above is achieved by refactoring the >> PhaseIdealLoop::is_counted_loop() so it takes an extra parameter (the >> kind of counted loop, int or long). >> >> 2- is the existing loop nest construction logic. But now that it takes >> a LongCountedLoop as input, the shape of the loop is known to be that >> of a canonicalized counted loop. As a result, the loop nest >> construction is simpler. >> >> This change also refactors PhiNode::Value() so that it works for both >> CountedLoop and LongCountedLoop. 
>>
>> I also changed ConvL2INode to be a TypeNode (ConvI2LNode is a type
>> node) and:
>>
>>     jlong init_p = (jlong)init_t->_lo + stride_con;
>>     if (init_p > (jlong)max_jint || init_p > (jlong)limit_t->_hi)
>>       return false; // cyclic loop or this loop trips only once
>>
>> to:
>>
>>     if (init_t->lo_as_long() > max_signed_integer(iv_bt) - stride_con) {
>>
>> because if the loop has a single iteration transforming it to a
>> CountedLoop should allow the backedge to be optimized out.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
>
> - test failure
> - Merge branch 'master' into JDK-8256655
> - assert fix
> - Merge branch 'master' into JDK-8256655
> - build fixes
> - fix trailing whitespace
> - fix comment
> - long counted loop refactoring

Looks good.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1316

From shade at openjdk.java.net Fri Nov 27 13:20:58 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Fri, 27 Nov 2020 13:20:58 GMT
Subject: RFR: 8257146: C2: extend the scope of StressGCM
In-Reply-To:
References:
Message-ID: <-t5kGAKFJuUw0YyrDrDfH5U8mjHpVIocW2dzF6h7KdE=.e7b4d6f8-7db1-4761-ba63-e32849b559cd@github.com>

On Thu, 26 Nov 2020 20:37:21 GMT, Roberto Castañeda Lozano wrote:

>> Extend the scope of StressGCM by allowing it to move instructions into basic blocks with worse frequency. This should improve StressGCM's ability to expose bugs where C2 relies on GCM heuristics for correctness.
>>
>> Tested with StressGCM enabled on `hs-tier1-3` and a single repetition, the following test cases fail on `linux-aarch64-debug` and require further investigation:
>> - `serviceability/sa/ClhsdbJstackXcompStress.java`
>> - `compiler/c2/cr6663848/Tester.java`
>>
>> Also tested with StressGCM disabled (default) on `hs-tier1-3`, all tests pass.
>
> Extend the scope of StressGCM by allowing it to move instructions into basic
> blocks with worse frequency.

From the archives, the original discussion about StressGCM:
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-February/009777.html

It was explicitly putting the loads in lower frequency block only, because as Vladimir says:

> There is an other old bug which you may hit if you randomize placement
> in gcm: 6831314. I still did not find a solution which does not
> introduce performance regression. What saves us is placing loads into
> "cheaper" low frequency block (which is most nested block where load's
> result is used).

Is that not a problem anymore?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1469

From redestad at openjdk.java.net Fri Nov 27 14:48:58 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Fri, 27 Nov 2020 14:48:58 GMT
Subject: RFR: 8257189: Handle concurrent updates of MH.form better
In-Reply-To: <1-iH7fvNLXAo2FzbAihuQV8PbDYiZsIHqZwPIcb6HxE=.d89f60fd-aad6-45bb-8704-363e299f96d8@github.com>
References: <1-iH7fvNLXAo2FzbAihuQV8PbDYiZsIHqZwPIcb6HxE=.d89f60fd-aad6-45bb-8704-363e299f96d8@github.com>
Message-ID:

On Thu, 26 Nov 2020 21:23:16 GMT, Vladimir Ivanov wrote:

> Concurrent updates may lead to redundant LambdaForms created and unnecessary class loading when those are compiled.
>
> Most notably, it severely affects MethodHandle customization: when a MethodHandle is called from multiple threads, every thread starts customization which takes enough time for other threads to join, but only one of those customizations will be picked.
> > Coordination between threads requesting the updates and letting a single thread proceed avoids the aforementioned problem. Moreover, there's no need to wait until the update in-flight is over: all other threads (except the one performing the update) can just proceed with the invocation using the existing MH.form. > > Testing: > - manually monitored the behavior on a stress test from [JDK-8252049](https://bugs.openjdk.java.net/browse/JDK-8252049) > - tier1-4 Looks good to me! A couple of trivial nits inline.. src/java.base/share/classes/java/lang/invoke/MethodHandle.java line 459: > 457: // asTypeCache is not private so that invokers can easily fetch it > 458: > 459: /*non-public*/ Remove `/*non-public*/` src/java.base/share/classes/java/lang/invoke/MethodHandle.java line 1769: > 1767: */ > 1768: /*non-public*/ > 1769: void updateForm(Function updater) { `Function` src/java.base/share/classes/java/lang/invoke/MethodHandle.java line 1754: > 1752: updateForm(new Function<>() { > 1753: public LambdaForm apply(LambdaForm oldForm) { > 1754: return oldForm.customize(mh); I think you can drop line 1751 and write this as `return oldForm.customize(MethodHandle.this);`. Simpler and might avoid an additional, implicit capture. ------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1472 From redestad at openjdk.java.net Fri Nov 27 14:52:53 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 27 Nov 2020 14:52:53 GMT Subject: RFR: 8257164: Share LambdaForms for VH linkers/invokers. In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 13:13:43 GMT, Vladimir Ivanov wrote: > Introduce sharing of `LambdaForms` for `VarHandle` linkers and invokers. > It reduces the number of LambdaForms needed at runtime. > > Testing: tier1-4 LGTM! ------------- Marked as reviewed by redestad (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1455 From redestad at openjdk.java.net Fri Nov 27 15:09:54 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 27 Nov 2020 15:09:54 GMT Subject: RFR: 8256368: Avoid repeated upcalls into Java to re-resolve MH/VH linkers/invokers In-Reply-To: References: Message-ID: <5VboXx5wXUazSyIunD0gAU8sSVXjYDvt1FShPWNh5Qs=.018d058b-9883-445b-bf2f-763a7f8b1149@github.com> On Thu, 26 Nov 2020 13:10:02 GMT, Vladimir Ivanov wrote: > Method linkage of `invokehandle` instructions involve an upcall into Java (`MethodHandleNatives::linkMethod`), but the result is not cached. Unless the upcall behaves idempotently (which is highly desirable, but not guaranteed), repeated invokehandle resolution attempts enter a vicious cycle in tiered mode: switching to a higher tier involves call site re-resolution in generated code, but re-resolution installs a fresh method which puts execution back into interpreter. > > (Another prerequisite is no inlining through a `invokehandle` which doesn't normally happen in practice - relevant methods are marked w/ `@ForceInline` - except some testing modes, `-Xcomp` in particular.) > > Proposed fix is to inspect `ConstantPoolCache` first. Previous resolution attempts from interpreter and C1 records their results there and it stabilises the execution. > > Testing: > - failing test > - tier1-4 Nice! ------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1453 From github.com+670087+jrziviani at openjdk.java.net Fri Nov 27 15:22:59 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Fri, 27 Nov 2020 15:22:59 GMT Subject: RFR: 8256986: [PPC64] C2 crashes when accessing nonexisting jvms of CallLeafDirectNode [v2] In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 10:35:08 GMT, Martin Doerr wrote: >> observe_safepoint() is called with jvms == NULL from fill_buffer with mach = CallLeafDirectNode. 
>> That node represents a leaf call and does not safepoint. >> "_guaranteed_safepoint" needs to get set correctly for all MachCallNodes after JDK-8254231. >> >> In addition MachCallRuntimeNode::ret_addr_offset() need update for the new assertion in output.cpp. >> This was already fixed on some other platforms. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Also fix CallDynamicJavaDirectSchedNode. Add assertion. Looks good to me ------------- Marked as reviewed by jrziviani at github.com (no known OpenJDK username). PR: https://git.openjdk.java.net/jdk/pull/1418 From github.com+670087+jrziviani at openjdk.java.net Fri Nov 27 15:36:56 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Fri, 27 Nov 2020 15:36:56 GMT Subject: RFR: 8256986: [PPC64] C2 crashes when accessing nonexisting jvms of CallLeafDirectNode [v2] In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 10:35:08 GMT, Martin Doerr wrote: >> observe_safepoint() is called with jvms == NULL from fill_buffer with mach = CallLeafDirectNode. >> That node represents a leaf call and does not safepoint. >> "_guaranteed_safepoint" needs to get set correctly for all MachCallNodes after JDK-8254231. >> >> In addition MachCallRuntimeNode::ret_addr_offset() need update for the new assertion in output.cpp. >> This was already fixed on some other platforms. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Also fix CallDynamicJavaDirectSchedNode. Add assertion. Thanks for fixing it. Before this patch I was getting # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007ffff73a50ec, pid=3812898, tid=3812899 I was trying to find the issue. Thank you! ------------- Marked as reviewed by jrziviani at github.com (no known OpenJDK username). 
PR: https://git.openjdk.java.net/jdk/pull/1418 From clanger at openjdk.java.net Fri Nov 27 15:46:59 2020 From: clanger at openjdk.java.net (Christoph Langer) Date: Fri, 27 Nov 2020 15:46:59 GMT Subject: RFR: 8256986: [PPC64] C2 crashes when accessing nonexisting jvms of CallLeafDirectNode [v2] In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 10:35:08 GMT, Martin Doerr wrote: >> observe_safepoint() is called with jvms == NULL from fill_buffer with mach = CallLeafDirectNode. >> That node represents a leaf call and does not safepoint. >> "_guaranteed_safepoint" needs to get set correctly for all MachCallNodes after JDK-8254231. >> >> In addition MachCallRuntimeNode::ret_addr_offset() need update for the new assertion in output.cpp. >> This was already fixed on some other platforms. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Also fix CallDynamicJavaDirectSchedNode. Add assertion. LGTM ------------- Marked as reviewed by clanger (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1418 From mdoerr at openjdk.java.net Fri Nov 27 15:47:00 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 27 Nov 2020 15:47:00 GMT Subject: RFR: 8256986: [PPC64] C2 crashes when accessing nonexisting jvms of CallLeafDirectNode [v2] In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 15:42:02 GMT, Christoph Langer wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Also fix CallDynamicJavaDirectSchedNode. Add assertion. > > LGTM Thanks for the reviews! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1418 From mdoerr at openjdk.java.net Fri Nov 27 15:47:02 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 27 Nov 2020 15:47:02 GMT Subject: Integrated: 8256986: [PPC64] C2 crashes when accessing nonexisting jvms of CallLeafDirectNode In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 17:52:53 GMT, Martin Doerr wrote: > observe_safepoint() is called with jvms == NULL from fill_buffer with mach = CallLeafDirectNode. > That node represents a leaf call and does not safepoint. > "_guaranteed_safepoint" needs to get set correctly for all MachCallNodes after JDK-8254231. > > In addition MachCallRuntimeNode::ret_addr_offset() need update for the new assertion in output.cpp. > This was already fixed on some other platforms. This pull request has now been integrated. Changeset: d51e2ab2 Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/d51e2ab2 Stats: 11 lines in 1 file changed: 8 ins; 1 del; 2 mod 8256986: [PPC64] C2 crashes when accessing nonexisting jvms of CallLeafDirectNode Reviewed-by: clanger ------------- PR: https://git.openjdk.java.net/jdk/pull/1418 From coleenp at openjdk.java.net Fri Nov 27 19:06:01 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 27 Nov 2020 19:06:01 GMT Subject: RFR: 8256368: Avoid repeated upcalls into Java to re-resolve MH/VH linkers/invokers In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 13:10:02 GMT, Vladimir Ivanov wrote: > Method linkage of `invokehandle` instructions involve an upcall into Java (`MethodHandleNatives::linkMethod`), but the result is not cached. Unless the upcall behaves idempotently (which is highly desirable, but not guaranteed), repeated invokehandle resolution attempts enter a vicious cycle in tiered mode: switching to a higher tier involves call site re-resolution in generated code, but re-resolution installs a fresh method which puts execution back into interpreter. 
> > (Another prerequisite is no inlining through a `invokehandle` which doesn't normally happen in practice - relevant methods are marked w/ `@ForceInline` - except some testing modes, `-Xcomp` in particular.) > > Proposed fix is to inspect `ConstantPoolCache` first. Previous resolution attempts from interpreter and C1 records their results there and it stabilises the execution. > > Testing: > - failing test > - tier1-4 Looks great. I had one question in the code. src/hotspot/share/interpreter/linkResolver.cpp line 1705: > 1703: int cache_index = ConstantPool::decode_cpcache_index(index, true); > 1704: ConstantPoolCacheEntry* cpce = pool->cache()->entry_at(cache_index); > 1705: if (!cpce->is_f1_null()) { If f1 is non-null any racing resolution is complete. Can you double check if that's also the case for indy_resolution_failed? ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1453 From redestad at openjdk.java.net Fri Nov 27 20:20:10 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 27 Nov 2020 20:20:10 GMT Subject: RFR: 8257221: C2: Improve RegMask::is_bound Message-ID: This patch adds more tests for RegMask::is_bound* operations, and fixes a bug I introduced in JDK-8221404 where I mixed up the formula for how much we need to shift the "bit" when checking a split set. This also optimizes RegMask::is_bound (~2x) by removing use of Size() in is_bound1 and streamlining the three component methods (is_bound1, is_bound_pair, is_bound_set) so that they all do a cheap check that the remaining words are all zero after a matching bit, pair or set has been found. 
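
To make the "single bit / adjacent pair, then all remaining words zero" idea concrete, here is a minimal sketch in Java. It is not the HotSpot code: RegMask is C++ and has details this ignores, and the class name, the `int[]` word layout and both method names are assumptions made purely for illustration. The pair check includes the case where the pair is split across two words, which is the kind of case the bug fix concerns.

```java
public class BoundMaskSketch {
    /** True iff exactly one bit is set across all words of the mask. */
    public static boolean isBound1(int[] words) {
        for (int i = 0; i < words.length; i++) {
            int w = words[i];
            if (w == 0) continue;
            // More than one bit in the first non-zero word: not bound.
            if ((w & (w - 1)) != 0) return false;
            // Cheap tail check: everything after the match must be zero.
            return restIsZero(words, i + 1);
        }
        return false; // empty mask
    }

    /** True iff the mask is exactly two adjacent bits, possibly split across words. */
    public static boolean isBoundPair(int[] words) {
        for (int i = 0; i < words.length; i++) {
            int w = words[i];
            if (w == 0) continue;
            int bit = w & -w; // lowest set bit of this word
            if (bit != w) {
                // Several bits here: must be exactly 'bit' and its upper neighbour.
                return (bit | (bit << 1)) == w && restIsZero(words, i + 1);
            }
            // A lone bit only forms a pair if it is the top bit of this word
            // and the partner is bit 0 of the next word (split pair).
            if (bit == Integer.MIN_VALUE && i + 1 < words.length && words[i + 1] == 1) {
                return restIsZero(words, i + 2);
            }
            return false;
        }
        return false; // empty mask
    }

    private static boolean restIsZero(int[] words, int from) {
        for (int j = from; j < words.length; j++) {
            if (words[j] != 0) return false;
        }
        return true;
    }
}
```

Both methods stop scanning at the first non-zero word and then only verify that the tail is zero, so no population count over the whole mask is needed.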
------------- Commit messages: - Add is_bound_set test, fix bug caused by 8221404 - Add is_bound_pair test, remove redundant test in else branch, fix accidentally 32-bit uint bit - Merge branch 'master' into regmask_bound - Merge branch 'master' into regmask_bound - C2: Optimize is_bound Changes: https://git.openjdk.java.net/jdk/pull/1486/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1486&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257221 Stats: 112 lines in 2 files changed: 87 ins; 3 del; 22 mod Patch: https://git.openjdk.java.net/jdk/pull/1486.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1486/head:pull/1486 PR: https://git.openjdk.java.net/jdk/pull/1486 From dnsimon at openjdk.java.net Fri Nov 27 20:32:06 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 27 Nov 2020 20:32:06 GMT Subject: RFR: 8257220: [JVMCI] option validation should not result in a heavy-weight VM crash Message-ID: As a result of [JDK-8253228](https://bugs.openjdk.java.net/browse/JDK-8253228), a heavy-weight VM crash occurs for incorrectly specified JVMCI options. For example: > java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 java.lang.IllegalArgumentException: Could not find option jvmci.InitTiimer Did you mean one of the following? 
    jvmci.InitTimer=
    at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime$Option.parse(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:405)
    at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:534)
    at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:174)
    at jdk.vm.ci.runtime.JVMCI.initializeRuntime(jdk.internal.vm.ci/Native Method)
    at jdk.vm.ci.runtime.JVMCI.getRuntime(jdk.internal.vm.ci/JVMCI.java:65)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (jvmciRuntime.cpp:1102), pid=55794, tid=7939
# fatal error: Fatal exception in JVMCI: Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626
#
# JRE version: OpenJDK Runtime Environment (16.0) (build 16-internal+0-adhoc.dnsimon.open)
# Java VM: OpenJDK 64-Bit Server VM (16-internal+0-adhoc.dnsimon.open, mixed mode, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, bsd-amd64)
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/dnsimon/jdk-jdk/open/hs_err_pid55794.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
fish: 'build/macosx-x86_64-server-rele?' terminated by signal SIGABRT (Abort)

This is too heavy-weight. This PR makes it similar to how incorrectly specified -XX options are reported:

> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true
Error parsing JVMCI options: Could not find option jvmci.InitTiimer
Did you mean one of the following?
    jvmci.InitTimer=
Error: A fatal exception has occurred. Program will exit.
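
For illustration only, the user-visible behaviour described above (reject an unknown option name, print a short message with suggestions, exit without a crash report) could be sketched as follows. This is not the JVMCI implementation: the class, the option list other than `InitTimer`, and the edit-distance threshold are all assumptions made for this sketch, and the actual change lives in the VM code.

```java
import java.util.ArrayList;
import java.util.List;

public class OptionCheckSketch {
    // Hypothetical option names; only InitTimer appears in the thread.
    static final List<String> KNOWN = List.of("InitTimer", "ExampleFlagA", "ExampleFlagB");

    /** Classic dynamic-programming edit distance between two strings. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int subst = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(subst, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] t = prev; prev = cur; cur = t;
        }
        return prev[b.length()];
    }

    /** Returns null if the name is known, otherwise a short user-facing message. */
    public static String validate(String name, List<String> known) {
        if (known.contains(name)) return null;
        StringBuilder sb = new StringBuilder("Could not find option jvmci." + name);
        List<String> close = new ArrayList<>();
        for (String k : known) {
            if (editDistance(name, k) <= 2) close.add(k); // threshold is an assumption
        }
        if (!close.isEmpty()) {
            sb.append("\nDid you mean one of the following?");
            for (String k : close) sb.append("\n    jvmci.").append(k);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String msg = validate("InitTiimer", KNOWN);
        if (msg != null) {
            System.err.println("Error parsing JVMCI options: " + msg);
            System.exit(1); // ordinary exit, no fatal-error report
        }
    }
}
```

The point of the sketch is the control flow: a bad option is a user error, so it is reported and the process exits normally rather than going through the fatal-error path.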
------------- Commit messages: - JVMCI option parsing should not result in a VM crash producing a hs-err log Changes: https://git.openjdk.java.net/jdk/pull/1487/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1487&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257220 Stats: 58 lines in 2 files changed: 54 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1487.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1487/head:pull/1487 PR: https://git.openjdk.java.net/jdk/pull/1487 From kvn at openjdk.java.net Fri Nov 27 22:05:01 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 27 Nov 2020 22:05:01 GMT Subject: RFR: 8257221: C2: Improve RegMask::is_bound In-Reply-To: References: Message-ID: <7MrKapn9zzBXWDZhNev_W4CIH1Bv8RAm4NUPGmAHtPo=.ced689ea-ef7d-42c8-92e2-d9d4f4234d96@github.com> On Fri, 27 Nov 2020 20:12:47 GMT, Claes Redestad wrote: > This patch adds more tests for RegMask::is_bound* operations, and fixes a bug I introduced in JDK-8221404 where I mixed up the formula for how much we need to shift the "bit" when checking a split set. > > This also optimizes RegMask::is_bound (~2x) by removing use of Size() in is_bound1 and streamlining the three component methods (is_bound1, is_bound_pair, is_bound_set) so that they all do a cheap check that the remaining words are all zero after a matching bit, pair or set has been found. okay. src/hotspot/share/opto/regmask.cpp line 144: > 142: } > 143: > 144: // Return TRUE iff the mask contains a single bit I thought it is typo but: `iff - conjunction Logic & Mathematics if and only if (as a written abbreviation).` ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1486 From kvn at openjdk.java.net Fri Nov 27 22:06:56 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 27 Nov 2020 22:06:56 GMT Subject: RFR: 8257220: [JVMCI] option validation should not result in a heavy-weight VM crash In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 20:27:13 GMT, Doug Simon wrote: > As a result of [JDK-8253228](https://bugs.openjdk.java.net/browse/JDK-8253228), a heavy-weight VM crash occurs for incorrectly specified JVMCI options. For example: >> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true > Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 > java.lang.IllegalArgumentException: Could not find option jvmci.InitTiimer > Did you mean one of the following? > jvmci.InitTimer= > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime$Option.parse(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:405) > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:534) > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:174) > at jdk.vm.ci.runtime.JVMCI.initializeRuntime(jdk.internal.vm.ci/Native Method) > at jdk.vm.ci.runtime.JVMCI.getRuntime(jdk.internal.vm.ci/JVMCI.java:65) > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (jvmciRuntime.cpp:1102), pid=55794, tid=7939 > # fatal error: Fatal exception in JVMCI: Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 > # > # JRE version: OpenJDK Runtime Environment (16.0) (build 16-internal+0-adhoc.dnsimon.open) > # Java VM: OpenJDK 64-Bit Server VM (16-internal+0-adhoc.dnsimon.open, mixed mode, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, bsd-amd64) > # No core dump will be written. Core dumps have been disabled. 
To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /Users/dnsimon/jdk-jdk/open/hs_err_pid55794.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > fish: 'build/macosx-x86_64-server-rele?' terminated by signal SIGABRT (Abort) > > This is too heavy-weight. This PR makes it similar to how incorrectly specified -XX options are reported: > >> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true > Error parsing JVMCI options: Could not find option jvmci.InitTiimer > Did you mean one of the following? > jvmci.InitTimer= > Error: A fatal exception has occurred. Program will exit. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1487 From burban at openjdk.java.net Fri Nov 27 23:02:59 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Fri, 27 Nov 2020 23:02:59 GMT Subject: RFR: 8257143: Enable JVMCI code installation tests on AArch64 In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 03:56:22 GMT, Nick Gasson wrote: > This set of jtreg tests test JVMCI code installation independently of > Graal. Currently they only run on x86 as the minimal assembler required > is only implemented for that platform. This patch implements the > TestAssembler for AArch64 to ensure JVMCI test coverage even if the > Graal embedded in OpenJDK is disabled/removed. Tested on Windows+Arm64, the tests enabled in this PR are passing. Looks good to me (but I'm not a reviewer). ------------- Marked as reviewed by burban (Author). 
PR: https://git.openjdk.java.net/jdk/pull/1475 From redestad at openjdk.java.net Sat Nov 28 00:32:08 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Sat, 28 Nov 2020 00:32:08 GMT Subject: RFR: 8257221: C2: RegMask::is_bound_set split set handling broken since JDK-8221404 [v2] In-Reply-To: References: Message-ID: > This patch adds more tests for RegMask::is_bound* operations, and fixes a bug I introduced in JDK-8221404 where I mixed up the formula for how much we need to shift the "bit" when checking a split set. > > This also optimizes RegMask::is_bound (~2x) by removing use of Size() in is_bound1 and streamlining the three component methods (is_bound1, is_bound_pair, is_bound_set) so that they all do a cheap check that the remaining words are all zero after a matching bit, pair or set has been found. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Split out the optimization ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1486/files - new: https://git.openjdk.java.net/jdk/pull/1486/files/3245e387..2937939d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1486&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1486&range=00-01 Stats: 66 lines in 1 file changed: 3 ins; 44 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/1486.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1486/head:pull/1486 PR: https://git.openjdk.java.net/jdk/pull/1486 From redestad at openjdk.java.net Sat Nov 28 00:34:57 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Sat, 28 Nov 2020 00:34:57 GMT Subject: RFR: 8257221: C2: RegMask::is_bound_set split set handling broken since JDK-8221404 [v2] In-Reply-To: <7MrKapn9zzBXWDZhNev_W4CIH1Bv8RAm4NUPGmAHtPo=.ced689ea-ef7d-42c8-92e2-d9d4f4234d96@github.com> References: <7MrKapn9zzBXWDZhNev_W4CIH1Bv8RAm4NUPGmAHtPo=.ced689ea-ef7d-42c8-92e2-d9d4f4234d96@github.com> Message-ID: On Fri, 27 Nov 2020 
22:02:45 GMT, Vladimir Kozlov wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Split out the optimization > > okay. @vnkozlov - as we discussed I split out the optimizations to a separate RFE, leaving the new tests and the one-liner bug fix for this PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/1486 From ngasson at openjdk.java.net Sat Nov 28 11:45:17 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Sat, 28 Nov 2020 11:45:17 GMT Subject: RFR: 8252684: Move the AArch64 assember test under test/hotspot/gtest [v2] In-Reply-To: References: Message-ID: > There was a question on the SVE review thread on build-dev [1] a few > months ago about why there is a Python script and test code under > src/hotspot/cpu/aarch64. The script generates code to check the > Assembler instruction encodings against those of the system assembler. > The test runs every time the debug VM is started. > > AFAIK there's no precedent in the rest of Hotspot for having functional > tests that run on startup, and we have the existing gtest framework for > testing internal C++ modules. This patch (perhaps more of an RFC) moves > the assembler test under test/hotspot/gtest/aarch64. > > The test will now run in tier1, including for release builds. The > downside is that debug builds won't catch assembler encoding errors > immediately on startup. > > Tested by injecting an error in one of the instruction encodings and > verifying `make test TEST="gtest"` fails. 
> > [1] https://mail.openjdk.java.net/pipermail/build-dev/2020-August/028048.html Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: Move the asmtest output into a separate file ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1476/files - new: https://git.openjdk.java.net/jdk/pull/1476/files/d7330dcb..f0604b61 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1476&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1476&range=00-01 Stats: 1208 lines in 2 files changed: 1 ins; 1205 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1476.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1476/head:pull/1476 PR: https://git.openjdk.java.net/jdk/pull/1476 From ngasson at openjdk.java.net Sat Nov 28 11:47:56 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Sat, 28 Nov 2020 11:47:56 GMT Subject: RFR: 8252684: Move the AArch64 assember test under test/hotspot/gtest In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 10:15:39 GMT, Andrew Haley wrote: > > Indeed. I did wonder about regenerating everything by running the Python script at test time, but reasoned that doing so would introduce a dependency on build tools ("as", for example) that I didn't want to have. But moving the output of aarch64_asmtest.py into a separate file is a good idea, if you like. > I moved the output to a separate file `asmtest.out.h` with note on how to generate it. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/1476

From dholmes at openjdk.java.net Sat Nov 28 12:48:55 2020
From: dholmes at openjdk.java.net (David Holmes)
Date: Sat, 28 Nov 2020 12:48:55 GMT
Subject: RFR: 8252049: Native memory leak in ciMethodData ctor
In-Reply-To:
References: <5jmPyMo3L6Z7ESUTxDCH_PMuOhEVePoy2_xn5lFswxc=.b76a5438-1822-4c39-a44b-bbbbe3bcfe52@github.com>
Message-ID: <3tBpkqtJc4u02udB4Ba34DSlB1mPaVgOfrh-EFIG8_A=.4452852f-dafb-459f-be76-e482cacfb9bb@github.com>

On Fri, 27 Nov 2020 10:21:55 GMT, Vladimir Ivanov wrote:

>> I can't say this looks pretty. It is also very hard to be able to determine when the mutex was last used and whether this running of the destructor is guaranteed to be safe. What is the lifecycle of the original MethodData?
>
> I agree that it's not pretty.
>
> I'd like to stress that the destruction happens on the freshly allocated `Mutex` and it is not related/copied from original `MethodData`:
> MethodData::MethodData(ciMethodData& data)
>   : _extra_data_lock(Mutex::leaf, "unused") {
>   _extra_data_lock.~Mutex(); // release allocated resources before zeroing
>
> So, I don't see any evident issues related to concurrent usage (since it shouldn't happen). Original `MethodData` lifecycle is tied to the `Method` it relates to. (But I don't see how it can matter here.) `ciMethodData` is allocated in compiler arena on per-compilation task basis and is deallocated en masse when compilation finishes.
>
> The alternative fix would be to introduce new ctors on `Mutex` and `os::PlatformMonitor` which omit allocation of native condition variable, but that is more intrusive IMO. Let me know what you prefer.

Ah now I understand a bit better. The code creates a copy of the MethodData to preserve the data and gets the Mutex in the process. The Mutex is not wanted/needed and so it is immediately "destructed". Okay that works.
A possible alternative might be to refactor MethodData so that the data is in MethodDataBase, and the Mutex added in MethodData, then only snapshot the MethodDataBase part in ciMethodData. But I don't know if that is feasible, nor worth the effort. ------------- PR: https://git.openjdk.java.net/jdk/pull/1478 From aph at openjdk.java.net Sat Nov 28 15:18:57 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 28 Nov 2020 15:18:57 GMT Subject: RFR: 8252684: Move the AArch64 assember test under test/hotspot/gtest [v2] In-Reply-To: References: Message-ID: On Sat, 28 Nov 2020 11:45:17 GMT, Nick Gasson wrote: >> There was a question on the SVE review thread on build-dev [1] a few >> months ago about why there is a Python script and test code under >> src/hotspot/cpu/aarch64. The script generates code to check the >> Assembler instruction encodings against those of the system assembler. >> The test runs every time the debug VM is started. >> >> AFAIK there's no precedent in the rest of Hotspot for having functional >> tests that run on startup, and we have the existing gtest framework for >> testing internal C++ modules. This patch (perhaps more of an RFC) moves >> the assembler test under test/hotspot/gtest/aarch64. >> >> The test will now run in tier1, including for release builds. The >> downside is that debug builds won't catch assembler encoding errors >> immediately on startup. >> >> Tested by injecting an error in one of the instruction encodings and >> verifying `make test TEST="gtest"` fails. >> >> [1] https://mail.openjdk.java.net/pipermail/build-dev/2020-August/028048.html > > Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: > > Move the asmtest output into a separate file Marked as reviewed by aph (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1476 From ngasson at openjdk.java.net Sat Nov 28 15:40:58 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Sat, 28 Nov 2020 15:40:58 GMT Subject: Integrated: 8252684: Move the AArch64 assember test under test/hotspot/gtest In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 04:31:46 GMT, Nick Gasson wrote: > There was a question on the SVE review thread on build-dev [1] a few > months ago about why there is a Python script and test code under > src/hotspot/cpu/aarch64. The script generates code to check the > Assembler instruction encodings against those of the system assembler. > The test runs every time the debug VM is started. > > AFAIK there's no precedent in the rest of Hotspot for having functional > tests that run on startup, and we have the existing gtest framework for > testing internal C++ modules. This patch (perhaps more of an RFC) moves > the assembler test under test/hotspot/gtest/aarch64. > > The test will now run in tier1, including for release builds. The > downside is that debug builds won't catch assembler encoding errors > immediately on startup. > > Tested by injecting an error in one of the instruction encodings and > verifying `make test TEST="gtest"` fails. > > [1] https://mail.openjdk.java.net/pipermail/build-dev/2020-August/028048.html This pull request has now been integrated. 
Changeset: c93f0a07 Author: Nick Gasson URL: https://git.openjdk.java.net/jdk/commit/c93f0a07 Stats: 2418 lines in 6 files changed: 1187 ins; 1227 del; 4 mod 8252684: Move the AArch64 assember test under test/hotspot/gtest Reviewed-by: aph ------------- PR: https://git.openjdk.java.net/jdk/pull/1476 From kbarrett at openjdk.java.net Sat Nov 28 19:50:54 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 28 Nov 2020 19:50:54 GMT Subject: RFR: 8252049: Native memory leak in ciMethodData ctor In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 08:05:09 GMT, Vladimir Ivanov wrote: > `ciMethodData` embeds `MethodData` to snapshot a state from the original `MethodData` instance. > But `MethodData` embeds a `Mutex` which allocates a platform-specific implementation on C-heap. > The `Mutex` is overwritten with `0`s, but the resources aren't deallocated, so the leak occurs. > > Proposed fix is to run Mutex destructor right away. > > Initially, I thought about switching to `Mutex*`, but then I found that Coleen already tried that and observed a performance regression [1]. So, for now I chose the conservative approach. > > In the longer term, I would consider replacing `MethodData::_extra_data_lock` with a lock-free scheme. Having a lock-per-MDO looks kind of excessive. > > Testing: > - [x] verified that no memory leak observed with the reported test > - [x] tier1-4 > > [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-October/039783.html I agree this fixes the immediate problem of a native memory leak. But it all seems pretty horrible from a code understanding and maintenance POV. So while I'm approving this change, I think there is followup work to be done, for which an RFE should be filed. For the specific case of ciMethodData::_orig it seems like it would be better to separate the mutex from the data and have _orig be just the data. That might take some significant refactoring. But more generally, I question the embedded mutex in an MDO. 
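A minimal sketch of the data/lock split suggested above (the snapshot holds just the data, and the mutex lives only in the live object). All type and field names here are hypothetical stand-ins, not the real `MethodData` layout, and locking of the live object while copying is left out:

```cpp
#include <cassert>
#include <mutex>
#include <type_traits>

// Hypothetical plain-data part: trivially copyable, owns no native resources.
struct ProfileDataBase {
  int invocation_count;
  int backedge_count;
  unsigned char extra[16];
};

// Hypothetical live object: the data plus the lock that guards updates to it.
struct LiveProfile : ProfileDataBase {
  std::mutex extra_data_lock;
  LiveProfile() : ProfileDataBase{} {}  // zero-initialize the data part
};

// Compiler-side snapshot: copies only the data part, so no lock is ever
// constructed here and there is nothing to release afterwards.
struct ProfileSnapshot {
  ProfileDataBase orig;
  explicit ProfileSnapshot(const LiveProfile& live)
      : orig(static_cast<const ProfileDataBase&>(live)) {}
};

static_assert(std::is_trivially_copyable<ProfileDataBase>::value,
              "the data part can be snapshotted bytewise");
static_assert(std::is_trivially_destructible<ProfileSnapshot>::value,
              "dropping a snapshot releases no resources");
```

With this shape the snapshot can be arena-allocated and dropped en masse without running any destructor. Whether the real class can be split this way is exactly the refactoring question raised in the thread.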
From the quoted email (https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-October/039783.html) Coleen tried heap managing the mutex rather than having it embedded directly, and found this caused some performance regressions. But does there really need to be a mutex per MDO? Would a single immortal mutex, shared by all MDOs, be adequate for this purpose? If not, maybe the trick used by ObjectMonitor(?) -- have a set of immortal mutexes and "randomly" choose one per MDO. This would also eliminate this part of the walk of metadata objects to release heap resources when dropping them. And it avoids constructing and destructing lots of mutexes (which isn't free, even if not heap managed). ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1478 From dholmes at openjdk.java.net Sat Nov 28 22:10:56 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Sat, 28 Nov 2020 22:10:56 GMT Subject: RFR: 8252684: Move the AArch64 assember test under test/hotspot/gtest [v2] In-Reply-To: References: Message-ID: <0ZFcGw4uaCYg705quoGGflYyXcsSbOYPhyyHkO2VmFI=.3b476384-5cad-49ba-b031-981caeb24e8e@github.com> On Sat, 28 Nov 2020 15:16:38 GMT, Andrew Haley wrote: >> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Move the asmtest output into a separate file > > Marked as reviewed by aph (Reviewer). This has broken all the regular x86 Windows builds! Please fix or backout immediately!
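On the 8252049 suggestion earlier in this digest (a fixed set of immortal mutexes, one chosen per MDO): a generic lock-striping sketch of that idea. The stripe count, the address hash, and the names `stripe_index` and `lock_for` are assumptions made for illustration; this is not how HotSpot or ObjectMonitor actually does it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical stripe count; a power of two so the index can be masked.
static const std::size_t kStripes = 8;

// Static storage duration: the locks live for the whole process in this
// sketch, so no per-object construction or destruction is needed.
static std::mutex g_stripes[kStripes];

// Pick a stripe from the object's address. Unrelated objects may share a
// lock, which costs some contention but never correctness.
inline std::size_t stripe_index(const void* obj) {
  std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(obj);
  bits ^= bits >> 9;  // fold higher address bits into the low ones
  return static_cast<std::size_t>((bits >> 4) & (kStripes - 1));
}

inline std::mutex& lock_for(const void* obj) {
  return g_stripes[stripe_index(obj)];
}
```

A caller would write `std::lock_guard<std::mutex> guard(lock_for(mdo));` where it previously locked the embedded mutex. The trade-off is that two objects hashing to the same stripe serialize against each other.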
------------- PR: https://git.openjdk.java.net/jdk/pull/1476 From kvn at openjdk.java.net Sat Nov 28 23:23:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 28 Nov 2020 23:23:57 GMT Subject: RFR: 8257221: C2: RegMask::is_bound_set split set handling broken since JDK-8221404 [v2] In-Reply-To: References: Message-ID: <8lPYU-CfFfiQ2ldq_pQQp69wG8PiABzOBszMdE_qFg0=.1d837fee-bda3-4cc6-9f1a-258f027c3208@github.com> On Sat, 28 Nov 2020 00:32:08 GMT, Claes Redestad wrote: >> This patch adds more tests for RegMask::is_bound* operations, and fixes a bug I introduced in JDK-8221404 where I mixed up the formula for how much we need to shift the "bit" when checking a split set. >> >> This also optimizes RegMask::is_bound (~2x) by removing use of Size() in is_bound1 and streamlining the three component methods (is_bound1, is_bound_pair, is_bound_set) so that they all do a cheap check that the remaining words are all zero after a matching bit, pair or set has been found. > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Split out the optimization Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1486 From jiefu at openjdk.java.net Sat Nov 28 23:39:03 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 28 Nov 2020 23:39:03 GMT Subject: RFR: 8257232: CompileThresholdScaling fails to work on 32-bit platforms Message-ID: Hi all, CompileThresholdScaling is incorrect on 32-bit platforms. If you run the following command on Linux-32: java -XX:CompileThresholdScaling=0.75 -version It gets the following unexpected warnings: intx Tier0InvokeNotifyFreqLog=32 is outside the allowed range [ 0 ... 30 ] intx Tier0BackedgeNotifyFreqLog=32 is outside the allowed range [ 0 ... 30 ] intx Tier2InvokeNotifyFreqLog=32 is outside the allowed range [ 0 ... 30 ] intx Tier2BackedgeNotifyFreqLog=32 is outside the allowed range [ 0 ... 
30 ] intx Tier3InvokeNotifyFreqLog=32 is outside the allowed range [ 0 ... 30 ] intx Tier3BackedgeNotifyFreqLog=32 is outside the allowed range [ 0 ... 30 ] intx Tier23InlineeNotifyFreqLog=32 is outside the allowed range [ 0 ... 30 ] The failure is that nth_bit(max_freq_bits) [1] = nth_bit(32) [2] = 0 on 32-bit platforms. So the scaling logic is wrong. It would be better to fix it. Thanks. Best regards, Jie [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compilerDefinitions.cpp#L125 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/globalDefinitions.hpp#L973 ------------- Commit messages: - 8257232: CompileThresholdScaling fails to work on 32-bit platforms Changes: https://git.openjdk.java.net/jdk/pull/1499/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1499&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257232 Stats: 9 lines in 1 file changed: 6 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1499.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1499/head:pull/1499 PR: https://git.openjdk.java.net/jdk/pull/1499 From dholmes at openjdk.java.net Sun Nov 29 00:11:55 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 29 Nov 2020 00:11:55 GMT Subject: RFR: 8252684: Move the AArch64 assember test under test/hotspot/gtest [v2] In-Reply-To: <0ZFcGw4uaCYg705quoGGflYyXcsSbOYPhyyHkO2VmFI=.3b476384-5cad-49ba-b031-981caeb24e8e@github.com> References: <0ZFcGw4uaCYg705quoGGflYyXcsSbOYPhyyHkO2VmFI=.3b476384-5cad-49ba-b031-981caeb24e8e@github.com> Message-ID: <_HlHBztaO5PgtVCFZn2pCPCAaMNHSD8o66zOgUpC-Kc=.fd5ee86d-6aa8-47bb-ade9-898114818507@github.com> On Sat, 28 Nov 2020 22:08:11 GMT, David Holmes wrote: >> Marked as reviewed by aph (Reviewer). > > This has broken all the regular x86 Windows builds! Please fix or backout immediately! IIUC given: #ifdef AARCH64 #include "precompiled.hpp" VS will ignore the ifdef as it appears before the precompiled header include. 
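On the 8257232 report above: a small sketch of why `nth_bit(32)` comes out as 0 when the machine word is 32 bits wide. The template below is only modeled on the behaviour described in the report (a bit index at or beyond the word size yields 0); it is not the macro from `globalDefinitions.hpp`, and the `Word` parameter is an assumption used to compare both word sizes on one machine.

```cpp
#include <cassert>
#include <cstdint>

// Returns a word with only bit n set, or 0 when n does not fit in the word.
// Word stands in for the platform's machine word type.
template <typename Word>
constexpr Word nth_bit(unsigned n) {
  return n >= sizeof(Word) * 8 ? Word(0) : Word(Word(1) << n);
}

// With a 32-bit word, bit 32 does not exist, so the result is 0.
static_assert(nth_bit<std::uint32_t>(32) == 0, "bit 32 overflows a 32-bit word");
// With a 64-bit word, the same index is a perfectly valid bit.
static_assert(nth_bit<std::uint64_t>(32) == (std::uint64_t(1) << 32),
              "bit 32 fits in a 64-bit word");
```

Any scaling computation that uses such a value as an upper bound therefore sees a bound of 0 on 32-bit platforms, which is consistent with the out-of-range warnings quoted in the report.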
------------- PR: https://git.openjdk.java.net/jdk/pull/1476 From ngasson at openjdk.java.net Sun Nov 29 00:11:56 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Sun, 29 Nov 2020 00:11:56 GMT Subject: RFR: 8252684: Move the AArch64 assember test under test/hotspot/gtest [v2] In-Reply-To: <0ZFcGw4uaCYg705quoGGflYyXcsSbOYPhyyHkO2VmFI=.3b476384-5cad-49ba-b031-981caeb24e8e@github.com> References: <0ZFcGw4uaCYg705quoGGflYyXcsSbOYPhyyHkO2VmFI=.3b476384-5cad-49ba-b031-981caeb24e8e@github.com> Message-ID: <7kSiMPkEWK1nJrDMNLC-PA5JW0jOWQRY6CfhhU5pPVw=.8b1bca79-5624-4d8c-a241-d754f562efef@github.com> On Sat, 28 Nov 2020 22:08:11 GMT, David Holmes wrote: > This has broken all the regular x86 Windows builds! Please fix or backout immediately! @dholmes-ora sorry I don't have access to a computer at the moment, would you mind reverting it for me? The added file is wrapped in #ifdef AARCH64 so I don't know why that affects windows x86. ------------- PR: https://git.openjdk.java.net/jdk/pull/1476 From dholmes at openjdk.java.net Sun Nov 29 00:15:56 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 29 Nov 2020 00:15:56 GMT Subject: RFR: 8252684: Move the AArch64 assember test under test/hotspot/gtest [v2] In-Reply-To: <7kSiMPkEWK1nJrDMNLC-PA5JW0jOWQRY6CfhhU5pPVw=.8b1bca79-5624-4d8c-a241-d754f562efef@github.com> References: <0ZFcGw4uaCYg705quoGGflYyXcsSbOYPhyyHkO2VmFI=.3b476384-5cad-49ba-b031-981caeb24e8e@github.com> <7kSiMPkEWK1nJrDMNLC-PA5JW0jOWQRY6CfhhU5pPVw=.8b1bca79-5624-4d8c-a241-d754f562efef@github.com> Message-ID: On Sun, 29 Nov 2020 00:09:47 GMT, Nick Gasson wrote: >> This has broken all the regular x86 Windows builds! Please fix or backout immediately! > >> This has broken all the regular x86 Windows builds! Please fix or backout immediately! > > @dholmes-ora sorry I don't have access to a computer at the moment, would you mind reverting it for me? The added file is wrapped in #ifdef AARCH64 so I don't know why that affects windows x86. 
The ifdef needs to come after the include of the precompiled header. I'm filing a bug. But it is Sunday for me too and I have very limited time. ------------- PR: https://git.openjdk.java.net/jdk/pull/1476 From dnsimon at openjdk.java.net Sun Nov 29 00:17:09 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Sun, 29 Nov 2020 00:17:09 GMT Subject: RFR: 8257220: [JVMCI] option validation should not result in a heavy-weight VM crash [v2] In-Reply-To: References: Message-ID: > As a result of [JDK-8253228](https://bugs.openjdk.java.net/browse/JDK-8253228), a heavy-weight VM crash occurs for incorrectly specified JVMCI options. For example: >> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true > Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 > java.lang.IllegalArgumentException: Could not find option jvmci.InitTiimer > Did you mean one of the following? > jvmci.InitTimer= > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime$Option.parse(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:405) > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:534) > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:174) > at jdk.vm.ci.runtime.JVMCI.initializeRuntime(jdk.internal.vm.ci/Native Method) > at jdk.vm.ci.runtime.JVMCI.getRuntime(jdk.internal.vm.ci/JVMCI.java:65) > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (jvmciRuntime.cpp:1102), pid=55794, tid=7939 > # fatal error: Fatal exception in JVMCI: Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 > # > # JRE version: OpenJDK Runtime Environment (16.0) (build 16-internal+0-adhoc.dnsimon.open) > # Java VM: OpenJDK 64-Bit Server VM (16-internal+0-adhoc.dnsimon.open, mixed mode, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, bsd-amd64) > # No core dump will be 
written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /Users/dnsimon/jdk-jdk/open/hs_err_pid55794.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > fish: 'build/macosx-x86_64-server-rele?' terminated by signal SIGABRT (Abort) > > This is too heavy-weight. This PR makes it similar to how incorrectly specified -XX options are reported: > >> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true > Error parsing JVMCI options: Could not find option jvmci.InitTiimer > Did you mean one of the following? > jvmci.InitTimer= > Error: A fatal exception has occurred. Program will exit. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: fixed test failure on Windows ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1487/files - new: https://git.openjdk.java.net/jdk/pull/1487/files/f4281728..fb69a900 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1487&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1487&range=00-01 Stats: 12 lines in 2 files changed: 8 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1487.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1487/head:pull/1487 PR: https://git.openjdk.java.net/jdk/pull/1487 From kvn at openjdk.java.net Sun Nov 29 16:54:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 29 Nov 2020 16:54:57 GMT Subject: RFR: 8257220: [JVMCI] option validation should not result in a heavy-weight VM crash [v2] In-Reply-To: References: Message-ID: On Sun, 29 Nov 2020 00:17:09 GMT, Doug Simon wrote: >> As a result of [JDK-8253228](https://bugs.openjdk.java.net/browse/JDK-8253228), a heavy-weight VM crash occurs for incorrectly specified JVMCI options. 
For example: >>> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true >> Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 >> java.lang.IllegalArgumentException: Could not find option jvmci.InitTiimer >> Did you mean one of the following? >> jvmci.InitTimer= >> at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime$Option.parse(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:405) >> at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:534) >> at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:174) >> at jdk.vm.ci.runtime.JVMCI.initializeRuntime(jdk.internal.vm.ci/Native Method) >> at jdk.vm.ci.runtime.JVMCI.getRuntime(jdk.internal.vm.ci/JVMCI.java:65) >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (jvmciRuntime.cpp:1102), pid=55794, tid=7939 >> # fatal error: Fatal exception in JVMCI: Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 >> # >> # JRE version: OpenJDK Runtime Environment (16.0) (build 16-internal+0-adhoc.dnsimon.open) >> # Java VM: OpenJDK 64-Bit Server VM (16-internal+0-adhoc.dnsimon.open, mixed mode, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, bsd-amd64) >> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again >> # >> # An error report file with more information is saved as: >> # /Users/dnsimon/jdk-jdk/open/hs_err_pid55794.log >> # >> # If you would like to submit a bug report, please visit: >> # https://bugreport.java.com/bugreport/crash.jsp >> # >> fish: 'build/macosx-x86_64-server-rele?' terminated by signal SIGABRT (Abort) >> >> This is too heavy-weight. 
This PR makes it similar to how incorrectly specified -XX options are reported: >> >>> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true >> Error parsing JVMCI options: Could not find option jvmci.InitTiimer >> Did you mean one of the following? >> jvmci.InitTimer= >> Error: A fatal exception has occurred. Program will exit. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed test failure on Windows Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1487 From dnsimon at openjdk.java.net Sun Nov 29 16:54:59 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Sun, 29 Nov 2020 16:54:59 GMT Subject: Integrated: 8257220: [JVMCI] option validation should not result in a heavy-weight VM crash In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 20:27:13 GMT, Doug Simon wrote: > As a result of [JDK-8253228](https://bugs.openjdk.java.net/browse/JDK-8253228), a heavy-weight VM crash occurs for incorrectly specified JVMCI options. For example: >> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true > Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 > java.lang.IllegalArgumentException: Could not find option jvmci.InitTiimer > Did you mean one of the following? 
> jvmci.InitTimer= > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime$Option.parse(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:405) > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:534) > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:174) > at jdk.vm.ci.runtime.JVMCI.initializeRuntime(jdk.internal.vm.ci/Native Method) > at jdk.vm.ci.runtime.JVMCI.getRuntime(jdk.internal.vm.ci/JVMCI.java:65) > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (jvmciRuntime.cpp:1102), pid=55794, tid=7939 > # fatal error: Fatal exception in JVMCI: Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 > # > # JRE version: OpenJDK Runtime Environment (16.0) (build 16-internal+0-adhoc.dnsimon.open) > # Java VM: OpenJDK 64-Bit Server VM (16-internal+0-adhoc.dnsimon.open, mixed mode, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, bsd-amd64) > # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /Users/dnsimon/jdk-jdk/open/hs_err_pid55794.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > fish: 'build/macosx-x86_64-server-rele?' terminated by signal SIGABRT (Abort) > > This is too heavy-weight. This PR makes it similar to how incorrectly specified -XX options are reported: > >> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true > Error parsing JVMCI options: Could not find option jvmci.InitTiimer > Did you mean one of the following? > jvmci.InitTimer= > Error: A fatal exception has occurred. Program will exit. This pull request has now been integrated. 
Changeset: c5d95071 Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/c5d95071 Stats: 66 lines in 2 files changed: 62 ins; 0 del; 4 mod 8257220: [JVMCI] option validation should not result in a heavy-weight VM crash Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1487 From redestad at openjdk.java.net Mon Nov 30 07:06:10 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 30 Nov 2020 07:06:10 GMT Subject: RFR: 8257221: C2: RegMask::is_bound_set split set handling broken since JDK-8221404 [v3] In-Reply-To: References: Message-ID: > This patch adds more tests for RegMask::is_bound* operations, and fixes a bug I introduced in JDK-8221404 where I mixed up the formula for how much we need to shift the "bit" when checking a split set. Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Merge branch 'master' into regmask_bound - Clean-up is_bound_set test - Split out the optimization - Add is_bound_set test, fix bug caused by 8221404 - Add is_bound_pair test, remove redundant test in else branch, fix accidentally 32-bit uint bit - Merge branch 'master' into regmask_bound - Merge branch 'master' into regmask_bound - C2: Optimize is_bound ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1486/files - new: https://git.openjdk.java.net/jdk/pull/1486/files/2937939d..6980ee00 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1486&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1486&range=01-02 Stats: 4859 lines in 111 files changed: 2274 ins; 1906 del; 679 mod Patch: https://git.openjdk.java.net/jdk/pull/1486.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1486/head:pull/1486 PR: https://git.openjdk.java.net/jdk/pull/1486 From neliasso at openjdk.java.net Mon Nov 
30 08:14:59 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 30 Nov 2020 08:14:59 GMT Subject: RFR: 8257221: C2: RegMask::is_bound_set split set handling broken since JDK-8221404 [v3] In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 07:06:10 GMT, Claes Redestad wrote: >> This patch adds more tests for RegMask::is_bound* operations, and fixes a bug I introduced in JDK-8221404 where I mixed up the formula for how much we need to shift the "bit" when checking a split set. > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into regmask_bound > - Clean-up is_bound_set test > - Split out the optimization > - Add is_bound_set test, fix bug caused by 8221404 > - Add is_bound_pair test, remove redundant test in else branch, fix accidentally 32-bit uint bit > - Merge branch 'master' into regmask_bound > - Merge branch 'master' into regmask_bound > - C2: Optimize is_bound Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1486 From redestad at openjdk.java.net Mon Nov 30 08:21:55 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 30 Nov 2020 08:21:55 GMT Subject: Integrated: 8257221: C2: RegMask::is_bound_set split set handling broken since JDK-8221404 In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 20:12:47 GMT, Claes Redestad wrote: > This patch adds more tests for RegMask::is_bound* operations, and fixes a bug I introduced in JDK-8221404 where I mixed up the formula for how much we need to shift the "bit" when checking a split set. This pull request has now been integrated. 
Changeset: 9bcd2695 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/9bcd2695 Stats: 45 lines in 2 files changed: 42 ins; 0 del; 3 mod 8257221: C2: RegMask::is_bound_set split set handling broken since JDK-8221404 Reviewed-by: kvn, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/1486 From rcastanedalo at openjdk.java.net Mon Nov 30 08:59:55 2020 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 30 Nov 2020 08:59:55 GMT Subject: RFR: 8257146: C2: extend the scope of StressGCM In-Reply-To: <-t5kGAKFJuUw0YyrDrDfH5U8mjHpVIocW2dzF6h7KdE=.e7b4d6f8-7db1-4761-ba63-e32849b559cd@github.com> References: <-t5kGAKFJuUw0YyrDrDfH5U8mjHpVIocW2dzF6h7KdE=.e7b4d6f8-7db1-4761-ba63-e32849b559cd@github.com> Message-ID: On Fri, 27 Nov 2020 13:18:27 GMT, Aleksey Shipilev wrote: > From the archives, the original discussion about StressGCM: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-February/009777.html > > It was explicitly putting the loads in lower frequency block only, because as Vladimir says: > > > There is an other old bug which you may hit if you randomize placement > > in gcm: 6831314. I still did not find a solution which does not > > introduce performance regression. What saves us is placing loads into > > "cheaper" low frequency block (which is most nested block where load's > > result is used). > > Is that not a problem anymore? Thanks for the pointer, Aleksey! The [JIRA bug report for 6831314](https://bugs.openjdk.java.net/browse/JDK-6831314?focusedCommentId=14128897) was closed in 2017 with Vladimir's comment _"Will not fix. As evaluation said it is only observed with modified C2 code"_, so this particular case does not seem to be a problem anymore. In any case, my reasoning is that it is good to know if there are other cases where C2 relies on GCM heuristics for correctness, regardless of what we do about it. @vnkozlov, what do you think? 
------------- PR: https://git.openjdk.java.net/jdk/pull/1469 From thartmann at openjdk.java.net Mon Nov 30 09:58:01 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 30 Nov 2020 09:58:01 GMT Subject: RFR: 8257398: Enhance debug output in Type::check_symmetrical Message-ID: When working on C2 typesystem changes for project Valhalla, I got annoyed by always having to add debug information for the "meet not commutative" failure in `Type::check_symmetrical`. We should add output similar to the "meet not symmetric" case. Thanks, Tobias ------------- Commit messages: - 8257398: Enhance debug output in Type::check_symmetrical Changes: https://git.openjdk.java.net/jdk/pull/1513/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1513&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257398 Stats: 14 lines in 1 file changed: 8 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/1513.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1513/head:pull/1513 PR: https://git.openjdk.java.net/jdk/pull/1513 From thartmann at openjdk.java.net Mon Nov 30 10:01:04 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 30 Nov 2020 10:01:04 GMT Subject: RFR: 8256655: rework long counted loop handling [v4] In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 08:15:16 GMT, Roland Westrelin wrote: >> Currently the transformation of a long counted loop into a loop nest >> with an inner int counted loop is performed in 2 steps: >> >> 1- recognize the counted loop shape and build the loop nest >> 2- transform the inner loop into a counted loop >> >> I propose changing this to a 3 steps process: >> >> 1- recognize the counted loop shape (transform the loop into a LongCountedLoop) >> 2- build the loop nest >> 3- transform the inner loop into a counted loop >> >> The benefits are: >> >> - the logic is cleaner because step 1 and 2 are now separated >> - Simple optimizations (loop iv type, empty loop elimination, parallel >> 
iv) can be implemented for LongCountedLoop by refactoring existing >> code >> >> 1- above is achieved by refactoring the >> PhaseIdealLoop::is_counted_loop() so it takes an extra parameter (the >> kind of counted loop, int or long). >> >> 2- is the existing loop nest construction logic. But now that it takes >> a LongCountedLoop as input, the shape of the loop is known to be that >> of a canonicalized counted loop. As a result, the loop nest >> construction is simpler. >> >> This change also refactors PhiNode::Value() so that it works for both >> CountedLoop and LongCountedLoop. >> >> I also changed ConvL2INode to be a TypeNode (ConvI2LNode is a type >> node) and: >> >> jlong init_p = (jlong)init_t->_lo + stride_con; >> if (init_p > (jlong)max_jint || init_p > (jlong)limit_t->_hi) >> return false; // cyclic loop or this loop trips only once >> >> to: >> >> if (init_t->lo_as_long() > max_signed_integer(iv_bt) - stride_con) { >> >> because if the loop has a single iteration transforming it to a >> CountedLoop should allow the backedge to be optimized out. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - test failure > - Merge branch 'master' into JDK-8256655 > - assert fix > - Merge branch 'master' into JDK-8256655 > - build fixes > - fix trailing whitespace > - fix comment > - long counted loop refactoring Looks good to me. I'll re-run testing and report back once it finished. ------------- Marked as reviewed by thartmann (Reviewer). 
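As a source-level analogy for the loop nest described in 8256655 (C2 does this on its IR, not on source code): a long counted loop rewritten as an outer long loop around an inner loop with an int counter. The function names, the `max_inner_iters` chunk size, and the assumptions of a positive stride and no arithmetic overflow are all illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Reference: a plain loop with a 64-bit induction variable.
// Assumes stride > 0 and that i + stride never overflows.
std::int64_t sum_long_loop(std::int64_t init, std::int64_t limit,
                           std::int64_t stride) {
  std::int64_t sum = 0;
  for (std::int64_t i = init; i < limit; i += stride) sum += i;
  return sum;
}

// Same iteration space as a nest: the outer loop advances in chunks, and the
// inner loop runs at most max_inner_iters times with a 32-bit counter.
std::int64_t sum_loop_nest(std::int64_t init, std::int64_t limit,
                           std::int32_t stride, std::int32_t max_inner_iters) {
  std::int64_t sum = 0;
  std::int64_t outer = init;
  while (outer < limit) {
    // Iterations still to run, rounded up; capped so the inner counter is int.
    std::int64_t remaining = (limit - outer + stride - 1) / stride;
    std::int32_t trips = remaining < max_inner_iters
                             ? static_cast<std::int32_t>(remaining)
                             : max_inner_iters;
    for (std::int32_t j = 0; j < trips; ++j) {
      sum += outer + static_cast<std::int64_t>(j) * stride;
    }
    outer += static_cast<std::int64_t>(trips) * stride;
  }
  return sum;
}
```

The inner loop is the part that can then be treated as an ordinary int counted loop; the outer loop only carries the 64-bit base value from chunk to chunk.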
PR: https://git.openjdk.java.net/jdk/pull/1316 From thartmann at openjdk.java.net Mon Nov 30 10:12:00 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 30 Nov 2020 10:12:00 GMT Subject: RFR: 8256655: rework long counted loop handling [v4] In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 08:15:16 GMT, Roland Westrelin wrote: >> Currently the transformation of a long counted loop into a loop nest >> with an inner int counted loop is performed in 2 steps: >> >> 1- recognize the counted loop shape and build the loop nest >> 2- transform the inner loop into a counted loop >> >> I propose changing this to a 3 steps process: >> >> 1- recognize the counted loop shape (transform the loop into a LongCountedLoop) >> 2- build the loop nest >> 3- transform the inner loop into a counted loop >> >> The benefits are: >> >> - the logic is cleaner because step 1 and 2 are now separated >> - Simple optimizations (loop iv type, empty loop elimination, parallel >> iv) can be implemented for LongCountedLoop by refactoring existing >> code >> >> 1- above is achieved by refactoring the >> PhaseIdealLoop::is_counted_loop() so it takes an extra parameter (the >> kind of counted loop, int or long). >> >> 2- is the existing loop nest construction logic. But now that it takes >> a LongCountedLoop as input, the shape of the loop is known to be that >> of a canonicalized counted loop. As a result, the loop nest >> construction is simpler. >> >> This change also refactors PhiNode::Value() so that it works for both >> CountedLoop and LongCountedLoop. 
>> >> I also changed ConvL2INode to be a TypeNode (ConvI2LNode is a type >> node) and: >> >> jlong init_p = (jlong)init_t->_lo + stride_con; >> if (init_p > (jlong)max_jint || init_p > (jlong)limit_t->_hi) >> return false; // cyclic loop or this loop trips only once >> >> to: >> >> if (init_t->lo_as_long() > max_signed_integer(iv_bt) - stride_con) { >> >> because if the loop has a single iteration transforming it to a >> CountedLoop should allow the backedge to be optimized out. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - test failure > - Merge branch 'master' into JDK-8256655 > - assert fix > - Merge branch 'master' into JDK-8256655 > - build fixes > - fix trailing whitespace > - fix comment > - long counted loop refactoring Changes requested by thartmann (Reviewer). 
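A sketch of the rearranged bound check quoted above. The old form widened `init + stride` to `jlong`, which presumably cannot work once the induction variable itself may be 64-bit; comparing against `max - stride` avoids the addition altogether. The helper name is hypothetical and the sketch assumes a positive stride.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True when init_lo + stride would exceed the largest value of T.
// Written without the addition so the check itself cannot overflow.
// Assumes stride > 0.
template <typename T>
constexpr bool first_step_overflows(T init_lo, T stride) {
  return init_lo > std::numeric_limits<T>::max() - stride;
}
```

The same comparison works for both `std::int32_t` and `std::int64_t`, mirroring how the quoted code picks the maximum from `iv_bt`.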
src/hotspot/share/opto/loopnode.cpp line 1243: > 1241: if (x->in(LoopNode::LoopBackControl)->Opcode() == Op_SafePoint && > 1242: (iv_bt == T_INT && LoopStripMiningIter != 0) || > 1243: iv_bt == T_LONG) { This fails to build on mac: 51e416a91494/workspace/open/src/hotspot/share/opto/loopnode.cpp:1241:66: error: '&&' within '||' [-Werror,-Wlogical-op-parentheses] [2020-11-30T10:06:24,243Z] if (x->in(LoopNode::LoopBackControl)->Opcode() == Op_SafePoint && [2020-11-30T10:06:24,243Z] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ ------------- PR: https://git.openjdk.java.net/jdk/pull/1316 From thartmann at openjdk.java.net Mon Nov 30 10:15:54 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 30 Nov 2020 10:15:54 GMT Subject: RFR: 8257057: C2: Improve safepoint processing during vector scalarization pass In-Reply-To: <0ZEXiwhY-ddrUID4J_az-SOz2MMPDjGcduqxD-WZa6c=.be138f38-db20-45fd-b299-22f2023b0645@github.com> References: <0ZEXiwhY-ddrUID4J_az-SOz2MMPDjGcduqxD-WZa6c=.be138f38-db20-45fd-b299-22f2023b0645@github.com> Message-ID: On Thu, 26 Nov 2020 13:14:26 GMT, Vladimir Ivanov wrote: > Cast nodes (`CheckCastPP`/`CastPP`) hinders scalarization of vectors since they aren't taken into account when affected safepoints are enumerated. > > Proposed fix implements a reversed variant of `Node::uncast()` to find all safepoints which have vectors referenced from their debug info and then uses `Node::uncast()` when iterating over debug edges. It is safe to ignore cast nodes (even the ones which carry control dependency): `VectorBox` already contains the most precise type information and the vector value it represents is immutable. So, it's safe to replace a fully constructed boxed vector instance with the vector value it contains and rematerialize the equivalent box instance if deoptimization happens. > > Testing: > - `jdk/incubator/vector` tests w/ different flag combinations (no flags, `-Xcomp`, `-XX:+DeoptimizeALot`); > - tier1-4 Looks good. 
------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1456 From thartmann at openjdk.java.net Mon Nov 30 10:31:00 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 30 Nov 2020 10:31:00 GMT Subject: RFR: 8257165: C2: Improve box elimination for vector masks and shuffles In-Reply-To: <5nh33mpKNP1f9vJgyTuQoP4jCtyGXMwBYxGmard9SzE=.267e0496-ee91-43e8-9b93-832f4c663697@github.com> References: <5nh33mpKNP1f9vJgyTuQoP4jCtyGXMwBYxGmard9SzE=.267e0496-ee91-43e8-9b93-832f4c663697@github.com> Message-ID: On Thu, 26 Nov 2020 13:17:05 GMT, Vladimir Ivanov wrote: > Introduce VectorMask/VectorShuffle-specific transformations to reduce reboxing by eliminating `VectorBox`/`VectorUnbox` pairs. > > It's a trivial transformation when the types on both ends perfectly match, but when type mismatch occurs there are additional steps needed (see `PhaseVector::expand_vunbox_node()` for more details on vector unboxing) . > > Testing: > - `jdk/incubator/vector` tests w/ different flag combinations (no flags, `-Xcomp`, `-XX:+DeoptimizeALot`); > - tier1-4 Looks reasonable. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1457 From vlivanov at openjdk.java.net Mon Nov 30 10:33:52 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 30 Nov 2020 10:33:52 GMT Subject: RFR: 8257057: C2: Improve safepoint processing during vector scalarization pass In-Reply-To: References: <0ZEXiwhY-ddrUID4J_az-SOz2MMPDjGcduqxD-WZa6c=.be138f38-db20-45fd-b299-22f2023b0645@github.com> Message-ID: <59LkBJAieBBqEeUjJPPj1YqWw-gVWp-8KmveMYVutwo=.7be4792e-31b8-4c29-82ed-f64f0f046a1b@github.com> On Mon, 30 Nov 2020 10:12:48 GMT, Tobias Hartmann wrote: >> Cast nodes (`CheckCastPP`/`CastPP`) hinders scalarization of vectors since they aren't taken into account when affected safepoints are enumerated. 
>> >> Proposed fix implements a reversed variant of `Node::uncast()` to find all safepoints which have vectors referenced from their debug info and then uses `Node::uncast()` when iterating over debug edges. It is safe to ignore cast nodes (even the ones which carry control dependency): `VectorBox` already contains the most precise type information and the vector value it represents is immutable. So, it's safe to replace a fully constructed boxed vector instance with the vector value it contains and rematerialize the equivalent box instance if deoptimization happens. >> >> Testing: >> - `jdk/incubator/vector` tests w/ different flag combinations (no flags, `-Xcomp`, `-XX:+DeoptimizeALot`); >> - tier1-4 > > Looks good. Thanks for the reviews, Vladimir and Tobias. ------------- PR: https://git.openjdk.java.net/jdk/pull/1456 From vlivanov at openjdk.java.net Mon Nov 30 10:33:53 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 30 Nov 2020 10:33:53 GMT Subject: Integrated: 8257057: C2: Improve safepoint processing during vector scalarization pass In-Reply-To: <0ZEXiwhY-ddrUID4J_az-SOz2MMPDjGcduqxD-WZa6c=.be138f38-db20-45fd-b299-22f2023b0645@github.com> References: <0ZEXiwhY-ddrUID4J_az-SOz2MMPDjGcduqxD-WZa6c=.be138f38-db20-45fd-b299-22f2023b0645@github.com> Message-ID: On Thu, 26 Nov 2020 13:14:26 GMT, Vladimir Ivanov wrote: > Cast nodes (`CheckCastPP`/`CastPP`) hinders scalarization of vectors since they aren't taken into account when affected safepoints are enumerated. > > Proposed fix implements a reversed variant of `Node::uncast()` to find all safepoints which have vectors referenced from their debug info and then uses `Node::uncast()` when iterating over debug edges. It is safe to ignore cast nodes (even the ones which carry control dependency): `VectorBox` already contains the most precise type information and the vector value it represents is immutable. 
So, it's safe to replace a fully constructed boxed vector instance with the vector value it contains and rematerialize the equivalent box instance if deoptimization happens. > > Testing: > - `jdk/incubator/vector` tests w/ different flag combinations (no flags, `-Xcomp`, `-XX:+DeoptimizeALot`); > - tier1-4 This pull request has now been integrated. Changeset: 4e55d0f3 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/4e55d0f3 Stats: 20 lines in 1 file changed: 9 ins; 0 del; 11 mod 8257057: C2: Improve safepoint processing during vector scalarization pass Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1456 From thartmann at openjdk.java.net Mon Nov 30 10:38:57 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 30 Nov 2020 10:38:57 GMT Subject: RFR: 8257190: simplify PhaseIdealLoop constructors In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 00:07:53 GMT, Xin Liu wrote: > 8257190: simplify PhaseIdealLoop constructors Otherwise looks good to me. src/hotspot/share/opto/loopnode.hpp line 955: > 953: // Perform verification that the graph is valid when verify_me is nullptr > 954: // Verify that verify_me made the same decisions as a fresh run. > 955: PhaseIdealLoop(PhaseIterGVN& igvn, const PhaseIdealLoop* verify_me = nullptr) : I think the comment should be something like: // Verify that verify_me made the same decisions as a fresh run // or only verify that the graph is valid if verify_me is null. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1473 From vlivanov at openjdk.java.net Mon Nov 30 10:39:57 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 30 Nov 2020 10:39:57 GMT Subject: RFR: 8257398: Enhance debug output in Type::check_symmetrical In-Reply-To: References: Message-ID: <-Gj_AQa-MqJW48zrckzY8gCNuddJgz7g2WNUqO96ZME=.d18a755c-c4ea-457a-8a7e-0ffeecd8dfbc@github.com> On Mon, 30 Nov 2020 09:52:32 GMT, Tobias Hartmann wrote: > When working on C2 typesystem changes for project Valhalla, I got annoyed by always having to add debug information for the "meet not commutative" failure in `Type::check_symmetrical`. We should add output similar to the "meet not symmetric" case. > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1513 From vlivanov at openjdk.java.net Mon Nov 30 10:39:59 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 30 Nov 2020 10:39:59 GMT Subject: Integrated: 8257165: C2: Improve box elimination for vector masks and shuffles In-Reply-To: <5nh33mpKNP1f9vJgyTuQoP4jCtyGXMwBYxGmard9SzE=.267e0496-ee91-43e8-9b93-832f4c663697@github.com> References: <5nh33mpKNP1f9vJgyTuQoP4jCtyGXMwBYxGmard9SzE=.267e0496-ee91-43e8-9b93-832f4c663697@github.com> Message-ID: On Thu, 26 Nov 2020 13:17:05 GMT, Vladimir Ivanov wrote: > Introduce VectorMask/VectorShuffle-specific transformations to reduce reboxing by eliminating `VectorBox`/`VectorUnbox` pairs. > > It's a trivial transformation when the types on both ends perfectly match, but when type mismatch occurs there are additional steps needed (see `PhaseVector::expand_vunbox_node()` for more details on vector unboxing) . > > Testing: > - `jdk/incubator/vector` tests w/ different flag combinations (no flags, `-Xcomp`, `-XX:+DeoptimizeALot`); > - tier1-4 This pull request has now been integrated. 
Changeset: 337d7bce Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/337d7bce Stats: 103 lines in 3 files changed: 73 ins; 13 del; 17 mod 8257165: C2: Improve box elimination for vector masks and shuffles Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/1457 From vlivanov at openjdk.java.net Mon Nov 30 10:39:58 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 30 Nov 2020 10:39:58 GMT Subject: RFR: 8257165: C2: Improve box elimination for vector masks and shuffles In-Reply-To: References: <5nh33mpKNP1f9vJgyTuQoP4jCtyGXMwBYxGmard9SzE=.267e0496-ee91-43e8-9b93-832f4c663697@github.com> Message-ID: On Mon, 30 Nov 2020 10:27:42 GMT, Tobias Hartmann wrote: >> Introduce VectorMask/VectorShuffle-specific transformations to reduce reboxing by eliminating `VectorBox`/`VectorUnbox` pairs. >> >> It's a trivial transformation when the types on both ends perfectly match, but when type mismatch occurs there are additional steps needed (see `PhaseVector::expand_vunbox_node()` for more details on vector unboxing) . >> >> Testing: >> - `jdk/incubator/vector` tests w/ different flag combinations (no flags, `-Xcomp`, `-XX:+DeoptimizeALot`); >> - tier1-4 > > Looks reasonable. Thanks for the reviews, Vladimir and Tobias. ------------- PR: https://git.openjdk.java.net/jdk/pull/1457 From chagedorn at openjdk.java.net Mon Nov 30 10:56:55 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 30 Nov 2020 10:56:55 GMT Subject: RFR: 8257398: Enhance debug output in Type::check_symmetrical In-Reply-To: References: Message-ID: <8sAge3IkLwUzL0X7SCrQkH5MXKCB70J1bobz1HUacUA=.3f52d4c8-2c50-47c6-bf70-f3518802f1df@github.com> On Mon, 30 Nov 2020 09:52:32 GMT, Tobias Hartmann wrote: > When working on C2 typesystem changes for project Valhalla, I got annoyed by always having to add debug information for the "meet not commutative" failure in `Type::check_symmetrical`. 
We should add output similar to the "meet not symmetric" case. > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1513 From roland at openjdk.java.net Mon Nov 30 11:56:58 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 30 Nov 2020 11:56:58 GMT Subject: RFR: 8253644: C2: assert(skeleton_predicate_has_opaque(iff)) failed: unexpected [v2] In-Reply-To: References: <3WOtQrecrvY6WH3zeLq5xpVVzlhWV4hZ741wGWEayLg=.dbb57162-35c2-4bde-91e2-bdc9f29faaf3@github.com> Message-ID: On Fri, 27 Nov 2020 08:55:10 GMT, Christian Hagedorn wrote: >> `test1()` fails while creating pre/main/post loops when copying skeleton predicates to the main loop. The problem is that we find phi nodes when following a skeleton `Opaque4` bool node, when trying to find the `OpaqueLoopInit` and `OpaqueLoopStride` nodes. This is unexpected and lets the assertion fail. This happens due to the following reason: >> >> An inner loop of a nested loop is first unswitched and the skeleton predicates are copied to the slow and fast loop by just creating a new `If` node that share the same `Opaque4` node: >> https://github.com/openjdk/jdk/blob/973255c469d794afe8ee74b24ddb5048bfcaadf7/src/hotspot/share/opto/loopPredicate.cpp#L268-L273 >> >> The loop tree building algorithm recognizes both loops as children of the parent loop: >> Loop: N0/N0 has_sfpt >> Loop: N338/N314 limit_check profile_predicated predicated counted [0,100),+1 (2 iters) has_sfpt >> Loop: N459/N458 profile_predicated predicated counted [0,10000),+1 (5271 iters) has_sfpt (slow loop) >> Loop: N343/N267 profile_predicated predicated counted [0,10000),+1 (5271 iters) has_sfpt (fast loop) >> >> Some additional optimizations are then applied such that the fast loop no longer has any backedge/path to the parent loop while the slow loop still has. 
As a result, the loop tree building algorithm only recognizes the slow loop as child while the fast loop is not. The fast loop is treated as a separate loop on the same level as the parent loop: >> Loop: N0/N0 has_sfpt >> Loop: N338/N314 limit_check profile_predicated predicated counted [0,100),+1 (2 iters) has_sfpt >> [N459, but the loop could actually be removed in the meantime but the skeleton predicates are still there] >> Loop: N343/N267 profile_predicated predicated counted [0,10000),+1 (5274 iters) has_sfpt >> >> Now the original parent loop (N338) gets peeled. The fast and the slow loop still both share skeleton `Opaque4` bool nodes with all their inputs nodes up to and including the `OpaqueLoopInit/Stride` nodes. Let's look at one of the skeleton `If` nodes for the fast loop that uses such a `Opaque4` node. The skeleton `If` is no longer part of the original parent loop and is therefore not peeled. But now we need some phi nodes to select the correct nodes either from the peeled iteration or from N338 for this skeleton `If` of the fast loop. This is done in `PhaseIdealLoop::clone_iff()` which creates a new `Opaque4` node together with new `Bool` and `Cmp` nodes and then inserts some phi nodes to do the selection. >> >> When afterwards creating pre/main/post loops for the fast loop (N343) that is no child anymore, we find these phi nodes on the path to the `OpaqueLoopInit/Stride` nodes which lets the assertion fail. A more detailed explanation why this happens can be find at `test1()` in [TestUnswitchCloneSkeletonPredicates](https://github.com/chhagedorn/jdk/blob/cfc7c941ddefd716c9f07bf7e8f4824f00bb7e9b/test/hotspot/jtreg/compiler/loopopts/TestUnswitchCloneSkeletonPredicates.java#L51). 
>> >> I propose to copy the skeleton predicates to the unswitched loops in the same way as we copy the skeleton predicates to the main loop by cloning all the nodes on the path to the` OpaqueLoopInit/Stride` nodes with a small adaptation: We should also copy the `OpaqueLoopInit/Stride` nodes and we should keep the uncommon traps because we only want to replace them by `Halt` nodes once we create pre/main/post loops. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add back accidentally removed execution of test2-5. Marked as reviewed by roland (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1448 From roland at openjdk.java.net Mon Nov 30 11:56:59 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 30 Nov 2020 11:56:59 GMT Subject: RFR: 8253644: C2: assert(skeleton_predicate_has_opaque(iff)) failed: unexpected [v2] In-Reply-To: References: <3WOtQrecrvY6WH3zeLq5xpVVzlhWV4hZ741wGWEayLg=.dbb57162-35c2-4bde-91e2-bdc9f29faaf3@github.com> Message-ID: On Mon, 30 Nov 2020 11:53:01 GMT, Roland Westrelin wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Add back accidentally removed execution of test2-5. > > Marked as reviewed by roland (Reviewer). > Could that be improved? But as of the current implementation, the `Opaque4` node with its inputs is shared between a fast and a slow loop. So we could not remove the `Opaque4` node when either the fast or slow loop does not need it anymore while the other one does. Right. Yes, I think it would make sense to remove the skeleton predicates that become useless. Anyway, your fix looks good to me. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1448 From roland at openjdk.java.net Mon Nov 30 12:06:10 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 30 Nov 2020 12:06:10 GMT Subject: RFR: 8256655: rework long counted loop handling [v5] In-Reply-To: References: Message-ID: > Currently the transformation of a long counted loop into a loop nest > with an inner int counted loop is performed in 2 steps: > > 1- recognize the counted loop shape and build the loop nest > 2- transform the inner loop into a counted loop > > I propose changing this to a 3 steps process: > > 1- recognize the counted loop shape (transform the loop into a LongCountedLoop) > 2- build the loop nest > 3- transform the inner loop into a counted loop > > The benefits are: > > - the logic is cleaner because step 1 and 2 are now separated > - Simple optimizations (loop iv type, empty loop elimination, parallel > iv) can be implemented for LongCountedLoop by refactoring existing > code > > 1- above is achieved by refactoring the > PhaseIdealLoop::is_counted_loop() so it takes an extra parameter (the > kind of counted loop, int or long). > > 2- is the existing loop nest construction logic. But now that it takes > a LongCountedLoop as input, the shape of the loop is known to be that > of a canonicalized counted loop. As a result, the loop nest > construction is simpler. > > This change also refactors PhiNode::Value() so that it works for both > CountedLoop and LongCountedLoop. > > I also changed ConvL2INode to be a TypeNode (ConvI2LNode is a type > node) and: > > jlong init_p = (jlong)init_t->_lo + stride_con; > if (init_p > (jlong)max_jint || init_p > (jlong)limit_t->_hi) > return false; // cyclic loop or this loop trips only once > > to: > > if (init_t->lo_as_long() > max_signed_integer(iv_bt) - stride_con) { > > because if the loop has a single iteration transforming it to a > CountedLoop should allow the backedge to be optimized out. 
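As a rough illustration of the loop nest the description refers to, here is a hand-written C++ sketch of the source-level equivalent: a loop with a 64-bit induction variable, and the same loop rewritten as an outer 64-bit loop around an inner `int` counted loop. This is not C2 code. C2 performs the transformation on its IR, and the function names, the chunk size `kMaxInner` and the range precondition are all assumptions made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Original shape: a counted loop whose induction variable is 64-bit.
int64_t sum_long_loop(int64_t init, int64_t limit) {
  int64_t sum = 0;
  for (int64_t i = init; i < limit; i++) {
    sum += i;
  }
  return sum;
}

// Loop nest: the outer loop advances a 64-bit base in chunks, the inner loop
// is an ordinary int counted loop over one chunk.
int64_t sum_loop_nest(int64_t init, int64_t limit) {
  // Assumed chunk size, chosen only so that the inner trip count fits in an int.
  const int64_t kMaxInner = int64_t{1} << 20;
  // Illustrative precondition: keeps limit - base and base + kMaxInner from
  // overflowing in this sketch.
  assert(init >= std::numeric_limits<int64_t>::min() / 2);
  assert(limit <= std::numeric_limits<int64_t>::max() / 2);

  int64_t sum = 0;
  for (int64_t base = init; base < limit; base += kMaxInner) {
    // Iterations left in this chunk, clamped to the chunk size.
    int32_t inner_limit = static_cast<int32_t>(std::min(limit - base, kMaxInner));
    for (int32_t j = 0; j < inner_limit; j++) {
      sum += base + j;  // the original 64-bit iv is recovered as base + j
    }
  }
  return sum;
}
```

Both functions visit the same values of the original induction variable, so they return the same sum for any range that satisfies the precondition.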
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:

  build fix

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1316/files
  - new: https://git.openjdk.java.net/jdk/pull/1316/files/bd4570a7..e0ffd5d7

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1316&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1316&range=03-04

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1316.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1316/head:pull/1316

PR: https://git.openjdk.java.net/jdk/pull/1316

From redestad at openjdk.java.net Mon Nov 30 13:06:08 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Mon, 30 Nov 2020 13:06:08 GMT
Subject: RFR: 8257223: C2: Optimize RegMask::is_bound
Message-ID:

- Avoid Size() for is_bound1 (does relatively expensive computations)
- Refactor loops to more efficiently check that the remainder of the mask is empty after a match. (Using the same structure helped gcc merge the control flows in `is_bound` for an additional benefit)

This removes ~80k (~1%) of the C2 bootstrap overhead, and reduces time spent doing Register_Allocate in SimpleRepeatCompilation.largeMethod by about 0.4%. There's a small but statistically insignificant effect (~0.2%) on the score, too:

Benchmark                                      Mode  Cnt     Score    Error  Units
SimpleRepeatCompilation.largeMethod_repeat_c2    ss   20  8119.278 ± 37.803  ms/op
SimpleRepeatCompilation.largeMethod_repeat_c2    ss   20  8099.331 ± 33.159  ms/op

-------------

Commit messages:
 - 8257223: C2: Optimize RegMask::is_bound

Changes: https://git.openjdk.java.net/jdk/pull/1515/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1515&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8257223
  Stats: 67 lines in 1 file changed: 45 ins; 3 del; 19 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1515.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1515/head:pull/1515

PR: https://git.openjdk.java.net/jdk/pull/1515

From pliden at openjdk.java.net Mon Nov 30 13:10:02 2020
From: pliden at openjdk.java.net (Per Liden)
Date: Mon, 30 Nov 2020 13:10:02 GMT
Subject: RFR: 8257418: C2: Rename barrier data member in MemNode and LoadStoreNode
Message-ID:

Please review this trivial cleanup, which renames the "barrier data" member in `MemNode` and `LoadStoreNode` from `_barrier` to `_barrier_data`, to better match the names of the setters and getters (`barrier_data()` and `set_barrier_data()`).

-------------

Commit messages:
 - 8257418: C2: Rename barrier data member in MemNode and LoadStoreNode

Changes: https://git.openjdk.java.net/jdk/pull/1516/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1516&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8257418
  Stats: 10 lines in 2 files changed: 0 ins; 0 del; 10 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1516.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1516/head:pull/1516

PR: https://git.openjdk.java.net/jdk/pull/1516

From vlivanov at openjdk.java.net Mon Nov 30 13:17:56 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Mon, 30 Nov 2020 13:17:56 GMT
Subject: RFR: 8257418: C2: Rename barrier data member in MemNode and LoadStoreNode
In-Reply-To:
References:
Message-ID:

On Mon, 30 Nov 2020 13:05:18 GMT, Per Liden wrote:

> Please review this trivial cleanup, which renames the "barrier data" member in `MemNode` and `LoadStoreNode` from `_barrier` to `_barrier_data`, to better match the names of the
setters and getters (`barrier_data()` and `set_barrier_data()`). Looks good and trivial ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1516 From coleenp at openjdk.java.net Mon Nov 30 13:45:59 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 30 Nov 2020 13:45:59 GMT Subject: RFR: 8252049: Native memory leak in ciMethodData ctor In-Reply-To: References: Message-ID: <7en56aB_uqgEGsRpvX-V26aDS28pbMEJhgoRxHRd__g=.4472ecb2-5579-438c-bec6-9a1a3ad4c566@github.com> On Fri, 27 Nov 2020 08:05:09 GMT, Vladimir Ivanov wrote: > `ciMethodData` embeds `MethodData` to snapshot a state from the original `MethodData` instance. > But `MethodData` embeds a `Mutex` which allocates a platform-specific implementation on C-heap. > The `Mutex` is overwritten with `0`s, but the resources aren't deallocated, so the leak occurs. > > Proposed fix is to run Mutex destructor right away. > > Initially, I thought about switching to `Mutex*`, but then I found that Coleen already tried that and observed a performance regression [1]. So, for now I chose the conservative approach. > > In the longer term, I would consider replacing `MethodData::_extra_data_lock` with a lock-free scheme. Having a lock-per-MDO looks kind of excessive. > > Testing: > - [x] verified that no memory leak observed with the reported test > - [x] tier1-4 > > [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-October/039783.html Changes requested by coleenp (Reviewer). src/hotspot/share/oops/methodData.cpp line 1208: > 1206: MethodData::MethodData(ciMethodData& data) > 1207: : _extra_data_lock(Mutex::leaf, "unused") { > 1208: _extra_data_lock.~Mutex(); // release allocated resources before zeroing So if _extra_data_lock was a pointer to Mutex, then this statement would be simply "delete _extra_data_lock;" instead, but we'd still have this copy constructor, right? 
------------- PR: https://git.openjdk.java.net/jdk/pull/1478 From coleenp at openjdk.java.net Mon Nov 30 13:46:02 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 30 Nov 2020 13:46:02 GMT Subject: RFR: 8252049: Native memory leak in ciMethodData ctor In-Reply-To: <7en56aB_uqgEGsRpvX-V26aDS28pbMEJhgoRxHRd__g=.4472ecb2-5579-438c-bec6-9a1a3ad4c566@github.com> References: <7en56aB_uqgEGsRpvX-V26aDS28pbMEJhgoRxHRd__g=.4472ecb2-5579-438c-bec6-9a1a3ad4c566@github.com> Message-ID: <6Do6NGKfK_bDrZPMvfrBscLfj2J9OtFjQNEBrXL-Fbo=.8fd00cf4-4247-4c44-af36-723746d6b26c@github.com> On Mon, 30 Nov 2020 13:39:53 GMT, Coleen Phillimore wrote: >> `ciMethodData` embeds `MethodData` to snapshot a state from the original `MethodData` instance. >> But `MethodData` embeds a `Mutex` which allocates a platform-specific implementation on C-heap. >> The `Mutex` is overwritten with `0`s, but the resources aren't deallocated, so the leak occurs. >> >> Proposed fix is to run Mutex destructor right away. >> >> Initially, I thought about switching to `Mutex*`, but then I found that Coleen already tried that and observed a performance regression [1]. So, for now I chose the conservative approach. >> >> In the longer term, I would consider replacing `MethodData::_extra_data_lock` with a lock-free scheme. Having a lock-per-MDO looks kind of excessive. >> >> Testing: >> - [x] verified that no memory leak observed with the reported test >> - [x] tier1-4 >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-October/039783.html > > src/hotspot/share/oops/methodData.cpp line 1208: > >> 1206: MethodData::MethodData(ciMethodData& data) >> 1207: : _extra_data_lock(Mutex::leaf, "unused") { >> 1208: _extra_data_lock.~Mutex(); // release allocated resources before zeroing > > So if _extra_data_lock was a pointer to Mutex, then this statement would be simply "delete _extra_data_lock;" instead, but we'd still have this copy constructor, right? 
Why isn't this a regular copy constructor taking MethodData, and pass this._orig instead and leave the zeroing in the caller? I don't like that oops/methodData knows about ciMethodData. ------------- PR: https://git.openjdk.java.net/jdk/pull/1478 From vlivanov at openjdk.java.net Mon Nov 30 13:59:03 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 30 Nov 2020 13:59:03 GMT Subject: RFR: 8252049: Native memory leak in ciMethodData ctor In-Reply-To: <6Do6NGKfK_bDrZPMvfrBscLfj2J9OtFjQNEBrXL-Fbo=.8fd00cf4-4247-4c44-af36-723746d6b26c@github.com> References: <7en56aB_uqgEGsRpvX-V26aDS28pbMEJhgoRxHRd__g=.4472ecb2-5579-438c-bec6-9a1a3ad4c566@github.com> <6Do6NGKfK_bDrZPMvfrBscLfj2J9OtFjQNEBrXL-Fbo=.8fd00cf4-4247-4c44-af36-723746d6b26c@github.com> Message-ID: On Mon, 30 Nov 2020 13:41:23 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/methodData.cpp line 1208: >> >>> 1206: MethodData::MethodData(ciMethodData& data) >>> 1207: : _extra_data_lock(Mutex::leaf, "unused") { >>> 1208: _extra_data_lock.~Mutex(); // release allocated resources before zeroing >> >> So if _extra_data_lock was a pointer to Mutex, then this statement would be simply "delete _extra_data_lock;" instead, but we'd still have this copy constructor, right? > > Why isn't this a regular copy constructor taking MethodData, and pass this._orig instead and leave the zeroing in the caller? I don't like that oops/methodData knows about ciMethodData. If `_extra_data_lock` were a pointer, then `MethodData::MethodData` would just `_extra_data_lock(NULL)`. Regarding where to put zeroing, I'm fine doing it either way. Just want to give a try the idea David and Kim proposed with separating the lock from MDO (w/ superclass trick). 
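The leak discussed in this thread can be mimicked outside HotSpot. The following is a toy C++ sketch, assuming a hypothetical `ToyLock` that acquires a resource in its constructor and a hypothetical `ToyData` that embeds it; none of these names exist in HotSpot, and a counter stands in for the platform-specific C-heap allocation. It contrasts overwriting the object directly with running the embedded lock's destructor first, which is the approach the proposed fix describes.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Counts resources currently held by ToyLock instances. It stands in for the
// C-heap allocation made by the real lock (an assumption of this sketch).
static int g_live_lock_resources = 0;

// Hypothetical lock type, not HotSpot's Mutex.
struct ToyLock {
  int _handle;
  ToyLock() : _handle(1) { ++g_live_lock_resources; }
  ~ToyLock() { --g_live_lock_resources; }
};

// Hypothetical holder with an embedded lock, loosely modelled on the
// snapshot situation described in the pull request.
struct ToyData {
  ToyLock _lock;
  int _counters[4];
};

// Overwrites the whole object without running the lock's destructor:
// whatever the lock acquired is never released.
void overwrite_leaky(ToyData* d) {
  assert(d != nullptr);
  std::memset(static_cast<void*>(d), 0, sizeof(ToyData));
}

// Runs the embedded lock's destructor first, then overwrites the object.
void overwrite_after_destroy(ToyData* d) {
  assert(d != nullptr);
  d->_lock.~ToyLock();  // release the lock's resource before zeroing
  std::memset(static_cast<void*>(d), 0, sizeof(ToyData));
}
```

The sketch assumes the objects live in raw storage (for example via placement `new`) and are never destroyed afterwards, so the explicit destructor call is not followed by a second, implicit one.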
------------- PR: https://git.openjdk.java.net/jdk/pull/1478 From pliden at openjdk.java.net Mon Nov 30 14:19:56 2020 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 30 Nov 2020 14:19:56 GMT Subject: RFR: 8257418: C2: Rename barrier data member in MemNode and LoadStoreNode In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 13:15:23 GMT, Vladimir Ivanov wrote: >> Please review this trivial cleanup, which renames the "barrier data" member in `MemNode` and `LoadStoreNode` from `_barrier` to `_barrier_data`, to better match the names of the setters and getters (`barrier_data()` and `set_barrier_data()`). > > Looks good and trivial Thanks for reviewing, @iwanowww! ------------- PR: https://git.openjdk.java.net/jdk/pull/1516 From pliden at openjdk.java.net Mon Nov 30 14:19:58 2020 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 30 Nov 2020 14:19:58 GMT Subject: Integrated: 8257418: C2: Rename barrier data member in MemNode and LoadStoreNode In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 13:05:18 GMT, Per Liden wrote: > Please review this trivial cleanup, which renames the "barrier data" member in `MemNode` and `LoadStoreNode` from `_barrier` to `_barrier_data`, to better match the names of the setters and getters (`barrier_data()` and `set_barrier_data()`). This pull request has now been integrated. 
Changeset: e3abe51a Author: Per Liden URL: https://git.openjdk.java.net/jdk/commit/e3abe51a Stats: 10 lines in 2 files changed: 0 ins; 0 del; 10 mod 8257418: C2: Rename barrier data member in MemNode and LoadStoreNode Reviewed-by: vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/1516 From thartmann at openjdk.java.net Mon Nov 30 14:44:55 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 30 Nov 2020 14:44:55 GMT Subject: RFR: 8257398: Enhance debug output in Type::check_symmetrical In-Reply-To: <-Gj_AQa-MqJW48zrckzY8gCNuddJgz7g2WNUqO96ZME=.d18a755c-c4ea-457a-8a7e-0ffeecd8dfbc@github.com> References: <-Gj_AQa-MqJW48zrckzY8gCNuddJgz7g2WNUqO96ZME=.d18a755c-c4ea-457a-8a7e-0ffeecd8dfbc@github.com> Message-ID: On Mon, 30 Nov 2020 10:37:21 GMT, Vladimir Ivanov wrote: >> When working on C2 typesystem changes for project Valhalla, I got annoyed by always having to add debug information for the "meet not commutative" failure in `Type::check_symmetrical`. We should add output similar to the "meet not symmetric" case. >> >> Thanks, >> Tobias > > Looks good. @iwanowww, @chhagedorn thanks for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/1513 From roland at openjdk.java.net Mon Nov 30 15:25:01 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 30 Nov 2020 15:25:01 GMT Subject: RFR: 8256655: rework long counted loop handling [v4] In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 10:09:28 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: >> >> - test failure >> - Merge branch 'master' into JDK-8256655 >> - assert fix >> - Merge branch 'master' into JDK-8256655 >> - build fixes >> - fix trailing whitespace >> - fix comment >> - long counted loop refactoring > > src/hotspot/share/opto/loopnode.cpp line 1243: > >> 1241: if (x->in(LoopNode::LoopBackControl)->Opcode() == Op_SafePoint && >> 1242: (iv_bt == T_INT && LoopStripMiningIter != 0) || >> 1243: iv_bt == T_LONG) { > > This fails to build on mac: > 51e416a91494/workspace/open/src/hotspot/share/opto/loopnode.cpp:1241:66: error: '&&' within '||' [-Werror,-Wlogical-op-parentheses] > [2020-11-30T10:06:24,243Z] if (x->in(LoopNode::LoopBackControl)->Opcode() == Op_SafePoint && > [2020-11-30T10:06:24,243Z] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ I pushed a fix for that. ------------- PR: https://git.openjdk.java.net/jdk/pull/1316 From kvn at openjdk.java.net Mon Nov 30 17:43:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 17:43:59 GMT Subject: RFR: 8257232: CompileThresholdScaling fails to work on 32-bit platforms In-Reply-To: References: Message-ID: On Sat, 28 Nov 2020 23:34:23 GMT, Jie Fu wrote: > Hi all, > > CompileThresholdScaling is incorrect on 32-bit platforms. > > If you run the following command on Linux-32: > java -XX:CompileThresholdScaling=0.75 -version > It gets the following unexpected warnings: > intx Tier0InvokeNotifyFreqLog=32 is outside the allowed range [ 0 ... 30 ] > intx Tier0BackedgeNotifyFreqLog=32 is outside the allowed range [ 0 ... 30 ] > intx Tier2InvokeNotifyFreqLog=32 is outside the allowed range [ 0 ... 30 ] > intx Tier2BackedgeNotifyFreqLog=32 is outside the allowed range [ 0 ... 30 ] > intx Tier3InvokeNotifyFreqLog=32 is outside the allowed range [ 0 ... 30 ] > intx Tier3BackedgeNotifyFreqLog=32 is outside the allowed range [ 0 ... 
30 ] > intx Tier23InlineeNotifyFreqLog=32 is outside the allowed range [ 0 ... 30 ] > > The failure is that nth_bit(max_freq_bits) [1] = nth_bit(32) [2] = 0 on 32-bit platforms. > So the scaling logic is wrong. > It would be better to fix it. > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compilerDefinitions.cpp#L125 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/globalDefinitions.hpp#L973 Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1499 From kvn at openjdk.java.net Mon Nov 30 18:04:55 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 18:04:55 GMT Subject: RFR: 8257146: C2: extend the scope of StressGCM In-Reply-To: References: <-t5kGAKFJuUw0YyrDrDfH5U8mjHpVIocW2dzF6h7KdE=.e7b4d6f8-7db1-4761-ba63-e32849b559cd@github.com> Message-ID: On Mon, 30 Nov 2020 08:57:39 GMT, Roberto Castañeda Lozano wrote: >> From the archives, the original discussion about StressGCM: >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-February/009777.html >> >> It was explicitly putting the loads in lower frequency block only, because as Vladimir says: >> > There is an other old bug which you may hit if you randomize placement >> > in gcm: 6831314. >> >> Is that not a problem anymore? > >> From the archives, the original discussion about StressGCM: >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-February/009777.html >> >> It was explicitly putting the loads in lower frequency block only, because as Vladimir says: >> >> > There is an other old bug which you may hit if you randomize placement >> > in gcm: 6831314.
I still did not find a solution which does not >> > introduce performance regression. What saves us is placing loads into >> > "cheaper" low frequency block (which is most nested block where load's >> > result is used). >> >> Is that not a problem anymore? > > Thanks for the pointer, Aleksey! > > The [JIRA bug report for 6831314](https://bugs.openjdk.java.net/browse/JDK-6831314?focusedCommentId=14128897) was closed in 2017 with Vladimir's comment _"Will not fix. As evaluation said it is only observed with modified C2 code"_, so this particular case does not seem to be a problem anymore. In any case, my reasoning is that it is good to know if there are other cases where C2 relies on GCM heuristics for correctness, regardless of what we do about it. > > @vnkozlov, what do you think? First, StressGCM is used only for testing, not in production, so it should not affect the released JDK. Second, I think it would be nice to try to reproduce the 6831314 issue again with the latest JDK. We made several changes to node type casting which may have fixed it. And it should not prevent us from widening the scope of testing with StressGCM to find other issues now. When @shipilev first implemented this flag, my concern was that we would have to spend a lot of resources cleaning up the issues found. Now we have more engineers and we should do the cleanup. @robcasloz I suggest additionally running hs-precheckin-comp with StressGCM ON.
------------- PR: https://git.openjdk.java.net/jdk/pull/1469 From kvn at openjdk.java.net Mon Nov 30 18:15:56 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 18:15:56 GMT Subject: RFR: 8253644: C2: assert(skeleton_predicate_has_opaque(iff)) failed: unexpected [v2] In-Reply-To: References: <3WOtQrecrvY6WH3zeLq5xpVVzlhWV4hZ741wGWEayLg=.dbb57162-35c2-4bde-91e2-bdc9f29faaf3@github.com> Message-ID: On Fri, 27 Nov 2020 08:55:10 GMT, Christian Hagedorn wrote: >> `test1()` fails while creating pre/main/post loops when copying skeleton predicates to the main loop. The problem is that we find phi nodes when following a skeleton `Opaque4` bool node, when trying to find the `OpaqueLoopInit` and `OpaqueLoopStride` nodes. This is unexpected and lets the assertion fail. This happens due to the following reason: >> >> An inner loop of a nested loop is first unswitched and the skeleton predicates are copied to the slow and fast loop by just creating a new `If` node that share the same `Opaque4` node: >> https://github.com/openjdk/jdk/blob/973255c469d794afe8ee74b24ddb5048bfcaadf7/src/hotspot/share/opto/loopPredicate.cpp#L268-L273 >> >> The loop tree building algorithm recognizes both loops as children of the parent loop: >> Loop: N0/N0 has_sfpt >> Loop: N338/N314 limit_check profile_predicated predicated counted [0,100),+1 (2 iters) has_sfpt >> Loop: N459/N458 profile_predicated predicated counted [0,10000),+1 (5271 iters) has_sfpt (slow loop) >> Loop: N343/N267 profile_predicated predicated counted [0,10000),+1 (5271 iters) has_sfpt (fast loop) >> >> Some additional optimizations are then applied such that the fast loop no longer has any backedge/path to the parent loop while the slow loop still has. As a result, the loop tree building algorithm only recognizes the slow loop as child while the fast loop is not. 
The fast loop is treated as a separate loop on the same level as the parent loop: >> Loop: N0/N0 has_sfpt >> Loop: N338/N314 limit_check profile_predicated predicated counted [0,100),+1 (2 iters) has_sfpt >> [N459, but the loop could actually be removed in the meantime but the skeleton predicates are still there] >> Loop: N343/N267 profile_predicated predicated counted [0,10000),+1 (5274 iters) has_sfpt >> >> Now the original parent loop (N338) gets peeled. The fast and the slow loop still both share skeleton `Opaque4` bool nodes with all their input nodes up to and including the `OpaqueLoopInit/Stride` nodes. Let's look at one of the skeleton `If` nodes for the fast loop that uses such an `Opaque4` node. The skeleton `If` is no longer part of the original parent loop and is therefore not peeled. But now we need some phi nodes to select the correct nodes either from the peeled iteration or from N338 for this skeleton `If` of the fast loop. This is done in `PhaseIdealLoop::clone_iff()` which creates a new `Opaque4` node together with new `Bool` and `Cmp` nodes and then inserts some phi nodes to do the selection. >> >> When afterwards creating pre/main/post loops for the fast loop (N343) that is no longer a child, we find these phi nodes on the path to the `OpaqueLoopInit/Stride` nodes which makes the assertion fail. A more detailed explanation of why this happens can be found at `test1()` in [TestUnswitchCloneSkeletonPredicates](https://github.com/chhagedorn/jdk/blob/cfc7c941ddefd716c9f07bf7e8f4824f00bb7e9b/test/hotspot/jtreg/compiler/loopopts/TestUnswitchCloneSkeletonPredicates.java#L51).
>> >> I propose to copy the skeleton predicates to the unswitched loops in the same way as we copy the skeleton predicates to the main loop by cloning all the nodes on the path to the `OpaqueLoopInit/Stride` nodes with a small adaptation: We should also copy the `OpaqueLoopInit/Stride` nodes and we should keep the uncommon traps because we only want to replace them by `Halt` nodes once we create pre/main/post loops. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add back accidentally removed execution of test2-5. Good. I suggest merging the latest source and re-testing. I am concerned that some Git testing failed. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1448 From psandoz at openjdk.java.net Mon Nov 30 18:56:58 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 30 Nov 2020 18:56:58 GMT Subject: Integrated: 8256995: [vector] Improve broadcast operations In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 00:06:23 GMT, Paul Sandoz wrote: > Use the raw FP to bits conversion methods (to avoid NaN checks). Improve the x64 code generation when broadcasting a value. This pull request has now been integrated.
Changeset: 89690699 Author: Paul Sandoz URL: https://git.openjdk.java.net/jdk/commit/89690699 Stats: 20 lines in 4 files changed: 17 ins; 0 del; 3 mod 8256995: [vector] Improve broadcast operations Co-authored-by: Paul Sandoz Co-authored-by: Sandhya Viswanathan Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/1445 From kvn at openjdk.java.net Mon Nov 30 19:10:56 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 19:10:56 GMT Subject: RFR: 8257190: simplify PhaseIdealLoop constructors In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 00:07:53 GMT, Xin Liu wrote: > 8257190: simplify PhaseIdealLoop constructors Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1473 From kvn at openjdk.java.net Mon Nov 30 19:26:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 19:26:57 GMT Subject: RFR: 8257223: C2: Optimize RegMask::is_bound In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 13:00:15 GMT, Claes Redestad wrote: > - Avoid Size() for is_bound1 (does relatively expensive computations) > - Refactor loops to more efficiently check that the remainder of the mask is empty after a match. (Using the same structure helped gcc merge the control flows in `is_bound` for an additional benefit) > > This removes ~80k (~1%) of the C2 bootstrap overhead, and reduces time spent doing Register_Allocate in SimpleRepeatCompilation.largeMethod by about 0.4%. There's a small but statistically insignificant effect (~0.2%) on the score, too: > > Benchmark Mode Cnt Score Error Units > SimpleRepeatCompilation.largeMethod_repeat_c2 ss 20 8119.278 ± 37.803 ms/op > SimpleRepeatCompilation.largeMethod_repeat_c2 ss 20 8099.331 ± 33.159 ms/op Good. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1515 From kvn at openjdk.java.net Mon Nov 30 19:35:56 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 19:35:56 GMT Subject: RFR: 8257164: Share LambdaForms for VH linkers/invokers. In-Reply-To: References: Message-ID: <53n9zen0qzgiJdKDxHiVUIOPu_1waGe0pNsuVLpFdt4=.3c3facaf-1ce4-4152-a196-ccec11614d9f@github.com> On Thu, 26 Nov 2020 13:13:43 GMT, Vladimir Ivanov wrote: > Introduce sharing of `LambdaForms` for `VarHandle` linkers and invokers. > It reduces the number of LambdaForms needed at runtime. > > Testing: tier1-4 Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1455 From psandoz at openjdk.java.net Mon Nov 30 19:35:58 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 30 Nov 2020 19:35:58 GMT Subject: RFR: 8257189: Handle concurrent updates of MH.form better In-Reply-To: <1-iH7fvNLXAo2FzbAihuQV8PbDYiZsIHqZwPIcb6HxE=.d89f60fd-aad6-45bb-8704-363e299f96d8@github.com> References: <1-iH7fvNLXAo2FzbAihuQV8PbDYiZsIHqZwPIcb6HxE=.d89f60fd-aad6-45bb-8704-363e299f96d8@github.com> Message-ID: On Thu, 26 Nov 2020 21:23:16 GMT, Vladimir Ivanov wrote: > Concurrent updates may lead to redundant LambdaForms created and unnecessary class loading when those are compiled. > > Most notably, it severely affects MethodHandle customization: when a MethodHandle is called from multiple threads, every thread starts customization which takes enough time for other threads to join, but only one of those customizations will be picked. > > Coordination between threads requesting the updates and letting a single thread proceed avoids the aforementioned problem. Moreover, there's no need to wait until the update in-flight is over: all other threads (except the one performing the update) can just proceed with the invocation using the existing MH.form. 
> > Testing: > - manually monitored the behavior on a stress test from [JDK-8252049](https://bugs.openjdk.java.net/browse/JDK-8252049) > - tier1-4 Marked as reviewed by psandoz (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1472 From xliu at openjdk.java.net Mon Nov 30 19:39:20 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 30 Nov 2020 19:39:20 GMT Subject: RFR: 8257190: simplify PhaseIdealLoop constructors [v2] In-Reply-To: References: Message-ID: > 8257190: simplify PhaseIdealLoop constructors Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8257190 - 8257190: simplify PhaseIdealLoop constructors update the comments of the verify constructors. - 8257190: simplify PhaseIdealLoop constructors ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1473/files - new: https://git.openjdk.java.net/jdk/pull/1473/files/738546fb..3731d0b9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1473&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1473&range=00-01 Stats: 9458 lines in 216 files changed: 6272 ins; 2298 del; 888 mod Patch: https://git.openjdk.java.net/jdk/pull/1473.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1473/head:pull/1473 PR: https://git.openjdk.java.net/jdk/pull/1473 From xliu at openjdk.java.net Mon Nov 30 19:39:22 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 30 Nov 2020 19:39:22 GMT Subject: RFR: 8257190: simplify PhaseIdealLoop constructors [v2] In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 10:35:50 GMT, Tobias Hartmann wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8257190 >> - 8257190: simplify PhaseIdealLoop constructors >> >> update the comments of the verify constructors. >> - 8257190: simplify PhaseIdealLoop constructors > > src/hotspot/share/opto/loopnode.hpp line 955: > >> 953: // Perform verification that the graph is valid when verify_me is nullptr >> 954: // Verify that verify_me made the same decisions as a fresh run. >> 955: PhaseIdealLoop(PhaseIterGVN& igvn, const PhaseIdealLoop* verify_me = nullptr) : > > I think the comment should be something like: > // Verify that verify_me made the same decisions as a fresh run > // or only verify that the graph is valid if verify_me is null. Hi Tobias, thanks for reviewing it. I have updated the comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/1473 From psandoz at openjdk.java.net Mon Nov 30 19:41:53 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 30 Nov 2020 19:41:53 GMT Subject: RFR: 8257164: Share LambdaForms for VH linkers/invokers. In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 13:13:43 GMT, Vladimir Ivanov wrote: > Introduce sharing of `LambdaForms` for `VarHandle` linkers and invokers. > It reduces the number of LambdaForms needed at runtime. > > Testing: tier1-4 Marked as reviewed by psandoz (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1455 From kvn at openjdk.java.net Mon Nov 30 19:47:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 19:47:59 GMT Subject: RFR: 8257143: Enable JVMCI code installation tests on AArch64 In-Reply-To: References: Message-ID: <6y4iXOkeXEXWgXvx38icTMhlyO9rvLLNyNlke4wYhO4=.becea196-bfdb-4582-a15e-6c00e28a04a8@github.com> On Fri, 27 Nov 2020 03:56:22 GMT, Nick Gasson wrote: > This set of jtreg tests test JVMCI code installation independently of > Graal.
Currently they only run on x86 as the minimal assembler required > is only implemented for that platform. This patch implements the > TestAssembler for AArch64 to ensure JVMCI test coverage even if the > Graal embedded in OpenJDK is disabled/removed. src/hotspot/cpu/aarch64/relocInfo_aarch64.hpp line 35: > 33: offset_unit = 1, > 34: // Must be at least 1 for RelocInfo::narrow_oop_in_const. > 35: format_width = 1 Did you hit an issue with = 0? Yes, it should be 1 for 64-bit. But I am surprised nobody noticed this until now. How did we handle embedded narrow_oop_in_const before? ------------- PR: https://git.openjdk.java.net/jdk/pull/1475 From kvn at openjdk.java.net Mon Nov 30 22:40:55 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 22:40:55 GMT Subject: RFR: 8257220: [JVMCI] option validation should not result in a heavy-weight VM crash In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 20:27:13 GMT, Doug Simon wrote: > As a result of [JDK-8253228](https://bugs.openjdk.java.net/browse/JDK-8253228), a heavy-weight VM crash occurs for incorrectly specified JVMCI options. For example: >> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true > Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 > java.lang.IllegalArgumentException: Could not find option jvmci.InitTiimer > Did you mean one of the following?
> jvmci.InitTimer= > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime$Option.parse(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:405) > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:534) > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:174) > at jdk.vm.ci.runtime.JVMCI.initializeRuntime(jdk.internal.vm.ci/Native Method) > at jdk.vm.ci.runtime.JVMCI.getRuntime(jdk.internal.vm.ci/JVMCI.java:65) > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (jvmciRuntime.cpp:1102), pid=55794, tid=7939 > # fatal error: Fatal exception in JVMCI: Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 > # > # JRE version: OpenJDK Runtime Environment (16.0) (build 16-internal+0-adhoc.dnsimon.open) > # Java VM: OpenJDK 64-Bit Server VM (16-internal+0-adhoc.dnsimon.open, mixed mode, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, bsd-amd64) > # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /Users/dnsimon/jdk-jdk/open/hs_err_pid55794.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > fish: 'build/macosx-x86_64-server-rele?' terminated by signal SIGABRT (Abort) > > This is too heavy-weight. This PR makes it similar to how incorrectly specified -XX options are reported: > >> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true > Error parsing JVMCI options: Could not find option jvmci.InitTiimer > Did you mean one of the following? > jvmci.InitTimer= > Error: A fatal exception has occurred. Program will exit. 
@dougxc I still got crash with -Xcomp: java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Xcomp -version Exception during JVMCI compiler initialization jdk.vm.ci.common.JVMCIError: no JVMCI compiler selected: default compiler is not found at jdk.vm.ci.hotspot.HotSpotJVMCICompilerConfig$DummyCompilerFactory.compileMethod(jdk.internal.vm.ci at 16-internal/HotSpotJVMCICompilerConfig.java:58) at jdk.vm.ci.hotspot.HotSpotJVMCICompilerConfig$DummyCompilerFactory.compileMethod(jdk.internal.vm.ci at 16-internal/HotSpotJVMCICompilerConfig.java:48) at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(jdk.internal.vm.ci at 16-internal/HotSpotJVMCIRuntime.java:807) # To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=/jvmciRuntime.cpp:1102 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/jdk_git/open/src/hotspot/share/jvmci/jvmciRuntime.cpp:1102), pid=3015811, tid=3015824 # fatal error: Fatal exception in JVMCI: Exception during JVMCI compiler initialization # # JRE version: Java(TM) SE Runtime Environment (16.0) (fastdebug build 16-internal+0-2020-11-30-2007459.vkozlov...) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-internal+0-2020-11-30-2007459.vkozlov..., compiled mode, sharing, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x116b950] JVMCIRuntime::fatal_exception(JVMCIEnv*, char const*)+0x90 # # Core dump will be written. 
Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /jdk_git/core.3015811) # # An error report file with more information is saved as: # /jdk_git/hs_err_pid3015811.log ------------- PR: https://git.openjdk.java.net/jdk/pull/1487 From neliasso at openjdk.java.net Mon Nov 30 22:45:04 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 30 Nov 2020 22:45:04 GMT Subject: RFR: 8257460: Further CompilerOracle cleanup Message-ID: <25fwvRpAJRJXdmsYRSSoDPzt2Sjr1zvMp8PipzkNaxQ=.288e8d77-5f87-4be7-a4fb-8d876435384d@github.com> I got some additional feedback on 8256508: "Improve CompileCommand flag" from Claes. 1. The _type field in TypedMethodMatcher is not used any more. It is derived from the option. 2. The verify_type argument is only used from whitebox API and the check should be pushed out to that code. The check is needed because the option is passed from java as a String and we need to verify that the value type matches the option type. 3. The type parameter to TypedMethodMatcher::match isn't used. 4. 
Some code only used for asserts was moved into assert ------------- Commit messages: - Further CompilerOracle cleanup Changes: https://git.openjdk.java.net/jdk/pull/1528/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1528&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257460 Stats: 65 lines in 3 files changed: 25 ins; 18 del; 22 mod Patch: https://git.openjdk.java.net/jdk/pull/1528.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1528/head:pull/1528 PR: https://git.openjdk.java.net/jdk/pull/1528 From neliasso at openjdk.java.net Mon Nov 30 22:47:00 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 30 Nov 2020 22:47:00 GMT Subject: RFR: 8257223: C2: Optimize RegMask::is_bound In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 13:00:15 GMT, Claes Redestad wrote: > - Avoid Size() for is_bound1 (does relatively expensive computations) > - Refactor loops to more efficiently check that the remainder of the mask is empty after a match. (Using the same structure helped gcc merge the control flows in `is_bound` for an additional benefit) > > This removes ~80k (~1%) of the C2 bootstrap overhead, and reduces time spent doing Register_Allocate in SimpleRepeatCompilation.largeMethod by about 0.4%. There's a small but statistically insignificant effect (~0.2%) on the score, too: > > Benchmark Mode Cnt Score Error Units > SimpleRepeatCompilation.largeMethod_repeat_c2 ss 20 8119.278 ± 37.803 ms/op > SimpleRepeatCompilation.largeMethod_repeat_c2 ss 20 8099.331 ± 33.159 ms/op Looks good! ------------- Marked as reviewed by neliasso (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1515 From dnsimon at openjdk.java.net Mon Nov 30 22:54:00 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Mon, 30 Nov 2020 22:54:00 GMT Subject: RFR: 8257220: [JVMCI] option validation should not result in a heavy-weight VM crash In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 22:37:29 GMT, Vladimir Kozlov wrote: >> As a result of [JDK-8253228](https://bugs.openjdk.java.net/browse/JDK-8253228), a heavy-weight VM crash occurs for incorrectly specified JVMCI options. For example: >>> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true >> Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 >> java.lang.IllegalArgumentException: Could not find option jvmci.InitTiimer >> Did you mean one of the following? >> jvmci.InitTimer= >> at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime$Option.parse(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:405) >> at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:534) >> at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime(jdk.internal.vm.ci/HotSpotJVMCIRuntime.java:174) >> at jdk.vm.ci.runtime.JVMCI.initializeRuntime(jdk.internal.vm.ci/Native Method) >> at jdk.vm.ci.runtime.JVMCI.getRuntime(jdk.internal.vm.ci/JVMCI.java:65) >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (jvmciRuntime.cpp:1102), pid=55794, tid=7939 >> # fatal error: Fatal exception in JVMCI: Uncaught exception exiting JVMCIEnv scope entered at src/hotspot/share/jvmci/jvmciRuntime.cpp:626 >> # >> # JRE version: OpenJDK Runtime Environment (16.0) (build 16-internal+0-adhoc.dnsimon.open) >> # Java VM: OpenJDK 64-Bit Server VM (16-internal+0-adhoc.dnsimon.open, mixed mode, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, bsd-amd64) >> # No core dump will be written. Core dumps have been disabled. 
To enable core dumping, try "ulimit -c unlimited" before starting Java again >> # >> # An error report file with more information is saved as: >> # /Users/dnsimon/jdk-jdk/open/hs_err_pid55794.log >> # >> # If you would like to submit a bug report, please visit: >> # https://bugreport.java.com/bugreport/crash.jsp >> # >> fish: 'build/macosx-x86_64-server-rele?' terminated by signal SIGABRT (Abort) >> >> This is too heavy-weight. This PR makes it similar to how incorrectly specified -XX options are reported: >> >>> java -XX:+UnlockExperimentalVMOptions -XX:+EagerJVMCI -XX:+UseJVMCICompiler -Djvmci.InitTiimer=true >> Error parsing JVMCI options: Could not find option jvmci.InitTiimer >> Did you mean one of the following? >> jvmci.InitTimer= >> Error: A fatal exception has occurred. Program will exit. > > @dougxc I still got crash with -Xcomp: > java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Xcomp -version > Exception during JVMCI compiler initialization > jdk.vm.ci.common.JVMCIError: no JVMCI compiler selected: default compiler is not found > at jdk.vm.ci.hotspot.HotSpotJVMCICompilerConfig$DummyCompilerFactory.compileMethod(jdk.internal.vm.ci at 16-internal/HotSpotJVMCICompilerConfig.java:58) > at jdk.vm.ci.hotspot.HotSpotJVMCICompilerConfig$DummyCompilerFactory.compileMethod(jdk.internal.vm.ci at 16-internal/HotSpotJVMCICompilerConfig.java:48) > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(jdk.internal.vm.ci at 16-internal/HotSpotJVMCIRuntime.java:807) > # To suppress the following error report, specify this argument > # after -XX: or in .hotspotrc: SuppressErrorAt=/jvmciRuntime.cpp:1102 > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/jdk_git/open/src/hotspot/share/jvmci/jvmciRuntime.cpp:1102), pid=3015811, tid=3015824 > # fatal error: Fatal exception in JVMCI: Exception during JVMCI compiler initialization > # > # JRE version: Java(TM) SE Runtime Environment (16.0) (fastdebug 
build 16-internal+0-2020-11-30-2007459.vkozlov...) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-internal+0-2020-11-30-2007459.vkozlov..., compiled mode, sharing, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x116b950] JVMCIRuntime::fatal_exception(JVMCIEnv*, char const*)+0x90 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /jdk_git/core.3015811) > # > # An error report file with more information is saved as: > # /jdk_git/hs_err_pid3015811.log This PR was about improving the failure mode when incorrect `-Djvmci` properties are specified. The case you are highlighting is when misconfiguration happens with JVMCI related VM flags. In this case, I guess JVMCI initialization could detect the conflict between `-XX:+UseJVMCICompiler` and the lack of a JVMCI compiler in the VM (now the default). ------------- PR: https://git.openjdk.java.net/jdk/pull/1487 From kvn at openjdk.java.net Mon Nov 30 23:34:56 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 23:34:56 GMT Subject: RFR: 8257460: Further CompilerOracle cleanup In-Reply-To: <25fwvRpAJRJXdmsYRSSoDPzt2Sjr1zvMp8PipzkNaxQ=.288e8d77-5f87-4be7-a4fb-8d876435384d@github.com> References: <25fwvRpAJRJXdmsYRSSoDPzt2Sjr1zvMp8PipzkNaxQ=.288e8d77-5f87-4be7-a4fb-8d876435384d@github.com> Message-ID: On Mon, 30 Nov 2020 22:37:32 GMT, Nils Eliasson wrote: > I got some additional feedback on 8256508: "Improve CompileCommand flag" from Claes. > > 1. The _type field in TypedMethodMatcher is not used any more. It is derived from the option. > > 2. The verify_type argument is only used from whitebox API and the check should be pushed out to that code. The check is needed because the option is passed from java as a String and we need to verify that the value type matches the option type. > > 3. 
The type parameter to TypedMethodMatcher::match isn't used. > > 4. Some code only used for asserts was moved into assert Good clean up. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1528 From redestad at openjdk.java.net Mon Nov 30 23:41:57 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 30 Nov 2020 23:41:57 GMT Subject: RFR: 8257460: Further CompilerOracle cleanup In-Reply-To: <25fwvRpAJRJXdmsYRSSoDPzt2Sjr1zvMp8PipzkNaxQ=.288e8d77-5f87-4be7-a4fb-8d876435384d@github.com> References: <25fwvRpAJRJXdmsYRSSoDPzt2Sjr1zvMp8PipzkNaxQ=.288e8d77-5f87-4be7-a4fb-8d876435384d@github.com> Message-ID: On Mon, 30 Nov 2020 22:37:32 GMT, Nils Eliasson wrote: > I got some additional feedback on 8256508: "Improve CompileCommand flag" from Claes. > > 1. The _type field in TypedMethodMatcher is not used any more. It is derived from the option. > > 2. The verify_type argument is only used from whitebox API and the check should be pushed out to that code. The check is needed because the option is passed from java as a String and we need to verify that the value type matches the option type. > > 3. The type parameter to TypedMethodMatcher::match isn't used. > > 4. Some code only used for asserts was moved into assert Thanks for doing this cleanup! src/hotspot/share/compiler/compilerOracle.hpp line 158: > 156: static bool option_matches_type(enum CompileCommand option, T& value); > 157: > 158: // Reads from string instead of file indentation off src/hotspot/share/compiler/compilerOracle.cpp line 309: > 307: bool CompilerOracle::has_option_value(const methodHandle& method, enum CompileCommand option, T& value) { > 308: enum OptionType type = option2type(option); > 309: if (type == OptionType::Unknown) { Could this be an assertion done inside option_matches_type? ------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1528
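The last review question above, whether the `OptionType::Unknown` check could become an assertion inside `option_matches_type`, can be illustrated with a small standalone sketch. This is not the HotSpot code: the two demo enumerators, the body of `option2type`, and the `TypeOf` helper are hypothetical stand-ins invented for illustration; only the names `option_matches_type`, `option2type`, `OptionType` and `CompileCommand` come from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced stand-ins for the enums named in the review.
enum class OptionType { Bool, Intx, Unknown };
enum class CompileCommand { DemoBoolOption, DemoIntxOption, Count };

// Assumed mapping from an option to the type of value it carries.
inline OptionType option2type(CompileCommand option) {
  switch (option) {
    case CompileCommand::DemoBoolOption: return OptionType::Bool;
    case CompileCommand::DemoIntxOption: return OptionType::Intx;
    default:                             return OptionType::Unknown;
  }
}

// Hypothetical helper: maps a C++ value type to the OptionType it represents.
template <typename T> struct TypeOf {
  static constexpr OptionType value = OptionType::Unknown;
};
template <> struct TypeOf<bool> {
  static constexpr OptionType value = OptionType::Bool;
};
template <> struct TypeOf<intptr_t> {
  static constexpr OptionType value = OptionType::Intx;
};

// The Unknown check lives here as an assertion, so callers only see
// "matches" or "does not match" for options that have a registered type.
template <typename T>
bool option_matches_type(CompileCommand option, T& /*value*/) {
  OptionType type = option2type(option);
  assert(type != OptionType::Unknown && "option has no registered type");
  return type == TypeOf<T>::value;
}
```

Under these assumptions, a caller that passes a `bool` for the intx-valued demo option gets `false` back, while an option with no registered type trips the assertion in debug builds instead of needing a separate check at every call site. Whether that trade-off suits the whitebox API path, where the value arrives from Java as a String, is the open question in the review.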