From rwestrel at redhat.com Sun Dec 1 14:54:49 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Sun, 01 Dec 2019 15:54:49 +0100 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <871ru2diu9.fsf@redhat.com> References: <871ru2diu9.fsf@redhat.com> Message-ID: <8736e4ayfa.fsf@redhat.com> > http://cr.openjdk.java.net/~roland/8234350/webrev.00/ Anyone for a second review? Thanks, Roland. From patrick at os.amperecomputing.com Mon Dec 2 03:18:58 2019 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Mon, 2 Dec 2019 03:18:58 +0000 Subject: RFR (trivial): 8234228: AArch64: Clean up redundant temp vars in generate_compare_long_string_different_encoding In-Reply-To: References: Message-ID: Ping... Please help review, thanks. Regards Patrick -----Original Message----- From: hotspot-compiler-dev On Behalf Of Patrick Zhang OS Sent: Friday, November 15, 2019 6:54 PM To: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: RFR (trivial): 8234228: AArch64: Clean up redundant temp vars in generate_compare_long_string_different_encoding Hi Reviewers, This is a simple patch which cleans up some redundant temp vars and related instructions in generate_compare_long_string_different_encoding. JBS: https://bugs.openjdk.java.net/browse/JDK-8234228 Webrev: http://cr.openjdk.java.net/~qpzhang/8234228/webrev.01 In generate_compare_long_string_different_encoding, the two Register vars strU and strL were used to record the pointers of the last 4 characters for the final comparisons. strU has been no use since the latest code updates as the chars got pre-loaded (r12) by compare_string_16_x_LU early, and strL is redundant too since the pointer is available in r11. Cleaning up these can save two add, two temp vars, and replace two sub with mov. In addition, r10 in compare_string_16_x_LU is not used, cleaned the temp var too. Tested jtreg tier1, and hotspot runtime/compiler, no new failures found. Double checked with string intrinsics cases under [1], no regression found. Ran [2] CompareToBench LU/UL as performance check, no regression found, and slight gains with some input sizes [1] http://hg.openjdk.java.net/jdk/jdk/file/3df2bf731a87/test/hotspot/jtreg/compiler/intrinsics/string [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.jar Regards Patrick From tobias.hartmann at oracle.com Mon Dec 2 05:57:01 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 2 Dec 2019 06:57:01 +0100 Subject: [14] RFR(S): 8234617: C1: Incorrect result of field load due to missing narrowing conversion In-Reply-To: References: Message-ID: Thanks Vladimir! Best regards, Tobias On 29.11.19 15:19, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8234617/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> >> Writing an (integer) value to a boolean, byte, char or short field includes an implicit narrowing >> conversion [1]. With -XX:+EliminateFieldAccess (default), C1 tries to omit field loads by caching >> and reusing the last written value. The problem is that this value is not necessarily converted to >> the field type and we end up using an incorrect value. >> >> For example, for the field store/load in testShort, C1 emits: >> ?? [...] >> ?? 0x00007f0fc582bd6c:?? mov??? %dx,0x12(%rsi) >> ?? 0x00007f0fc582bd70:?? mov??? %rdx,%rax >> ?? [...] 
>> >> The field load has been eliminated and the non-converted integer value (%rdx) is returned. >> >> The fix is to emit an explicit conversion to get the correct field value after the write: >> ?? [...] >> ?? 0x00007ff07982bd6c:?? mov??? %dx,0x12(%rsi) >> ?? 0x00007ff07982bd70:?? movswl %dx,%edx >> ?? 0x00007ff07982bd73:?? mov??? %rdx,%rax >> ?? [...] >> >> Thanks, >> Tobias >> >> [1] https://docs.oracle.com/javase/specs/jvms/se13/html/jvms-6.html#jvms-6.5.putfield >> From nick.gasson at arm.com Mon Dec 2 07:55:14 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Mon, 2 Dec 2019 15:55:14 +0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> Message-ID: <450a3a82-3031-d988-ece1-cb1e37a74975@arm.com> On 29/11/2019 18:10, Andrew Haley wrote: >> How about we exit with a fatal error if we can't find a suitably aligned >> region? Then we can remove the code in decode_klass_non_null that uses >> R27 and this patch is much simpler. That code path is poorly tested at >> the moment so it seems risky to leave it in. With a hard error at least >> users will report it to us so we can fix it. > > That is starting to sound very attractive. With a 64-bit address space I'm > finding it very hard to imagine a scenario in which we don't find a > suitable address. I think AOT-compiled code would still be OK, because it > generates different code, but we'd have to do some testing. > There's another little wrinkle: even after updating the shared metaspace to use the search algorithm to find a 4G-aligned location, we can still end up with something like: CompressedKlassPointers::base() => 0x1100000000 CompressedKlassPointers::shift() => 3 (The shift is always set to LogKlassAlignmentInBytes when CDS is on.) Here we can't use EOR because 0x1100000000 doesn't fit in the immediate, and we can't use MOVK because the shift is non-zero. I think the solution is to adjust the algorithm to search in increments of (4 << LogKlassAlignmentInBytes)*G once we hit 32G (the point at which we can no longer use a zero base). Then we can decode like this: const uint64_t shifted_base = (uint64_t)CompressedKlassPointers::base() >> CompressedKlassPointers::shift(); guarantee((shifted_base & 0xffffffff) == 0, "compressed class base bad alignment"); if (dst != src) movw(dst, src); movk(dst, shifted_base >> 32, 32); if (CompressedKlassPointers::shift() != 0) { lsl(dst, dst, LogKlassAlignmentInBytes); } What do you think? I'll do some testing with AOT later. Thanks, Nick From martin.doerr at sap.com Mon Dec 2 10:14:23 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 2 Dec 2019 10:14:23 +0000 Subject: [14] RFR(S): 8234617: C1: Incorrect result of field load due to missing narrowing conversion In-Reply-To: References: Message-ID: +1 Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Tobias Hartmann > Sent: Montag, 2. Dezember 2019 06:57 > To: Vladimir Ivanov ; hotspot compiler > > Subject: Re: [14] RFR(S): 8234617: C1: Incorrect result of field load due to > missing narrowing conversion > > Thanks Vladimir! 
> > Best regards, > Tobias > > On 29.11.19 15:19, Vladimir Ivanov wrote: > > > >> http://cr.openjdk.java.net/~thartmann/8234617/webrev.00/ > > > > Looks good. > > > > Best regards, > > Vladimir Ivanov > > > >> > >> Writing an (integer) value to a boolean, byte, char or short field includes > an implicit narrowing > >> conversion [1]. With -XX:+EliminateFieldAccess (default), C1 tries to omit > field loads by caching > >> and reusing the last written value. The problem is that this value is not > necessarily converted to > >> the field type and we end up using an incorrect value. > >> > >> For example, for the field store/load in testShort, C1 emits: > >> ?? [...] > >> ?? 0x00007f0fc582bd6c:?? mov??? %dx,0x12(%rsi) > >> ?? 0x00007f0fc582bd70:?? mov??? %rdx,%rax > >> ?? [...] > >> > >> The field load has been eliminated and the non-converted integer value > (%rdx) is returned. > >> > >> The fix is to emit an explicit conversion to get the correct field value after > the write: > >> ?? [...] > >> ?? 0x00007ff07982bd6c:?? mov??? %dx,0x12(%rsi) > >> ?? 0x00007ff07982bd70:?? movswl %dx,%edx > >> ?? 0x00007ff07982bd73:?? mov??? %rdx,%rax > >> ?? [...] > >> > >> Thanks, > >> Tobias > >> > >> [1] https://docs.oracle.com/javase/specs/jvms/se13/html/jvms- > 6.html#jvms-6.5.putfield > >> From tobias.hartmann at oracle.com Mon Dec 2 11:28:31 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 2 Dec 2019 12:28:31 +0100 Subject: [14] RFR(S): 8234617: C1: Incorrect result of field load due to missing narrowing conversion In-Reply-To: References: Message-ID: Thanks Martin! Best regards, Tobias On 02.12.19 11:14, Doerr, Martin wrote: > +1 > > Best regards, > Martin > >> -----Original Message----- >> From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Tobias Hartmann >> Sent: Montag, 2. Dezember 2019 06:57 >> To: Vladimir Ivanov ; hotspot compiler >> >> Subject: Re: [14] RFR(S): 8234617: C1: Incorrect result of field load due to >> missing narrowing conversion >> >> Thanks Vladimir! >> >> Best regards, >> Tobias >> >> On 29.11.19 15:19, Vladimir Ivanov wrote: >>> >>>> http://cr.openjdk.java.net/~thartmann/8234617/webrev.00/ >>> >>> Looks good. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> >>>> Writing an (integer) value to a boolean, byte, char or short field includes >> an implicit narrowing >>>> conversion [1]. With -XX:+EliminateFieldAccess (default), C1 tries to omit >> field loads by caching >>>> and reusing the last written value. The problem is that this value is not >> necessarily converted to >>>> the field type and we end up using an incorrect value. >>>> >>>> For example, for the field store/load in testShort, C1 emits: >>>> ?? [...] >>>> ?? 0x00007f0fc582bd6c:?? mov??? %dx,0x12(%rsi) >>>> ?? 0x00007f0fc582bd70:?? mov??? %rdx,%rax >>>> ?? [...] >>>> >>>> The field load has been eliminated and the non-converted integer value >> (%rdx) is returned. >>>> >>>> The fix is to emit an explicit conversion to get the correct field value after >> the write: >>>> ?? [...] >>>> ?? 0x00007ff07982bd6c:?? mov??? %dx,0x12(%rsi) >>>> ?? 0x00007ff07982bd70:?? movswl %dx,%edx >>>> ?? 0x00007ff07982bd73:?? mov??? %rdx,%rax >>>> ?? [...] 
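For concreteness, a minimal Java sketch of the store-then-load shape being discussed (hypothetical names, not the actual regression test; whether C1 actually elides the narrowing for a given variant depends on the surrounding code):

public class ShortFieldStoreLoad {
    short f;

    // Store an int-range value into the short field, then read the field back.
    // With -XX:+EliminateFieldAccess the getfield may be replaced by the value
    // cached from the store; the fix ensures that cached value is first
    // narrowed (sign-extended from 16 bits) to the field type.
    short testShort(int v) {
        f = (short) v;
        return f;
    }

    public static void main(String[] args) {
        ShortFieldStoreLoad t = new ShortFieldStoreLoad();
        for (int i = 0; i < 100_000; i++) {   // warm up so C1 compiles testShort
            short r = t.testShort(0x12345678);
            if (r != (short) 0x12345678) {
                throw new AssertionError("got " + r);
            }
        }
    }
}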
>>>> >>>> Thanks, >>>> Tobias >>>> >>>> [1] https://docs.oracle.com/javase/specs/jvms/se13/html/jvms- >> 6.html#jvms-6.5.putfield >>>> From markus.gronlund at oracle.com Mon Dec 2 11:57:38 2019 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Mon, 2 Dec 2019 03:57:38 -0800 (PST) Subject: RFR(M): 8216041: [Event Request] - Deoptimization Message-ID: Greetings, Please review the following changeset to introduce a Deoptimization (uncommon trap) event to JFR. Igor Ignatyev has done related work under "JDK-8225554: add JFR event for uncommon trap" [1], and we have decided to merge our work, so this RFR will supersede the previous discussion related to JDK-8225554 [2]. Enhancement: https://bugs.openjdk.java.net/browse/JDK-8216041 Webrev: http://cr.openjdk.java.net/~mgronlun/8216041/webrev01 Example visualization (raw JMC): http://cr.openjdk.java.net/~mgronlun/8216041/Deoptimization.jpg Summary: A description of which compiler compiled the method has been added, where descriptions are keyed by values from the CompilerType enumeration. This information has also been added to the Compilation event. The current suggestion is to have stack trace information turned off by default (default.jfc), but turned on when using profile.jfc. [1] https://bugs.openjdk.java.net/browse/JDK-8225554 [2] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-June/000631.html Thank you Markus From tobias.hartmann at oracle.com Mon Dec 2 13:07:34 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 2 Dec 2019 14:07:34 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: References: Message-ID: <88c9da01-6865-4a92-67fd-2f52d6cfe84b@oracle.com> Hi Christian, looks reasonable to me. Best regards, Tobias On 20.11.19 15:14, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8231501 > http://cr.openjdk.java.net/~chagedorn/8231501/webrev.00/ > > The bug could be traced back to the concurrent cleaning of method data with its extra data in > MethodData::clean_method_data() and the loading/copying of extra data for the ci method data in > ciMethodData::load_extra_data(). I reproduced the bug by using the test [1] which extensively cleans > method data by using the whitebox API [2]. > > Before loading and copying the extra data from the MDO to the ciMDO in > ciMethodData::load_extra_data(), the metadata is prepared in a fixed-point iteration by cleaning all > SpeculativeTrapData entries of methods whose klasses are unloaded [3]. If it encounters such a dead > entry it releases the extra data lock (due to ranking issues) and tries again later [4]. This > release of the lock triggers the bug: There can be cases where one thread A is waiting in the > whitebox API method to get the extra data lock [2] to clean the extra data for the very same MDO for > which another thread B just released the lock at [4]. If that MDO actually contained > SpeculativeTrapData entries, then thread A cleaned those but the ciMDO, which thread B is preparing, > still contains the uncleaned old MDO extra data (because thread B only made a snapshot of the MDO > earlier at [5]). Things then go wrong when thread B can reacquire the lock after thread A. It tries > to load the now cleaned extra data and immediately finishes at [6] since there are no > SpeculativeTrapData entries anymore. 
It copied a single entry with tag DataLayout::no_tag [7] to the > ciMDO which actually contained a SpeculativeTrapData entry. This results in a half way cleared entry > (since a SpeculativeTrapData entry has an additional cell for the method) and possible other > remaining SpeculativeTrapData entries: > > > Let's assume a little-endian ordering and that both 0x00007fff... addresses are real pointers to > methods. Tag 13 (0x0d) is used for SpeculativeTrapData and dp points to the first extra data entry: > > ciMDO extra data before thread B releases the lock at [4] (same extra data for MDO and ciMDO): > 0x800000040011000d 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000 > dp: tag = 13 -> next entry = dp+16; dp+8: method 0x00007fffd4993c63 > dp+16: tag = 13 -> next entry = dp+32; dp+24: method 0x00007fffd49b1a68 > dp+32: tag = 0 -> end of extra data > > MDO extra data after thread B reacquires the lock and thread A cleaned the MDO (ciMDO extra data is > unchanged): > 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 > dp: tag = 0 -> end of extra data > > > Returning at [6] when the extra data loading from MDO to ciMDO is finished: > MDO extra data: > 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 > dp: tag = 0 -> end of extra data > > ciMDO extra data, only copied the first no_tag entry from MDO at [7] (8 bytes): > 0x0000000000000000 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000 > dp: tag = 0 -> next entry = dp+8 > dp+8: tag = 0x63 = 99 -> there is no tag 99 -> fatal... > > > The next time the ciMDO extra data is iterated, for example by using MethodData::next_extra(), it > reads tag 99 after processing the first no_tag entry and jumping to the value at offset 8 which > causes a crash since there is no tag 99 available. > > > The fix is to completely zero out the current and all following SpeculativeTrapData entries if we > encounter a no_tag in the MDO but a speculative_trap_data_tag tag in the ciMDO. There are also other > cases where the method data is cleaned. Thus the bug is not only related to the whitebox API usage > but occurs very rarely. > > Thank you! > > Best regards, > Christian > > > [1] > http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/test/hotspot/jtreg/compiler/types/correctness/CorrectnessTest.java > > [2] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/prims/whitebox.cpp#l1137 > [3] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l137 > [4] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l115 > [5] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l219 > [6] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l191 > [7] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l176 From tobias.hartmann at oracle.com Mon Dec 2 13:38:42 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 2 Dec 2019 14:38:42 +0100 Subject: [14] RFR (S): 8231430: C2: Memory stomp in max_array_length() for T_ILLEGAL type In-Reply-To: References: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com> <83147646-d353-1f46-f50b-8c0edf16645f@oracle.com> Message-ID: Hi Vladimir, On 29.11.19 16:28, Vladimir Ivanov wrote: > ? http://cr.openjdk.java.net/~vlivanov/8231430/webrev.01/ This looks good to me. 
Best regards, Tobias From vladimir.x.ivanov at oracle.com Mon Dec 2 14:56:44 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 2 Dec 2019 17:56:44 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <7776c872-e5a1-2dc4-4cf7-b1c733b6a314@redhat.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> <024ee660-a0e5-e899-5c48-4ca12ffa37fa@redhat.com> <0330b8f0-28c7-1372-2d18-2b5d0ced05c3@oracle.com> <7776c872-e5a1-2dc4-4cf7-b1c733b6a314@redhat.com> Message-ID: <07e1d2ae-a96a-a551-7db6-85d260223e64@oracle.com> > Please carry on with your changes then. Just in case: if the bug severely affects Shenandoah and the fix requires time, as the stop-the-gap solution I can keep barriers around off-heap accesses when Shenandoah is used until proper fix arrives. Best regards, Vladimir Ivanov > > Thanks, > Roman > >> Roman, >> >> JDK-8220714 looks like a bug in Shenandoah barrier expansion. >> >> I slightly modified the test to simplify the analysis [1]. >> >> While running the test [2] I'm seeing the following: >> >> ? 127 LoadI? === 44?? 7 125 >> ? 160 StoreI === 44?? 7? 91 127 >> ?? 94 LoadI? === 44?? 7? 91 >> ? 193 StoreI === 44 160 125? 94 >> >> After expansion is over, it looks as follows: >> >> ? 127 LoadI? ===? 44 209 125 >> ? 160 StoreI ===? 44 209? 91 127 >> ?? 94 LoadI? ===? 44 160? 91 >> ? 193 StoreI ===? 44 160 125? 94 >> >> Note that 94 LoadI depends on 160 StoreI memory now. Before the >> expansion they were independent (7 Parm == initial memory state). >> >> And then 94 goes away, since it now reads updated value: >> >> >> int???????? 127??? LoadI??? ===? 44? 209? 125? [[ 160? 193 ]] >> @rawptr:BotPTR, idx=Raw; unsafe #int (does not depend only on test) >> !orig=[94] !jvms: Unsafe::getInt @ bci:3 >> TestUnsafeOffheapSwap$Memory::getInt @ bci:14 >> TestUnsafeOffheapSwap$Memory::swap @ bci:10 >> TestUnsafeOffheapSwap::testUnsafeHelper @ bci:7 >> >> The final graph is evidently wrong: >> >> ? 127 LoadI? ===? 44 209 125 >> ? 160 StoreI ===? 44 209? 91 127 >> ? 193 StoreI ===? 44 160 125 127 >> >> Let me know how you want to proceed with it. >> >> Best regards, >> Vladimir Ivanov >> >> [1] >> >> diff --git >> a/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java >> b/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java >> --- a/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java >> +++ b/test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeOffheapSwap.java >> @@ -57,6 +57,16 @@ >> ???????? } >> ???? } >> >> +??? static void testUnsafeHelper(int i) { >> +??????? mem.swap(i - 1, i); >> +??? } >> + >> +??? static void testArrayHelper(int[] arr, int i) { >> +??????? int tmp = arr[i - 1]; >> +??????? arr[i - 1] = arr[i]; >> +??????? arr[i] = tmp; >> +??? } >> + >> ???? static void test() { >> ???????? Random rnd = new Random(SEED); >> ???????? for (int i = 0; i < SIZE; i++) { >> @@ -72,10 +82,8 @@ >> ???????? } >> >> ???????? for (int i = 1; i < SIZE; i++) { >> -??????????? mem.swap(i - 1, i); >> -??????????? int tmp = arr[i - 1]; >> -??????????? arr[i - 1] = arr[i]; >> -??????????? arr[i] = tmp; >> +??????????? testUnsafeHelper(i); >> +??????????? testArrayHelper(arr, i); >> ???????? } >> >> ???????? for (int i = 0; i < SIZE; i++) { >> >> [2] $ java -cp >> JTwork/classes/gc/shenandoah/compiler/TestUnsafeOffheapSwap.d/ >> --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED >> -XX:-UseOnStackReplacement -XX:-BackgroundCompilation >> -XX:-TieredCompilation? 
-XX:+UnlockExperimentalVMOptions >> -XX:+UseShenandoahGC -XX:+PrintCompilation -XX:CICompilerCount=1 >> -XX:CompileCommand=quiet >> -XX:CompileCommand=compileonly,*::testUnsafeHelper >> -XX:CompileCommand=print,*::testUnsafeHelper -XX:PrintIdealGraphLevel=0 >> -XX:-VerifyOops -XX:-UseCompressedOops TestUnsafeOffheapSwap >> >> >> Best regards, >> Vladimir Ivanov >> >> On 29.11.2019 20:18, Vladimir Ivanov wrote: >>> >>>> Does it affect this: >>>> https://bugs.openjdk.java.net/browse/JDK-8220714 >>> >>> Good point, Roman. Proposed patch breaks the fix for JDK-8220714. >>> >>> I'll investigate what happens there and come back with a revised fix. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8226411 >>>>> >>>>> There were a number of fixes in C2 support for unsafe accesses recently >>>>> which led to additional memory barriers around them. It improved >>>>> stability, but in some cases it was redundant. One of important use >>>>> cases which regressed is off-heap accesses [1]. The barriers around >>>>> them >>>>> are redundant because they are serialized on raw memory and don't >>>>> intersect with any on-heap accesses. >>>>> >>>>> Proposed fix skips memory barriers around unsafe accesses which are >>>>> provably off-heap (base == NULL). >>>>> >>>>> It (almost completely) recovers performance on the microbenchmark >>>>> provided in JDK-8224182 [1]. >>>>> >>>>> Testing: tier1-6. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 >>>>> >>>> >> > From vladimir.x.ivanov at oracle.com Mon Dec 2 14:57:00 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 2 Dec 2019 17:57:00 +0300 Subject: [14] RFR (S): 8234923: Missed call_site_target nmethod dependency for non-fully initialized ConstantCallSite instance In-Reply-To: References: <7d4c2ab1-f8ec-8ccc-a442-8401a048b353@oracle.com> Message-ID: <0e177195-9e3e-661e-5dd0-e613f02747c1@oracle.com> Thank you, John. Best regards, Vladimir Ivanov On 30.11.2019 02:30, John Rose wrote: > Reviewed. > >> On Nov 29, 2019, at 7:55 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/8234923/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8234923 > From erik.gahlin at oracle.com Mon Dec 2 15:00:05 2019 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Mon, 2 Dec 2019 16:00:05 +0100 Subject: RFR(M): 8216041: [Event Request] - Deoptimization In-Reply-To: References: Message-ID: <42a6af4e-d816-ebf1-9519-b1de6f5606ea@oracle.com> Hi Markus, Looks good, but could you change the names in metadata.xml to DeoptimizationReason and DeoptimizationAction? No need to send out new webrev. Thanks Erik On 2019-12-02 12:57, Markus Gronlund wrote: > Greetings, > > Please review the following changeset to introduce a Deoptimization (uncommon trap) event to JFR. > > Igor Ignatyev has done related work under "JDK-8225554: add JFR event for uncommon trap" [1], and we have decided to merge our work, so this RFR will supersede the previous discussion related to JDK-8225554 [2]. > > Enhancement: https://bugs.openjdk.java.net/browse/JDK-8216041 > Webrev: http://cr.openjdk.java.net/~mgronlun/8216041/webrev01 > Example visualization (raw JMC): http://cr.openjdk.java.net/~mgronlun/8216041/Deoptimization.jpg > Summary: A description of which compiler compiled the method has been added, where descriptions are keyed by values from the CompilerType enumeration. 
This information has also been added to the Compilation event. The current suggestion is to have stack trace information turned off by default (default.jfc), but turned on when using profile.jfc. > > [1] https://bugs.openjdk.java.net/browse/JDK-8225554 > [2] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-June/000631.html > > Thank you > Markus From aph at redhat.com Mon Dec 2 15:34:33 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 2 Dec 2019 10:34:33 -0500 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <450a3a82-3031-d988-ece1-cb1e37a74975@arm.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> <450a3a82-3031-d988-ece1-cb1e37a74975@arm.com> Message-ID: <7851d79d-5cff-05bc-9c52-b1f1bf3c175e@redhat.com> On 12/2/19 7:55 AM, Nick Gasson wrote: > On 29/11/2019 18:10, Andrew Haley wrote: >>> How about we exit with a fatal error if we can't find a suitably aligned >>> region? Then we can remove the code in decode_klass_non_null that uses >>> R27 and this patch is much simpler. That code path is poorly tested at >>> the moment so it seems risky to leave it in. With a hard error at least >>> users will report it to us so we can fix it. >> >> That is starting to sound very attractive. With a 64-bit address space I'm >> finding it very hard to imagine a scenario in which we don't find a >> suitable address. I think AOT-compiled code would still be OK, because it >> generates different code, but we'd have to do some testing. >> > > There's another little wrinkle: even after updating the shared metaspace > to use the search algorithm to find a 4G-aligned location, we can still > end up with something like: > > CompressedKlassPointers::base() => 0x1100000000 > CompressedKlassPointers::shift() => 3 > > (The shift is always set to LogKlassAlignmentInBytes when CDS is on.) I remember that. I don't know why; it's not much use to us on AArch64. > Here we can't use EOR because 0x1100000000 doesn't fit in the immediate, > and we can't use MOVK because the shift is non-zero. > > I think the solution is to adjust the algorithm to search in increments > of (4 << LogKlassAlignmentInBytes)*G once we hit 32G (the point at which > we can no longer use a zero base). Then we can decode like this: > > const uint64_t shifted_base = > (uint64_t)CompressedKlassPointers::base() >> > CompressedKlassPointers::shift(); > guarantee((shifted_base & 0xffffffff) == 0, "compressed class base > bad alignment"); > > if (dst != src) movw(dst, src); > movk(dst, shifted_base >> 32, 32); > > if (CompressedKlassPointers::shift() != 0) { > lsl(dst, dst, LogKlassAlignmentInBytes); > } > > What do you think? Could work, but can you also try disabling the shift with CDS? It doesn't do us much good and it bloats code. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jianglizhou at google.com Mon Dec 2 16:05:37 2019 From: jianglizhou at google.com (Jiangli Zhou) Date: Mon, 2 Dec 2019 08:05:37 -0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <7851d79d-5cff-05bc-9c52-b1f1bf3c175e@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> <450a3a82-3031-d988-ece1-cb1e37a74975@arm.com> <7851d79d-5cff-05bc-9c52-b1f1bf3c175e@redhat.com> Message-ID: On Mon, Dec 2, 2019 at 7:34 AM Andrew Haley wrote: > > On 12/2/19 7:55 AM, Nick Gasson wrote: > > On 29/11/2019 18:10, Andrew Haley wrote: > >>> How about we exit with a fatal error if we can't find a suitably aligned > >>> region? Then we can remove the code in decode_klass_non_null that uses > >>> R27 and this patch is much simpler. That code path is poorly tested at > >>> the moment so it seems risky to leave it in. With a hard error at least > >>> users will report it to us so we can fix it. > >> > >> That is starting to sound very attractive. With a 64-bit address space I'm > >> finding it very hard to imagine a scenario in which we don't find a > >> suitable address. I think AOT-compiled code would still be OK, because it > >> generates different code, but we'd have to do some testing. > >> > > > > There's another little wrinkle: even after updating the shared metaspace > > to use the search algorithm to find a 4G-aligned location, we can still > > end up with something like: > > > > CompressedKlassPointers::base() => 0x1100000000 > > CompressedKlassPointers::shift() => 3 > > > > (The shift is always set to LogKlassAlignmentInBytes when CDS is on.) > > I remember that. I don't know why; it's not much use to us on AArch64. When CDS is enabled, the compressed klass encoding shift is set to LogKlassAlignmentInBytes with the intend for AOT compatibility. AOT requires LogKlassAlignmentInBytes (3) as the shift. Best, Jiangli > > > Here we can't use EOR because 0x1100000000 doesn't fit in the immediate, > > and we can't use MOVK because the shift is non-zero. > > > > I think the solution is to adjust the algorithm to search in increments > > of (4 << LogKlassAlignmentInBytes)*G once we hit 32G (the point at which > > we can no longer use a zero base). Then we can decode like this: > > > > const uint64_t shifted_base = > > (uint64_t)CompressedKlassPointers::base() >> > > CompressedKlassPointers::shift(); > > guarantee((shifted_base & 0xffffffff) == 0, "compressed class base > > bad alignment"); > > > > if (dst != src) movw(dst, src); > > movk(dst, shifted_base >> 32, 32); > > > > if (CompressedKlassPointers::shift() != 0) { > > lsl(dst, dst, LogKlassAlignmentInBytes); > > } > > > > What do you think? > > Could work, but can you also try disabling the shift with CDS? It doesn't do > us much good and it bloats code. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. 
> https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From vladimir.x.ivanov at oracle.com Mon Dec 2 16:44:19 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 2 Dec 2019 19:44:19 +0300 Subject: RFR(M): 8216041: [Event Request] - Deoptimization In-Reply-To: References: Message-ID: <181c7d8a-3c6f-307d-5aa0-257586a190c7@oracle.com> > Webrev: http://cr.openjdk.java.net/~mgronlun/8216041/webrev01 Compiler changes look good. Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Mon Dec 2 20:09:39 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 2 Dec 2019 12:09:39 -0800 Subject: [14] RFR (S): 8231430: C2: Memory stomp in max_array_length() for T_ILLEGAL type In-Reply-To: References: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com> <83147646-d353-1f46-f50b-8c0edf16645f@oracle.com> Message-ID: +1 Thanks Vladimir > On Dec 2, 2019, at 5:38 AM, Tobias Hartmann wrote: > > Hi Vladimir, > >> On 29.11.19 16:28, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8231430/webrev.01/ > > This looks good to me. > > Best regards, > Tobias From igor.ignatyev at oracle.com Mon Dec 2 21:19:00 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 2 Dec 2019 13:19:00 -0800 Subject: RFR(M): 8216041: [Event Request] - Deoptimization In-Reply-To: References: Message-ID: <4B4C80C8-CF8A-4AA2-ACF8-7C67D7360A86@oracle.com> Hi Markus, looks good to me. -- Igor > On Dec 2, 2019, at 3:57 AM, Markus Gronlund wrote: > > Greetings, > > Please review the following changeset to introduce a Deoptimization (uncommon trap) event to JFR. > > Igor Ignatyev has done related work under "JDK-8225554: add JFR event for uncommon trap" [1], and we have decided to merge our work, so this RFR will supersede the previous discussion related to JDK-8225554 [2]. > > Enhancement: https://bugs.openjdk.java.net/browse/JDK-8216041 > Webrev: http://cr.openjdk.java.net/~mgronlun/8216041/webrev01 > Example visualization (raw JMC): http://cr.openjdk.java.net/~mgronlun/8216041/Deoptimization.jpg > Summary: A description of which compiler compiled the method has been added, where descriptions are keyed by values from the CompilerType enumeration. This information has also been added to the Compilation event. The current suggestion is to have stack trace information turned off by default (default.jfc), but turned on when using profile.jfc. > > [1] https://bugs.openjdk.java.net/browse/JDK-8225554 > [2] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-June/000631.html > > Thank you > Markus From john.r.rose at oracle.com Tue Dec 3 05:04:50 2019 From: john.r.rose at oracle.com (John Rose) Date: Mon, 2 Dec 2019 21:04:50 -0800 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <2BAEB1D3-46B7-4BA9-81A3-4F5E7B47B82A@gmail.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com> <2BAEB1D3-46B7-4BA9-81A3-4F5E7B47B82A@gmail.com> Message-ID: <1590A038-CA8D-4174-8FC3-F3BAC1A93E57@oracle.com> On Nov 18, 2019, at 8:43 PM, August Nagro wrote: > > And here?s a tangent to thing to think about: is growing HashMap?s backing array by powers of 2 actually a good thing, when the HashMap gets large? What if you instead wanted to grow by powers of 1.5, or even grow probabilistically, based on the collision rate, allocation pressure, or other data? With fast-range you can do this if you want. 
And without the performance hit of %! It's an interesting tangent; let me follow you along it for a bit. I think there may be interesting compromises between a size menu having only powers of two and a size-to-order policy, or a menu of powers of bases not related to 2, such as 1.5. In particular, if you can manage a ratio near to 1.414 you can cut your fragmentation costs roughly in half by putting all powers of 2 on the menu, plus those values times 1.414. (Or likewise a pair of multipliers near to 1.260 and 1.587.) If there were a clever ALU operation (or table lookup) cheaper than a full multiply or remainder that would get the effect of a fast range reduction just for the items on the menu, then it would be a good trade-off to stick to the menu, where the discounts apply, rather than pay for full-custom sizes. Both 1.4 (7/5) and 1.5 (3/2) look attractively simple, requiring some wrangling of approximate moduli of 3 or 5 or 7. Or, for 2^(1/3), 1.25 (5/4) and 1.66... (5/3) are also simple ratios. Or maybe there is a non-obvious number (not a simple ratio) which has a surprisingly simple modulus approximation, though I'm not holding my breath on that one. Since I'm hoping to stay under the cost of a multiply, I admit it's a tight ceiling, but I think there might be something there between the two extremes (2^n only and arbitrary sizes). -- John From john.r.rose at oracle.com Tue Dec 3 06:06:39 2019 From: john.r.rose at oracle.com (John Rose) Date: Mon, 2 Dec 2019 22:06:39 -0800 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <877e3x0wji.fsf@oldenburg2.str.redhat.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com> Message-ID: <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> On Nov 18, 2019, at 12:17 PM, Florian Weimer wrote: > >> int bucket = fr(h * M); // M = 0x2357BD or something >> >> or maybe something fast and sloppy like: >> >> int bucket = fr(h + (h << 8)); Surely this one works, since fr is the final operation. The shift/add is just a mixing step to precondition the input. Just for the record, I'd like to keep brainstorming a bit more, though surely this sort of thing is of limited interest. So, just a little more from me on this. If we had BITR I'd want to try something like fr(h - bitr(h)). But what I keep wishing for is a good one- or two-cycle instruction that will mix all input bits into all the output bits, so that any change in one input bit is likely to cause cascading changes in many output bits, perhaps even 50% on average. A pair of AES steps is a good example of this. I think AES would be superior to multiply (for mixing) when used on 128 bit payloads or larger, so it looks appealing (to me) for vectorizable hashing applications. Though it is overkill on scalars, I think it points in promising directions for scalars also. >> >> or even: >> >> int bucket = fr(h) ^ (h & (N-1)); > Does this really work? I don't think so. Oops, you are right. Something like it might work, though. The idea, on paper, is that h & (N-1) is less than N, for any N >=1. And if N-1 has a high enough pop-count the information content is close to 50% of h (though maybe 50% of the bits are masked out). The xor of two quasi-independent values both less than N is, well, less than 2^(ceil lg N), not N, which is a bug. Oops.
There are ways to quickly combine two values less than N and reduce the result to less than N: You do a conditional subtract of N if the sum is >= N. So the tactical area I?m trying to explore here is to take two reduced hashes developed in parallel, which depend on different bits of the input, and combine them into a single stronger hash (not by ^). Maybe (I can?t resist hacking at it some more): int h1 = H1(h), h2 = H2(h); int bucket = CCA(h1 - h2, N); // where H1 := fr, H2(h) := (h & (N-1)) // where CCA(x, N) := x + ((x >> 31) & N) // Conditional Compensating Add In this framework, a better H2 for favoring the low bits of h might be H2(h) := ((1< I think this kind of perturbation is quite expensive. Arm's BITR should > be helpful here. Yes, BITR is a helpful building block. If I understand you correctly, it needs to be combined with other operations, such as multiply, shift, xor, etc., and can overcome biases towards high bits or towards low bits that come from the simple arithmetic definitions of the other mixing operations. The hack with CCA(h1 - h2, N) seems competitive with a BITR-based mixing step, since H2 can be very simple. A scalar variant of two AES steps (with xor of a second register or constant parameter at both stages) would be a better building block for strongly mixing bits. Or some other shallow linear network with a layer of non-linear S-boxes. > But even though this operation is commonly needed and > easily implemented in hardware, it's rarely found in CPUs. Yep; the whole cottage industry of building clever mixing functions out of hand calculator functions could be sidelined if CPUs gave us good cheap mixing primitives out of the box. The crypto literature is full of them, and many are designed to be easy to implement in silicon. ? John P.S. I mention AES because I?m familiar with that bit of crypto tech, and also because I actually tried it out once on the smhasher quality benchmark. No surprise in hindsight; it passes the quality tests with just two rounds. Given that it is as cheap as multiplication, and handles twice as many bits at a time, but requires two steps for full mixing, it would seem to be competitive with multiplication as a mixing step. It has no built-in biases towards high or low bits, so that?s an advantage over multiplication. Why two rounds? The one-round version has flaws, as a hash function, which are obvious on inspection of the simple structure of an AES round. Not every output bit is data-dependent on every input bit of one round, but two rounds swirls them all together. Are back-to-back AES rounds expensive? Maybe, although that?s how the instructions are designed to be used, about 10 of them back to back to do real crypto. From christian.hagedorn at oracle.com Tue Dec 3 07:12:42 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 3 Dec 2019 08:12:42 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <88c9da01-6865-4a92-67fd-2f52d6cfe84b@oracle.com> References: <88c9da01-6865-4a92-67fd-2f52d6cfe84b@oracle.com> Message-ID: <79476732-3ddb-1002-d4b1-f845cf76ee86@oracle.com> Thank you Tobias for your review! Best regards, Christian On 02.12.19 14:07, Tobias Hartmann wrote: > Hi Christian, > > looks reasonable to me. 
> > Best regards, > Tobias > > On 20.11.19 15:14, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8231501 >> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.00/ >> >> The bug could be traced back to the concurrent cleaning of method data with its extra data in >> MethodData::clean_method_data() and the loading/copying of extra data for the ci method data in >> ciMethodData::load_extra_data(). I reproduced the bug by using the test [1] which extensively cleans >> method data by using the whitebox API [2]. >> >> Before loading and copying the extra data from the MDO to the ciMDO in >> ciMethodData::load_extra_data(), the metadata is prepared in a fixed-point iteration by cleaning all >> SpeculativeTrapData entries of methods whose klasses are unloaded [3]. If it encounters such a dead >> entry it releases the extra data lock (due to ranking issues) and tries again later [4]. This >> release of the lock triggers the bug: There can be cases where one thread A is waiting in the >> whitebox API method to get the extra data lock [2] to clean the extra data for the very same MDO for >> which another thread B just released the lock at [4]. If that MDO actually contained >> SpeculativeTrapData entries, then thread A cleaned those but the ciMDO, which thread B is preparing, >> still contains the uncleaned old MDO extra data (because thread B only made a snapshot of the MDO >> earlier at [5]). Things then go wrong when thread B can reacquire the lock after thread A. It tries >> to load the now cleaned extra data and immediately finishes at [6] since there are no >> SpeculativeTrapData entries anymore. It copied a single entry with tag DataLayout::no_tag [7] to the >> ciMDO which actually contained a SpeculativeTrapData entry. This results in a half way cleared entry >> (since a SpeculativeTrapData entry has an additional cell for the method) and possible other >> remaining SpeculativeTrapData entries: >> >> >> Let's assume a little-endian ordering and that both 0x00007fff... addresses are real pointers to >> methods. Tag 13 (0x0d) is used for SpeculativeTrapData and dp points to the first extra data entry: >> >> ciMDO extra data before thread B releases the lock at [4] (same extra data for MDO and ciMDO): >> 0x800000040011000d 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000 >> dp: tag = 13 -> next entry = dp+16; dp+8: method 0x00007fffd4993c63 >> dp+16: tag = 13 -> next entry = dp+32; dp+24: method 0x00007fffd49b1a68 >> dp+32: tag = 0 -> end of extra data >> >> MDO extra data after thread B reacquires the lock and thread A cleaned the MDO (ciMDO extra data is >> unchanged): >> 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 >> dp: tag = 0 -> end of extra data >> >> >> Returning at [6] when the extra data loading from MDO to ciMDO is finished: >> MDO extra data: >> 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 >> dp: tag = 0 -> end of extra data >> >> ciMDO extra data, only copied the first no_tag entry from MDO at [7] (8 bytes): >> 0x0000000000000000 0x00007fffd4993c63 0x800000040011000d 0x00007fffd49b1a68 0x0000000000000000 >> dp: tag = 0 -> next entry = dp+8 >> dp+8: tag = 0x63 = 99 -> there is no tag 99 -> fatal... 
>> >> >> The next time the ciMDO extra data is iterated, for example by using MethodData::next_extra(), it >> reads tag 99 after processing the first no_tag entry and jumping to the value at offset 8 which >> causes a crash since there is no tag 99 available. >> >> >> The fix is to completely zero out the current and all following SpeculativeTrapData entries if we >> encounter a no_tag in the MDO but a speculative_trap_data_tag tag in the ciMDO. There are also other >> cases where the method data is cleaned. Thus the bug is not only related to the whitebox API usage >> but occurs very rarely. >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] >> http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/test/hotspot/jtreg/compiler/types/correctness/CorrectnessTest.java >> >> [2] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/prims/whitebox.cpp#l1137 >> [3] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l137 >> [4] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l115 >> [5] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l219 >> [6] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l191 >> [7] http://hg.openjdk.java.net/jdk/jdk/file/580fb715b29d/src/hotspot/share/ci/ciMethodData.cpp#l176 From nick.gasson at arm.com Tue Dec 3 07:43:19 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Tue, 3 Dec 2019 15:43:19 +0800 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> <450a3a82-3031-d988-ece1-cb1e37a74975@arm.com> <7851d79d-5cff-05bc-9c52-b1f1bf3c175e@redhat.com> Message-ID: >> > >> > CompressedKlassPointers::base() => 0x1100000000 >> > CompressedKlassPointers::shift() => 3 >> > >> > (The shift is always set to LogKlassAlignmentInBytes when CDS is on.) >> >> I remember that. I don't know why; it's not much use to us on AArch64. > > When CDS is enabled, the compressed klass encoding shift is set to > LogKlassAlignmentInBytes with the intend for AOT compatibility. AOT > requires LogKlassAlignmentInBytes (3) as the shift. > Yes, in AOTGraalHotSpotVMConfig.java we have: // AOT captures VM settings during compilation. For compressed oops this // presents a problem for the case when the VM selects a zero-shift mode // (i.e., when the heap is less than 4G). Compiling an AOT binary with // zero-shift limits its usability. As such we force the shift to be // always equal to alignment to avoid emitting zero-shift AOT code. CompressEncoding vmOopEncoding = super.getOopEncoding(); aotOopEncoding = new CompressEncoding(vmOopEncoding.getBase(), logMinObjAlignment()); CompressEncoding vmKlassEncoding = super.getKlassEncoding(); aotKlassEncoding = new CompressEncoding(vmKlassEncoding.getBase(), logKlassAlignment); I get why AOT needs to set the shift for compressed OOPs, but for compressed class pointers the maximum class space size is 3G so the shift is only useful to allow a zero base. For AOT+CDS the default class space load address is 0x800000000 (32G) which is above the limit where a zero base is possible. 
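For reference, the decode being discussed is just this arithmetic (an illustrative sketch in Java, not the HotSpot code; the real AArch64 sequence is the MacroAssembler snippet quoted earlier in the thread):

class CompressedKlassDecode {
    // A 32-bit narrow class pointer expands to a Klass* address as
    // base + (narrowKlass << shift). With shift == 0 the decode is a single
    // add (or nothing, with a zero base); with a non-zero base and a non-zero
    // shift it typically needs an extra instruction, unless base >> shift is
    // 4G-aligned, which is what makes the MOVK + LSL trick above work.
    static long decode(int narrowKlass, long base, int shift) {
        return base + ((narrowKlass & 0xFFFFFFFFL) << shift);
    }
}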
So I think it would be better if AOT and CDS used a zero shift by default, as on AArch64 and X86 at least, we generate less efficient code when both the shift and base are non-zero. Using logKlassAlignment above works well for non-CDS, as it allows the class space to mapped at a low address with zero base. Maybe aotKlassEncoding could be set from the current VM settings instead? Thanks, Nick From vladimir.x.ivanov at oracle.com Tue Dec 3 08:09:16 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 3 Dec 2019 11:09:16 +0300 Subject: [14] RFR (S): 8231430: C2: Memory stomp in max_array_length() for T_ILLEGAL type In-Reply-To: References: <00ab4462-ca1d-7e37-6e92-aca8e975e79d@oracle.com> <83147646-d353-1f46-f50b-8c0edf16645f@oracle.com> Message-ID: Thanks for reviews, Vladimir and Tobias. Best regards, Vladimir Ivanov On 02.12.2019 23:09, Vladimir Kozlov wrote: > +1 > > Thanks > Vladimir > >> On Dec 2, 2019, at 5:38 AM, Tobias Hartmann wrote: >> >> Hi Vladimir, >> >>> On 29.11.19 16:28, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8231430/webrev.01/ >> >> This looks good to me. >> >> Best regards, >> Tobias > From tobias.hartmann at oracle.com Tue Dec 3 08:18:22 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 3 Dec 2019 09:18:22 +0100 Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 In-Reply-To: <0b33fa95-ff15-b628-1891-f990f239e60f@oracle.com> References: <0b33fa95-ff15-b628-1891-f990f239e60f@oracle.com> Message-ID: Hi, this looks good to me. Best regards, Tobias On 30.11.19 02:02, Ioi Lam wrote: > Hi Pengfei, > > I have cc-ed hotspot-compiler-dev at openjdk.java.net. > > Please do not push the patch until someone from hotspot-compiler-dev has looked at it. > > Many people are away due to Thanksgiving in the US. > > Thanks > - Ioi > > On 11/28/19 7:56 PM, Pengfei Li (Arm Technology China) wrote: >> Hi, >> >> Please help review this small fix for 64-bit client build. >> >> Webrev: http://cr.openjdk.java.net/~pli/rfr/8234791/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8234791 >> >> Current 64-bit client VM build fails because errors occurred in dumping >> the CDS archive. In JDK 12, we enabled "Default CDS Archives"[1] which >> runs "java -Xshare:dump" after linking the JDK image. But for Client VM >> build on 64-bit platforms, the ergonomic flag UseCompressedOops is not >> set.[2] This leads to VM exits in checking the flags for dumping the >> shared archive.[3] >> >> This change removes the "#if defined" macro to make shared archive dump >> successful in 64-bit client build. By tracking the history of the macro, >> I found it is initially added as "#ifndef COMPILER1"[4] 10 years ago >> when C1 did not have a good support of compressed oops and modified to >> current shape[5] in the implementation of tiered compilation. It should >> be safe to be removed today. >> >> This patch also fixes another client build issue on AArch64. 
>> >> [1] http://openjdk.java.net/jeps/341 >> [2] >> http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l1694 >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l3551 >> [4] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/323bd24c6520#l11.7 >> [5] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/d5d065957597#l86.56 >> >> -- >> Thanks, >> Pengfei >> > From vladimir.x.ivanov at oracle.com Tue Dec 3 09:32:48 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 3 Dec 2019 12:32:48 +0300 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com> <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> Message-ID: <5ec39d06-8f9c-f731-bd8f-7eeef85147f2@oracle.com> > but two rounds swirls them all together. Are back-to-back AES rounds > expensive? Maybe, although that?s how the instructions are designed to > be used, about 10 of them back to back to do real crypto. Throughput-oriented implementation should work fine for crypto purposes, but AESENC does look very good on recent Intel micro-architectures from latency perspective as well (data from [1] [2]): it improved from 8/1 on Sandy Bridge and 7/1 on Haswell to 4/1 on Skylake and it's listed (on uops.info [2]) as 3/1 on Ice Lake which is on par with IMUL (while processing twice as much bits). And vector variant (VAESENC) has the same latency as scalar (8->7->4->3 [3]) which looks very appealing for throughput-oriented use cases. Best regards, Vladimir Ivanov [1] https://www.agner.org/optimize/instruction_tables.pdf [2] https://uops.info/html-lat/ICL/AESENC_XMM_XMM-Measurements.html [3] https://uops.info/html-instr/VAESENC_XMM_XMM_XMM.html From markus.gronlund at oracle.com Tue Dec 3 10:02:25 2019 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Tue, 3 Dec 2019 02:02:25 -0800 (PST) Subject: RFR(M): 8216041: [Event Request] - Deoptimization In-Reply-To: <4B4C80C8-CF8A-4AA2-ACF8-7C67D7360A86@oracle.com> References: <4B4C80C8-CF8A-4AA2-ACF8-7C67D7360A86@oracle.com> Message-ID: Igor, Vladimir and Erik, Thank you for your reviews! Markus -----Original Message----- From: Igor Ignatyev Sent: den 2 december 2019 22:19 To: Markus Gronlund Cc: hotspot-jfr-dev at openjdk.java.net; hotspot compiler Subject: Re: RFR(M): 8216041: [Event Request] - Deoptimization Hi Markus, looks good to me. -- Igor > On Dec 2, 2019, at 3:57 AM, Markus Gronlund wrote: > > Greetings, > > Please review the following changeset to introduce a Deoptimization (uncommon trap) event to JFR. > > Igor Ignatyev has done related work under "JDK-8225554: add JFR event for uncommon trap" [1], and we have decided to merge our work, so this RFR will supersede the previous discussion related to JDK-8225554 [2]. > > Enhancement: https://bugs.openjdk.java.net/browse/JDK-8216041 > Webrev: http://cr.openjdk.java.net/~mgronlun/8216041/webrev01 > Example visualization (raw JMC): http://cr.openjdk.java.net/~mgronlun/8216041/Deoptimization.jpg > Summary: A description of which compiler compiled the method has been added, where descriptions are keyed by values from the CompilerType enumeration. This information has also been added to the Compilation event. 
The current suggestion is to have stack trace information turned off by default (default.jfc), but turned on when using profile.jfc. > > [1] https://bugs.openjdk.java.net/browse/JDK-8225554 > [2] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-June/000631.html > > Thank you > Markus From xxinliu at amazon.com Tue Dec 3 11:00:41 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 3 Dec 2019 11:00:41 +0000 Subject: RFR(XS): 8234977: [Aarch64] C1 lacks a membar store in slowpath of allocate_array Message-ID: Hi, C1 misses a membar for the slow path of alloc_array or alloc_obj on aarch64. We met this problem when we ran jcstress on an AWS Graviton processor, which might reorder stores. Without a storestore membar, a put_field behind it might commit to memory ahead of the array/object initialization. This change tries to fix that by adjusting the bound location. Could reviewers help me to review this change? Bug: https://bugs.openjdk.java.net/browse/JDK-8234977 Webrev: https://cr.openjdk.java.net/~xliu/8234977/00/webrev/ Validation: I passed jcstress and hotspot tier1 on aarch64. Thanks, --lx From vladimir.x.ivanov at oracle.com Tue Dec 3 11:49:14 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 3 Dec 2019 14:49:14 +0300 Subject: [14] RFR (L): 8234391: C2: Generic vector operands In-Reply-To: <375F08F3-BE51-41F0-A20C-9443CB9F0D46@oracle.com> References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> <375F08F3-BE51-41F0-A20C-9443CB9F0D46@oracle.com> Message-ID: Thanks a lot for the review, John. Incremental change: http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.01-00 (Also, extended idealreg2regmask refactoring to non-vector registers.) > In machnode.cpp I think the code would be easier to read if a common > subexpression were assigned a name 'num_edges' as in similar places: > > uint num_edges = _opnds[opcnt]->num_edges(); > > (Alternatively, you could just increment 'skipped' on the fly, which > would be OK too.) I'm being picky because it cost me a second to verify > that the set of raw edges was processed exhaustively. This is a task > every reader of the code will have to do. Agree, fixed. > Another nit: There is some dissonance between the seemingly general > name postselect_cleanup and supports_generic_vector_operands, which > is a very specific name. But they refer to the same condition. Pick the > shorter name, maybe, and convert the longer one into a nice comment. > It's also not clear that helpers like process_mach_node and get_vector_operand > are part of postselect_cleanup. Should they be cleanup_mach_node and > cleanup_vector_operand? It's a nit-pick, but might help future readers. Agree, fixed. > The name get_vector_operand is particularly bad, since it suggests that > it accesses something previously computed, where in fact it transforms > the graph. Also, IMO, 'get_' is one of those noise words which offers little > help to the reader and just takes up valuable screen space. Agree, fixed. > The name clone_generic_vector_operand is confusing; I would expect it > to be called something like [specialize,cleanup,...]_generic_vector_operand. Agree, fixed. > There's a funny condition reported in this comment: > // RShiftCntV/RShiftCntV report wide vector type, but VecS as ideal register. > > It seems to come out of the blue sky. Maybe add a cross-referencing comment > between that and the vshiftcnt instruction in the AD file? (Are there asserts > that would catch similar oddities if they were to arise? Was this one caught > via an assert? 
I certainly hope it was, rather than by debugging bad code!) It comes not from vshiftcnt, but from the ideal node declaration: class LShiftCntVNode : public VectorNode { public: LShiftCntVNode(Node* cnt, const TypeVect* vt) : VectorNode(cnt,vt) {} virtual int Opcode() const; virtual uint ideal_reg() const { return Matcher::vector_shift_count_ideal_reg(vect_type()->length_in_bytes()); } }; Yes, complete migration to generic vector operands helps with catching such cases. It leads to concrete vector type mismatches at the IR level and those are caught with asserts. Anyway, I added the reference to vectornode.hpp. > On a similar note (about asserts), I'm very glad to see verify_mach_nodes. > The name is a little non-specific. Maybe verify_after_postselect_cleanup. Agree, fixed. > Why do vector[xy]_reg_legacy and vectorz_reg_legacy get different treatments > in the change set? I'm mainly curious here. The vectorz_reg_vl thing is a > dynamic set (?) which is fine, but why is it needed for z and not xy? A comment > might help. Also, this gripe is not part of this review, but I'll make it anyway: > The very very short acronym "vl" which appears here stands for "AVX512VL" > referring to "variable length" but it bumps into the phrase "vector legacy" > with an unfortunate occasion for confusion. Suggest "_vl" be renamed to > "_512vl" or some other more specific thing. I agree that the _vl postfix is cryptic. I'll file an RFE to investigate possible cleanups. (One of the ideas was to completely eliminate dynamic register masks for vectors and get rid of (leg)vec[SDXYZ] along the way, but it was left for a future enhancement.) > For the string intrinsics, there's a regular replacement of legRegS by legRegD. > That strikes me as potentially a semantic change. I suppose the register > allocator will do the same things in both cases, and there's no spill code > generated by the JIT for such a temp. I wonder, if the change was necessary, > how do we know that all the occurrences were correctly found and changed? > (Also, the string intrinsic macros use AVX512 instructions when available, > and in theory those would require heftier register masks.) Can someone > comment on this, for the record - maybe even in the source code? The change is from legVecS to legRegD. It doesn't cause any behavioral changes, but aligns the declaration with the actual behavior. Regarding the Vec=>Reg part, string intrinsic nodes don't advertise themselves as vector nodes. For example, they don't set C->max_vector_size() and are still used when vectors are disabled (-XX:MaxVectorSize=0). Such problems are caught during graph verification with -XX:MaxVectorSize=0 or when there are no vector operations present. The S=>D change aligns x86_32.ad and x86_64.ad. The 32-bit version consistently uses regD everywhere. Instead of migrating both x86_32.ad and x86_64.ad, it was decided to stick with regD/legVecD. > I suggest that the warning comments "(leg)Vec should be used instead" could > be a little less cryptic. An unconditional warning like this makes the reader > wonder, "so why is it here at all?" Maybe use a cross-reference as in: > // Replaces (leg)vec during post-selection cleanup. See above. Agree, fixed. Best regards, Vladimir Ivanov > On Nov 19, 2019, at 6:30 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8234391/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8234391 >> >> Introduce generic vector operands and migrate existing usages from fixed sized operands (vec[SDXYZ]) to generic ones.
>> >> (It's an updated version of generic vector support posted for review in August, 2019 [1] [2]. AD instruction merges will be handled separately.) >> >> On a high-level it is organized as follows: >> >> (1) all AD instructions in x86.ad/x86_64.ad/x86_32.ad use vec/legVec; >> >> (2) at runtime, right after matching is over, a special pass is performed which does: >> >> * replaces vecOper with vec[SDXYZ] depending on mach node type >> - vector mach nodes capute bottom_type() of their ideal prototype; >> >> * eliminates redundant reg-to-reg vector moves (MoveVec2Leg /MoveLeg2Vec) >> - matcher needs them, but they are useless for register allocator (moreover, may cause additional spills); >> >> >> (3) after post-selection pass is over, all mach nodes should have fixed-size vector operands. >> >> >> Some details: >> >> (1) vec and legVec are marked as "dynamic" operands, so post-selection rewriting works >> >> >> (2) new logic is guarded by new matcher flag (Matcher::supports_generic_vector_operands) which is enabled only on x86 >> >> >> (3) post-selection analysis is implemented as a single pass over the graph and processing individual nodes using their own (for DEF operands) or their inputs (USE operands) bottom_type() (which is an instance of TypeVect) >> >> >> (4) most of the analysis is cross-platform and interface with platform-specific code through 3 methods: >> >> static bool is_generic_reg2reg_move(MachNode* m); >> // distinguishes MoveVec2Leg/MoveLeg2Vec nodes >> >> static bool is_generic_vector(MachOper* opnd); >> // distinguishes vec/legVec operands >> >> static MachOper* clone_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg); >> // constructs fixed-sized vector operand based on ideal reg >> // vec + Op_Vec[SDXYZ] => vec[SDXYZ] >> // legVec + Op_Vec[SDXYZ] => legVec[SDXYZ] >> >> >> (5) TEMP operands are handled specially: >> - TEMP uses max_vector_size() to determine what fixed-sized operand to use >> * it is needed to cover reductions which don't produce vectors but scalars >> - TEMP_DEF inherits fixed-sized operand type from DEF; >> >> >> (6) there is limited number of special cases for mach nodes in Matcher::get_vector_operand_helper: >> >> - RShiftCntV/RShiftCntV: though it reports wide vector type as Node::bottom_type(), its ideal_reg is VecS! But for vector nodes only Node::bottom_type() is captured during matching and not ideal_reg(). >> >> - vshiftcntimm: chain instructions which convert scalar to vector don't have vector type. >> >> >> (7) idealreg2regmask initialization logic is adjusted to handle generic vector operands (see Matcher::get_vector_regmask) >> >> >> (8) operand renaming in x86_32.ad & x86_64.ad to avoid name conflicts with new vec/legVec operands >> >> >> (9) x86_64.ad: all TEMP usages of vecS/legVecS are replaced with regD/legRegD >> - it aligns the code between x86_64.ad and x86_32.ad >> - strictly speaking, it's illegal to use vector operands on a non-vector node (e.g., string_inflate) unless its usage is guarded by C2 vector support checks (-XX:MaxVectorSize=0) >> >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Testing: tier1-tier4, jtreg compiler tests on KNL and SKL, >> performance testing (SPEC* + Octane + micros / G1 + ParGC). 
>> >> Best regards, >> Vladimir Ivanov >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >> >> [2] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf > From christoph.goettschkes at microdoc.com Tue Dec 3 12:22:49 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Tue, 3 Dec 2019 13:22:49 +0100 Subject: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for client VMs due to Unrecognized VM option LoopUnrollLimit In-Reply-To: <20191128124549.BCDF0C6F24@aojmv0009> References: <20191127125735.B9BE111F377@aojmv0009> <20191128124549.BCDF0C6F24@aojmv0009> Message-ID: Hi Vladimir, could you have a look at my updated webrev regarding this failing test? https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ See my inline comments in the mail below. Thanks, Christpoh "hotspot-compiler-dev" wrote on 2019-11-28 13:44:05: > From: christoph.goettschkes at microdoc.com > To: Vladimir Kozlov > Cc: hotspot-compiler-dev at openjdk.java.net > Date: 2019-11-28 13:46 > Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > client VMs due to Unrecognized VM option LoopUnrollLimit > Sent by: "hotspot-compiler-dev" > > Hi Vladimir, > > Vladimir Kozlov wrote on 2019-11-27 20:54:02: > > > From: Vladimir Kozlov > > To: christoph.goettschkes at microdoc.com, > hotspot-compiler-dev at openjdk.java.net > > Date: 2019-11-27 20:54 > > Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > > client VMs due to Unrecognized VM option LoopUnrollLimit > > > > Hi Christoph > > > > I was about suggest IgnoreUnrecognizedVMOptions flag but remembered > > discussion about 8231954 fix. > > Yes, I try to avoid "IgnoreUnrecognizedVMOptions" because of our previous > discussion. I also think that it doesn't make sense to execute tests in VM > configurations for which they are not written for. Most of the compiler > tests simply have "IgnoreUnrecognizedVMOptions" and probably waste a good > amount of time in certain VM configurations. > > > But I think the test should be run with Graal - it does have OSR > > compilation and we need to test it. > > Sure. I disabled it, because I thought that the flag "LoopUnrollLimit" is > required to trigger the faulty behavior, but I don't know much about > optimization in the graal JIT. > > > > > We can do it by splitting test runs (duplicate @test block with > > different run flags) to have 2 tests with different > > flags and conditions. See [1]. > > > > For existing @run block we use `@requires vm.compiler2.enabled` and for > > new without LoopUnrollLimit - `vm.graal.enabled`. > > I did the following: > > https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ > > Could you elaborate how the two flags are related? I though, if graal is > used as a JIT, both `vm.graal.enabled` and `vm.compiler2.enabled` are set > to true. Is that correct? I don't have a setup with graal, so I can not > test this. > > Thanks, > Christoph > From martin.doerr at sap.com Tue Dec 3 14:48:02 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 3 Dec 2019 14:48:02 +0000 Subject: RFR(XS): 8234977: [Aarch64] C1 lacks a membar store in slowpath of allocate_array In-Reply-To: References: Message-ID: Hi, I think there are already sufficient StoreStore barriers: Regarding the Runtime1 calls, there's a full fence in ~ThreadInVMfromJava() (introduced by JRT_ENTRY) which is called before returning from the C++ code. 
Seems like the eden allocation parts have explicit StoreStore barriers AFAICS. Did I miss anything? Which path is it exactly where you miss such a barrier? Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Liu, Xin > Sent: Dienstag, 3. Dezember 2019 12:01 > To: 'hotspot-compiler-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: [DMARC FAILURE] RFR(XS): 8234977: [Aarch64] C1 lacks a membar > store in slowpath of allocate_array > > Hi, > > C1 misses a member for the slow path of alloc_array or alloc_obj on aarch64. > We met this problem when we ran jcstress on AWS graviton processor, > which might reorder stores. Without a storestore member, put_field behind > might commit to memory ahead of array/object initialization. > This change tries to fix that by adjusting bound location. Could reviewers help > me to review this change? > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234977 > Webrev: https://cr.openjdk.java.net/~xliu/8234977/00/webrev/ > > Validation: I passed jcstress and hotspot tier1 on arch64. > > Thanks, > --lx > > From tobias.hartmann at oracle.com Tue Dec 3 15:10:34 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 3 Dec 2019 16:10:34 +0100 Subject: [14] RFR(S): 8234616: assert(0 <= i && i < _len) failed: illegal index in PhaseMacroExpand::expand_macro_nodes() Message-ID: <0d6221a1-23c2-3e22-c4ab-a182696f4275@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8234616 http://cr.openjdk.java.net/~thartmann/8234616/webrev.00/ We assert in PhaseMacroExpand::expand_macro_nodes() when accessing the macro_node array because 'macro_idx' is out of the upper bound: https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2591 The problem is that in the previous iteration, we hit the following code path which removes an unreachable macro node but does not decrement 'macro_idx': https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2594 The problem was introduced in JDK 14 by the fix for JDK-8227384: https://hg.openjdk.java.net/jdk/jdk/rev/43c4fb8ba96b#l3.25 I've changed the loop to a for-loop to make sure the index is always decremented and also strengthened the asserts. I was not able to create a regression test but the issue reproduces intermittently with Lucene. Thanks, Tobias From vladimir.x.ivanov at oracle.com Tue Dec 3 15:16:43 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 3 Dec 2019 18:16:43 +0300 Subject: [14] RFR(S): 8234616: assert(0 <= i && i < _len) failed: illegal index in PhaseMacroExpand::expand_macro_nodes() In-Reply-To: <0d6221a1-23c2-3e22-c4ab-a182696f4275@oracle.com> References: <0d6221a1-23c2-3e22-c4ab-a182696f4275@oracle.com> Message-ID: <1022059a-c624-b184-e69a-5bf06d6eac50@oracle.com> > http://cr.openjdk.java.net/~thartmann/8234616/webrev.00/ Looks good. 
Best regards, Vladimir Ivanov > > We assert in PhaseMacroExpand::expand_macro_nodes() when accessing the macro_node array because > 'macro_idx' is out of the upper bound: > https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2591 > > The problem is that in the previous iteration, we hit the following code path which removes an > unreachable macro node but does not decrement 'macro_idx': > https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2594 > > The problem was introduced in JDK 14 by the fix for JDK-8227384: > https://hg.openjdk.java.net/jdk/jdk/rev/43c4fb8ba96b#l3.25 > > I've changed the loop to a for-loop to make sure the index is always decremented and also > strengthened the asserts. I was not able to create a regression test but the issue reproduces > intermittently with Lucene. > > Thanks, > Tobias > From martin.doerr at sap.com Tue Dec 3 15:26:51 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 3 Dec 2019 15:26:51 +0000 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <8736e4ayfa.fsf@redhat.com> References: <871ru2diu9.fsf@redhat.com> <8736e4ayfa.fsf@redhat.com> Message-ID: Hi Roland, the new asserted condition can never be true when mode == IgnoreStripMined. So if this is intended, I suggest using 2 assertions: 1. mode != IgnoreStripMined at the beginning 2. (mode == ControlAroundStripMined && use == sfpt) ||!use->is_reachable_from_root() Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Roland Westrelin > Sent: Sonntag, 1. Dezember 2019 15:55 > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && > (use == sfpt || !use->is_reachable_from_root())) failed: missed a node > > > > http://cr.openjdk.java.net/~roland/8234350/webrev.00/ > > Anyone for a second review? > > Thanks, > Roland. From tobias.hartmann at oracle.com Tue Dec 3 15:30:52 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 3 Dec 2019 16:30:52 +0100 Subject: [14] RFR(S): 8234616: assert(0 <= i && i < _len) failed: illegal index in PhaseMacroExpand::expand_macro_nodes() In-Reply-To: <1022059a-c624-b184-e69a-5bf06d6eac50@oracle.com> References: <0d6221a1-23c2-3e22-c4ab-a182696f4275@oracle.com> <1022059a-c624-b184-e69a-5bf06d6eac50@oracle.com> Message-ID: <51ce8912-e331-d37b-b846-814d089c6cb1@oracle.com> Thanks Vladimir! Best regards, Tobias On 03.12.19 16:16, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8234616/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> >> We assert in PhaseMacroExpand::expand_macro_nodes() when accessing the macro_node array because >> 'macro_idx' is out of the upper bound: >> https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2591 >> >> The problem is that in the previous iteration, we hit the following code path which removes an >> unreachable macro node but does not decrement 'macro_idx': >> https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2594 >> >> The problem was introduced in JDK 14 by the fix for JDK-8227384: >> https://hg.openjdk.java.net/jdk/jdk/rev/43c4fb8ba96b#l3.25 >> >> I've changed the loop to a for-loop to make sure the index is always decremented and also >> strengthened the asserts. 
I was not able to create a regression test but the issue reproduces >> intermittently with Lucene. >> >> Thanks, >> Tobias >> From aph at redhat.com Tue Dec 3 16:17:43 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 3 Dec 2019 11:17:43 -0500 Subject: RFR(XS): 8234977: [Aarch64] C1 lacks a membar store in slowpath of allocate_array In-Reply-To: References: Message-ID: On 12/3/19 6:00 AM, Liu, Xin wrote: > C1 misses a member for the slow path of alloc_array or alloc_obj on aarch64. I don't understand this. In the slow path, the MemAllocator::finish should insert the necessary membar. Can you show us the slow path which does not do the membar? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.kozlov at oracle.com Tue Dec 3 18:18:12 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 3 Dec 2019 10:18:12 -0800 Subject: [14] RFR(S): 8234616: assert(0 <= i && i < _len) failed: illegal index in PhaseMacroExpand::expand_macro_nodes() In-Reply-To: <1022059a-c624-b184-e69a-5bf06d6eac50@oracle.com> References: <0d6221a1-23c2-3e22-c4ab-a182696f4275@oracle.com> <1022059a-c624-b184-e69a-5bf06d6eac50@oracle.com> Message-ID: +1 Vladimir K On 12/3/19 7:16 AM, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8234616/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> >> We assert in PhaseMacroExpand::expand_macro_nodes() when accessing the macro_node array because >> 'macro_idx' is out of the upper bound: >> https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2591 >> >> The problem is that in the previous iteration, we hit the following code path which removes an >> unreachable macro node but does not decrement 'macro_idx': >> https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2594 >> >> The problem was introduced in JDK 14 by the fix for JDK-8227384: >> https://hg.openjdk.java.net/jdk/jdk/rev/43c4fb8ba96b#l3.25 >> >> I've changed the loop to a for-loop to make sure the index is always decremented and also >> strengthened the asserts. I was not able to create a regression test but the issue reproduces >> intermittently with Lucene. >> >> Thanks, >> Tobias >> From vladimir.kozlov at oracle.com Tue Dec 3 18:33:52 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 3 Dec 2019 10:33:52 -0800 Subject: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for client VMs due to Unrecognized VM option LoopUnrollLimit In-Reply-To: <20191203122444.7D54F13FFFC@aojmv0009> References: <20191127125735.B9BE111F377@aojmv0009> <20191128124549.BCDF0C6F24@aojmv0009> <20191203122444.7D54F13FFFC@aojmv0009> Message-ID: You don't need to duplicate @bug (it was C2 bug anyway). And don't need to check & !vm.graal.enabled for C2 case (see explanation below) > Could you elaborate how the two flags are related? I though, if graal is > used as a JIT, both `vm.graal.enabled` and `vm.compiler2.enabled` are set > to true. Is that correct? I don't have a setup with graal, so I can not > test this. When we enable Graal JIT it is used instead of C2. They are mutually exclusive. Compiler.isC2Enabled() (which sets vm.compiler2.enabled [1]) returns false when isGraalEnabled() returns true [2]. 
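For illustration, such a split can be sketched as two jtreg test descriptions in one file (this is only a sketch: the @summary texts, the -Xbatch/-XX:LoopUnrollLimit values and the exact run lines are placeholders, not the contents of Christoph's webrev; only the @requires keys come from this thread):

/*
 * @test
 * @requires vm.compiler2.enabled
 * @summary C2 run keeps the C2-only LoopUnrollLimit flag
 * @run main/othervm -Xbatch -XX:LoopUnrollLimit=100 TestDivZeroCheckControl
 */

/*
 * @test
 * @requires vm.graal.enabled
 * @summary Graal run drops the C2-only flag so the OSR path is still exercised
 * @run main/othervm -Xbatch TestDivZeroCheckControl
 */

jtreg treats each such comment as a separate test description for the same class, so the C2 run keeps the unroll-limit tuning while the Graal run only loses that one flag.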
Regards, Vladimir [1] http://hg.openjdk.java.net/jdk/jdk/file/138b0f3fe18c/test/jtreg-ext/requires/VMProps.java#l117 [2] http://hg.openjdk.java.net/jdk/jdk/file/138b0f3fe18c/test/lib/sun/hotspot/code/Compiler.java#l80 On 12/3/19 4:22 AM, christoph.goettschkes at microdoc.com wrote: > Hi Vladimir, > > could you have a look at my updated webrev regarding this failing test? > https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ > > See my inline comments in the mail below. > > Thanks, > Christpoh > > "hotspot-compiler-dev" > wrote on 2019-11-28 13:44:05: > >> From: christoph.goettschkes at microdoc.com >> To: Vladimir Kozlov >> Cc: hotspot-compiler-dev at openjdk.java.net >> Date: 2019-11-28 13:46 >> Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for >> client VMs due to Unrecognized VM option LoopUnrollLimit >> Sent by: "hotspot-compiler-dev" > >> >> Hi Vladimir, >> >> Vladimir Kozlov wrote on 2019-11-27 > 20:54:02: >> >>> From: Vladimir Kozlov >>> To: christoph.goettschkes at microdoc.com, >> hotspot-compiler-dev at openjdk.java.net >>> Date: 2019-11-27 20:54 >>> Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > >>> client VMs due to Unrecognized VM option LoopUnrollLimit >>> >>> Hi Christoph >>> >>> I was about suggest IgnoreUnrecognizedVMOptions flag but remembered >>> discussion about 8231954 fix. >> >> Yes, I try to avoid "IgnoreUnrecognizedVMOptions" because of our > previous >> discussion. I also think that it doesn't make sense to execute tests in > VM >> configurations for which they are not written for. Most of the compiler >> tests simply have "IgnoreUnrecognizedVMOptions" and probably waste a > good >> amount of time in certain VM configurations. >> >>> But I think the test should be run with Graal - it does have OSR >>> compilation and we need to test it. >> >> Sure. I disabled it, because I thought that the flag "LoopUnrollLimit" > is >> required to trigger the faulty behavior, but I don't know much about >> optimization in the graal JIT. >> >>> >>> We can do it by splitting test runs (duplicate @test block with >>> different run flags) to have 2 tests with different >>> flags and conditions. See [1]. >>> >>> For existing @run block we use `@requires vm.compiler2.enabled` and > for >>> new without LoopUnrollLimit - `vm.graal.enabled`. >> >> I did the following: >> >> https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ >> >> Could you elaborate how the two flags are related? I though, if graal is > >> used as a JIT, both `vm.graal.enabled` and `vm.compiler2.enabled` are > set >> to true. Is that correct? I don't have a setup with graal, so I can not >> test this. >> >> Thanks, >> Christoph >> > From john.r.rose at oracle.com Tue Dec 3 18:59:14 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 3 Dec 2019 10:59:14 -0800 Subject: [14] RFR (L): 8234391: C2: Generic vector operands In-Reply-To: References: <89904467-5010-129f-6f61-e279cce8936a@oracle.com> <375F08F3-BE51-41F0-A20C-9443CB9F0D46@oracle.com> Message-ID: <675F00CB-AD08-4B08-BAAD-6346DD7A89D8@oracle.com> On Dec 3, 2019, at 3:49 AM, Vladimir Ivanov wrote: > > Thanks a lot for the review, John. You are welcome. Glad to be of assistance. 
From sandhya.viswanathan at intel.com Tue Dec 3 21:33:10 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 3 Dec 2019 21:33:10 +0000 Subject: RFR(XXS): 8235288 :AVX 512 instructions inadvertently used on Xeon for small vector width operations Message-ID: For vector replicate and reduction operations vinsert and vextract instructions are used. When UseAVX level is set to 3, these instructions are unnecessarily encoded with 512-bit vector width. Only for KNL platform which doesn't support AVX512 variable length encoding, the 512-bit wide instruction need to be used. All other Xeon platforms should use the appropriate 256-bit wide vector instruction. JBS: https://bugs.openjdk.java.net/browse/JDK-8235288 Webrev: http://cr.openjdk.java.net/~sviswanathan/8235288/webrev.00/ Please review and approve. Best Regards, Sandhya From vladimir.kozlov at oracle.com Tue Dec 3 22:41:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 3 Dec 2019 14:41:21 -0800 Subject: RFR(XXS): 8235288 :AVX 512 instructions inadvertently used on Xeon for small vector width operations In-Reply-To: References: Message-ID: <3C7A0114-AD7B-4032-8D50-0F2D31EB97BC@oracle.com> Looks good. Thanks Vladimir > On Dec 3, 2019, at 1:33 PM, Viswanathan, Sandhya wrote: > > For vector replicate and reduction operations vinsert and vextract instructions are used. > When UseAVX level is set to 3, these instructions are unnecessarily encoded with 512-bit vector width. > Only for KNL platform which doesn't support AVX512 variable length encoding, the 512-bit wide instruction need to be used. > All other Xeon platforms should use the appropriate 256-bit wide vector instruction. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235288 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8235288/webrev.00/ > > Please review and approve. > > Best Regards, > Sandhya > From claes.redestad at oracle.com Tue Dec 3 23:38:55 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 4 Dec 2019 00:38:55 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: Hi Thomas and others, thanks for the thorough feedback! I also had offline discussions about the existing utilities in zUtils.inline.hpp with Per Liden, and decided to try and work this into a templated solution that enables both 32-bit and 64-bit implementations to work correctly - along with signed variants (especially nice with types such as size_t). http://cr.openjdk.java.net/~redestad/8234331/open.03 This is a pretty significant rework. Changes since .01: - introduce utilities/powerOfTwo.hpp - move round_up_* and round_down_* from zUtils, re-implement to use the new implementation - implement next_* as round_up_*(value + 1) as per John's suggestion. - added tests - corrected the issues Thomas, Ivan and others pointed out, mainly ensuring we don't depend on undefined behavior in neither product code, asserts nor tests - .. and ensure to use next and round_up as appropriate to preserve behavior Other notes: - Moving existing power-of-two functions like is_power_of_2 from globalDefinitions.hpp to powerOfTwo.hpp would be straightforward, but tedious. I'd like to defer this to a follow-up - The xlc implementation is untested, but should work. If someone can verify, I'd be much obliged. 
- Many thanks to Erik ?sterlund, who guided me through a maze of undefined behavior and template metaprogramming to a workable and relatively clean implementation - HotSpot shrinks by ~15Kb :-) Testing: tier1-5 On 2019-11-28 08:34, Thomas St?fe wrote: > Hi Claes, > > I think this is useful. Why not a 64bit variant too? If you do not want > to go through the hassle of providing a count_leading_zeros(uint64_t), > you could call the 32bit variant twice and take care of endianness for > the caller. > > -- > > In inline int32_t next_power_of_two(int32_t value) , should we weed out > negative input values right away instead of asserting at the end of the > function? > > -- > > The functions will always return the next power of two, even if the > input is a power of two - e.g. "2" for "1". Is that intended? It would > be nice to have an API comment in the header describing these corner > cases (what happens for negative input, what happens if input is power 2). > > -- > > The patch can cause subtle differences in some caller code, I think, if > input value is a power of 2 already. See e.g: > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/libadt/dict.cpp.udiff.html > > - ?i=16; > - ?while( i < size ) i <<= 1; > + ?i = MAX2(16, (int)next_power_of_two(size)); > > If i == size == 16, old code would keep i==16, new code would come to > i==32, I think. > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/opto/phaseX.cpp.udiff.html > > ?//------------------------------round_up--------------------------------------- > ?// Round up to nearest power of 2 > -uint NodeHash::round_up( uint x ) { > - ?x += (x>>2); ? ? ? ? ? ? ? ? ?// Add 25% slop > - ?if( x <16 ) return 16; ? ? ? ?// Small stuff > - ?uint i=16; > - ?while( i < x ) i <<= 1; ? ? ? // Double to fit > - ?return i; ? ? ? ? ? ? ? ? ? ? // Return hash table size > +uint NodeHash::round_up(uint x) { > + ?x += (x >> 2); ? ? ? ? ? ? ? ? ?// Add 25% slop > + ?return MAX2(16U, next_power_of_two(x)); > ?} > > same here. If x == 16, before we'd return 16, now 32. > > --- > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/runtime/threadSMR.cpp.udiff.html > > I admit I do not understand the current coding :) I do not believe it > works for all input values, e.g. were > get_java_thread_list()->length()==1025, we'd get 1861 - if I am not > mistaken. Your code is definitely clearer but not equivalent to the old one. FTR, the algorithm used is described here: http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2 (1025 should round up to 2048) Thanks! /Claes > > --- > > In the end, I wonder whether we should have two kind of APIs, or a > parameter, distinguishing between "next power of 2" and "next power of 2 > unless input value is already power of 2". > > Cheers, Thomas > > > > > > On Tue, Nov 26, 2019 at 10:42 AM Claes Redestad > > wrote: > > Hi, > > in various places in the hotspot we have custom code to calculate the > next power of two, some of which have potential to go into an infinite > loop in case of an overflow. > > This patch proposes adding next_power_of_two utility methods which > avoid infinite loops on overflow, while providing slightly more > efficient code in most cases. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234331 > Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/ > > Testing: tier1-3 > > Thanks! 
> > /Claes > From igor.ignatyev at oracle.com Wed Dec 4 01:49:02 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 3 Dec 2019 17:49:02 -0800 Subject: RFR(S) : 8129092 : compiler/intrinsics/classcast/NullCheckDroppingsTest.java testVarClassCast() can fail Message-ID: <6F72710B-8DA3-4733-A5CC-E3CB60005D4A@oracle.com> http://cr.openjdk.java.net/~iignatyev//8129092/webrev.01 > 69 lines changed: 38 ins; 2 del; 29 mod; Hi all, could you please review the small patch which makes the test more robust by using newly introduced JFR Deoptimization event? the test used to use WhiteBox.isMethodCompiled to check if there was deoptimization, which, in case uncommon trap's action is none or maybe_recompile (which has been seen), is incorrect. the patch replaces WhiteBox method call w/ asserting JFR events, and check not only that deoptimization happens, but also check its reason is null_check. testing that, I noticed that 'testVarClassCast' doesn't hit uncommon trap due to null_check, but because of unstable_if in the ternary operator at L#180, so I modified the test to pass an instance of Class, which required some small changes. JBS: https://bugs.openjdk.java.net/browse/JDK-8129092 webrev: http://cr.openjdk.java.net/~iignatyev//8129092/webrev.01 testing: compiler/intrinsics/classcast/NullCheckDroppingsTest.java Thanks, -- Igor From john.r.rose at oracle.com Wed Dec 4 02:28:17 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 3 Dec 2019 18:28:17 -0800 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: <3A61B685-1311-48D2-92F8-0ABE95F9EFD5@oracle.com> On Dec 3, 2019, at 3:38 PM, Claes Redestad wrote: > > http://cr.openjdk.java.net/~redestad/8234331/open.03 Nice work! A few minor complaints follow: I?m not totally comfortable with the proliferation of tiny header files. Are we crashing from one extreme (globalDefinitions.hpp) toward the other (usefulZeroConstant.hpp)? So having count_leading_zeros.hpp is arguably a correction, but then adding powerOfTwo.hpp feels like an over-correction. To be followed by more over-corrections: count_trailing_zeroes.hpp, populationCount.hpp, and so on. You won?t be surprised to hear that I think this would be excessive splitting, and that a lumpier outcome would feel better to me, as a successor to the chaos globalDefinitions.hpp. I propose putting anything involving integral bitmask operations into intBitmasks.hpp, including today?s count_trailing_zeroes.hpp and all the new stuff that will be like it. Start by renaming count_trailing_zeroes.hpp and rolling the new stuff into it. I can wait for this, but I?m going to grumble some more if we keep going down this road to TinyHeaderVille. This particular code : + if (high != 0) { + _BitScanReverse(&index, x); + } else { + _BitScanReverse(&index, x); + index += 32; + } is clearly untested, since it?s wrong. The `index+=32` belongs on the other branch. (Or is it me?) The root cause of this is the needless obfuscation of the `index ^= 31` and `index ^= 63`. Really?? Just say `31 - index` and `63 - index`, and the arithmetic reasoning will take care of itself. In the new code, `assert(lz > ?)` and `assert(lz < ?)` statements are in an unpredictable relative order, making it harder to compare and contrast the four versions. Suggest putting the < before the > case more regularly. To `next_power_of_2` I suggest adding an assert that the increment will not overflow. 
The asserts won't catch `next_power_of_2(max_jint)`, not for sure. ? John B. Twiddler From vladimir.kozlov at oracle.com Wed Dec 4 02:41:23 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 3 Dec 2019 18:41:23 -0800 Subject: RFR(S) : 8129092 : compiler/intrinsics/classcast/NullCheckDroppingsTest.java testVarClassCast() can fail In-Reply-To: <6F72710B-8DA3-4733-A5CC-E3CB60005D4A@oracle.com> References: <6F72710B-8DA3-4733-A5CC-E3CB60005D4A@oracle.com> Message-ID: LGTM Thanks, Vladimir K On 12/3/19 5:49 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8129092/webrev.01 >> 69 lines changed: 38 ins; 2 del; 29 mod; > > Hi all, > > could you please review the small patch which makes the test more robust by using newly introduced JFR Deoptimization event? > > the test used to use WhiteBox.isMethodCompiled to check if there was deoptimization, which, in case uncommon trap's action is none or maybe_recompile (which has been seen), is incorrect. the patch replaces WhiteBox method call w/ asserting JFR events, and check not only that deoptimization happens, but also check its reason is null_check. testing that, I noticed that 'testVarClassCast' doesn't hit uncommon trap due to null_check, but because of unstable_if in the ternary operator at L#180, so I modified the test to pass an instance of Class, which required some small changes. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8129092 > webrev: http://cr.openjdk.java.net/~iignatyev//8129092/webrev.01 > testing: compiler/intrinsics/classcast/NullCheckDroppingsTest.java > > Thanks, > -- Igor > From igor.ignatyev at oracle.com Wed Dec 4 05:09:24 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 3 Dec 2019 21:09:24 -0800 Subject: RFR(S) : 8129092 : compiler/intrinsics/classcast/NullCheckDroppingsTest.java testVarClassCast() can fail In-Reply-To: References: <6F72710B-8DA3-4733-A5CC-E3CB60005D4A@oracle.com> Message-ID: <69FA66AF-51FE-4235-BB8B-773EB884E295@oracle.com> thanks Vladimir, pushed. -- Igor > On Dec 3, 2019, at 6:41 PM, Vladimir Kozlov wrote: > > LGTM > > Thanks, > Vladimir K > > On 12/3/19 5:49 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8129092/webrev.01 >>> 69 lines changed: 38 ins; 2 del; 29 mod; >> Hi all, >> could you please review the small patch which makes the test more robust by using newly introduced JFR Deoptimization event? >> the test used to use WhiteBox.isMethodCompiled to check if there was deoptimization, which, in case uncommon trap's action is none or maybe_recompile (which has been seen), is incorrect. the patch replaces WhiteBox method call w/ asserting JFR events, and check not only that deoptimization happens, but also check its reason is null_check. testing that, I noticed that 'testVarClassCast' doesn't hit uncommon trap due to null_check, but because of unstable_if in the ternary operator at L#180, so I modified the test to pass an instance of Class, which required some small changes. 
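As a rough illustration of the approach described above (asserting JFR events instead of WhiteBox state), recording and checking a deoptimization could look something like the sketch below. The event name "jdk.Deoptimization" and the "reason" field are assumptions based on this thread's event request, and the actual test uses the jtreg/JFR test-library helpers rather than this hand-rolled check:

import java.nio.file.Path;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class DeoptEventSketch {
    public static void main(String[] args) throws Exception {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.Deoptimization"); // assumed event name
            recording.start();
            // ... run the code that is expected to hit the uncommon trap ...
            recording.stop();
            Path dump = Path.of("deopt.jfr");
            recording.dump(dump);

            boolean sawNullCheck = false;
            for (RecordedEvent event : RecordingFile.readAllEvents(dump)) {
                if (event.getEventType().getName().equals("jdk.Deoptimization")
                        && event.hasField("reason") // field name is an assumption
                        && "null_check".equals(event.getString("reason"))) {
                    sawNullCheck = true;
                }
            }
            if (!sawNullCheck) {
                throw new AssertionError("expected a deoptimization event with reason null_check");
            }
        }
    }
}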
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8129092 >> webrev: http://cr.openjdk.java.net/~iignatyev//8129092/webrev.01 >> testing: compiler/intrinsics/classcast/NullCheckDroppingsTest.java >> Thanks, >> -- Igor From tobias.hartmann at oracle.com Wed Dec 4 05:59:19 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 4 Dec 2019 06:59:19 +0100 Subject: [14] RFR(S): 8234616: assert(0 <= i && i < _len) failed: illegal index in PhaseMacroExpand::expand_macro_nodes() In-Reply-To: References: <0d6221a1-23c2-3e22-c4ab-a182696f4275@oracle.com> <1022059a-c624-b184-e69a-5bf06d6eac50@oracle.com> Message-ID: <712012d5-9ec8-ca3f-ff2d-801c15fb2851@oracle.com> Thanks Vladimir! Best regards, Tobias On 03.12.19 19:18, Vladimir Kozlov wrote: > +1 > > Vladimir K > > On 12/3/19 7:16 AM, Vladimir Ivanov wrote: >> >>> http://cr.openjdk.java.net/~thartmann/8234616/webrev.00/ >> >> Looks good. >> >> Best regards, >> Vladimir Ivanov >> >>> >>> We assert in PhaseMacroExpand::expand_macro_nodes() when accessing the macro_node array because >>> 'macro_idx' is out of the upper bound: >>> https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2591 >>> >>> The problem is that in the previous iteration, we hit the following code path which removes an >>> unreachable macro node but does not decrement 'macro_idx': >>> https://hg.openjdk.java.net/jdk/jdk/file/43c4fb8ba96b/src/hotspot/share/opto/macro.cpp#l2594 >>> >>> The problem was introduced in JDK 14 by the fix for JDK-8227384: >>> https://hg.openjdk.java.net/jdk/jdk/rev/43c4fb8ba96b#l3.25 >>> >>> I've changed the loop to a for-loop to make sure the index is always decremented and also >>> strengthened the asserts. I was not able to create a regression test but the issue reproduces >>> intermittently with Lucene. >>> >>> Thanks, >>> Tobias >>> From Pengfei.Li at arm.com Wed Dec 4 06:25:30 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Wed, 4 Dec 2019 06:25:30 +0000 Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 In-Reply-To: References: <0b33fa95-ff15-b628-1891-f990f239e60f@oracle.com> Message-ID: Thanks Tobias. Could anyone help push this? It's now reviewed by adinn, aph and thartmann. > Hi, > > this looks good to me. > > Best regards, > Tobias > > On 30.11.19 02:02, Ioi Lam wrote: > > Hi Pengfei, > > > > I have cc-ed hotspot-compiler-dev at openjdk.java.net. > > > > Please do not push the patch until someone from hotspot-compiler-dev > has looked at it. > > > > Many people are away due to Thanksgiving in the US. > > > > Thanks > > - Ioi > > > > On 11/28/19 7:56 PM, Pengfei Li (Arm Technology China) wrote: > >> Hi, > >> > >> Please help review this small fix for 64-bit client build. > >> > >> Webrev: http://cr.openjdk.java.net/~pli/rfr/8234791/webrev.00/ > >> JBS: https://bugs.openjdk.java.net/browse/JDK-8234791 > >> > >> Current 64-bit client VM build fails because errors occurred in > >> dumping the CDS archive. In JDK 12, we enabled "Default CDS > >> Archives"[1] which runs "java -Xshare:dump" after linking the JDK > >> image. But for Client VM build on 64-bit platforms, the ergonomic > >> flag UseCompressedOops is not set.[2] This leads to VM exits in > >> checking the flags for dumping the shared archive.[3] > >> > >> This change removes the "#if defined" macro to make shared archive > >> dump successful in 64-bit client build. 
By tracking the history of > >> the macro, I found it is initially added as "#ifndef COMPILER1"[4] 10 > >> years ago when C1 did not have a good support of compressed oops and > >> modified to current shape[5] in the implementation of tiered > >> compilation. It should be safe to be removed today. > >> > >> This patch also fixes another client build issue on AArch64. > >> > >> [1] http://openjdk.java.net/jeps/341 > >> [2] > >> http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/shar > >> e/runtime/arguments.cpp#l1694 > >> [3] > >> http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/shar > >> e/runtime/arguments.cpp#l3551 [4] > >> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/323bd24c6520#l11.7 > >> [5] > >> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/d5d065957597#l86.56 -- Thanks, Pengfei From tobias.hartmann at oracle.com Wed Dec 4 07:09:45 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 4 Dec 2019 08:09:45 +0100 Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64 In-Reply-To: References: <0b33fa95-ff15-b628-1891-f990f239e60f@oracle.com> Message-ID: <8315aeca-f0cb-b41e-782a-28e545772d09@oracle.com> On 04.12.19 07:25, Pengfei Li (Arm Technology China) wrote: > Thanks Tobias. Could anyone help push this? It's now reviewed by adinn, aph and thartmann. Sure, pushed. Best regards, Tobias From xxinliu at amazon.com Wed Dec 4 07:23:06 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Wed, 4 Dec 2019 07:23:06 +0000 Subject: RFR(XS): 8234977: [Aarch64] C1 lacks a membar store in slowpath of allocate_array In-Reply-To: References: Message-ID: Hi, Andrew and Martin, The slowpath I refer to is the JRT new_type_array. https://hg.openjdk.java.net/jdk/jdk/file/a1802614d6fe/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp#l855 Martin is right. the JRT function is actually guarded by a mfence. Thank you to point it out! My previous understanding was wrong. I acknowledge that the extra "membar(StoreStore)" is unnecessary when the control flow returns from the slow path. I withdraw my webrev. My fix can solve our problem on jdk8u and jdk11, but it is too lame. I found that the root cause is JDK-8233839, so I marked my bug 'duplication'. I will propose to backport JDK-8233839 to jdk8u-aarch64. On the other side, can I say C2 has a redundant membar(StoreStore). Is it worth optimizing it? The branch at 0x0000ffff8056bef0 of https://bugs.openjdk.java.net/secure/attachment/85882/hotspot_pid4218.c2.log is the return from slow-path Thanks, --lx ?On 12/3/19, 8:18 AM, "aph at redhat.com" wrote: On 12/3/19 6:00 AM, Liu, Xin wrote: > C1 misses a member for the slow path of alloc_array or alloc_obj on aarch64. I don't understand this. In the slow path, the MemAllocator::finish should insert the necessary membar. Can you show us the slow path which does not do the membar? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From christoph.goettschkes at microdoc.com Wed Dec 4 09:49:55 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Wed, 4 Dec 2019 10:49:55 +0100 Subject: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for client VMs due to Unrecognized VM option LoopUnrollLimit In-Reply-To: References: <20191127125735.B9BE111F377@aojmv0009> <20191128124549.BCDF0C6F24@aojmv0009> <20191203122444.7D54F13FFFC@aojmv0009> Message-ID: Hi Vladimir, thanks for the explanation and for pointing to the code. 
I integrated your suggestions into this new webrev: https://cr.openjdk.java.net/~cgo/8234906/webrev.02/ I already created the changeset with you as a reviewer, so if you are fine with this version, may I ask you to sponsor it and commit it into the repository for me? https://cr.openjdk.java.net/~cgo/8234906/webrev.02/jdk-jdk.changeset Thanks, Christoph Vladimir Kozlov wrote on 2019-12-03 19:33:52: > From: Vladimir Kozlov > To: christoph.goettschkes at microdoc.com > Cc: hotspot-compiler-dev at openjdk.java.net > Date: 2019-12-03 19:34 > Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > client VMs due to Unrecognized VM option LoopUnrollLimit > > You don't need to duplicate @bug (it was C2 bug anyway). And don't need > to check & !vm.graal.enabled for C2 case (see > explanation below) > > > Could you elaborate how the two flags are related? I though, if graal is > > used as a JIT, both `vm.graal.enabled` and `vm.compiler2.enabled` are set > > to true. Is that correct? I don't have a setup with graal, so I can not > > test this. > > When we enable Graal JIT it is used instead of C2. They are mutually exclusive. > Compiler.isC2Enabled() (which sets vm.compiler2.enabled [1]) returns > false when isGraalEnabled() returns true [2]. > > Regards, > Vladimir > > [1] http://hg.openjdk.java.net/jdk/jdk/file/138b0f3fe18c/test/jtreg- > ext/requires/VMProps.java#l117 > [2] http://hg.openjdk.java.net/jdk/jdk/file/138b0f3fe18c/test/lib/sun/ > hotspot/code/Compiler.java#l80 > > > On 12/3/19 4:22 AM, christoph.goettschkes at microdoc.com wrote: > > Hi Vladimir, > > > > could you have a look at my updated webrev regarding this failing test? > > https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ > > > > See my inline comments in the mail below. > > > > Thanks, > > Christpoh > > > > "hotspot-compiler-dev" > > wrote on 2019-11-28 13:44:05: > > > >> From: christoph.goettschkes at microdoc.com > >> To: Vladimir Kozlov > >> Cc: hotspot-compiler-dev at openjdk.java.net > >> Date: 2019-11-28 13:46 > >> Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > >> client VMs due to Unrecognized VM option LoopUnrollLimit > >> Sent by: "hotspot-compiler-dev" > > > >> > >> Hi Vladimir, > >> > >> Vladimir Kozlov wrote on 2019-11-27 > > 20:54:02: > >> > >>> From: Vladimir Kozlov > >>> To: christoph.goettschkes at microdoc.com, > >> hotspot-compiler-dev at openjdk.java.net > >>> Date: 2019-11-27 20:54 > >>> Subject: Re: RFR: 8234906: [TESTBUG] TestDivZeroCheckControl fails for > > > >>> client VMs due to Unrecognized VM option LoopUnrollLimit > >>> > >>> Hi Christoph > >>> > >>> I was about suggest IgnoreUnrecognizedVMOptions flag but remembered > >>> discussion about 8231954 fix. > >> > >> Yes, I try to avoid "IgnoreUnrecognizedVMOptions" because of our > > previous > >> discussion. I also think that it doesn't make sense to execute tests in > > VM > >> configurations for which they are not written for. Most of the compiler > >> tests simply have "IgnoreUnrecognizedVMOptions" and probably waste a > > good > >> amount of time in certain VM configurations. > >> > >>> But I think the test should be run with Graal - it does have OSR > >>> compilation and we need to test it. > >> > >> Sure. I disabled it, because I thought that the flag "LoopUnrollLimit" > > is > >> required to trigger the faulty behavior, but I don't know much about > >> optimization in the graal JIT. 
> >> > >>> > >>> We can do it by splitting test runs (duplicate @test block with > >>> different run flags) to have 2 tests with different > >>> flags and conditions. See [1]. > >>> > >>> For existing @run block we use `@requires vm.compiler2.enabled` and > > for > >>> new without LoopUnrollLimit - `vm.graal.enabled`. > >> > >> I did the following: > >> > >> https://cr.openjdk.java.net/~cgo/8234906/webrev.01/ > >> > >> Could you elaborate how the two flags are related? I though, if graal is > > > >> used as a JIT, both `vm.graal.enabled` and `vm.compiler2.enabled` are > > set > >> to true. Is that correct? I don't have a setup with graal, so I can not > >> test this. > >> > >> Thanks, > >> Christoph > >> > > > From claes.redestad at oracle.com Wed Dec 4 10:18:26 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 4 Dec 2019 11:18:26 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: <3A61B685-1311-48D2-92F8-0ABE95F9EFD5@oracle.com> References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <3A61B685-1311-48D2-92F8-0ABE95F9EFD5@oracle.com> Message-ID: <3b9b77dc-7a2d-8365-e060-30d13ba355a9@oracle.com> On 2019-12-04 03:28, John Rose wrote: > On Dec 3, 2019, at 3:38 PM, Claes Redestad > wrote: >> >> http://cr.openjdk.java.net/~redestad/8234331/open.03 > > Nice work! Thanks! > > A few minor complaints follow: > > I?m not totally comfortable with the proliferation of tiny header files. > Are we crashing from one extreme (globalDefinitions.hpp) toward the > other (usefulZeroConstant.hpp)? ?So having?count_leading_zeros.hpp > is arguably a correction, but then adding?powerOfTwo.hpp feels like > an over-correction. ?To be followed by more over-corrections: > count_trailing_zeroes.hpp, populationCount.hpp, and so on. > > You won?t be surprised to hear that I think this would be excessive > splitting, and that a lumpier outcome would feel better to me, as a > successor to the chaos globalDefinitions.hpp. ?I propose putting > anything involving integral bitmask operations into intBitmasks.hpp, > including today?s count_trailing_zeroes.hpp and all the new stuff > that will be like it. ?Start by renaming count_trailing_zeroes.hpp and > rolling the new stuff into it. ?I can wait for this, but I?m going to > grumble > some more if we keep going down this road to TinyHeaderVille. I hear microservices is all the rage these days. :-) I can agree that one-header-per-logical-function would become excessive, but I think organizing functions into more logical units needs to be done iteratively as patterns emerge. > > This particular code : > + ? ?if (high != 0) { > + ? ? ?_BitScanReverse(&index, x); > + ? ?} else { > + ? ? ?_BitScanReverse(&index, x); > + ? ? ?index += 32; > + ? ?} > > is clearly untested, since it?s wrong. The `index+=32` belongs on the other > branch. ?(Or is it me?) You're both right and wrong: yes, this block is untested (we don't have 32-bit Windows builds in our test system). And the code is/was wrong, since the statement in the first branch should read: _BitScanReverse(&index, high); // high, not x However, the += 32 is in the right place: when the upper 32-bit word of a 64-bit is non-zero, the number of leading zeros is in the [0-31] range, but if it's in the lower 32-bit word the number of leading zeros is in the [32-63] range. _BitScanReverse only operates on a 32-bit int. > The root cause of this is the needless obfuscation > of the `index ^= 31` and `index ^= 63`. ?Really?? 
Just say `31 - index` and > `63 - index`, and the arithmetic reasoning will take care of itself. I agree this is a bit obscure (a habit I probably picked up from Hacker's Delight and other sources), and will go with 31u|63u - index. > > In the new code, `assert(lz > ...)` and `assert(lz < ...)` statements are in an > unpredictable relative order, making it harder to compare and contrast the > four versions. Suggest putting the < before the > case more regularly. Fixed. > > To `next_power_of_2` I suggest adding an assert that the increment > will not overflow. The asserts won't catch `next_power_of_2(max_jint)`, > not for sure. Yes, adding a check for the max value should be sufficient, since other overflows are asserted against in round_up: assert(value != std::numeric_limits<T>::max(), "Overflow"); Updating in place and rerunning gtests on all platforms. Thanks! /Claes > > -- John B. Twiddler > From vladimir.x.ivanov at oracle.com Wed Dec 4 12:40:21 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 4 Dec 2019 15:40:21 +0300 Subject: RFR(XXS): 8235288 :AVX 512 instructions inadvertently used on Xeon for small vector width operations In-Reply-To: References: Message-ID: > Webrev: http://cr.openjdk.java.net/~sviswanathan/8235288/webrev.00/ Looks good. Best regards, Vladimir Ivanov From thomas.stuefe at gmail.com Wed Dec 4 16:32:44 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 4 Dec 2019 17:32:44 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: Hi Claes, I think this is very good and useful. General remarks: - Pity that it only works for 32/64bit values. It would be neat to have variants for 8 and 16 bit values too. - +1 for a "lumpier" header. - the round_down() variants: would it make sense to allow 0 as a valid input? I can imagine scenarios where 0 could be a valid input and result for this operation. - I tested on AIX, builds and gtests run through without errors. ---- http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/runtime/threadSMR.cpp.udiff.html + int hash_table_size = round_up_power_of_2(MIN2((int)get_java_thread_list()->length(), 32) << 1); Style nit, I would have preferred this to be two operations for readability. http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/powerOfTwo.hpp.html 43 inline typename EnableIf<IsSigned<T>::value, T>::type round_down_power_of_2(T value) { ... 47 assert(lz < sizeof(T) * BitsPerByte, "Sanity"); Do we need this if we check the input value for >0 ? (Same question for the signed version of round_up() below). 48 assert(lz > 0, "Overflow to negative"); Same here? Also, small nit, the assert text is slightly off since no overflow happened yet, but the input value is negative. (The asserts do not bother me much, I am just trying to understand why we need them) - 100 // Accepts 0 (returns 1); overflows if value is larger than or equal to 2^31 101 // or 2^63, for 32- and 64-bit integers, respectively Comment is a bit off. True only for unsigned values. Also it should mention that overflow results in an assert, to be in sync with the other comments. ---- http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/count_leading_zeros.hpp.frames.html template <typename T> inline uint32_t count_leading_zeros(T x) { Instead of all the "if sizeof(T) == 4|8", could we specialize the functions for different operand sizes?
Example:

template <typename T, size_t S> struct Impl;
template <typename T> struct Impl<T, 8> { static int doit(T v) { return __builtin_clzll(v); } };
template <typename T> struct Impl<T, 4> { static int doit(T v) { return __builtin_clz(v); } };

template <typename T> int count_leading_zero(T v) { return Impl<T, sizeof(T)>::doit(v); }

This would make it easier later to plug in implementations for different sizes, and we'd still get a compile time error for invalid input types. - I wondered why count_leading_zeros did not allow 0 as input until I saw that __builtin_clz(ll) unfolds to bsr; and that does not work for input=0. I still find it an artificial limitation but that was there before your patch. ---- Thank you for the enhanced comment in the fallback implementation, that's helpful. ---- http://cr.openjdk.java.net/~redestad/8234331/open.03/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html It bothers me a bit that we do not test invalid input and overflow. There is a way, I believe, to have gtests expect an assert. We have TEST_VM_ASSERT but I am not sure if that is a good idea since crashing takes time. I also see that we only use it at a single place, so I guess others share the same concern :) Alternatively, one could have variants of round_up/down which return a definite value on overflow. I wonder whether this may be useful someplace outside the tests. - Just a thought, but could you cut down the code for the many TEST functions if you would use a templatized test function? Basically, something like

template <typename T> struct MyMaxImpl;
template <> struct MyMaxImpl<int32_t> { static int32_t get() { return int32_max_pow2; } };
.. etc

template <typename T> void round_up_test() {
  EXPECT_EQ(round_up_power_of_2((T)1), (T)1) << "value = " << 1;
  ...
  const T max = MyMaxImpl<T>::get();
  EXPECT_EQ(round_up_power_of_2(max - 1), max)
  ...
}

TEST(power_of_2, round_up_power_of_2_signed_32) { round_up_test<int32_t>(); }

That way the test would also be easily extendable later if we decide to add support for uint8_t or uint16_t. Thanks, Thomas On Wed, Dec 4, 2019 at 12:36 AM Claes Redestad wrote: > Hi Thomas and others, > > thanks for the thorough feedback! > > I also had offline discussions about the existing utilities in > zUtils.inline.hpp with Per Liden, and decided to try and work this into > a templated solution that enables both 32-bit and 64-bit > implementations to work correctly - along with signed variants > (especially nice with types such as size_t). > > http://cr.openjdk.java.net/~redestad/8234331/open.03 > > This is a pretty significant rework. Changes since .01: > > - introduce utilities/powerOfTwo.hpp > - move round_up_* and round_down_* from zUtils, re-implement to use > the new implementation > - implement next_* as round_up_*(value + 1) as per John's suggestion. > - added tests > - corrected the issues Thomas, Ivan and others pointed out, mainly > ensuring we don't depend on undefined behavior in neither product > code, asserts nor tests > - .. and ensure to use next and round_up as appropriate to preserve > behavior > > Other notes: > > - Moving existing power-of-two functions like is_power_of_2 from > globalDefinitions.hpp to powerOfTwo.hpp would be straightforward, but > tedious. I'd like to defer this to a follow-up > - The xlc implementation is untested, but should work. If someone can > verify, I'd be much obliged.
> - Many thanks to Erik ?sterlund, who guided me through a maze of > undefined behavior and template metaprogramming to a workable and > relatively clean implementation > - HotSpot shrinks by ~15Kb :-) > > Testing: tier1-5 > > On 2019-11-28 08:34, Thomas St?fe wrote: > > Hi Claes, > > > > I think this is useful. Why not a 64bit variant too? If you do not want > > to go through the hassle of providing a count_leading_zeros(uint64_t), > > you could call the 32bit variant twice and take care of endianness for > > the caller. > > > > -- > > > > In inline int32_t next_power_of_two(int32_t value) , should we weed out > > negative input values right away instead of asserting at the end of the > > function? > > > > -- > > > > The functions will always return the next power of two, even if the > > input is a power of two - e.g. "2" for "1". Is that intended? It would > > be nice to have an API comment in the header describing these corner > > cases (what happens for negative input, what happens if input is power > 2). > > > > -- > > > > The patch can cause subtle differences in some caller code, I think, if > > input value is a power of 2 already. See e.g: > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/libadt/dict.cpp.udiff.html > > > > - i=16; > > - while( i < size ) i <<= 1; > > + i = MAX2(16, (int)next_power_of_two(size)); > > > > If i == size == 16, old code would keep i==16, new code would come to > > i==32, I think. > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/opto/phaseX.cpp.udiff.html > > > > > //------------------------------round_up--------------------------------------- > > // Round up to nearest power of 2 > > -uint NodeHash::round_up( uint x ) { > > - x += (x>>2); // Add 25% slop > > - if( x <16 ) return 16; // Small stuff > > - uint i=16; > > - while( i < x ) i <<= 1; // Double to fit > > - return i; // Return hash table size > > +uint NodeHash::round_up(uint x) { > > + x += (x >> 2); // Add 25% slop > > + return MAX2(16U, next_power_of_two(x)); > > } > > > > same here. If x == 16, before we'd return 16, now 32. > > > > --- > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/runtime/threadSMR.cpp.udiff.html > > > > I admit I do not understand the current coding :) I do not believe it > > works for all input values, e.g. were > > get_java_thread_list()->length()==1025, we'd get 1861 - if I am not > > mistaken. Your code is definitely clearer but not equivalent to the old > one. > > FTR, the algorithm used is described here: > > http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2 > > (1025 should round up to 2048) > > Thanks! > > /Claes > > > > > --- > > > > In the end, I wonder whether we should have two kind of APIs, or a > > parameter, distinguishing between "next power of 2" and "next power of 2 > > unless input value is already power of 2". > > > > Cheers, Thomas > > > > > > > > > > > > On Tue, Nov 26, 2019 at 10:42 AM Claes Redestad > > > wrote: > > > > Hi, > > > > in various places in the hotspot we have custom code to calculate the > > next power of two, some of which have potential to go into an > infinite > > loop in case of an overflow. > > > > This patch proposes adding next_power_of_two utility methods which > > avoid infinite loops on overflow, while providing slightly more > > efficient code in most cases. 
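As background: the overflow hazard referred to above is the classic doubling loop. A minimal stand-alone sketch contrasting it with a clz-based round-up (illustrative only, not code from any of the webrevs in this thread; assumes GCC/Clang __builtin_clz):

#include <cstdint>

// Doubling-loop idiom: for x greater than 2^31, i eventually wraps to 0
// and the loop never terminates.
static uint32_t round_up_loop(uint32_t x) {
  uint32_t i = 1;
  while (i < x) i <<= 1;
  return i;
}

// clz-based idiom: constant time, and the overflow case (x > 2^31) can be
// rejected with an explicit assert by the caller instead of hanging.
static uint32_t round_up_clz(uint32_t x) {
  if (x <= 1) return 1;
  return 1u << (32 - __builtin_clz(x - 1));
}

// Example: round_up_loop(1000) == round_up_clz(1000) == 1024,
// and both return 1024 unchanged for an input of 1024.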
> > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234331 > > Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/ > > > > Testing: tier1-3 > > > > Thanks! > > > > /Claes > > > From sandhya.viswanathan at intel.com Wed Dec 4 17:01:52 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 4 Dec 2019 17:01:52 +0000 Subject: RFR(XXS): 8235288 :AVX 512 instructions inadvertently used on Xeon for small vector width operations In-Reply-To: <3C7A0114-AD7B-4032-8D50-0F2D31EB97BC@oracle.com> References: <3C7A0114-AD7B-4032-8D50-0F2D31EB97BC@oracle.com> Message-ID: Hi Vladimir, Could you please sponsor this patch? Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Tuesday, December 03, 2019 2:41 PM To: Viswanathan, Sandhya Cc: hotspot compiler Subject: Re: RFR(XXS): 8235288 :AVX 512 instructions inadvertently used on Xeon for small vector width operations Looks good. Thanks Vladimir > On Dec 3, 2019, at 1:33 PM, Viswanathan, Sandhya wrote: > > For vector replicate and reduction operations vinsert and vextract instructions are used. > When UseAVX level is set to 3, these instructions are unnecessarily encoded with 512-bit vector width. > Only for KNL platform which doesn't support AVX512 variable length encoding, the 512-bit wide instruction need to be used. > All other Xeon platforms should use the appropriate 256-bit wide vector instruction. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235288 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8235288/webrev.00/ > > Please review and approve. > > Best Regards, > Sandhya > From claes.redestad at oracle.com Wed Dec 4 17:37:05 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 4 Dec 2019 18:37:05 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> Message-ID: <39475089-9933-f85a-8927-661bba44780c@oracle.com> Hi Thomas, On 2019-12-04 17:32, Thomas St?fe wrote: > Hi Claes, > > I think this is very good and useful. > > General remarks: > > - Pity that it only works for 32/64bit values. It would be neat to have > variants for 8 and 16 bit values too. I don't think adding 8- and 16-bit variants would be too hard, but would need specialized adjustment since most platforms only have 32- and 64- bit intrinsics for clz. Do you know any code where we might need such specializations...? > > - +1 for a "lumpier" header. o_O > > - the round_down() variants: would it make sense to allow 0 as valid > input? I can imagine scenarios where 0 could be a valid input and result > for this operation. I've been contemplating if there's room for variants that are less strict, e.g., where you can define what values you should get on 0 or overflow. I think that's out of scope for this RFE. > > - I tested on AIX, builds and gtests run through without errors. Great, thanks! > > ---- > > > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/runtime/threadSMR.cpp.udiff.html > > + ?int hash_table_size = > round_up_power_of_2(MIN2((int)get_java_thread_list()->length(), 32) << 1); > > Style nit, I would have preferred this to be two operations for > readability. Sure, will fix. > > > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/powerOfTwo.hpp.html > > ? 43 inline typename EnableIf::value, T>::type > round_down_power_of_2(T value) { > ... > > ? 47 ? 
assert(lz < sizeof(T) * BitsPerByte, "Sanity"); > > Do we need this if we check the input value for >0 ? (Same question for > the signed version of round_up() below). Technically we shouldn't need any of the "Sanity" asserts that check that lz is in the expected range ([0-31]/[0-63]). I added them mostly as scaffolding when writing these functions, but kept them since I think they help reason about the code. I can remove them if you insist. > > ? 48 ? assert(lz > 0, "Overflow to negative"); > > Same here? Right, for round_down this one is pointless lz == 0 would only be possible if value > 0. I'll remove this one. For round_up we overflow when value > 2^30, so lz == 1, which is not covered by value > 0. So that assert I think is needed. > Also, small nit, the assert text is slightly off since no overflow > happened yet, but the input value is negative. Maybe just: assert(lz > 1, "Will overflow"); > > (The asserts do not bother me much, I am just trying to understand why > we need them) To catch obvious programming mistakes, catch more issues during testing, and reason about the code. > > - > > ?100 // Accepts 0 (returns 1); overflows if value is larger than or > ?equal to 2^31 > ?101 // or 2^63, for 32- and 64-bit integers, respectively > > Comment is a bit off. True only for unsigned values. Also it should > mention that overflow results in assert to be in sync with the other > comments. How about: // Accepts 0 (returns 1), overflows with assert if value is larger than // or equal to 2^31 (2^30 for signed) or 2^63 (2^62 if signed), for 32- // and 64-bit integers, respectively > > ---- > > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/count_leading_zeros.hpp.frames.html > > template inline uint32_t count_leading_zeros(T x) { > > Instead of all the "if sizeof(T) == 4|8", could we specialize the > functions for different operand sizes? Example: > > template struct Impl; > template struct Impl { static int doit(T v) { return > __builtin_clzll(v); } }; > template struct Impl { static int doit(T v) { return > __builtin_clz(v); } }; > template int count_leading_zero(T v) { return Impl sizeof(T)>::doit(v); } > > Would make it easier later to plug in implementations for different > sizes, and we'd still get a compile time error for invalid input types. I guess we could, but I'm not so sure it'd help readability. > > - > > I wondered why count_leading_zeros did not allow 0 as input until I saw > that __builtin_clz(ll) unfolds to bsr; and that does not work for > input=0. I still find it an artificial limitation but that was there > before your patch. Yes, it's a bit unfortunate and artificial, and I think we'd regress some actually performance-critical places by adding a special-case for 0. If we want graceful versions that give 0 and max on over- and underflow we can add them in separately, I think. > > ---- > > Thank you for the enhanced comment in the fallback implementation, thats > helpful. > > ---- > > http://cr.openjdk.java.net/~redestad/8234331/open.03/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html > > It bothers me a bit that we do not test invalid input and overflow. > There is a way, I believe, to have gtests expect an assert. We have > TEST_VM_ASSERT but I am not sure if that is a good idea since crashing > takes time. I also see that we only use it at a single place, so I guess > others share the same concern :) > > Alternatively, one could have variants of round_up/down which return a > definite value on overflow. 
I wonder whether this may be useful > someplace outside the tests. Yeah, a lighter mechanism for testing asserts would be great. I'd prefer deferring negative tests to a follow-up, though, since I have a few other things to attend to before 14 FC... :-) > > - > > Just a thought, but could you cut down the code for the many TEST > functions if you would use a templatized test function? Basically, > something like > > template struct MyMaxImpl; > template MyMaxImpl ? { static int32 get() { return > int32_max_pow2; } }; > .. etc > > template void round_up_test() { > ? EXPECT_EQ(round_up_power_of_2((T)1), (T)1) << "value = " << 1; > ... > ? ? const T max = MyMaxImpl::get(); > ? EXPECT_EQ(round_up_power_of_2(max - 1), max) > ... > } > > TEST(power_of_2, round_up_power_of_2_signed_32) { > ? round_up_do_test(); > } > > That way the test would also be easily extendable later if we decide to > add support for uint8_t or uint16_t. We might save a few lines of code, yes, but my preference is keeping tests explicit even if that makes them somewhat more repetitive - to a point. I can see if we were to add int8/int16 variants that this might move well beyond that point, though. Thanks! /Claes From claes.redestad at oracle.com Thu Dec 5 01:32:55 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 5 Dec 2019 02:32:55 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: <39475089-9933-f85a-8927-661bba44780c@oracle.com> References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: Hi again, On 2019-12-04 18:37, Claes Redestad wrote: >> >> >> http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/count_leading_zeros.hpp.frames.html >> >> >> template inline uint32_t count_leading_zeros(T x) { >> >> Instead of all the "if sizeof(T) == 4|8", could we specialize the >> functions for different operand sizes? Example: >> >> template struct Impl; >> template struct Impl { static int doit(T v) { >> return __builtin_clzll(v); } }; >> template struct Impl { static int doit(T v) { >> return __builtin_clz(v); } }; >> template int count_leading_zero(T v) { return Impl> sizeof(T)>::doit(v); } >> >> Would make it easier later to plug in implementations for different >> sizes, and we'd still get a compile time error for invalid input types. > > I guess we could, but I'm not so sure it'd help readability. after some considerations, I took this idea for a spin, along with implementations and tests for 8 and 16-bit for completeness up front: http://cr.openjdk.java.net/~redestad/8234331/open.04 I also turned tests into template functions etc. I think it turned out an improvement. Testing: tier1-3 (some ongoing) Thanks! /Claes From tobias.hartmann at oracle.com Thu Dec 5 08:56:57 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 5 Dec 2019 09:56:57 +0100 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> Message-ID: Hi Vladimir, this looks good to me. On 29.11.19 16:42, Vladimir Ivanov wrote: > It (almost completely) recovers performance on the microbenchmark provided in JDK-8224182 [1]. Is the remaining regression due to JDK-8226396? 
Best regards, Tobias From vladimir.x.ivanov at oracle.com Thu Dec 5 09:00:15 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 5 Dec 2019 12:00:15 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> Message-ID: <4c2c64dd-e339-1b86-d6ce-639cd9565308@oracle.com> Thanks, Tobias. >> It (almost completely) recovers performance on the microbenchmark provided in JDK-8224182 [1]. > > Is the remaining regression due to JDK-8226396? Yes. Best regards, Vladimir Ivanov From tobias.hartmann at oracle.com Thu Dec 5 09:03:57 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 5 Dec 2019 10:03:57 +0100 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <4c2c64dd-e339-1b86-d6ce-639cd9565308@oracle.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> <4c2c64dd-e339-1b86-d6ce-639cd9565308@oracle.com> Message-ID: <64515d64-b198-27f0-b1e0-5e87985cb954@oracle.com> On 05.12.19 10:00, Vladimir Ivanov wrote: >> Is the remaining regression due to JDK-8226396? > > Yes. Okay, thanks for confirming. Best regards, Tobias From vladimir.x.ivanov at oracle.com Thu Dec 5 09:53:28 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 5 Dec 2019 12:53:28 +0300 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information Message-ID: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> http://cr.openjdk.java.net/~vlivanov/8234392/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8234392 Make basic type available in Matcher::match_rule_supported_vector() and refactor Matcher::match_rule_supported_vector() on x86. It enables significant simplification of Matcher::match_rule_supported_vector()on x86 by using Matcher::vector_size_supported() to cover cases like AXV2 is required for 256-bit operaions on integral (see Matcher::vector_width_in_bytes() [1] for details). Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Testing: tier1-4 Best regards, Vladimir Ivanov [1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/x86.ad#l1442 From thomas.stuefe at gmail.com Thu Dec 5 10:25:38 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 5 Dec 2019 11:25:38 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: <39475089-9933-f85a-8927-661bba44780c@oracle.com> References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: Hi Claes, answers to your remarks inline. I will post the review for v0.4 separately. On Wed, Dec 4, 2019 at 6:36 PM Claes Redestad wrote: > Hi Thomas, > > On 2019-12-04 17:32, Thomas St?fe wrote: > > Hi Claes, > > > > I think this is very good and useful. > > > > General remarks: > > > > - Pity that it only works for 32/64bit values. It would be neat to have > > variants for 8 and 16 bit values too. > > I don't think adding 8- and 16-bit variants would be too hard, but would > need specialized adjustment since most platforms only have 32- and 64- > bit intrinsics for clz. Do you know any code where we might need such > specializations...? > > Not offhand, no. Just an urge to completeness. > > > > - +1 for a "lumpier" header. 
> > o_O > > Referring to Johns earlier mail about lumping the utility functions together in a single header instead of having micro headers for every function. > > > - the round_down() variants: would it make sense to allow 0 as valid > > input? I can imagine scenarios where 0 could be a valid input and result > > for this operation. > > I've been contemplating if there's room for variants that are less > strict, e.g., where you can define what values you should get on > 0 or overflow. I think that's out of scope for this RFE. > > > > > - I tested on AIX, builds and gtests run through without errors. > > Great, thanks! > > > > > ---- > > > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/runtime/threadSMR.cpp.udiff.html > > > > + int hash_table_size = > > round_up_power_of_2(MIN2((int)get_java_thread_list()->length(), 32) << > 1); > > > > Style nit, I would have preferred this to be two operations for > > readability. > > Sure, will fix. > > > > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/powerOfTwo.hpp.html > > > > 43 inline typename EnableIf::value, T>::type > > round_down_power_of_2(T value) { > > ... > > > > 47 assert(lz < sizeof(T) * BitsPerByte, "Sanity"); > > > > Do we need this if we check the input value for >0 ? (Same question for > > the signed version of round_up() below). > > Technically we shouldn't need any of the "Sanity" asserts that check > that lz is in the expected range ([0-31]/[0-63]). I added them mostly as > scaffolding when writing these functions, but kept them since I think > they help reason about the code. I can remove them if you insist. > > No, not necessary. Just wondered if I missed something. > > > > 48 assert(lz > 0, "Overflow to negative"); > > > > Same here? > > Right, for round_down this one is pointless lz == 0 would only be > possible if value > 0. I'll remove this one. > > For round_up we overflow when value > 2^30, so lz == 1, which is not > covered by value > 0. So that assert I think is needed. > > Oh, right. > > Also, small nit, the assert text is slightly off since no overflow > > happened yet, but the input value is negative. > > Maybe just: > > assert(lz > 1, "Will overflow"); > Thanks. > > > > > (The asserts do not bother me much, I am just trying to understand why > > we need them) > > To catch obvious programming mistakes, catch more issues during testing, > and reason about the code. > > > > > - > > > > 100 // Accepts 0 (returns 1); overflows if value is larger than or > > equal to 2^31 > > 101 // or 2^63, for 32- and 64-bit integers, respectively > > > > Comment is a bit off. True only for unsigned values. Also it should > > mention that overflow results in assert to be in sync with the other > > comments. > > How about: > > // Accepts 0 (returns 1), overflows with assert if value is larger than > // or equal to 2^31 (2^30 for signed) or 2^63 (2^62 if signed), for 32- > // and 64-bit integers, respectively > Sounds good thank you. > > > > > ---- > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/count_leading_zeros.hpp.frames.html > > > > template inline uint32_t count_leading_zeros(T x) { > > > > Instead of all the "if sizeof(T) == 4|8", could we specialize the > > functions for different operand sizes? 
Example: > > > > template struct Impl; > > template struct Impl { static int doit(T v) { return > > __builtin_clzll(v); } }; > > template struct Impl { static int doit(T v) { return > > __builtin_clz(v); } }; > > template int count_leading_zero(T v) { return Impl > sizeof(T)>::doit(v); } > > > > Would make it easier later to plug in implementations for different > > sizes, and we'd still get a compile time error for invalid input types. > > I guess we could, but I'm not so sure it'd help readability. > > > > > - > > > > I wondered why count_leading_zeros did not allow 0 as input until I saw > > that __builtin_clz(ll) unfolds to bsr; and that does not work for > > input=0. I still find it an artificial limitation but that was there > > before your patch. > > Yes, it's a bit unfortunate and artificial, and I think we'd regress > some actually performance-critical places by adding a special-case for > 0. If we want graceful versions that give 0 and max on over- and > underflow we can add them in separately, I think. > > > > > ---- > > > > Thank you for the enhanced comment in the fallback implementation, thats > > helpful. > > > > ---- > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.03/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html > > > > It bothers me a bit that we do not test invalid input and overflow. > > There is a way, I believe, to have gtests expect an assert. We have > > TEST_VM_ASSERT but I am not sure if that is a good idea since crashing > > takes time. I also see that we only use it at a single place, so I guess > > others share the same concern :) > > > > Alternatively, one could have variants of round_up/down which return a > > definite value on overflow. I wonder whether this may be useful > > someplace outside the tests. > > Yeah, a lighter mechanism for testing asserts would be great. I'd prefer > deferring negative tests to a follow-up, though, since I have a few > other things to attend to before 14 FC... :-) > > > > > - > > > > Just a thought, but could you cut down the code for the many TEST > > functions if you would use a templatized test function? Basically, > > something like > > > > template struct MyMaxImpl; > > template MyMaxImpl { static int32 get() { return > > int32_max_pow2; } }; > > .. etc > > > > template void round_up_test() { > > EXPECT_EQ(round_up_power_of_2((T)1), (T)1) << "value = " << 1; > > ... > > const T max = MyMaxImpl::get(); > > EXPECT_EQ(round_up_power_of_2(max - 1), max) > > ... > > } > > > > TEST(power_of_2, round_up_power_of_2_signed_32) { > > round_up_do_test(); > > } > > > > That way the test would also be easily extendable later if we decide to > > add support for uint8_t or uint16_t. > > We might save a few lines of code, yes, but my preference is keeping > tests explicit even if that makes them somewhat more repetitive - to a > point. I can see if we were to add int8/int16 variants that this might > move well beyond that point, though. > > Thanks! > > /Claes > Cheers, Thomas From thomas.stuefe at gmail.com Thu Dec 5 10:26:59 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 5 Dec 2019 11:26:59 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: Hi Claes, this looks nice, thank you for your work. Only minor stuff remains, mostly matters of taste. Feel free to ignore them. 
The patch is already fine as it is for me. With your latest patch, AIX still works. --- http://cr.openjdk.java.net/~redestad/8234331/open.04/src/hotspot/share/utilities/count_leading_zeros.hpp.udiff.html Taste matter: I find the return type for count_leading_zeros as uint32_t oddly specific. Instinctively I would have choosen int or unsigned. The gcc buildins all return int. --- Question, why do we treat signed numbers of 8 and 16 bits different wrt to values < 0? if you leave behavior as it is, could you please adjust the comment: // Return the number of leading zeros in x, e.g. the zero-based index // of the most significant set bit in x. Undefined for 0. + // For signed numbers whose size is < 32bit, function will + // return 0 for values < 0. ---- Taste matter: Windows, 64bit version for x86: You could call your own count_leading_zeros, instead of calling _BitScanReverse. ---- http://cr.openjdk.java.net/~redestad/8234331/open.04/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html In the test for next_up, can we have, for completeness sake, a test which tests that with any valid input pow2 the result is the next pow2? --- a/test/hotspot/gtest/utilities/test_powerOfTwo.cpp Thu Dec 05 02:28:58 2019 +0100 +++ b/test/hotspot/gtest/utilities/test_powerOfTwo.cpp Thu Dec 05 11:08:47 2019 +0100 @@ -139,6 +139,11 @@ EXPECT_EQ(pow2, next_power_of_2(pow2 - 1)) << "value = " << pow2 - 1; } + // next(pow2) should return pow2 * 2 + for (T pow2 = T(1); pow2 < t_max_pow2 / 2; pow2 = pow2 * 2) { + EXPECT_EQ(pow2 * 2, next_power_of_2(pow2)) + << "value = " << pow2; + } --- Thanks, Thomas On Thu, Dec 5, 2019 at 2:32 AM Claes Redestad wrote: > Hi again, > > On 2019-12-04 18:37, Claes Redestad wrote: > >> > >> > >> > http://cr.openjdk.java.net/~redestad/8234331/open.03/src/hotspot/share/utilities/count_leading_zeros.hpp.frames.html > >> > >> > >> template inline uint32_t count_leading_zeros(T x) { > >> > >> Instead of all the "if sizeof(T) == 4|8", could we specialize the > >> functions for different operand sizes? Example: > >> > >> template struct Impl; > >> template struct Impl { static int doit(T v) { > >> return __builtin_clzll(v); } }; > >> template struct Impl { static int doit(T v) { > >> return __builtin_clz(v); } }; > >> template int count_leading_zero(T v) { return Impl >> sizeof(T)>::doit(v); } > >> > >> Would make it easier later to plug in implementations for different > >> sizes, and we'd still get a compile time error for invalid input types. > > > > I guess we could, but I'm not so sure it'd help readability. > > after some considerations, I took this idea for a spin, along with > implementations and tests for 8 and 16-bit for completeness up > front: > > http://cr.openjdk.java.net/~redestad/8234331/open.04 > > I also turned tests into template functions etc. I think it turned > out an improvement. > > Testing: tier1-3 (some ongoing) > > Thanks! > > /Claes > From vladimir.x.ivanov at oracle.com Thu Dec 5 12:01:21 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 5 Dec 2019 15:01:21 +0300 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations Message-ID: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all https://bugs.openjdk.java.net/browse/JDK-8235405 Reduce the number of AD instructions needed to implement vector operations by merging existing ones. 
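To make the shape of the change easier to picture, here is a plain C++ sketch of the general idea -- one handler deriving the encoding width from the vector length instead of one nearly identical rule per length. The names and the helper below are made up for illustration and are not taken from x86.ad or the webrev:

#include <cstdio>

// Map a vector length in bytes to an encoding width selector
// (0 = 128-bit, 1 = 256-bit, 2 = 512-bit); purely illustrative.
static int avx_vector_len(int length_in_bytes) {
  if (length_in_bytes <= 16) return 0;
  if (length_in_bytes <= 32) return 1;
  return 2;
}

// One emitter covering all supported lengths, instead of separate
// per-size variants that differ only in the width they request.
static void emit_vand(int length_in_bytes) {
  printf("vpand, vector_len=%d\n", avx_vector_len(length_in_bytes));
}

int main() {
  emit_vand(8);
  emit_vand(32);
  emit_vand(64);
  return 0;
}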
The patch covers the following operations: - LoadVector - StoreVector - RoundDoubleModeV - AndV - OrV - XorV - MulAddVS2VI - PopCountVI Indiviual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/individual As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad will be posted for review separately (7 more patches pending). Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html From vladimir.x.ivanov at oracle.com Thu Dec 5 12:43:39 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 5 Dec 2019 15:43:39 +0300 Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in Thread::currentThread() intrinsic Message-ID: http://cr.openjdk.java.net/~vlivanov/8235143/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8235143 Thread::currentThread() intrinsic doesn't need memory state: though multiple threads can execute same code, "current thread" can't change in the context of a single method activation. So, once it is observed, it's safe to share among all users. One of the use cases which benefit a lot from such optimization is ownership checks for thread confined resources (fast path check for owner thread to avoid heavy-weight synchronization). The patch was part of foreign-memaccess branch in Project Panama and showed good performance results on Memory Access API implementation [1]. Testing: tier1-4 PS: the optimization should be disabled in Project Loom: the assumption doesn't hold for continuations (in their current form). Best regards, Vladimir Ivanov [1] https://openjdk.java.net/jeps/370 JEP 370: Foreign-Memory Access API (Incubator) From christian.hagedorn at oracle.com Thu Dec 5 13:22:16 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 5 Dec 2019 14:22:16 +0100 Subject: [14] RFR(S): 8229994: assert(false) failed: Bad graph detected in get_early_ctrl_for_expensive Message-ID: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8229994 http://cr.openjdk.java.net/~chagedorn/8229994/webrev.00/ As a part of loop peeling, loop-invariant dominating tests are moved out of the loop [1]. Doing so requires to change all control inputs coming from an independent test 'test' to data nodes inside the loop body to use the peeled version 'peeled_test' of 'test' instead [2]: the control test->data will be changed to peeled_test->data. If a data node is expensive then an assertion [3] checks the dominance relation of 'test' and 'peeled_test' ('peeled_test' should dominate 'test') but fails to find 'peeled_test' as part of the idom chain starting from the previous control input 'test' of that expensive node. The reason is that the idom information was not correctly set before at [4] after creating the peeled nodes. If 'head' (a CountedLoop node) has another OuterStripMinedLoop node, let's say 'outer_head', as an input then 'outer_head' is set as idom of 'head' instead of setting the idom of 'outer_head' to the new loop entry from the peeled iteration. 
We then miss all the idom information of the peeled iteration and the idom of 'outer_head' still points to the old loop entry node before peeling. The fix is straight forward to also account for loop strip mined loops when correcting the idom to the new loop entry from the peeled iteration at [4]. Thank you! Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l669 [2] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopopts.cpp#l271 [3] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopnode.cpp#l153 [4] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l658 From vladimir.x.ivanov at oracle.com Thu Dec 5 13:31:23 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 5 Dec 2019 16:31:23 +0300 Subject: [14] RFR(S): 8229994: assert(false) failed: Bad graph detected in get_early_ctrl_for_expensive In-Reply-To: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> References: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> Message-ID: <16ea8963-0403-0365-d105-b7e6bf7b90a6@oracle.com> Looks good. Best regards, Vladimir Ivanov On 05.12.2019 16:22, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8229994 > http://cr.openjdk.java.net/~chagedorn/8229994/webrev.00/ > > As a part of loop peeling, loop-invariant dominating tests are moved out > of the loop [1]. Doing so requires to change all control inputs coming > from an independent test 'test' to data nodes inside the loop body to > use the peeled version 'peeled_test' of 'test' instead [2]: the control > test->data will be changed to peeled_test->data. If a data node is > expensive then an assertion [3] checks the dominance relation of 'test' > and 'peeled_test' ('peeled_test' should dominate 'test') but fails to > find 'peeled_test' as part of the idom chain starting from the previous > control input 'test' of that expensive node. > > The reason is that the idom information was not correctly set before at > [4] after creating the peeled nodes. If 'head' (a CountedLoop node) has > another OuterStripMinedLoop node, let's say 'outer_head', as an input > then 'outer_head' is set as idom of 'head' instead of setting the idom > of 'outer_head' to the new loop entry from the peeled iteration. We then > miss all the idom information of the peeled iteration and the idom of > 'outer_head' still points to the old loop entry node before peeling. > > The fix is straight forward to also account for loop strip mined loops > when correcting the idom to the new loop entry from the peeled iteration > at [4]. > > Thank you! 
> > Best regards, > Christian > > > [1] > http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l669 > > [2] > http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopopts.cpp#l271 > > [3] > http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopnode.cpp#l153 > > [4] > http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l658 > From christian.hagedorn at oracle.com Thu Dec 5 13:39:56 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 5 Dec 2019 14:39:56 +0100 Subject: [14] RFR(S): 8229994: assert(false) failed: Bad graph detected in get_early_ctrl_for_expensive In-Reply-To: <16ea8963-0403-0365-d105-b7e6bf7b90a6@oracle.com> References: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> <16ea8963-0403-0365-d105-b7e6bf7b90a6@oracle.com> Message-ID: Thank you for your review Vladimir! Best regards, Christian On 05.12.19 14:31, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 05.12.2019 16:22, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8229994 >> http://cr.openjdk.java.net/~chagedorn/8229994/webrev.00/ >> >> As a part of loop peeling, loop-invariant dominating tests are moved >> out of the loop [1]. Doing so requires to change all control inputs >> coming from an independent test 'test' to data nodes inside the loop >> body to use the peeled version 'peeled_test' of 'test' instead [2]: >> the control test->data will be changed to peeled_test->data. If a data >> node is expensive then an assertion [3] checks the dominance relation >> of 'test' and 'peeled_test' ('peeled_test' should dominate 'test') but >> fails to find 'peeled_test' as part of the idom chain starting from >> the previous control input 'test' of that expensive node. >> >> The reason is that the idom information was not correctly set before >> at [4] after creating the peeled nodes. If 'head' (a CountedLoop node) >> has another OuterStripMinedLoop node, let's say 'outer_head', as an >> input then 'outer_head' is set as idom of 'head' instead of setting >> the idom of 'outer_head' to the new loop entry from the peeled >> iteration. We then miss all the idom information of the peeled >> iteration and the idom of 'outer_head' still points to the old loop >> entry node before peeling. >> >> The fix is straight forward to also account for loop strip mined loops >> when correcting the idom to the new loop entry from the peeled >> iteration at [4]. >> >> Thank you! 
>> >> Best regards, >> Christian >> >> >> [1] >> http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l669 >> >> [2] >> http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopopts.cpp#l271 >> >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopnode.cpp#l153 >> >> [4] >> http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l658 >> From christian.hagedorn at oracle.com Thu Dec 5 14:39:12 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 5 Dec 2019 15:39:12 +0100 Subject: [14] RFR(S): 8233032: assert(in_bb(n)) failed: must be Message-ID: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8233032 http://cr.openjdk.java.net/~chagedorn/8233032/webrev.00/ The problem of the bug can be traced back to finding the first and last memory state of a load when processing a load pack in SuperWord::co_locate_pack() [1]. One load of a load pack in the test case is dependent on the store node 's' which is sandwiched between other store nodes 'p1' and 'p2' of a store pack (p1 -> s -> p2). The other load of the load pack is dependent on the store node 'p2'. The sandwiched store node 's' swaps positions with 'p2' to move it out of the pack dependencies: p1 -> p2 -> s. However, the bb indices are not updated (bb_idx(s) < bb_idx(p2) is still true). Therefore, it sets last_mem to the memory state of the first load in the loop at [1]. As a result, the graph walk at [2] always starts following the input of the memory state of the first load (which should actually be the one of the last load) and will move beyond a loop phi as the stop condition is never met for a node having another memory state than the first one of the load pack. Eventually a bb index for a node outside of the loop is read resulting in the seen assertion failure. I suggest to use a solution to look up the first and last memory state that does not use bb indices as it probably is quite difficult to update them properly when moving sandwiched stores around. I've seen that Roland once proposed such a solution [3][4] which he then replaced by a version using bb indices (causing this bug). I adopted Roland's original patch into mine and propose to use it as a fix. Thank you! Best regards, Christian [1] https://hg.openjdk.java.net/jdk/jdk/file/5defda391e18/src/hotspot/share/opto/superword.cpp#l2258 [2] https://hg.openjdk.java.net/jdk/jdk/file/5defda391e18/src/hotspot/share/opto/superword.cpp#l2276 [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-April/028702.html [4] http://cr.openjdk.java.net/~roland/8201367/webrev.00/ From claes.redestad at oracle.com Thu Dec 5 14:59:16 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 5 Dec 2019 15:59:16 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: On 2019-12-05 11:26, Thomas St?fe wrote: > Hi Claes, > > this looks nice, thank you for your work. Only minor stuff remains, > mostly matters of taste. Feel free to ignore them. The patch is already > fine as it is for me. > > With your latest patch, AIX still works. Great! 
> > --- > > http://cr.openjdk.java.net/~redestad/8234331/open.04/src/hotspot/share/utilities/count_leading_zeros.hpp.udiff.html > > Taste matter: I find the return type for count_leading_zeros as uint32_t > oddly specific. Instinctively I would have choosen int or unsigned. The > gcc buildins all return int. No particular reason. > > --- > > Question, why do we treat signed numbers of 8 and 16 bits different wrt > to values < 0? > > if you leave behavior as it is, could you please adjust the comment: > > ?// Return the number of leading zeros in x, e.g. the zero-based index > ?// of the most significant set bit in x.? Undefined for 0. > + // For signed numbers whose size is < 32bit, function will > + // return 0 for values < 0. Such a comment is a bit redundant since count_leading_zeros will return 0 for any negative input. Maybe we should spell that out more clearly, though. The special cases avoided issues with sign extension when upcasting, e.g, casting int8_t -1 to unsigned returns 0xFFFFFFFF, so __builtin_clz((int8_t)-1) returns 0, which then messes up the manual adjustment. But this can be further simplified by masking, avoiding an extra branch: template struct CountLeadingZerosImpl { static uint32_t doit(T v) { return __builtin_clz((uint32_t)v & 0xFF) - 24u; } }; template struct CountLeadingZerosImpl { static uint32_t doit(T v) { return __builtin_clz((uint32_t)v & 0xFFFF) - 16u; } }; > > ---- > > Taste matter: Windows, 64bit version for x86: You could call your own > count_leading_zeros, instead of calling _BitScanReverse. Sure. > > ---- > > http://cr.openjdk.java.net/~redestad/8234331/open.04/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html > > In the test for next_up, can we have, for completeness sake, a test > which tests that with any valid input pow2 the result is the next pow2? > > --- a/test/hotspot/gtest/utilities/test_powerOfTwo.cpp ?Thu Dec 05 > 02:28:58 2019 +0100 > +++ b/test/hotspot/gtest/utilities/test_powerOfTwo.cpp ?Thu Dec 05 > 11:08:47 2019 +0100 > @@ -139,6 +139,11 @@ > ? ? ?EXPECT_EQ(pow2, next_power_of_2(pow2 - 1)) > ? ? ? ?<< "value = " << pow2 - 1; > ? ?} > + ?// next(pow2) should return pow2 * 2 > + ?for (T pow2 = T(1); pow2 < t_max_pow2 / 2; pow2 = pow2 * 2) { > + ? ?EXPECT_EQ(pow2 * 2, next_power_of_2(pow2)) > + ? ? ?<< "value = " << pow2; > + ?} Fixed. I also ran into an issue with use of std::numeric_limit::max() causing obscure issues on some platforms at some call sites (*cough* Solaris), so I replaced those usages with a portable (but not very efficient) version of that in place for the few assert/test uses: http://cr.openjdk.java.net/~redestad/8234331/open.05 Testing: tier1-3, some ongoing /Claes From tobias.hartmann at oracle.com Thu Dec 5 15:16:51 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 5 Dec 2019 16:16:51 +0100 Subject: [14] RFR(S): 8229994: assert(false) failed: Bad graph detected in get_early_ctrl_for_expensive In-Reply-To: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> References: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> Message-ID: Hi Christian, looks good to me but the CompileCommand argument in the @run statement of the test should come before the test class name (no new webrev required). 
Best regards, Tobias On 05.12.19 14:22, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8229994 > http://cr.openjdk.java.net/~chagedorn/8229994/webrev.00/ > > As a part of loop peeling, loop-invariant dominating tests are moved out of the loop [1]. Doing so > requires to change all control inputs coming from an independent test 'test' to data nodes inside > the loop body to use the peeled version 'peeled_test' of 'test' instead [2]: the control test->data > will be changed to peeled_test->data. If a data node is expensive then an assertion [3] checks the > dominance relation of 'test' and 'peeled_test' ('peeled_test' should dominate 'test') but fails to > find 'peeled_test' as part of the idom chain starting from the previous control input 'test' of that > expensive node. > > The reason is that the idom information was not correctly set before at [4] after creating the > peeled nodes. If 'head' (a CountedLoop node) has another OuterStripMinedLoop node, let's say > 'outer_head', as an input then 'outer_head' is set as idom of 'head' instead of setting the idom of > 'outer_head' to the new loop entry from the peeled iteration. We then miss all the idom information > of the peeled iteration and the idom of 'outer_head' still points to the old loop entry node before > peeling. > > The fix is straight forward to also account for loop strip mined loops when correcting the idom to > the new loop entry from the peeled iteration at [4]. > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l669 > [2] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopopts.cpp#l271 > [3] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopnode.cpp#l153 > [4] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l658 From martin.doerr at sap.com Thu Dec 5 16:13:00 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 5 Dec 2019 16:13:00 +0000 Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in Thread::currentThread() intrinsic In-Reply-To: References: Message-ID: Hi Vladimir, looks good to me. The j.l.Thread Oop for Thread::current() is allowed to live in a register with this change. GC will find it via OopMap at safepoints. So I think it's correct. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Vladimir Ivanov > Sent: Donnerstag, 5. Dezember 2019 13:44 > To: hotspot compiler > Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in > Thread::currentThread() intrinsic > > http://cr.openjdk.java.net/~vlivanov/8235143/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8235143 > > Thread::currentThread() intrinsic doesn't need memory state: > though multiple threads can execute same code, "current thread" can't > change in the context of a single method activation. So, once it is > observed, it's safe to share among all users. > > One of the use cases which benefit a lot from such optimization is > ownership checks for thread confined resources (fast path check for > owner thread to avoid heavy-weight synchronization). > > The patch was part of foreign-memaccess branch in Project Panama and > showed good performance results on Memory Access API implementation > [1]. 
> > Testing: tier1-4 > > PS: the optimization should be disabled in Project Loom: the assumption > doesn't hold for continuations (in their current form). > > Best regards, > Vladimir Ivanov > > [1] https://openjdk.java.net/jeps/370 > JEP 370: Foreign-Memory Access API (Incubator) From thomas.stuefe at gmail.com Thu Dec 5 17:35:40 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 5 Dec 2019 18:35:40 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: Hi Claes, this looks all good to me. Thanks for your patience! I won't be able to test the latest iteration on AIX (just started vacation) but I am quite sure it will work since the delta to the last version is minimal; should it not we will fix it. Cheers, Thomas On Thu, Dec 5, 2019 at 3:56 PM Claes Redestad wrote: > On 2019-12-05 11:26, Thomas St?fe wrote: > > Hi Claes, > > > > this looks nice, thank you for your work. Only minor stuff remains, > > mostly matters of taste. Feel free to ignore them. The patch is already > > fine as it is for me. > > > > With your latest patch, AIX still works. > > Great! > > > > > --- > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.04/src/hotspot/share/utilities/count_leading_zeros.hpp.udiff.html > > > > Taste matter: I find the return type for count_leading_zeros as uint32_t > > oddly specific. Instinctively I would have choosen int or unsigned. The > > gcc buildins all return int. > > No particular reason. > > > > > --- > > > > Question, why do we treat signed numbers of 8 and 16 bits different wrt > > to values < 0? > > > > if you leave behavior as it is, could you please adjust the comment: > > > > // Return the number of leading zeros in x, e.g. the zero-based index > > // of the most significant set bit in x. Undefined for 0. > > + // For signed numbers whose size is < 32bit, function will > > + // return 0 for values < 0. > > Such a comment is a bit redundant since count_leading_zeros will return > 0 for any negative input. Maybe we should spell that out more clearly, > though. > > The special cases avoided issues with sign extension when upcasting, > e.g, casting int8_t -1 to unsigned returns 0xFFFFFFFF, so > __builtin_clz((int8_t)-1) returns 0, which then messes up the manual > adjustment. But this can be further simplified by masking, avoiding > an extra branch: > > template struct CountLeadingZerosImpl { > static uint32_t doit(T v) { > return __builtin_clz((uint32_t)v & 0xFF) - 24u; > } > }; > > template struct CountLeadingZerosImpl { > static uint32_t doit(T v) { > return __builtin_clz((uint32_t)v & 0xFFFF) - 16u; > } > }; > > > > > ---- > > > > Taste matter: Windows, 64bit version for x86: You could call your own > > count_leading_zeros, instead of calling _BitScanReverse. > > Sure. > > > > > ---- > > > > > http://cr.openjdk.java.net/~redestad/8234331/open.04/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html > > > > In the test for next_up, can we have, for completeness sake, a test > > which tests that with any valid input pow2 the result is the next pow2? 
> > > > --- a/test/hotspot/gtest/utilities/test_powerOfTwo.cpp Thu Dec 05 > > 02:28:58 2019 +0100 > > +++ b/test/hotspot/gtest/utilities/test_powerOfTwo.cpp Thu Dec 05 > > 11:08:47 2019 +0100 > > @@ -139,6 +139,11 @@ > > EXPECT_EQ(pow2, next_power_of_2(pow2 - 1)) > > << "value = " << pow2 - 1; > > } > > + // next(pow2) should return pow2 * 2 > > + for (T pow2 = T(1); pow2 < t_max_pow2 / 2; pow2 = pow2 * 2) { > > + EXPECT_EQ(pow2 * 2, next_power_of_2(pow2)) > > + << "value = " << pow2; > > + } > > Fixed. > > I also ran into an issue with use of std::numeric_limit::max() > causing obscure issues on some platforms at some call sites (*cough* > Solaris), so I replaced those usages with a portable (but not very > efficient) version of that in place for the few assert/test uses: > > http://cr.openjdk.java.net/~redestad/8234331/open.05 > > Testing: tier1-3, some ongoing > > /Claes > From erik.osterlund at oracle.com Thu Dec 5 17:49:50 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 5 Dec 2019 18:49:50 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: Hi Claes, I wonder what issue you ran into with std::numeric_limit on Solaris. I would personally like to understand that before dismissing its use and rolling our own function instead. Note that std::numeric_limit does not accept CV qualified types - perhaps that is what you ran into. Perhaps if you pass in RemoveCv::type instead of T to that trait, things will work properly. If you really want to note use numeric_limit, then you might have to stick in some more sanity checks in that functions, such as checking that the input is integral, etc. And dynamically calculating the limit does seem unfortunate. In that case I'd rather we rolled our own numeric limit trait, but I would be surprised if that is indeed necessary. Other than that, I would just like to say I approve of having this stuff in a separate file, as it includes a whole bunch of metaprogramming utilities, which I would not like to be added from the #include "allTheRandomStuff" header of our JVM, that is globalDefinitions.hpp. Other than that, I think this looks good. Oh - not sure but I have a gut feeling you can remove a few includes for zUtils that were only previously used to get to the power of two utility, which is no longer necessary. Thanks, /Erik On 2019-12-05 15:59, Claes Redestad wrote: > On 2019-12-05 11:26, Thomas St?fe wrote: >> Hi Claes, >> >> this looks nice, thank you for your work. Only minor stuff remains, >> mostly matters of taste. Feel free to ignore them. The patch is >> already fine as it is for me. >> >> With your latest patch, AIX still works. > > Great! > >> >> --- >> >> http://cr.openjdk.java.net/~redestad/8234331/open.04/src/hotspot/share/utilities/count_leading_zeros.hpp.udiff.html >> >> >> Taste matter: I find the return type for count_leading_zeros as >> uint32_t oddly specific. Instinctively I would have choosen int or >> unsigned. The gcc buildins all return int. > > No particular reason. > >> >> --- >> >> Question, why do we treat signed numbers of 8 and 16 bits different >> wrt to values < 0? >> >> if you leave behavior as it is, could you please adjust the comment: >> >> ??// Return the number of leading zeros in x, e.g. the zero-based index >> ??// of the most significant set bit in x.? Undefined for 0. 
>> + // For signed numbers whose size is < 32bit, function will >> + // return 0 for values < 0. > > Such a comment is a bit redundant since count_leading_zeros will return > 0 for any negative input. Maybe we should spell that out more clearly, > though. > > The special cases avoided issues with sign extension when upcasting, > e.g, casting int8_t -1 to unsigned returns 0xFFFFFFFF, so > __builtin_clz((int8_t)-1) returns 0, which then messes up the manual > adjustment. But this can be further simplified by masking, avoiding > an extra branch: > > template struct CountLeadingZerosImpl { > ? static uint32_t doit(T v) { > ??? return __builtin_clz((uint32_t)v & 0xFF) - 24u; > ? } > }; > > template struct CountLeadingZerosImpl { > ? static uint32_t doit(T v) { > ??? return __builtin_clz((uint32_t)v & 0xFFFF) - 16u; > ? } > }; > >> >> ---- >> >> Taste matter: Windows, 64bit version for x86: You could call your own >> count_leading_zeros, instead of calling _BitScanReverse. > > Sure. > >> >> ---- >> >> http://cr.openjdk.java.net/~redestad/8234331/open.04/test/hotspot/gtest/utilities/test_powerOfTwo.cpp.html >> >> >> In the test for next_up, can we have, for completeness sake, a test >> which tests that with any valid input pow2 the result is the next pow2? >> >> --- a/test/hotspot/gtest/utilities/test_powerOfTwo.cpp ?Thu Dec 05 >> 02:28:58 2019 +0100 >> +++ b/test/hotspot/gtest/utilities/test_powerOfTwo.cpp ?Thu Dec 05 >> 11:08:47 2019 +0100 >> @@ -139,6 +139,11 @@ >> ?? ? ?EXPECT_EQ(pow2, next_power_of_2(pow2 - 1)) >> ?? ? ? ?<< "value = " << pow2 - 1; >> ?? ?} >> + ?// next(pow2) should return pow2 * 2 >> + ?for (T pow2 = T(1); pow2 < t_max_pow2 / 2; pow2 = pow2 * 2) { >> + ? ?EXPECT_EQ(pow2 * 2, next_power_of_2(pow2)) >> + ? ? ?<< "value = " << pow2; >> + ?} > > Fixed. > > I also ran into an issue with use of std::numeric_limit::max() > causing obscure issues on some platforms at some call sites (*cough* > Solaris), so I replaced those usages with a portable (but not very > efficient) version of that in place for the few assert/test uses: > > http://cr.openjdk.java.net/~redestad/8234331/open.05 > > Testing: tier1-3, some ongoing > > /Claes From claes.redestad at oracle.com Thu Dec 5 18:04:32 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 5 Dec 2019 19:04:32 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> Message-ID: <5fd925e4-c109-8c27-4af6-25a4976e5551@oracle.com> Hi Erik, thanks for looking at this. On 2019-12-05 18:49, Erik ?sterlund wrote: > Hi Claes, > > I wonder what issue you ran into with std::numeric_limit on Solaris. I > would personally like to understand that > before dismissing its use and rolling our own function instead. > Note that std::numeric_limit does not accept CV qualified types - > perhaps that is what you ran into. slowdebug failed to build with the following: [2019-12-05T01:34:22,502Z] Undefined first referenced [2019-12-05T01:34:22,502Z] symbol in file [2019-12-05T01:34:22,502Z] unsigned std::_Integer_limits::max() > > Perhaps if you pass in RemoveCv::type instead of T to that trait, > things will work properly. > If you really want to note use numeric_limit, then you might have to > stick in some more sanity checks in > that functions, such as checking that the input is integral, etc. And > dynamically calculating the limit > does seem unfortunate. 
In that case I'd rather we rolled our own numeric > limit trait, but I would be > surprised if that is indeed necessary. I can try the RemoveCv trick. This "private" max_value utility is not performance critical since it's only used for an assert and in tests, but yes it'd be much preferable if we could use std::numeric_limits<..> > > Other than that, I would just like to say I approve of having this stuff > in a separate file, as it includes > a whole bunch of metaprogramming utilities, which I would not like to be > added from the #include "allTheRandomStuff" > header of our JVM, that is globalDefinitions.hpp. Yes, in this particular case wedging these utilities into globalDefinitions is tricky. In fact impossible unless we also throw in count_leading_zeros.hpp there too. I do agree with others that some functions could be consolidated into bigger lumps. > > Other than that, I think this looks good. Oh - not sure but I have a gut > feeling you can remove a few includes > for zUtils that were only previously used to get to the power of two > utility, which is no longer necessary. Good point, I'll check. Thanks! /Claes From vladimir.kozlov at oracle.com Thu Dec 5 19:16:37 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 5 Dec 2019 11:16:37 -0800 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> Message-ID: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> CCing to GC group. Looks fine to me but someone from GC land have to look too. I wish we have more concrete indication for off-heap access instead of guessing it based on how we address memory through Unsafe API. Thanks, Vladimir On 11/29/19 7:42 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8226411 > > There were a number of fixes in C2 support for unsafe accesses recently which led to additional memory barriers around > them. It improved stability, but in some cases it was redundant. One of important use cases which regressed is off-heap > accesses [1]. The barriers around them are redundant because they are serialized on raw memory and don't intersect with > any on-heap accesses. > > Proposed fix skips memory barriers around unsafe accesses which are provably off-heap (base == NULL). > > It (almost completely) recovers performance on the microbenchmark provided in JDK-8224182 [1]. > > Testing: tier1-6. > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8224182 From erik.osterlund at oracle.com Thu Dec 5 20:14:59 2019 From: erik.osterlund at oracle.com (=?utf-8?Q?Erik_=C3=96sterlund?=) Date: Thu, 5 Dec 2019 21:14:59 +0100 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> References: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> Message-ID: Hi, Could we use the existing IN_NATIVE decorator instead of introducing a new decorator that seems to be an alias for the same thing? The decorator describing its use (IN_NATIVE) says it is for off-heap accesses pointing into the heap. We can just remove from the comment the part presuming it is a reference. What do you think? Thanks, /Erik > On 5 Dec 2019, at 20:16, Vladimir Kozlov wrote: > > ?CCing to GC group. > > Looks fine to me but someone from GC land have to look too. 
> > I wish we have more concrete indication for off-heap access instead of guessing it based on how we address memory through Unsafe API. > > Thanks, > Vladimir > >> On 11/29/19 7:42 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8226411 >> There were a number of fixes in C2 support for unsafe accesses recently which led to additional memory barriers around them. It improved stability, but in some cases it was redundant. One of important use cases which regressed is off-heap accesses [1]. The barriers around them are redundant because they are serialized on raw memory and don't intersect with any on-heap accesses. >> Proposed fix skips memory barriers around unsafe accesses which are provably off-heap (base == NULL). >> It (almost completely) recovers performance on the microbenchmark provided in JDK-8224182 [1]. >> Testing: tier1-6. >> Best regards, >> Vladimir Ivanov >> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 From claes.redestad at oracle.com Thu Dec 5 22:26:54 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 5 Dec 2019 23:26:54 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: <5fd925e4-c109-8c27-4af6-25a4976e5551@oracle.com> References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com> <39475089-9933-f85a-8927-661bba44780c@oracle.com> <5fd925e4-c109-8c27-4af6-25a4976e5551@oracle.com> Message-ID: On 2019-12-05 19:04, Claes Redestad wrote: > On 2019-12-05 18:49, Erik ?sterlund wrote: >> Hi Claes, >> >> I wonder what issue you ran into with std::numeric_limit on Solaris. I >> would personally like to understand that >> before dismissing its use and rolling our own function instead. >> Note that std::numeric_limit does not accept CV qualified types - >> perhaps that is what you ran into. > > slowdebug failed to build with the following: > > [2019-12-05T01:34:22,502Z] Undefined??????????? first referenced > [2019-12-05T01:34:22,502Z]? symbol????????????????? in file > [2019-12-05T01:34:22,502Z] unsigned > std::_Integer_limits::max() > >> >> Perhaps if you pass in RemoveCv::type instead of T to that trait, >> things will work properly. >> If you really want to note use numeric_limit, then you might have to >> stick in some more sanity checks in >> that functions, such as checking that the input is integral, etc. And >> dynamically calculating the limit >> does seem unfortunate. In that case I'd rather we rolled our own >> numeric limit trait, but I would be >> surprised if that is indeed necessary. > > I can try the RemoveCv trick. This "private" max_value utility is not > performance critical since it's only used for an assert and in tests, > but yes it'd be much preferable if we could use std::numeric_limits<..> This doesn't fix the slowdebug build failure on Solaris. Ok if I file a follow-up bug to examine this in depth, and move ahead with the version here (open.05)? /Claes From christian.hagedorn at oracle.com Fri Dec 6 07:05:11 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 6 Dec 2019 08:05:11 +0100 Subject: [14] RFR(S): 8229994: assert(false) failed: Bad graph detected in get_early_ctrl_for_expensive In-Reply-To: References: <518e0ebe-a65c-a571-26c1-fa77d8c4ba82@oracle.com> Message-ID: Thank you Tobias for your review! I fixed it. 
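On the 8226411 review quoted above: in graph terms, "provably off-heap (base == NULL)" means the base operand of the Unsafe access is statically the null constant, i.e. the caller passed null and is doing raw address arithmetic. A simplified sketch of that test (HotSpot-internal types; not the actual patch):

    #include "opto/node.hpp"
    #include "opto/phaseX.hpp"
    #include "opto/type.hpp"

    // Sketch: only a base known to be the null constant makes the access
    // provably off-heap. A base that merely might be null can still alias
    // the Java heap, so the conservative barriers must stay for it.
    static bool is_provably_off_heap(PhaseGVN* gvn, Node* base) {
      const TypePtr* tp = gvn->type(base)->isa_ptr();
      return tp == TypePtr::NULL_PTR;   // NULL (not a pointer type) compares unequal
    }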
Best regards, Christian On 05.12.19 16:16, Tobias Hartmann wrote: > Hi Christian, > > looks good to me but the CompileCommand argument in the @run statement of the test should come > before the test class name (no new webrev required). > > Best regards, > Tobias > > On 05.12.19 14:22, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8229994 >> http://cr.openjdk.java.net/~chagedorn/8229994/webrev.00/ >> >> As a part of loop peeling, loop-invariant dominating tests are moved out of the loop [1]. Doing so >> requires to change all control inputs coming from an independent test 'test' to data nodes inside >> the loop body to use the peeled version 'peeled_test' of 'test' instead [2]: the control test->data >> will be changed to peeled_test->data. If a data node is expensive then an assertion [3] checks the >> dominance relation of 'test' and 'peeled_test' ('peeled_test' should dominate 'test') but fails to >> find 'peeled_test' as part of the idom chain starting from the previous control input 'test' of that >> expensive node. >> >> The reason is that the idom information was not correctly set before at [4] after creating the >> peeled nodes. If 'head' (a CountedLoop node) has another OuterStripMinedLoop node, let's say >> 'outer_head', as an input then 'outer_head' is set as idom of 'head' instead of setting the idom of >> 'outer_head' to the new loop entry from the peeled iteration. We then miss all the idom information >> of the peeled iteration and the idom of 'outer_head' still points to the old loop entry node before >> peeling. >> >> The fix is straight forward to also account for loop strip mined loops when correcting the idom to >> the new loop entry from the peeled iteration at [4]. >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l669 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopopts.cpp#l271 >> [3] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopnode.cpp#l153 >> [4] http://hg.openjdk.java.net/jdk/jdk/file/99b71c5b02ff/src/hotspot/share/opto/loopTransform.cpp#l658 From navy.xliu at gmail.com Fri Dec 6 07:23:54 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Thu, 5 Dec 2019 23:23:54 -0800 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose Message-ID: Hi, Reviewers, Could you review this very simple bugfix for C1? JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ The root cause is some instructions are going to be eliminated, so they are not assigned to any valid bci. In present of -XX:+PrintIRDuringConstruction -XX:+Verbose, C1 will print them out and then hit the assert. Yes, I can twiddle graph_builder to assign right BCIs to them, but I would like to have a more robust InstructionPrinter::print_line. the CR will leave blanks in the position of bci. Eliminated store for object 0: . 0 a67 a58._24 := a54 (L) next Eliminated load: . 
0 i35 a11._24 (I) position thanks, --lx From tobias.hartmann at oracle.com Fri Dec 6 08:24:01 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 6 Dec 2019 09:24:01 +0100 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose In-Reply-To: References: Message-ID: <7176592d-4784-0797-4457-463773464536@oracle.com> Hi Liu, your fix looks good to me but could you please add a regression test? Thanks, Tobias On 06.12.19 08:23, Liu Xin wrote: > Hi, Reviewers, > > Could you review this very simple bugfix for C1? > JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 > Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ > > The root cause is some instructions are going to be eliminated, so they are > not assigned to any valid bci. > In present of -XX:+PrintIRDuringConstruction -XX:+Verbose, C1 will print > them out and then hit the assert. > > Yes, I can twiddle graph_builder to assign right BCIs to them, but I would > like to have a more robust InstructionPrinter::print_line. the CR will > leave blanks in the position of bci. > Eliminated store for object 0: > . 0 a67 a58._24 := a54 (L) next > Eliminated load: > . 0 i35 a11._24 (I) position > > thanks, > --lx > From rahul.v.raghavan at oracle.com Fri Dec 6 14:20:58 2019 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Fri, 6 Dec 2019 19:50:58 +0530 Subject: [14]RFR: 8233453: MLVM deoptimize stress test timed out Message-ID: <3322718b-5097-d418-e00d-2ca4593136bb@oracle.com> Hi, Please review the following fix changeset. - http://cr.openjdk.java.net/~rraghavan/8233453/webrev.00/ # https://bugs.openjdk.java.net/browse/JDK-8233453 # test-[open/test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] Issue is intermittent timeout failures of the test; mostly with solaris-sparc and sometimes with windows-x64. The proposed fix here is to increase the timeout factor for the test. [test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] - * @run main/othervm + * @run main/othervm/timeout=300 ....... - * @run main/othervm + * @run main/othervm/timeout=300 For some timeout failure cases of the test run, logs got generated with extra thread dump info. e.g.: one attached to JBS page - https://bugs.openjdk.java.net/secure/attachment/85805/Test_id0--JDK13.jtr But could not find any hint of blocking threads or deadlocks from the analysis of these thread dumps. Also instead it seems as if the timeout factor for the test is less and that test just need more time to execute. Got no failures with repeat test run after above proposed fix, also supports this. Also checking the comments, fix done for old - - https://bugs.openjdk.java.net/browse/JDK-8212028 (Use run-test makefile framework for testing in Oracle's Mach5) - http://hg.openjdk.java.net/jdk/jdk/rev/28375a1de254 Found that similar '/timeout=300' addition was done for some '../vmTestbase/vm/mlvm/..' tests. It seems the intermittent timeouts for the test started with jdk-13+17, with fix changeset of - https://bugs.openjdk.java.net/browse/JDK-8221393 ResolvedMethodTable too small for StackWalking applications (could not get any timeout failure for the test with sources updated to jdk-13+16 or updated to changeset version just before 8221393 fix.) Opened a separate task - JDK-8235485 - to check if this slowdown is expected. Not getting any timeout failure, for repeat test run with above fix proposal. 
Thanks, Rahul From vladimir.x.ivanov at oracle.com Fri Dec 6 15:19:58 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 6 Dec 2019 18:19:58 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: References: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> Message-ID: <015f0c8a-37f0-e603-e644-7a51195c4ad2@oracle.com> Hi Erik, I like your idea. Here's updated version: http://cr.openjdk.java.net/~vlivanov/8226411/webrev.01 While browsing the code, I noticed that changes in G1BarrierSetC2::load_at_resolved() aren't required (need_cpu_mem_bar is used for oop case). But I decided to keep them to keep it (relatively) close to C2Access::needs_cpu_membar(). Best regards, Vladimir Ivanov On 05.12.2019 23:14, Erik ?sterlund wrote: > Hi, > > Could we use the existing IN_NATIVE decorator instead of introducing a new decorator that seems to be an alias for the same thing? The decorator describing its use (IN_NATIVE) says it is for off-heap accesses pointing into the heap. We can just remove from the comment the part presuming it is a reference. > > What do you think? > > Thanks, > /Erik > >> On 5 Dec 2019, at 20:16, Vladimir Kozlov wrote: >> >> ?CCing to GC group. >> >> Looks fine to me but someone from GC land have to look too. >> >> I wish we have more concrete indication for off-heap access instead of guessing it based on how we address memory through Unsafe API. >> >> Thanks, >> Vladimir >> >>> On 11/29/19 7:42 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8226411 >>> There were a number of fixes in C2 support for unsafe accesses recently which led to additional memory barriers around them. It improved stability, but in some cases it was redundant. One of important use cases which regressed is off-heap accesses [1]. The barriers around them are redundant because they are serialized on raw memory and don't intersect with any on-heap accesses. >>> Proposed fix skips memory barriers around unsafe accesses which are provably off-heap (base == NULL). >>> It (almost completely) recovers performance on the microbenchmark provided in JDK-8224182 [1]. >>> Testing: tier1-6. >>> Best regards, >>> Vladimir Ivanov >>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 > From vladimir.x.ivanov at oracle.com Fri Dec 6 15:27:32 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 6 Dec 2019 18:27:32 +0300 Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in Thread::currentThread() intrinsic In-Reply-To: References: Message-ID: <959d9da3-ae64-446b-5fbf-3a871fbf3642@oracle.com> Thanks, Martin. > The j.l.Thread Oop for Thread::current() is allowed to live in a register with this change. > GC will find it via OopMap at safepoints. So I think it's correct. FTR nothing changes in that respect: with the patch it behaves as if a j.l.Thread is cached in a local after the first Thread::currentThread() call. But it can be done explicitly on bytecode level in a user code. It's not the case that j.l.Thread oop couldn't be alive across a safepoint before and now it can. Best regards, Vladimir Ivanov >> -----Original Message----- >> From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Vladimir Ivanov >> Sent: Donnerstag, 5. 
Dezember 2019 13:44 >> To: hotspot compiler >> Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in >> Thread::currentThread() intrinsic >> >> http://cr.openjdk.java.net/~vlivanov/8235143/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8235143 >> >> Thread::currentThread() intrinsic doesn't need memory state: >> though multiple threads can execute same code, "current thread" can't >> change in the context of a single method activation. So, once it is >> observed, it's safe to share among all users. >> >> One of the use cases which benefit a lot from such optimization is >> ownership checks for thread confined resources (fast path check for >> owner thread to avoid heavy-weight synchronization). >> >> The patch was part of foreign-memaccess branch in Project Panama and >> showed good performance results on Memory Access API implementation >> [1]. >> >> Testing: tier1-4 >> >> PS: the optimization should be disabled in Project Loom: the assumption >> doesn't hold for continuations (in their current form). >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://openjdk.java.net/jeps/370 >> JEP 370: Foreign-Memory Access API (Incubator) From martin.doerr at sap.com Fri Dec 6 17:04:38 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 6 Dec 2019 17:04:38 +0000 Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in Thread::currentThread() intrinsic In-Reply-To: <959d9da3-ae64-446b-5fbf-3a871fbf3642@oracle.com> References: <959d9da3-ae64-446b-5fbf-3a871fbf3642@oracle.com> Message-ID: Hi Vladimir, thanks for your additional comments. > It's not the case that j.l.Thread oop couldn't be alive across a > safepoint before and now it can. Right, but the j.l.Thread oop loads can move across safepoints, now. So they can possibly become additional GC roots in registers at safepoints. That should be fine. Best regards, Martin > -----Original Message----- > From: Vladimir Ivanov > Sent: Freitag, 6. Dezember 2019 16:28 > To: Doerr, Martin ; hotspot compiler compiler-dev at openjdk.java.net> > Subject: Re: [14] RFR (XXS): 8235143: C2: No memory state needed in > Thread::currentThread() intrinsic > > Thanks, Martin. > > > The j.l.Thread Oop for Thread::current() is allowed to live in a register with > this change. > > GC will find it via OopMap at safepoints. So I think it's correct. > > FTR nothing changes in that respect: with the patch it behaves as if a > j.l.Thread is cached in a local after the first Thread::currentThread() > call. But it can be done explicitly on bytecode level in a user code. > It's not the case that j.l.Thread oop couldn't be alive across a > safepoint before and now it can. > > Best regards, > Vladimir Ivanov > > >> -----Original Message----- > >> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Vladimir Ivanov > >> Sent: Donnerstag, 5. Dezember 2019 13:44 > >> To: hotspot compiler > >> Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in > >> Thread::currentThread() intrinsic > >> > >> http://cr.openjdk.java.net/~vlivanov/8235143/webrev.00/ > >> https://bugs.openjdk.java.net/browse/JDK-8235143 > >> > >> Thread::currentThread() intrinsic doesn't need memory state: > >> though multiple threads can execute same code, "current thread" can't > >> change in the context of a single method activation. So, once it is > >> observed, it's safe to share among all users. 
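To make the "ownership checks for thread confined resources" use case in the quoted RFR concrete, the pattern is a cheap owner comparison on every access; sketched here in VM-level C++ purely for illustration, while the real beneficiary is JIT-compiled Java code performing the same check:

    #include "runtime/thread.hpp"

    // Illustrative only. With 8235143 the thread oop feeding such a check
    // can be loaded once and commoned, instead of being re-loaded whenever
    // the compiler cannot prove memory unchanged.
    static inline bool owned_by_current_thread(JavaThread* current, oop owner) {
      return current->threadObj() == owner;
    }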
> >> > >> One of the use cases which benefit a lot from such optimization is > >> ownership checks for thread confined resources (fast path check for > >> owner thread to avoid heavy-weight synchronization). > >> > >> The patch was part of foreign-memaccess branch in Project Panama and > >> showed good performance results on Memory Access API > implementation > >> [1]. > >> > >> Testing: tier1-4 > >> > >> PS: the optimization should be disabled in Project Loom: the assumption > >> doesn't hold for continuations (in their current form). > >> > >> Best regards, > >> Vladimir Ivanov > >> > >> [1] https://openjdk.java.net/jeps/370 > >> JEP 370: Foreign-Memory Access API (Incubator) From erik.osterlund at oracle.com Fri Dec 6 17:10:37 2019 From: erik.osterlund at oracle.com (=?utf-8?Q?Erik_=C3=96sterlund?=) Date: Fri, 6 Dec 2019 18:10:37 +0100 Subject: RFR: 8234331: Add robust and optimized utility for rounding up to next power of two+ In-Reply-To: References: Message-ID: <215D2CA3-AFE2-4943-A483-809BBAA514CB@oracle.com> Hi Claes, Sure. Looks good! Thanks, /Erik > On 5 Dec 2019, at 23:24, Claes Redestad wrote: > > ? > >> On 2019-12-05 19:04, Claes Redestad wrote: >>> On 2019-12-05 18:49, Erik ?sterlund wrote: >>> Hi Claes, >>> >>> I wonder what issue you ran into with std::numeric_limit on Solaris. I would personally like to understand that >>> before dismissing its use and rolling our own function instead. >>> Note that std::numeric_limit does not accept CV qualified types - perhaps that is what you ran into. >> slowdebug failed to build with the following: >> [2019-12-05T01:34:22,502Z] Undefined first referenced >> [2019-12-05T01:34:22,502Z] symbol in file >> [2019-12-05T01:34:22,502Z] unsigned >> std::_Integer_limits::max() >>> >>> Perhaps if you pass in RemoveCv::type instead of T to that trait, things will work properly. >>> If you really want to note use numeric_limit, then you might have to stick in some more sanity checks in >>> that functions, such as checking that the input is integral, etc. And dynamically calculating the limit >>> does seem unfortunate. In that case I'd rather we rolled our own numeric limit trait, but I would be >>> surprised if that is indeed necessary. >> I can try the RemoveCv trick. This "private" max_value utility is not >> performance critical since it's only used for an assert and in tests, >> but yes it'd be much preferable if we could use std::numeric_limits<..> > > This doesn't fix the slowdebug build failure on Solaris. > > Ok if I file a follow-up bug to examine this in depth, and move ahead with the version here (open.05)? > > /Claes From igor.ignatyev at oracle.com Fri Dec 6 17:12:07 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 6 Dec 2019 09:12:07 -0800 Subject: [14]RFR: 8233453: MLVM deoptimize stress test timed out In-Reply-To: <3322718b-5097-d418-e00d-2ca4593136bb@oracle.com> References: <3322718b-5097-d418-e00d-2ca4593136bb@oracle.com> Message-ID: <2B01486C-0BBC-48DB-BBA0-D2CD67D5A1DD@oracle.com> Hi Rahul, the fix sound reasonable to me. Thanks, Igor > On Dec 6, 2019, at 6:20 AM, Rahul Raghavan wrote: > > Hi, > > Please review the following fix changeset. > > - http://cr.openjdk.java.net/~rraghavan/8233453/webrev.00/ > > # https://bugs.openjdk.java.net/browse/JDK-8233453 > # test-[open/test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] > > Issue is intermittent timeout failures of the test; > mostly with solaris-sparc and sometimes with windows-x64. 
> > The proposed fix here is to increase the timeout factor for the test. > [test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] > - * @run main/othervm > + * @run main/othervm/timeout=300 > ....... > - * @run main/othervm > + * @run main/othervm/timeout=300 > > For some timeout failure cases of the test run, logs got generated with extra thread dump info. > e.g.: one attached to JBS page - > https://bugs.openjdk.java.net/secure/attachment/85805/Test_id0--JDK13.jtr > But could not find any hint of blocking threads or deadlocks from the analysis of these thread dumps. > Also instead it seems as if the timeout factor for the test is less > and that test just need more time to execute. > Got no failures with repeat test run after above proposed fix, also supports this. > > > Also checking the comments, fix done for old - > - https://bugs.openjdk.java.net/browse/JDK-8212028 > (Use run-test makefile framework for testing in Oracle's Mach5) > - http://hg.openjdk.java.net/jdk/jdk/rev/28375a1de254 > Found that similar '/timeout=300' addition was done for some '../vmTestbase/vm/mlvm/..' tests. > > > It seems the intermittent timeouts for the test started with jdk-13+17, with fix changeset of - > https://bugs.openjdk.java.net/browse/JDK-8221393 > ResolvedMethodTable too small for StackWalking applications > (could not get any timeout failure for the test with sources updated to jdk-13+16 or updated to changeset version just before 8221393 fix.) > Opened a separate task - JDK-8235485 - to check if this slowdown is expected. > > > Not getting any timeout failure, for repeat test run with above fix proposal. > > > Thanks, > Rahul From tobias.hartmann at oracle.com Fri Dec 6 17:23:09 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 6 Dec 2019 18:23:09 +0100 Subject: [14]RFR: 8233453: MLVM deoptimize stress test timed out In-Reply-To: <2B01486C-0BBC-48DB-BBA0-D2CD67D5A1DD@oracle.com> References: <3322718b-5097-d418-e00d-2ca4593136bb@oracle.com> <2B01486C-0BBC-48DB-BBA0-D2CD67D5A1DD@oracle.com> Message-ID: <6a887d8a-8843-bd21-857d-1d41531738aa@oracle.com> +1 Best regards, Tobias On 06.12.19 18:12, Igor Ignatyev wrote: > Hi Rahul, > > the fix sound reasonable to me. > > Thanks, > Igor > >> On Dec 6, 2019, at 6:20 AM, Rahul Raghavan wrote: >> >> Hi, >> >> Please review the following fix changeset. >> >> - http://cr.openjdk.java.net/~rraghavan/8233453/webrev.00/ >> >> # https://bugs.openjdk.java.net/browse/JDK-8233453 >> # test-[open/test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] >> >> Issue is intermittent timeout failures of the test; >> mostly with solaris-sparc and sometimes with windows-x64. >> >> The proposed fix here is to increase the timeout factor for the test. >> [test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] >> - * @run main/othervm >> + * @run main/othervm/timeout=300 >> ....... >> - * @run main/othervm >> + * @run main/othervm/timeout=300 >> >> For some timeout failure cases of the test run, logs got generated with extra thread dump info. >> e.g.: one attached to JBS page - >> https://bugs.openjdk.java.net/secure/attachment/85805/Test_id0--JDK13.jtr >> But could not find any hint of blocking threads or deadlocks from the analysis of these thread dumps. >> Also instead it seems as if the timeout factor for the test is less >> and that test just need more time to execute. >> Got no failures with repeat test run after above proposed fix, also supports this. 
>> >> >> Also checking the comments, fix done for old - >> - https://bugs.openjdk.java.net/browse/JDK-8212028 >> (Use run-test makefile framework for testing in Oracle's Mach5) >> - http://hg.openjdk.java.net/jdk/jdk/rev/28375a1de254 >> Found that similar '/timeout=300' addition was done for some '../vmTestbase/vm/mlvm/..' tests. >> >> >> It seems the intermittent timeouts for the test started with jdk-13+17, with fix changeset of - >> https://bugs.openjdk.java.net/browse/JDK-8221393 >> ResolvedMethodTable too small for StackWalking applications >> (could not get any timeout failure for the test with sources updated to jdk-13+16 or updated to changeset version just before 8221393 fix.) >> Opened a separate task - JDK-8235485 - to check if this slowdown is expected. >> >> >> Not getting any timeout failure, for repeat test run with above fix proposal. >> >> >> Thanks, >> Rahul > From vladimir.kozlov at oracle.com Fri Dec 6 19:23:20 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 6 Dec 2019 11:23:20 -0800 Subject: [14] RFR(S) 8235438: [JVMCI] StackTraceElement::decode should use the original Method Message-ID: <4d81b37a-e02b-c338-9a4d-c603d2505eff@oracle.com> https://cr.openjdk.java.net/~kvn/8235438/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8235438 This fix is prepared by Tom R. "The JDK14 version of StackTraceElement::decode is based on the JDK8 code which contains mixed usages of method->constants() and method->method_holder()->constants() assuming they point to the same thing. In the case of anonymous methods this isn't true. Usually this isn't a problem but if CDS is enabled the the version flag of method->method_holder()->constants() is non-zero but version of of method->constants() is 0 which causes the code to switch constants pools and it reads garbage. JDK-8140685 [1] refactored this code to remove this logic and the JVMCI version of this code should be converted to use the same scheme." I tested tier1-2 and tier3-4-graal. All clean. Thanks, Vladimir [1] https://bugs.openjdk.java.net/browse/JDK-8140685 From sandhya.viswanathan at intel.com Fri Dec 6 19:45:56 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 6 Dec 2019 19:45:56 +0000 Subject: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 Message-ID: The changes introduced in 8200067 cause performance to drop for java.util.zip.CRC32 on Intel platforms supporting vector pclmulqdq. An enhanced algorithm is needed which takes into account latency and throughput of vector pclmulqdq. For now, we need to backout the changes. JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 Webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.00/ Please review and approve. Best Regards, Sandhya From vladimir.kozlov at oracle.com Fri Dec 6 20:04:26 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 6 Dec 2019 12:04:26 -0800 Subject: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 In-Reply-To: References: Message-ID: Hi Sandhya, Ii is confusing after looking on this and 8200067 changes [1]. CPUID bit we check and supports_vpclmulqdq() function use 'vpclmulqdq' name but there is already such old AVX instruction [2] which is guarded by supports_clmul(). What actually 8200067 added and you are removing is evpclmulqdq instruction usage. I think you should do renaming of CPUID bit and supports_vpclmulqdq() function to reflect that and avoid confusion. Is vpclmulqdq also affected? 
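One hypothetical shape for the rename Vladimir asks for (name and composition here are assumptions for illustration, not the actual webrev.01 change) would make the predicate say what the CRC32 stub really needs, namely the EVEX-encoded form on an AVX-512 capable CPU:

    // vm_version_x86.hpp sketch, hypothetical:
    static bool supports_avx512_vpclmulqdq() {
      // CPUID advertises VPCLMULQDQ and the EVEX encoding is usable.
      return supports_vpclmulqdq() && supports_evex();
    }

In the real change the old accessor may simply be renamed rather than wrapped; the point is only that the name should match the instruction form it guards.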
Thanks, Vladimir [1] http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.02/ [2] http://hg.openjdk.java.net/jdk/jdk/file/1498cd1c98ad/src/hotspot/cpu/x86/assembler_x86.cpp#l7224 On 12/6/19 11:45 AM, Viswanathan, Sandhya wrote: > The changes introduced in 8200067 cause performance to drop for java.util.zip.CRC32 on Intel platforms supporting vector pclmulqdq. > An enhanced algorithm is needed which takes into account latency and throughput of vector pclmulqdq. For now, we need to backout the changes. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.00/ > > Please review and approve. > > Best Regards, > Sandhya > From coleen.phillimore at oracle.com Fri Dec 6 20:11:19 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 6 Dec 2019 15:11:19 -0500 Subject: [14] RFR(S) 8235438: [JVMCI] StackTraceElement::decode should use the original Method In-Reply-To: <4d81b37a-e02b-c338-9a4d-c603d2505eff@oracle.com> References: <4d81b37a-e02b-c338-9a4d-c603d2505eff@oracle.com> Message-ID: This looks good.? I'm glad to see less places that have to call version_matches(), and this awkward handling for redefinition. CDS should probably zero version after it loads the class out of the archive too, which I assume would have fixed it.? But I like the new version of the code better than the old. Also, one small nit, which made reviewing this in frames a pain: Can you split these long lines? 2722 void java_lang_StackTraceElement::decode_file_and_line(Handle java_class, InstanceKlass* holder, int version, 2723 const methodHandle& method, int bci, 2724 Symbol*& source, oop& source_file, int& line_number, TRAPS) { 2746 void java_lang_StackTraceElement::decode(const methodHandle& method, int bci, Symbol*& methodname, Symbol*& filename, int& line_number, TRAPS) { I don't see why the caller of java_lang_StackTraceElement can't get the methodname itself, and save this one output parameter. thanks, Coleen On 12/6/19 2:23 PM, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8235438/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8235438 > > This fix is prepared by Tom R. > > "The JDK14 version of StackTraceElement::decode is based on the JDK8 > code which contains mixed usages of method->constants() and > method->method_holder()->constants() assuming they point to the same > thing. In the case of anonymous methods this isn't true. Usually this > isn't a problem but if CDS is enabled the the version flag of > method->method_holder()->constants() is non-zero but version of of > method->constants() is 0 which causes the code to switch constants > pools and it reads garbage. JDK-8140685 [1] refactored this code to > remove this logic and the JVMCI version of this code should be > converted to use the same scheme." > > I tested tier1-2 and tier3-4-graal. All clean. > > Thanks, > Vladimir > > [1] https://bugs.openjdk.java.net/browse/JDK-8140685 From vladimir.kozlov at oracle.com Fri Dec 6 21:42:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 6 Dec 2019 13:42:21 -0800 Subject: [14] RFR(S) 8235438: [JVMCI] StackTraceElement::decode should use the original Method In-Reply-To: References: <4d81b37a-e02b-c338-9a4d-c603d2505eff@oracle.com> Message-ID: <44c651fd-4ee6-9e17-cfbd-6cd90f56d663@oracle.com> Thank you, Coleen On 12/6/19 12:11 PM, coleen.phillimore at oracle.com wrote: > This looks good.? 
I'm glad to see less places that have to call version_matches(), and this awkward handling for > redefinition. > > CDS should probably zero version after it loads the class out of the archive too, which I assume would have fixed it. > But I like the new version of the code better than the old. > > Also, one small nit, which made reviewing this in frames a pain: Can you split these long lines? > > 2722 void java_lang_StackTraceElement::decode_file_and_line(Handle java_class, InstanceKlass* holder, int version, > 2723 const methodHandle& method, int bci, > 2724 Symbol*& source, oop& source_file, int& line_number, TRAPS) { > > 2746 void java_lang_StackTraceElement::decode(const methodHandle& method, int bci, Symbol*& methodname, Symbol*& > filename, int& line_number, TRAPS) { Done. > > I don't see why the caller of java_lang_StackTraceElement can't get the methodname itself, and save this one output > parameter. Good suggestion. https://cr.openjdk.java.net/~kvn/8235438/webrev.01/ Thanks, Vladimir > > thanks, > Coleen > > > On 12/6/19 2:23 PM, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8235438/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8235438 >> >> This fix is prepared by Tom R. >> >> "The JDK14 version of StackTraceElement::decode is based on the JDK8 code which contains mixed usages of >> method->constants() and method->method_holder()->constants() assuming they point to the same thing. In the case of >> anonymous methods this isn't true. Usually this isn't a problem but if CDS is enabled the the version flag of >> method->method_holder()->constants() is non-zero but version of of method->constants() is 0 which causes the code to >> switch constants pools and it reads garbage. JDK-8140685 [1] refactored this code to remove this logic and the JVMCI >> version of this code should be converted to use the same scheme." >> >> I tested tier1-2 and tier3-4-graal. All clean. >> >> Thanks, >> Vladimir >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8140685 > From coleen.phillimore at oracle.com Fri Dec 6 21:47:07 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 6 Dec 2019 16:47:07 -0500 Subject: [14] RFR(S) 8235438: [JVMCI] StackTraceElement::decode should use the original Method In-Reply-To: <44c651fd-4ee6-9e17-cfbd-6cd90f56d663@oracle.com> References: <4d81b37a-e02b-c338-9a4d-c603d2505eff@oracle.com> <44c651fd-4ee6-9e17-cfbd-6cd90f56d663@oracle.com> Message-ID: <898a6ab3-63ed-8884-73dd-425358bcb53f@oracle.com> On 12/6/19 4:42 PM, Vladimir Kozlov wrote: > Thank you, Coleen > > On 12/6/19 12:11 PM, coleen.phillimore at oracle.com wrote: >> This looks good.? I'm glad to see less places that have to call >> version_matches(), and this awkward handling for redefinition. >> >> CDS should probably zero version after it loads the class out of the >> archive too, which I assume would have fixed it.? But I like the new >> version of the code better than the old. >> >> Also, one small nit, which made reviewing this in frames a pain: Can >> you split these long lines? >> >> 2722 void java_lang_StackTraceElement::decode_file_and_line(Handle >> java_class, InstanceKlass* holder, int version, >> 2723 const methodHandle& method, int bci, >> 2724 Symbol*& source, oop& source_file, int& line_number, TRAPS) { >> >> 2746 void java_lang_StackTraceElement::decode(const methodHandle& >> method, int bci, Symbol*& methodname, Symbol*& filename, int& >> line_number, TRAPS) { > > Done. 
> >> >> I don't see why the caller of java_lang_StackTraceElement can't get >> the methodname itself, and save this one output parameter. > > Good suggestion. > > https://cr.openjdk.java.net/~kvn/8235438/webrev.01/ Yes, that makes is clear which functions want which output parameters. Looks good - thanks, Coleen > > Thanks, > Vladimir > >> >> thanks, >> Coleen >> >> >> On 12/6/19 2:23 PM, Vladimir Kozlov wrote: >>> https://cr.openjdk.java.net/~kvn/8235438/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8235438 >>> >>> This fix is prepared by Tom R. >>> >>> "The JDK14 version of StackTraceElement::decode is based on the JDK8 >>> code which contains mixed usages of method->constants() and >>> method->method_holder()->constants() assuming they point to the same >>> thing. In the case of anonymous methods this isn't true. Usually >>> this isn't a problem but if CDS is enabled the the version flag of >>> method->method_holder()->constants() is non-zero but version of of >>> method->constants() is 0 which causes the code to switch constants >>> pools and it reads garbage. JDK-8140685 [1] refactored this code to >>> remove this logic and the JVMCI version of this code should be >>> converted to use the same scheme." >>> >>> I tested tier1-2 and tier3-4-graal. All clean. >>> >>> Thanks, >>> Vladimir >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8140685 >> From vladimir.kozlov at oracle.com Fri Dec 6 21:58:17 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 6 Dec 2019 13:58:17 -0800 Subject: [14] RFR(S) 8235438: [JVMCI] StackTraceElement::decode should use the original Method In-Reply-To: <898a6ab3-63ed-8884-73dd-425358bcb53f@oracle.com> References: <4d81b37a-e02b-c338-9a4d-c603d2505eff@oracle.com> <44c651fd-4ee6-9e17-cfbd-6cd90f56d663@oracle.com> <898a6ab3-63ed-8884-73dd-425358bcb53f@oracle.com> Message-ID: <9cb378b7-129c-2625-c754-c381e3ae7b91@oracle.com> Thank you, Coleen Vladimir On 12/6/19 1:47 PM, coleen.phillimore at oracle.com wrote: > > > On 12/6/19 4:42 PM, Vladimir Kozlov wrote: >> Thank you, Coleen >> >> On 12/6/19 12:11 PM, coleen.phillimore at oracle.com wrote: >>> This looks good.? I'm glad to see less places that have to call version_matches(), and this awkward handling for >>> redefinition. >>> >>> CDS should probably zero version after it loads the class out of the archive too, which I assume would have fixed >>> it.? But I like the new version of the code better than the old. >>> >>> Also, one small nit, which made reviewing this in frames a pain: Can you split these long lines? >>> >>> 2722 void java_lang_StackTraceElement::decode_file_and_line(Handle java_class, InstanceKlass* holder, int version, >>> 2723 const methodHandle& method, int bci, >>> 2724 Symbol*& source, oop& source_file, int& line_number, TRAPS) { >>> >>> 2746 void java_lang_StackTraceElement::decode(const methodHandle& method, int bci, Symbol*& methodname, Symbol*& >>> filename, int& line_number, TRAPS) { >> >> Done. >> >>> >>> I don't see why the caller of java_lang_StackTraceElement can't get the methodname itself, and save this one output >>> parameter. >> >> Good suggestion. >> >> https://cr.openjdk.java.net/~kvn/8235438/webrev.01/ > > Yes, that makes is clear which functions want which output parameters. 
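Reduced to its core, the rule behind the 8235438 fix discussed above is that every piece of decoding context comes from the Method* actually being decoded, so the constant pool whose version is examined is the one that owns the line-number information. A sketch, not the JVMCI patch itself:

    #include "oops/constantPool.hpp"
    #include "oops/method.hpp"

    // For VM-anonymous methods, method->constants() and
    // method->method_holder()->constants() can differ (notably in the
    // version field when CDS is in use), so always start from the method.
    static void decode_context(const methodHandle& method, int bci,
                               InstanceKlass*& holder, int& version, int& line) {
      holder  = method->method_holder();
      version = method->constants()->version();
      line    = method->line_number_from_bci(bci);
    }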
> > Looks good - thanks, > Coleen >> >> Thanks, >> Vladimir >> >>> >>> thanks, >>> Coleen >>> >>> >>> On 12/6/19 2:23 PM, Vladimir Kozlov wrote: >>>> https://cr.openjdk.java.net/~kvn/8235438/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8235438 >>>> >>>> This fix is prepared by Tom R. >>>> >>>> "The JDK14 version of StackTraceElement::decode is based on the JDK8 code which contains mixed usages of >>>> method->constants() and method->method_holder()->constants() assuming they point to the same thing. In the case of >>>> anonymous methods this isn't true. Usually this isn't a problem but if CDS is enabled the the version flag of >>>> method->method_holder()->constants() is non-zero but version of of method->constants() is 0 which causes the code to >>>> switch constants pools and it reads garbage. JDK-8140685 [1] refactored this code to remove this logic and the JVMCI >>>> version of this code should be converted to use the same scheme." >>>> >>>> I tested tier1-2 and tier3-4-graal. All clean. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8140685 >>> > From sandhya.viswanathan at intel.com Fri Dec 6 22:05:16 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 6 Dec 2019 22:05:16 +0000 Subject: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 In-Reply-To: References: Message-ID: Hi Vladimir, It is only AVX512 evpclmulqdq() based CRC32 that is affected. In the updated webrev, I have changed the supports_vpclmulqdq() to supports_avx512_vpclmulqdq(). JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 Updated webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.01/ Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Friday, December 06, 2019 12:04 PM To: Viswanathan, Sandhya ; hotspot compiler Subject: Re: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 Hi Sandhya, Ii is confusing after looking on this and 8200067 changes [1]. CPUID bit we check and supports_vpclmulqdq() function use 'vpclmulqdq' name but there is already such old AVX instruction [2] which is guarded by supports_clmul(). What actually 8200067 added and you are removing is evpclmulqdq instruction usage. I think you should do renaming of CPUID bit and supports_vpclmulqdq() function to reflect that and avoid confusion. Is vpclmulqdq also affected? Thanks, Vladimir [1] http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.02/ [2] http://hg.openjdk.java.net/jdk/jdk/file/1498cd1c98ad/src/hotspot/cpu/x86/assembler_x86.cpp#l7224 On 12/6/19 11:45 AM, Viswanathan, Sandhya wrote: > The changes introduced in 8200067 cause performance to drop for java.util.zip.CRC32 on Intel platforms supporting vector pclmulqdq. > An enhanced algorithm is needed which takes into account latency and throughput of vector pclmulqdq. For now, we need to backout the changes. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.00/ > > Please review and approve. > > Best Regards, > Sandhya > From vladimir.kozlov at oracle.com Fri Dec 6 22:41:16 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 6 Dec 2019 14:41:16 -0800 Subject: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 In-Reply-To: References: Message-ID: <5b72ed1e-a723-78cc-e836-758aa8481ae2@oracle.com> Good. Do you want to backport this into 11u too? 
Thanks, Vladimir On 12/6/19 2:05 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > It is only AVX512 evpclmulqdq() based CRC32 that is affected. > In the updated webrev, I have changed the supports_vpclmulqdq() to supports_avx512_vpclmulqdq(). > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 > Updated webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.01/ > > Best Regards, > Sandhya > > -----Original Message----- > From: Vladimir Kozlov > Sent: Friday, December 06, 2019 12:04 PM > To: Viswanathan, Sandhya ; hotspot compiler > Subject: Re: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 > > Hi Sandhya, > > Ii is confusing after looking on this and 8200067 changes [1]. > CPUID bit we check and supports_vpclmulqdq() function use 'vpclmulqdq' name but there is already such old AVX instruction [2] which is guarded by supports_clmul(). What actually 8200067 added and you are removing is evpclmulqdq instruction usage. > I think you should do renaming of CPUID bit and supports_vpclmulqdq() function to reflect that and avoid confusion. > > Is vpclmulqdq also affected? > > Thanks, > Vladimir > > [1] http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.02/ > [2] http://hg.openjdk.java.net/jdk/jdk/file/1498cd1c98ad/src/hotspot/cpu/x86/assembler_x86.cpp#l7224 > > On 12/6/19 11:45 AM, Viswanathan, Sandhya wrote: >> The changes introduced in 8200067 cause performance to drop for java.util.zip.CRC32 on Intel platforms supporting vector pclmulqdq. >> An enhanced algorithm is needed which takes into account latency and throughput of vector pclmulqdq. For now, we need to backout the changes. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 >> Webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.00/ >> >> Please review and approve. >> >> Best Regards, >> Sandhya >> From sandhya.viswanathan at intel.com Fri Dec 6 23:02:10 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 6 Dec 2019 23:02:10 +0000 Subject: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 In-Reply-To: <5b72ed1e-a723-78cc-e836-758aa8481ae2@oracle.com> References: <5b72ed1e-a723-78cc-e836-758aa8481ae2@oracle.com> Message-ID: Yes I want to backport this to 11u too. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Friday, December 06, 2019 2:41 PM To: Viswanathan, Sandhya ; hotspot compiler Subject: Re: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 Good. Do you want to backport this into 11u too? Thanks, Vladimir On 12/6/19 2:05 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > It is only AVX512 evpclmulqdq() based CRC32 that is affected. > In the updated webrev, I have changed the supports_vpclmulqdq() to supports_avx512_vpclmulqdq(). > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 > Updated webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.01/ > > Best Regards, > Sandhya > > -----Original Message----- > From: Vladimir Kozlov > Sent: Friday, December 06, 2019 12:04 PM > To: Viswanathan, Sandhya ; hotspot compiler > Subject: Re: RFR(XXS): 8235510 : java.util.zip.CRC32 performance drop after 8200067 > > Hi Sandhya, > > Ii is confusing after looking on this and 8200067 changes [1]. > CPUID bit we check and supports_vpclmulqdq() function use 'vpclmulqdq' name but there is already such old AVX instruction [2] which is guarded by supports_clmul(). What actually 8200067 added and you are removing is evpclmulqdq instruction usage. 
> I think you should do renaming of CPUID bit and supports_vpclmulqdq() function to reflect that and avoid confusion. > > Is vpclmulqdq also affected? > > Thanks, > Vladimir > > [1] http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.02/ > [2] http://hg.openjdk.java.net/jdk/jdk/file/1498cd1c98ad/src/hotspot/cpu/x86/assembler_x86.cpp#l7224 > > On 12/6/19 11:45 AM, Viswanathan, Sandhya wrote: >> The changes introduced in 8200067 cause performance to drop for java.util.zip.CRC32 on Intel platforms supporting vector pclmulqdq. >> An enhanced algorithm is needed which takes into account latency and throughput of vector pclmulqdq. For now, we need to backout the changes. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8235510 >> Webrev: http://cr.openjdk.java.net/~sviswanathan/8235510/webrev.00/ >> >> Please review and approve. >> >> Best Regards, >> Sandhya >> From john.r.rose at oracle.com Sat Dec 7 01:42:24 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 6 Dec 2019 17:42:24 -0800 Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in Thread::currentThread() intrinsic In-Reply-To: References: Message-ID: Looks good, although it?s a little more complicated to read. It makes me wonder what?s wrong with the various overloadings of GraphKit::make_load, that you have to open-code a special call. If this happens a lot, we should try to figure out a new way to call GK::make_load. I would have thought there would be an adr_idx value for C->immutable_memory! Feeding from immutable_memory will probably be a good thing to do in more circumstances as we get more reliably immutable data. ? John > On Dec 5, 2019, at 4:43 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/8235143/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8235143 > > Thread::currentThread() intrinsic doesn't need memory state: > though multiple threads can execute same code, "current thread" can't change in the context of a single method activation. So, once it is observed, it's safe to share among all users. > > One of the use cases which benefit a lot from such optimization is ownership checks for thread confined resources (fast path check for owner thread to avoid heavy-weight synchronization). > > The patch was part of foreign-memaccess branch in Project Panama and showed good performance results on Memory Access API implementation [1]. > > Testing: tier1-4 > > PS: the optimization should be disabled in Project Loom: the assumption doesn't hold for continuations (in their current form). > > Best regards, > Vladimir Ivanov > > [1] https://openjdk.java.net/jeps/370 > JEP 370: Foreign-Memory Access API (Incubator) From john.r.rose at oracle.com Sat Dec 7 02:09:24 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 6 Dec 2019 18:09:24 -0800 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> Message-ID: <86F63579-C1DB-469D-8E8F-E8017C28D342@oracle.com> On Nov 29, 2019, at 7:42 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8226411 That looks good to me. I wonder if there is a similar opportunity in LibraryCallKit::inline_unsafe_load_store or inline_vector_mem_operation or inline_unsafe_copyMemory. All of those also form unsafe addresses, and at least some seem to mention NULL_PTR or IN_HEAP. 
Is it worth an explanatory comment or a tracking bug? ? John From navy.xliu at gmail.com Sat Dec 7 04:26:16 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Fri, 6 Dec 2019 20:26:16 -0800 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose In-Reply-To: <7176592d-4784-0797-4457-463773464536@oracle.com> References: <7176592d-4784-0797-4457-463773464536@oracle.com> Message-ID: hi, Tobias, Thank you for reviewing it. I add a regression test about it. Could you take a look? https://cr.openjdk.java.net/~xliu/8235383/01/webrev/ thanks, --lx On Fri, Dec 6, 2019 at 12:24 AM Tobias Hartmann wrote: > Hi Liu, > > your fix looks good to me but could you please add a regression test? > > Thanks, > Tobias > > On 06.12.19 08:23, Liu Xin wrote: > > Hi, Reviewers, > > > > Could you review this very simple bugfix for C1? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 > > Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ > > > > The root cause is some instructions are going to be eliminated, so they > are > > not assigned to any valid bci. > > In present of -XX:+PrintIRDuringConstruction -XX:+Verbose, C1 will > print > > them out and then hit the assert. > > > > Yes, I can twiddle graph_builder to assign right BCIs to them, but I > would > > like to have a more robust InstructionPrinter::print_line. the CR will > > leave blanks in the position of bci. > > Eliminated store for object 0: > > . 0 a67 a58._24 := a54 (L) next > > Eliminated load: > > . 0 i35 a11._24 (I) position > > > > thanks, > > --lx > > > From claes.redestad at oracle.com Sun Dec 8 21:44:53 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Sun, 8 Dec 2019 22:44:53 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel Message-ID: Hi, increasing MaxInlineLevel can substantially improve performance in some benchmarks[1], and has been reported to help applications implemented in scala in particular. There is always some risk of regressions when tweaking the default inlining settings. I've done a number of experiments to ascertain that the effect of increasing this on a wide array of benchmarks. With 15 all benchmarks tested are show either neutral or positive results, with no observed regression w.r.t. compilation speed or code cache usage. Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ Thanks! /Claes [1] One http://renaissance.dev sub-benchmark improve by almost 3x with an increase from 9 to 15. From navy.xliu at gmail.com Sun Dec 8 23:23:21 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Sun, 8 Dec 2019 15:23:21 -0800 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: Hi, Claes, If you increase MaxInlineLevel from 9 to 15, don't you need adjust initial and maximal size of CodeCache? Could you elaborate 3x sub-benchmark? Is it java or scala? thanks, --lx On Sun, Dec 8, 2019 at 1:42 PM Claes Redestad wrote: > Hi, > > increasing MaxInlineLevel can substantially improve performance in some > benchmarks[1], and has been reported to help applications implemented in > scala in particular. > > There is always some risk of regressions when tweaking the default > inlining settings. I've done a number of experiments to ascertain that > the effect of increasing this on a wide array of benchmarks. With 15 all > benchmarks tested are show either neutral or positive results, with no > observed regression w.r.t. compilation speed or code cache usage. 
> > Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ > > Thanks! > > /Claes > > [1] One http://renaissance.dev sub-benchmark improve by almost 3x with > an increase from 9 to 15. > From claes.redestad at oracle.com Mon Dec 9 00:22:04 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 9 Dec 2019 01:22:04 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: <52e0eae4-1df4-bcee-af59-c889d0d25f53@oracle.com> Hi lx, On 2019-12-09 00:23, Liu Xin wrote: > Hi, Claes, > > If you increase MaxInlineLevel from 9 to 15, don't you need adjust > initial and maximal size of CodeCache? Not necessarily: yes, more aggressive inlining may lead to larger nmethods and less sharing overall, which *could* mean we need more code cache space over time. But it may also mean more aggressive dead code elimination, which can mean a net reduction in code cache utilization for some applications. For the applications and benchmarks we've tested, code cache utilization is largely the same. And while we can't rule out that some applications may see regressions, data strongly suggests a more aggressive setting will be better on average for most. > Could you elaborate?3x sub-benchmark? Is it java or scala? IIRC scala-kmeans was the sub-benchmark that saw the largest improvement. /Claes From vladimir.kozlov at oracle.com Mon Dec 9 00:52:42 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sun, 8 Dec 2019 16:52:42 -0800 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> Nice finding! Good. Thanks, Vladimir On 12/8/19 1:44 PM, Claes Redestad wrote: > Hi, > > increasing MaxInlineLevel can substantially improve performance in some > benchmarks[1], and has been reported to help applications implemented in > scala in particular. > > There is always some risk of regressions when tweaking the default > inlining settings. I've done a number of experiments to ascertain that > the effect of increasing this on a wide array of benchmarks. With 15 all > benchmarks tested are show either neutral or positive results, with no > observed regression w.r.t. compilation speed or code cache usage. > > Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ > > Thanks! > > /Claes > > [1] One http://renaissance.dev sub-benchmark improve by almost 3x with > an increase from 9 to 15. From navy.xliu at gmail.com Mon Dec 9 05:19:19 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Sun, 8 Dec 2019 21:19:19 -0800 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <52e0eae4-1df4-bcee-af59-c889d0d25f53@oracle.com> References: <52e0eae4-1df4-bcee-af59-c889d0d25f53@oracle.com> Message-ID: hi, Claes, Glad to know that bigger scope for optimizations can offset the bigger methods. Thank you for the clarification. thanks --lx On Sun, Dec 8, 2019 at 4:19 PM Claes Redestad wrote: > Hi lx, > > On 2019-12-09 00:23, Liu Xin wrote: > > Hi, Claes, > > > > If you increase MaxInlineLevel from 9 to 15, don't you need adjust > > initial and maximal size of CodeCache? > > Not necessarily: yes, more aggressive inlining may lead to larger > nmethods and less sharing overall, which *could* mean we need more code > cache space over time. But it may also mean more aggressive dead code > elimination, which can mean a net reduction in code cache utilization > for some applications. 
> > For the applications and benchmarks we've tested, code cache utilization > is largely the same. And while we can't rule out that some applications > may see regressions, data strongly suggests a more aggressive setting > will be better on average for most. > > > Could you elaborate 3x sub-benchmark? Is it java or scala? > > IIRC scala-kmeans was the sub-benchmark that saw the largest > improvement. > > /Claes > From tobias.hartmann at oracle.com Mon Dec 9 07:15:22 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 9 Dec 2019 08:15:22 +0100 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose In-Reply-To: References: <7176592d-4784-0797-4457-463773464536@oracle.com> Message-ID: <83d26c83-d39c-3381-f3d4-8f6ecab847fa@oracle.com> Hi Liu, thanks for adding the test. We try to avoid bug ids as test names. In this case, I would suggest something like TestPrintIRDuringConstruction with a compiler.c1 package declaration. Also, the first line of the copyright header looks wrong (should look like this [1]). line 27-29: I don't think you need these lines line 42: "initiail" -> "initial" Thanks, Tobias [1] http://hg.openjdk.java.net/jdk/jdk/raw-file/fb39a8d1d101/test/hotspot/jtreg/compiler/c1/TestGotoIfMain.java On 07.12.19 05:26, Liu Xin wrote: > hi, Tobias,? > > Thank you for reviewing it. I add a regression test about it. Could you take a look? > https://cr.openjdk.java.net/~xliu/8235383/01/webrev/ > > thanks, > > --lx > > > > On Fri, Dec 6, 2019 at 12:24 AM Tobias Hartmann > wrote: > > Hi Liu, > > your fix looks good to me but could you please add a regression test? > > Thanks, > Tobias > > On 06.12.19 08:23, Liu Xin wrote: > > Hi, Reviewers, > > > > Could you review this very simple bugfix for C1? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 > > Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ > > > > The root cause is some instructions are going to be eliminated, so they are > > not assigned to any valid bci. > > In present of? -XX:+PrintIRDuringConstruction -XX:+Verbose,? C1 will print > > them out and then hit the assert. > > > > Yes, I can twiddle graph_builder to assign right BCIs to them,? but I would > > like to have a more robust InstructionPrinter::print_line. the CR will > > leave blanks in the position of bci. > > Eliminated store for object 0: > > .? ? ? 0? ? a67? ? a58._24 := a54 (L) next > > Eliminated load: > > .? ? ? 0? ? i35? ? a11._24 (I) position > > > > thanks, > > --lx > > > From navy.xliu at gmail.com Mon Dec 9 08:17:41 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Mon, 9 Dec 2019 00:17:41 -0800 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose In-Reply-To: <83d26c83-d39c-3381-f3d4-8f6ecab847fa@oracle.com> References: <7176592d-4784-0797-4457-463773464536@oracle.com> <83d26c83-d39c-3381-f3d4-8f6ecab847fa@oracle.com> Message-ID: Hi, Tobias, Thanks for your feedback. Here is the new webrev. I update what you pointed out. https://cr.openjdk.java.net/~xliu/8235383/02/webrev/ The patch passed hotspot-tier1 for both fastdebug and release builds. thanks, --lx On Sun, Dec 8, 2019 at 11:15 PM Tobias Hartmann wrote: > Hi Liu, > > thanks for adding the test. > > We try to avoid bug ids as test names. In this case, I would suggest > something like > TestPrintIRDuringConstruction with a compiler.c1 package declaration. > Also, the first line of the > copyright header looks wrong (should look like this [1]). 
> > line 27-29: I don't think you need these lines > line 42: "initiail" -> "initial" > > Thanks, > Tobias > > [1] > > http://hg.openjdk.java.net/jdk/jdk/raw-file/fb39a8d1d101/test/hotspot/jtreg/compiler/c1/TestGotoIfMain.java > > On 07.12.19 05:26, Liu Xin wrote: > > hi, Tobias, > > > > Thank you for reviewing it. I add a regression test about it. Could you > take a look? > > https://cr.openjdk.java.net/~xliu/8235383/01/webrev/ > > > > thanks, > > > > --lx > > > > > > > > On Fri, Dec 6, 2019 at 12:24 AM Tobias Hartmann < > tobias.hartmann at oracle.com > > > wrote: > > > > Hi Liu, > > > > your fix looks good to me but could you please add a regression test? > > > > Thanks, > > Tobias > > > > On 06.12.19 08:23, Liu Xin wrote: > > > Hi, Reviewers, > > > > > > Could you review this very simple bugfix for C1? > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 > > > Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ > > > > > > The root cause is some instructions are going to be eliminated, so > they are > > > not assigned to any valid bci. > > > In present of -XX:+PrintIRDuringConstruction -XX:+Verbose, C1 > will print > > > them out and then hit the assert. > > > > > > Yes, I can twiddle graph_builder to assign right BCIs to them, > but I would > > > like to have a more robust InstructionPrinter::print_line. the CR > will > > > leave blanks in the position of bci. > > > Eliminated store for object 0: > > > . 0 a67 a58._24 := a54 (L) next > > > Eliminated load: > > > . 0 i35 a11._24 (I) position > > > > > > thanks, > > > --lx > > > > > > From rahul.v.raghavan at oracle.com Mon Dec 9 08:38:03 2019 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Mon, 9 Dec 2019 14:08:03 +0530 Subject: [14]RFR: 8233453: MLVM deoptimize stress test timed out In-Reply-To: <6a887d8a-8843-bd21-857d-1d41531738aa@oracle.com> References: <3322718b-5097-d418-e00d-2ca4593136bb@oracle.com> <2B01486C-0BBC-48DB-BBA0-D2CD67D5A1DD@oracle.com> <6a887d8a-8843-bd21-857d-1d41531738aa@oracle.com> Message-ID: Thank Igor, Tobias. On 06/12/19 10:53 pm, Tobias Hartmann wrote: > +1 > > Best regards, > Tobias > > On 06.12.19 18:12, Igor Ignatyev wrote: >> Hi Rahul, >> >> the fix sound reasonable to me. >> >> Thanks, >> Igor >> >>> On Dec 6, 2019, at 6:20 AM, Rahul Raghavan wrote: >>> >>> Hi, >>> >>> Please review the following fix changeset. >>> >>> - http://cr.openjdk.java.net/~rraghavan/8233453/webrev.00/ >>> >>> # https://bugs.openjdk.java.net/browse/JDK-8233453 >>> # test-[open/test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] >>> >>> Issue is intermittent timeout failures of the test; >>> mostly with solaris-sparc and sometimes with windows-x64. >>> >>> The proposed fix here is to increase the timeout factor for the test. >>> [test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java] >>> - * @run main/othervm >>> + * @run main/othervm/timeout=300 >>> ....... >>> - * @run main/othervm >>> + * @run main/othervm/timeout=300 >>> >>> For some timeout failure cases of the test run, logs got generated with extra thread dump info. >>> e.g.: one attached to JBS page - >>> https://bugs.openjdk.java.net/secure/attachment/85805/Test_id0--JDK13.jtr >>> But could not find any hint of blocking threads or deadlocks from the analysis of these thread dumps. >>> Also instead it seems as if the timeout factor for the test is less >>> and that test just need more time to execute. >>> Got no failures with repeat test run after above proposed fix, also supports this. 
>>> >>> >>> Also checking the comments, fix done for old - >>> - https://bugs.openjdk.java.net/browse/JDK-8212028 >>> (Use run-test makefile framework for testing in Oracle's Mach5) >>> - http://hg.openjdk.java.net/jdk/jdk/rev/28375a1de254 >>> Found that similar '/timeout=300' addition was done for some '../vmTestbase/vm/mlvm/..' tests. >>> >>> >>> It seems the intermittent timeouts for the test started with jdk-13+17, with fix changeset of - >>> https://bugs.openjdk.java.net/browse/JDK-8221393 >>> ResolvedMethodTable too small for StackWalking applications >>> (could not get any timeout failure for the test with sources updated to jdk-13+16 or updated to changeset version just before 8221393 fix.) >>> Opened a separate task - JDK-8235485 - to check if this slowdown is expected. >>> >>> >>> Not getting any timeout failure, for repeat test run with above fix proposal. >>> >>> >>> Thanks, >>> Rahul >> From claes.redestad at oracle.com Mon Dec 9 10:15:04 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 9 Dec 2019 11:15:04 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> References: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> Message-ID: <27846f7b-c7c5-fe2c-cb6f-052be3c8c376@oracle.com> Thanks for the review! /Claes On 2019-12-09 01:52, Vladimir Kozlov wrote: > Nice finding! Good. > > Thanks, > Vladimir > > On 12/8/19 1:44 PM, Claes Redestad wrote: >> Hi, >> >> increasing MaxInlineLevel can substantially improve performance in some >> benchmarks[1], and has been reported to help applications implemented in >> scala in particular. >> >> There is always some risk of regressions when tweaking the default >> inlining settings. I've done a number of experiments to ascertain that >> the effect of increasing this on a wide array of benchmarks. With 15 all >> benchmarks tested are show either neutral or positive results, with no >> observed regression w.r.t. compilation speed or code cache usage. >> >> Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ >> >> Thanks! >> >> /Claes >> >> [1] One http://renaissance.dev sub-benchmark improve by almost 3x with >> an increase from 9 to 15. From martin.doerr at sap.com Mon Dec 9 11:16:30 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 9 Dec 2019 11:16:30 +0000 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <27846f7b-c7c5-fe2c-cb6f-052be3c8c376@oracle.com> References: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> <27846f7b-c7c5-fe2c-cb6f-052be3c8c376@oracle.com> Message-ID: Hi, I think tuning inlining makes sense for C2. The problem I see is that C1 uses the same inlining flags. C1 doesn't have the concept of uncommon traps so it compiles all paths. That's why I think this change is only good for C2. So I suggest separating C1 inlining flags before tuning them for C2 like in the example below (note that new flags require CSR). Feedback for this idea is welcome. 
Best regards, Martin diff -r 45fceff98bb5 src/hotspot/share/c1/c1_globals.hpp --- a/src/hotspot/share/c1/c1_globals.hpp Mon Dec 09 10:26:41 2019 +0100 +++ b/src/hotspot/share/c1/c1_globals.hpp Mon Dec 09 12:08:37 2019 +0100 @@ -174,6 +174,12 @@ develop_pd(bool, RoundFPResults, \ "Indicates whether rounding is needed for floating point results")\ \ + product(intx, C1MaxInlineSize, 35, \ + "The maximum bytecode size of a method to be inlined by C1") \ + product(intx, C1MaxInlineLevel, 9, \ + "The maximum number of nested calls that are inlined by C1") \ + product(intx, C1MaxRecursiveInlineLevel, 1, \ + "maximum number of nested recursive calls that are inlined") \ develop(intx, NestedInliningSizeRatio, 90, \ "Percentage of prev. allowed inline size in recursive inlining") \ range(0, 100) \ > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Claes Redestad > Sent: Montag, 9. Dezember 2019 11:15 > To: Vladimir Kozlov ; hotspot compiler > > Subject: Re: RFR: 8234863: Increase default value of MaxInlineLevel > > Thanks for the review! > > /Claes > > On 2019-12-09 01:52, Vladimir Kozlov wrote: > > Nice finding! Good. > > > > Thanks, > > Vladimir > > > > On 12/8/19 1:44 PM, Claes Redestad wrote: > >> Hi, > >> > >> increasing MaxInlineLevel can substantially improve performance in some > >> benchmarks[1], and has been reported to help applications implemented > in > >> scala in particular. > >> > >> There is always some risk of regressions when tweaking the default > >> inlining settings. I've done a number of experiments to ascertain that > >> the effect of increasing this on a wide array of benchmarks. With 15 all > >> benchmarks tested are show either neutral or positive results, with no > >> observed regression w.r.t. compilation speed or code cache usage. > >> > >> Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ > >> > >> Thanks! > >> > >> /Claes > >> > >> [1] One http://renaissance.dev sub-benchmark improve by almost 3x > with > >> an increase from 9 to 15. From claes.redestad at oracle.com Mon Dec 9 12:44:19 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 9 Dec 2019 13:44:19 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> <27846f7b-c7c5-fe2c-cb6f-052be3c8c376@oracle.com> Message-ID: <67699754-ea63-6e49-905d-39ae0b04ad38@oracle.com> Hi, Nils raised this issue in the bug, and while I think it's a fair point I think it's orthogonal to whether or not we can/should tune the default here and now. I think the data we have speaks in favor of doing this tuning for JDK 14, and then re-evaluate with more real-world data for JDK 15, possibly dialing back the C1 defaults. When implementing such flags we can also evaluate if C1 should be even more conservative than it is today. It's also worth thinking about whether or not we should introduce different settings for C1 level 1, 2 and 3.. /Claes On 2019-12-09 12:16, Doerr, Martin wrote: > Hi, > > I think tuning inlining makes sense for C2. > > The problem I see is that C1 uses the same inlining flags. > C1 doesn't have the concept of uncommon traps so it compiles all paths. > That's why I think this change is only good for C2. > > So I suggest separating C1 inlining flags before tuning them for C2 like in the example below (note that new flags require CSR). > Feedback for this idea is welcome. 
> > Best regards, > Martin > > > > diff -r 45fceff98bb5 src/hotspot/share/c1/c1_globals.hpp > --- a/src/hotspot/share/c1/c1_globals.hpp Mon Dec 09 10:26:41 2019 +0100 > +++ b/src/hotspot/share/c1/c1_globals.hpp Mon Dec 09 12:08:37 2019 +0100 > @@ -174,6 +174,12 @@ > develop_pd(bool, RoundFPResults, \ > "Indicates whether rounding is needed for floating point results")\ > \ > + product(intx, C1MaxInlineSize, 35, \ > + "The maximum bytecode size of a method to be inlined by C1") \ > + product(intx, C1MaxInlineLevel, 9, \ > + "The maximum number of nested calls that are inlined by C1") \ > + product(intx, C1MaxRecursiveInlineLevel, 1, \ > + "maximum number of nested recursive calls that are inlined") \ > develop(intx, NestedInliningSizeRatio, 90, \ > "Percentage of prev. allowed inline size in recursive inlining") \ > range(0, 100) \ > > > > >> -----Original Message----- >> From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Claes Redestad >> Sent: Montag, 9. Dezember 2019 11:15 >> To: Vladimir Kozlov ; hotspot compiler >> >> Subject: Re: RFR: 8234863: Increase default value of MaxInlineLevel >> >> Thanks for the review! >> >> /Claes >> >> On 2019-12-09 01:52, Vladimir Kozlov wrote: >>> Nice finding! Good. >>> >>> Thanks, >>> Vladimir >>> >>> On 12/8/19 1:44 PM, Claes Redestad wrote: >>>> Hi, >>>> >>>> increasing MaxInlineLevel can substantially improve performance in some >>>> benchmarks[1], and has been reported to help applications implemented >> in >>>> scala in particular. >>>> >>>> There is always some risk of regressions when tweaking the default >>>> inlining settings. I've done a number of experiments to ascertain that >>>> the effect of increasing this on a wide array of benchmarks. With 15 all >>>> benchmarks tested are show either neutral or positive results, with no >>>> observed regression w.r.t. compilation speed or code cache usage. >>>> >>>> Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ >>>> >>>> Thanks! >>>> >>>> /Claes >>>> >>>> [1] One http://renaissance.dev sub-benchmark improve by almost 3x >> with >>>> an increase from 9 to 15. From gromero at linux.vnet.ibm.com Mon Dec 9 13:10:29 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 9 Dec 2019 10:10:29 -0300 Subject: RFR(XS): 8223968: Add abort type description to RTM statistic counters Message-ID: Hi, Could the following change be reviewed please? Bug : https://bugs.openjdk.java.net/browse/JDK-8223968 Webrev: http://cr.openjdk.java.net/~gromero/8223968/v1/ It simply adds a description to the RTM abort counters, helping to understand the RTM statistics output faster and better. The change touches RTM shared code and also RTM tests in order to adapt the regex to the new output format. Thanks a lot. Best regards, Gustavo From tobias.hartmann at oracle.com Mon Dec 9 13:59:23 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 9 Dec 2019 14:59:23 +0100 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class Message-ID: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8235452 http://cr.openjdk.java.net/~thartmann/8235452/webrev.00/ We crash during loop verification because a strip mined loop lost its OuterStripMined loop counterpart. Removal of the outer loop is triggered by the new optimization added with JDK-8220376 [1] but it can probably happen in other circumstances as well. 
The strip mined loop also lost its CountedLoopEnd and is therefore malformed. The detailed steps of how this happens are described in the test: http://cr.openjdk.java.net/~thartmann/8235452/webrev.00/test/hotspot/jtreg/compiler/loopstripmining/TestDeadOuterStripMinedLoop.java.html The fix is to not try to verify strip mining if the strip mined loop is malformed. I've added an assert to check that the OuterStripMinedLoop is always removed in this case. The test also triggers a crash in the UseProfiledLoopPredicate code added by JDK-8203197 [2]. I've slightly modified the code in loopPredicate.cpp to only execute PathFrequency::to if the projection matches the uncommon trap if pattern and filed JDK-8235584 [3] to investigate in detail. Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8220376 [2] https://bugs.openjdk.java.net/browse/JDK-8203197 [3] https://bugs.openjdk.java.net/browse/JDK-8235584 From rwestrel at redhat.com Mon Dec 9 14:05:42 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 09 Dec 2019 15:05:42 +0100 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class In-Reply-To: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> References: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> Message-ID: <8736dtbnm1.fsf@redhat.com> > http://cr.openjdk.java.net/~thartmann/8235452/webrev.00/ For OuterStripMinedLoopNode, is_valid_counted_loop() is false so the verification never runs when called from a OuterStripMinedLoopNode and the new line you added: 958 } else if (is_OuterStripMinedLoop()) { 959 outer = this->as_OuterStripMinedLoop(); 960 inner = outer->unique_ctrl_out()->as_CountedLoop(); 961 assert(inner->is_valid_counted_loop(), "OuterStripMinedLoop should have been removed"); 962 assert(!is_strip_mined(), "outer loop shouldn't be marked strip mined"); 963 } is unreachable. Roland. From martin.doerr at sap.com Mon Dec 9 14:06:32 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 9 Dec 2019 14:06:32 +0000 Subject: RFR(XS): 8223968: Add abort type description to RTM statistic counters In-Reply-To: References: Message-ID: Hi Gustavo, nice improvement! I'd only make the message array static: static const char* _abortX_desc[ABORT_STATUS_LIMIT]; +const char* RTMLockingCounters::_abortX_desc[ABORT_STATUS_LIMIT] = { + "abort instruction ", + "may succeed on retry", + "thread conflict ", + "buffer overflow ", + "debug or trap hit ", + "maximum nested depth" +}; Please update the copyright in the test. Best regards, Martin > -----Original Message----- > From: Gustavo Romero > Sent: Montag, 9. Dezember 2019 14:10 > To: hotspot-compiler-dev at openjdk.java.net > Cc: Doerr, Martin ; vladimir.kozlov at oracle.com > Subject: RFR(XS): 8223968: Add abort type description to RTM statistic > counters > > Hi, > > Could the following change be reviewed please? > > Bug : https://bugs.openjdk.java.net/browse/JDK-8223968 > Webrev: http://cr.openjdk.java.net/~gromero/8223968/v1/ > > It simply adds a description to the RTM abort counters, helping to understand > the RTM statistics output faster and better. > > The change touches RTM shared code and also RTM tests in order to adapt > the > regex to the new output format. > > Thanks a lot. 
> > Best regards, > Gustavo From tobias.hartmann at oracle.com Mon Dec 9 14:13:08 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 9 Dec 2019 15:13:08 +0100 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class In-Reply-To: <8736dtbnm1.fsf@redhat.com> References: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> <8736dtbnm1.fsf@redhat.com> Message-ID: <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> Hi Roland, thanks for looking at this! On 09.12.19 15:05, Roland Westrelin wrote: > For OuterStripMinedLoopNode, is_valid_counted_loop() is false so the > verification never runs when called from a OuterStripMinedLoopNode and > the new line you added: > > 958 } else if (is_OuterStripMinedLoop()) { > 959 outer = this->as_OuterStripMinedLoop(); > 960 inner = outer->unique_ctrl_out()->as_CountedLoop(); > 961 assert(inner->is_valid_counted_loop(), "OuterStripMinedLoop should have been removed"); > 962 assert(!is_strip_mined(), "outer loop shouldn't be marked strip mined"); > 963 } > > is unreachable. Right. What about this? http://cr.openjdk.java.net/~thartmann/8235452/webrev.01/ Thanks, Tobias From rwestrel at redhat.com Mon Dec 9 14:18:27 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 09 Dec 2019 15:18:27 +0100 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: References: <871ru2diu9.fsf@redhat.com> <8736e4ayfa.fsf@redhat.com> Message-ID: <87wob5a8gc.fsf@redhat.com> Hi Martin, Thanks for reviewing this. > 1. mode != IgnoreStripMined at the beginning > 2. (mode == ControlAroundStripMined && use == sfpt) ||!use->is_reachable_from_root() That's reasonable. I will make the suggested changes. Roland. From rwestrel at redhat.com Mon Dec 9 14:19:51 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 09 Dec 2019 15:19:51 +0100 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class In-Reply-To: <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> References: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> <8736dtbnm1.fsf@redhat.com> <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> Message-ID: <87tv69a8e0.fsf@redhat.com> > http://cr.openjdk.java.net/~thartmann/8235452/webrev.01/ Yes, good. Roland. From tobias.hartmann at oracle.com Mon Dec 9 14:30:32 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 9 Dec 2019 15:30:32 +0100 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class In-Reply-To: <87tv69a8e0.fsf@redhat.com> References: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> <8736dtbnm1.fsf@redhat.com> <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> <87tv69a8e0.fsf@redhat.com> Message-ID: Thanks Roland. Best regards, Tobias On 09.12.19 15:19, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~thartmann/8235452/webrev.01/ > > Yes, good. > > Roland. > From rwestrel at redhat.com Mon Dec 9 15:15:59 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 09 Dec 2019 16:15:59 +0100 Subject: [14] RFR(S): 8233032: assert(in_bb(n)) failed: must be In-Reply-To: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> References: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> Message-ID: <87k175a5sg.fsf@redhat.com> > http://cr.openjdk.java.net/~chagedorn/8233032/webrev.00/ That looks reasonable to me. 
Roland. From nils.eliasson at oracle.com Mon Dec 9 15:20:46 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 9 Dec 2019 16:20:46 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <67699754-ea63-6e49-905d-39ae0b04ad38@oracle.com> References: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> <27846f7b-c7c5-fe2c-cb6f-052be3c8c376@oracle.com> <67699754-ea63-6e49-905d-39ae0b04ad38@oracle.com> Message-ID: Hi, For the record - I agree with Claes point. There are a lot of other things that will be nice to evaluate in a later release, but given that the data for this change looks good - I think we show go ahead. Reviewed, Nils On 2019-12-09 13:44, Claes Redestad wrote: > Hi, > > Nils raised this issue in the bug, and while I think it's a fair point I > think it's orthogonal to whether or not we can/should tune the default > here and now. I think the data we have speaks in favor of doing this > tuning for JDK 14, and then re-evaluate with more real-world data for > JDK 15, possibly dialing back the C1 defaults. > > When implementing such flags we can also evaluate if C1 should be even > more conservative than it is today. It's also worth thinking about > whether or not we should introduce different settings for C1 level 1, 2 > and 3.. > > /Claes > > On 2019-12-09 12:16, Doerr, Martin wrote: >> Hi, >> >> I think tuning inlining makes sense for C2. >> >> The problem I see is that C1 uses the same inlining flags. >> C1 doesn't have the concept of uncommon traps so it compiles all paths. >> That's why I think this change is only good for C2. >> >> So I suggest separating C1 inlining flags before tuning them for C2 >> like in the example below (note that new flags require CSR). >> Feedback for this idea is welcome. >> >> Best regards, >> Martin >> >> >> >> diff -r 45fceff98bb5 src/hotspot/share/c1/c1_globals.hpp >> --- a/src/hotspot/share/c1/c1_globals.hpp?????? Mon Dec 09 10:26:41 >> 2019 +0100 >> +++ b/src/hotspot/share/c1/c1_globals.hpp?????? Mon Dec 09 12:08:37 >> 2019 +0100 >> @@ -174,6 +174,12 @@ >> ??? develop_pd(bool, >> RoundFPResults,????????????????????????????????????????? \ >> ??????????? "Indicates whether rounding is needed for floating point >> results")\ >> \ >> +? product(intx, C1MaxInlineSize, >> 35,??????????????????????????????????????? \ >> +????????? "The maximum bytecode size of a method to be inlined by >> C1")????? \ >> +? product(intx, C1MaxInlineLevel, >> 9,??????????????????????????????????????? \ >> +????????? "The maximum number of nested calls that are inlined by >> C1")????? \ >> +? product(intx, C1MaxRecursiveInlineLevel, >> 1,?????????????????????????????? \ >> +????????? "maximum number of nested recursive calls that are >> inlined")????? \ >> ??? develop(intx, NestedInliningSizeRatio, >> 90,??????????????????????????????? \ >> ??????????? "Percentage of prev. allowed inline size in recursive >> inlining")? \ >> ??????????? range(0, >> 100)???????????????????????????????????????????????????? \ >> >> >> >> >>> -----Original Message----- >>> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Claes Redestad >>> Sent: Montag, 9. Dezember 2019 11:15 >>> To: Vladimir Kozlov ; hotspot compiler >>> >>> Subject: Re: RFR: 8234863: Increase default value of MaxInlineLevel >>> >>> Thanks for the review! >>> >>> /Claes >>> >>> On 2019-12-09 01:52, Vladimir Kozlov wrote: >>>> Nice finding! Good. 
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/8/19 1:44 PM, Claes Redestad wrote: >>>>> Hi, >>>>> >>>>> increasing MaxInlineLevel can substantially improve performance in >>>>> some >>>>> benchmarks[1], and has been reported to help applications implemented >>> in >>>>> scala in particular. >>>>> >>>>> There is always some risk of regressions when tweaking the default >>>>> inlining settings. I've done a number of experiments to ascertain >>>>> that >>>>> the effect of increasing this on a wide array of benchmarks. With >>>>> 15 all >>>>> benchmarks tested are show either neutral or positive results, >>>>> with no >>>>> observed regression w.r.t. compilation speed or code cache usage. >>>>> >>>>> Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ >>>>> >>>>> Thanks! >>>>> >>>>> /Claes >>>>> >>>>> [1] One http://renaissance.dev sub-benchmark improve by almost 3x >>> with >>>>> an increase from 9 to 15. From rwestrel at redhat.com Mon Dec 9 15:44:22 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 09 Dec 2019 16:44:22 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: References: Message-ID: <87h829a4h5.fsf@redhat.com> Hi Christian, > Before loading and copying the extra data from the MDO to the ciMDO in > ciMethodData::load_extra_data(), the metadata is prepared in a > fixed-point iteration by cleaning all SpeculativeTrapData entries of > methods whose klasses are unloaded [3]. If it encounters such a dead > entry it releases the extra data lock (due to ranking issues) and tries > again later [4]. This release of the lock triggers the bug: There can be > cases where one thread A is waiting in the whitebox API method to get > the extra data lock [2] to clean the extra data for the very same MDO > for which another thread B just released the lock at [4]. If that MDO > actually contained SpeculativeTrapData entries, then thread A cleaned > those but the ciMDO, which thread B is preparing, still contains the > uncleaned old MDO extra data (because thread B only made a snapshot of > the MDO earlier at [5]). Would it be possible to call prepare_data() before the snapshot is taken so the snapshot doesn't contain any entry that are then removed? Roland. From erik.osterlund at oracle.com Mon Dec 9 15:59:39 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 9 Dec 2019 16:59:39 +0100 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <015f0c8a-37f0-e603-e644-7a51195c4ad2@oracle.com> References: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> <015f0c8a-37f0-e603-e644-7a51195c4ad2@oracle.com> Message-ID: <7326c54d-1aef-8cf7-cc4e-01c26aed73c8@oracle.com> Hi Vladimir, On 2019-12-06 16:19, Vladimir Ivanov wrote: > Hi Erik, > > I like your idea. Here's updated version: > ? http://cr.openjdk.java.net/~vlivanov/8226411/webrev.01 Looks good! > While browsing the code, I noticed that changes in > G1BarrierSetC2::load_at_resolved() aren't required (need_cpu_mem_bar > is used for oop case). But I decided to keep them to keep it > (relatively) close to C2Access::needs_cpu_membar(). Sounds reasonable. Thanks, /Erik > Best regards, > Vladimir Ivanov > > On 05.12.2019 23:14, Erik ?sterlund wrote: >> Hi, >> >> Could we use the existing IN_NATIVE decorator instead of introducing >> a new decorator that seems to be an alias for the same thing? 
The >> decorator describing its use (IN_NATIVE) says it is for off-heap >> accesses pointing into the heap. We can just remove from the comment >> the part presuming it is a reference. >> >> What do you think? >> >> Thanks, >> /Erik >> >>> On 5 Dec 2019, at 20:16, Vladimir Kozlov >>> wrote: >>> >>> ?CCing to GC group. >>> >>> Looks fine to me but someone from GC land have to look too. >>> >>> I wish we have more concrete indication for off-heap access instead >>> of guessing it based on how we address memory through Unsafe API. >>> >>> Thanks, >>> Vladimir >>> >>>> On 11/29/19 7:42 AM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8226411 >>>> There were a number of fixes in C2 support for unsafe accesses >>>> recently which led to additional memory barriers around them. It >>>> improved stability, but in some cases it was redundant. One of >>>> important use cases which regressed is off-heap accesses [1]. The >>>> barriers around them are redundant because they are serialized on >>>> raw memory and don't intersect with any on-heap accesses. >>>> Proposed fix skips memory barriers around unsafe accesses which are >>>> provably off-heap (base == NULL). >>>> It (almost completely) recovers performance on the microbenchmark >>>> provided in JDK-8224182 [1]. >>>> Testing: tier1-6. >>>> Best regards, >>>> Vladimir Ivanov >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 >> From vladimir.x.ivanov at oracle.com Mon Dec 9 16:04:01 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 9 Dec 2019 19:04:01 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <7326c54d-1aef-8cf7-cc4e-01c26aed73c8@oracle.com> References: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> <015f0c8a-37f0-e603-e644-7a51195c4ad2@oracle.com> <7326c54d-1aef-8cf7-cc4e-01c26aed73c8@oracle.com> Message-ID: <784f3959-933a-a5c7-beb2-f011c9928ec7@oracle.com> Thanks, Erik! Best regards, Vladimir Ivanov On 09.12.2019 18:59, Erik ?sterlund wrote: > Hi Vladimir, > > On 2019-12-06 16:19, Vladimir Ivanov wrote: >> Hi Erik, >> >> I like your idea. Here's updated version: >> ? http://cr.openjdk.java.net/~vlivanov/8226411/webrev.01 > > Looks good! > >> While browsing the code, I noticed that changes in >> G1BarrierSetC2::load_at_resolved() aren't required (need_cpu_mem_bar >> is used for oop case). But I decided to keep them to keep it >> (relatively) close to C2Access::needs_cpu_membar(). > > Sounds reasonable. > > Thanks, > /Erik > >> Best regards, >> Vladimir Ivanov >> >> On 05.12.2019 23:14, Erik ?sterlund wrote: >>> Hi, >>> >>> Could we use the existing IN_NATIVE decorator instead of introducing >>> a new decorator that seems to be an alias for the same thing? The >>> decorator describing its use (IN_NATIVE) says it is for off-heap >>> accesses pointing into the heap. We can just remove from the comment >>> the part presuming it is a reference. >>> >>> What do you think? >>> >>> Thanks, >>> /Erik >>> >>>> On 5 Dec 2019, at 20:16, Vladimir Kozlov >>>> wrote: >>>> >>>> ?CCing to GC group. >>>> >>>> Looks fine to me but someone from GC land have to look too. >>>> >>>> I wish we have more concrete indication for off-heap access instead >>>> of guessing it based on how we address memory through Unsafe API. 
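For readers following along, the "how we address memory" distinction is the shape of the Unsafe call itself: on-heap accesses pass a Java object plus an offset, while off-heap accesses pass a null base together with an absolute address obtained from allocateMemory, which is the base == NULL case the proposed fix keys on. A small, hypothetical sketch using sun.misc.Unsafe (class name and sizes chosen only for illustration):

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    public class UnsafeAddressingSketch {
        public static void main(String[] args) throws Exception {
            // Reflective access to the Unsafe instance, as commonly done in test/demo code.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe u = (Unsafe) f.get(null);

            // Off-heap: base is null, the second argument is an absolute address.
            // This is the addressing shape that can be proven not to alias on-heap memory.
            long addr = u.allocateMemory(8);
            u.putLong(null, addr, 42L);
            long offHeap = u.getLong(null, addr);
            u.freeMemory(addr);

            // On-heap: base is a Java object, the offset is relative to that object.
            long[] arr = new long[1];
            long baseOffset = u.arrayBaseOffset(long[].class);
            u.putLong(arr, baseOffset, 7L);
            long onHeap = u.getLong(arr, baseOffset);

            System.out.println(offHeap + " " + onHeap);
        }
    }
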
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> On 11/29/19 7:42 AM, Vladimir Ivanov wrote: >>>>> http://cr.openjdk.java.net/~vlivanov/8226411/webrev.00/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8226411 >>>>> There were a number of fixes in C2 support for unsafe accesses >>>>> recently which led to additional memory barriers around them. It >>>>> improved stability, but in some cases it was redundant. One of >>>>> important use cases which regressed is off-heap accesses [1]. The >>>>> barriers around them are redundant because they are serialized on >>>>> raw memory and don't intersect with any on-heap accesses. >>>>> Proposed fix skips memory barriers around unsafe accesses which are >>>>> provably off-heap (base == NULL). >>>>> It (almost completely) recovers performance on the microbenchmark >>>>> provided in JDK-8224182 [1]. >>>>> Testing: tier1-6. >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8224182 >>> > From vladimir.x.ivanov at oracle.com Mon Dec 9 16:06:57 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 9 Dec 2019 19:06:57 +0300 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <86F63579-C1DB-469D-8E8F-E8017C28D342@oracle.com> References: <0b821ef8-90ec-28e3-2008-ec999af7e87a@oracle.com> <86F63579-C1DB-469D-8E8F-E8017C28D342@oracle.com> Message-ID: <0f033409-6fd8-a60b-3eac-850a0040c91e@oracle.com> Thanks for review, John. > I wonder if there is a similar opportunity in > LibraryCallKit::inline_unsafe_load_store > or inline_vector_mem_operation or?inline_unsafe_copyMemory. ?All of those > also form unsafe addresses, and at least some seem to mention NULL_PTR > or IN_HEAP. > Is it worth an explanatory comment or a tracking bug? Good point. I think inline_vector_mem_operation and inline_unsafe_copyMemory can benefit from a similar optimization. Will file an RFE. Best regards, Vladimir Ivanov From tobias.hartmann at oracle.com Mon Dec 9 16:25:22 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 9 Dec 2019 17:25:22 +0100 Subject: [14] RFR (S): 8226411: C2: Avoid memory barriers around off-heap unsafe accesses In-Reply-To: <7326c54d-1aef-8cf7-cc4e-01c26aed73c8@oracle.com> References: <7fa6918f-f645-ba25-cad3-df223eba9990@oracle.com> <015f0c8a-37f0-e603-e644-7a51195c4ad2@oracle.com> <7326c54d-1aef-8cf7-cc4e-01c26aed73c8@oracle.com> Message-ID: On 09.12.19 16:59, Erik ?sterlund wrote: >> I like your idea. Here's updated version: >> ? http://cr.openjdk.java.net/~vlivanov/8226411/webrev.01 > > Looks good! +1 Best regards, Tobias From vladimir.x.ivanov at oracle.com Mon Dec 9 16:26:01 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 9 Dec 2019 19:26:01 +0300 Subject: [14] RFR (XXS): 8235143: C2: No memory state needed in Thread::currentThread() intrinsic In-Reply-To: References: Message-ID: <1c2ed832-2915-5484-0620-5838d8377ec4@oracle.com> Thanks for review, John. > It makes me wonder what?s wrong with the various overloadings > of GraphKit::make_load, that you have to open-code a special call. > If this happens a lot, we should try to figure out a new way to call > GK::make_load. I would have thought there would be an adr_idx > value for C->immutable_memory! > > Feeding from immutable_memory will probably be a good thing to > do in more circumstances as we get more reliably immutable data. Yes, it looks appealing, but there are some cases which need special handling. 
For example, it doesn't respect instance initialization which can manifest as observing partially constructed instance. Best regards, Vladimir Ivanov >> On Dec 5, 2019, at 4:43 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/8235143/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8235143 >> >> Thread::currentThread() intrinsic doesn't need memory state: >> though multiple threads can execute same code, "current thread" can't change in the context of a single method activation. So, once it is observed, it's safe to share among all users. >> >> One of the use cases which benefit a lot from such optimization is ownership checks for thread confined resources (fast path check for owner thread to avoid heavy-weight synchronization). >> >> The patch was part of foreign-memaccess branch in Project Panama and showed good performance results on Memory Access API implementation [1]. >> >> Testing: tier1-4 >> >> PS: the optimization should be disabled in Project Loom: the assumption doesn't hold for continuations (in their current form). >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://openjdk.java.net/jeps/370 >> JEP 370: Foreign-Memory Access API (Incubator) > From sandhya.viswanathan at intel.com Mon Dec 9 18:54:32 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Mon, 9 Dec 2019 18:54:32 +0000 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> Message-ID: Hi Vladimir, It will be wonderful if we can get this checked in by Thursday deadline for JDK 14. Best Regards, Sandhya -----Original Message----- From: Vladimir Ivanov Sent: Thursday, December 05, 2019 4:01 AM To: hotspot compiler Cc: Bhateja, Jatin ; Viswanathan, Sandhya Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all https://bugs.openjdk.java.net/browse/JDK-8235405 Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers the following operations: - LoadVector - StoreVector - RoundDoubleModeV - AndV - OrV - XorV - MulAddVS2VI - PopCountVI Indiviual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/individual As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad will be posted for review separately (7 more patches pending). Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? 
Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html From tom.rodriguez at oracle.com Mon Dec 9 20:10:44 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 9 Dec 2019 12:10:44 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches Message-ID: http://cr.openjdk.java.net/~never/8229377/webrev https://bugs.openjdk.java.net/browse/JDK-8229377 This is a minor improvement to the JVMCI invalidate method to avoid scanning large code caches when invalidating a single nmethod. Instead the nmethod is directly made not_entrant. In general I'm unclear what the benefit of the mark_for_deoptimization/make_marked_nmethods_not_entrant split is. Testing is in progress. JDK-8230884 had been previously duplicated against this because they overlapped a bit, but in the interest of clarity I separated them again. tom From headius at headius.com Mon Dec 9 21:31:00 2019 From: headius at headius.com (Charles Oliver Nutter) Date: Mon, 9 Dec 2019 15:31:00 -0600 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: This gets a big +1 for me, but honestly even better would be eliminating the hard inline level limit altogether in favor of other metrics (inlined code size, optimized node count, incremental inlining). Ignoring that for the moment... Anecdotally, I can say that there are many code paths in JRuby that don't inline fully *solely* because of this limit. For example, there are objects in the numeric tower that out of compatibility necessity have constructor paths that get dangerously close to 9 levels deep: calling code-> Fixnum factory -> Fixnum constructor -> Integer -> Numeric -> Object -> BasicObject -> j.l.Object ...and that would just inline into the first level of Ruby code. The last five levels here are mostly just "super()" calls. Ideally we would want to have a few levels of Ruby code inlining together too. If this path doesn't inline, we've got no chance for escape analysis and other optimizations to reduce the transient numeric objects. A question relating to JRuby: how does the inline level interact with MethodHandle/LambdaForm @ForceInline? Does that get around it? ALWAYS? Clearly we see most of our invokedynamic call sites inline, but I have little understanding of the actual cost of the five to 10 to 20 levels of LFs between an indy site and the eventual target. In any case... +1, inline all the things. - Charlie On Sun, Dec 8, 2019 at 3:42 PM Claes Redestad wrote: > Hi, > > increasing MaxInlineLevel can substantially improve performance in some > benchmarks[1], and has been reported to help applications implemented in > scala in particular. > > There is always some risk of regressions when tweaking the default > inlining settings. I've done a number of experiments to ascertain that > the effect of increasing this on a wide array of benchmarks. With 15 all > benchmarks tested are show either neutral or positive results, with no > observed regression w.r.t. compilation speed or code cache usage. > > Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ > > Thanks! > > /Claes > > [1] One http://renaissance.dev sub-benchmark improve by almost 3x with > an increase from 9 to 15. 
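As a purely hypothetical illustration (independent of JRuby and of the benchmarks mentioned above) of how the 9 versus 15 limit shows up in practice: with the old default, the innermost calls of a chain like the one below are typically rejected, which -XX:+PrintInlining usually reports as "inlining too deep"; with -XX:MaxInlineLevel=15 the whole chain can inline. The class and method names are invented for this sketch.

    // Hypothetical sketch of a call chain deeper than the old MaxInlineLevel=9.
    // Compare, for example:
    //   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:MaxInlineLevel=9  DeepChainSketch
    //   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:MaxInlineLevel=15 DeepChainSketch
    public class DeepChainSketch {
        static int d12(int x) { return x + 1; }
        static int d11(int x) { return d12(x) + 1; }
        static int d10(int x) { return d11(x) + 1; }
        static int d9(int x)  { return d10(x) + 1; }
        static int d8(int x)  { return d9(x) + 1; }
        static int d7(int x)  { return d8(x) + 1; }
        static int d6(int x)  { return d7(x) + 1; }
        static int d5(int x)  { return d6(x) + 1; }
        static int d4(int x)  { return d5(x) + 1; }
        static int d3(int x)  { return d4(x) + 1; }
        static int d2(int x)  { return d3(x) + 1; }
        static int d1(int x)  { return d2(x) + 1; }

        public static void main(String[] args) {
            long sum = 0;
            for (int i = 0; i < 2_000_000; i++) { sum += d1(i); }  // warm up so C2 compiles d1()
            System.out.println(sum);
        }
    }
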
> From gromero at linux.vnet.ibm.com Mon Dec 9 22:05:04 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 9 Dec 2019 19:05:04 -0300 Subject: RFR(XS): 8223968: Add abort type description to RTM statistic counters In-Reply-To: References: Message-ID: <0e4e3c6c-647f-453a-3284-acc463c88e1f@linux.vnet.ibm.com> Hi Martin, On 12/09/2019 11:06 AM, Doerr, Martin wrote: > nice improvement! =) Thanks for the quick review! > I'd only make the message array static: > static const char* _abortX_desc[ABORT_STATUS_LIMIT]; > > +const char* RTMLockingCounters::_abortX_desc[ABORT_STATUS_LIMIT] = { > + "abort instruction ", > + "may succeed on retry", > + "thread conflict ", > + "buffer overflow ", > + "debug or trap hit ", > + "maximum nested depth" > +}; oh... sure. Done. > Please update the copyright in the test. Done. v2: http://cr.openjdk.java.net/~gromero/8223968/v2/ Best regards, Gustavo From vladimir.x.ivanov at oracle.com Mon Dec 9 22:11:23 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2019 01:11:23 +0300 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> Message-ID: <3c4dc007-485d-bf8b-d17a-8e63b9efefd7@oracle.com> Hi Sandhya, We need 1 more (R)eview before I can push it. PS: and 8234392 [1] which it depends on. Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-December/036291.html On 09.12.2019 21:54, Viswanathan, Sandhya wrote: > Hi Vladimir, > > It will be wonderful if we can get this checked in by Thursday deadline for JDK 14. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Ivanov > Sent: Thursday, December 05, 2019 4:01 AM > To: hotspot compiler > Cc: Bhateja, Jatin ; Viswanathan, Sandhya > Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235405 > > Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers the following > operations: > - LoadVector > - StoreVector > - RoundDoubleModeV > - AndV > - OrV > - XorV > - MulAddVS2VI > - PopCountVI > > Indiviual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/individual > > As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. > > The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad will be posted for review separately (7 more patches pending). > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > From claes.redestad at oracle.com Mon Dec 9 22:33:52 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 9 Dec 2019 23:33:52 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: Thanks for chiming in! 
On 2019-12-09 22:31, Charles Oliver Nutter wrote: > A question relating to JRuby: how does the inline level interact with > MethodHandle/LambdaForm?@ForceInline? Does that get around it? ALWAYS? > Clearly we see most of our invokedynamic call sites inline, but I have > little understanding of the actual cost of the five to 10 to 20 levels > of LFs between an indy site and the eventual target. From my reading of the code, there's a limit of 100 levels of @ForceInline. This seems very hard to ever hit, and would be observable if -XX:+PrintInlining ever printed "MaxForceInlineLevel" as a reason to not inline. AFAIU, and my understanding here is far from perfect, then long before hitting this limit even extreme LF/indy usage will have split deeply nested calls into smaller chunks that we chain together, in an effort to keep very complex shapes byte sized for the JIT and better enable some LF sharing optimizations on the library side. /Claes From sandhya.viswanathan at intel.com Mon Dec 9 22:32:52 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Mon, 9 Dec 2019 22:32:52 +0000 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: <3c4dc007-485d-bf8b-d17a-8e63b9efefd7@oracle.com> References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> <3c4dc007-485d-bf8b-d17a-8e63b9efefd7@oracle.com> Message-ID: Could we please get one more review on this? This is part of the Generic operand and libjvm size reduction. This work was split into five patches of which three are checked in. The 8235405 and 8234392 are the remaining two. Best Regards, Sandhya -----Original Message----- From: Vladimir Ivanov Sent: Monday, December 09, 2019 2:11 PM To: Viswanathan, Sandhya ; hotspot compiler Cc: Bhateja, Jatin Subject: Re: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations Hi Sandhya, We need 1 more (R)eview before I can push it. PS: and 8234392 [1] which it depends on. Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-December/036291.html On 09.12.2019 21:54, Viswanathan, Sandhya wrote: > Hi Vladimir, > > It will be wonderful if we can get this checked in by Thursday deadline for JDK 14. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Ivanov > Sent: Thursday, December 05, 2019 4:01 AM > To: hotspot compiler > Cc: Bhateja, Jatin ; Viswanathan, Sandhya > Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235405 > > Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers the following > operations: > - LoadVector > - StoreVector > - RoundDoubleModeV > - AndV > - OrV > - XorV > - MulAddVS2VI > - PopCountVI > > Indiviual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/individual > > As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. > > The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad will be posted for review separately (7 more patches pending). 
> > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > From vladimir.x.ivanov at oracle.com Mon Dec 9 22:44:31 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2019 01:44:31 +0300 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> <3c4dc007-485d-bf8b-d17a-8e63b9efefd7@oracle.com> Message-ID: > Could we please get one more review on this? > This is part of the Generic operand and libjvm size reduction. > This work was split into five patches of which three are checked in. > The 8235405 and 8234392 are the remaining two. To clarify: as I mentioned in the original email, it's the first batch of merged instructions: >> The patch is derived from the initial version of generic vector support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad will be posted for review separately (7 more patches pending). (I delayed posting the rest for review partly to avoid spamming the alias and to be able to promptly address review feedback.) Considering the size of those patches (~6k LOCs in total), I doubt all 7 can be fully reviewed before Thursday. But IMO it's fine to get as much as we can into 14 and put the rest into 15. Best regards, Vladimir Ivanov > -----Original Message----- > From: Vladimir Ivanov > Sent: Monday, December 09, 2019 2:11 PM > To: Viswanathan, Sandhya ; hotspot compiler > Cc: Bhateja, Jatin > Subject: Re: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations > > Hi Sandhya, > > We need 1 more (R)eview before I can push it. > > PS: and 8234392 [1] which it depends on. > > Best regards, > Vladimir Ivanov > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-December/036291.html > > On 09.12.2019 21:54, Viswanathan, Sandhya wrote: >> Hi Vladimir, >> >> It will be wonderful if we can get this checked in by Thursday deadline for JDK 14. >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Thursday, December 05, 2019 4:01 AM >> To: hotspot compiler >> Cc: Bhateja, Jatin ; Viswanathan, Sandhya >> Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all >> https://bugs.openjdk.java.net/browse/JDK-8235405 >> >> Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers the following >> operations: >> - LoadVector >> - StoreVector >> - RoundDoubleModeV >> - AndV >> - OrV >> - XorV >> - MulAddVS2VI >> - PopCountVI >> >> Indiviual patches: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/individual >> >> As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. >> >> The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad will be posted for review separately (7 more patches pending). 
>> >> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Best regards, >> Vladimir Ivanov >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >> From vladimir.kozlov at oracle.com Mon Dec 9 23:28:33 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 15:28:33 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> Message-ID: This looks good. Thanks, Vladimir On 12/5/19 1:53 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8234392/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8234392 > > Make basic type available in Matcher::match_rule_supported_vector() and refactor Matcher::match_rule_supported_vector() > on x86. > > It enables significant simplification of Matcher::match_rule_supported_vector()on x86 by using > Matcher::vector_size_supported() to cover cases like AXV2 is required for 256-bit operaions on integral (see > Matcher::vector_width_in_bytes() [1] for details). > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Testing: tier1-4 > > Best regards, > Vladimir Ivanov > > [1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/x86.ad#l1442 From vladimir.kozlov at oracle.com Tue Dec 10 00:44:28 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 16:44:28 -0800 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> Message-ID: <0450f922-d21f-07bb-b122-988c4cec29a7@oracle.com> On 12/5/19 4:01 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235405 > > Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers > the following operations: > ? - LoadVector > ? - StoreVector There was difference in usage of vmovdquq vs vmovdqul instructions for 64 bytes wide vectors depending on element size (or number of elements in other word). The only difference in encoding of 2 instructions is value of *vex_w* attribute. Proposed change use only vmovdqul. I want to know rational behind the change. > ? - RoundDoubleModeV Conversion of predicate to assert vround8D_* instructions needs to be explained. Originally they were guarded by UseAVX > 2. The code in match_rule_supported_vector() checks only AVX and not AVX512: http://hg.openjdk.java.net/jdk/jdk/file/2aaa8bcb90a9/src/hotspot/cpu/x86/x86.ad#l1411 Is it because 8234392 added vector_size_supported() check? > ? - AndV > ? - OrV > ? - XorV Above 3 are good. > ? - MulAddVS2VI Both related webrevs are good. > ? - PopCountVI Good. Thanks, Vladimir > > Indiviual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/individual > > As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for > reduction/merging left, but they are deliberately left out for future work. > > The patch is derived from the inintial version of generic vector support [1]. 
Generic vector support was reviewed > earlier and the other parts of refactorings in x86.ad? will be posted for review separately (7 more patches pending). > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html From vladimir.kozlov at oracle.com Tue Dec 10 00:52:50 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 16:52:50 -0800 Subject: RFR(XS): 8223968: Add abort type description to RTM statistic counters In-Reply-To: <0e4e3c6c-647f-453a-3284-acc463c88e1f@linux.vnet.ibm.com> References: <0e4e3c6c-647f-453a-3284-acc463c88e1f@linux.vnet.ibm.com> Message-ID: <3aefe4b7-1be7-64cd-f27c-1db9e9bb29b7@oracle.com> v2 looks good to me. Thanks, Vladimir On 12/9/19 2:05 PM, Gustavo Romero wrote: > Hi Martin, > > On 12/09/2019 11:06 AM, Doerr, Martin wrote: >> nice improvement! > > =) > > Thanks for the quick review! > > >> I'd only make the message array static: >> static const char* _abortX_desc[ABORT_STATUS_LIMIT]; >> >> +const char* RTMLockingCounters::_abortX_desc[ABORT_STATUS_LIMIT] = { >> +? "abort instruction?? ", >> +? "may succeed on retry", >> +? "thread conflict???? ", >> +? "buffer overflow???? ", >> +? "debug or trap hit?? ", >> +? "maximum nested depth" >> +}; > > oh... sure. Done. > > >> Please update the copyright in the test. > > Done. > > > v2: > > http://cr.openjdk.java.net/~gromero/8223968/v2/ > > > Best regards, > Gustavo From vladimir.kozlov at oracle.com Tue Dec 10 01:01:34 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 17:01:34 -0800 Subject: [14] RFR(M) 8235539: [JVMCI] -XX:+EnableJVMCIProduct breaks -XX:-EnableJVMCI Message-ID: https://cr.openjdk.java.net/~kvn/8235539/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8235539 Allow to reset EnableJVMCI and UseJVMCICompiler values on command line when EnableJVMCIProduct flag is used (added by JDK-8232118). EnableJVMCIProduct flag sets EnableJVMCI and UseJVMCICompiler to true. Changes are prepared by Doug for GraalVM and adapted by me for JDK 14. Passed tier1 where new test runs. Thanks, Vladimir From sandhya.viswanathan at intel.com Tue Dec 10 01:13:05 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 10 Dec 2019 01:13:05 +0000 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: <0450f922-d21f-07bb-b122-988c4cec29a7@oracle.com> References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> <0450f922-d21f-07bb-b122-988c4cec29a7@oracle.com> Message-ID: Hi Vladimir, Thanks a lot for your review. Please see my response in your email below. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Monday, December 09, 2019 4:44 PM To: hotspot-compiler-dev at openjdk.java.net Cc: Viswanathan, Sandhya Subject: Re: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations On 12/5/19 4:01 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235405 > > Reduce the number of AD instructions needed to implement vector > operations by merging existing ones. The patch covers the following operations: > ? - LoadVector > ? 
- StoreVector There was difference in usage of vmovdquq vs vmovdqul instructions for 64 bytes wide vectors depending on element size (or number of elements in other word). The only difference in encoding of 2 instructions is value of *vex_w* attribute. Proposed change use only vmovdqul. I want to know rational behind the change. Sandhya >> The AVX512 emovdqul and emovdquq instruction behavior is same, when the instruction is encoded with no mask register, which is the case here. > ? - RoundDoubleModeV Conversion of predicate to assert vround8D_* instructions needs to be explained. Originally they were guarded by UseAVX > 2. The code in match_rule_supported_vector() checks only AVX and not AVX512: http://hg.openjdk.java.net/jdk/jdk/file/2aaa8bcb90a9/src/hotspot/cpu/x86/x86.ad#l1411 Is it because 8234392 added vector_size_supported() check? Sandhya >> Yes. > ? - AndV > ? - OrV > ? - XorV Above 3 are good. > ? - MulAddVS2VI Both related webrevs are good. > ? - PopCountVI Good. Thanks, Vladimir > > Indiviual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/indivi > dual > > As Jatin described, merging is applied only to AD instructions of > similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. > > The patch is derived from the inintial version of generic vector > support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad? will be posted for review separately (7 more patches pending). > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-Augu > st/034822.html From john.r.rose at oracle.com Tue Dec 10 01:27:17 2019 From: john.r.rose at oracle.com (John Rose) Date: Mon, 9 Dec 2019 17:27:17 -0800 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> Message-ID: Reviewed by me also. This is a better way to do vector stuff. Thanks, Vladimir K, for the perceptive questions. ? John From john.r.rose at oracle.com Tue Dec 10 01:53:14 2019 From: john.r.rose at oracle.com (John Rose) Date: Mon, 9 Dec 2019 17:53:14 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> Message-ID: Reviewed. The rules for NegV[DF] and CMoveV[DF] that remain looks puzzling to me, and would benefit from a brief comment for future readers, explaining why they are there. The NegV rules seem to be removable for the same reason (size limits) that the other rules are removable. If there?s a tricky reason they must be called out explicitly, it would be good to explain. The CMoveV rules probably stem from limited support for CMove, but again a comment would be good. ? John On Dec 5, 2019, at 1:53 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/8234392/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8234392 > > Make basic type available in Matcher::match_rule_supported_vector() and refactor Matcher::match_rule_supported_vector() on x86. 
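For reference, the extended hook has roughly the following shape (a sketch only, not the webrev itself, using the existing matcher naming and a single illustrative opcode case):

  const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt) {
    if (!match_rule_supported(opcode)) {
      return false;                       // operation not supported on this CPU at all
    }
    // vector_size_supported() folds in the per-type width limits from
    // vector_width_in_bytes(), e.g. 256-bit integral vectors need AVX2 and
    // 512-bit subword vectors need AVX512BW.
    if (!vector_size_supported(bt, vlen)) {
      return false;
    }
    switch (opcode) {
      case Op_NegVF:
        // illustrative opcode-specific restriction: the 512-bit form needs AVX512DQ
        if (vlen == 16 && !VM_Version::supports_avx512dq()) {
          return false;
        }
        break;
      // ... other opcode-specific checks ...
    }
    return true;
  }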
> > It enables significant simplification of Matcher::match_rule_supported_vector()on x86 by using Matcher::vector_size_supported() to cover cases like AXV2 is required for 256-bit operaions on integral (see Matcher::vector_width_in_bytes() [1] for details). > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Testing: tier1-4 > > Best regards, > Vladimir Ivanov > > [1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/x86.ad#l1442 From vladimir.kozlov at oracle.com Tue Dec 10 02:37:05 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 18:37:05 -0800 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> <0450f922-d21f-07bb-b122-988c4cec29a7@oracle.com> Message-ID: <98701f33-67d9-6bac-765f-9975556a82bd@oracle.com> > Sandhya >> The AVX512 emovdqul and emovdquq instruction behavior is same Okay. No more comments. Reviewed. Thanks, Vladimir On 12/9/19 5:13 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Thanks a lot for your review. Please see my response in your email below. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov > Sent: Monday, December 09, 2019 4:44 PM > To: hotspot-compiler-dev at openjdk.java.net > Cc: Viswanathan, Sandhya > Subject: Re: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations > > On 12/5/19 4:01 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/all >> https://bugs.openjdk.java.net/browse/JDK-8235405 >> >> Reduce the number of AD instructions needed to implement vector >> operations by merging existing ones. The patch covers the following operations: >> ? - LoadVector >> ? - StoreVector > > There was difference in usage of vmovdquq vs vmovdqul instructions for 64 bytes wide vectors depending on element size (or number of elements in other word). The only difference in encoding of 2 instructions is value of *vex_w* attribute. > Proposed change use only vmovdqul. I want to know rational behind the change. > > Sandhya >> The AVX512 emovdqul and emovdquq instruction behavior is same, when the instruction is encoded with no mask register, which is the case here. > >> ? - RoundDoubleModeV > > Conversion of predicate to assert vround8D_* instructions needs to be explained. > Originally they were guarded by UseAVX > 2. The code in match_rule_supported_vector() checks only AVX and not AVX512: > > http://hg.openjdk.java.net/jdk/jdk/file/2aaa8bcb90a9/src/hotspot/cpu/x86/x86.ad#l1411 > > Is it because 8234392 added vector_size_supported() check? > > Sandhya >> Yes. > >> ? - AndV >> ? - OrV >> ? - XorV > > Above 3 are good. > >> ? - MulAddVS2VI > > Both related webrevs are good. > >> ? - PopCountVI > > Good. > > Thanks, > Vladimir > >> >> Indiviual patches: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235405/webrev.00/indivi >> dual >> >> As Jatin described, merging is applied only to AD instructions of >> similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. >> >> The patch is derived from the inintial version of generic vector >> support [1]. Generic vector support was reviewed earlier and the other parts of refactorings in x86.ad? will be posted for review separately (7 more patches pending). 
>> >> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Best regards, >> Vladimir Ivanov >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-Augu >> st/034822.html From vladimir.kozlov at oracle.com Tue Dec 10 04:09:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 20:09:21 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches In-Reply-To: References: Message-ID: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> But it looks like there are testing failures. The idea and changes seems fine to me. The only question I have is why JVMCI code want to deoptimize all related compiler frames immediately? Why marking is not enough? Thanks, Vladimir On 12/9/19 12:10 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/8229377/webrev > https://bugs.openjdk.java.net/browse/JDK-8229377 > > This is a minor improvement to the JVMCI invalidate method to avoid scanning large code caches when invalidating a > single nmethod.? Instead the nmethod is directly made not_entrant.? In general I'm unclear what the benefit of the > mark_for_deoptimization/make_marked_nmethods_not_entrant split is. Testing is in progress. > > JDK-8230884 had been previously duplicated against this because they overlapped a bit, but in the interest of clarity I > separated them again. > > tom From vladimir.kozlov at oracle.com Tue Dec 10 04:20:42 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 9 Dec 2019 20:20:42 -0800 Subject: [14] RFR(S): 8233032: assert(in_bb(n)) failed: must be In-Reply-To: <87k175a5sg.fsf@redhat.com> References: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> <87k175a5sg.fsf@redhat.com> Message-ID: <5ffa89ae-d836-d29c-c5b8-7397a02afa6e@oracle.com> +1 Thanks, Vladimir K On 12/9/19 7:15 AM, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~chagedorn/8233032/webrev.00/ > > That looks reasonable to me. > > Roland. > From jzaugg at gmail.com Tue Dec 10 06:18:52 2019 From: jzaugg at gmail.com (Jason Zaugg) Date: Tue, 10 Dec 2019 16:18:52 +1000 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: On Mon, 9 Dec 2019 at 07:42, Claes Redestad wrote: > Hi, > > increasing MaxInlineLevel can substantially improve performance in some > benchmarks[1], and has been reported to help applications implemented in > scala in particular. > > There is always some risk of regressions when tweaking the default > inlining settings. I've done a number of experiments to ascertain that > the effect of increasing this on a wide array of benchmarks. With 15 all > benchmarks tested are show either neutral or positive results, with no > observed regression w.r.t. compilation speed or code cache usage. > Great to see! A few notes from the Scala perspective (I work on the Scala compiler and library at Lightbend.) Long chains of method calls arise in Scala applications for a few reasons: - Standard library collections have generic bridge methods that arise from refinement of the collection type constructor down the hierarchy. For instance, a generic method returning `Coll[T]` can be overridden in a subclass to return `MyCollection[...]`. 
- The encoding of traits (interfaces with mixin behaviour and storage) and singleton objects indirects calls through forwarder methods (these don't adapt arguments/results) - The collections hierarchy itself is relatively deep, so the chain of super() calls can be problematic (as per Charlie's message about JRuby). - Symbolic method names in the collections API have alphanumeric aliases (e.g. `+=` and `addOne`) These factors tend to conspire to exhaust the default budget. Analysis with JITWatch / JMH etc has led us to refactor the library code to keep important use cases fast (example [1]). I once experimented [2] with a change to Hotspot to consider our bridge-like forwarder methods as exempt from the MaxInlineLevel budget, as is the case for lambda form frames. That change alone delivered most of the performance benefit of doubling MaxInlineLevel with an unmodified JVM. Jason Zaugg [1] https://github.com/scala/bug/issues/11627#issuecomment-514490505 [2] https://gist.github.com/retronym/d27090a68570485c3e329a52db0c7b25 From christian.hagedorn at oracle.com Tue Dec 10 06:57:34 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 10 Dec 2019 07:57:34 +0100 Subject: [14] RFR(S): 8233032: assert(in_bb(n)) failed: must be In-Reply-To: <5ffa89ae-d836-d29c-c5b8-7397a02afa6e@oracle.com> References: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> <87k175a5sg.fsf@redhat.com> <5ffa89ae-d836-d29c-c5b8-7397a02afa6e@oracle.com> Message-ID: Thank you Roland and Vladimir for your reviews! Best regards, Christian On 10.12.19 05:20, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir K > > On 12/9/19 7:15 AM, Roland Westrelin wrote: >> >>> http://cr.openjdk.java.net/~chagedorn/8233032/webrev.00/ >> >> That looks reasonable to me. >> >> Roland. >> From goetz.lindenmaier at sap.com Tue Dec 10 07:39:33 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 10 Dec 2019 07:39:33 +0000 Subject: Please open 8152988: [AOT] Update test batch definitions to include aot-ed java.base module mode into hs-comp testing In-Reply-To: References: Message-ID: Hi, I would like to downport this change. It would be great if the bug could be opened up. http://hg.openjdk.java.net/jdk/jdk/rev/ff10f8f3a583 https://bugs.openjdk.java.net/browse/JDK-8152988 Thanks and best regards, Goetz. From Pengfei.Li at arm.com Tue Dec 10 08:28:57 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Tue, 10 Dec 2019 08:28:57 +0000 Subject: [aarch64-port-dev ] crash due to long offset In-Reply-To: <2d65a25f-8e38-4da4-a6ef-b4ff9ba847fe.zhuoren.wz@alibaba-inc.com> References: <2d65a25f-8e38-4da4-a6ef-b4ff9ba847fe.zhuoren.wz@alibaba-inc.com> Message-ID: Hi Zhuoren, > I also wrote a patch to solve this issue, please also review. > http://cr.openjdk.java.net/~wzhuo/BigOffsetAarch64/webrev.00/jdk13u.pat > ch Thanks for your patch. I (NOT a reviewer) eyeballed your fix and found a probable mistake. In "enc_class aarch64_enc_str(iRegL src, memory mem) %{ ... %}", you have "if (($mem$$index == -1) && ($mem$$disp > 0)& (($mem$$disp & 0x7) != 0) && ($mem$$disp > 255))". Should it be "&&" instead of "&" in the middle? Another question: Is it possible to add the logic into loadStore() or another new function instead of duplicating it everywhere in aarch64.ad? I've also CC'ed this to hotspot-compiler-dev because all hotspot compiler patches (including AArch64 specific) should go through it for review.
-- Thanks, Pengfei From tobias.hartmann at oracle.com Tue Dec 10 08:30:48 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 10 Dec 2019 09:30:48 +0100 Subject: [14] RFR(S): 8233032: assert(in_bb(n)) failed: must be In-Reply-To: References: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> <87k175a5sg.fsf@redhat.com> <5ffa89ae-d836-d29c-c5b8-7397a02afa6e@oracle.com> Message-ID: Looks good to me too. Pushed. Best regards, Tobias On 10.12.19 07:57, Christian Hagedorn wrote: > Thank you Roland and Vladimir for your reviews! > > Best regards, > Christian > > On 10.12.19 05:20, Vladimir Kozlov wrote: >> +1 >> >> Thanks, >> Vladimir K >> >> On 12/9/19 7:15 AM, Roland Westrelin wrote: >>> >>>> http://cr.openjdk.java.net/~chagedorn/8233032/webrev.00/ >>> >>> That looks reasonable to me. >>> >>> Roland. >>> From christian.hagedorn at oracle.com Tue Dec 10 08:39:15 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 10 Dec 2019 09:39:15 +0100 Subject: [14] RFR(S): 8233032: assert(in_bb(n)) failed: must be In-Reply-To: References: <1e09a7c4-cb21-d105-9aa0-e6dac11df6ef@oracle.com> <87k175a5sg.fsf@redhat.com> <5ffa89ae-d836-d29c-c5b8-7397a02afa6e@oracle.com> Message-ID: <82d807cb-e192-1ff2-84af-d2ef31278c24@oracle.com> Thank you Tobias for your review! Best regards, Christian On 10.12.19 09:30, Tobias Hartmann wrote: > Looks good to me too. Pushed. > > Best regards, > Tobias > > On 10.12.19 07:57, Christian Hagedorn wrote: >> Thank you Roland and Vladimir for your reviews! >> >> Best regards, >> Christian >> >> On 10.12.19 05:20, Vladimir Kozlov wrote: >>> +1 >>> >>> Thanks, >>> Vladimir K >>> >>> On 12/9/19 7:15 AM, Roland Westrelin wrote: >>>> >>>>> http://cr.openjdk.java.net/~chagedorn/8233032/webrev.00/ >>>> >>>> That looks reasonable to me. >>>> >>>> Roland. >>>> From tobias.hartmann at oracle.com Tue Dec 10 08:45:04 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 10 Dec 2019 09:45:04 +0100 Subject: Please open 8152988: [AOT] Update test batch definitions to include aot-ed java.base module mode into hs-comp testing In-Reply-To: References: Message-ID: <75d120c2-19ea-a180-3680-b884195dccbd@oracle.com> Hi Goetz, done. Best regards, Tobias On 10.12.19 08:39, Lindenmaier, Goetz wrote: > Hi, > > I would like to downport this change. > It would be great if the bug could be opened up. > > http://hg.openjdk.java.net/jdk/jdk/rev/ff10f8f3a583 > https://bugs.openjdk.java.net/browse/JDK-8152988 > > Thanks and best regards, > Goetz. > From rwestrel at redhat.com Tue Dec 10 08:45:07 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 10 Dec 2019 09:45:07 +0100 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <87wob5a8gc.fsf@redhat.com> References: <871ru2diu9.fsf@redhat.com> <8736e4ayfa.fsf@redhat.com> <87wob5a8gc.fsf@redhat.com> Message-ID: <87eexca7sc.fsf@redhat.com> New webrev with Martin's suggestion: http://cr.openjdk.java.net/~roland/8234350/webrev.01/ Roland. From tobias.hartmann at oracle.com Tue Dec 10 08:53:16 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 10 Dec 2019 09:53:16 +0100 Subject: [14] RFR(M) 8235539: [JVMCI] -XX:+EnableJVMCIProduct breaks -XX:-EnableJVMCI In-Reply-To: References: Message-ID: Hi Vladimir, looks good to me. Please add a bug number to the test. Also, the Expectation.name/value/origin fields are not needed, right? No new webrev required. 
Best regards, Tobias On 10.12.19 02:01, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8235539/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8235539 > > Allow to reset EnableJVMCI and UseJVMCICompiler values on command line when EnableJVMCIProduct flag > is used (added by JDK-8232118). EnableJVMCIProduct flag sets EnableJVMCI and UseJVMCICompiler to true. > > Changes are prepared by Doug for GraalVM and adapted by me for JDK 14. > > Passed tier1 where new test runs. > > Thanks, > Vladimir From tom.rodriguez at oracle.com Tue Dec 10 09:24:10 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 10 Dec 2019 01:24:10 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches In-Reply-To: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> References: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> Message-ID: Vladimir Kozlov wrote on 12/9/19 8:09 PM: > But it looks like there are testing failures. None of the testing failures look like they have anything to do with these changes. They all have failed recently in other builds from what I can see. > > The idea and changes seems fine to me. The only question I have is why > JVMCI code want to deoptimize all related compiler frames immediately? > Why marking is not enough? I'm not sure what you're asking. mark_for_deoptimization doesn't really do anything on it's own. You have to do the code cache scan to make_not_entrant and then do the stack walks to invalidate any frames which are still using that code. I'm not changing the semantic of the original function either, just skipping a useless full code cache scan. tom > > Thanks, > Vladimir > > On 12/9/19 12:10 PM, Tom Rodriguez wrote: >> http://cr.openjdk.java.net/~never/8229377/webrev >> https://bugs.openjdk.java.net/browse/JDK-8229377 >> >> This is a minor improvement to the JVMCI invalidate method to avoid >> scanning large code caches when invalidating a single nmethod. >> Instead the nmethod is directly made not_entrant.? In general I'm >> unclear what the benefit of the >> mark_for_deoptimization/make_marked_nmethods_not_entrant split is. >> Testing is in progress. >> >> JDK-8230884 had been previously duplicated against this because they >> overlapped a bit, but in the interest of clarity I separated them again. >> >> tom From tobias.hartmann at oracle.com Tue Dec 10 09:29:31 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 10 Dec 2019 10:29:31 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> Message-ID: Hi Christian, as we've discussed offline, this looks good to me. Although the change is quite complex, I think we should target JDK 14 for the reasons that Vladimir already brought up. Also, the entry->outcnt() > 1 condition makes sure this code is only executed in very specific/rare cases. Nevertheless, it would be good to get more reviews. Best regards, Tobias On 28.11.19 10:24, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8233033 > http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ > > The C2 compiled code produces a wrong result for 'iFld' in the test case. It is -8 instead of -7. > The loop in the test case is partially peeled and then unswitched. 
The wrong result is produced > because a wrong state is transferred to the interpreter when an uncommon trap is hit in the C2 > compiled code in the fast version of the unswitched loop. > > The problem is when unswitching the loop, we clone the original loop predicates for the slow and > fast version of the loop [1] but we do not account for partially peeled statements that are control > dependent on the loop predicates (i.e. need to be executed after the predicates). As a result, these > are executed before the cloned loop predicates. > > The situation of the test case method PartialPeelingUnswitch::test() is visualized in [2]. IfTrue > 118, the entry control of the original loop which follows right after the loop predicates, has an > output edge to the StoreI 353 node. This node belongs to the "iFld += -7" statement which was > partially peeled before. When creating the slow version of the loop and cloning the predicates in > PhaseIdealLoop::create_slow_version_of_loop(), this control dependency is lost. StoreI 353 is still > dependent on IfTrue 118 instead of IfTrue 472 (fast loop entry control) and IfTrue 476 (slow loop > entry control). The original loop predicates are later removed and thus, when hitting the uncommon > trap in the fast loop, we accidentally executed "iFld += -7" (StoreI 353) already even though the > interpreter assumes C2 has not executed any statements in the loop. As a result, "iFld += -7" is > executed twice in a row which produces a wrong result. > > The fix is to replace the control input of all statements that have a control input from the > original loop entry control (and are not the "loop selection" IfNode) with the fast and slow entry > control, respectively. Since the statements cannot have two control inputs they need to be cloned > together with all following nodes on a path to the loop phi nodes. The output of the last node > before a loop phi on such a path needs to be adjusted to only point to the phi node belonging to the > fast loop. The last node on the cloned path is set to the phi node belonging to the slow loop. The > fix is visualized in [3]. The control input of StoreI 353 is now the entry control of the fast loop > (IfTrue 472) and its output only points to the corresponding Phi 442 of the fast loop. The same was > done for the cloned node StoreI 476 of StoreI 353 for the slow loop. > > This bug can also be reproduced with JDK 11. Should we target this fix to 14 or defer it to 15 > (since it's a more complex one)? > > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/6f42d2a19117/src/hotspot/share/opto/loopUnswitch.cpp#l272 > [2] https://bugs.openjdk.java.net/secure/attachment/85593/wrong_dependencies.png > [3] https://bugs.openjdk.java.net/secure/attachment/85592/fixed_dependencies.png From tobias.hartmann at oracle.com Tue Dec 10 09:34:43 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 10 Dec 2019 10:34:43 +0100 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose In-Reply-To: References: <7176592d-4784-0797-4457-463773464536@oracle.com> <83d26c83-d39c-3381-f3d4-8f6ecab847fa@oracle.com> Message-ID: Hi Liu, looks good and trivial to me. I'll sponsor. Best regards, Tobias On 09.12.19 09:17, Liu Xin wrote: > Hi, Tobias,? > > Thanks for your feedback.? > Here is the new webrev.? I update?what you pointed out.? > https://cr.openjdk.java.net/~xliu/8235383/02/webrev/ > > The patch passed hotspot-tier1 for both fastdebug and release builds.? 
> > thanks, > --lx > > > On Sun, Dec 8, 2019 at 11:15 PM Tobias Hartmann > wrote: > > Hi Liu, > > thanks for adding the test. > > We try to avoid bug ids as test names. In this case, I would suggest something like > TestPrintIRDuringConstruction with a compiler.c1 package declaration. Also, the first line of the > copyright header looks wrong (should look like this [1]). > > line 27-29: I don't think you need these lines > line 42: "initiail" -> "initial" > > Thanks, > Tobias > > [1] > http://hg.openjdk.java.net/jdk/jdk/raw-file/fb39a8d1d101/test/hotspot/jtreg/compiler/c1/TestGotoIfMain.java > > On 07.12.19 05:26, Liu Xin wrote: > > hi, Tobias,? > > > > Thank you for reviewing it. I add a regression test about it. Could you take a look? > > https://cr.openjdk.java.net/~xliu/8235383/01/webrev/ > > > > thanks, > > > > --lx > > > > > > > > On Fri, Dec 6, 2019 at 12:24 AM Tobias Hartmann > > >> wrote: > > > >? ? ?Hi Liu, > > > >? ? ?your fix looks good to me but could you please add a regression test? > > > >? ? ?Thanks, > >? ? ?Tobias > > > >? ? ?On 06.12.19 08:23, Liu Xin wrote: > >? ? ?> Hi, Reviewers, > >? ? ?> > >? ? ?> Could you review this very simple bugfix for C1? > >? ? ?> JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 > >? ? ?> Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ > >? ? ?> > >? ? ?> The root cause is some instructions are going to be eliminated, so they are > >? ? ?> not assigned to any valid bci. > >? ? ?> In present of? -XX:+PrintIRDuringConstruction -XX:+Verbose,? C1 will print > >? ? ?> them out and then hit the assert. > >? ? ?> > >? ? ?> Yes, I can twiddle graph_builder to assign right BCIs to them,? but I would > >? ? ?> like to have a more robust InstructionPrinter::print_line. the CR will > >? ? ?> leave blanks in the position of bci. > >? ? ?> Eliminated store for object 0: > >? ? ?> .? ? ? 0? ? a67? ? a58._24 := a54 (L) next > >? ? ?> Eliminated load: > >? ? ?> .? ? ? 0? ? i35? ? a11._24 (I) position > >? ? ?> > >? ? ?> thanks, > >? ? ?> --lx > >? ? ?> > > > From christian.hagedorn at oracle.com Tue Dec 10 09:38:03 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 10 Dec 2019 10:38:03 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> Message-ID: Thank you Tobias for your review! Best regards, Christian On 10.12.19 10:29, Tobias Hartmann wrote: > Hi Christian, > > as we've discussed offline, this looks good to me. Although the change is quite complex, I think we > should target JDK 14 for the reasons that Vladimir already brought up. Also, the entry->outcnt() > 1 > condition makes sure this code is only executed in very specific/rare cases. > > Nevertheless, it would be good to get more reviews. > > Best regards, > Tobias > > On 28.11.19 10:24, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8233033 >> http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ >> >> The C2 compiled code produces a wrong result for 'iFld' in the test case. It is -8 instead of -7. >> The loop in the test case is partially peeled and then unswitched. The wrong result is produced >> because a wrong state is transferred to the interpreter when an uncommon trap is hit in the C2 >> compiled code in the fast version of the unswitched loop. 
>> >> The problem is when unswitching the loop, we clone the original loop predicates for the slow and >> fast version of the loop [1] but we do not account for partially peeled statements that are control >> dependent on the loop predicates (i.e. need to be executed after the predicates). As a result, these >> are executed before the cloned loop predicates. >> >> The situation of the test case method PartialPeelingUnswitch::test() is visualized in [2]. IfTrue >> 118, the entry control of the original loop which follows right after the loop predicates, has an >> output edge to the StoreI 353 node. This node belongs to the "iFld += -7" statement which was >> partially peeled before. When creating the slow version of the loop and cloning the predicates in >> PhaseIdealLoop::create_slow_version_of_loop(), this control dependency is lost. StoreI 353 is still >> dependent on IfTrue 118 instead of IfTrue 472 (fast loop entry control) and IfTrue 476 (slow loop >> entry control). The original loop predicates are later removed and thus, when hitting the uncommon >> trap in the fast loop, we accidentally executed "iFld += -7" (StoreI 353) already even though the >> interpreter assumes C2 has not executed any statements in the loop. As a result, "iFld += -7" is >> executed twice in a row which produces a wrong result. >> >> The fix is to replace the control input of all statements that have a control input from the >> original loop entry control (and are not the "loop selection" IfNode) with the fast and slow entry >> control, respectively. Since the statements cannot have two control inputs they need to be cloned >> together with all following nodes on a path to the loop phi nodes. The output of the last node >> before a loop phi on such a path needs to be adjusted to only point to the phi node belonging to the >> fast loop. The last node on the cloned path is set to the phi node belonging to the slow loop. The >> fix is visualized in [3]. The control input of StoreI 353 is now the entry control of the fast loop >> (IfTrue 472) and its output only points to the corresponding Phi 442 of the fast loop. The same was >> done for the cloned node StoreI 476 of StoreI 353 for the slow loop. >> >> This bug can also be reproduced with JDK 11. Should we target this fix to 14 or defer it to 15 >> (since it's a more complex one)? >> >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/6f42d2a19117/src/hotspot/share/opto/loopUnswitch.cpp#l272 >> [2] https://bugs.openjdk.java.net/secure/attachment/85593/wrong_dependencies.png >> [3] https://bugs.openjdk.java.net/secure/attachment/85592/fixed_dependencies.png From martin.doerr at sap.com Tue Dec 10 09:43:06 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 10 Dec 2019 09:43:06 +0000 Subject: RFR(XS): 8223968: Add abort type description to RTM statistic counters In-Reply-To: <3aefe4b7-1be7-64cd-f27c-1db9e9bb29b7@oracle.com> References: <0e4e3c6c-647f-453a-3284-acc463c88e1f@linux.vnet.ibm.com> <3aefe4b7-1be7-64cd-f27c-1db9e9bb29b7@oracle.com> Message-ID: +1 Thanks, Martin > -----Original Message----- > From: Vladimir Kozlov > Sent: Dienstag, 10. Dezember 2019 01:53 > To: Gustavo Romero ; Doerr, Martin > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XS): 8223968: Add abort type description to RTM statistic > counters > > v2 looks good to me. 
> > Thanks, > Vladimir > > On 12/9/19 2:05 PM, Gustavo Romero wrote: > > Hi Martin, > > > > On 12/09/2019 11:06 AM, Doerr, Martin wrote: > >> nice improvement! > > > > =) > > > > Thanks for the quick review! > > > > > >> I'd only make the message array static: > >> static const char* _abortX_desc[ABORT_STATUS_LIMIT]; > >> > >> +const char* > RTMLockingCounters::_abortX_desc[ABORT_STATUS_LIMIT] = { > >> +? "abort instruction?? ", > >> +? "may succeed on retry", > >> +? "thread conflict???? ", > >> +? "buffer overflow???? ", > >> +? "debug or trap hit?? ", > >> +? "maximum nested depth" > >> +}; > > > > oh... sure. Done. > > > > > >> Please update the copyright in the test. > > > > Done. > > > > > > v2: > > > > http://cr.openjdk.java.net/~gromero/8223968/v2/ > > > > > > Best regards, > > Gustavo From goetz.lindenmaier at sap.com Tue Dec 10 09:54:54 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 10 Dec 2019 09:54:54 +0000 Subject: Please open 8152988: [AOT] Update test batch definitions to include aot-ed java.base module mode into hs-comp testing In-Reply-To: <75d120c2-19ea-a180-3680-b884195dccbd@oracle.com> References: <75d120c2-19ea-a180-3680-b884195dccbd@oracle.com> Message-ID: That's great, thanks! Best regards, Goetz. > -----Original Message----- > From: Tobias Hartmann > Sent: Dienstag, 10. Dezember 2019 09:45 > To: Lindenmaier, Goetz ; hotspot compiler > > Subject: Re: Please open 8152988: [AOT] Update test batch definitions to > include aot-ed java.base module mode into hs-comp testing > > Hi Goetz, > > done. > > Best regards, > Tobias > > On 10.12.19 08:39, Lindenmaier, Goetz wrote: > > Hi, > > > > I would like to downport this change. > > It would be great if the bug could be opened up. > > > > http://hg.openjdk.java.net/jdk/jdk/rev/ff10f8f3a583 > > https://bugs.openjdk.java.net/browse/JDK-8152988 > > > > Thanks and best regards, > > Goetz. > > From tobias.hartmann at oracle.com Tue Dec 10 10:31:00 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 10 Dec 2019 11:31:00 +0100 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <87eexca7sc.fsf@redhat.com> References: <871ru2diu9.fsf@redhat.com> <8736e4ayfa.fsf@redhat.com> <87wob5a8gc.fsf@redhat.com> <87eexca7sc.fsf@redhat.com> Message-ID: <830aca53-29db-23f5-1853-cd1974ac9d60@oracle.com> Hi Roland, looks good to me. Best regards, Tobias On 10.12.19 09:45, Roland Westrelin wrote: > > New webrev with Martin's suggestion: > > http://cr.openjdk.java.net/~roland/8234350/webrev.01/ > > Roland. > From rwestrel at redhat.com Tue Dec 10 10:46:56 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 10 Dec 2019 11:46:56 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> Message-ID: <87blsga25b.fsf@redhat.com> Hi Christian, > http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ Is this correct? 364 set_idom(stmt, iffast_pred, dom_depth(stmt)); 366 set_idom(cloned_stmt, ifslow_pred, dom_depth(stmt)); stmt is a non CFG so it doesn't have an idom but a control? Roland. 
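A minimal sketch of the distinction in question, assuming the variable names from the webrev (only CFG nodes carry dominator-tree entries, data nodes are pinned via set_ctrl()):

  if (stmt->is_CFG()) {
    set_idom(stmt, iffast_pred, dom_depth(stmt));   // CFG nodes live in the dominator tree
  } else {
    set_ctrl(stmt, iffast_pred);                    // data nodes only record their controlling node
  }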
From christian.hagedorn at oracle.com Tue Dec 10 11:43:29 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 10 Dec 2019 12:43:29 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: <87blsga25b.fsf@redhat.com> References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> <87blsga25b.fsf@redhat.com> Message-ID: Hi Roland, You're right, it should be set_ctrl() instead. I changed it and added an additional non-CFG sanity assertion check: http://cr.openjdk.java.net/~chagedorn/8233033/webrev.01/ Best regards, Christian On 10.12.19 11:46, Roland Westrelin wrote: > > Hi Christian, > >> http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ > > Is this correct? > > 364 set_idom(stmt, iffast_pred, dom_depth(stmt)); > 366 set_idom(cloned_stmt, ifslow_pred, dom_depth(stmt)); > > stmt is a non CFG so it doesn't have an idom but a control? > > Roland. > From rwestrel at redhat.com Tue Dec 10 12:25:23 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 10 Dec 2019 13:25:23 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> <87blsga25b.fsf@redhat.com> Message-ID: <878snk9xl8.fsf@redhat.com> > http://cr.openjdk.java.net/~chagedorn/8233033/webrev.01/ Looks good to me. Roland. From nils.eliasson at oracle.com Tue Dec 10 14:00:44 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 10 Dec 2019 15:00:44 +0100 Subject: RFR(S): Clean-up BarrierSetC2 Message-ID: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> Hi, I came across a lot of dead code in BarrierSetC2 while debugging a completely separate problem. These are leftovers from legacy barrier implementations that are no longer needed. This is a quick cleanup where unused parts are removed. I tried to keep it as simple as possible to avoid introducing any bugs. There are additional cleanups that can be done in these files later. Bug: https://bugs.openjdk.java.net/browse/JDK-8235653 Webrev: http://cr.openjdk.java.net/~neliasso/8235653/webrev.01 Please review, Nils Eliasson From aph at redhat.com Tue Dec 10 14:05:25 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 10 Dec 2019 14:05:25 +0000 Subject: [aarch64-port-dev ] crash due to long offset In-Reply-To: References: <2d65a25f-8e38-4da4-a6ef-b4ff9ba847fe.zhuoren.wz@alibaba-inc.com> Message-ID: <26ad8b0f-95a6-f126-7ea1-aa59291145d4@redhat.com> On 12/10/19 8:28 AM, Pengfei Li (Arm Technology China) wrote: > Hi Zhuoren, > >> I also wrote a patch to solve this issue, please also review. >> http://cr.openjdk.java.net/~wzhuo/BigOffsetAarch64/webrev.00/jdk13u.pat >> ch > Thanks for your patch. I (NOT a reviewer) eyeballed your fix and found a probable mistake. > > In "enc_class aarch64_enc_str(iRegL src, memory mem) %{ ... %}", you have "if (($mem$$index == -1) && ($mem$$disp > 0)& (($mem$$disp & 0x7) != 0) && ($mem$$disp > 255))". > Should it be "&&" instead of "&" in the middle? > > Another question: Is it possible to add the logic into loadStore() or another new function instead of duplicating it everywhere in aarch64.ad? > > I've also CC'ed this to hotspot-compiler-dev because all hotspot compiler patches (including AArch64 specific) should go through it for review. I'm looking at this. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From claes.redestad at oracle.com Tue Dec 10 14:13:31 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 10 Dec 2019 15:13:31 +0100 Subject: RFR(S): Clean-up BarrierSetC2 In-Reply-To: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> References: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> Message-ID: <3aedf4b7-2987-6d8b-f308-d922d3160df9@oracle.com> Looks good! /Claes On 2019-12-10 15:00, Nils Eliasson wrote: > Hi, > > I can across a lot of dead code in BarrierSetC2 while debugging a > completely separate problem. This is leftovers from legacy barrier > implementations that no longer is needed. > > This is a quick cleanup where non-used parts are removed. I tried to > keep it as simple as possible to avoid introducing any bugs. There are > additional cleanups that can be done in these files later. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8235653 > > Webrev: http://cr.openjdk.java.net/~neliasso/8235653/webrev.01 > > > Please review, > > Nils Eliasson > From martin.doerr at sap.com Tue Dec 10 15:29:49 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 10 Dec 2019 15:29:49 +0000 Subject: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && (use == sfpt || !use->is_reachable_from_root())) failed: missed a node In-Reply-To: <830aca53-29db-23f5-1853-cd1974ac9d60@oracle.com> References: <871ru2diu9.fsf@redhat.com> <8736e4ayfa.fsf@redhat.com> <87wob5a8gc.fsf@redhat.com> <87eexca7sc.fsf@redhat.com> <830aca53-29db-23f5-1853-cd1974ac9d60@oracle.com> Message-ID: +1 It's better readable this way. Thanks, Martin > -----Original Message----- > From: Tobias Hartmann > Sent: Dienstag, 10. Dezember 2019 11:31 > To: Roland Westrelin ; Doerr, Martin > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XS): 8234350: assert(mode == ControlAroundStripMined && > (use == sfpt || !use->is_reachable_from_root())) failed: missed a node > > Hi Roland, > > looks good to me. > > Best regards, > Tobias > > On 10.12.19 09:45, Roland Westrelin wrote: > > > > New webrev with Martin's suggestion: > > > > http://cr.openjdk.java.net/~roland/8234350/webrev.01/ > > > > Roland. > > From rkennke at redhat.com Tue Dec 10 15:37:00 2019 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 10 Dec 2019 16:37:00 +0100 Subject: RFR(S): Clean-up BarrierSetC2 In-Reply-To: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> References: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> Message-ID: Looks good to me, too (lots of this stuff was for Shenandoah pre-LRB). Thanks, Roman > Hi, > > I can across a lot of dead code in BarrierSetC2 while debugging a > completely separate problem. This is leftovers from legacy barrier > implementations that no longer is needed. > > This is a quick cleanup where non-used parts are removed. I tried to > keep it as simple as possible to avoid introducing any bugs. There are > additional cleanups that can be done in these files later. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8235653 > > Webrev: http://cr.openjdk.java.net/~neliasso/8235653/webrev.01 > > > Please review, > > Nils Eliasson > From martin.doerr at sap.com Tue Dec 10 16:14:58 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 10 Dec 2019 16:14:58 +0000 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <67699754-ea63-6e49-905d-39ae0b04ad38@oracle.com> References: <9321e601-372a-a76a-a732-9d42c48d2e6a@oracle.com> <27846f7b-c7c5-fe2c-cb6f-052be3c8c376@oracle.com> <67699754-ea63-6e49-905d-39ae0b04ad38@oracle.com> Message-ID: Hi Claes, I certainly like to have the improvement for C2. But I can't investigate impact on C1 so quickly. It may be true that increasing inlining depth is not as problematic as increasing MaxInlineSize for C1. So wrt. this issue I abstain. To make sure what we discussed won't get forgotten, I've created https://bugs.openjdk.java.net/browse/JDK-8235673 Feel free to edit or comment. Best regards, Martin > -----Original Message----- > From: Claes Redestad > Sent: Montag, 9. Dezember 2019 13:44 > To: Doerr, Martin ; Vladimir Kozlov > ; hotspot compiler dev at openjdk.java.net> > Subject: Re: RFR: 8234863: Increase default value of MaxInlineLevel > > Hi, > > Nils raised this issue in the bug, and while I think it's a fair point I > think it's orthogonal to whether or not we can/should tune the default > here and now. I think the data we have speaks in favor of doing this > tuning for JDK 14, and then re-evaluate with more real-world data for > JDK 15, possibly dialing back the C1 defaults. > > When implementing such flags we can also evaluate if C1 should be even > more conservative than it is today. It's also worth thinking about > whether or not we should introduce different settings for C1 level 1, 2 > and 3.. > > /Claes > > On 2019-12-09 12:16, Doerr, Martin wrote: > > Hi, > > > > I think tuning inlining makes sense for C2. > > > > The problem I see is that C1 uses the same inlining flags. > > C1 doesn't have the concept of uncommon traps so it compiles all paths. > > That's why I think this change is only good for C2. > > > > So I suggest separating C1 inlining flags before tuning them for C2 like in the > example below (note that new flags require CSR). > > Feedback for this idea is welcome. > > > > Best regards, > > Martin > > > > > > > > diff -r 45fceff98bb5 src/hotspot/share/c1/c1_globals.hpp > > --- a/src/hotspot/share/c1/c1_globals.hpp Mon Dec 09 10:26:41 2019 > +0100 > > +++ b/src/hotspot/share/c1/c1_globals.hpp Mon Dec 09 12:08:37 2019 > +0100 > > @@ -174,6 +174,12 @@ > > develop_pd(bool, RoundFPResults, \ > > "Indicates whether rounding is needed for floating point results")\ > > \ > > + product(intx, C1MaxInlineSize, 35, \ > > + "The maximum bytecode size of a method to be inlined by C1") \ > > + product(intx, C1MaxInlineLevel, 9, \ > > + "The maximum number of nested calls that are inlined by C1") \ > > + product(intx, C1MaxRecursiveInlineLevel, 1, \ > > + "maximum number of nested recursive calls that are inlined") \ > > develop(intx, NestedInliningSizeRatio, 90, \ > > "Percentage of prev. allowed inline size in recursive inlining") \ > > range(0, 100) \ > > > > > > > > > >> -----Original Message----- > >> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Claes Redestad > >> Sent: Montag, 9. 
Dezember 2019 11:15 > >> To: Vladimir Kozlov ; hotspot compiler > >> > >> Subject: Re: RFR: 8234863: Increase default value of MaxInlineLevel > >> > >> Thanks for the review! > >> > >> /Claes > >> > >> On 2019-12-09 01:52, Vladimir Kozlov wrote: > >>> Nice finding! Good. > >>> > >>> Thanks, > >>> Vladimir > >>> > >>> On 12/8/19 1:44 PM, Claes Redestad wrote: > >>>> Hi, > >>>> > >>>> increasing MaxInlineLevel can substantially improve performance in > some > >>>> benchmarks[1], and has been reported to help applications > implemented > >> in > >>>> scala in particular. > >>>> > >>>> There is always some risk of regressions when tweaking the default > >>>> inlining settings. I've done a number of experiments to ascertain that > >>>> the effect of increasing this on a wide array of benchmarks. With 15 all > >>>> benchmarks tested are show either neutral or positive results, with no > >>>> observed regression w.r.t. compilation speed or code cache usage. > >>>> > >>>> Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ > >>>> > >>>> Thanks! > >>>> > >>>> /Claes > >>>> > >>>> [1] One http://renaissance.dev sub-benchmark improve by almost 3x > >> with > >>>> an increase from 9 to 15. From sandhya.viswanathan at intel.com Tue Dec 10 16:57:12 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 10 Dec 2019 16:57:12 +0000 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> Message-ID: Thanks a lot John. Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of John Rose Sent: Monday, December 09, 2019 5:27 PM To: Vladimir Ivanov Cc: hotspot compiler Subject: Re: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations Reviewed by me also. This is a better way to do vector stuff. Thanks, Vladimir K, for the perceptive questions. ? John From vladimir.x.ivanov at oracle.com Tue Dec 10 17:00:32 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2019 20:00:32 +0300 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> Message-ID: <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> Thanks for the reviews, Vladimir and John. > The rules for NegV[DF] and CMoveV[DF] that remain looks > puzzling to me, and would benefit from a brief comment for > future readers, explaining why they are there. > > The NegV rules seem to be removable for the same reason > (size limits) that the other rules are removable. If there?s a The checks are there to ensure AVX512DQ is present and vlen check limits it to 512-bit vectors: case Op_NegVF: if ((vlen == 16) && (VM_Version::supports_avx512dq() == false)) ret_value = false; break; case Op_NegVD: if ((vlen == 8) && (VM_Version::supports_avx512dq() == false)) ret_value = false; break; Similar checks for byte operations are redundant, because Matcher::vector_width_in_bytes() explicitly checks for AVX512BW: if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR)) size = (VM_Version::supports_avx512bw()) ? 64 : 32; I can add a comment that AVX512DQ is required, but it doesn't look like an improvement since the checks say exactly the same. > tricky reason they must be called out explicitly, it would be > good to explain. 
The CMoveV rules probably stem from limited > support for CMove, but again a comment would be good. Yes, there are only vcmov8F_reg and vcmov4D_reg present, so the checks are there to avoid matching failures. I'll put a comment that it's an implementation limitation. Best regards, Vladimir Ivanov > On Dec 5, 2019, at 1:53 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/8234392/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8234392 >> >> Make basic type available in Matcher::match_rule_supported_vector() and refactor Matcher::match_rule_supported_vector() on x86. >> >> It enables significant simplification of Matcher::match_rule_supported_vector()on x86 by using Matcher::vector_size_supported() to cover cases like AXV2 is required for 256-bit operaions on integral (see Matcher::vector_width_in_bytes() [1] for details). >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Testing: tier1-4 >> >> Best regards, >> Vladimir Ivanov >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/x86.ad#l1442 > From vladimir.x.ivanov at oracle.com Tue Dec 10 17:12:35 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2019 20:12:35 +0300 Subject: [14] RFR (L): 8235405: C2: Merge AD instructions for different vector operations In-Reply-To: References: <8487281f-14c5-1516-8bf6-d4ac7a3ca98d@oracle.com> Message-ID: <6e7d058f-ac92-45e8-8b6b-5213a55dce4c@oracle.com> Thanks for reviews, Vladimir and John. And thanks for handling the review and answering the questions, Sandhya. Best regards, Vladimir Ivanov On 10.12.2019 04:27, John Rose wrote: > Reviewed by me also. This is a better way to do vector stuff. > > Thanks, Vladimir K, for the perceptive questions. > > ? John > From bob.vandette at oracle.com Tue Dec 10 17:22:28 2019 From: bob.vandette at oracle.com (Bob Vandette) Date: Tue, 10 Dec 2019 12:22:28 -0500 Subject: [14] RFR(M) 8235539: [JVMCI] -XX:+EnableJVMCIProduct breaks -XX:-EnableJVMCI In-Reply-To: References: Message-ID: <6F62B34F-993A-44AE-BD6E-AB7CCB3F18BF@oracle.com> Looks good to me. Bob. > On Dec 9, 2019, at 8:01 PM, Vladimir Kozlov wrote: > > https://cr.openjdk.java.net/~kvn/8235539/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8235539 > > Allow to reset EnableJVMCI and UseJVMCICompiler values on command line when EnableJVMCIProduct flag is used (added by JDK-8232118). EnableJVMCIProduct flag sets EnableJVMCI and UseJVMCICompiler to true. > > Changes are prepared by Doug for GraalVM and adapted by me for JDK 14. > > Passed tier1 where new test runs. > > Thanks, > Vladimir From gromero at linux.vnet.ibm.com Tue Dec 10 17:23:04 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 10 Dec 2019 14:23:04 -0300 Subject: RFR(XS): 8223968: Add abort type description to RTM statistic counters In-Reply-To: References: <0e4e3c6c-647f-453a-3284-acc463c88e1f@linux.vnet.ibm.com> <3aefe4b7-1be7-64cd-f27c-1db9e9bb29b7@oracle.com> Message-ID: Thanks a lot Martin and Vladimir for the reviews. Pushed to jdk/jdk. Best regards, Gustavo On 12/10/2019 06:43 AM, Doerr, Martin wrote: > +1 > > Thanks, > Martin > >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Dienstag, 10. Dezember 2019 01:53 >> To: Gustavo Romero ; Doerr, Martin >> ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(XS): 8223968: Add abort type description to RTM statistic >> counters >> >> v2 looks good to me. 
>> >> Thanks, >> Vladimir >> >> On 12/9/19 2:05 PM, Gustavo Romero wrote: >>> Hi Martin, >>> >>> On 12/09/2019 11:06 AM, Doerr, Martin wrote: >>>> nice improvement! >>> >>> =) >>> >>> Thanks for the quick review! >>> >>> >>>> I'd only make the message array static: >>>> static const char* _abortX_desc[ABORT_STATUS_LIMIT]; >>>> >>>> +const char* >> RTMLockingCounters::_abortX_desc[ABORT_STATUS_LIMIT] = { >>>> +? "abort instruction?? ", >>>> +? "may succeed on retry", >>>> +? "thread conflict???? ", >>>> +? "buffer overflow???? ", >>>> +? "debug or trap hit?? ", >>>> +? "maximum nested depth" >>>> +}; >>> >>> oh... sure. Done. >>> >>> >>>> Please update the copyright in the test. >>> >>> Done. >>> >>> >>> v2: >>> >>> http://cr.openjdk.java.net/~gromero/8223968/v2/ >>> >>> >>> Best regards, >>> Gustavo From vladimir.kozlov at oracle.com Tue Dec 10 17:35:08 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 09:35:08 -0800 Subject: [14] RFR(M) 8235539: [JVMCI] -XX:+EnableJVMCIProduct breaks -XX:-EnableJVMCI In-Reply-To: References: Message-ID: <537cfe7f-8710-3ebe-2e15-c4751326cf79@oracle.com> Thank you, Tobias On 12/10/19 12:53 AM, Tobias Hartmann wrote: > Hi Vladimir, > > looks good to me. > > Please add a bug number to the test. Also, the Expectation.name/value/origin fields are not needed, > right? No new webrev required. I added @bug. But I would leave fields there to match code in GraalVM. Thanks, Vladimir > > Best regards, > Tobias > > On 10.12.19 02:01, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8235539/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8235539 >> >> Allow to reset EnableJVMCI and UseJVMCICompiler values on command line when EnableJVMCIProduct flag >> is used (added by JDK-8232118). EnableJVMCIProduct flag sets EnableJVMCI and UseJVMCICompiler to true. >> >> Changes are prepared by Doug for GraalVM and adapted by me for JDK 14. >> >> Passed tier1 where new test runs. >> >> Thanks, >> Vladimir From vladimir.kozlov at oracle.com Tue Dec 10 17:35:42 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 09:35:42 -0800 Subject: [14] RFR(M) 8235539: [JVMCI] -XX:+EnableJVMCIProduct breaks -XX:-EnableJVMCI In-Reply-To: <6F62B34F-993A-44AE-BD6E-AB7CCB3F18BF@oracle.com> References: <6F62B34F-993A-44AE-BD6E-AB7CCB3F18BF@oracle.com> Message-ID: <2c073a26-60cc-56e4-7820-72ebebc7c0f7@oracle.com> Thank you, Bob Vladimir On 12/10/19 9:22 AM, Bob Vandette wrote: > Looks good to me. > > Bob. > >> On Dec 9, 2019, at 8:01 PM, Vladimir Kozlov wrote: >> >> https://cr.openjdk.java.net/~kvn/8235539/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8235539 >> >> Allow to reset EnableJVMCI and UseJVMCICompiler values on command line when EnableJVMCIProduct flag is used (added by JDK-8232118). EnableJVMCIProduct flag sets EnableJVMCI and UseJVMCICompiler to true. >> >> Changes are prepared by Doug for GraalVM and adapted by me for JDK 14. >> >> Passed tier1 where new test runs. >> >> Thanks, >> Vladimir > From navy.xliu at gmail.com Tue Dec 10 18:19:10 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Tue, 10 Dec 2019 10:19:10 -0800 Subject: RFR(XXS): C1 compilation fails with -XX:+PrintIRDuringConstruction -XX:+Verbose In-Reply-To: References: <7176592d-4784-0797-4457-463773464536@oracle.com> <83d26c83-d39c-3381-f3d4-8f6ecab847fa@oracle.com> Message-ID: hi, Tobias, Great, thanks! thanks, --lx On Tue, Dec 10, 2019 at 1:34 AM Tobias Hartmann wrote: > Hi Liu, > > looks good and trivial to me. I'll sponsor. 
> > Best regards, > Tobias > > On 09.12.19 09:17, Liu Xin wrote: > > Hi, Tobias, > > > > Thanks for your feedback. > > Here is the new webrev. I update what you pointed out. > > https://cr.openjdk.java.net/~xliu/8235383/02/webrev/ > > > > The patch passed hotspot-tier1 for both fastdebug and release builds. > > > > thanks, > > --lx > > > > > > On Sun, Dec 8, 2019 at 11:15 PM Tobias Hartmann < > tobias.hartmann at oracle.com > > > wrote: > > > > Hi Liu, > > > > thanks for adding the test. > > > > We try to avoid bug ids as test names. In this case, I would suggest > something like > > TestPrintIRDuringConstruction with a compiler.c1 package > declaration. Also, the first line of the > > copyright header looks wrong (should look like this [1]). > > > > line 27-29: I don't think you need these lines > > line 42: "initiail" -> "initial" > > > > Thanks, > > Tobias > > > > [1] > > > http://hg.openjdk.java.net/jdk/jdk/raw-file/fb39a8d1d101/test/hotspot/jtreg/compiler/c1/TestGotoIfMain.java > > > > On 07.12.19 05:26, Liu Xin wrote: > > > hi, Tobias, > > > > > > Thank you for reviewing it. I add a regression test about it. > Could you take a look? > > > https://cr.openjdk.java.net/~xliu/8235383/01/webrev/ > > > > > > thanks, > > > > > > --lx > > > > > > > > > > > > On Fri, Dec 6, 2019 at 12:24 AM Tobias Hartmann < > tobias.hartmann at oracle.com > > > > > tobias.hartmann at oracle.com>>> wrote: > > > > > > Hi Liu, > > > > > > your fix looks good to me but could you please add a > regression test? > > > > > > Thanks, > > > Tobias > > > > > > On 06.12.19 08:23, Liu Xin wrote: > > > > Hi, Reviewers, > > > > > > > > Could you review this very simple bugfix for C1? > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235383 > > > > Webrev: https://cr.openjdk.java.net/~xliu/8235383/webrev/ > > > > > > > > The root cause is some instructions are going to be > eliminated, so they are > > > > not assigned to any valid bci. > > > > In present of -XX:+PrintIRDuringConstruction -XX:+Verbose, > C1 will print > > > > them out and then hit the assert. > > > > > > > > Yes, I can twiddle graph_builder to assign right BCIs to > them, but I would > > > > like to have a more robust InstructionPrinter::print_line. > the CR will > > > > leave blanks in the position of bci. > > > > Eliminated store for object 0: > > > > . 0 a67 a58._24 := a54 (L) next > > > > Eliminated load: > > > > . 
0 i35 a11._24 (I) position > > > > > > > > thanks, > > > > --lx > > > > > > > > > > From john.r.rose at oracle.com Tue Dec 10 18:43:08 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 10 Dec 2019 10:43:08 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> Message-ID: <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> On Dec 10, 2019, at 9:00 AM, Vladimir Ivanov wrote: > > The checks are there to ensure AVX512DQ is present and vlen check limits it to 512-bit vectors: > > case Op_NegVF: > if ((vlen == 16) && (VM_Version::supports_avx512dq() == false)) > ret_value = false; > break; > > case Op_NegVD: > if ((vlen == 8) && (VM_Version::supports_avx512dq() == false)) > ret_value = false; > break; > > Similar checks for byte operations are redundant, because Matcher::vector_width_in_bytes() explicitly checks for AVX512BW: > > if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR)) > size = (VM_Version::supports_avx512bw()) ? 64 : 32; It?s true that 512bw and 512dq are different branches of 512, so it is permissible to detect them in different places. But? it would seem easier to follow to check both in the one place: if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR)) size = (VM_Version::supports_avx512bw()) ? 64 : 32; + if (UseAVX > 2 && (bt == T_FLOAT || bt == T_DOUBLE)) + size = (VM_Version::supports_avx512dq()) ? 64 : 32; This is probably an oversimplification, so file it as BS (brain storming). If such a consolidation of sizing logic is possible, it can be done as a separate cleanup. (Side observation: Since Abs and Neg are treated together in the vabsnegf macro-instruction, it seems odd that there would be special processing of NegVF and not AbsVF also. More cleanups to do?) > I can add a comment that AVX512DQ is required, but it doesn't look like an improvement since the checks say exactly the same. Maybe add a comment cross-referencing the two hunks of logic, since they are not parallel in the sources? (For whatever reasons; it doesn?t matter?) It?s important to be able to find all of the vector-sizing logic so it can be analyzed together, even if it must be distributed into different locations. > tricky reason they must be called out explicitly, it would be >> good to explain. The CMoveV rules probably stem from limited >> support for CMove, but again a comment would be good. > > Yes, there are only vcmov8F_reg and vcmov4D_reg present, so the checks are there to avoid matching failures. I'll put a comment that it's an implementation limitation. > Yes, comments. The matcher logic is tricky, and the structure of AVX512 is tricky, and comments would help navigate it all. ? John From vladimir.x.ivanov at oracle.com Tue Dec 10 19:35:31 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2019 22:35:31 +0300 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes Message-ID: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/all https://bugs.openjdk.java.net/browse/JDK-8235688 Reduce the number of AD instructions needed to implement vector operations by merging existing ones. 
The patch covers the following node types: - AddV(B/S/I/L/F/D) - SubVB/.../SubVD - MulVB/.../MulVD Individual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/individual As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html From vladimir.x.ivanov at oracle.com Tue Dec 10 20:03:03 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2019 23:03:03 +0300 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> Message-ID: <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> >> Similar checks for byte operations are redundant, because >> Matcher::vector_width_in_bytes() explicitly checks for AVX512BW: >> >> ?if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR)) >> ???size = (VM_Version::supports_avx512bw()) ? 64 : 32; > > It?s true that 512bw and 512dq are different branches of 512, so it > is permissible to detect them in different places. ?But? it would > seem easier to follow to check both in the one place: > > ? ?if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR)) > ? ? ?size = (VM_Version::supports_avx512bw()) ? 64 : 32; > +? if (UseAVX > 2 && (bt == T_FLOAT || bt == T_DOUBLE)) > + ? ?size = (VM_Version::supports_avx512dq()) ? 64 : 32; > > This is probably an oversimplification, so file it as BS (brain storming). > If such a consolidation of sizing logic is possible, it can be done as > a separate cleanup. Unfortunately, single- and double-precision FP operations are scattered between AVX512F and AVX512DQ, so I doubt it makes sense to limit FP operations to 256-bit vectors when AVX512DQ is not available. It would severely penalize Xeon Phis (which lack BW, DQ, and VL extensions), but maybe there'll be a moment when Skylake (F+CD+BW+DQ+VL) can be chosen as the baseline. Anyway, I got your idea and it makes perfect sense to me to collect such ideas. > (Side observation: ?Since Abs and Neg are treated together in the?vabsnegf > macro-instruction, it seems odd that there would be special processing > of NegVF and not AbsVF also. ?More cleanups to do?) Yes, it's a first round and there are other opportunities left. More will come after the patches are upstreamed. >> I can add a comment that AVX512DQ is required, but it doesn't look >> like an improvement since the checks say exactly the same. > > Maybe add a comment cross-referencing the two hunks of logic, > since they are not parallel in the sources? ?(For whatever reasons; > it doesn?t matter?) ?It?s important to be able to find all of the > vector-sizing logic so it can be analyzed together, even if it must > be distributed into different locations. 
> >> tricky reason they must be called out explicitly, it would be >>> good to explain. ?The CMoveV rules probably stem from limited >>> support for CMove, but again a comment would be good. >> >> Yes, there are only vcmov8F_reg and vcmov4D_reg present, so the checks >> are there to avoid matching failures. I'll put a comment that it's an >> implementation limitation. >> > > Yes, comments. ?The matcher logic is tricky, and the structure > of AVX512 is tricky, and comments would help navigate it all. Yes, the complexity (very) quickly adds up... (Or "multiplies up"?..) Best regards, Vladimir Ivanov From ekaterina.pavlova at oracle.com Tue Dec 10 20:08:38 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Tue, 10 Dec 2019 12:08:38 -0800 Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1 In-Reply-To: References: <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com> <72C6DC4D-1FD2-4710-8982-B08BCDA9023B@oracle.com> <93eb8e6c-9b36-5797-4445-635db0c0e509@oracle.com> Message-ID: Igor, please see new webrev here: http://cr.openjdk.java.net/~epavlova//8215728/webrev.01/ thanks, -katya On 11/13/19 8:59 PM, Igor Ignatyev wrote: > if GraalUnitTestLauncher has to be run w/ -XX:+EnableJVMCI, I guess we just have to switch back to othervm mode. > -- Igor > >> On Nov 13, 2019, at 8:52 PM, Ekaterina Pavlova wrote: >> >> -XX:+EnableJVMCI is already passed to the spawn JVM by GraalUnitTestLauncher. >> However -XX:+EnableJVMCI is also required for GraalUnitTestLauncher itself so >> getModuleExports() function works properly for graal modules. >> Also note that we can't pass '-XX:+EnableJVMCI' to GraalUnitTestLauncher in jtreg >> directive as we use '@run main compiler.graalunit.common.GraalUnitTestLauncher' to launch it. >> See also discussion regarding this issue in JDK-8216551. >> >> Anyway, I understand the point regarding tier1 and will see what can be done. >> >> thanks, >> -katya >> >> On 11/13/19 1:45 PM, Igor Ignatyev wrote: >>> all tier1 groups are expected to runnable as-is, so I think we need to update GraalUnitTestLauncher to pass -XX:+EnableJVMCI to the spawn JVM, and updated the test descriptions (and generateTests.sh) to require JVM w/ jvmci feature (@requires vm.jvmci)? then this will be a proper tier1 group. >>> -- Igor >>>> On Nov 13, 2019, at 1:38 PM, Ekaterina Pavlova wrote: >>>> >>>> The tests require -XX:+EnableJVMCI to be run with, so this is why I created separate group. >>>> >>>> On 11/13/19 1:30 PM, Igor Ignatyev wrote: >>>>> Hi Katya, >>>>> shouldn't this group be also into tier1_compiler group? >>>>> -- Igor >>>>>> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> please review this small patch which defines jtreg test group 'tier1_compiler_graal' so we can run it >>>>>> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequent failed test based on >>>>>> bugs statistics, so I included only this group of tests for now. They also takes reasonable amount of >>>>>> time to execute. 
>>>>>> >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8215728 >>>>>> webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html >>>>>> testing: tier1 >>>>>> >>>>>> >>>>>> thanks, >>>>>> -katya >>>> >> > From tom.rodriguez at oracle.com Tue Dec 10 20:24:46 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 10 Dec 2019 12:24:46 -0800 Subject: RFR(S) 8229961: Assert failure in compiler/graalunit/HotspotTest.java Message-ID: http://cr.openjdk.java.net/~never/8229961/webrev https://bugs.openjdk.java.net/browse/JDK-8229961 The JVMCI InstalledCode object maintains a link back to the CodeBlob it's associated with. In several places JVMCI extracts the CodeBlob* or nmethod* from the InstalledCode and the operates on it. In most other cases where an nmethod is being examined there are external factors keeping it alive but in this case there aren't. The actions of the concurrent sweeper can transition or potentially free the nmethod while it's being used. Getting all the way to freeing would take a very adversarial schedule but JVMCI should provide more safety for the these code paths. This fix adds an nmethodLocker to these usages and through careful use of locks attempts to ensure that the resulting locked nmethod is alive and locked when it's returned. I had to modify some of the jtreg tests because they were badly abusing the InstalledCode API and violating some assumptions about the relationship between nmethods and their wrapper InstalledCode objects. Testing was clean apart from a test compilation problem and existing issue. I'm resubmitting with the fixed jtreg test. From john.r.rose at oracle.com Tue Dec 10 20:32:26 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 10 Dec 2019 12:32:26 -0800 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes In-Reply-To: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> References: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> Message-ID: <273D2CF0-F8F5-4542-8770-22EF728529C4@oracle.com> Reviewed. I concur with the occasional simplification of the printed formats. (The combined diff hunks are epic diff salad; such is life. Despite all that it reads OK in Emacs diff-auto-refine-mode. And the individual diffs are very helpful as a backup, especially for byte multiply.) ? John > On Dec 10, 2019, at 11:35 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235688 > > Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers the following node types: > - AddV(B/S/I/L/F/D) > - SubVB/.../SubVD > - MulVB/.../MulVD > > Individual patches: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/individual Typo somewhere: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/indiviual/ > > As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. > > The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? 
> > Best regards, > Vladimir Ivanov > > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > > [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html From john.r.rose at oracle.com Tue Dec 10 20:34:39 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 10 Dec 2019 12:34:39 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> Message-ID: On Dec 10, 2019, at 12:03 PM, Vladimir Ivanov wrote: > >> This is probably an oversimplification, so file it as BS (brain storming). >> If such a consolidation of sizing logic is possible, it can be done as >> a separate cleanup. > > Unfortunately, single- and double-precision FP operations are scattered between AVX512F and AVX512DQ, so I doubt it makes sense to limit FP operations to 256-bit vectors when AVX512DQ is not available. > > It would severely penalize Xeon Phis (which lack BW, DQ, and VL extensions), but maybe there'll be a moment when Skylake (F+CD+BW+DQ+VL) can be chosen as the baseline. Yeah, I saw that coming after I visited the trusty intrinsics guide https://software.intel.com/sites/landingpage/IntrinsicsGuide/ > > Anyway, I got your idea and it makes perfect sense to me to collect such ideas. Thanks! Next review, please? From igor.ignatyev at oracle.com Tue Dec 10 21:33:13 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 10 Dec 2019 13:33:13 -0800 Subject: RFR(T/XS) 8215728: [Graal] we should run some Graal tests in tier1 In-Reply-To: References: <9d853b2a-05b8-b5eb-dbe7-f6a00396269b@oracle.com> <72C6DC4D-1FD2-4710-8982-B08BCDA9023B@oracle.com> <93eb8e6c-9b36-5797-4445-635db0c0e509@oracle.com> Message-ID: <15931779-635A-452F-810D-C4A86FB79B5B@oracle.com> LGTM. -- Igor > On Dec 10, 2019, at 12:08 PM, Ekaterina Pavlova wrote: > > Igor, > please see new webrev here: > http://cr.openjdk.java.net/~epavlova//8215728/webrev.01/ > > thanks, > -katya > > On 11/13/19 8:59 PM, Igor Ignatyev wrote: >> if GraalUnitTestLauncher has to be run w/ -XX:+EnableJVMCI, I guess we just have to switch back to othervm mode. >> -- Igor >>> On Nov 13, 2019, at 8:52 PM, Ekaterina Pavlova wrote: >>> >>> -XX:+EnableJVMCI is already passed to the spawn JVM by GraalUnitTestLauncher. >>> However -XX:+EnableJVMCI is also required for GraalUnitTestLauncher itself so >>> getModuleExports() function works properly for graal modules. >>> Also note that we can't pass '-XX:+EnableJVMCI' to GraalUnitTestLauncher in jtreg >>> directive as we use '@run main compiler.graalunit.common.GraalUnitTestLauncher' to launch it. >>> See also discussion regarding this issue in JDK-8216551. >>> >>> Anyway, I understand the point regarding tier1 and will see what can be done. >>> >>> thanks, >>> -katya >>> >>> On 11/13/19 1:45 PM, Igor Ignatyev wrote: >>>> all tier1 groups are expected to runnable as-is, so I think we need to update GraalUnitTestLauncher to pass -XX:+EnableJVMCI to the spawn JVM, and updated the test descriptions (and generateTests.sh) to require JVM w/ jvmci feature (@requires vm.jvmci)? then this will be a proper tier1 group. 
>>>> -- Igor >>>>> On Nov 13, 2019, at 1:38 PM, Ekaterina Pavlova wrote: >>>>> >>>>> The tests require -XX:+EnableJVMCI to be run with, so this is why I created separate group. >>>>> >>>>> On 11/13/19 1:30 PM, Igor Ignatyev wrote: >>>>>> Hi Katya, >>>>>> shouldn't this group be also into tier1_compiler group? >>>>>> -- Igor >>>>>>> On Nov 13, 2019, at 1:28 PM, Ekaterina Pavlova wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> please review this small patch which defines jtreg test group 'tier1_compiler_graal' so we can run it >>>>>>> as part of tier1 testing. compiler/graalunit/HotspotTest.java is the most frequent failed test based on >>>>>>> bugs statistics, so I included only this group of tests for now. They also takes reasonable amount of >>>>>>> time to execute. >>>>>>> >>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8215728 >>>>>>> webrev: http://cr.openjdk.java.net/~epavlova//8215728/webrev.00/index.html >>>>>>> testing: tier1 >>>>>>> >>>>>>> >>>>>>> thanks, >>>>>>> -katya >>>>> >>> > From richard.reingruber at sap.com Tue Dec 10 21:45:28 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 10 Dec 2019 21:45:28 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Message-ID: Hi, I would like to get reviews please for http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ Corresponding RFE: https://bugs.openjdk.java.net/browse/JDK-8227745 Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the change is being tested at SAP since I posted the first RFR some months ago. The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI agents request capabilities that allow them to access local variable values. E.g. if you start-up with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right from the beginning, well before a debugger attaches -- if ever one should do so. With the enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based optimizations are reverted just before an agent acquires the reference to an object. In the JBS item you'll find more details. Thanks, Richard. [1] Experimental fix for JDK-8214584 based on JDK-8227745 http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch From vladimir.kozlov at oracle.com Tue Dec 10 21:56:04 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 13:56:04 -0800 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes In-Reply-To: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> References: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> Message-ID: <5d88513e-8826-bb1e-47ad-74175df47ead@oracle.com> On 12/10/19 11:35 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235688 > > Reduce the number of AD instructions needed to implement vector > operations by merging existing ones. The patch covers the following node > types: > ? - AddV(B/S/I/L/F/D) > ? - SubVB/.../SubVD > ? 
- MulVB/.../MulVD > > Individual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/individual The link does not work because directory name on server is 'indiviual' There is inconsistency in method names and comments string in some cases (I expects that number '2' is removed from names as in most cases): instruct vadd2L(vec dst, vec src) %{ ... format %{ "paddq $dst,$src\t! add packed2L" %} format %{ "addpd $dst,$src\t! add packed2D" %} format %{ "psubq $dst,$src\t! sub packed2L" %} format %{ "subpd $dst,$src\t! sub packed2D" %} format %{ "vsubpd $dst,$src1,$src2\t! sub packed2D" %} format %{ "vsubpd $dst,$src,$mem\t! sub packed2D" %} format %{ "mulpd $dst,$src\t! mul packed2D" %} The rest seems fine. I agree with short formats in webrev.12.vmul_byte. Thanks, Vladimir > > As Jatin described, merging is applied only to AD instructions of > similar shape. There are some more opportunities for reduction/merging > left, but they are deliberately left out for future work. > > The patch is derived from the inintial version of generic vector support > [1]. Generic vector support was reviewed earlier [2]. > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > > > [2] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html > From vladimir.kozlov at oracle.com Tue Dec 10 22:14:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 14:14:21 -0800 Subject: RFR(S) 8229961: Assert failure in compiler/graalunit/HotspotTest.java In-Reply-To: References: Message-ID: Looks good. Thank you for fixing JVMCI tests! Vladimir On 12/10/19 12:24 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/8229961/webrev > https://bugs.openjdk.java.net/browse/JDK-8229961 > > The JVMCI InstalledCode object maintains a link back to the CodeBlob > it's associated with.? In several places JVMCI extracts the CodeBlob* or > nmethod* from the InstalledCode and the operates on it.? In most other > cases where an nmethod is being examined there are external factors > keeping it alive but in this case there aren't.? The actions of the > concurrent sweeper can transition or potentially free the nmethod while > it's being used.? Getting all the way to freeing would take a very > adversarial schedule but JVMCI should provide more safety for the these > code paths. > > This fix adds an nmethodLocker to these usages and through careful use > of locks attempts to ensure that the resulting locked nmethod is alive > and locked when it's returned.? I had to modify some of the jtreg tests > because they were badly abusing the InstalledCode API and violating some > assumptions about the relationship between nmethods and their wrapper > InstalledCode objects. > > Testing was clean apart from a test compilation problem and existing > issue.? I'm resubmitting with the fixed jtreg test. 
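An illustration of the keep-alive pattern described in the 8229961 review above -- a sketch assuming the usual nmethodLocker / CodeCache_lock facilities, not the webrev code; the helper name and the way the code handle is decoded are invented for the example:

    // Sketch only: look up the nmethod behind an InstalledCode-style handle and
    // pin it so the sweeper cannot transition or free it while the caller uses it.
    static nmethod* acquire_nmethod_sketch(jlong code_handle, nmethodLocker& keepalive) {
      // Hold CodeCache_lock while touching the blob so it cannot be flushed
      // before the locker is armed.
      MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
      CodeBlob* cb = (CodeBlob*) (address) code_handle;        // hypothetical decoding of the handle
      nmethod* nm = (cb == NULL) ? NULL : cb->as_nmethod_or_null();
      if (nm != NULL && nm->is_alive()) {
        keepalive.set_code(nm);   // pin is released when the locker goes out of scope
        return nm;                // safe to inspect outside the lock while pinned
      }
      return NULL;                // caller must treat the code as already invalidated
    }

The point of the pattern is that the liveness check and the pinning happen under the same lock, so the returned nmethod cannot have been swept in between.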
From john.r.rose at oracle.com Tue Dec 10 22:17:07 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 10 Dec 2019 14:17:07 -0800 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes In-Reply-To: <5d88513e-8826-bb1e-47ad-74175df47ead@oracle.com> References: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> <5d88513e-8826-bb1e-47ad-74175df47ead@oracle.com> Message-ID: <8B6F7BDE-DC8A-49AA-BB5A-0CABAF8F9D5C@oracle.com> Nice catch! > On Dec 10, 2019, at 1:56 PM, Vladimir Kozlov wrote: > > There is inconsistency in method names and comments string in some cases (I expects that number '2' is removed from names as in most cases): From vladimir.x.ivanov at oracle.com Tue Dec 10 22:29:01 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 01:29:01 +0300 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes Message-ID: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all https://bugs.openjdk.java.net/browse/JDK-8235719 Merge AD instructions for the following vector nodes: - LShiftV*, RShiftV*, URShiftV* - AbsV* - NegV* Individual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. The patch is derived from the initial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Best regards, Vladimir Ivanov [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html From vladimir.x.ivanov at oracle.com Tue Dec 10 22:35:51 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 01:35:51 +0300 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes In-Reply-To: <8B6F7BDE-DC8A-49AA-BB5A-0CABAF8F9D5C@oracle.com> References: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> <5d88513e-8826-bb1e-47ad-74175df47ead@oracle.com> <8B6F7BDE-DC8A-49AA-BB5A-0CABAF8F9D5C@oracle.com> Message-ID: <6ef57374-a8db-6128-3de5-a04ab8a848e8@oracle.com> Thanks for reviews, Vladimir and John. Updated webrev (sorry, no individual patches this time): http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.01 Best regards, Vladimir Ivanov On 11.12.2019 01:17, John Rose wrote: > Nice catch! > >> On Dec 10, 2019, at 1:56 PM, Vladimir Kozlov wrote: >> >> There is inconsistency in method names and comments string in some cases (I expects that number '2' is removed from names as in most cases): > From vladimir.x.ivanov at oracle.com Tue Dec 10 22:41:06 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 01:41:06 +0300 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes In-Reply-To: <273D2CF0-F8F5-4542-8770-22EF728529C4@oracle.com> References: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> <273D2CF0-F8F5-4542-8770-22EF728529C4@oracle.com> Message-ID: > I concur with the occasional simplification of the printed formats. 
It would be nice for ADLC to automatically generate AD instruction description when "format" declaration is omitted. Just printing instruction name and operand values should be as informative most of the time. Best regards, Vladimir Ivanov > (The combined diff hunks are epic diff salad; such is life. > Despite all that it reads OK in Emacs diff-auto-refine-mode. > And the individual diffs are very helpful as a backup, > especially for byte multiply.) > > -- John > >> On Dec 10, 2019, at 11:35 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/all >> https://bugs.openjdk.java.net/browse/JDK-8235688 >> >> Reduce the number of AD instructions needed to implement vector operations by merging existing ones. The patch covers the following node types: >> - AddV(B/S/I/L/F/D) >> - SubVB/.../SubVD >> - MulVB/.../MulVD >> >> Individual patches: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/individual > > Typo somewhere: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.00/indiviual/ >> >> As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. >> >> The patch is derived from the inintial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. >> >> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >> >> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html
From john.r.rose at oracle.com Tue Dec 10 23:31:13 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 10 Dec 2019 15:31:13 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> Message-ID: Actually I have one more comment about the new classification logic: The "ret_value" idiom is terrible. I see a function which is complex, with something like this at the top: if (... simple size check ...) { ret_value = false; // size not supported } ... I want to read something more decisive like this: if (... simple size check ...) { return false; // size not supported } ... The "ret_value" thingy adds only noise, and no clarity. It is more than an annoyance, hence my comment here. The problem is that if I want to understand the quick check above, I have to scroll down *past all the other checks* to see if some joker does "ret_value = true" before the "return ret_value", subverting my understanding of the code. Basically, the "ret_value" nonsense makes the code impossible to break into understandable parts. -- Grumpy John On Dec 10, 2019, at 12:34 PM, John Rose wrote: > >> It would severely penalize Xeon Phis (which lack BW, DQ, and VL extensions), but maybe there'll be a moment when Skylake (F+CD+BW+DQ+VL) can be chosen as the baseline. > > Yeah, I saw that coming after I visited the trusty intrinsics guide > https://software.intel.com/sites/landingpage/IntrinsicsGuide/ >> >> Anyway, I got your idea and it makes perfect sense to me to collect such ideas.
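For readers following the 8234392 thread, this is the early-return shape being asked for, assembled from the checks quoted earlier in the thread -- a sketch only, not the webrev code, and the element-typed signature is assumed from the RFE title:

    const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt) {
      if (!match_rule_supported(opcode)) {
        return false;   // rejected regardless of vector length
      }
      switch (opcode) {
        case Op_AbsVF:
        case Op_NegVF:
          if (vlen == 16 && !VM_Version::supports_avx512dq()) {
            return false;   // 512-bit single-precision forms need AVX512DQ
          }
          break;
        case Op_AbsVD:
        case Op_NegVD:
          if (vlen == 8 && !VM_Version::supports_avx512dq()) {
            return false;   // 512-bit double-precision forms need AVX512DQ
          }
          break;
        default:
          break;
      }
      return true;   // remaining sizing constraints come from vector_width_in_bytes()
    }

Each unsupported combination bails out immediately, so a reader of the first check never has to scan the rest of the function for a later assignment to ret_value.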
> From rkennke at redhat.com Tue Dec 10 23:57:48 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Dec 2019 00:57:48 +0100 Subject: RFR: 8235729: Shenandoah: Remove useless casting to non-constant Message-ID: <65746d9b-1b54-dd75-d340-a9f8e76929be@redhat.com> We have a couple of code-paths in Shenandoah's C2 barrier code that cast values to non-constant. This appears to be a left-over from pre-LRB barrier scheme. This patch removes those casts as well as the relevant methods in type.hpp/cpp. Shenandoah has been the only user of this. Bug: https://bugs.openjdk.java.net/browse/JDK-8235729 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8235729/webrev.00/ Testing: hotspot_gc_shenandoah, ctw-tests with default and traversal mode Can I please get a review of the change? Thanks, Roman From john.r.rose at oracle.com Wed Dec 11 00:00:50 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 10 Dec 2019 16:00:50 -0800 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes In-Reply-To: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> References: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> Message-ID: Thank you, reviewed. For consistency, I?d expect to see the AD file mention vshiftq instead of in or in addition to the very specific evpsraq. Maybe actually call vshiftq (consistent with other parts of AD) and comment that it calls evpsraq? I had to look inside the macro assembler to verify that evpsraq was properly aligned with the other cases. Or just leave a comment saying this is what vshiftq would do also, like the other instructions. For vabsnegF, I suggest adding a comment here: + predicate(n->as_Vector()->length() == 2 || + // case 4 is handled as a 1-operand instruction by vabsneg4F + n->as_Vector()->length() == 8 || + n->as_Vector()->length() == 16); ? Commentator John On Dec 10, 2019, at 2:29 PM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235719 > > Merge AD instructions for the following vector nodes: > - LShiftV*, RShiftV*, URShiftV* > - AbsV* > - NegV* > > Individual patches: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual > > As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. > > The patch is derived from the initial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > > [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html From gromero at linux.vnet.ibm.com Wed Dec 11 00:04:31 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 10 Dec 2019 21:04:31 -0300 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 Message-ID: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> Hi, Could the following change be reviewed please? 
Bug : https://bugs.openjdk.java.net/browse/JDK-8234599 Webrev: http://cr.openjdk.java.net/~gromero/8234599/v1/ POWER9 does not have any cache management or barrier instructions aimed specifically to deal with persistent memory (NVDIMM), hence in Power ISA v3.0 there are no instructions like data cache flush/store or sync instructions specific to persistent memory. Nonetheless, in some cases (like through hardware emulation) POWER9 can support NVDIMM and if Linux supports DAX (direct mapping of persistent memory) and a /dev/pmem device is available, it's possible to use data cache and sync instructions in the ISA (which are not explicitly aimed at persistent memory) on a memory region backed by DAX (i.e. mapped using the new mmap() flag MAP_SYNC for persistent memory [1]), so these instructions will have the same semantics as the instructions for data cache flush / store and sync on other architectures supporting NVDIMM and that have explicit instructions to deal with persistent memory. This change adds support for JEP-352 on POWER9 using 'dcbst' plus a sync instruction to sync data to memory backed by DAX (to persistent memory) when that's required. Moreover, that change also paves the way for supporting NVDIMM on future Power CPUs that might have new data cache management and barrier instructions explicitly to deal with persistent memory. The change was developed and tested using a P9 QEMU VM (emulation) with NVDIMM support. For details on how to set up a proper QEMU + Linux kernel w/ a /dev/pmem device configured, please see the recipe in [2]. The JVM on a POWER9 machine with a /dev/pmem device properly set and with that change applied is able to pass the test for JEP-352 [3]. The JVM is also able to pass all tests of the Mashona library [4]. When DAX is not supported, like on POWER8 and POWER9 w/o DAX support, the OS won't support mmap()'s MAP_SYNC flag, so the kernel will return EOPNOTSUPP when code tries to allocate memory backed by a persistent memory device, and the JVM will get a "java.io.IOException: Operation not supported". Naturally, on machines that don't support writebacks or pmem, supports_data_cache_line_flush() will return false, but even in that case the JVM will hit an EOPNOTSUPP and get a "java.io.IOException: Operation not supported" sooner, before it has the chance to try to emit any writeback + sync instructions. Thank you. Best regards, Gustavo [1] http://man7.org/linux/man-pages/man2/mmap.2.html [2] https://github.com/gromero/nvdimm [3] http://hg.openjdk.java.net/jdk/jdk/file/336885e766af/test/jdk/java/nio/MappedByteBuffer/PmemTest.java [4] https://github.com/jhalliday/mashona
From vladimir.kozlov at oracle.com Wed Dec 11 00:14:17 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 16:14:17 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches In-Reply-To: References: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> Message-ID: On 12/10/19 1:24 AM, Tom Rodriguez wrote: > > > Vladimir Kozlov wrote on 12/9/19 8:09 PM: >> But it looks like there are testing failures. > > None of the testing failures look like they have anything to do with > these changes. They all have failed recently in other builds from what > I can see. You are right. I was concerned about 2 kitchensink failures but they are present in current code too based on failure history. > >> >> The idea and changes seems fine to me. The only question I have is why >> JVMCI code want to deoptimize all related compiler frames immediately? >> Why marking is not enough?
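An aside on the MAP_SYNC mapping described in the PPC64/JEP-352 request above: a minimal user-space sketch, independent of the JVM, of how a file on a DAX filesystem is mapped so that a cache-line writeback plus a fence makes stores durable. The file path is an assumption, error handling is reduced to the failure case mentioned above, and MAP_SYNC / MAP_SHARED_VALIDATE may require a recent kernel and glibc (or <linux/mman.h>).

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cerrno>
    #include <cstdio>
    #include <cstring>

    int main() {
      int fd = open("/mnt/pmem/test", O_CREAT | O_RDWR, 0666);   // file on a DAX-mounted fs (assumed path)
      if (fd < 0 || ftruncate(fd, 4096) != 0) { perror("open/ftruncate"); return 1; }
      // MAP_SYNC is only valid together with MAP_SHARED_VALIDATE; without DAX the
      // kernel fails with EOPNOTSUPP, which the JDK surfaces as
      // "java.io.IOException: Operation not supported".
      void* p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
      if (p == MAP_FAILED) { fprintf(stderr, "mmap: %s\n", strerror(errno)); return 1; }
      memcpy(p, "hello", 5);
      // After a cache-line writeback (dcbst on POWER9, clwb/clflushopt on x86) plus
      // the architecture's fence, the store is persistent without an msync() call.
      close(fd);
      return 0;
    }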
> > I'm not sure what you're asking.? mark_for_deoptimization doesn't really > do anything on it's own.? You have to do the code cache scan to > make_not_entrant and then do the stack walks to invalidate any frames > which are still using that code.? I'm not changing the semantic of the > original function either, just skipping a useless full code cache scan. Yes, right. I am confusing because not in all places we do this sequence: mark_for_deoptimization, make_not_entrant, patch frames. Patching frames are not always at the same place as we do marking. I also noticed that you changed under which locks make_not_entrant is done. May be better to pass nmethod into deoptimize_all_marked() and do it there (by default pass NULL)? Thanks, Vladimir > > tom > >> >> Thanks, >> Vladimir >> >> On 12/9/19 12:10 PM, Tom Rodriguez wrote: >>> http://cr.openjdk.java.net/~never/8229377/webrev >>> https://bugs.openjdk.java.net/browse/JDK-8229377 >>> >>> This is a minor improvement to the JVMCI invalidate method to avoid >>> scanning large code caches when invalidating a single nmethod. >>> Instead the nmethod is directly made not_entrant.? In general I'm >>> unclear what the benefit of the >>> mark_for_deoptimization/make_marked_nmethods_not_entrant split is. >>> Testing is in progress. >>> >>> JDK-8230884 had been previously duplicated against this because they >>> overlapped a bit, but in the interest of clarity I separated them again. >>> >>> tom From vladimir.kozlov at oracle.com Wed Dec 11 00:18:15 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 16:18:15 -0800 Subject: [14] RFR (L): 8235688: C2: Merge AD instructions for AddV, SubV, and MulV nodes In-Reply-To: <6ef57374-a8db-6128-3de5-a04ab8a848e8@oracle.com> References: <01485ec9-7b6d-4184-71d0-17674216a941@oracle.com> <5d88513e-8826-bb1e-47ad-74175df47ead@oracle.com> <8B6F7BDE-DC8A-49AA-BB5A-0CABAF8F9D5C@oracle.com> <6ef57374-a8db-6128-3de5-a04ab8a848e8@oracle.com> Message-ID: <3d787888-4031-8c0c-8d9f-fe9e9e588fd8@oracle.com> Good. Thanks, Vladimir On 12/10/19 2:35 PM, Vladimir Ivanov wrote: > Thanks for reviews, Vladimir and John. > > Updated webrev (sorry, no individual patches this time): > > ? http://cr.openjdk.java.net/~vlivanov/jbhateja/8235688/webrev.01 > > Best regards, > Vladimir Ivanov > > On 11.12.2019 01:17, John Rose wrote: >> Nice catch! >> >>> On Dec 10, 2019, at 1:56 PM, Vladimir Kozlov >>> wrote: >>> >>> There is inconsistency in method names and comments string in some >>> cases (I expects that number '2' is removed from names as in most >>> cases): >> From smita.kamath at intel.com Wed Dec 11 01:41:12 2019 From: smita.kamath at intel.com (Kamath, Smita) Date: Wed, 11 Dec 2019 01:41:12 +0000 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 Message-ID: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> Hi, As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 Link to webrev : http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. 
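A scalar illustration of the double-precision shift these instructions perform per vector lane -- a sketch, not code from the webrev, and the real BigInteger shift also handles whole-word offsets and edge cases omitted here:

    #include <cstdint>
    #include <vector>

    // Shift a big-integer magnitude (most significant word first) left by n bits,
    // 0 <= n < 32. Each result word is the top half of the 64-bit concatenation of
    // a word with its lower neighbour, shifted by n -- analogous to the per-element
    // operation a double-precision (funnel) shift applies to a whole vector at once.
    std::vector<uint32_t> shift_left_sketch(const std::vector<uint32_t>& mag, unsigned n) {
      std::vector<uint32_t> res(mag.size());
      for (std::size_t i = 0; i < mag.size(); i++) {
        uint64_t pair = ((uint64_t) mag[i] << 32) | (i + 1 < mag.size() ? mag[i + 1] : 0);
        res[i] = (uint32_t) ((pair << n) >> 32);   // high 32 bits of the shifted pair
      }
      return res;
    }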
[1] https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> Vol. 2C 5-471) [2] https://software.intel.com/en-us/articles/intel-software-development-emulator Regards, Smita Kamath From vladimir.kozlov at oracle.com Wed Dec 11 02:48:40 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Dec 2019 18:48:40 -0800 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes In-Reply-To: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> References: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> Message-ID: <58948fc2-43fe-bd12-2185-4ab12580486a@oracle.com> In general I don't like using switches in this changes. In most examples you have only 2 instructions to choose from which could be done with 'if/else'. 'default: ShouldNotReachHere()' is big code if inlined and never will be hit - you should hit first checks in supported vector size code. I may prefer to see 2 AD instructions as you had in previous changes. In vabsnegF() switch cases should be 2,8,16 instead of 2,4,8. Why you need predicate for vabsnegD ? Other length is not supported anyway. Vladimir On 12/10/19 2:29 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all > https://bugs.openjdk.java.net/browse/JDK-8235719 > > Merge AD instructions for the following vector nodes: > ? - LShiftV*, RShiftV*, URShiftV* > ? - AbsV* > ? - NegV* > > Individual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual > > As Jatin described, merging is applied only to AD instructions of > similar shape. There are some more opportunities for reduction/merging > left, but they are deliberately left out for future work. > > The patch is derived from the initial version of generic vector support > [1]. Generic vector support was reviewed earlier [2]. > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html > > > [2] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html > From david.holmes at oracle.com Wed Dec 11 07:02:31 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 11 Dec 2019 17:02:31 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: Message-ID: Hi Richard, On 11/12/2019 7:45 am, Reingruber, Richard wrote: > Hi, > > I would like to get reviews please for > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ > > Corresponding RFE: > https://bugs.openjdk.java.net/browse/JDK-8227745 > > Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 > And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] > > Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the > change is being tested at SAP since I posted the first RFR some months ago. > > The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI > agents request capabilities that allow them to access local variable values. E.g. if you start-up > with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right > from the beginning, well before a debugger attaches -- if ever one should do so. 
With the > enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based > optimizations are reverted just before an agent acquires the reference to an object. In the JBS item > you'll find more details. Most of the details here are in areas I can comment on in detail, but I did take an initial general look at things. The only thing that jumped out at me is that I think the DeoptimizeObjectsALotThread should be a hidden thread. + bool is_hidden_from_external_view() const { return true; } Also I don't see any testing of the DeoptimizeObjectsALotThread. Without active testing this will just bit-rot. Also on the tests I don't understand your @requires clause: @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & (vm.opt.TieredCompilation != true)) This seems to require that TieredCompilation is disabled, but tiered is our normal mode of operation. ?? Thanks, David > Thanks, > Richard. > > [1] Experimental fix for JDK-8214584 based on JDK-8227745 > http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch > From igor.veresov at oracle.com Wed Dec 11 07:11:26 2019 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 10 Dec 2019 23:11:26 -0800 Subject: RFR(XL) 8235634: Update Graal Message-ID: <4C30F569-6D7E-48C3-984F-32E03CDD5633@oracle.com> Webrev: http://cr.openjdk.java.net/~iveresov/8235634/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8235634 The list of changes can be found in the JBS issue. Thanks, igor From tom.rodriguez at oracle.com Wed Dec 11 07:21:12 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 10 Dec 2019 23:21:12 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches In-Reply-To: References: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> Message-ID: <16bf48b3-2be5-4631-8d39-2939aa29d6b9@oracle.com> > Yes, right. I am confusing because not in all places we do this > sequence: mark_for_deoptimization, make_not_entrant, patch frames. > Patching frames are not always at the same place as we do marking. > > I also noticed that you changed under which locks make_not_entrant is > done. May be better to pass nmethod into deoptimize_all_marked() and do > it there (by default pass NULL)? You mean the CodeCache_lock? The CompiledMethod_lock is only thing required for make_not_entrant and that's acquired in make_not_entrant_or_zombie. The CodeCache_lock is required for the safe iteration over the CodeCache itself. Though I kind of like your suggestion of passing the nmethod to the call and performing the make_not_entrant call there instead. It keeps the logic together. I'll make that change. tom > > Thanks, > Vladimir > >> >> tom >> >>> >>> Thanks, >>> Vladimir >>> >>> On 12/9/19 12:10 PM, Tom Rodriguez wrote: >>>> http://cr.openjdk.java.net/~never/8229377/webrev >>>> https://bugs.openjdk.java.net/browse/JDK-8229377 >>>> >>>> This is a minor improvement to the JVMCI invalidate method to avoid >>>> scanning large code caches when invalidating a single nmethod. >>>> Instead the nmethod is directly made not_entrant.? In general I'm >>>> unclear what the benefit of the >>>> mark_for_deoptimization/make_marked_nmethods_not_entrant split is. >>>> Testing is in progress. >>>> >>>> JDK-8230884 had been previously duplicated against this because they >>>> overlapped a bit, but in the interest of clarity I separated them >>>> again. 
>>>> >>>> tom From tobias.hartmann at oracle.com Wed Dec 11 08:04:07 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 11 Dec 2019 09:04:07 +0100 Subject: RFR(S): Clean-up BarrierSetC2 In-Reply-To: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> References: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> Message-ID: <334bc847-a427-5d33-e094-53961f08ad23@oracle.com> Hi Nils, nice cleanup! Why did you change "Node* var" -> "Node *var" in LoadNode::split_through_phi? Thanks, Tobias On 10.12.19 15:00, Nils Eliasson wrote: > Hi, > > I can across a lot of dead code in BarrierSetC2 while debugging a completely separate problem. This > is leftovers from legacy barrier implementations that no longer is needed. > > This is a quick cleanup where non-used parts are removed. I tried to keep it as simple as possible > to avoid introducing any bugs. There are additional cleanups that can be done in these files later. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8235653 > > Webrev: http://cr.openjdk.java.net/~neliasso/8235653/webrev.01 > > > Please review, > > Nils Eliasson > From nils.eliasson at oracle.com Wed Dec 11 09:16:12 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 11 Dec 2019 10:16:12 +0100 Subject: RFR(S): Clean-up BarrierSetC2 In-Reply-To: <334bc847-a427-5d33-e094-53961f08ad23@oracle.com> References: <6dc55d99-3e16-d04e-6dfe-ac5d3c80b4fb@oracle.com> <334bc847-a427-5d33-e094-53961f08ad23@oracle.com> Message-ID: <13d4c447-789d-8122-6d8a-23f5be200395@oracle.com> Hi Tobias, That's my IDE working against me. I'll change that back before pushing. Thanks! Nils On 2019-12-11 09:04, Tobias Hartmann wrote: > Hi Nils, > > nice cleanup! > > Why did you change "Node* var" -> "Node *var" in LoadNode::split_through_phi? > > Thanks, > Tobias > > On 10.12.19 15:00, Nils Eliasson wrote: >> Hi, >> >> I can across a lot of dead code in BarrierSetC2 while debugging a completely separate problem. This >> is leftovers from legacy barrier implementations that no longer is needed. >> >> This is a quick cleanup where non-used parts are removed. I tried to keep it as simple as possible >> to avoid introducing any bugs. There are additional cleanups that can be done in these files later. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8235653 >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8235653/webrev.01 >> >> >> Please review, >> >> Nils Eliasson >> From vladimir.x.ivanov at oracle.com Wed Dec 11 09:37:43 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 12:37:43 +0300 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class In-Reply-To: <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> References: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> <8736dtbnm1.fsf@redhat.com> <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> Message-ID: > http://cr.openjdk.java.net/~thartmann/8235452/webrev.01/ Looks good. Best regards, Vladimir Ivanov From tobias.hartmann at oracle.com Wed Dec 11 09:38:46 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 11 Dec 2019 10:38:46 +0100 Subject: [14] RFR(S): 8235452: Strip mined loop verification fails with assert(is_OuterStripMinedLoop()) failed: invalid node class In-Reply-To: References: <019c6bfa-bc6d-7fee-6343-c4b6a05a3d43@oracle.com> <8736dtbnm1.fsf@redhat.com> <5090a16e-f97d-a88e-8047-875a4afa9c87@oracle.com> Message-ID: <3606cc43-5acb-b2ee-b39e-cf5f1c377e97@oracle.com> Thanks Vladimir! 
Best regards, Tobias On 11.12.19 10:37, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8235452/webrev.01/ > > Looks good. > > Best regards, > Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Dec 11 09:51:25 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 12:51:25 +0300 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> <87blsga25b.fsf@redhat.com> Message-ID: > http://cr.openjdk.java.net/~chagedorn/8233033/webrev.01/ Looks good. Best regards, Vladimir Ivanov > On 10.12.19 11:46, Roland Westrelin wrote: >> >> Hi Christian, >> >>> http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ >> >> Is this correct? >> >> 364???????? set_idom(stmt, iffast_pred, dom_depth(stmt)); >> 366???????? set_idom(cloned_stmt, ifslow_pred, dom_depth(stmt)); >> >> stmt is a non CFG so it doesn't have an idom but a control? >> >> Roland. >> From rwestrel at redhat.com Wed Dec 11 09:56:24 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 11 Dec 2019 10:56:24 +0100 Subject: RFR(S): 8235636: gc/shenandoah/compiler/TestUnsafeOffheapSwap.java fails after JDK-8226411 Message-ID: <87zhfz89tj.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8235636/webrev.00/ ShenandoahBarrierC2Support::is_dominator_same_ctrl() must take anti-dependencies into account otherwise when updating raw memory after a barrier is expanded, a load may be reordered wrt a store. Roland. From christian.hagedorn at oracle.com Wed Dec 11 10:02:42 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 11 Dec 2019 11:02:42 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: <878snk9xl8.fsf@redhat.com> References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> <87blsga25b.fsf@redhat.com> <878snk9xl8.fsf@redhat.com> Message-ID: <9ce1eba1-2fc1-67fb-c815-81cc5ee3e06c@oracle.com> Thank you for your review Roland! Best regards, Christian On 10.12.19 13:25, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~chagedorn/8233033/webrev.01/ > > Looks good to me. > > Roland. > From christian.hagedorn at oracle.com Wed Dec 11 10:03:13 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 11 Dec 2019 11:03:13 +0100 Subject: [14] RFR(M): 8233033: Results of execution differ for C2 and C1/Xint In-Reply-To: References: <3f59a7dc-8228-aa7a-8b82-b06605f29bc9@oracle.com> <87blsga25b.fsf@redhat.com> Message-ID: <30e6086b-e309-c402-21f9-36b2ab4fb8de@oracle.com> Thank you for your review Vladimir! Best regards, Christian On 11.12.19 10:51, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~chagedorn/8233033/webrev.01/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> On 10.12.19 11:46, Roland Westrelin wrote: >>> >>> Hi Christian, >>> >>>> http://cr.openjdk.java.net/~chagedorn/8233033/webrev.00/ >>> >>> Is this correct? >>> >>> 364???????? set_idom(stmt, iffast_pred, dom_depth(stmt)); >>> 366???????? set_idom(cloned_stmt, ifslow_pred, dom_depth(stmt)); >>> >>> stmt is a non CFG so it doesn't have an idom but a control? >>> >>> Roland. 
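To make the quoted remark concrete -- a data node has no idom entry, only a control setting -- the shape one would expect in PhaseIdealLoop looks like this (a sketch, not the actual 8233033 webrev.01 change):

    // Only CFG nodes participate in the dominator tree; for a data node the loop
    // optimizer records just its controlling CFG node.
    if (stmt->is_CFG()) {
      set_idom(stmt, iffast_pred, dom_depth(stmt));
    } else {
      set_ctrl(stmt, iffast_pred);   // data node: no idom, only a control
    }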
>>> From vladimir.x.ivanov at oracle.com Wed Dec 11 10:09:02 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 13:09:02 +0300 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> Message-ID: <8ffcbd45-9027-9247-70d8-50257751069b@oracle.com> Yes, fully agree. Updated version: http://cr.openjdk.java.net/~vlivanov/8234392/webrev.01/ Got rid of ret_value and enhanced the comments, but also fixed long-standing bug you noticed: AbsVF should have the same additional checks as NegVF (since 512bit vandps is also introduced in AVX512DQ). Best regards, Vladimir Ivanov On 11.12.2019 02:31, John Rose wrote: > Actually I have one more comment about the new classification logic: > > The ?ret_value? idiom is terrible. > > I see a function which is complex, with something like this at the top: > > ? if (? simple size check ?) { > ? ? ret_value = false; ? // size not supported > ? } ? > > I want to read something more decisive like this: > > ? if (? simple size check ?) { > ? ? return false; ? // size not supported > ? } ? > > The ?ret_value? thingy adds only noise, and no clarity. > > It is more than an annoyance, hence my comment here. > The problem is that if I want to understand the quick check > above, I have to scroll down *past all the other checks* to see > if some joker does ?ret_value = true? before the ?return ret_value?, > subverting my understanding of the code. ?Basically, the ?ret_value? > nonsense makes the code impossible to break into understandable > parts. > > ? Grumpy John > > On Dec 10, 2019, at 12:34 PM, John Rose > wrote: >> >>> It would severely penalize Xeon Phis (which lack BW, DQ, and VL >>> extensions), but maybe there'll be a moment when Skylake >>> (F+CD+BW+DQ+VL) can be chosen as the baseline. >> >> Yeah, I saw that coming after I visited the trusty intrinsics guide >> https://software.intel.com/sites/landingpage/IntrinsicsGuide/ >>> >>> Anyway, I got your idea and it makes perfect sense to me to collect >>> such ideas. >> > From vladimir.x.ivanov at oracle.com Wed Dec 11 10:31:04 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 13:31:04 +0300 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes In-Reply-To: <58948fc2-43fe-bd12-2185-4ab12580486a@oracle.com> References: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> <58948fc2-43fe-bd12-2185-4ab12580486a@oracle.com> Message-ID: <6f1312ca-9b0f-c1e9-8db0-292974c7990d@oracle.com> Thanks for reviews, Vladimir and John. Updated version: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/ On 11.12.2019 05:48, Vladimir Kozlov wrote: > In general I don't like using switches in this changes. In most examples > you have only 2 instructions to choose from which could be done with > 'if/else'. 'default: ShouldNotReachHere()' is big code if inlined and > never will be hit - you should hit first checks in supported vector size > code. Didn't have strong opinion about them (and still don't), so I refactored most of the switches to branches. Let me know how it looks now. Regarding ShouldNotReachHere(): it would be unfortunate if we have to take size code increase into account when using it to mark never-taken. 
Do you prefer "assert(false,...)" instead on for default case in switches? > I may prefer to see 2 AD instructions as you had in previous changes. Considering the main motivation is to reduce the number of instructions used, that would be a counter change. As I write later to John, I would like to see the dispatching be hidden inside MacroAssembler. It'll address your current concerns, right? > In vabsnegF() switch cases should be 2,8,16 instead of 2,4,8. Good catch! Fixed. > Why you need predicate for vabsnegD ? Other length is not supported anyway. Agree, fixed. On 11.12.2019 03:00, John Rose wrote: > Thank you, reviewed. > > For consistency, I?d expect to see the AD file mention vshiftq > instead of in or in addition to the very specific evpsraq. > Maybe actually call vshiftq (consistent with other parts of > AD) and comment that it calls evpsraq? I had to look inside the > macro assembler to verify that evpsraq was properly aligned > with the other cases. Or just leave a comment saying this is > what vshiftq would do also, like the other instructions. In general, I'd like to see all the hardware-specific dispatching logic to be moved into MacroAssembler and AD file just to call into them. But we (Jatin, Sandhya, and me) decided to limit the amount of refactorings and upstream what Jatin ended up with. Are you fine with covering evpsraq case in a follow-up change? > For vabsnegF, I suggest adding a comment here: > > + predicate(n->as_Vector()->length() == 2 || > + // case 4 is handled as a 1-operand instruction by vabsneg4F > + n->as_Vector()->length() == 8 || > + n->as_Vector()->length() == 16); > I took slightly different route and rewrote it as follows: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/individual/webrev.08.vabsneg_float/src/hotspot/cpu/x86/x86.ad.udiff.html +instruct vabsnegF(vec dst, vec src, rRegI scratch) %{ + predicate(n->as_Vector()->length() != 4); // handled by 1-operand instruction vabsneg4F instruct vabsneg4F(vec dst, rRegI scratch) %{ + predicate(n->as_Vector()->length() == 4); It looks clearer than the previous version. Best regards, Vladimir Ivanov > > Vladimir > > On 12/10/19 2:29 PM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all >> https://bugs.openjdk.java.net/browse/JDK-8235719 >> >> Merge AD instructions for the following vector nodes: >> ?? - LShiftV*, RShiftV*, URShiftV* >> ?? - AbsV* >> ?? - NegV* >> >> Individual patches: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual >> >> >> As Jatin described, merging is applied only to AD instructions of >> similar shape. There are some more opportunities for reduction/merging >> left, but they are deliberately left out for future work. >> >> The patch is derived from the initial version of generic vector >> support [1]. Generic vector support was reviewed earlier [2]. >> >> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? 
>> >> Best regards, >> Vladimir Ivanov >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >> >> >> [2] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html >> From rkennke at redhat.com Wed Dec 11 10:55:36 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Dec 2019 11:55:36 +0100 Subject: RFR(S): 8235636: gc/shenandoah/compiler/TestUnsafeOffheapSwap.java fails after JDK-8226411 In-Reply-To: <87zhfz89tj.fsf@redhat.com> References: <87zhfz89tj.fsf@redhat.com> Message-ID: <73794273-64de-0ff5-e1a7-3cfb8202a69f@redhat.com> Looks good! Thanks! I presume that you have run appropriate testing.? Roman > http://cr.openjdk.java.net/~roland/8235636/webrev.00/ > > ShenandoahBarrierC2Support::is_dominator_same_ctrl() must take > anti-dependencies into account otherwise when updating raw memory after > a barrier is expanded, a load may be reordered wrt a store. > > Roland. > From vladimir.x.ivanov at oracle.com Wed Dec 11 11:53:17 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 14:53:17 +0300 Subject: [14] RFR (L): 8235756: C2: Merge AD instructions for DivV, SqrtV, FmaV, AddReductionV, and MulReductionV nodes Message-ID: <00dd8e05-97ef-7323-c221-2f25122f2f64@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/all/ https://bugs.openjdk.java.net/browse/JDK-8235756 Merge AD instructions for the following vector nodes: - DivVF/DivVD - SqrtVF/SqrtVD - FmaVF/FmaVD - AddReductionV* - MulReductionV* Individual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/individual Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Dec 11 11:55:49 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2019 14:55:49 +0300 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: <8ffcbd45-9027-9247-70d8-50257751069b@oracle.com> References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> <8ffcbd45-9027-9247-70d8-50257751069b@oracle.com> Message-ID: > Yes, fully agree. Updated version: > ? http://cr.openjdk.java.net/~vlivanov/8234392/webrev.01/ > > Got rid of ret_value and enhanced the comments, but also fixed > long-standing bug you noticed: AbsVF should have the same additional > checks as NegVF (since 512bit vandps is also introduced in AVX512DQ). Additionally, got rid of ret_value in Matcher::match_rule_supported() and moved Op_RoundDoubleModeV there since it doesn't depend on vector length: + case Op_RoundDoubleModeV: + if (VM_Version::supports_avx() == false) { + return false; // 128bit vroundpd is not available + } break; Best regards, Vladimir Ivanov > On 11.12.2019 02:31, John Rose wrote: >> Actually I have one more comment about the new classification logic: >> >> The ?ret_value? idiom is terrible. >> >> I see a function which is complex, with something like this at the top: >> >> ?? if (? simple size check ?) { >> ?? ? ret_value = false; ? // size not supported >> ?? } ? >> >> I want to read something more decisive like this: >> >> ?? if (? simple size check ?) { >> ?? ? return false; ? // size not supported >> ?? } ? >> >> The ?ret_value? 
thingy adds only noise, and no clarity. >> >> It is more than an annoyance, hence my comment here. >> The problem is that if I want to understand the quick check >> above, I have to scroll down *past all the other checks* to see >> if some joker does ?ret_value = true? before the ?return ret_value?, >> subverting my understanding of the code. ?Basically, the ?ret_value? >> nonsense makes the code impossible to break into understandable >> parts. >> >> ? Grumpy John >> >> On Dec 10, 2019, at 12:34 PM, John Rose > > wrote: >>> >>>> It would severely penalize Xeon Phis (which lack BW, DQ, and VL >>>> extensions), but maybe there'll be a moment when Skylake >>>> (F+CD+BW+DQ+VL) can be chosen as the baseline. >>> >>> Yeah, I saw that coming after I visited the trusty intrinsics guide >>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/ >>>> >>>> Anyway, I got your idea and it makes perfect sense to me to collect >>>> such ideas. >>> >> From felix.yang at huawei.com Wed Dec 11 12:39:46 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 11 Dec 2019 12:39:46 +0000 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation Message-ID: Hi, Please review this patch fixing a crash in C2 superword transform phase. Bug: https://bugs.openjdk.java.net/browse/JDK-8235762 This is similar to JDK-8229694, but a little bit more complex. The JVM crashes in the testcase when trying to dereference mem at [9] which is NULL. This happens when converting packs into vector nodes in SuperWord::output() by reading an unexpected NULL value with SuperWord::align_to_ref() [8]. The corresponding field _align_to_ref is set to NULL at [7] since best_align_to_mem_ref was assigned NULL just before at [6]. _packset contained two packs and there were no memory operations left to be processed (memops is empty). As a result SuperWord::find_align_to_ref() will return NULL since it cannot find an alignment from two different types of mem operations. The loop in Test::vMeth is unrolled two times. 
As a result, main loop contains the following 12 memory operations: 954 LoadI === 633 985 955 [[ 953 ]] @int[int:>=0]:exact+any *, idx=6; #int !orig=320 !jvms: Test::vMeth @ bci:13 950 StoreI === 981 985 951 953 [[ 946 948 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !orig=342 !jvms: Test::vMeth @ bci:16 948 LoadI === 647 950 949 [[ 947 ]] @int[int:>=0]:exact+any *, idx=6; #int !orig=369 !jvms: Test::vMeth @ bci:23 946 StoreI === 981 950 949 947 [[ 320 342 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !orig=391,989 !jvms: Test::vMeth @ bci:26 320 LoadI === 633 946 318 [[ 321 ]] @int[int:>=0]:exact+any *, idx=6; #int !jvms: Test::vMeth @ bci:13 342 StoreI === 981 946 340 321 [[ 391 369 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !jvms: Test::vMeth @ bci:16 369 LoadI === 647 342 367 [[ 370 ]] @int[int:>=0]:exact+any *, idx=6; #int !jvms: Test::vMeth @ bci:23 391 StoreI === 981 342 367 370 [[ 772 985 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !orig=989 !jvms: Test::vMeth @ bci:26 958 StoreB === 981 986 959 10 [[ 470 ]] @byte[int:>=0]:exact+any *, idx=10; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=10; !orig=470,[990] !jvms: Test::vMeth @ bci:44 470 StoreB === 981 958 468 10 [[ 774 986 ]] @byte[int:>=0]:exact+any *, idx=10; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=10; !orig=[990] !jvms: Test::vMeth @ bci:44 961 StoreL === 981 984 962 12 [[ 431 ]] @long[int:>=0]:exact+any *, idx=8; Memory: @long[int:>=0]:NotNull:exact+any *, idx=8; !orig=431,[991] !jvms: Test::vMeth @ bci:35 431 StoreL === 981 961 429 12 [[ 776 984 ]] @long[int:>=0]:exact+any *, idx=8; Memory: @long[int:>=0]:NotNull:exact+any *, idx=8; !orig=[991] !jvms: Test::vMeth @ bci:35 Consider the while loop at [1]: iter1: memops.size() = 12 mem_ref = 342 alignment: 342=0, 946=0, 391=4, 950=4 create_pack=true _packset: <342, 950> best_align_to_mem_ref = 342 iter2: memops.size() = 8 mem_ref = 431 alignment: 431=0, 961=8 create_pack=true _packset: <342, 950>, <431, 961> best_align_to_mem_ref = 342 iter3: memops.size() = 6 mem_ref = 470 alignment: 470=0,958=1 create_pack=true _packset: <342, 950>, <431, 961>, <470, 958> best_align_to_mem_ref = 342 iter4: memops.size() = 4 mem_ref = 320 alignment: 320=0, 954=4 create_pack=true _packset: <342, 950>, <431, 961>, <470, 958>, <320, 954> best_align_to_mem_ref = 342 iter5: memops.size() = 2 mem_ref = 369 alignment: 369=0, 948=4 create_pack=false _packset: <431, 961>, <470, 958> best_align_to_mem_ref = NULL (<342, 950> and <320, 954> are removed from _packset at [4]) For iter5, create_pack is set to false at [2]. As a result, the two memory operations in memops are poped at [3]. And 431 and 470 are pushed to memops at [5]. Then memops for call find_align_to_ref at [6] only contains 431 and 470 which are different in type. As a result, find_align_to_ref returns NULL. 
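For illustration only: the jtreg test attached to the bug is not reproduced in this thread, but a hypothetical Java reduction of the loop shape described above might look like the sketch below -- two int load/store pairs plus a long store and a byte store in one counted loop, so that after 2x unrolling the leftover memory ops are of incomparable types. All names are invented and this sketch is not verified to reproduce the crash.

public class SWAlignRepro {
    static int[]  iArr = new int[1030];
    static long[] lArr = new long[1030];
    static byte[] bArr = new byte[1030];

    // Two int load/store pairs plus a long store and a byte store per
    // iteration, mirroring the LoadI/StoreI, StoreL and StoreB mix above.
    static void vMeth(int x, long y, byte z) {
        for (int i = 0; i < 1000; i++) {
            iArr[i]     = iArr[i] + x;
            iArr[i + 1] = iArr[i + 1] - x;
            lArr[i]     = y;
            bArr[i]     = z;
        }
    }

    public static void main(String[] args) {
        for (int n = 0; n < 20000; n++) {  // warm up so C2 compiles vMeth
            vMeth(3, 7L, (byte) 1);
        }
    }
}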
[1] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l582 [2] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l637 [3] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l684 [4] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l693 [5] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l715 [6] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l717 [7] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l742 [8] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l2346 [9] http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l3626 Proposed fix chooses one alignment from the remaining packs ignoring whether the memory operations are comparable or not. This is currently under testing: https://bugs.openjdk.java.net/secure/attachment/86058/fix-superword.patch diff -r 6cf6761c444e src/hotspot/share/opto/superword.cpp --- a/src/hotspot/share/opto/superword.cpp Wed Dec 11 12:12:39 2019 +0100 +++ b/src/hotspot/share/opto/superword.cpp Wed Dec 11 20:17:10 2019 +0800 @@ -576,12 +576,13 @@ } Node_List align_to_refs; + int max_idx; int best_iv_adjustment = 0; MemNode* best_align_to_mem_ref = NULL; while (memops.size() != 0) { // Find a memory reference to align to. - MemNode* mem_ref = find_align_to_ref(memops); + MemNode* mem_ref = find_align_to_ref(memops, max_idx); if (mem_ref == NULL) break; align_to_refs.push(mem_ref); int iv_adjustment = get_iv_adjustment(mem_ref); @@ -699,34 +700,36 @@ // Put memory ops from remaining packs back on memops list for // the best alignment search. uint orig_msize = memops.size(); - if (_packset.length() == 1 && orig_msize == 0) { - // If there are no remaining memory ops and only 1 pack we have only one choice - // for the alignment - Node_List* p = _packset.at(0); - assert(p->size() > 0, "sanity"); + for (int i = 0; i < _packset.length(); i++) { + Node_List* p = _packset.at(i); MemNode* s = p->at(0)->as_Mem(); assert(!same_velt_type(s, mem_ref), "sanity"); - best_align_to_mem_ref = s; - } else { - for (int i = 0; i < _packset.length(); i++) { - Node_List* p = _packset.at(i); - MemNode* s = p->at(0)->as_Mem(); - assert(!same_velt_type(s, mem_ref), "sanity"); - memops.push(s); + memops.push(s); + } + best_align_to_mem_ref = find_align_to_ref(memops, max_idx); + if (best_align_to_mem_ref == NULL) { + if (TraceSuperWord) { + tty->print_cr("SuperWord::find_adjacent_refs(): best_align_to_mem_ref == NULL"); } - best_align_to_mem_ref = find_align_to_ref(memops); - if (best_align_to_mem_ref == NULL) { - if (TraceSuperWord) { - tty->print_cr("SuperWord::find_adjacent_refs(): best_align_to_mem_ref == NULL"); + if (_packset.length() > 0) { + if (orig_msize == 0) { + best_align_to_mem_ref = memops.at(max_idx)->as_Mem(); + } else { + for (int i = 0; i < orig_msize; i++) { + memops.remove(0); + } + best_align_to_mem_ref = find_align_to_ref(memops, max_idx); + assert(best_align_to_mem_ref == NULL, "sanity"); + best_align_to_mem_ref = memops.at(max_idx)->as_Mem(); } - break; } - best_iv_adjustment = get_iv_adjustment(best_align_to_mem_ref); - NOT_PRODUCT(find_adjacent_refs_trace_1(best_align_to_mem_ref, best_iv_adjustment);) - // Restore list. 
- while (memops.size() > orig_msize) - (void)memops.pop(); + break; } + best_iv_adjustment = get_iv_adjustment(best_align_to_mem_ref); + NOT_PRODUCT(find_adjacent_refs_trace_1(best_align_to_mem_ref, best_iv_adjustment);) + // Restore list. + while (memops.size() > orig_msize) + (void)memops.pop(); } } // unaligned memory accesses @@ -761,7 +764,7 @@ // Find a memory reference to align the loop induction variable to. // Looks first at stores then at loads, looking for a memory reference // with the largest number of references similar to it. -MemNode* SuperWord::find_align_to_ref(Node_List &memops) { +MemNode* SuperWord::find_align_to_ref(Node_List &memops, int &idx) { GrowableArray cmp_ct(arena(), memops.size(), memops.size(), 0); // Count number of comparable memory ops @@ -848,6 +851,7 @@ } #endif + idx = max_idx; if (max_ct > 0) { #ifdef ASSERT if (TraceSuperWord) { diff -r 6cf6761c444e src/hotspot/share/opto/superword.hpp --- a/src/hotspot/share/opto/superword.hpp Wed Dec 11 12:12:39 2019 +0100 +++ b/src/hotspot/share/opto/superword.hpp Wed Dec 11 20:17:10 2019 +0800 @@ -408,7 +408,7 @@ void print_loop(bool whole); #endif // Find a memory reference to align the loop induction variable to. - MemNode* find_align_to_ref(Node_List &memops); + MemNode* find_align_to_ref(Node_List &memops, int &idx); // Calculate loop's iv adjustment for this memory ops. int get_iv_adjustment(MemNode* mem); // Can the preloop align the reference to position zero in the vector? Thanks, Felix From tobias.hartmann at oracle.com Wed Dec 11 12:50:00 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 11 Dec 2019 13:50:00 +0100 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: Message-ID: Hi Felix, is this a duplicate of JDK-8235700? Please create a webrev and also include the regression test. Thanks, Tobias On 11.12.19 13:39, Yangfei (Felix) wrote: > Hi, > > Please review this patch fixing a crash in C2 superword transform phase. > Bug: https://bugs.openjdk.java.net/browse/JDK-8235762 > > This is similar to JDK-8229694, but a little bit more complex. > The JVM crashes in the testcase when trying to dereference mem at [9] which is NULL. > This happens when converting packs into vector nodes in SuperWord::output() by reading an unexpected NULL value with SuperWord::align_to_ref() [8]. > The corresponding field _align_to_ref is set to NULL at [7] since best_align_to_mem_ref was assigned NULL just before at [6]. > _packset contained two packs and there were no memory operations left to be processed (memops is empty). As a result SuperWord::find_align_to_ref() > will return NULL since it cannot find an alignment from two different types of mem operations. > > The loop in Test::vMeth is unrolled two times. 
As a result, main loop contains the following 12 memory operations: > > 954 LoadI === 633 985 955 [[ 953 ]] @int[int:>=0]:exact+any *, idx=6; #int !orig=320 !jvms: Test::vMeth @ bci:13 > 950 StoreI === 981 985 951 953 [[ 946 948 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !orig=342 !jvms: Test::vMeth @ bci:16 > 948 LoadI === 647 950 949 [[ 947 ]] @int[int:>=0]:exact+any *, idx=6; #int !orig=369 !jvms: Test::vMeth @ bci:23 > 946 StoreI === 981 950 949 947 [[ 320 342 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !orig=391,989 !jvms: Test::vMeth @ bci:26 > 320 LoadI === 633 946 318 [[ 321 ]] @int[int:>=0]:exact+any *, idx=6; #int !jvms: Test::vMeth @ bci:13 > 342 StoreI === 981 946 340 321 [[ 391 369 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !jvms: Test::vMeth @ bci:16 > 369 LoadI === 647 342 367 [[ 370 ]] @int[int:>=0]:exact+any *, idx=6; #int !jvms: Test::vMeth @ bci:23 > 391 StoreI === 981 342 367 370 [[ 772 985 ]] @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any *, idx=6; !orig=989 !jvms: Test::vMeth @ bci:26 > > 958 StoreB === 981 986 959 10 [[ 470 ]] @byte[int:>=0]:exact+any *, idx=10; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=10; !orig=470,[990] !jvms: Test::vMeth @ bci:44 > 470 StoreB === 981 958 468 10 [[ 774 986 ]] @byte[int:>=0]:exact+any *, idx=10; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=10; !orig=[990] !jvms: Test::vMeth @ bci:44 > > 961 StoreL === 981 984 962 12 [[ 431 ]] @long[int:>=0]:exact+any *, idx=8; Memory: @long[int:>=0]:NotNull:exact+any *, idx=8; !orig=431,[991] !jvms: Test::vMeth @ bci:35 > 431 StoreL === 981 961 429 12 [[ 776 984 ]] @long[int:>=0]:exact+any *, idx=8; Memory: @long[int:>=0]:NotNull:exact+any *, idx=8; !orig=[991] !jvms: Test::vMeth @ bci:35 > > > Consider the while loop at [1]: > iter1: > memops.size() = 12 mem_ref = 342 alignment: 342=0, 946=0, 391=4, 950=4 create_pack=true _packset: <342, 950> best_align_to_mem_ref = 342 > iter2: > memops.size() = 8 mem_ref = 431 alignment: 431=0, 961=8 create_pack=true _packset: <342, 950>, <431, 961> best_align_to_mem_ref = 342 > iter3: > memops.size() = 6 mem_ref = 470 alignment: 470=0,958=1 create_pack=true _packset: <342, 950>, <431, 961>, <470, 958> best_align_to_mem_ref = 342 > iter4: > memops.size() = 4 mem_ref = 320 alignment: 320=0, 954=4 create_pack=true _packset: <342, 950>, <431, 961>, <470, 958>, <320, 954> best_align_to_mem_ref = 342 > iter5: > memops.size() = 2 mem_ref = 369 alignment: 369=0, 948=4 create_pack=false _packset: <431, 961>, <470, 958> best_align_to_mem_ref = NULL (<342, 950> and <320, 954> are removed from _packset at [4]) > > For iter5, create_pack is set to false at [2]. As a result, the two memory operations in memops are poped at [3]. And 431 and 470 are pushed to memops at [5]. > > Then memops for call find_align_to_ref at [6] only contains 431 and 470 which are different in type. As a result, find_align_to_ref returns NULL. 
> > [1] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l582 > [2] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l637 > [3] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l684 > [4] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l693 > [5] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l715 > [6] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l717 > [7] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l742 > [8] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l2346 > [9] > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share/opto/superword.cpp#l3626 > > Proposed fix chooses one alignment from the remaining packs ignoring whether the memory operations are comparable or not. > This is currently under testing: https://bugs.openjdk.java.net/secure/attachment/86058/fix-superword.patch > > diff -r 6cf6761c444e src/hotspot/share/opto/superword.cpp > --- a/src/hotspot/share/opto/superword.cpp Wed Dec 11 12:12:39 2019 +0100 > +++ b/src/hotspot/share/opto/superword.cpp Wed Dec 11 20:17:10 2019 +0800 > @@ -576,12 +576,13 @@ > } > Node_List align_to_refs; > + int max_idx; > int best_iv_adjustment = 0; > MemNode* best_align_to_mem_ref = NULL; > while (memops.size() != 0) { > // Find a memory reference to align to. > - MemNode* mem_ref = find_align_to_ref(memops); > + MemNode* mem_ref = find_align_to_ref(memops, max_idx); > if (mem_ref == NULL) break; > align_to_refs.push(mem_ref); > int iv_adjustment = get_iv_adjustment(mem_ref); > @@ -699,34 +700,36 @@ > // Put memory ops from remaining packs back on memops list for > // the best alignment search. 
> uint orig_msize = memops.size(); > - if (_packset.length() == 1 && orig_msize == 0) { > - // If there are no remaining memory ops and only 1 pack we have only one choice > - // for the alignment > - Node_List* p = _packset.at(0); > - assert(p->size() > 0, "sanity"); > + for (int i = 0; i < _packset.length(); i++) { > + Node_List* p = _packset.at(i); > MemNode* s = p->at(0)->as_Mem(); > assert(!same_velt_type(s, mem_ref), "sanity"); > - best_align_to_mem_ref = s; > - } else { > - for (int i = 0; i < _packset.length(); i++) { > - Node_List* p = _packset.at(i); > - MemNode* s = p->at(0)->as_Mem(); > - assert(!same_velt_type(s, mem_ref), "sanity"); > - memops.push(s); > + memops.push(s); > + } > + best_align_to_mem_ref = find_align_to_ref(memops, max_idx); > + if (best_align_to_mem_ref == NULL) { > + if (TraceSuperWord) { > + tty->print_cr("SuperWord::find_adjacent_refs(): best_align_to_mem_ref == NULL"); > } > - best_align_to_mem_ref = find_align_to_ref(memops); > - if (best_align_to_mem_ref == NULL) { > - if (TraceSuperWord) { > - tty->print_cr("SuperWord::find_adjacent_refs(): best_align_to_mem_ref == NULL"); > + if (_packset.length() > 0) { > + if (orig_msize == 0) { > + best_align_to_mem_ref = memops.at(max_idx)->as_Mem(); > + } else { > + for (int i = 0; i < orig_msize; i++) { > + memops.remove(0); > + } > + best_align_to_mem_ref = find_align_to_ref(memops, max_idx); > + assert(best_align_to_mem_ref == NULL, "sanity"); > + best_align_to_mem_ref = memops.at(max_idx)->as_Mem(); > } > - break; > } > - best_iv_adjustment = get_iv_adjustment(best_align_to_mem_ref); > - NOT_PRODUCT(find_adjacent_refs_trace_1(best_align_to_mem_ref, best_iv_adjustment);) > - // Restore list. > - while (memops.size() > orig_msize) > - (void)memops.pop(); > + break; > } > + best_iv_adjustment = get_iv_adjustment(best_align_to_mem_ref); > + NOT_PRODUCT(find_adjacent_refs_trace_1(best_align_to_mem_ref, best_iv_adjustment);) > + // Restore list. > + while (memops.size() > orig_msize) > + (void)memops.pop(); > } > } // unaligned memory accesses > @@ -761,7 +764,7 @@ > // Find a memory reference to align the loop induction variable to. > // Looks first at stores then at loads, looking for a memory reference > // with the largest number of references similar to it. > -MemNode* SuperWord::find_align_to_ref(Node_List &memops) { > +MemNode* SuperWord::find_align_to_ref(Node_List &memops, int &idx) { > GrowableArray cmp_ct(arena(), memops.size(), memops.size(), 0); > // Count number of comparable memory ops > @@ -848,6 +851,7 @@ > } > #endif > + idx = max_idx; > if (max_ct > 0) { > #ifdef ASSERT > if (TraceSuperWord) { > diff -r 6cf6761c444e src/hotspot/share/opto/superword.hpp > --- a/src/hotspot/share/opto/superword.hpp Wed Dec 11 12:12:39 2019 +0100 > +++ b/src/hotspot/share/opto/superword.hpp Wed Dec 11 20:17:10 2019 +0800 > @@ -408,7 +408,7 @@ > void print_loop(bool whole); > #endif > // Find a memory reference to align the loop induction variable to. > - MemNode* find_align_to_ref(Node_List &memops); > + MemNode* find_align_to_ref(Node_List &memops, int &idx); > // Calculate loop's iv adjustment for this memory ops. > int get_iv_adjustment(MemNode* mem); > // Can the preloop align the reference to position zero in the vector? 
> > > Thanks, > Felix > From felix.yang at huawei.com Wed Dec 11 12:55:31 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 11 Dec 2019 12:55:31 +0000 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: Message-ID: Hi Tobias, I didn't noticed JDK-8235700 when I started to fix this bug several days ago. Sure, I will prepare a webrev. Comments are welcome. Thanks, Felix > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Wednesday, December 11, 2019 8:50 PM > To: Yangfei (Felix) ; > hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation > > Hi Felix, > > is this a duplicate of JDK-8235700? > > Please create a webrev and also include the regression test. > > Thanks, > Tobias > > On 11.12.19 13:39, Yangfei (Felix) wrote: > > Hi, > > > > Please review this patch fixing a crash in C2 superword transform phase. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8235762 > > > > This is similar to JDK-8229694, but a little bit more complex. > > The JVM crashes in the testcase when trying to dereference mem at [9] > which is NULL. > > This happens when converting packs into vector nodes in SuperWord::output() > by reading an unexpected NULL value with SuperWord::align_to_ref() [8]. > > The corresponding field _align_to_ref is set to NULL at [7] since > best_align_to_mem_ref was assigned NULL just before at [6]. > > _packset contained two packs and there were no memory operations left > > to be processed (memops is empty). As a result > SuperWord::find_align_to_ref() will return NULL since it cannot find an > alignment from two different types of mem operations. > > > > The loop in Test::vMeth is unrolled two times. 
As a result, main loop > contains the following 12 memory operations: > > > > 954 LoadI === 633 985 955 [[ 953 ]] @int[int:>=0]:exact+any > *, idx=6; #int !orig=320 !jvms: Test::vMeth @ bci:13 > > 950 StoreI === 981 985 951 953 [[ 946 948 ]] > @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any > *, idx=6; !orig=342 !jvms: Test::vMeth @ bci:16 > > 948 LoadI === 647 950 949 [[ 947 ]] @int[int:>=0]:exact+any > *, idx=6; #int !orig=369 !jvms: Test::vMeth @ bci:23 > > 946 StoreI === 981 950 949 947 [[ 320 342 ]] > @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any > *, idx=6; !orig=391,989 !jvms: Test::vMeth @ bci:26 > > 320 LoadI === 633 946 318 [[ 321 ]] @int[int:>=0]:exact+any > *, idx=6; #int !jvms: Test::vMeth @ bci:13 > > 342 StoreI === 981 946 340 321 [[ 391 369 ]] > @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any > *, idx=6; !jvms: Test::vMeth @ bci:16 > > 369 LoadI === 647 342 367 [[ 370 ]] @int[int:>=0]:exact+any > *, idx=6; #int !jvms: Test::vMeth @ bci:23 > > 391 StoreI === 981 342 367 370 [[ 772 985 ]] > @int[int:>=0]:exact+any *, idx=6; Memory: @int[int:>=0]:NotNull:exact+any > *, idx=6; !orig=989 !jvms: Test::vMeth @ bci:26 > > > > 958 StoreB === 981 986 959 10 [[ 470 ]] > @byte[int:>=0]:exact+any *, idx=10; Memory: > @byte[int:>=0]:NotNull:exact+any *, idx=10; !orig=470,[990] !jvms: > Test::vMeth @ bci:44 > > 470 StoreB === 981 958 468 10 [[ 774 986 ]] > @byte[int:>=0]:exact+any *, idx=10; Memory: > @byte[int:>=0]:NotNull:exact+any *, idx=10; !orig=[990] !jvms: Test::vMeth @ > bci:44 > > > > 961 StoreL === 981 984 962 12 [[ 431 ]] > @long[int:>=0]:exact+any *, idx=8; Memory: > @long[int:>=0]:NotNull:exact+any *, idx=8; !orig=431,[991] !jvms: Test::vMeth > @ bci:35 > > 431 StoreL === 981 961 429 12 [[ 776 984 ]] > @long[int:>=0]:exact+any *, idx=8; Memory: > @long[int:>=0]:NotNull:exact+any *, idx=8; !orig=[991] !jvms: Test::vMeth @ > bci:35 > > > > > > Consider the while loop at [1]: > > iter1: > > memops.size() = 12 mem_ref = 342 alignment: 342=0, 946=0, 391=4, > > 950=4 create_pack=true _packset: <342, 950> best_align_to_mem_ref > = > > 342 > > iter2: > > memops.size() = 8 mem_ref = 431 alignment: 431=0, 961=8 > > create_pack=true _packset: <342, 950>, <431, 961> > > best_align_to_mem_ref = 342 > > iter3: > > memops.size() = 6 mem_ref = 470 alignment: 470=0,958=1 > > create_pack=true _packset: <342, 950>, <431, 961>, <470, 958> > > best_align_to_mem_ref = 342 > > iter4: > > memops.size() = 4 mem_ref = 320 alignment: 320=0, 954=4 > > create_pack=true _packset: <342, 950>, <431, 961>, <470, 958>, <320, > > 954> best_align_to_mem_ref = 342 > > iter5: > > memops.size() = 2 mem_ref = 369 alignment: 369=0, 948=4 > > create_pack=false _packset: <431, 961>, <470, 958> > > best_align_to_mem_ref = NULL (<342, 950> and <320, 954> are removed > > from _packset at [4]) > > > > For iter5, create_pack is set to false at [2]. As a result, the two memory > operations in memops are poped at [3]. And 431 and 470 are pushed to > memops at [5]. > > > > Then memops for call find_align_to_ref at [6] only contains 431 and 470 which > are different in type. As a result, find_align_to_ref returns NULL. 
> > > > [1] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l582 > > [2] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l637 > > [3] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l684 > > [4] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l693 > > [5] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l715 > > [6] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l717 > > [7] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l742 > > [8] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l2346 > > [9] > > http://hg.openjdk.java.net/jdk/jdk/file/1a7175456d29/src/hotspot/share > > /opto/superword.cpp#l3626 > > > > Proposed fix chooses one alignment from the remaining packs ignoring > whether the memory operations are comparable or not. > > This is currently under testing: > > https://bugs.openjdk.java.net/secure/attachment/86058/fix-superword.pa > > tch > > > > diff -r 6cf6761c444e src/hotspot/share/opto/superword.cpp > > --- a/src/hotspot/share/opto/superword.cpp Wed Dec 11 12:12:39 2019 > > +0100 > > +++ b/src/hotspot/share/opto/superword.cpp Wed Dec 11 20:17:10 > 2019 +0800 > > @@ -576,12 +576,13 @@ > > } > > Node_List align_to_refs; > > + int max_idx; > > int best_iv_adjustment = 0; > > MemNode* best_align_to_mem_ref = NULL; > > while (memops.size() != 0) { > > // Find a memory reference to align to. > > - MemNode* mem_ref = find_align_to_ref(memops); > > + MemNode* mem_ref = find_align_to_ref(memops, max_idx); > > if (mem_ref == NULL) break; > > align_to_refs.push(mem_ref); > > int iv_adjustment = get_iv_adjustment(mem_ref); @@ -699,34 > > +700,36 @@ > > // Put memory ops from remaining packs back on memops list for > > // the best alignment search. 
> > uint orig_msize = memops.size(); > > - if (_packset.length() == 1 && orig_msize == 0) { > > - // If there are no remaining memory ops and only 1 pack we > have only one choice > > - // for the alignment > > - Node_List* p = _packset.at(0); > > - assert(p->size() > 0, "sanity"); > > + for (int i = 0; i < _packset.length(); i++) { > > + Node_List* p = _packset.at(i); > > MemNode* s = p->at(0)->as_Mem(); > > assert(!same_velt_type(s, mem_ref), "sanity"); > > - best_align_to_mem_ref = s; > > - } else { > > - for (int i = 0; i < _packset.length(); i++) { > > - Node_List* p = _packset.at(i); > > - MemNode* s = p->at(0)->as_Mem(); > > - assert(!same_velt_type(s, mem_ref), "sanity"); > > - memops.push(s); > > + memops.push(s); > > + } > > + best_align_to_mem_ref = find_align_to_ref(memops, max_idx); > > + if (best_align_to_mem_ref == NULL) { > > + if (TraceSuperWord) { > > + tty->print_cr("SuperWord::find_adjacent_refs(): > > + best_align_to_mem_ref == NULL"); > > } > > - best_align_to_mem_ref = find_align_to_ref(memops); > > - if (best_align_to_mem_ref == NULL) { > > - if (TraceSuperWord) { > > - tty->print_cr("SuperWord::find_adjacent_refs(): > best_align_to_mem_ref == NULL"); > > + if (_packset.length() > 0) { > > + if (orig_msize == 0) { > > + best_align_to_mem_ref = > memops.at(max_idx)->as_Mem(); > > + } else { > > + for (int i = 0; i < orig_msize; i++) { > > + memops.remove(0); > > + } > > + best_align_to_mem_ref = find_align_to_ref(memops, > max_idx); > > + assert(best_align_to_mem_ref == NULL, "sanity"); > > + best_align_to_mem_ref = > memops.at(max_idx)->as_Mem(); > > } > > - break; > > } > > - best_iv_adjustment = > get_iv_adjustment(best_align_to_mem_ref); > > - > NOT_PRODUCT(find_adjacent_refs_trace_1(best_align_to_mem_ref, > best_iv_adjustment);) > > - // Restore list. > > - while (memops.size() > orig_msize) > > - (void)memops.pop(); > > + break; > > } > > + best_iv_adjustment = > get_iv_adjustment(best_align_to_mem_ref); > > + > NOT_PRODUCT(find_adjacent_refs_trace_1(best_align_to_mem_ref, > best_iv_adjustment);) > > + // Restore list. > > + while (memops.size() > orig_msize) > > + (void)memops.pop(); > > } > > } // unaligned memory accesses > > @@ -761,7 +764,7 @@ > > // Find a memory reference to align the loop induction variable to. > > // Looks first at stores then at loads, looking for a memory reference > > // with the largest number of references similar to it. > > -MemNode* SuperWord::find_align_to_ref(Node_List &memops) { > > +MemNode* SuperWord::find_align_to_ref(Node_List &memops, int &idx) { > > GrowableArray cmp_ct(arena(), memops.size(), memops.size(), 0); > > // Count number of comparable memory ops @@ -848,6 +851,7 @@ > > } > > #endif > > + idx = max_idx; > > if (max_ct > 0) { > > #ifdef ASSERT > > if (TraceSuperWord) { > > diff -r 6cf6761c444e src/hotspot/share/opto/superword.hpp > > --- a/src/hotspot/share/opto/superword.hpp Wed Dec 11 12:12:39 > 2019 +0100 > > +++ b/src/hotspot/share/opto/superword.hpp Wed Dec 11 20:17:10 > 2019 +0800 > > @@ -408,7 +408,7 @@ > > void print_loop(bool whole); > > #endif > > // Find a memory reference to align the loop induction variable to. > > - MemNode* find_align_to_ref(Node_List &memops); > > + MemNode* find_align_to_ref(Node_List &memops, int &idx); > > // Calculate loop's iv adjustment for this memory ops. > > int get_iv_adjustment(MemNode* mem); > > // Can the preloop align the reference to position zero in the vector? 
> > > > > > Thanks, > > Felix > > From kirk at kodewerk.com Wed Dec 11 13:14:17 2019 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Wed, 11 Dec 2019 14:14:17 +0100 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: <22E085AB-D5FF-478C-9BDC-2BB0C021A656@kodewerk.com> +1 to what Charlie has said. Additionally we?ve been exploring EA with an eye on improving it?s ability to catch and eliminate transients. So far the work is showing lots of promise. Kind regards, Kirk > On Dec 9, 2019, at 10:31 PM, Charles Oliver Nutter wrote: > > This gets a big +1 for me, but honestly even better would be eliminating > the hard inline level limit altogether in favor of other metrics (inlined > code size, optimized node count, incremental inlining). Ignoring that for > the moment... > > Anecdotally, I can say that there are many code paths in JRuby that don't > inline fully *solely* because of this limit. For example, there are objects > in the numeric tower that out of compatibility necessity have constructor > paths that get dangerously close to 9 levels deep: > > calling code-> Fixnum factory -> Fixnum constructor -> Integer -> Numeric > -> Object -> BasicObject -> j.l.Object > > ...and that would just inline into the first level of Ruby code. The last > five levels here are mostly just "super()" calls. Ideally we would want to > have a few levels of Ruby code inlining together too. > > If this path doesn't inline, we've got no chance for escape analysis and > other optimizations to reduce the transient numeric objects. > > A question relating to JRuby: how does the inline level interact with > MethodHandle/LambdaForm @ForceInline? Does that get around it? ALWAYS? > Clearly we see most of our invokedynamic call sites inline, but I have > little understanding of the actual cost of the five to 10 to 20 levels of > LFs between an indy site and the eventual target. > > In any case... +1, inline all the things. > > - Charlie > > On Sun, Dec 8, 2019 at 3:42 PM Claes Redestad > wrote: > >> Hi, >> >> increasing MaxInlineLevel can substantially improve performance in some >> benchmarks[1], and has been reported to help applications implemented in >> scala in particular. >> >> There is always some risk of regressions when tweaking the default >> inlining settings. I've done a number of experiments to ascertain that >> the effect of increasing this on a wide array of benchmarks. With 15 all >> benchmarks tested are show either neutral or positive results, with no >> observed regression w.r.t. compilation speed or code cache usage. >> >> Webrev: http://cr.openjdk.java.net/~redestad/8234863/open.00/ >> >> Thanks! >> >> /Claes >> >> [1] One http://renaissance.dev sub-benchmark improve by almost 3x with >> an increase from 9 to 15. >> From nils.eliasson at oracle.com Wed Dec 11 13:46:31 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 11 Dec 2019 14:46:31 +0100 Subject: RFR: 8234328: VectorSet::clear can cause fragmentation In-Reply-To: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com> References: <47dce9ee-0e62-7375-4dff-2924f824ecc6@oracle.com> Message-ID: <68719475-5d01-333f-6ac9-00bd201be370@oracle.com> Hi Claes, Your change looks good. Reviewed. Regards, Nils On 2019-11-19 11:18, Claes Redestad wrote: > Hi, > > today, VectorSet::clear "reclaims" storage when the size is large. 
> > However, since the backing array is allocated in a resource arena, this > is dubious since the currently retained memory is only actually freed > and made reusable if it's currently the last chunk of memory allocated > in the arena. This means a clear() is likely to just waste the allocated > memory until we exit the current resource scope > > Instead, I propose a strategy where instead of "freeing" we keep track > of the currently allocated size of the VectorSet separately from the in- > use size. We can then defer the memset to reset/clear the memory to the > next time we need to grow, thus avoiding unnecessary reallocations and > memsets. This limits the memory waste. > > Bug:??? https://bugs.openjdk.java.net/browse/JDK-8234328 > Webrev: http://cr.openjdk.java.net/~redestad/8234328/open.00/ > > Testing: tier1-3 > > Either of reset() or clear() could now be removed, which seems like a > straightforward follow-up RFE. With some convincing I could roll it into > this patch. > > Thanks! > > /Claes From zhuoren.wz at alibaba-inc.com Tue Dec 10 15:43:23 2019 From: zhuoren.wz at alibaba-inc.com (=?UTF-8?B?V2FuZyBaaHVvKFpodW9yZW4p?=) Date: Tue, 10 Dec 2019 23:43:23 +0800 Subject: =?UTF-8?B?UmU6IFthYXJjaDY0LXBvcnQtZGV2IF0gY3Jhc2ggZHVlIHRvIGxvbmcgb2Zmc2V0?= In-Reply-To: <26ad8b0f-95a6-f126-7ea1-aa59291145d4@redhat.com> References: <2d65a25f-8e38-4da4-a6ef-b4ff9ba847fe.zhuoren.wz@alibaba-inc.com> , <26ad8b0f-95a6-f126-7ea1-aa59291145d4@redhat.com> Message-ID: Pengfei, thanks for your comments. Here is updated patch. http://cr.openjdk.java.net/~wzhuo/BigOffsetAarch64/webrev.01/ Haley, please also have a look at this patch. Regards, Zhuoren ------------------------------------------------------------------ From:Andrew Haley Sent At:2019 Dec. 10 (Tue.) 22:06 To:Pengfei Li (Arm Technology China) Cc:hotspot compiler ; aarch64-port-dev at openjdk.java.net Subject:Re: [aarch64-port-dev ] crash due to long offset On 12/10/19 8:28 AM, Pengfei Li (Arm Technology China) wrote: > Hi Zhuoren, > >> I also wrote a patch to solve this issue, please also review. >> http://cr.openjdk.java.net/~wzhuo/BigOffsetAarch64/webrev.00/jdk13u.pat >> ch > Thanks for your patch. I (NOT a reviewer) eyeballed your fix and found a probable mistake. > > In "enc_class aarch64_enc_str(iRegL src, memory mem) %{ ... %}", you have "if (($mem$$index == -1) && ($mem$$disp > 0)& (($mem$$disp & 0x7) != 0) && ($mem$$disp > 255))". > Should it be "&&" instead of "&" in the middle? > > Another question: Is it possible to add the logic into loadStore() or another new function instead of duplicating it everywhere in aarch64.ad? > > I've also CC'ed this to hotspot-compiler-dev because all hotspot compiler patches (including AArch64 specific) should go through it for review. I'm looking at this. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From matthias.baesken at sap.com Wed Dec 11 14:31:19 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Wed, 11 Dec 2019 14:31:19 +0000 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> Message-ID: Hi Gustavo, thanks for posting this . I put your change into our internal build+test queue . 
We currently do not have something like you described ( a P9 QEMU VM (emulation) with NVDIMM support ) in our test landscape, but It does not hurt to have the patch in our builds/tests anyway .... Best regards, Matthias > > Hi, > > Could the following change be reviewed please? > > Bug : https://bugs.openjdk.java.net/browse/JDK-8234599 > Webrev: http://cr.openjdk.java.net/~gromero/8234599/v1/ > > POWER9 does not have any cache management or barrier instructions aimed > specifically to deal with persistent memory (NVDIMM), hence in Power ISA > v3.0 > there are no instructions like data cache flush/store or sync instructions > specific to persistent memory. Nonetheless, in some cases (like through > hardware > emulation) POWER9 can support NVDIMM and if Linux supports DAX (direct > mapping of > persisnt memory) and a /dev/pmem device is available it's possible to use > data > cache and sync instructions in the ISA (which are not explicitly aimed to > persistent memory) on a memory region backed by DAX, i.e. mapped using > new > mmap() flag MAP_SYNC for persistent memory), so these instructions will > have the > same semantics as the instructions for data cache flush / store and sync on > other architectures supporting NVDIMM and that have explicit instructions to > deal > with persistent memory. > > This change adds support for JEP-352 on POWER9 using 'dcbst' plus a sync > instruction to sync data to a memory backed by DAX (to persistent memory) > when > that's required. > > Moreover, that change also paves the way for supporting NVDIMM on > future > Power CPUs that might have new data cache management and barrier > instructions > explicitly to deal with persistent memory. > > The change was developed and tested using a P9 QEMU VM (emulation) > with NVDIMM > support. For details on how to setup a proper QEMU + Linux kernel w/ a > /dev/pmem > device configured please see recipe in [2]. > > The JVM on a POWER9 machine with a /dev/pmem device properly set and > with that > change applied is able to pass the test for JEP-352 [3]. The JVM is also able > to pass all tests of Mashona library [4]. > > When DAX is not supported, like on POWER8 and POWER9 w/o DAX support, > OS won't > support mmap()'s MAP_SYNC flag, so kernel will return EOPNOTSUP when > code will > try to allocate memory backed by a persistent memory device, so the JVM > will get > a "java.io.IOException: Operation not supported". Naturally, on machines > that don't support writebacks or pmem supports_data_cache_line_flush() > will > return false, but even in that case JVM will hit a EOPNOTSUPP and get a > "java.io.IOException: Operation not supported" sooner than it has the > chance to > try to emit any writeback + sync instructions. > > Thank you. > > Best regards, > Gustavo > > [1] http://man7.org/linux/man-pages/man2/mmap.2.html > [2] https://github.com/gromero/nvdimm > [3] > http://hg.openjdk.java.net/jdk/jdk/file/336885e766af/test/jdk/java/nio/Ma > ppedByteBuffer/PmemTest.java > [4] https://github.com/jhalliday/mashona From christian.hagedorn at oracle.com Wed Dec 11 14:53:47 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 11 Dec 2019 15:53:47 +0100 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: Message-ID: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Hi Felix Thanks for working on that. Your fix also seems to work for JDK-8235700. I closed that one as a duplicate of yours. 
>>> + for (int i = 0; i < orig_msize; i++) { Should be uint since orig_msize is a uint >>> + best_align_to_mem_ref = find_align_to_ref(memops, >> max_idx); >>> + assert(best_align_to_mem_ref == NULL, "sanity"); You can merge these two lines together into assert(find_align_to_ref(memops, max_idx) == NULL, "sanity"); since the call belongs to the sanity check. Or just surround it by a #ifdef ASSERT. >>> + idx = max_idx; Is max_idx always guaranteed to be valid and not -1 when accessing it later? Best regards, Christian From richard.reingruber at sap.com Wed Dec 11 15:07:29 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Wed, 11 Dec 2019 15:07:29 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: Message-ID: Hi David, > Most of the details here are in areas I can comment on in detail, but I > did take an initial general look at things. Thanks for taking the time! > The only thing that jumped out at me is that I think the > DeoptimizeObjectsALotThread should be a hidden thread. > > + bool is_hidden_from_external_view() const { return true; } Yes, it should. Will add the method like above. > Also I don't see any testing of the DeoptimizeObjectsALotThread. Without > active testing this will just bit-rot. DeoptimizeObjectsALot is meant for stress testing with a larger workload. I will add a minimal test to keep it fresh. > Also on the tests I don't understand your @requires clause: > > @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & > (vm.opt.TieredCompilation != true)) > > This seems to require that TieredCompilation is disabled, but tiered is > our normal mode of operation. ?? > I removed the clause. I guess I wanted to target the tests towards the code they are supposed to test, and it's easier to analyze failures w/o tiered compilation and with just one compiler thread. Additionally I will make use of compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. Thanks, Richard. -----Original Message----- From: David Holmes Sent: Mittwoch, 11. Dezember 2019 08:03 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, On 11/12/2019 7:45 am, Reingruber, Richard wrote: > Hi, > > I would like to get reviews please for > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ > > Corresponding RFE: > https://bugs.openjdk.java.net/browse/JDK-8227745 > > Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 > And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] > > Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the > change is being tested at SAP since I posted the first RFR some months ago. > > The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI > agents request capabilities that allow them to access local variable values. E.g. if you start-up > with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right > from the beginning, well before a debugger attaches -- if ever one should do so. With the > enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based > optimizations are reverted just before an agent acquires the reference to an object. 
In the JBS item > you'll find more details. Most of the details here are in areas I can comment on in detail, but I did take an initial general look at things. The only thing that jumped out at me is that I think the DeoptimizeObjectsALotThread should be a hidden thread. + bool is_hidden_from_external_view() const { return true; } Also I don't see any testing of the DeoptimizeObjectsALotThread. Without active testing this will just bit-rot. Also on the tests I don't understand your @requires clause: @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & (vm.opt.TieredCompilation != true)) This seems to require that TieredCompilation is disabled, but tiered is our normal mode of operation. ?? Thanks, David > Thanks, > Richard. > > [1] Experimental fix for JDK-8214584 based on JDK-8227745 > http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch > From felix.yang at huawei.com Wed Dec 11 15:11:42 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 11 Dec 2019 15:11:42 +0000 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: Hi Christian, Thanks for the suggestions. Comments inlined. > -----Original Message----- > From: Christian Hagedorn [mailto:christian.hagedorn at oracle.com] > Sent: Wednesday, December 11, 2019 10:54 PM > To: Yangfei (Felix) ; Tobias Hartmann > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation > > Hi Felix > > Thanks for working on that. Your fix also seems to work for JDK-8235700. > I closed that one as a duplicate of yours. > > >>> + for (int i = 0; i < orig_msize; i++) { > > Should be uint since orig_msize is a uint -- Yes, will modify accordingly when I am preparing webrev. > > >>> + best_align_to_mem_ref = find_align_to_ref(memops, > >> max_idx); > >>> + assert(best_align_to_mem_ref == NULL, "sanity"); > > You can merge these two lines together into assert(find_align_to_ref(memops, > max_idx) == NULL, "sanity"); since the call belongs to the sanity check. Or just > surround it by a #ifdef ASSERT. -- The purpose of line 721 here is to calculate the max_idx. So I don't think it's suitable to treat this line as assertion logic. > >>> + idx = max_idx; > > Is max_idx always guaranteed to be valid and not -1 when accessing it later? -- Yes, I think so. When memops is not empty and the memory ops in memops are not comparable, find_align_to_ref will always sets its max_idx. Thanks, Felix From rwestrel at redhat.com Wed Dec 11 15:36:12 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 11 Dec 2019 16:36:12 +0100 Subject: RFR: 8235729: Shenandoah: Remove useless casting to non-constant In-Reply-To: <65746d9b-1b54-dd75-d340-a9f8e76929be@redhat.com> References: <65746d9b-1b54-dd75-d340-a9f8e76929be@redhat.com> Message-ID: <87r21a98nn.fsf@redhat.com> > http://cr.openjdk.java.net/~rkennke/JDK-8235729/webrev.00/ Looks good to me. Roland. From martin.doerr at sap.com Wed Dec 11 16:55:23 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 11 Dec 2019 16:55:23 +0000 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> Message-ID: Hi Gustavo, thanks for implementing it. Unfortunately, we can't test it at the moment. 
I have a few change requests: macroAssembler_ppc.cpp I don't like silently emitting nothing in case !VM_Version::supports_data_cache_line_flush(). If you want to check for it, I suggest to assert VM_Version::supports_data_cache_line_flush() and avoid generating the stub otherwise (stubGenerator_ppc). ppc.ad The predicates are redundant and should better get removed (useless evaluation). cacheWBPreSync could use cost 0 for clearity. (The costs don't have any effect because there is no choice for the matcher.) stubGenerator_ppc.cpp I think checking cmpwi(... is_presync, 1) is ok because the ABI specifies that "bool true" is one Byte with the value 1 and the C calling convention enforces extension to 8 Byte. I would have used andi_ + bne to be on the safe side, but I believe your version is ok. Comment "// post sync => emit 'lwsync'" is wrong. We use 'sync'. Best regards, Martin > -----Original Message----- > From: Gustavo Romero > Sent: Mittwoch, 11. Dezember 2019 01:05 > Cc: Andrew Dinn ; Baesken, Matthias > ; Doerr, Martin ; > hotspot-compiler-dev at openjdk.java.net > Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for > JEP-352 > > Hi, > > Could the following change be reviewed please? > > Bug : https://bugs.openjdk.java.net/browse/JDK-8234599 > Webrev: http://cr.openjdk.java.net/~gromero/8234599/v1/ > > POWER9 does not have any cache management or barrier instructions aimed > specifically to deal with persistent memory (NVDIMM), hence in Power ISA > v3.0 > there are no instructions like data cache flush/store or sync instructions > specific to persistent memory. Nonetheless, in some cases (like through > hardware > emulation) POWER9 can support NVDIMM and if Linux supports DAX (direct > mapping of > persisnt memory) and a /dev/pmem device is available it's possible to use > data > cache and sync instructions in the ISA (which are not explicitly aimed to > persistent memory) on a memory region backed by DAX, i.e. mapped using > new > mmap() flag MAP_SYNC for persistent memory), so these instructions will > have the > same semantics as the instructions for data cache flush / store and sync on > other architectures supporting NVDIMM and that have explicit instructions to > deal > with persistent memory. > > This change adds support for JEP-352 on POWER9 using 'dcbst' plus a sync > instruction to sync data to a memory backed by DAX (to persistent memory) > when > that's required. > > Moreover, that change also paves the way for supporting NVDIMM on > future > Power CPUs that might have new data cache management and barrier > instructions > explicitly to deal with persistent memory. > > The change was developed and tested using a P9 QEMU VM (emulation) > with NVDIMM > support. For details on how to setup a proper QEMU + Linux kernel w/ a > /dev/pmem > device configured please see recipe in [2]. > > The JVM on a POWER9 machine with a /dev/pmem device properly set and > with that > change applied is able to pass the test for JEP-352 [3]. The JVM is also able > to pass all tests of Mashona library [4]. > > When DAX is not supported, like on POWER8 and POWER9 w/o DAX support, > OS won't > support mmap()'s MAP_SYNC flag, so kernel will return EOPNOTSUP when > code will > try to allocate memory backed by a persistent memory device, so the JVM > will get > a "java.io.IOException: Operation not supported". 
Naturally, on machines > that don't support writebacks or pmem supports_data_cache_line_flush() > will > return false, but even in that case JVM will hit a EOPNOTSUPP and get a > "java.io.IOException: Operation not supported" sooner than it has the > chance to > try to emit any writeback + sync instructions. > > Thank you. > > Best regards, > Gustavo > > [1] http://man7.org/linux/man-pages/man2/mmap.2.html > [2] https://github.com/gromero/nvdimm > [3] > http://hg.openjdk.java.net/jdk/jdk/file/336885e766af/test/jdk/java/nio/Ma > ppedByteBuffer/PmemTest.java > [4] https://github.com/jhalliday/mashona From john.r.rose at oracle.com Wed Dec 11 16:59:53 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 11 Dec 2019 08:59:53 -0800 Subject: [14] RFR (L): 8235756: C2: Merge AD instructions for DivV, SqrtV, FmaV, AddReductionV, and MulReductionV nodes In-Reply-To: <00dd8e05-97ef-7323-c221-2f25122f2f64@oracle.com> References: <00dd8e05-97ef-7323-c221-2f25122f2f64@oracle.com> Message-ID: <80057428-11F9-41AB-AFE5-533873E05E1B@oracle.com> Whew! Reviewed. I did not (a) verify that the reduction code was correct before the change, nor did I (b) verify that there are no functional changes due to the change, though I did spot-check. Please, tell me the various reduction templates are adequately covered by our tests. For later, I wish we could handle the reduction cases more mechanically. We now have hand-maintained reduction trees for each power of two lane count. Clearly this could be more mechanized, by writing the reduction of 2N lanes to N lanes once by hand, and then applying it lg N times for any particular N. Obviously, don?t do that now. ? John > On Dec 11, 2019, at 3:53 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/all/ > https://bugs.openjdk.java.net/browse/JDK-8235756 > > Merge AD instructions for the following vector nodes: > - DivVF/DivVD > - SqrtVF/SqrtVD > - FmaVF/FmaVD > - AddReductionV* > - MulReductionV* > > Individual patches: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/individual > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov From john.r.rose at oracle.com Wed Dec 11 17:03:29 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 11 Dec 2019 09:03:29 -0800 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes In-Reply-To: <6f1312ca-9b0f-c1e9-8db0-292974c7990d@oracle.com> References: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> <58948fc2-43fe-bd12-2185-4ab12580486a@oracle.com> <6f1312ca-9b0f-c1e9-8db0-292974c7990d@oracle.com> Message-ID: <441B6171-212F-4305-BE13-1C4E65D250DB@oracle.com> Thanks for taking my comments into account. I looked again at your webrev and like it even better. Any of the current micro-versions on the table is OK with me. ? John > On Dec 11, 2019, at 2:31 AM, Vladimir Ivanov wrote: > > Thanks for reviews, Vladimir and John. > > Updated version: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/ > > On 11.12.2019 05:48, Vladimir Kozlov wrote: >> In general I don't like using switches in this changes. In most examples you have only 2 instructions to choose from which could be done with 'if/else'. 'default: ShouldNotReachHere()' is big code if inlined and never will be hit - you should hit first checks in supported vector size code. 
> > Didn't have strong opinion about them (and still don't), so I refactored most of the switches to branches. Let me know how it looks now. > > Regarding ShouldNotReachHere(): it would be unfortunate if we have to take size code increase into account when using it to mark never-taken. > > Do you prefer "assert(false,...)" instead on for default case in switches? > >> I may prefer to see 2 AD instructions as you had in previous changes. > > Considering the main motivation is to reduce the number of instructions used, that would be a counter change. As I write later to John, I would like to see the dispatching be hidden inside MacroAssembler. It'll address your current concerns, right? > >> In vabsnegF() switch cases should be 2,8,16 instead of 2,4,8. > > Good catch! Fixed. > >> Why you need predicate for vabsnegD ? Other length is not supported anyway. > > Agree, fixed. > > > On 11.12.2019 03:00, John Rose wrote: > > Thank you, reviewed. > > > > For consistency, I?d expect to see the AD file mention vshiftq > > instead of in or in addition to the very specific evpsraq. > > Maybe actually call vshiftq (consistent with other parts of > > AD) and comment that it calls evpsraq? I had to look inside the > > macro assembler to verify that evpsraq was properly aligned > > with the other cases. Or just leave a comment saying this is > > what vshiftq would do also, like the other instructions. > > In general, I'd like to see all the hardware-specific dispatching logic to be moved into MacroAssembler and AD file just to call into them. > > But we (Jatin, Sandhya, and me) decided to limit the amount of refactorings and upstream what Jatin ended up with. > > Are you fine with covering evpsraq case in a follow-up change? > > > > For vabsnegF, I suggest adding a comment here: > > > > + predicate(n->as_Vector()->length() == 2 || > > + // case 4 is handled as a 1-operand instruction by vabsneg4F > > + n->as_Vector()->length() == 8 || > > + n->as_Vector()->length() == 16); > > > > I took slightly different route and rewrote it as follows: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/individual/webrev.08.vabsneg_float/src/hotspot/cpu/x86/x86.ad.udiff.html > > +instruct vabsnegF(vec dst, vec src, rRegI scratch) %{ > + predicate(n->as_Vector()->length() != 4); // handled by 1-operand instruction vabsneg4F > > instruct vabsneg4F(vec dst, rRegI scratch) %{ > + predicate(n->as_Vector()->length() == 4); > > It looks clearer than the previous version. > > Best regards, > Vladimir Ivanov > > >> Vladimir >> On 12/10/19 2:29 PM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all >>> https://bugs.openjdk.java.net/browse/JDK-8235719 >>> >>> Merge AD instructions for the following vector nodes: >>> - LShiftV*, RShiftV*, URShiftV* >>> - AbsV* >>> - NegV* >>> >>> Individual patches: >>> >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual >>> >>> As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. >>> >>> The patch is derived from the initial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. >>> >>> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >>> >>> Contributed-by: Jatin Bhateja >>> Reviewed-by: vlivanov, sviswanathan, ? 
>>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >>> >>> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html From tom.rodriguez at oracle.com Wed Dec 11 17:46:42 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 11 Dec 2019 09:46:42 -0800 Subject: RFR(S) 8229961: Assert failure in compiler/graalunit/HotspotTest.java In-Reply-To: References: Message-ID: <05bd6b65-d9ba-d69f-e3c8-1e8d830581c1@oracle.com> Thanks! tom Vladimir Kozlov wrote on 12/10/19 2:14 PM: > Looks good. Thank you for fixing JVMCI tests! > > Vladimir > > On 12/10/19 12:24 PM, Tom Rodriguez wrote: >> http://cr.openjdk.java.net/~never/8229961/webrev >> https://bugs.openjdk.java.net/browse/JDK-8229961 >> >> The JVMCI InstalledCode object maintains a link back to the CodeBlob >> it's associated with.? In several places JVMCI extracts the CodeBlob* >> or nmethod* from the InstalledCode and the operates on it.? In most >> other cases where an nmethod is being examined there are external >> factors keeping it alive but in this case there aren't.? The actions >> of the concurrent sweeper can transition or potentially free the >> nmethod while it's being used.? Getting all the way to freeing would >> take a very adversarial schedule but JVMCI should provide more safety >> for the these code paths. >> >> This fix adds an nmethodLocker to these usages and through careful use >> of locks attempts to ensure that the resulting locked nmethod is alive >> and locked when it's returned.? I had to modify some of the jtreg >> tests because they were badly abusing the InstalledCode API and >> violating some assumptions about the relationship between nmethods and >> their wrapper InstalledCode objects. >> >> Testing was clean apart from a test compilation problem and existing >> issue.? I'm resubmitting with the fixed jtreg test. From vladimir.kozlov at oracle.com Wed Dec 11 18:55:05 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Dec 2019 10:55:05 -0800 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> Message-ID: <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> Hi Kamath, First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code generated by C2. What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we would need test this including performance testing. In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. We don't have flag for VNNI or other avx512 instructions subset. 
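Roughly something like this in the CPU feature setup (just a sketch to illustrate the idea - the exact place and bit name may differ in your patch):

    // If AVX512 is switched off, do not advertise VBMI2 either, so that
    // VM_Version::supports_vbmi2() alone is enough to guard the new code.
    if (UseAVX < 3) {
      _features &= ~CPU_VBMI2;
    }

Then the assert in assembler_x86.cpp and the checks in stubGenerator_x86_64.cpp can rely on supports_vbmi2() only.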
In vm_version_x86.cpp you need to add more %s in print statement for new output. You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. You need to add intrinsics to Graal's test to ignore them: http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 Thanks, Vladimir On 12/10/19 5:41 PM, Kamath, Smita wrote: > Hi, > > > As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. > > Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 > > Link to webrev : http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ > > > > I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. > > > [1] https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> Vol. 2C 5-471) > > [2] https://software.intel.com/en-us/articles/intel-software-development-emulator > > > Regards, > > Smita Kamath > From vladimir.kozlov at oracle.com Wed Dec 11 19:34:45 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Dec 2019 11:34:45 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> <8ffcbd45-9027-9247-70d8-50257751069b@oracle.com> Message-ID: <6050ff35-2842-7a02-34f9-5dbe9a260d7c@oracle.com> I know it is a pain but our coding style requires to use {} for 'if' statement. And it will help visually to separate 'return' from 'break' statements. Otherwise looks good. Thanks, Vladimir On 12/11/19 3:55 AM, Vladimir Ivanov wrote: > >> Yes, fully agree. Updated version: >> http://cr.openjdk.java.net/~vlivanov/8234392/webrev.01/ >> >> Got rid of ret_value and enhanced the comments, but also fixed >> long-standing bug you noticed: AbsVF should have the same additional >> checks as NegVF (since 512bit vandps is also introduced in AVX512DQ). > > Additionally, got rid of ret_value in Matcher::match_rule_supported() > and moved Op_RoundDoubleModeV there since it doesn't depend on vector > length: > > +    case Op_RoundDoubleModeV: > +      if (VM_Version::supports_avx() == false) { > +        return false; // 128bit vroundpd is not available > +      } break; > > Best regards, > Vladimir Ivanov > >> On 11.12.2019 02:31, John Rose wrote: >>> Actually I have one more comment about the new classification logic: >>> >>> The 'ret_value' idiom is terrible. >>> >>> I see a function which is complex, with something like this at the top: >>> >>>    if (... simple size check ...) { >>>      ret_value = false;   // size not supported >>>    } ... >>> >>> I want to read something more decisive like this: >>> >>>    if (... simple size check ...) { >>>      return false;   // size not supported >>>    } ... >>> >>> The 'ret_value' thingy adds only noise, and no clarity. >>> >>> It is more than an annoyance, hence my comment here.
>>> The problem is that if I want to understand the quick check >>> above, I have to scroll down *past all the other checks* to see >>> if some joker does ?ret_value = true? before the ?return ret_value?, >>> subverting my understanding of the code. ?Basically, the ?ret_value? >>> nonsense makes the code impossible to break into understandable >>> parts. >>> >>> ? Grumpy John >>> >>> On Dec 10, 2019, at 12:34 PM, John Rose >> > wrote: >>>> >>>>> It would severely penalize Xeon Phis (which lack BW, DQ, and VL >>>>> extensions), but maybe there'll be a moment when Skylake >>>>> (F+CD+BW+DQ+VL) can be chosen as the baseline. >>>> >>>> Yeah, I saw that coming after I visited the trusty intrinsics guide >>>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/ >>>>> >>>>> Anyway, I got your idea and it makes perfect sense to me to collect >>>>> such ideas. >>>> >>> From tom.rodriguez at oracle.com Wed Dec 11 20:17:06 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 11 Dec 2019 12:17:06 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches In-Reply-To: <16bf48b3-2be5-4631-8d39-2939aa29d6b9@oracle.com> References: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> <16bf48b3-2be5-4631-8d39-2939aa29d6b9@oracle.com> Message-ID: <3a5a5cde-32f9-bb0c-9d7d-fe6866a092ba@oracle.com> http://cr.openjdk.java.net/~never/8229377.1/webrev/ I have moved all the logic over into deoptimize_all_marked and added some comments to the method. I also rearranged the invalidation logic to make it cleaner. I've submitted the new version for testing. tom Tom Rodriguez wrote on 12/10/19 11:21 PM: >> Yes, right. I am confusing because not in all places we do this >> sequence: mark_for_deoptimization, make_not_entrant, patch frames. >> Patching frames are not always at the same place as we do marking. >> >> I also noticed that you changed under which locks make_not_entrant is >> done. May be better to pass nmethod into deoptimize_all_marked() and >> do it there (by default pass NULL)? > > You mean the CodeCache_lock?? The CompiledMethod_lock is only thing > required for make_not_entrant and that's acquired in > make_not_entrant_or_zombie.? The CodeCache_lock is required for the safe > iteration over the CodeCache itself. > > Though I kind of like your suggestion of passing the nmethod to the call > and performing the make_not_entrant call there instead.? It keeps the > logic together.? I'll make that change. > > tom > >> >> Thanks, >> Vladimir >> >>> >>> tom >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/9/19 12:10 PM, Tom Rodriguez wrote: >>>>> http://cr.openjdk.java.net/~never/8229377/webrev >>>>> https://bugs.openjdk.java.net/browse/JDK-8229377 >>>>> >>>>> This is a minor improvement to the JVMCI invalidate method to avoid >>>>> scanning large code caches when invalidating a single nmethod. >>>>> Instead the nmethod is directly made not_entrant.? In general I'm >>>>> unclear what the benefit of the >>>>> mark_for_deoptimization/make_marked_nmethods_not_entrant split is. >>>>> Testing is in progress. >>>>> >>>>> JDK-8230884 had been previously duplicated against this because >>>>> they overlapped a bit, but in the interest of clarity I separated >>>>> them again. 
>>>>> >>>>> tom From vladimir.kozlov at oracle.com Wed Dec 11 20:42:33 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Dec 2019 12:42:33 -0800 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes In-Reply-To: <441B6171-212F-4305-BE13-1C4E65D250DB@oracle.com> References: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> <58948fc2-43fe-bd12-2185-4ab12580486a@oracle.com> <6f1312ca-9b0f-c1e9-8db0-292974c7990d@oracle.com> <441B6171-212F-4305-BE13-1C4E65D250DB@oracle.com> Message-ID: Changes mostly fine. My concern is only about vshiftL_arith_reg() for RShiftVL. It should use 'if' instead of switch statement and supported length should be filtered by match_rule_supported_vector(). Predicate should only check UseAVX <= 2. Thanks, Vladimir On 12/11/19 9:03 AM, John Rose wrote: > Thanks for taking my comments into account. > I looked again at your webrev and like it even better. > Any of the current micro-versions on the table is OK with me. > > ? John > >> On Dec 11, 2019, at 2:31 AM, Vladimir Ivanov wrote: >> >> Thanks for reviews, Vladimir and John. >> >> Updated version: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/ >> >> On 11.12.2019 05:48, Vladimir Kozlov wrote: >>> In general I don't like using switches in this changes. In most examples you have only 2 instructions to choose from which could be done with 'if/else'. 'default: ShouldNotReachHere()' is big code if inlined and never will be hit - you should hit first checks in supported vector size code. >> >> Didn't have strong opinion about them (and still don't), so I refactored most of the switches to branches. Let me know how it looks now. >> >> Regarding ShouldNotReachHere(): it would be unfortunate if we have to take size code increase into account when using it to mark never-taken. >> >> Do you prefer "assert(false,...)" instead on for default case in switches? >> >>> I may prefer to see 2 AD instructions as you had in previous changes. >> >> Considering the main motivation is to reduce the number of instructions used, that would be a counter change. As I write later to John, I would like to see the dispatching be hidden inside MacroAssembler. It'll address your current concerns, right? >> >>> In vabsnegF() switch cases should be 2,8,16 instead of 2,4,8. >> >> Good catch! Fixed. >> >>> Why you need predicate for vabsnegD ? Other length is not supported anyway. >> >> Agree, fixed. >> >> >> On 11.12.2019 03:00, John Rose wrote: >>> Thank you, reviewed. >>> >>> For consistency, I?d expect to see the AD file mention vshiftq >>> instead of in or in addition to the very specific evpsraq. >>> Maybe actually call vshiftq (consistent with other parts of >>> AD) and comment that it calls evpsraq? I had to look inside the >>> macro assembler to verify that evpsraq was properly aligned >>> with the other cases. Or just leave a comment saying this is >>> what vshiftq would do also, like the other instructions. >> >> In general, I'd like to see all the hardware-specific dispatching logic to be moved into MacroAssembler and AD file just to call into them. >> >> But we (Jatin, Sandhya, and me) decided to limit the amount of refactorings and upstream what Jatin ended up with. >> >> Are you fine with covering evpsraq case in a follow-up change? 
>> >> >>> For vabsnegF, I suggest adding a comment here: >>> >>> + predicate(n->as_Vector()->length() == 2 || >>> + // case 4 is handled as a 1-operand instruction by vabsneg4F >>> + n->as_Vector()->length() == 8 || >>> + n->as_Vector()->length() == 16); >>> >> >> I took slightly different route and rewrote it as follows: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/individual/webrev.08.vabsneg_float/src/hotspot/cpu/x86/x86.ad.udiff.html >> >> +instruct vabsnegF(vec dst, vec src, rRegI scratch) %{ >> + predicate(n->as_Vector()->length() != 4); // handled by 1-operand instruction vabsneg4F >> >> instruct vabsneg4F(vec dst, rRegI scratch) %{ >> + predicate(n->as_Vector()->length() == 4); >> >> It looks clearer than the previous version. >> >> Best regards, >> Vladimir Ivanov >> >> >>> Vladimir >>> On 12/10/19 2:29 PM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all >>>> https://bugs.openjdk.java.net/browse/JDK-8235719 >>>> >>>> Merge AD instructions for the following vector nodes: >>>> - LShiftV*, RShiftV*, URShiftV* >>>> - AbsV* >>>> - NegV* >>>> >>>> Individual patches: >>>> >>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual >>>> >>>> As Jatin described, merging is applied only to AD instructions of similar shape. There are some more opportunities for reduction/merging left, but they are deliberately left out for future work. >>>> >>>> The patch is derived from the initial version of generic vector support [1]. Generic vector support was reviewed earlier [2]. >>>> >>>> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >>>> >>>> Contributed-by: Jatin Bhateja >>>> Reviewed-by: vlivanov, sviswanathan, ? >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >>>> >>>> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html > From vladimir.kozlov at oracle.com Wed Dec 11 20:46:33 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Dec 2019 12:46:33 -0800 Subject: RFR(S) 8229377: [JVMCI] Improve InstalledCode.invalidate for large code caches In-Reply-To: <3a5a5cde-32f9-bb0c-9d7d-fe6866a092ba@oracle.com> References: <5423b1de-d210-63aa-5bcc-d60ee18e12ef@oracle.com> <16bf48b3-2be5-4631-8d39-2939aa29d6b9@oracle.com> <3a5a5cde-32f9-bb0c-9d7d-fe6866a092ba@oracle.com> Message-ID: <94a5c390-78ca-0d29-712e-30fb3e4d5524@oracle.com> Nice. Thanks, Vladimir On 12/11/19 12:17 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/8229377.1/webrev/ > > I have moved all the logic over into deoptimize_all_marked and added > some comments to the method.? I also rearranged the invalidation logic > to make it cleaner.? I've submitted the new version for testing. > > tom > > Tom Rodriguez wrote on 12/10/19 11:21 PM: >>> Yes, right. I am confusing because not in all places we do this >>> sequence: mark_for_deoptimization, make_not_entrant, patch frames. >>> Patching frames are not always at the same place as we do marking. >>> >>> I also noticed that you changed under which locks make_not_entrant is >>> done. May be better to pass nmethod into deoptimize_all_marked() and >>> do it there (by default pass NULL)? >> >> You mean the CodeCache_lock?? The CompiledMethod_lock is only thing >> required for make_not_entrant and that's acquired in >> make_not_entrant_or_zombie.? 
The CodeCache_lock is required for the >> safe iteration over the CodeCache itself. >> >> Though I kind of like your suggestion of passing the nmethod to the >> call and performing the make_not_entrant call there instead.? It keeps >> the logic together.? I'll make that change. >> >> tom >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> tom >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/9/19 12:10 PM, Tom Rodriguez wrote: >>>>>> http://cr.openjdk.java.net/~never/8229377/webrev >>>>>> https://bugs.openjdk.java.net/browse/JDK-8229377 >>>>>> >>>>>> This is a minor improvement to the JVMCI invalidate method to >>>>>> avoid scanning large code caches when invalidating a single >>>>>> nmethod. Instead the nmethod is directly made not_entrant.? In >>>>>> general I'm unclear what the benefit of the >>>>>> mark_for_deoptimization/make_marked_nmethods_not_entrant split is. >>>>>> Testing is in progress. >>>>>> >>>>>> JDK-8230884 had been previously duplicated against this because >>>>>> they overlapped a bit, but in the interest of clarity I separated >>>>>> them again. >>>>>> >>>>>> tom From david.holmes at oracle.com Wed Dec 11 21:02:57 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 12 Dec 2019 07:02:57 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: Message-ID: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> On 12/12/2019 1:07 am, Reingruber, Richard wrote: > Hi David, > > > Most of the details here are in areas I can comment on in detail, but I > > did take an initial general look at things. > > Thanks for taking the time! Apologies the above should read: "Most of the details here are in areas I *can't* comment on in detail ..." David > > The only thing that jumped out at me is that I think the > > DeoptimizeObjectsALotThread should be a hidden thread. > > > > + bool is_hidden_from_external_view() const { return true; } > > Yes, it should. Will add the method like above. > > > Also I don't see any testing of the DeoptimizeObjectsALotThread. Without > > active testing this will just bit-rot. > > DeoptimizeObjectsALot is meant for stress testing with a larger workload. I will add a minimal test > to keep it fresh. > > > Also on the tests I don't understand your @requires clause: > > > > @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & > > (vm.opt.TieredCompilation != true)) > > > > This seems to require that TieredCompilation is disabled, but tiered is > > our normal mode of operation. ?? > > > > I removed the clause. I guess I wanted to target the tests towards the code they are supposed to > test, and it's easier to analyze failures w/o tiered compilation and with just one compiler thread. > > Additionally I will make use of compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. > > Thanks, > Richard. > > -----Original Message----- > From: David Holmes > Sent: Mittwoch, 11. 
Dezember 2019 08:03 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi Richard, > > On 11/12/2019 7:45 am, Reingruber, Richard wrote: >> Hi, >> >> I would like to get reviews please for >> >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >> >> Corresponding RFE: >> https://bugs.openjdk.java.net/browse/JDK-8227745 >> >> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >> >> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the >> change is being tested at SAP since I posted the first RFR some months ago. >> >> The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI >> agents request capabilities that allow them to access local variable values. E.g. if you start-up >> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right >> from the beginning, well before a debugger attaches -- if ever one should do so. With the >> enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based >> optimizations are reverted just before an agent acquires the reference to an object. In the JBS item >> you'll find more details. > > Most of the details here are in areas I can comment on in detail, but I > did take an initial general look at things. > > The only thing that jumped out at me is that I think the > DeoptimizeObjectsALotThread should be a hidden thread. > > + bool is_hidden_from_external_view() const { return true; } > > Also I don't see any testing of the DeoptimizeObjectsALotThread. Without > active testing this will just bit-rot. > > Also on the tests I don't understand your @requires clause: > > @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & > (vm.opt.TieredCompilation != true)) > > This seems to require that TieredCompilation is disabled, but tiered is > our normal mode of operation. ?? > > Thanks, > David > >> Thanks, >> Richard. >> >> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >> From vladimir.x.ivanov at oracle.com Wed Dec 11 21:39:47 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 12 Dec 2019 00:39:47 +0300 Subject: [14] RFR (L): 8235719: C2: Merge AD instructions for ShiftV, AbsV, and NegV nodes In-Reply-To: References: <6dc1af79-126c-0760-5a82-ca6bda3723bc@oracle.com> <58948fc2-43fe-bd12-2185-4ab12580486a@oracle.com> <6f1312ca-9b0f-c1e9-8db0-292974c7990d@oracle.com> <441B6171-212F-4305-BE13-1C4E65D250DB@oracle.com> Message-ID: > My concern is only about vshiftL_arith_reg() for RShiftVL. It should use > 'if' instead of switch statement and supported length should be filtered > by match_rule_supported_vector(). Predicate should only check UseAVX <= 2. Good point, Vladimir. Fully agree. Will make the adjustment before the push. Best regards, Vladimir Ivanov > On 12/11/19 9:03 AM, John Rose wrote: >> Thanks for taking my comments into account. >> I looked again at your webrev and like it even better. >> Any of the current micro-versions on the table is OK with me. >> >> ? John >> >>> On Dec 11, 2019, at 2:31 AM, Vladimir Ivanov >>> wrote: >>> >>> Thanks for reviews, Vladimir and John. 
>>> >>> Updated version: >>> ? http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/ >>> >>> On 11.12.2019 05:48, Vladimir Kozlov wrote: >>>> In general I don't like using switches in this changes. In most >>>> examples you have only 2 instructions to choose from which could be >>>> done with 'if/else'. 'default: ShouldNotReachHere()' is big code if >>>> inlined and never will be hit - you should hit first checks in >>>> supported vector size code. >>> >>> Didn't have strong opinion about them (and still don't), so I >>> refactored most of the switches to branches.? Let me know how it >>> looks now. >>> >>> Regarding ShouldNotReachHere(): it would be unfortunate if we have to >>> take size code increase into account when using it to mark never-taken. >>> >>> Do you prefer "assert(false,...)" instead on for default case in >>> switches? >>> >>>> I may prefer to see 2 AD instructions as you had in previous changes. >>> >>> Considering the main motivation is to reduce the number of >>> instructions used, that would be a counter change. As I write later >>> to John, I would like to see the dispatching be hidden inside >>> MacroAssembler. It'll address your current concerns, right? >>> >>>> In vabsnegF() switch cases should be 2,8,16 instead of 2,4,8. >>> >>> Good catch! Fixed. >>> >>>> Why you need predicate for vabsnegD ? Other length is not supported >>>> anyway. >>> >>> Agree, fixed. >>> >>> >>> On 11.12.2019 03:00, John Rose wrote: >>>> Thank you, reviewed. >>>> >>>> For consistency, I?d expect to see the AD file mention vshiftq >>>> instead of in or in addition to the very specific evpsraq. >>>> Maybe actually call vshiftq (consistent with other parts of >>>> AD) and comment that it calls evpsraq?? I had to look inside the >>>> macro assembler to verify that evpsraq was properly aligned >>>> with the other cases.? Or just leave a comment saying this is >>>> what vshiftq would do also, like the other instructions. >>> >>> In general, I'd like to see all the hardware-specific dispatching >>> logic to be moved into MacroAssembler and AD file just to call into >>> them. >>> >>> But we (Jatin, Sandhya, and me) decided to limit the amount of >>> refactorings and upstream what Jatin ended up with. >>> >>> Are you fine with covering evpsraq case in a follow-up change? >>> >>> >>>> For vabsnegF, I suggest adding a comment here: >>>> >>>> +? predicate(n->as_Vector()->length() == 2 || >>>> +??????????? // case 4 is handled as a 1-operand instruction by >>>> vabsneg4F >>>> +??????????? n->as_Vector()->length() == 8 || >>>> +??????????? n->as_Vector()->length() == 16); >>>> >>> >>> I took slightly different route and rewrote it as follows: >>> >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.01/individual/webrev.08.vabsneg_float/src/hotspot/cpu/x86/x86.ad.udiff.html >>> >>> >>> +instruct vabsnegF(vec dst, vec src, rRegI scratch) %{ >>> +? predicate(n->as_Vector()->length() != 4); // handled by 1-operand >>> instruction vabsneg4F >>> >>> instruct vabsneg4F(vec dst, rRegI scratch) %{ >>> +? predicate(n->as_Vector()->length() == 4); >>> >>> It looks clearer than the previous version. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> >>>> Vladimir >>>> On 12/10/19 2:29 PM, Vladimir Ivanov wrote: >>>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/all >>>>> https://bugs.openjdk.java.net/browse/JDK-8235719 >>>>> >>>>> Merge AD instructions for the following vector nodes: >>>>> ??? - LShiftV*, RShiftV*, URShiftV* >>>>> ??? - AbsV* >>>>> ??? 
- NegV* >>>>> >>>>> Individual patches: >>>>> >>>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235719/webrev.00/individual >>>>> >>>>> >>>>> As Jatin described, merging is applied only to AD instructions of >>>>> similar shape. There are some more opportunities for >>>>> reduction/merging left, but they are deliberately left out for >>>>> future work. >>>>> >>>>> The patch is derived from the initial version of generic vector >>>>> support [1]. Generic vector support was reviewed earlier [2]. >>>>> >>>>> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >>>>> >>>>> Contributed-by: Jatin Bhateja >>>>> Reviewed-by: vlivanov, sviswanathan, ? >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> [1] >>>>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034822.html >>>>> >>>>> >>>>> [2] >>>>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036016.html >>>>> >> From john.r.rose at oracle.com Wed Dec 11 22:33:16 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 11 Dec 2019 14:33:16 -0800 Subject: [14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information In-Reply-To: References: <0ca6d7f4-cc2d-43c3-5399-078733882a52@oracle.com> <8eeabbc8-4226-ae66-9e60-901ac25b65b1@oracle.com> <053A1066-E9BF-4A1A-811A-A32BAFC5AB47@oracle.com> <175c4ef5-dbe5-4ae8-a042-d23b0af466a2@oracle.com> <8ffcbd45-9027-9247-70d8-50257751069b@oracle.com> Message-ID: Two non-grumpy thumbs up! (Braces or not, either way, that?s up to Vladimir K.) > On Dec 11, 2019, at 3:55 AM, Vladimir Ivanov wrote: > > >> Yes, fully agree. Updated version: >> http://cr.openjdk.java.net/~vlivanov/8234392/webrev.01/ >> Got rid of ret_value and enhanced the comments, but also fixed long-standing bug you noticed: AbsVF should have the same additional checks as NegVF (since 512bit vandps is also introduced in AVX512DQ). > > Additionally, got rid of ret_value in Matcher::match_rule_supported() and moved Op_RoundDoubleModeV there since it doesn't depend on vector length: > > + case Op_RoundDoubleModeV: > + if (VM_Version::supports_avx() == false) { > + return false; // 128bit vroundpd is not available > + } break; > > Best regards, > Vladimir Ivanov > >> On 11.12.2019 02:31, John Rose wrote: >>> Actually I have one more comment about the new classification logic: >>> >>> The ?ret_value? idiom is terrible. >>> >>> I see a function which is complex, with something like this at the top: >>> >>> if (? simple size check ?) { >>> ret_value = false; // size not supported >>> } ? >>> >>> I want to read something more decisive like this: >>> >>> if (? simple size check ?) { >>> return false; // size not supported >>> } ? >>> >>> The ?ret_value? thingy adds only noise, and no clarity. >>> >>> It is more than an annoyance, hence my comment here. >>> The problem is that if I want to understand the quick check >>> above, I have to scroll down *past all the other checks* to see >>> if some joker does ?ret_value = true? before the ?return ret_value?, >>> subverting my understanding of the code. Basically, the ?ret_value? >>> nonsense makes the code impossible to break into understandable >>> parts. >>> >>> ? Grumpy John >>> >>> On Dec 10, 2019, at 12:34 PM, John Rose > wrote: >>>> >>>>> It would severely penalize Xeon Phis (which lack BW, DQ, and VL extensions), but maybe there'll be a moment when Skylake (F+CD+BW+DQ+VL) can be chosen as the baseline. 
>>>> >>>> Yeah, I saw that coming after I visited the trusty intrinsics guide >>>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/ >>>>> >>>>> Anyway, I got your idea and it makes perfect sense to me to collect such ideas. >>>> >>> From vladimir.kozlov at oracle.com Wed Dec 11 23:00:08 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Dec 2019 15:00:08 -0800 Subject: [14] RFR (L): 8235756: C2: Merge AD instructions for DivV, SqrtV, FmaV, AddReductionV, and MulReductionV nodes In-Reply-To: <00dd8e05-97ef-7323-c221-2f25122f2f64@oracle.com> References: <00dd8e05-97ef-7323-c221-2f25122f2f64@oracle.com> Message-ID: <6ab0d5cd-3720-a08d-2b97-a520ffc0ce34@oracle.com> webrev.03.vsqrt_double changes include code from webrev.02.vsqrt_float and webrev.05.vfma_double - from webrev.04.vfma_float. Otherwise Div, Sqrt, Fma code changes are fine. Even with above mess-up. May be you need to separate Reduction changes into a separate patch so we can have more time to review it. It is more complex. Thanks, Vladimir On 12/11/19 3:53 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/all/ > https://bugs.openjdk.java.net/browse/JDK-8235756 > > Merge AD instructions for the following vector nodes: > ? - DivVF/DivVD > ? - SqrtVF/SqrtVD > ? - FmaVF/FmaVD > ? - AddReductionV* > ? - MulReductionV* > > Individual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/individual > > Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov From vladimir.kozlov at oracle.com Wed Dec 11 23:30:31 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Dec 2019 15:30:31 -0800 Subject: RFR(XL) 8235634: Update Graal In-Reply-To: <4C30F569-6D7E-48C3-984F-32E03CDD5633@oracle.com> References: <4C30F569-6D7E-48C3-984F-32E03CDD5633@oracle.com> Message-ID: Looks good. Testing results are good too. Thanks, Vladimir On 12/10/19 11:11 PM, Igor Veresov wrote: > Webrev: http://cr.openjdk.java.net/~iveresov/8235634/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8235634 > > The list of changes can be found in the JBS issue. > > Thanks, > igor > > > From ekaterina.pavlova at oracle.com Thu Dec 12 00:41:39 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 11 Dec 2019 16:41:39 -0800 Subject: RFR(T/XS) 8235773: Tier3 fails because graalunit tests started to run with ZGC Message-ID: <7f7ece2c-de80-99bc-68ef-e4db180fce47@oracle.com> Please review the urgent fix for regression introduced by JDK-8215728. graalunit tests are part of hotspot_compiler_all_gcs jtreg group and started to fail there. The tests didn't run before because they had @requires vm.opt.final.EnableJVMCI == true With JDK-8215728 this check was deleted as we don't rely on extra EnableJVMCI flag anymore but set it explicitly in graal tests. Not all GCs support JVMCI compiler and in particular ZGC. The fix removes graaalunit tests from hotspot_compiler_all_gcs similar way it is done for aot_jvmci tests. Also removed graal unit tests from hotspot_compiler_xcomp group. @requires vm.jvmci was added to all tests so the tests are not run on platforms which don't support JVMCI. 
JBS: https://bugs.openjdk.java.net/browse/JDK-8235773 webrev: http://cr.openjdk.java.net/~epavlova//8235773/webrev.00/index.html testing: tier1-3 regards, -katya From igor.ignatyev at oracle.com Thu Dec 12 00:52:52 2019 From: igor.ignatyev at oracle.com (Igor Ignatev) Date: Wed, 11 Dec 2019 16:52:52 -0800 Subject: RFR(T/XS) 8235773: Tier3 fails because graalunit tests started to run with ZGC Message-ID: <84F0084D-E8C3-4DF2-9783-0BC6AC1E8ED8@oracle.com> ? LGTM. One comment though, won?t it be more appropriate to add graal unit tests to tier1_compiler_not_xcomp ? ? Igor > On Dec 11, 2019, at 4:41 PM, Ekaterina Pavlova wrote: > > ? > Please review the urgent fix for regression introduced by JDK-8215728. > > graalunit tests are part of hotspot_compiler_all_gcs jtreg group and started to fail there. > The tests didn't run before because they had > @requires vm.opt.final.EnableJVMCI == true > > With JDK-8215728 this check was deleted as we don't rely on extra EnableJVMCI flag anymore > but set it explicitly in graal tests. > > Not all GCs support JVMCI compiler and in particular ZGC. > The fix removes graaalunit tests from hotspot_compiler_all_gcs similar way it is done for aot_jvmci tests. > Also removed graal unit tests from hotspot_compiler_xcomp group. > > @requires vm.jvmci was added to all tests so the tests are not run on platforms which don't support JVMCI. > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235773 > webrev: http://cr.openjdk.java.net/~epavlova//8235773/webrev.00/index.html > testing: tier1-3 > > regards, > -katya From ekaterina.pavlova at oracle.com Thu Dec 12 01:15:27 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 11 Dec 2019 17:15:27 -0800 Subject: RFR(T/XS) 8235773: Tier3 fails because graalunit tests started to run with ZGC In-Reply-To: <84F0084D-E8C3-4DF2-9783-0BC6AC1E8ED8@oracle.com> References: <84F0084D-E8C3-4DF2-9783-0BC6AC1E8ED8@oracle.com> Message-ID: <62f24e6d-3256-e46d-e868-09208780939e@oracle.com> On 12/11/19 4:52 PM, Igor Ignatev wrote: > ? > LGTM. > One comment though, won?t it be more appropriate to add graal unit tests to tier1_compiler_not_xcomp ? agree, this is better, regenerated webrev thanks! -katya > ? Igor > >> On Dec 11, 2019, at 4:41 PM, Ekaterina Pavlova wrote: >> >> ? >> Please review the urgent fix for regression introduced by JDK-8215728. >> >> graalunit tests are part of hotspot_compiler_all_gcs jtreg group and started to fail there. >> The tests didn't run before because they had >> @requires vm.opt.final.EnableJVMCI == true >> >> With JDK-8215728 this check was deleted as we don't rely on extra EnableJVMCI flag anymore >> but set it explicitly in graal tests. >> >> Not all GCs support JVMCI compiler and in particular ZGC. >> The fix removes graaalunit tests from hotspot_compiler_all_gcs similar way it is done for aot_jvmci tests. >> Also removed graal unit tests from hotspot_compiler_xcomp group. >> >> @requires vm.jvmci was added to all tests so the tests are not run on platforms which don't support JVMCI. 
>> >> >> >> ????JBS: https://bugs.openjdk.java.net/browse/JDK-8235773 >> ?webrev: http://cr.openjdk.java.net/~epavlova//8235773/webrev.00/index.html >> testing: tier1-3 >> >> regards, >> -katya >> From igor.ignatyev at oracle.com Thu Dec 12 01:21:03 2019 From: igor.ignatyev at oracle.com (Igor Ignatev) Date: Wed, 11 Dec 2019 17:21:03 -0800 Subject: RFR(T/XS) 8235773: Tier3 fails because graalunit tests started to run with ZGC In-Reply-To: <62f24e6d-3256-e46d-e868-09208780939e@oracle.com> References: <62f24e6d-3256-e46d-e868-09208780939e@oracle.com> Message-ID: <8A188BA9-BDC3-4A1B-AD38-ACE16657AA83@oracle.com> Perfect. Ship it. ? Igor > On Dec 11, 2019, at 5:15 PM, Ekaterina Pavlova wrote: > > ?On 12/11/19 4:52 PM, Igor Ignatev wrote: >> ? >> LGTM. >> One comment though, won?t it be more appropriate to add graal unit tests to tier1_compiler_not_xcomp ? > > agree, this is better, regenerated webrev > > thanks! > > -katya > >> ? Igor >>>> On Dec 11, 2019, at 4:41 PM, Ekaterina Pavlova wrote: >>> >>> ? >>> Please review the urgent fix for regression introduced by JDK-8215728. >>> >>> graalunit tests are part of hotspot_compiler_all_gcs jtreg group and started to fail there. >>> The tests didn't run before because they had >>> @requires vm.opt.final.EnableJVMCI == true >>> >>> With JDK-8215728 this check was deleted as we don't rely on extra EnableJVMCI flag anymore >>> but set it explicitly in graal tests. >>> >>> Not all GCs support JVMCI compiler and in particular ZGC. >>> The fix removes graaalunit tests from hotspot_compiler_all_gcs similar way it is done for aot_jvmci tests. >>> Also removed graal unit tests from hotspot_compiler_xcomp group. >>> >>> @requires vm.jvmci was added to all tests so the tests are not run on platforms which don't support JVMCI. >>> >>> >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8235773 >>> webrev: http://cr.openjdk.java.net/~epavlova//8235773/webrev.00/index.html >>> testing: tier1-3 >>> >>> regards, >>> -katya >>> > From ekaterina.pavlova at oracle.com Thu Dec 12 01:37:10 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 11 Dec 2019 17:37:10 -0800 Subject: RFR(T/XS) 8235773: Tier3 fails because graalunit tests started to run with ZGC In-Reply-To: <8A188BA9-BDC3-4A1B-AD38-ACE16657AA83@oracle.com> References: <62f24e6d-3256-e46d-e868-09208780939e@oracle.com> <8A188BA9-BDC3-4A1B-AD38-ACE16657AA83@oracle.com> Message-ID: <25a68d6b-2f14-654e-8b15-74bb8de5a2ba@oracle.com> done On 12/11/19 5:21 PM, Igor Ignatev wrote: > Perfect. Ship it. > > ? Igor > >> On Dec 11, 2019, at 5:15 PM, Ekaterina Pavlova wrote: >> >> ?On 12/11/19 4:52 PM, Igor Ignatev wrote: >>> ? >>> LGTM. >>> One comment though, won?t it be more appropriate to add graal unit tests to tier1_compiler_not_xcomp ? >> >> agree, this is better, regenerated webrev >> >> thanks! >> >> -katya >> >>> ? Igor >>>>> On Dec 11, 2019, at 4:41 PM, Ekaterina Pavlova wrote: >>>> >>>> ? >>>> Please review the urgent fix for regression introduced by JDK-8215728. >>>> >>>> graalunit tests are part of hotspot_compiler_all_gcs jtreg group and started to fail there. >>>> The tests didn't run before because they had >>>> @requires vm.opt.final.EnableJVMCI == true >>>> >>>> With JDK-8215728 this check was deleted as we don't rely on extra EnableJVMCI flag anymore >>>> but set it explicitly in graal tests. >>>> >>>> Not all GCs support JVMCI compiler and in particular ZGC. >>>> The fix removes graaalunit tests from hotspot_compiler_all_gcs similar way it is done for aot_jvmci tests. 
>>>> Also removed graal unit tests from hotspot_compiler_xcomp group. >>>> >>>> @requires vm.jvmci was added to all tests so the tests are not run on platforms which don't support JVMCI. >>>> >>>> >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8235773 >>>> webrev: http://cr.openjdk.java.net/~epavlova//8235773/webrev.00/index.html >>>> testing: tier1-3 >>>> >>>> regards, >>>> -katya >>>> >> > From ekaterina.pavlova at oracle.com Thu Dec 12 03:41:33 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 11 Dec 2019 19:41:33 -0800 Subject: RFR (T/XS) 8235808: Remove graalunit from tier1_compiler_not_xcomp Message-ID: <1b201409-cc53-19f5-68b2-e06e2975d402@oracle.com> Please review one more issue which could result in tier1/tier3 failures. Tier1 supposes to run only subset of graal unit tests as defined by tier1_compiler_graal group. So compiler/graalunit were wrongly included into tier1_compiler_not_xcomp group. JBS: https://bugs.openjdk.java.net/browse/JDK-8235808 webrev: http://cr.openjdk.java.net/~epavlova//8235808/webrev.00/index.html testing: tier1-3 (in progress) thanks, -katya From igor.ignatyev at oracle.com Thu Dec 12 03:53:11 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 11 Dec 2019 19:53:11 -0800 Subject: RFR (T/XS) 8235808: Remove graalunit from tier1_compiler_not_xcomp In-Reply-To: <1b201409-cc53-19f5-68b2-e06e2975d402@oracle.com> References: <1b201409-cc53-19f5-68b2-e06e2975d402@oracle.com> Message-ID: <4A90A180-64AD-46CF-AD76-036F68CC6B2F@oracle.com> thanks for fixing that. reviewed. -- Igor > On Dec 11, 2019, at 7:41 PM, Ekaterina Pavlova wrote: > > Please review one more issue which could result in tier1/tier3 failures. > Tier1 supposes to run only subset of graal unit tests as defined by > tier1_compiler_graal group. So compiler/graalunit were wrongly included > into tier1_compiler_not_xcomp group. > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8235808 > webrev: http://cr.openjdk.java.net/~epavlova//8235808/webrev.00/index.html > testing: tier1-3 (in progress) > > > thanks, > -katya From igor.veresov at oracle.com Thu Dec 12 04:37:15 2019 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 11 Dec 2019 20:37:15 -0800 Subject: RFR(XL) 8235634: Update Graal In-Reply-To: References: <4C30F569-6D7E-48C3-984F-32E03CDD5633@oracle.com> Message-ID: <004ABC75-BB6F-4630-99C0-442BE6853F50@oracle.com> Thanks, Vladimir! igor > On Dec 11, 2019, at 3:30 PM, Vladimir Kozlov wrote: > > Looks good. Testing results are good too. > > Thanks, > Vladimir > > On 12/10/19 11:11 PM, Igor Veresov wrote: >> Webrev: http://cr.openjdk.java.net/~iveresov/8235634/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8235634 >> The list of changes can be found in the JBS issue. >> Thanks, >> igor From ekaterina.pavlova at oracle.com Thu Dec 12 05:34:40 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 11 Dec 2019 21:34:40 -0800 Subject: RFR (T/XS) 8235808: Remove graalunit from tier1_compiler_not_xcomp In-Reply-To: <4A90A180-64AD-46CF-AD76-036F68CC6B2F@oracle.com> References: <1b201409-cc53-19f5-68b2-e06e2975d402@oracle.com> <4A90A180-64AD-46CF-AD76-036F68CC6B2F@oracle.com> Message-ID: <20baa229-df8a-334d-22a6-7e7e348eeb66@oracle.com> All testing passed, integrated the change. Thanks Igor for prompt review! -katya On 12/11/19 7:53 PM, Igor Ignatyev wrote: > thanks for fixing that. reviewed. 
> -- Igor > >> On Dec 11, 2019, at 7:41 PM, Ekaterina Pavlova wrote: >> >> Please review one more issue which could result in tier1/tier3 failures. >> Tier1 supposes to run only subset of graal unit tests as defined by >> tier1_compiler_graal group. So compiler/graalunit were wrongly included >> into tier1_compiler_not_xcomp group. >> >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8235808 >> webrev: http://cr.openjdk.java.net/~epavlova//8235808/webrev.00/index.html >> testing: tier1-3 (in progress) >> >> >> thanks, >> -katya > From felix.yang at huawei.com Thu Dec 12 06:24:25 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Thu, 12 Dec 2019 06:24:25 +0000 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: Hi, I have created a webrev for the patch: http://cr.openjdk.java.net/~fyang/8235762/webrev.00/ Tested tier1-3 with both aarch64 and x86_64 linux release build. Newly added test case fail without the patch and pass with the patch. Thanks, Felix > -----Original Message----- > From: Yangfei (Felix) > Sent: Wednesday, December 11, 2019 11:12 PM > To: 'Christian Hagedorn' ; Tobias Hartmann > ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation > > Hi Christian, > > Thanks for the suggestions. Comments inlined. > > > -----Original Message----- > > From: Christian Hagedorn [mailto:christian.hagedorn at oracle.com] > > Sent: Wednesday, December 11, 2019 10:54 PM > > To: Yangfei (Felix) ; Tobias Hartmann > > ; hotspot-compiler-dev at openjdk.java.net > > Subject: Re: RFR(S): 8235762: JVM crash in SWPointer during C2 > > compilation > > > > Hi Felix > > > > Thanks for working on that. Your fix also seems to work for JDK-8235700. > > I closed that one as a duplicate of yours. > > > > >>> + for (int i = 0; i < orig_msize; i++) { > > > > Should be uint since orig_msize is a uint > > -- Yes, will modify accordingly when I am preparing webrev. > > > > > >>> + best_align_to_mem_ref = find_align_to_ref(memops, > > >> max_idx); > > >>> + assert(best_align_to_mem_ref == NULL, "sanity"); > > > > You can merge these two lines together into > > assert(find_align_to_ref(memops, > > max_idx) == NULL, "sanity"); since the call belongs to the sanity > > check. Or just surround it by a #ifdef ASSERT. > > -- The purpose of line 721 here is to calculate the max_idx. > So I don't think it's suitable to treat this line as assertion logic. > > > >>> + idx = max_idx; > > > > Is max_idx always guaranteed to be valid and not -1 when accessing it later? > > -- Yes, I think so. > When memops is not empty and the memory ops in memops are not > comparable, find_align_to_ref will always sets its max_idx. > > Thanks, > Felix From vladimir.x.ivanov at oracle.com Thu Dec 12 10:22:06 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 12 Dec 2019 13:22:06 +0300 Subject: [14] RFR (L): 8235756: C2: Merge AD instructions for DivV, SqrtV, FmaV, AddReductionV, and MulReductionV nodes In-Reply-To: <6ab0d5cd-3720-a08d-2b97-a520ffc0ce34@oracle.com> References: <00dd8e05-97ef-7323-c221-2f25122f2f64@oracle.com> <6ab0d5cd-3720-a08d-2b97-a520ffc0ce34@oracle.com> Message-ID: Thanks for the reviews, Vladimir & John. > May be you need to separate Reduction changes into a separate patch so > we can have more time to review it. It is more complex. Sure, I filed JDK-8235824 [1] and removed all reduction-related changes from current patch. 
Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8235824 > On 12/11/19 3:53 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/all/ >> https://bugs.openjdk.java.net/browse/JDK-8235756 >> >> Merge AD instructions for the following vector nodes: >> ?? - DivVF/DivVD >> ?? - SqrtVF/SqrtVD >> ?? - FmaVF/FmaVD >> ?? - AddReductionV* >> ?? - MulReductionV* >> >> Individual patches: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235756/webrev.00/individual >> >> >> Testing: tier1-4, test run on different CPU flavors (KNL, SKL, etc) >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? >> >> Best regards, >> Vladimir Ivanov From vladimir.x.ivanov at oracle.com Thu Dec 12 10:40:26 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 12 Dec 2019 13:40:26 +0300 Subject: [15] RFR (L): 8235824: C2: Merge AD instructions for AddReductionV and MulReductionV nodes Message-ID: <62eb09d3-a15d-5c82-911f-0d9197db2773@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235824/webrev.00/all/ https://bugs.openjdk.java.net/browse/JDK-8235824 Merge AD instructions for the following vector nodes: - AddReductionV* - MulReductionV* Individual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235824/webrev.00/individual Testing: tier1-4, test run on different CPU flavors (KNL, CLX) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, jrose, ? Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Thu Dec 12 11:19:57 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 12 Dec 2019 14:19:57 +0300 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes Message-ID: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/ https://bugs.openjdk.java.net/browse/JDK-8235825 Merge AD instructions for the following vector nodes: - ReplicateB, ..., ReplicateD Individual patches: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual Testing: tier1-4, test run on different CPU flavors (KNL, CLX) Contributed-by: Jatin Bhateja Reviewed-by: vlivanov, sviswanathan, ? Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Thu Dec 12 14:41:07 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 12 Dec 2019 17:41:07 +0300 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: Hi Jason, > I once experimented [2] with a change to Hotspot to consider our > bridge-like forwarder methods as exempt from the MaxInlineLevel budget, as > is the case for lambda form frames. That change alone delivered most of the > performance benefit of doubling MaxInlneLevel with an unmodified JVM. Nice! The idea to exclude bridge-like methods from max inline level accounting looks very promising. Do you have any plans to continue working on it and contribute as a patch into the mainline at some point? On the heuristic itself, it looks like it can be safely generalized to any methods which just call another method irrespective of how many arguments they pass (and in what order). 
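Just to sketch what such a check could look like (pseudo-code with made-up names, not a worked-out patch): the inliner could skip the depth accounting for callees whose bytecode does nothing but forward to a single call, e.g.

    // hypothetical helper on the inlining path
    static bool is_trivial_forwarder(ciMethod* callee) {
      // tiny body: loads of this/arguments, exactly one invoke*, and a return,
      // with no other control flow or side effects (that part elided here)
      return callee->code_size() <= 10 /* heuristic */ &&
             bytecode_is_single_forwarding_call(callee);   // made-up name
    }

and then leave the inline level untouched when it returns true, similar to what is already done for lambda forms.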
Best regards, Vladimir Ivanov > [1] https://github.com/scala/bug/issues/11627#issuecomment-514490505 > [2] https://gist.github.com/retronym/d27090a68570485c3e329a52db0c7b25 From vladimir.kozlov at oracle.com Thu Dec 12 18:20:25 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Dec 2019 10:20:25 -0800 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> References: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> Message-ID: <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> Hi David, Tiered is disabled because we don't want to see compilations and outputs from C1 compiler which does not have EA. The test is specifically written for C2 only (not for C1 or Graal) to verify its Escape Analysis optimization. I did not look in great details into test's code but its analysis may be affected if C1 compiler is also used. Richard may clarify this. thanks, Vladimir On 12/11/19 1:04 PM, David Holmes wrote: > On 12/12/2019 5:21 am, Vladimir Kozlov wrote: >> I will do full review later. I want to comment about test command line. >> >> You don't need vm.opt.TieredCompilation != true in @requires because >> you specified -XX:-TieredCompilation in @run command. > > And per my comment this should be being tested with tiered as well. > > David > >> Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >> test from running in Interpreter mode too. >> >> Thanks, >> Vladimir >> >> On 12/11/19 7:07 AM, Reingruber, Richard wrote: >>> Hi David, >>> >>> ?? > Most of the details here are in areas I can comment on in >>> detail, but I >>> ?? > did take an initial general look at things. >>> >>> Thanks for taking the time! >>> >>> ?? > The only thing that jumped out at me is that I think the >>> ?? > DeoptimizeObjectsALotThread should be a hidden thread. >>> ?? > >>> ?? > +? bool is_hidden_from_external_view() const { return true; } >>> >>> Yes, it should. Will add the method like above. >>> >>> ?? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>> Without >>> ?? > active testing this will just bit-rot. >>> >>> DeoptimizeObjectsALot is meant for stress testing with a larger >>> workload. I will add a minimal test >>> to keep it fresh. >>> >>> ?? > Also on the tests I don't understand your @requires clause: >>> ?? > >>> ?? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>> ?? > (vm.opt.TieredCompilation != true)) >>> ?? > >>> ?? > This seems to require that TieredCompilation is disabled, but >>> tiered is >>> ?? > our normal mode of operation. ?? >>> ?? > >>> >>> I removed the clause. I guess I wanted to target the tests towards >>> the code they are supposed to >>> test, and it's easier to analyze failures w/o tiered compilation and >>> with just one compiler thread. >>> >>> Additionally I will make use of >>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>> >>> Thanks, >>> Richard. >>> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Mittwoch, 11. 
Dezember 2019 08:03 >>> To: Reingruber, Richard ; >>> serviceability-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net; >>> hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>> Performance in the Presence of JVMTI Agents >>> >>> Hi Richard, >>> >>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>> Hi, >>>> >>>> I would like to get reviews please for >>>> >>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>> >>>> Corresponding RFE: >>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>> >>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>> >>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>> issues (thanks!). In addition the >>>> change is being tested at SAP since I posted the first RFR some >>>> months ago. >>>> >>>> The intention of this enhancement is to benefit performance wise >>>> from escape analysis even if JVMTI >>>> agents request capabilities that allow them to access local variable >>>> values. E.g. if you start-up >>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>> escape analysis is disabled right >>>> from the beginning, well before a debugger attaches -- if ever one >>>> should do so. With the >>>> enhancement, escape analysis will remain enabled until and after a >>>> debugger attaches. EA based >>>> optimizations are reverted just before an agent acquires the >>>> reference to an object. In the JBS item >>>> you'll find more details. >>> >>> Most of the details here are in areas I can comment on in detail, but I >>> did take an initial general look at things. >>> >>> The only thing that jumped out at me is that I think the >>> DeoptimizeObjectsALotThread should be a hidden thread. >>> >>> +? bool is_hidden_from_external_view() const { return true; } >>> >>> Also I don't see any testing of the DeoptimizeObjectsALotThread. Without >>> active testing this will just bit-rot. >>> >>> Also on the tests I don't understand your @requires clause: >>> >>> ?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>> (vm.opt.TieredCompilation != true)) >>> >>> This seems to require that TieredCompilation is disabled, but tiered is >>> our normal mode of operation. ?? >>> >>> Thanks, >>> David >>> >>>> Thanks, >>>> Richard. >>>> >>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>> >>>> From aph at redhat.com Thu Dec 12 19:14:53 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 12 Dec 2019 19:14:53 +0000 Subject: RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset Message-ID: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> This assertion failure happens because load/store patterns in aarch64.ad are incorrect. The operands immIOffset and operand immLoffset() are only correct for load/store *byte* offsets. Offsets of sizes greater than a byte should be shifted by the operand size, and misaligned offsets are only allowed for a small range. We get this wrong, so we try to use misaligned byte addresses for sizes greater than byte. This fails at compile time. We've never noticed this before because Java code doesn't generate misaligned offsets, so we can only test this fix by using Unsafe (or a customer of Unsafe such as ByteBuffer.) 
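A sketch of the kind of reproducer that implies -- not the regression test from the webrev; the buffer size and offsets are illustrative only.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MisalignedOffset {
    // ByteBuffer.getLong(int) is an Unsafe customer that can load a 64-bit
    // value at a byte offset that is not a multiple of 8, something ordinary
    // Java field and array accesses never do.
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4096)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        long sum = 0;
        for (int iter = 0; iter < 200_000; iter++) { // warm up so C2 compiles the loop
            for (int off = 1; off < 4096 - 8; off += 9) { // deliberately misaligned offsets
                sum += buf.getLong(off);
            }
        }
        System.out.println(sum);
    }
}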
Wang Zhuo(Zhuoren) wrote a patch for this bug but it was incomplete; the problem reached deeper than we at first realized. Zhuoren's approach was to fix up the code generation after pattern matching, but this masks an efficiency problem. In many cases where we could use offset addresses, we don't because the pattern matcher incorrectly decides offsets are out of range. This patch fixes both problems by using memory types that are correct for all operand sizes. These are memory1, memory2, memory4, etc; this naming fits in with the existing types used by vectors. It does lead to a rather large patch, but it's not quite as bad as it looks because much of the code is auto-generated by the script ad_encode.m4. Unfortunately, in the time since it was written some developers have edited that auto-generated section of aarch64.ad by hand, so I had to move these hand edits out of the way. I have also added big scary // DO NOT EDIT ANYTHING IN THIS SECTION OF THE FILE comments so this doesn't happen again. In a rather belt-and-braces way I've also added some code that fixes up illegal addresses. I did consider removing it, but I left it in because it doesn't hurt. If we ever do generate similar illegal addresses, debug builds will assert. I'm not sure whether to keep this or not. Andrew Dinn will probably have a cow when he sees this patch. :-) OK for HEAD? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From john.r.rose at oracle.com Thu Dec 12 22:20:57 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 12 Dec 2019 14:20:57 -0800 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: On Dec 12, 2019, at 6:41 AM, Vladimir Ivanov wrote: > > On the heuristic itself, it looks like it can be safely generalized to any methods which just call another method irrespective of how many arguments they pass (and in what order). Yes, I like that heuristic; it boils down to a method body containing just one non-trivial method invocation, plus "other cheap stuff". I'd want to consider making sure that the "other cheap stuff" which is disregarded includes method invocations that we know are commonly used in adapters (including those of lambda forms), such as: - type adjusting calls: Class.cast, W.valueOf, W.XValue, etc. (incl. internals) - maybe queries such as Class.isInstance - resolved non-invocations like getfield, getstatic, ldc - other stuff that we know "bottoms out" quickly The point about "bottoming out" is that a tree with one main branch and no side branches, or shallow twiggy side branches only, does not expand more than linearly into IR after inlining. The point about focusing on type adaptation stuff is that such stuff tends to fold away when stacked on top of itself. There are only so many type adaptations you can perform repeatedly on the same value before you reach a fix point, and the JIT is good at computing such fix points. The result is that such single-branch trees may be expected to fold into sub-linear IR, compared to the size of the original branch, and that's the win we are after. I suppose the "cheap stuff" calls might be charged to a side counter, which if it gets huge (>100) stops the inline from going deeper. Let the prototyping begin. My $0.02. -- John P.S. I'm glad we are discussing this.
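A concrete sketch of the adapter shape described above -- one non-trivial invocation surrounded by cheap type-adjusting work; the class and names are invented for illustration.

public class Adapters {
    // "Cheap stuff" around a single real call: a Class.cast type adjustment,
    // an unbox, and an Integer.valueOf box. Adapters of this shape tend to
    // fold away once the JIT inlines through a stack of them.
    static Number adapt(Object arg) {
        Integer boxed = Integer.class.cast(arg);            // type adjusting call
        return target(Integer.valueOf(boxed.intValue() + 1));
    }

    static Integer target(Integer x) {                      // the one non-trivial call
        return x;
    }

    public static void main(String[] args) {
        long acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += adapt(i).intValue();
        }
        System.out.println(acc);
    }
}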
For the record, we have historically stepped with great caution into changing the inlining heuristics, because beneficial changes, even if small, can cause rare regressions. In an ecosystem as large as ours, even rare regressions can be a significant cost. But now, as we are getting used to the new discipline of updating on a regular 6-month cadence, I think we have a much healthier process in which to detect and fix regressions that may stem from changed heuristics. P.P.S. Do we have a greater risk of regression because there is something broken about our heuristics? Well, there is an art to building heuristics which are stable, when your system dynamics have (like ours) a nearly infinite set of non-linear feedback loops. Our heuristics, being simple, are reasonably stable. Any new heuristics, like the ones I proposed above, should be examined for stability as well as effectiveness. In the end, a perfect heuristic would have to closely emulate the future dynamics of the system being optimized, and that (as everyone knows) is almost certainly no cheaper than just running the system un-optimized. In general, there will always be workloads that defeat any particular heuristic, even if that heuristic produces good (or null) results 99% of the time. From smita.kamath at intel.com Thu Dec 12 22:43:50 2019 From: smita.kamath at intel.com (Kamath, Smita) Date: Thu, 12 Dec 2019 22:43:50 +0000 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> Message-ID: <6563F381B547594081EF9DE181D07912B2D7B0A9@fmsmsx123.amr.corp.intel.com> Hi Vladimir, Thanks for reviewing the code. I will make changes to the code as per your suggestion and submit another webrev for review. I will also post the performance gains expected with this change (with and without VBMI2). The vector instructions in this code will be executed only after a threshold has been reached. I have taken care that the vector code will be executed only on CPUs supporting VBMI2. Thanks, Smita -----Original Message----- From: Vladimir Kozlov Sent: Wednesday, December 11, 2019 10:55 AM To: Kamath, Smita ; 'hotspot compiler' ; Viswanathan, Sandhya Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 Hi Kamath, First, a general question. What performance do you see when VBMI2 instructions are *not* used with your new code vs code generated by C2? What improvement do you see when VBMI2 is used? This is to understand whether we need only the VBMI2 version of the intrinsic or not. Second. Sandhya recently pushed 8235510 changes to roll back avx512 code for CRC32 due to performance issues. Does your change have any issues on some Intel CPUs too? Should it be excluded on such CPUs? Third. I would suggest waiting until after we fork JDK 14 with these changes. I think it may be too late for 14 because we would need to test this, including performance testing. In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. For that to work, in vm_version_x86.cpp#l687 clear the CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp in combination with that. I don't think we need a separate flag UseVBMI2 - it could be controlled by the UseAVX flag. We don't have a flag for VNNI or other avx512 instruction subsets.
In vm_version_x86.cpp you need to add more %s in print statement for new output. You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. You need to add intrinsics to Graal's test to ignore them: http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 Thanks, Vladimir On 12/10/19 5:41 PM, Kamath, Smita wrote: > Hi, > > > As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. > > Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 > > Link to webrev : > http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ > > > > I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. > > > [1] > https://software.intel.com/sites/default/files/managed/39/c5/325462-sd > m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> Vol. > 2C 5-471) > > [2] > https://software.intel.com/en-us/articles/intel-software-development-e > mulator > > > Regards, > > Smita Kamath > From richard.reingruber at sap.com Thu Dec 12 23:02:26 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Thu, 12 Dec 2019 23:02:26 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> References: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> Message-ID: Hello Vladimir, thanks for having a look. > Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip > test from running in Interpreter mode too. Done. > You don't need vm.opt.TieredCompilation != true in @requires because you > specified -XX:-TieredCompilation in @run command. Ok. > The test is specifically written for C2 only (not for C1 or Graal) to > verify its Escape Analysis optimization. > I did not look in great details into test's code but its analysis may be > affected if C1 compiler is also used. > > Richard may clarify this. The test cases aim to get their testmethod 'dontinline_testMethod' compiled by C2. If they get C1 compiled before doesn't matter all that much. I've got a slight preference to disabled tiered compilation for simplicity. Thanks, Richard. -----Original Message----- From: Vladimir Kozlov Sent: Donnerstag, 12. Dezember 2019 19:20 To: David Holmes ; hotspot-runtime-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Reingruber, Richard Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi David, Tiered is disabled because we don't want to see compilations and outputs from C1 compiler which does not have EA. The test is specifically written for C2 only (not for C1 or Graal) to verify its Escape Analysis optimization. I did not look in great details into test's code but its analysis may be affected if C1 compiler is also used. Richard may clarify this. thanks, Vladimir On 12/11/19 1:04 PM, David Holmes wrote: > On 12/12/2019 5:21 am, Vladimir Kozlov wrote: >> I will do full review later. I want to comment about test command line. 
>> >> You don't need vm.opt.TieredCompilation != true in @requires because >> you specified -XX:-TieredCompilation in @run command. > > And per my comment this should be being tested with tiered as well. > > David > >> Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >> test from running in Interpreter mode too. >> >> Thanks, >> Vladimir >> >> On 12/11/19 7:07 AM, Reingruber, Richard wrote: >>> Hi David, >>> >>> ?? > Most of the details here are in areas I can comment on in >>> detail, but I >>> ?? > did take an initial general look at things. >>> >>> Thanks for taking the time! >>> >>> ?? > The only thing that jumped out at me is that I think the >>> ?? > DeoptimizeObjectsALotThread should be a hidden thread. >>> ?? > >>> ?? > +? bool is_hidden_from_external_view() const { return true; } >>> >>> Yes, it should. Will add the method like above. >>> >>> ?? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>> Without >>> ?? > active testing this will just bit-rot. >>> >>> DeoptimizeObjectsALot is meant for stress testing with a larger >>> workload. I will add a minimal test >>> to keep it fresh. >>> >>> ?? > Also on the tests I don't understand your @requires clause: >>> ?? > >>> ?? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>> ?? > (vm.opt.TieredCompilation != true)) >>> ?? > >>> ?? > This seems to require that TieredCompilation is disabled, but >>> tiered is >>> ?? > our normal mode of operation. ?? >>> ?? > >>> >>> I removed the clause. I guess I wanted to target the tests towards >>> the code they are supposed to >>> test, and it's easier to analyze failures w/o tiered compilation and >>> with just one compiler thread. >>> >>> Additionally I will make use of >>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>> >>> Thanks, >>> Richard. >>> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>> To: Reingruber, Richard ; >>> serviceability-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net; >>> hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>> Performance in the Presence of JVMTI Agents >>> >>> Hi Richard, >>> >>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>> Hi, >>>> >>>> I would like to get reviews please for >>>> >>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>> >>>> Corresponding RFE: >>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>> >>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>> >>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>> issues (thanks!). In addition the >>>> change is being tested at SAP since I posted the first RFR some >>>> months ago. >>>> >>>> The intention of this enhancement is to benefit performance wise >>>> from escape analysis even if JVMTI >>>> agents request capabilities that allow them to access local variable >>>> values. E.g. if you start-up >>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>> escape analysis is disabled right >>>> from the beginning, well before a debugger attaches -- if ever one >>>> should do so. With the >>>> enhancement, escape analysis will remain enabled until and after a >>>> debugger attaches. EA based >>>> optimizations are reverted just before an agent acquires the >>>> reference to an object. In the JBS item >>>> you'll find more details. 
>>> >>> Most of the details here are in areas I can comment on in detail, but I >>> did take an initial general look at things. >>> >>> The only thing that jumped out at me is that I think the >>> DeoptimizeObjectsALotThread should be a hidden thread. >>> >>> +? bool is_hidden_from_external_view() const { return true; } >>> >>> Also I don't see any testing of the DeoptimizeObjectsALotThread. Without >>> active testing this will just bit-rot. >>> >>> Also on the tests I don't understand your @requires clause: >>> >>> ?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>> (vm.opt.TieredCompilation != true)) >>> >>> This seems to require that TieredCompilation is disabled, but tiered is >>> our normal mode of operation. ?? >>> >>> Thanks, >>> David >>> >>>> Thanks, >>>> Richard. >>>> >>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>> >>>> From david.holmes at oracle.com Thu Dec 12 23:32:40 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Dec 2019 09:32:40 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> Message-ID: <57c09482-7f0a-e666-2bd5-f4b43ae8b32a@oracle.com> On 13/12/2019 9:02 am, Reingruber, Richard wrote: > Hello Vladimir, > > thanks for having a look. > > > Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip > > test from running in Interpreter mode too. > > Done. > > > You don't need vm.opt.TieredCompilation != true in @requires because you > > specified -XX:-TieredCompilation in @run command. > > Ok. > > > The test is specifically written for C2 only (not for C1 or Graal) to > > verify its Escape Analysis optimization. > > I did not look in great details into test's code but its analysis may be > > affected if C1 compiler is also used. > > > > Richard may clarify this. > > The test cases aim to get their testmethod 'dontinline_testMethod' compiled by C2. If they get C1 > compiled before doesn't matter all that much. I've got a slight preference to disabled tiered > compilation for simplicity. My concern - perhaps unfounded - is that this seems to be being tested only in a pure C2 environment when the actual changes will have to operate correctly in a tiered environment (and JVMCI). Thanks, David > Thanks, Richard. > > -----Original Message----- > From: Vladimir Kozlov > Sent: Donnerstag, 12. Dezember 2019 19:20 > To: David Holmes ; hotspot-runtime-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Reingruber, Richard > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi David, > > Tiered is disabled because we don't want to see compilations and outputs > from C1 compiler which does not have EA. > > The test is specifically written for C2 only (not for C1 or Graal) to > verify its Escape Analysis optimization. > I did not look in great details into test's code but its analysis may be > affected if C1 compiler is also used. > > Richard may clarify this. > > thanks, > Vladimir > > On 12/11/19 1:04 PM, David Holmes wrote: >> On 12/12/2019 5:21 am, Vladimir Kozlov wrote: >>> I will do full review later. I want to comment about test command line. 
>>> >>> You don't need vm.opt.TieredCompilation != true in @requires because >>> you specified -XX:-TieredCompilation in @run command. >> >> And per my comment this should be being tested with tiered as well. >> >> David >> >>> Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >>> test from running in Interpreter mode too. >>> >>> Thanks, >>> Vladimir >>> >>> On 12/11/19 7:07 AM, Reingruber, Richard wrote: >>>> Hi David, >>>> >>>> ?? > Most of the details here are in areas I can comment on in >>>> detail, but I >>>> ?? > did take an initial general look at things. >>>> >>>> Thanks for taking the time! >>>> >>>> ?? > The only thing that jumped out at me is that I think the >>>> ?? > DeoptimizeObjectsALotThread should be a hidden thread. >>>> ?? > >>>> ?? > +? bool is_hidden_from_external_view() const { return true; } >>>> >>>> Yes, it should. Will add the method like above. >>>> >>>> ?? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>> Without >>>> ?? > active testing this will just bit-rot. >>>> >>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>> workload. I will add a minimal test >>>> to keep it fresh. >>>> >>>> ?? > Also on the tests I don't understand your @requires clause: >>>> ?? > >>>> ?? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>> ?? > (vm.opt.TieredCompilation != true)) >>>> ?? > >>>> ?? > This seems to require that TieredCompilation is disabled, but >>>> tiered is >>>> ?? > our normal mode of operation. ?? >>>> ?? > >>>> >>>> I removed the clause. I guess I wanted to target the tests towards >>>> the code they are supposed to >>>> test, and it's easier to analyze failures w/o tiered compilation and >>>> with just one compiler thread. >>>> >>>> Additionally I will make use of >>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>> >>>> Thanks, >>>> Richard. >>>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>> To: Reingruber, Richard ; >>>> serviceability-dev at openjdk.java.net; >>>> hotspot-compiler-dev at openjdk.java.net; >>>> hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>> Performance in the Presence of JVMTI Agents >>>> >>>> Hi Richard, >>>> >>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>> Hi, >>>>> >>>>> I would like to get reviews please for >>>>> >>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>> >>>>> Corresponding RFE: >>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>> >>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>> >>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>> issues (thanks!). In addition the >>>>> change is being tested at SAP since I posted the first RFR some >>>>> months ago. >>>>> >>>>> The intention of this enhancement is to benefit performance wise >>>>> from escape analysis even if JVMTI >>>>> agents request capabilities that allow them to access local variable >>>>> values. E.g. if you start-up >>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>> escape analysis is disabled right >>>>> from the beginning, well before a debugger attaches -- if ever one >>>>> should do so. With the >>>>> enhancement, escape analysis will remain enabled until and after a >>>>> debugger attaches. 
EA based >>>>> optimizations are reverted just before an agent acquires the >>>>> reference to an object. In the JBS item >>>>> you'll find more details. >>>> >>>> Most of the details here are in areas I can comment on in detail, but I >>>> did take an initial general look at things. >>>> >>>> The only thing that jumped out at me is that I think the >>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>> >>>> +? bool is_hidden_from_external_view() const { return true; } >>>> >>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. Without >>>> active testing this will just bit-rot. >>>> >>>> Also on the tests I don't understand your @requires clause: >>>> >>>> ?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>> (vm.opt.TieredCompilation != true)) >>>> >>>> This seems to require that TieredCompilation is disabled, but tiered is >>>> our normal mode of operation. ?? >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>> >>>>> From vladimir.kozlov at oracle.com Thu Dec 12 23:37:25 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Dec 2019 15:37:25 -0800 Subject: [15] RFR (L): 8235824: C2: Merge AD instructions for AddReductionV and MulReductionV nodes In-Reply-To: <62eb09d3-a15d-5c82-911f-0d9197db2773@oracle.com> References: <62eb09d3-a15d-5c82-911f-0d9197db2773@oracle.com> Message-ID: <95e1ce7b-a8b6-84a0-2ed9-d5fe25020aff@oracle.com> Looks good. I wish we can do more folding of code but instructions are too different. What is done is enough for this project. Thanks, Vladimir On 12/12/19 2:40 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235824/webrev.00/all/ > https://bugs.openjdk.java.net/browse/JDK-8235824 > > Merge AD instructions for the following vector nodes: > ? - AddReductionV* > ? - MulReductionV* > > Individual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235824/webrev.00/individual > > Testing: tier1-4, test run on different CPU flavors (KNL, CLX) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, jrose, ? > > Best regards, > Vladimir Ivanov From david.holmes at oracle.com Thu Dec 12 23:55:35 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Dec 2019 09:55:35 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> Message-ID: Hi Richard, Some further queries/concerns: src/hotspot/share/runtime/objectMonitor.cpp Can you please explain the changes to ObjectMonitor::wait: ! _recursions = save // restore the old recursion count ! + jt->get_and_reset_relock_count_after_wait(); // increased by the deferred relock count what is the "deferred relock count"? I gather it relates to "The code was extended to be able to deoptimize objects of a frame that is not the top frame and to let another thread than the owning thread do it." which I don't like the sound of at all when it comes to ObjectMonitor state. So I'd like to understand in detail exactly what is going on here and why. This is a very intrusive change that seems to badly break encapsulation and impacts future changes to ObjectMonitor that are under investigation. 
--- src/hotspot/share/runtime/thread.cpp Can you please explain why JavaThread::wait_for_object_deoptimization has to be handcrafted in this way rather than using proper transitions. We got rid of "deopt suspend" some time ago and it is disturbing to see it being added back (effectively). This seems like it may be something that handshakes could be used for. Thanks, David ----- On 12/12/2019 7:02 am, David Holmes wrote: > On 12/12/2019 1:07 am, Reingruber, Richard wrote: >> Hi David, >> >> ?? > Most of the details here are in areas I can comment on in detail, >> but I >> ?? > did take an initial general look at things. >> >> Thanks for taking the time! > > Apologies the above should read: > > "Most of the details here are in areas I *can't* comment on in detail ..." > > David > >> ?? > The only thing that jumped out at me is that I think the >> ?? > DeoptimizeObjectsALotThread should be a hidden thread. >> ?? > >> ?? > +? bool is_hidden_from_external_view() const { return true; } >> >> Yes, it should. Will add the method like above. >> >> ?? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >> Without >> ?? > active testing this will just bit-rot. >> >> DeoptimizeObjectsALot is meant for stress testing with a larger >> workload. I will add a minimal test >> to keep it fresh. >> >> ?? > Also on the tests I don't understand your @requires clause: >> ?? > >> ?? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >> ?? > (vm.opt.TieredCompilation != true)) >> ?? > >> ?? > This seems to require that TieredCompilation is disabled, but >> tiered is >> ?? > our normal mode of operation. ?? >> ?? > >> >> I removed the clause. I guess I wanted to target the tests towards the >> code they are supposed to >> test, and it's easier to analyze failures w/o tiered compilation and >> with just one compiler thread. >> >> Additionally I will make use of >> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >> >> Thanks, >> Richard. >> >> -----Original Message----- >> From: David Holmes >> Sent: Mittwoch, 11. Dezember 2019 08:03 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net; >> hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >> Performance in the Presence of JVMTI Agents >> >> Hi Richard, >> >> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>> Hi, >>> >>> I would like to get reviews please for >>> >>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>> >>> Corresponding RFE: >>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>> >>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>> >>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>> issues (thanks!). In addition the >>> change is being tested at SAP since I posted the first RFR some >>> months ago. >>> >>> The intention of this enhancement is to benefit performance wise from >>> escape analysis even if JVMTI >>> agents request capabilities that allow them to access local variable >>> values. E.g. if you start-up >>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>> escape analysis is disabled right >>> from the beginning, well before a debugger attaches -- if ever one >>> should do so. With the >>> enhancement, escape analysis will remain enabled until and after a >>> debugger attaches. 
EA based >>> optimizations are reverted just before an agent acquires the >>> reference to an object. In the JBS item >>> you'll find more details. >> >> Most of the details here are in areas I can comment on in detail, but I >> did take an initial general look at things. >> >> The only thing that jumped out at me is that I think the >> DeoptimizeObjectsALotThread should be a hidden thread. >> >> +? bool is_hidden_from_external_view() const { return true; } >> >> Also I don't see any testing of the DeoptimizeObjectsALotThread. Without >> active testing this will just bit-rot. >> >> Also on the tests I don't understand your @requires clause: >> >> ?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >> (vm.opt.TieredCompilation != true)) >> >> This seems to require that TieredCompilation is disabled, but tiered is >> our normal mode of operation. ?? >> >> Thanks, >> David >> >>> Thanks, >>> Richard. >>> >>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>> >>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>> >>> From vladimir.kozlov at oracle.com Fri Dec 13 00:56:16 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Dec 2019 16:56:16 -0800 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <57c09482-7f0a-e666-2bd5-f4b43ae8b32a@oracle.com> References: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> <57c09482-7f0a-e666-2bd5-f4b43ae8b32a@oracle.com> Message-ID: Yes, David You are correct these changes touch all part of VM and may affect Graal (which also has EA) too. Changes should be tested in all our modes: tiered, C1 only, Graal, Interpreter. And I realized that I only ran tier3-graal testing so I submitted the rest of Graal's tiers now. I had assumed that our current testing (I ran all from tier1 to tier8) should exercise all paths in VM these changes touch. But I may be wrong and it is correct to ask author to add testing in all VM modes to make sure new code in VM's runtime and JVMTI is tested. I do like to keep what current test is doing with C2. May be add an other test for other modes or modify current one to enable to run it in other modes. Thanks, Vladimir On 12/12/19 3:32 PM, David Holmes wrote: > On 13/12/2019 9:02 am, Reingruber, Richard wrote: >> Hello Vladimir, >> >> thanks for having a look. >> >> ?? > Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >> ?? > test from running in Interpreter mode too. >> >> Done. >> >> ?? > You don't need vm.opt.TieredCompilation != true in @requires because you >> ?? > specified -XX:-TieredCompilation in @run command. >> >> Ok. >> >> ?? > The test is specifically written for C2 only (not for C1 or Graal) to >> ?? > verify its Escape Analysis optimization. >> ?? > I did not look in great details into test's code but its analysis may be >> ?? > affected if C1 compiler is also used. >> ?? > >> ?? > Richard may clarify this. >> >> The test cases aim to get their testmethod 'dontinline_testMethod' compiled by C2. If they get C1 >> compiled before doesn't matter all that much. I've got a slight preference to disabled tiered >> compilation for simplicity. > > My concern - perhaps unfounded - is that this seems to be being tested only in a pure C2 environment > when the actual changes will have to operate correctly in a tiered environment (and JVMCI). > > Thanks, > David > >> Thanks, Richard. 
>> >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Donnerstag, 12. Dezember 2019 19:20 >> To: David Holmes ; hotspot-runtime-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net; serviceability-dev at openjdk.java.net; Reingruber, Richard >> >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of >> JVMTI Agents >> >> Hi David, >> >> Tiered is disabled because we don't want to see compilations and outputs >> from C1 compiler which does not have EA. >> >> The test is specifically written for C2 only (not for C1 or Graal) to >> verify its Escape Analysis optimization. >> I did not look in great details into test's code but its analysis may be >> affected if C1 compiler is also used. >> >> Richard may clarify this. >> >> thanks, >> Vladimir >> >> On 12/11/19 1:04 PM, David Holmes wrote: >>> On 12/12/2019 5:21 am, Vladimir Kozlov wrote: >>>> I will do full review later. I want to comment about test command line. >>>> >>>> You don't need vm.opt.TieredCompilation != true in @requires because >>>> you specified -XX:-TieredCompilation in @run command. >>> >>> And per my comment this should be being tested with tiered as well. >>> >>> David >>> >>>> Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >>>> test from running in Interpreter mode too. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/11/19 7:07 AM, Reingruber, Richard wrote: >>>>> Hi David, >>>>> >>>>> ??? > Most of the details here are in areas I can comment on in >>>>> detail, but I >>>>> ??? > did take an initial general look at things. >>>>> >>>>> Thanks for taking the time! >>>>> >>>>> ??? > The only thing that jumped out at me is that I think the >>>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. >>>>> ??? > >>>>> ??? > +? bool is_hidden_from_external_view() const { return true; } >>>>> >>>>> Yes, it should. Will add the method like above. >>>>> >>>>> ??? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>>> Without >>>>> ??? > active testing this will just bit-rot. >>>>> >>>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>>> workload. I will add a minimal test >>>>> to keep it fresh. >>>>> >>>>> ??? > Also on the tests I don't understand your @requires clause: >>>>> ??? > >>>>> ??? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>> ??? > (vm.opt.TieredCompilation != true)) >>>>> ??? > >>>>> ??? > This seems to require that TieredCompilation is disabled, but >>>>> tiered is >>>>> ??? > our normal mode of operation. ?? >>>>> ??? > >>>>> >>>>> I removed the clause. I guess I wanted to target the tests towards >>>>> the code they are supposed to >>>>> test, and it's easier to analyze failures w/o tiered compilation and >>>>> with just one compiler thread. >>>>> >>>>> Additionally I will make use of >>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Mittwoch, 11. 
Dezember 2019 08:03 >>>>> To: Reingruber, Richard ; >>>>> serviceability-dev at openjdk.java.net; >>>>> hotspot-compiler-dev at openjdk.java.net; >>>>> hotspot-runtime-dev at openjdk.java.net >>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>>> Performance in the Presence of JVMTI Agents >>>>> >>>>> Hi Richard, >>>>> >>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>>> Hi, >>>>>> >>>>>> I would like to get reviews please for >>>>>> >>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>>> >>>>>> Corresponding RFE: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>>> >>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>>> >>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>>> issues (thanks!). In addition the >>>>>> change is being tested at SAP since I posted the first RFR some >>>>>> months ago. >>>>>> >>>>>> The intention of this enhancement is to benefit performance wise >>>>>> from escape analysis even if JVMTI >>>>>> agents request capabilities that allow them to access local variable >>>>>> values. E.g. if you start-up >>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>>> escape analysis is disabled right >>>>>> from the beginning, well before a debugger attaches -- if ever one >>>>>> should do so. With the >>>>>> enhancement, escape analysis will remain enabled until and after a >>>>>> debugger attaches. EA based >>>>>> optimizations are reverted just before an agent acquires the >>>>>> reference to an object. In the JBS item >>>>>> you'll find more details. >>>>> >>>>> Most of the details here are in areas I can comment on in detail, but I >>>>> did take an initial general look at things. >>>>> >>>>> The only thing that jumped out at me is that I think the >>>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>>> >>>>> +? bool is_hidden_from_external_view() const { return true; } >>>>> >>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. Without >>>>> active testing this will just bit-rot. >>>>> >>>>> Also on the tests I don't understand your @requires clause: >>>>> >>>>> ??? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>> (vm.opt.TieredCompilation != true)) >>>>> >>>>> This seems to require that TieredCompilation is disabled, but tiered is >>>>> our normal mode of operation. ?? >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>>> >>>>>> From vladimir.kozlov at oracle.com Fri Dec 13 01:16:29 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Dec 2019 17:16:29 -0800 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> Message-ID: Vladimir replicateB Can you fold it differently? 
ReplB_reg_leg
  predicate(!VM_Version::supports_avx512vlbw());
  ins_encode %{
    uint vlen = vector_length(this);
    __ movdl($dst$$XMMRegister, $src$$Register);
    __ punpcklbw($dst$$XMMRegister, $dst$$XMMRegister);
    __ pshuflw($dst$$XMMRegister, $dst$$XMMRegister, 0x00);
    if (vlen > 8) {
      __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister);
      if (vlen > 16) {
        __ vinserti128_high($dst$$XMMRegister, $dst$$XMMRegister);
        if (vlen > 32) {
          assert(vlen == 64, "sanity");
          __ vinserti64x4($dst$$XMMRegister, $dst$$XMMRegister, $dst$$XMMRegister, 0x1);

Similarly for ReplB_imm_leg, for which I don't see a new implementation. It should also simplify the avx512 code, which needs only one or two instructions. The other types' changes can be done the same way. Thanks, Vladimir On 12/12/19 3:19 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/ > https://bugs.openjdk.java.net/browse/JDK-8235825 > > Merge AD instructions for the following vector nodes: > - ReplicateB, ..., ReplicateD > > Individual patches: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual > > Testing: tier1-4, test run on different CPU flavors (KNL, CLX) > > Contributed-by: Jatin Bhateja > Reviewed-by: vlivanov, sviswanathan, ? > > Best regards, > Vladimir Ivanov From john.r.rose at oracle.com Fri Dec 13 01:46:03 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 12 Dec 2019 17:46:03 -0800 Subject: [15] RFR (L): 8235824: C2: Merge AD instructions for AddReductionV and MulReductionV nodes In-Reply-To: <95e1ce7b-a8b6-84a0-2ed9-d5fe25020aff@oracle.com> References: <62eb09d3-a15d-5c82-911f-0d9197db2773@oracle.com> <95e1ce7b-a8b6-84a0-2ed9-d5fe25020aff@oracle.com> Message-ID: On Dec 12, 2019, at 3:37 PM, Vladimir Kozlov wrote: > > Looks good. I wish we can do more folding of code but instructions are too different. What is done is enough for this project. +1 Reviewed. As I mentioned in the 8235756 thread, a good way to factor the implementation of (associative) reductions would be to reformulate them as the repeated composition of 2N-to-N-lane reductions. For non-associative reductions (floating point), the 2N-to-N pattern is acceptable, *if* the reduction is specified to happen in that order. To get that permission into the contract will require a distinction between reduceSequential and reduceParallel operations in the Vector API. That sequence of 16 vaddss operations is certainly an eyesore, but it's not clear how to improve on it, algorithmically. Perhaps it could be factored into a sequential accumulation operation, to be repeated N times instead of lg N times. -- John From david.holmes at oracle.com Fri Dec 13 01:52:48 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Dec 2019 11:52:48 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> <57c09482-7f0a-e666-2bd5-f4b43ae8b32a@oracle.com> Message-ID: <01509185-7b0b-a269-deb1-799444cf082f@oracle.com> On 13/12/2019 10:56 am, Vladimir Kozlov wrote: > Yes, David > > You are correct these changes touch all part of VM and may affect Graal > (which also has EA) too. > Changes should be tested in all our modes: tiered, C1 only, Graal, > Interpreter. And I realized that I only ran tier3-graal testing so I > submitted the rest of Graal's tiers now.
> > I had assumed that our current testing (I ran all from tier1 to tier8) > should exercise all paths in VM these changes touch. But I may be wrong > and it is correct to ask author to add testing in all VM modes to make > sure new code in VM's runtime and JVMTI is tested. It may be that our existing JVM TI tests will exercise this adequately and that the new tests are more "whitebox" testing than general functional tests. But it is not obvious to me that we do have the coverage we need. Cheers, David > I do like to keep what current test is doing with C2. May be add an > other test for other modes or modify current one to enable to run it in > other modes. > > Thanks, > Vladimir > > On 12/12/19 3:32 PM, David Holmes wrote: >> On 13/12/2019 9:02 am, Reingruber, Richard wrote: >>> Hello Vladimir, >>> >>> thanks for having a look. >>> >>> ?? > Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to >>> skip >>> ?? > test from running in Interpreter mode too. >>> >>> Done. >>> >>> ?? > You don't need vm.opt.TieredCompilation != true in @requires >>> because you >>> ?? > specified -XX:-TieredCompilation in @run command. >>> >>> Ok. >>> >>> ?? > The test is specifically written for C2 only (not for C1 or >>> Graal) to >>> ?? > verify its Escape Analysis optimization. >>> ?? > I did not look in great details into test's code but its >>> analysis may be >>> ?? > affected if C1 compiler is also used. >>> ?? > >>> ?? > Richard may clarify this. >>> >>> The test cases aim to get their testmethod 'dontinline_testMethod' >>> compiled by C2. If they get C1 >>> compiled before doesn't matter all that much. I've got a slight >>> preference to disabled tiered >>> compilation for simplicity. >> >> My concern - perhaps unfounded - is that this seems to be being tested >> only in a pure C2 environment when the actual changes will have to >> operate correctly in a tiered environment (and JVMCI). >> >> Thanks, >> David >> >>> Thanks, Richard. >>> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Donnerstag, 12. Dezember 2019 19:20 >>> To: David Holmes ; >>> hotspot-runtime-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net; >>> serviceability-dev at openjdk.java.net; Reingruber, Richard >>> >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>> Performance in the Presence of JVMTI Agents >>> >>> Hi David, >>> >>> Tiered is disabled because we don't want to see compilations and outputs >>> from C1 compiler which does not have EA. >>> >>> The test is specifically written for C2 only (not for C1 or Graal) to >>> verify its Escape Analysis optimization. >>> I did not look in great details into test's code but its analysis may be >>> affected if C1 compiler is also used. >>> >>> Richard may clarify this. >>> >>> thanks, >>> Vladimir >>> >>> On 12/11/19 1:04 PM, David Holmes wrote: >>>> On 12/12/2019 5:21 am, Vladimir Kozlov wrote: >>>>> I will do full review later. I want to comment about test command >>>>> line. >>>>> >>>>> You don't need vm.opt.TieredCompilation != true in @requires because >>>>> you specified -XX:-TieredCompilation in @run command. >>>> >>>> And per my comment this should be being tested with tiered as well. >>>> >>>> David >>>> >>>>> Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >>>>> test from running in Interpreter mode too. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/11/19 7:07 AM, Reingruber, Richard wrote: >>>>>> Hi David, >>>>>> >>>>>> ??? 
> Most of the details here are in areas I can comment on in >>>>>> detail, but I >>>>>> ??? > did take an initial general look at things. >>>>>> >>>>>> Thanks for taking the time! >>>>>> >>>>>> ??? > The only thing that jumped out at me is that I think the >>>>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. >>>>>> ??? > >>>>>> ??? > +? bool is_hidden_from_external_view() const { return true; } >>>>>> >>>>>> Yes, it should. Will add the method like above. >>>>>> >>>>>> ??? > Also I don't see any testing of the >>>>>> DeoptimizeObjectsALotThread. >>>>>> Without >>>>>> ??? > active testing this will just bit-rot. >>>>>> >>>>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>>>> workload. I will add a minimal test >>>>>> to keep it fresh. >>>>>> >>>>>> ??? > Also on the tests I don't understand your @requires clause: >>>>>> ??? > >>>>>> ??? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>>> ??? > (vm.opt.TieredCompilation != true)) >>>>>> ??? > >>>>>> ??? > This seems to require that TieredCompilation is disabled, but >>>>>> tiered is >>>>>> ??? > our normal mode of operation. ?? >>>>>> ??? > >>>>>> >>>>>> I removed the clause. I guess I wanted to target the tests towards >>>>>> the code they are supposed to >>>>>> test, and it's easier to analyze failures w/o tiered compilation and >>>>>> with just one compiler thread. >>>>>> >>>>>> Additionally I will make use of >>>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes >>>>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>>>> To: Reingruber, Richard ; >>>>>> serviceability-dev at openjdk.java.net; >>>>>> hotspot-compiler-dev at openjdk.java.net; >>>>>> hotspot-runtime-dev at openjdk.java.net >>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>>>> Performance in the Presence of JVMTI Agents >>>>>> >>>>>> Hi Richard, >>>>>> >>>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I would like to get reviews please for >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>>>> >>>>>>> Corresponding RFE: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>>>> >>>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>>>> >>>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>>>> issues (thanks!). In addition the >>>>>>> change is being tested at SAP since I posted the first RFR some >>>>>>> months ago. >>>>>>> >>>>>>> The intention of this enhancement is to benefit performance wise >>>>>>> from escape analysis even if JVMTI >>>>>>> agents request capabilities that allow them to access local variable >>>>>>> values. E.g. if you start-up >>>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>>>> escape analysis is disabled right >>>>>>> from the beginning, well before a debugger attaches -- if ever one >>>>>>> should do so. With the >>>>>>> enhancement, escape analysis will remain enabled until and after a >>>>>>> debugger attaches. EA based >>>>>>> optimizations are reverted just before an agent acquires the >>>>>>> reference to an object. In the JBS item >>>>>>> you'll find more details. >>>>>> >>>>>> Most of the details here are in areas I can comment on in detail, >>>>>> but I >>>>>> did take an initial general look at things. 
>>>>>> >>>>>> The only thing that jumped out at me is that I think the >>>>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>>>> >>>>>> +? bool is_hidden_from_external_view() const { return true; } >>>>>> >>>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>>>> Without >>>>>> active testing this will just bit-rot. >>>>>> >>>>>> Also on the tests I don't understand your @requires clause: >>>>>> >>>>>> ??? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>>> (vm.opt.TieredCompilation != true)) >>>>>> >>>>>> This seems to require that TieredCompilation is disabled, but >>>>>> tiered is >>>>>> our normal mode of operation. ?? >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> Thanks, >>>>>>> Richard. >>>>>>> >>>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>>>> >>>>>>> >>>>>>> From jzaugg at gmail.com Fri Dec 13 02:47:25 2019 From: jzaugg at gmail.com (Jason Zaugg) Date: Fri, 13 Dec 2019 12:47:25 +1000 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: On Fri, 13 Dec 2019 at 00:43, Vladimir Ivanov wrote: > > On the heuristic itself, it looks like it can be safely generalized to > any methods which just call another method irrespective of how many > arguments they pass (and in what order). > Agreed. I was initially aiming for minimality to make it dead-simple to reason that the inlinee would not grow and to make the analysis cheap enough to perform eagerly during bytecode parsing. > Nice! The idea to exclude bridge-like methods from max inline level > accounting looks very promising. Do you have any plans to continue > working on it and contribute as a patch into the mainline at some point? > I could spend some time to clean up the patch. However, I would need signficant guidance to extend it to John's suggestion to admit type adaptations / constant loads / known cheap methods etc. Would this more thorough analysis it suit a C2-only analysis performed on an IR instead? Is the set of cheap methods defined with an annotation, a list of owner/name/descriptors or some other means? What is the testing strategy? It would likely be more efficient for someone with the experience in the code base implement themselves rather than shepherd me through it :) -jason From john.r.rose at oracle.com Fri Dec 13 05:07:00 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 12 Dec 2019 21:07:00 -0800 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: Message-ID: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com> On Dec 12, 2019, at 6:47 PM, Jason Zaugg wrote: > > I could spend some time to clean up the patch. However, I would need > signficant guidance > to extend it to John's suggestion to admit type adaptations / constant > loads / known > cheap methods etc. Would this more thorough analysis it suit a C2-only > analysis > performed on an IR instead? C2 doesn?t convert a method into IR until it has made the inlining decision about it. It would be a much deeper change to switch that around. However, ciMethod::get_flow_analysis activates a general purpose bytecode-based flow analysis which could easily be adjusted to add the necessary metrics. Since it is already run from the inline logic, this wouldn?t be a new pass. (There?s also ciMethod::get_bcea which performs a much more specialized escape analysis on methods, after they fail to inline. 
I don?t think that?s the place for the new metrics.) > Is the set of cheap methods defined with an annotation, > a list of owner/name/descriptors or some other means? Something along the lines of ciMethod::is_boxing_method will work. In other words, a hardwired white list. Annotations will probably play a role. For example, most of the JIT?s ?magically known methods? sport @HotSpotIntrinsicCandidate, and are registered in vmIntrinsics.hpp. The annotations @ForceInline and @DontInline are another example. We could have a @TinyInline annotation, meaning ?don?t expect this thing to expand much if you inline it?, or we could try to detect such things using further metrics, again in TypeFlow. > What is the testing strategy? The usual: Mainly functional and performance regressions. Some manual and white box testing to make sure the expected sorts of inline chains don?t suddenly stop inlining. > It would likely be more efficient for someone with the experience in the > code base implement themselves rather than shepherd me through it :) I?m intrigued that you are interested in this, and I encourage you to consider pulling on this string some more. I?ll help you pull if you want. I think this is doable, to a degree, as a starter project, to install new parameters and do initial exploration of their settings. Actually dialing in the settings and testing them across a range of workloads is a very specialized job, which few of us are good at, but if the optimization seems to pan out we can find the necessary kind of expert. Your first step, should you choose to accept this mission, would be to join the community. Your name should be on http://openjdk.java.net/census. See http://openjdk.java.net/contribute/. ? John From nick.gasson at arm.com Fri Dec 13 06:10:24 2019 From: nick.gasson at arm.com (Nick Gasson) Date: Fri, 13 Dec 2019 14:10:24 +0800 Subject: [aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset In-Reply-To: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> References: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> Message-ID: <642bcb33-f02f-892b-d258-5ac537fdef4b@arm.com> On 13/12/2019 03:14, Andrew Haley wrote: > > This patch fixes both problems by using memory types that are correct > for all operand sizes. These are memory1, memory2, memory4, etc; this > naming fits in with the existing types used by vectors. > The RFR mail is missing a link to the webrev - is it this one? http://cr.openjdk.java.net/~aph/8235385/ Thanks, Nick From vladimir.x.ivanov at oracle.com Fri Dec 13 08:38:25 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 13 Dec 2019 11:38:25 +0300 Subject: [15] RFR (L): 8235824: C2: Merge AD instructions for AddReductionV and MulReductionV nodes In-Reply-To: References: <62eb09d3-a15d-5c82-911f-0d9197db2773@oracle.com> <95e1ce7b-a8b6-84a0-2ed9-d5fe25020aff@oracle.com> Message-ID: <6b031827-17f9-6a48-733a-4a78a7b66a51@oracle.com> Thanks for the reviews, Vladimir & John. > As I mentioned in the 8235756 thread, a good way to factor the > implementation of (associative) reductions would be to reformulate > them as the repeated composition of 2N-to-N-lane reductions. > > For non-associative reductions (floating point), the 2N-to-N > pattern is acceptable, *if* the reduction is specified to happen in > that order. To get that permission into the contract will require > a distinction between reduceSequential and reduceParallel > operations in the Vector API. 
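(For illustration, here is a scalar sketch of the two orders being contrasted; it is not Vector API or C2 code, just hypothetical lane values in a plain array. The 2N-to-N form halves the number of lanes at each step, the sequential form accumulates lane by lane, and for floating point the two generally round differently, hence the need to say in the contract which order is permitted.)

  // Sketch only: contrasts the two reduction orders on an array of "lanes".
  // Assumes n is a power of two; neither function exists in the JDK sources.
  #include <cstddef>

  float reduce_parallel(float* lanes, size_t n) {    // 2N -> N at each step
    for (size_t half = n / 2; half >= 1; half /= 2) {
      for (size_t i = 0; i < half; i++) {
        lanes[i] += lanes[i + half];                 // pairwise partial sums
      }
    }
    return lanes[0];    // ((a0+a4)+(a2+a6)) + ((a1+a5)+(a3+a7)) style association
  }

  float reduce_sequential(const float* lanes, size_t n) {
    float acc = lanes[0];
    for (size_t i = 1; i < n; i++) {
      acc += lanes[i];                               // (((a0+a1)+a2)+a3)... order
    }
    return acc;
  }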
> > That sequence of 16 vaddss operations is certainly an eyesore, > but it?s not clear how to improve on it, algorithmically. Perhaps > it could be factored into a sequential accumulation operation, > to be repeated N times instead of lg N times. Yes, I agree that reduction nodes look too high-level for matching purposes: having a node per reduction step is much more suitable (at least, on x86). I think if the IR is shaped that way (nested reduction steps which reduce a vector to a scalar), there's a way to introduce a single shared IR node which represents a reduction step (2N => N) across all vector shapes. Best regards, Vladimir Ivanov From aph at redhat.com Fri Dec 13 09:31:23 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 13 Dec 2019 09:31:23 +0000 Subject: Fwd: [aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset In-Reply-To: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> References: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> Message-ID: http://cr.openjdk.java.net/~aph/8235385/ -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 -------------- next part -------------- An embedded message was scrubbed... From: Andrew Haley Subject: [aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset Date: Thu, 12 Dec 2019 19:14:53 +0000 Size: 12171 URL: From aph at redhat.com Fri Dec 13 09:31:45 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 13 Dec 2019 09:31:45 +0000 Subject: [aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset In-Reply-To: <642bcb33-f02f-892b-d258-5ac537fdef4b@arm.com> References: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> <642bcb33-f02f-892b-d258-5ac537fdef4b@arm.com> Message-ID: <1ac65be6-2aa3-6697-46b5-5eec5f52a48e@redhat.com> On 12/13/19 6:10 AM, Nick Gasson wrote: > On 13/12/2019 03:14, Andrew Haley wrote: >> >> This patch fixes both problems by using memory types that are correct >> for all operand sizes. These are memory1, memory2, memory4, etc; this >> naming fits in with the existing types used by vectors. >> > > The RFR mail is missing a link to the webrev - is it this one? > > http://cr.openjdk.java.net/~aph/8235385/ Yes, sorry. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.x.ivanov at oracle.com Fri Dec 13 10:27:32 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 13 Dec 2019 13:27:32 +0300 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> Message-ID: Thanks for the feedback, Vladimir. > replicateB > > Can you fold it differently? > > ReplB_reg_leg Are you talking about ReplB_reg? > ? predicate(!VM_Version::supports_avx512vlbw()); 3151 instruct ReplB_reg_leg(legVec dst, rRegI src) %{ 3152 predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); For ReplB_reg_leg the predicate can't be simplified: it is applicable only to 512bit vector when AVX512BW is absent. Otherwise, legVec constraint will be unnecessarily applied to other configurations. 
3119 instruct ReplB_reg(vec dst, rRegI src) %{ 3120 predicate((n->as_Vector()->length() <= 32) || 3121 (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); For ReplB_reg there's a shorter version: predicate(n->as_Vector()->length() <= 32 || VM_Version::supports_avx512bw()); But do you find it easier to read? For example, when you are checking that all configurations are covered: predicate(n->as_Vector()->length() <= 32 || VM_Version::supports_avx512bw()); instruct ReplB_reg_leg(legVec dst, rRegI src) %{ predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); vs instruct ReplB_reg_leg(legVec dst, rRegI src) %{ predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); instruct ReplB_reg(vec dst, rRegI src) %{ predicate((n->as_Vector()->length() <= 32) || (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); > ? ins_encode %{ > ??? uint vlen = vector_length(this); > ??? __ movdl($dst$$XMMRegister, $src$$Register); > ??? __ punpcklbw($dst$$XMMRegister, $dst$$XMMRegister); > ??? __ pshuflw($dst$$XMMRegister, $dst$$XMMRegister, 0x00); > ??? if (vlen > 8) { > ????? __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); > ????? if (vlen > 16) { > ??????? __ vinserti128_high($dst$$XMMRegister, $dst$$XMMRegister); > ??????? if (vlen > 32) { > ????????? assert(vlen == 64, "sanity"); > ????????? __ vinserti64x4($dst$$XMMRegister, $dst$$XMMRegister, > $dst$$XMMRegister, 0x1); Yes, it should work as well. Do you find it easier to read though? > Similar ReplB_imm_leg for which I don't see new implementation. Good catch. Added it back. (FTR completeness for reg2reg variants (_reg*) is mandatory. But for _mem and _imm it is optional: if they some configuration isn't covered, _reg is used. But it was in original code, so I added it back.) Updated version: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00 Another thing I noticed is that for ReplI/.../ReplD cases avx512vl checks are not necessary: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.01/broadcast_vl/ The code assumes that evpbroadcastd/evpbroadcastq, vpbroadcastdvpbroadcastq, vpbroadcastss/vpbroadcastsd need AVX512VL for 512bit case, but Intel manual says AVX512F is enough. I plan to handle it as a separate change, but let me know if you want to incorporate it into 8235825. > It should also simplify code for avx512 which one or 2 instructions. Can you elaborate, please? Are you talking about the case when version for different vector sizes differ in 1-2 instructions (like ReplB_reg)? > Other types changes can be done same way. Best regards, Vladimir Ivanov > On 12/12/19 3:19 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/ >> https://bugs.openjdk.java.net/browse/JDK-8235825 >> >> Merge AD instructions for the following vector nodes: >> ?? - ReplicateB, ..., ReplicateD >> >> Individual patches: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual >> >> >> Testing: tier1-4, test run on different CPU flavors (KNL, CLX) >> >> Contributed-by: Jatin Bhateja >> Reviewed-by: vlivanov, sviswanathan, ? 
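(As a quick aside, one way to convince yourself that the two predicates partition all configurations is a throwaway enumeration like the one below; it is only a sanity-check sketch over the same conditions, nothing that would go into the ad file.)

  #include <cassert>

  // Mirrors of the two predicates above, over (vector length in bytes, AVX512BW).
  static bool repl_b_reg(int vlen, bool avx512bw) {
    return vlen <= 32 || (vlen == 64 && avx512bw);
  }
  static bool repl_b_reg_leg(int vlen, bool avx512bw) {
    return vlen == 64 && !avx512bw;
  }

  int main() {
    const int  lengths[]   = { 4, 8, 16, 32, 64 };
    const bool bw_values[] = { false, true };
    for (int vlen : lengths) {
      for (bool bw : bw_values) {
        // Exactly one of the two register variants must match each configuration.
        assert(repl_b_reg(vlen, bw) != repl_b_reg_leg(vlen, bw));
      }
    }
    return 0;
  }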
>> >> Best regards, >> Vladimir Ivanov From vladimir.x.ivanov at oracle.com Fri Dec 13 10:44:24 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 13 Dec 2019 13:44:24 +0300 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com> References: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com> Message-ID: >> I could spend some time to clean up the patch. However, I would need >> signficant guidance >> to extend it to John's suggestion to admit type adaptations / constant >> loads / known >> cheap methods etc. Would this more thorough analysis it suit a C2-only >> analysis >> performed on an IR instead? > > C2 doesn?t convert a method into IR until it has made the inlining > decision about it. ?It would be a much deeper change to switch that around. > > However,?ciMethod::get_flow_analysis activates a general purpose > bytecode-based flow analysis which could easily be adjusted to add > the necessary metrics. ?Since it is already run from the inline logic, > this wouldn?t be a new pass. > > (There?s also ciMethod::get_bcea which performs a much more specialized > escape analysis on methods, after they fail to inline. ?I don?t think that?s > the place for the new metrics.) Also, I'd like to add that it's fine to fix it in incrementally: start with something simple and reliable, and then explore more complex extensions on top of it. I don't think it's necessary to bytecode analysis too far. I agree with you that some point it becomes simpler to just parse the method and observe the effects than trying to derive them directly from bytecode. (I fully agree that it requires significant effort to enable IR-based analysis in C2. But also there are ways to workaround that and cache the analysis results across compilations: gather data during stand-alone compilation and then reuse it when doing inlining.) >> Is the set of cheap methods defined with an annotation, >> a list of owner/name/descriptors or some other means? > > Something along the lines of ciMethod::is_boxing_method > will work. ?In other words, a hardwired white list. > Annotations will probably play a role. ?For example, most of the JIT?s > ?magically known methods? sport @HotSpotIntrinsicCandidate, > and are registered in vmIntrinsics.hpp. ?The annotations @ForceInline > and @DontInline are another example. ?We could have a @TinyInline > annotation, meaning ?don?t expect this thing to expand much if you > inline it?, or we could try to detect such things using further metrics, > again in TypeFlow. FTR part of @LambdaForm.Compiled semantics is equivalent to @TinyInline. Best regards, Vladimir Ivanov > >> What is the testing strategy? > > The usual: ?Mainly functional and performance regressions. > Some manual and white box testing to make sure the expected sorts of > inline chains don?t suddenly stop inlining. > >> It would likely be more efficient for someone with the experience in the >> code base implement themselves rather than shepherd me through it :) > > I?m intrigued that you are interested in this, and I encourage you to > consider > pulling on this string some more. ?I?ll help you pull if you want. > > I think this is doable, to a degree, as a starter project, to install > new parameters > and do initial exploration of their settings. 
?Actually dialing in the > settings > and testing them across a range of workloads is a very specialized job, > which > few of us are good at, but if the optimization seems to pan out we can find > the necessary kind of expert. > > Your first step, should you choose to accept this mission, would be to join > the community. ?Your name should be on http://openjdk.java.net/census. > See http://openjdk.java.net/contribute/. > > ? John From christian.hagedorn at oracle.com Fri Dec 13 12:46:26 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 13 Dec 2019 13:46:26 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <87h829a4h5.fsf@redhat.com> References: <87h829a4h5.fsf@redhat.com> Message-ID: <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> Hi Roland As we have discussed offline and also in discussion with Erik ?sterlund, I propose the following alternative fix based on your suggestion: http://cr.openjdk.java.net/~chagedorn/8231501/webrev.01/ The idea is to first snapshot the data and extra parameter data only and translating those. In a second step, the remaining extra data is prepared as before (possibly giving up the extra data lock). And only afterwards, when the cache is populated and no safepoints can happen anymore, the remaining extra data (trap entries and arg info data) is snapshotted and translated while holding the extra data lock. This ensures that no extra data is cleaned between snapshotting and translation since the lock is not released anymore in the meantime. This fixes this observed concurrency bug and is probably a cleaner solution than the original proposed fix. Best regards, Christian On 09.12.19 16:44, Roland Westrelin wrote: > > Hi Christian, > >> Before loading and copying the extra data from the MDO to the ciMDO in >> ciMethodData::load_extra_data(), the metadata is prepared in a >> fixed-point iteration by cleaning all SpeculativeTrapData entries of >> methods whose klasses are unloaded [3]. If it encounters such a dead >> entry it releases the extra data lock (due to ranking issues) and tries >> again later [4]. This release of the lock triggers the bug: There can be >> cases where one thread A is waiting in the whitebox API method to get >> the extra data lock [2] to clean the extra data for the very same MDO >> for which another thread B just released the lock at [4]. If that MDO >> actually contained SpeculativeTrapData entries, then thread A cleaned >> those but the ciMDO, which thread B is preparing, still contains the >> uncleaned old MDO extra data (because thread B only made a snapshot of >> the MDO earlier at [5]). > > Would it be possible to call prepare_data() before the snapshot is taken > so the snapshot doesn't contain any entry that are then removed? > > Roland. > From rwestrel at redhat.com Fri Dec 13 12:58:26 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 13 Dec 2019 13:58:26 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> References: <87h829a4h5.fsf@redhat.com> <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> Message-ID: <87r2189ybx.fsf@redhat.com> Hi Christian, Thanks for experimenting with a different solution. 
> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.01/ 173 // New traps in the MDO may have been added since we copied the 174 // data (concurrent deoptimizations before we acquired 175 // extra_data_lock above) or can be removed (a safepoint may occur 176 // in the prepare_metadata call above) as we translate the copy: 177 // update the copy as we go. Can the above still happen? I see you dropped the memcpy so I suppose no? Roland. From christian.hagedorn at oracle.com Fri Dec 13 13:14:28 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 13 Dec 2019 14:14:28 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <87r2189ybx.fsf@redhat.com> References: <87h829a4h5.fsf@redhat.com> <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> <87r2189ybx.fsf@redhat.com> Message-ID: Hi Roland > Thanks for experimenting with a different solution. No problem, I think this approach is better. >> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.01/ > > 173 // New traps in the MDO may have been added since we copied the > 174 // data (concurrent deoptimizations before we acquired > 175 // extra_data_lock above) or can be removed (a safepoint may occur > 176 // in the prepare_metadata call above) as we translate the copy: > 177 // update the copy as we go. > > Can the above still happen? > > I see you dropped the memcpy so I suppose no? No, this cannot happen anymore since the lock is not released anymore after the snapshot. I updated the webrev to also remove this comment and the code belonging to the already removed memcpy: http://cr.openjdk.java.net/~chagedorn/8231501/webrev.02/ Best regards, Christian From rwestrel at redhat.com Fri Dec 13 13:21:41 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 13 Dec 2019 14:21:41 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: References: <87h829a4h5.fsf@redhat.com> <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> <87r2189ybx.fsf@redhat.com> Message-ID: <87o8wc9x96.fsf@redhat.com> > http://cr.openjdk.java.net/~chagedorn/8231501/webrev.02/ That looks good to me. Roland. From christian.hagedorn at oracle.com Fri Dec 13 13:23:53 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 13 Dec 2019 14:23:53 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <87o8wc9x96.fsf@redhat.com> References: <87h829a4h5.fsf@redhat.com> <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> <87r2189ybx.fsf@redhat.com> <87o8wc9x96.fsf@redhat.com> Message-ID: <69bc6af9-dffd-117e-c8b4-e87c9d5e2b75@oracle.com> Thank you Roland for your review! Best regards, Christian On 13.12.19 14:21, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.02/ > > That looks good to me. > > Roland. 
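(For anyone following along, the ordering that webrev.01/.02 establishes can be summarized roughly as below; the helper names are invented for illustration and are not the actual ciMethodData/MethodData functions.)

  // Rough outline only, with invented helper names.
  void ci_load_method_data(MethodData* mdo) {
    // 1. Snapshot and translate the normal profile data and the extra
    //    parameter data; no trap entries are copied yet.
    copy_and_translate_data_and_parameters(mdo);

    // 2. Clean dead SpeculativeTrapData entries as before. This step may
    //    give up the extra data lock and may safepoint, so the MDO extra
    //    data can still change up to this point.
    prepare_extra_data(mdo);

    // 3. Only now take the extra data lock and, without releasing it (and
    //    with no safepoint possible anymore at this point), snapshot and
    //    translate the trap entries and the arg info data in one go, so
    //    nothing can be cleaned between snapshot and translation.
    {
      MutexLocker ml(mdo->extra_data_lock());
      copy_and_translate_extra_data(mdo);
    }
  }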
> From richard.reingruber at sap.com Fri Dec 13 14:17:00 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 13 Dec 2019 14:17:00 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <01509185-7b0b-a269-deb1-799444cf082f@oracle.com> References: <863a7bfc-5656-2a4d-6c22-c1fc22968d11@oracle.com> <00ac0f71-fe90-9ead-8696-6e7f96ff6a17@oracle.com> <57c09482-7f0a-e666-2bd5-f4b43ae8b32a@oracle.com> <01509185-7b0b-a269-deb1-799444cf082f@oracle.com> Message-ID: Hi David, Vladimir, The tests are very targeted and customized towards the issues they solve. IMHO they should be run in the configuration they are tailored for, but as I said, I'm ok with removing the tiered options/conditions. The enhancement should be covered also by existing JVMTI, JDI, JDWP tests, assuming they are also executed with Xcomp. If running the tests with Graal as C2 replacement you'll get failures, because the JVMCI compiler does not provide the debug info required at runtime (see compiledVFrame::not_global_escape_in_scope() and compiledVFrame::arg_escape). Still it would be possible to change the tests to expect these failures when executed with Graal. Perhaps I should do this? Thanks, Richard. -----Original Message----- From: David Holmes Sent: Freitag, 13. Dezember 2019 02:53 To: Vladimir Kozlov ; Reingruber, Richard ; hotspot-runtime-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; serviceability-dev at openjdk.java.net Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents On 13/12/2019 10:56 am, Vladimir Kozlov wrote: > Yes, David > > You are correct these changes touch all part of VM and may affect Graal > (which also has EA) too. > Changes should be tested in all our modes: tiered, C1 only, Graal, > Interpreter. And I realized that I only ran tier3-graal testing so I > submitted the rest of Graal's tiers now. > > I had assumed that our current testing (I ran all from tier1 to tier8) > should exercise all paths in VM these changes touch. But I may be wrong > and it is correct to ask author to add testing in all VM modes to make > sure new code in VM's runtime and JVMTI is tested. It may be that our existing JVM TI tests will exercise this adequately and that the new tests are more "whitebox" testing than general functional tests. But it is not obvious to me that we do have the coverage we need. Cheers, David > I do like to keep what current test is doing with C2. May be add an > other test for other modes or modify current one to enable to run it in > other modes. > > Thanks, > Vladimir > > On 12/12/19 3:32 PM, David Holmes wrote: >> On 13/12/2019 9:02 am, Reingruber, Richard wrote: >>> Hello Vladimir, >>> >>> thanks for having a look. >>> >>> ?? > Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to >>> skip >>> ?? > test from running in Interpreter mode too. >>> >>> Done. >>> >>> ?? > You don't need vm.opt.TieredCompilation != true in @requires >>> because you >>> ?? > specified -XX:-TieredCompilation in @run command. >>> >>> Ok. >>> >>> ?? > The test is specifically written for C2 only (not for C1 or >>> Graal) to >>> ?? > verify its Escape Analysis optimization. >>> ?? > I did not look in great details into test's code but its >>> analysis may be >>> ?? > affected if C1 compiler is also used. >>> ?? > >>> ?? > Richard may clarify this. >>> >>> The test cases aim to get their testmethod 'dontinline_testMethod' >>> compiled by C2. 
If they get C1 >>> compiled before doesn't matter all that much. I've got a slight >>> preference to disabled tiered >>> compilation for simplicity. >> >> My concern - perhaps unfounded - is that this seems to be being tested >> only in a pure C2 environment when the actual changes will have to >> operate correctly in a tiered environment (and JVMCI). >> >> Thanks, >> David >> >>> Thanks, Richard. >>> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Donnerstag, 12. Dezember 2019 19:20 >>> To: David Holmes ; >>> hotspot-runtime-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net; >>> serviceability-dev at openjdk.java.net; Reingruber, Richard >>> >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>> Performance in the Presence of JVMTI Agents >>> >>> Hi David, >>> >>> Tiered is disabled because we don't want to see compilations and outputs >>> from C1 compiler which does not have EA. >>> >>> The test is specifically written for C2 only (not for C1 or Graal) to >>> verify its Escape Analysis optimization. >>> I did not look in great details into test's code but its analysis may be >>> affected if C1 compiler is also used. >>> >>> Richard may clarify this. >>> >>> thanks, >>> Vladimir >>> >>> On 12/11/19 1:04 PM, David Holmes wrote: >>>> On 12/12/2019 5:21 am, Vladimir Kozlov wrote: >>>>> I will do full review later. I want to comment about test command >>>>> line. >>>>> >>>>> You don't need vm.opt.TieredCompilation != true in @requires because >>>>> you specified -XX:-TieredCompilation in @run command. >>>> >>>> And per my comment this should be being tested with tiered as well. >>>> >>>> David >>>> >>>>> Use vm.compMode == "Xmixed" instead of vm.compMode != "Xcomp" to skip >>>>> test from running in Interpreter mode too. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/11/19 7:07 AM, Reingruber, Richard wrote: >>>>>> Hi David, >>>>>> >>>>>> ??? > Most of the details here are in areas I can comment on in >>>>>> detail, but I >>>>>> ??? > did take an initial general look at things. >>>>>> >>>>>> Thanks for taking the time! >>>>>> >>>>>> ??? > The only thing that jumped out at me is that I think the >>>>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. >>>>>> ??? > >>>>>> ??? > +? bool is_hidden_from_external_view() const { return true; } >>>>>> >>>>>> Yes, it should. Will add the method like above. >>>>>> >>>>>> ??? > Also I don't see any testing of the >>>>>> DeoptimizeObjectsALotThread. >>>>>> Without >>>>>> ??? > active testing this will just bit-rot. >>>>>> >>>>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>>>> workload. I will add a minimal test >>>>>> to keep it fresh. >>>>>> >>>>>> ??? > Also on the tests I don't understand your @requires clause: >>>>>> ??? > >>>>>> ??? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>>> ??? > (vm.opt.TieredCompilation != true)) >>>>>> ??? > >>>>>> ??? > This seems to require that TieredCompilation is disabled, but >>>>>> tiered is >>>>>> ??? > our normal mode of operation. ?? >>>>>> ??? > >>>>>> >>>>>> I removed the clause. I guess I wanted to target the tests towards >>>>>> the code they are supposed to >>>>>> test, and it's easier to analyze failures w/o tiered compilation and >>>>>> with just one compiler thread. >>>>>> >>>>>> Additionally I will make use of >>>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>>>> >>>>>> Thanks, >>>>>> Richard. 
>>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes >>>>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>>>> To: Reingruber, Richard ; >>>>>> serviceability-dev at openjdk.java.net; >>>>>> hotspot-compiler-dev at openjdk.java.net; >>>>>> hotspot-runtime-dev at openjdk.java.net >>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>>>> Performance in the Presence of JVMTI Agents >>>>>> >>>>>> Hi Richard, >>>>>> >>>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I would like to get reviews please for >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>>>> >>>>>>> Corresponding RFE: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>>>> >>>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>>>> >>>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>>>> issues (thanks!). In addition the >>>>>>> change is being tested at SAP since I posted the first RFR some >>>>>>> months ago. >>>>>>> >>>>>>> The intention of this enhancement is to benefit performance wise >>>>>>> from escape analysis even if JVMTI >>>>>>> agents request capabilities that allow them to access local variable >>>>>>> values. E.g. if you start-up >>>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>>>> escape analysis is disabled right >>>>>>> from the beginning, well before a debugger attaches -- if ever one >>>>>>> should do so. With the >>>>>>> enhancement, escape analysis will remain enabled until and after a >>>>>>> debugger attaches. EA based >>>>>>> optimizations are reverted just before an agent acquires the >>>>>>> reference to an object. In the JBS item >>>>>>> you'll find more details. >>>>>> >>>>>> Most of the details here are in areas I can comment on in detail, >>>>>> but I >>>>>> did take an initial general look at things. >>>>>> >>>>>> The only thing that jumped out at me is that I think the >>>>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>>>> >>>>>> +? bool is_hidden_from_external_view() const { return true; } >>>>>> >>>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>>>> Without >>>>>> active testing this will just bit-rot. >>>>>> >>>>>> Also on the tests I don't understand your @requires clause: >>>>>> >>>>>> ??? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>>> (vm.opt.TieredCompilation != true)) >>>>>> >>>>>> This seems to require that TieredCompilation is disabled, but >>>>>> tiered is >>>>>> our normal mode of operation. ?? >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> Thanks, >>>>>>> Richard. >>>>>>> >>>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>>>> >>>>>>> >>>>>>> From christian.hagedorn at oracle.com Fri Dec 13 15:20:17 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 13 Dec 2019 16:20:17 +0100 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: Hi Felix Thanks for explaining. Following your analysis with the provided test case, orig_msize is 0 in the end. 
Can you also provide a test case or show an example which covers the else case in this test: 717 if (orig_msize == 0) { 718 best_align_to_mem_ref = memops.at(max_idx)->as_Mem(); 719 } else { 720 for (uint i = 0; i < orig_msize; i++) { 721 memops.remove(0); 722 } 723 best_align_to_mem_ref = find_align_to_ref(memops, max_idx); 724 assert(best_align_to_mem_ref == NULL, "sanity"); 725 best_align_to_mem_ref = memops.at(max_idx)->as_Mem(); 726 } Best regards, Christian On 12.12.19 07:24, Yangfei (Felix) wrote: > Hi, > > I have created a webrev for the patch: http://cr.openjdk.java.net/~fyang/8235762/webrev.00/ > Tested tier1-3 with both aarch64 and x86_64 linux release build. > Newly added test case fail without the patch and pass with the patch. > > Thanks, > Felix > >> -----Original Message----- >> From: Yangfei (Felix) >> Sent: Wednesday, December 11, 2019 11:12 PM >> To: 'Christian Hagedorn' ; Tobias Hartmann >> ; hotspot-compiler-dev at openjdk.java.net >> Subject: RE: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation >> >> Hi Christian, >> >> Thanks for the suggestions. Comments inlined. >> >>> -----Original Message----- >>> From: Christian Hagedorn [mailto:christian.hagedorn at oracle.com] >>> Sent: Wednesday, December 11, 2019 10:54 PM >>> To: Yangfei (Felix) ; Tobias Hartmann >>> ; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S): 8235762: JVM crash in SWPointer during C2 >>> compilation >>> >>> Hi Felix >>> >>> Thanks for working on that. Your fix also seems to work for JDK-8235700. >>> I closed that one as a duplicate of yours. >>> >>>>>> + for (int i = 0; i < orig_msize; i++) { >>> >>> Should be uint since orig_msize is a uint >> >> -- Yes, will modify accordingly when I am preparing webrev. >> >>> >>>>>> + best_align_to_mem_ref = find_align_to_ref(memops, >>>>> max_idx); >>>>>> + assert(best_align_to_mem_ref == NULL, "sanity"); >>> >>> You can merge these two lines together into >>> assert(find_align_to_ref(memops, >>> max_idx) == NULL, "sanity"); since the call belongs to the sanity >>> check. Or just surround it by a #ifdef ASSERT. >> >> -- The purpose of line 721 here is to calculate the max_idx. >> So I don't think it's suitable to treat this line as assertion logic. >> >>>>>> + idx = max_idx; >>> >>> Is max_idx always guaranteed to be valid and not -1 when accessing it later? >> >> -- Yes, I think so. >> When memops is not empty and the memory ops in memops are not >> comparable, find_align_to_ref will always sets its max_idx. >> >> Thanks, >> Felix From zhuoren.wz at alibaba-inc.com Fri Dec 13 02:24:41 2019 From: zhuoren.wz at alibaba-inc.com (=?UTF-8?B?V2FuZyBaaHVvKFpodW9yZW4p?=) Date: Fri, 13 Dec 2019 10:24:41 +0800 Subject: =?UTF-8?B?UmU6IFthYXJjaDY0LXBvcnQtZGV2IF0gUkZSOiA4MjM1Mzg1OiBBQXJjaDY0OiBDcmFzaCBv?= =?UTF-8?B?biBhYXJjaDY0IEpESyBkdWUJdG8gbG9uZyBvZmZzZXQ=?= In-Reply-To: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> References: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> Message-ID: <9448390e-8a43-485a-9060-ab3b7f4250fa.zhuoren.wz@alibaba-inc.com> OK for me. I also found this assertion failure also existed in load. A test for load was uploaded in JBS page. Regards, Zhuoren ------------------------------------------------------------------ From:Andrew Haley Sent At:2019 Dec. 13 (Fri.) 03:15 To:hotspot compiler ; aarch64-port-dev at openjdk.java.net Subject:[aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset This assertion failure happens because load/store patterns in aarch64.ad are incorrect. 
The operands immIOffset and operand immLoffset() are only correct for load/store *byte* offsets. Offsets of sizes greater than a byte should be shifted by the operand size, and misaligned offsets are only allowed for a small range. We get this wrong, so we try to use misaligned byte addresses for sizes greater than a byte. This fails at compile time. We've never noticed this before because Java code doesn't generate misaligned offsets, so we can only test this fix by using Unsafe (or a customer of Unsafe such as ByteBuffer). Wang Zhuo(Zhuoren) wrote a patch for this bug but it was incomplete; the problem reached deeper than we at first realized. Zhuoren's approach was to fix up the code generation after pattern matching, but this masks an efficiency problem. In many cases where we could use offset addresses, we don't because the pattern matcher incorrectly decides offsets are out of range. This patch fixes both problems by using memory types that are correct for all operand sizes. These are memory1, memory2, memory4, etc.; this naming fits in with the existing types used by vectors. It does lead to a rather large patch, but it's not quite as bad as it looks because much of the code is auto-generated by the script ad_encode.m4. Unfortunately, in the time since it was written some developers have edited that auto-generated section of aarch64.ad by hand, so I had to move these hand edits out of the way. I have also added big scary // DO NOT EDIT ANYTHING IN THIS SECTION OF THE FILE comments so this doesn't happen again. In a rather belt-and-braces way I've also added some code that fixes up illegal addresses. I did consider removing it, but I left it in because it doesn't hurt. If we ever do generate similar illegal addresses, debug builds will assert. I'm not sure whether to keep this or not. Andrew Dinn will probably have a cow when he sees this patch. :-) OK for HEAD? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jatin.bhateja at intel.com Fri Dec 13 17:22:54 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Fri, 13 Dec 2019 17:22:54 +0000 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class Message-ID: Hi All, Please find below a link to the patch JBS : https://bugs.openjdk.java.net/browse/JDK-8230185 WebRev : http://cr.openjdk.java.net/~jbhateja/8230185/webrev.00/ Here the first-level compilation of the method was done by the C1 compiler, since the -Xcomp and -XX:+TieredCompilation (default) options were used; as the back-edge taken count went beyond the threshold for the inner loop, an OSR compilation request was issued to the C2 compiler. Ideal graph construction begins from the hot loop header block and follows the control flow exposed by the ciTypeFlow model (a CFG built over the raw bytecodes), taking branch probabilities into consideration. Loop detection also begins from the hot loop header and does a DFS walk until it encounters a backedge. The newly detected loop containing the mul-add graph pattern is in this case irreducible and not a natural counted loop, which is a hard requirement for vector VNNI pattern detection. Adding a missing check for a counted loop prevents the crash here. Kindly review.
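For reference, a minimal sketch of the kind of guard involved (placeholder names, simplified; the actual patch may differ in placement and detail):

  // 'lpt' stands for the IdealLoopTree under consideration.
  Node* head = lpt->_head;
  if (!head->is_CountedLoop()) {
    // Irreducible or otherwise non-counted loop: VNNI mul-add matching
    // requires a natural counted loop, so bail out here instead of hitting
    // the assert(is_Loop()) failure.
    return false;
  }
  CountedLoopNode* cl = head->as_CountedLoop();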
Regards, Jatin From richard.reingruber at sap.com Fri Dec 13 19:01:57 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 13 Dec 2019 19:01:57 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> Message-ID: Hi David, > Some further queries/concerns: > > src/hotspot/share/runtime/objectMonitor.cpp > > Can you please explain the changes to ObjectMonitor::wait: > > ! _recursions = save // restore the old recursion count > ! + jt->get_and_reset_relock_count_after_wait(); // > increased by the deferred relock count > > what is the "deferred relock count"? I gather it relates to > > "The code was extended to be able to deoptimize objects of a frame that > is not the top frame and to let another thread than the owning thread do > it." Yes, these relate. Currently EA based optimizations are reverted, when a compiled frame is replaced with corresponding interpreter frames. Part of this is relocking objects with eliminated locking. New with the enhancement is that we do this also just before object references are acquired through JVMTI. In this case we deoptimize also the owning compiled frame C and we register deoptimized objects as deferred updates. When control returns to C it gets deoptimized, we notice that objects are already deoptimized (reallocated and relocked), so we don't do it again (relocking twice would be incorrect of course). Deferred updates are copied into the new interpreter frames. Problem: relocking is not possible if the target thread T is waiting on the monitor that needs to be relocked. This happens only with non-local objects with EliminateNestedLocks. Instead relocking is deferred until T owns the monitor again. This is what the piece of code above does. > which I don't like the sound of at all when it comes to ObjectMonitor > state. So I'd like to understand in detail exactly what is going on here > and why. This is a very intrusive change that seems to badly break > encapsulation and impacts future changes to ObjectMonitor that are under > investigation. I would not regard this as breaking encapsulation. Certainly not badly. I've added a property relock_count_after_wait to JavaThread. The property is well encapsulated. Future ObjectMonitor implementations have to deal with recursion too. They are free in choosing a way to do that as long as that property is taken into account. This is hardly a limitation. Note also that the property is a straight forward extension of the existing concept of deferred local updates. It is embedded into the structure holding them. So not even the footprint of a JavaThread is enlarged if no deferred updates are generated. > --- > > src/hotspot/share/runtime/thread.cpp > > Can you please explain why JavaThread::wait_for_object_deoptimization > has to be handcrafted in this way rather than using proper transitions. > I wrote wait_for_object_deoptimization taking JavaThread::java_suspend_self_with_safepoint_check as template. So in short: for the same reasons :) Threads reach both methods as part of thread state transitions, therefore special handling is required to change thread state on top of ongoing transitions. > We got rid of "deopt suspend" some time ago and it is disturbing to see > it being added back (effectively). This seems like it may be something > that handshakes could be used for. Deopt suspend used to be something rather different with a similar name[1]. 
It is not being added back. I'm actually duplicating the existing external suspend mechanism, because a thread can be suspended at most once. And hey, and don't like that either! But it seems not unlikely that the duplicate can be removed together with the original and the new type of handshakes that will be used for thread suspend can be used for object deoptimization too. See today's discussion in JDK-8227745 [2]. Thanks, Richard. [1] Deopt suspend was something like an async. handshake for architectures with register windows, where patching the return pc for deoptimization of a compiled frame was racy if the owner thread was in native code. Instead a "deopt" suspend flag was set on which the thread patched its own frame upon return from native. So no thread was suspended. It got its name only from the name of the flags. [2] Discussion about using handshakes to sync. with the target thread: https://bugs.openjdk.java.net/browse/JDK-8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14306727 -----Original Message----- From: David Holmes Sent: Freitag, 13. Dezember 2019 00:56 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, Some further queries/concerns: src/hotspot/share/runtime/objectMonitor.cpp Can you please explain the changes to ObjectMonitor::wait: ! _recursions = save // restore the old recursion count ! + jt->get_and_reset_relock_count_after_wait(); // increased by the deferred relock count what is the "deferred relock count"? I gather it relates to "The code was extended to be able to deoptimize objects of a frame that is not the top frame and to let another thread than the owning thread do it." which I don't like the sound of at all when it comes to ObjectMonitor state. So I'd like to understand in detail exactly what is going on here and why. This is a very intrusive change that seems to badly break encapsulation and impacts future changes to ObjectMonitor that are under investigation. --- src/hotspot/share/runtime/thread.cpp Can you please explain why JavaThread::wait_for_object_deoptimization has to be handcrafted in this way rather than using proper transitions. We got rid of "deopt suspend" some time ago and it is disturbing to see it being added back (effectively). This seems like it may be something that handshakes could be used for. Thanks, David ----- On 12/12/2019 7:02 am, David Holmes wrote: > On 12/12/2019 1:07 am, Reingruber, Richard wrote: >> Hi David, >> >> ?? > Most of the details here are in areas I can comment on in detail, >> but I >> ?? > did take an initial general look at things. >> >> Thanks for taking the time! > > Apologies the above should read: > > "Most of the details here are in areas I *can't* comment on in detail ..." > > David > >> ?? > The only thing that jumped out at me is that I think the >> ?? > DeoptimizeObjectsALotThread should be a hidden thread. >> ?? > >> ?? > +? bool is_hidden_from_external_view() const { return true; } >> >> Yes, it should. Will add the method like above. >> >> ?? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >> Without >> ?? > active testing this will just bit-rot. >> >> DeoptimizeObjectsALot is meant for stress testing with a larger >> workload. I will add a minimal test >> to keep it fresh. >> >> ?? 
> Also on the tests I don't understand your @requires clause: >> ?? > >> ?? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >> ?? > (vm.opt.TieredCompilation != true)) >> ?? > >> ?? > This seems to require that TieredCompilation is disabled, but >> tiered is >> ?? > our normal mode of operation. ?? >> ?? > >> >> I removed the clause. I guess I wanted to target the tests towards the >> code they are supposed to >> test, and it's easier to analyze failures w/o tiered compilation and >> with just one compiler thread. >> >> Additionally I will make use of >> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >> >> Thanks, >> Richard. >> >> -----Original Message----- >> From: David Holmes >> Sent: Mittwoch, 11. Dezember 2019 08:03 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net; >> hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >> Performance in the Presence of JVMTI Agents >> >> Hi Richard, >> >> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>> Hi, >>> >>> I would like to get reviews please for >>> >>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>> >>> Corresponding RFE: >>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>> >>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>> >>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>> issues (thanks!). In addition the >>> change is being tested at SAP since I posted the first RFR some >>> months ago. >>> >>> The intention of this enhancement is to benefit performance wise from >>> escape analysis even if JVMTI >>> agents request capabilities that allow them to access local variable >>> values. E.g. if you start-up >>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>> escape analysis is disabled right >>> from the beginning, well before a debugger attaches -- if ever one >>> should do so. With the >>> enhancement, escape analysis will remain enabled until and after a >>> debugger attaches. EA based >>> optimizations are reverted just before an agent acquires the >>> reference to an object. In the JBS item >>> you'll find more details. >> >> Most of the details here are in areas I can comment on in detail, but I >> did take an initial general look at things. >> >> The only thing that jumped out at me is that I think the >> DeoptimizeObjectsALotThread should be a hidden thread. >> >> +? bool is_hidden_from_external_view() const { return true; } >> >> Also I don't see any testing of the DeoptimizeObjectsALotThread. Without >> active testing this will just bit-rot. >> >> Also on the tests I don't understand your @requires clause: >> >> ?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >> (vm.opt.TieredCompilation != true)) >> >> This seems to require that TieredCompilation is disabled, but tiered is >> our normal mode of operation. ?? >> >> Thanks, >> David >> >>> Thanks, >>> Richard. 
>>> >>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>> >>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>> >>> From vladimir.kozlov at oracle.com Fri Dec 13 19:42:31 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 13 Dec 2019 11:42:31 -0800 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: References: Message-ID: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> Hi Jatin Yes, this fix is correct. But you added trailing spaces to modified lines. Please, fix it. Thanks, Vladimir On 12/13/19 9:22 AM, Bhateja, Jatin wrote: > Hi All, > > Please find below a link to the patch > > JBS : https://bugs.openjdk.java.net/browse/JDK-8230185 > WebRev : http://cr.openjdk.java.net/~jbhateja/8230185/webrev.00/ > > Here first level compilation for the method was done by C1 compiler since -Xcomp and -XX:+TieredCompilation (default) options were used, as the back-edge > taken count went beyond threshold for Inner-Loop, OSR compilation request was issued to C2 compiler. > > Ideal construction begins from the hot loop header block and follows the control flows exposed by the ciTypeFlow model (CFG created over raw bytecodes) keeping branch probabilities into consideration. Loop detection also begins from the hot loop header and does a DFS walk till it encounters a backedge, newly detected loop containing the mul-add graph pattern in this case is irreducible and not a natural counted loop which is a must requirement for vector VNNI pattern detection. Adding a missing check for counted loop to prevent crash here. > > Kindly review. > > Regards, > Jatin > From vladimir.kozlov at oracle.com Fri Dec 13 22:18:44 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 13 Dec 2019 14:18:44 -0800 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> Message-ID: <6705bf76-162e-fdd6-b771-df437fea46e0@oracle.com> On 12/13/19 2:27 AM, Vladimir Ivanov wrote: > Thanks for the feedback, Vladimir. > >> replicateB >> >> Can you fold it differently? >> >> ReplB_reg_leg > > Are you talking about ReplB_reg? Yes, I was talking about ReplB_reg. I thought we can combine all [8-64] length vectors but I missed that ReplB_reg_leg uses legVec and needs separate instructions :( And I wanted to separate instruction which use avx 512 (evpbroadcastb) because it is difficult to see relation between predicate condition (length() == 64 && VM_Version::supports_avx512bw()) and check in code (vlen == 64 || VM_Version::supports_avx512vlbw()). First, || vs &&. Second, avx512bw vs avx512vlbw. May be better to have a separate instruction for this. > >> ?? predicate(!VM_Version::supports_avx512vlbw()); > > 3151 instruct ReplB_reg_leg(legVec dst, rRegI src) %{ > 3152?? predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); > > For ReplB_reg_leg the predicate can't be simplified: it is applicable only to 512bit vector when AVX512BW is absent. > Otherwise, legVec constraint will be unnecessarily applied to other configurations. That is why you replaced !avx512vlbw with !avx512bw? May be this section of code need comment which explains why one or an other is used. > > 3119 instruct ReplB_reg(vec dst, rRegI src) %{ > 3120?? predicate((n->as_Vector()->length() <= 32) || > 3121???????????? 
(n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); > > For ReplB_reg there's a shorter version: > > predicate(n->as_Vector()->length() <= 32 || VM_Version::supports_avx512bw()); > > > But do you find it easier to read? For example, when you are checking that all configurations are covered: > > predicate(n->as_Vector()->length() <= 32 || > ????????? VM_Version::supports_avx512bw()); > > instruct ReplB_reg_leg(legVec dst, rRegI src) %{ > ? predicate(n->as_Vector()->length() == 64 && > ?????????? !VM_Version::supports_avx512bw()); > > vs > > instruct ReplB_reg_leg(legVec dst, rRegI src) %{ > ? predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); > > instruct ReplB_reg(vec dst, rRegI src) %{ > ? predicate((n->as_Vector()->length() <= 32) || > ??????????? (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); I think next conditions in predicates require comment about using avx512vlbw and avx512bw. ! predicate((n->as_Vector()->length() <= 32 && VM_Version::supports_avx512vlbw()) || ! (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); > > >> ?? ins_encode %{ >> ???? uint vlen = vector_length(this); >> ???? __ movdl($dst$$XMMRegister, $src$$Register); >> ???? __ punpcklbw($dst$$XMMRegister, $dst$$XMMRegister); >> ???? __ pshuflw($dst$$XMMRegister, $dst$$XMMRegister, 0x00); >> ???? if (vlen > 8) { >> ?????? __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); >> ?????? if (vlen > 16) { >> ???????? __ vinserti128_high($dst$$XMMRegister, $dst$$XMMRegister); >> ???????? if (vlen > 32) { >> ?????????? assert(vlen == 64, "sanity"); >> ?????????? __ vinserti64x4($dst$$XMMRegister, $dst$$XMMRegister, $dst$$XMMRegister, 0x1); > > Yes, it should work as well. Do you find it easier to read though? Code is smaller. > >> Similar ReplB_imm_leg for which I don't see new implementation. > > Good catch. Added it back. > > (FTR completeness for reg2reg variants (_reg*) is mandatory. But for _mem and _imm it is optional: if they some > configuration isn't covered, _reg is used. But it was in original code, so I added it back.) I don't see code which was in Repl4B_imm() and Repl8B_imm() (only movdl and movq without vpbroadcastb). > > Updated version: > ? http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00 Should be webrev.01 > > Another thing I noticed is that for ReplI/.../ReplD cases avx512vl checks are not necessary: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.01/broadcast_vl/ > > The code assumes that evpbroadcastd/evpbroadcastq, vpbroadcastdvpbroadcastq, vpbroadcastss/vpbroadcastsd need AVX512VL > for 512bit case, but Intel manual says AVX512F is enough. > > I plan to handle it as a separate change, but let me know if you want to incorporate it into 8235825. Yes, lets do it separately. > >> It should also simplify code for avx512 which one or 2 instructions. > > Can you elaborate, please? Are you talking about the case when version for different vector sizes differ in 1-2 > instructions (like ReplB_reg)? No, I was talking about cases when evpbroadcastb and vpbroadcastb instructions are used. I was think to have them in separate instructions. In your latest version it would be only evpbroadcastb case from ReplB_reg(). Thanks, Vladimir > >> Other types changes can be done same way. 
> > Best regards, > Vladimir Ivanov > >> On 12/12/19 3:19 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/ >>> https://bugs.openjdk.java.net/browse/JDK-8235825 >>> >>> Merge AD instructions for the following vector nodes: >>> ?? - ReplicateB, ..., ReplicateD >>> >>> Individual patches: >>> >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual >>> >>> Testing: tier1-4, test run on different CPU flavors (KNL, CLX) >>> >>> Contributed-by: Jatin Bhateja >>> Reviewed-by: vlivanov, sviswanathan, ? >>> >>> Best regards, >>> Vladimir Ivanov From jatin.bhateja at intel.com Sun Dec 15 08:41:58 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Sun, 15 Dec 2019 08:41:58 +0000 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> Message-ID: Hi Vladimir, Updated patch is placed at following link. http://cr.openjdk.java.net/~jbhateja/8230185/webrev.01/ Kindly also push this to the repository. Regards, Jatin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Vladimir Kozlov > Sent: Saturday, December 14, 2019 1:13 AM > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class > > Hi Jatin > > Yes, this fix is correct. But you added trailing spaces to modified lines. Please, > fix it. > > Thanks, > Vladimir > > On 12/13/19 9:22 AM, Bhateja, Jatin wrote: > > Hi All, > > > > Please find below a link to the patch > > > > JBS : https://bugs.openjdk.java.net/browse/JDK-8230185 > > WebRev : http://cr.openjdk.java.net/~jbhateja/8230185/webrev.00/ > > > > Here first level compilation for the method was done by C1 compiler > > since -Xcomp and -XX:+TieredCompilation (default) options were used, as the > back-edge taken count went beyond threshold for Inner-Loop, OSR > compilation request was issued to C2 compiler. > > > > Ideal construction begins from the hot loop header block and follows the > control flows exposed by the ciTypeFlow model (CFG created over raw > bytecodes) keeping branch probabilities into consideration. Loop detection > also begins from the hot loop header and does a DFS walk till it encounters a > backedge, newly detected loop containing the mul-add graph pattern in this > case is irreducible and not a natural counted loop which is a must requirement > for vector VNNI pattern detection. Adding a missing check for counted loop to > prevent crash here. > > > > Kindly review. > > > > Regards, > > Jatin > > From felix.yang at huawei.com Mon Dec 16 02:47:42 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 16 Dec 2019 02:47:42 +0000 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: Hi Christian, Yes, orig_msize is 0 for the test case in my webrev. For the else case, I find it hard to manually create a test case for it. Another choice is asserting that this else case never happens. I am not going that way as I haven't got a strong reason for that. Thank, Felix > > Hi Felix > > Thanks for explaining. Following your analysis with the provided test case, > orig_msize is 0 in the end. 
Can you also provide a test case or show an example > which covers the else case in this test: > > 717 if (orig_msize == 0) { > 718 best_align_to_mem_ref = > memops.at(max_idx)->as_Mem(); > 719 } else { > 720 for (uint i = 0; i < orig_msize; i++) { > 721 memops.remove(0); > 722 } > 723 best_align_to_mem_ref = find_align_to_ref(memops, > max_idx); > 724 assert(best_align_to_mem_ref == NULL, "sanity"); > 725 best_align_to_mem_ref = > memops.at(max_idx)->as_Mem(); > 726 } From tobias.hartmann at oracle.com Mon Dec 16 07:34:39 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 16 Dec 2019 08:34:39 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <69bc6af9-dffd-117e-c8b4-e87c9d5e2b75@oracle.com> References: <87h829a4h5.fsf@redhat.com> <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> <87r2189ybx.fsf@redhat.com> <87o8wc9x96.fsf@redhat.com> <69bc6af9-dffd-117e-c8b4-e87c9d5e2b75@oracle.com> Message-ID: <856c26c5-c996-3d4c-cc6b-14db718c78b2@oracle.com> Hi Christian, On 13.12.19 14:23, Christian Hagedorn wrote: >> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.02/ This looks good to me. Just noticed some excess whitespace in ciMethodData.cpp:163 ") / HeapWordSize". Please run some quick sanity performance testing before pushing. Best regards, Tobias From tobias.hartmann at oracle.com Mon Dec 16 07:40:36 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 16 Dec 2019 08:40:36 +0100 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> Message-ID: Hi Jatin, this looks good to me too but could you please add a regression test (for example, simplified version of the JavaFuzzer generated test)? Also, please set the bug to "In Progress". Thanks, Tobias On 15.12.19 09:41, Bhateja, Jatin wrote: > Hi Vladimir, > > Updated patch is placed at following link. > > http://cr.openjdk.java.net/~jbhateja/8230185/webrev.01/ > > Kindly also push this to the repository. > > Regards, > Jatin > >> -----Original Message----- >> From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Vladimir Kozlov >> Sent: Saturday, December 14, 2019 1:13 AM >> To: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class >> >> Hi Jatin >> >> Yes, this fix is correct. But you added trailing spaces to modified lines. Please, >> fix it. >> >> Thanks, >> Vladimir >> >> On 12/13/19 9:22 AM, Bhateja, Jatin wrote: >>> Hi All, >>> >>> Please find below a link to the patch >>> >>> JBS : https://bugs.openjdk.java.net/browse/JDK-8230185 >>> WebRev : http://cr.openjdk.java.net/~jbhateja/8230185/webrev.00/ >>> >>> Here first level compilation for the method was done by C1 compiler >>> since -Xcomp and -XX:+TieredCompilation (default) options were used, as the >> back-edge taken count went beyond threshold for Inner-Loop, OSR >> compilation request was issued to C2 compiler. >>> >>> Ideal construction begins from the hot loop header block and follows the >> control flows exposed by the ciTypeFlow model (CFG created over raw >> bytecodes) keeping branch probabilities into consideration. 
Loop detection >> also begins from the hot loop header and does a DFS walk till it encounters a >> backedge, newly detected loop containing the mul-add graph pattern in this >> case is irreducible and not a natural counted loop which is a must requirement >> for vector VNNI pattern detection. Adding a missing check for counted loop to >> prevent crash here. >>> >>> Kindly review. >>> >>> Regards, >>> Jatin >>> From christian.hagedorn at oracle.com Mon Dec 16 07:58:07 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 16 Dec 2019 08:58:07 +0100 Subject: [14] RFR(S): 8231501: VM crash in MethodData::clean_extra_data(CleanExtraDataClosure*): fatal error: unexpected tag 99 In-Reply-To: <856c26c5-c996-3d4c-cc6b-14db718c78b2@oracle.com> References: <87h829a4h5.fsf@redhat.com> <24848a55-857b-fc49-d9e1-13076587539f@oracle.com> <87r2189ybx.fsf@redhat.com> <87o8wc9x96.fsf@redhat.com> <69bc6af9-dffd-117e-c8b4-e87c9d5e2b75@oracle.com> <856c26c5-c996-3d4c-cc6b-14db718c78b2@oracle.com> Message-ID: Thank you for your review Tobias! On 16.12.19 08:34, Tobias Hartmann wrote: > Hi Christian, > > On 13.12.19 14:23, Christian Hagedorn wrote: >>> http://cr.openjdk.java.net/~chagedorn/8231501/webrev.02/ > > This looks good to me. Just noticed some excess whitespace in ciMethodData.cpp:163 ") / HeapWordSize". Thanks, fixed in current webrev. > Please run some quick sanity performance testing before pushing. Ran some performance tests over the weekend. It looks good. Best regards, Christian From rwestrel at redhat.com Mon Dec 16 08:18:45 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 16 Dec 2019 09:18:45 +0100 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops Message-ID: <87o8w8ptsq.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8231291/webrev.01/ As discussed before: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-September/035094.html https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036171.html Fully unrolling loops early helps EA. The change to cfgnode.cpp is required because full unroll sometimes needs peeling which may add a phi between a memory access and its AddP, a pattern that EA doesn't recognize. Roland. From tobias.hartmann at oracle.com Mon Dec 16 08:39:16 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 16 Dec 2019 09:39:16 +0100 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: Hi Felix, On 12.12.19 07:24, Yangfei (Felix) wrote: > I have created a webrev for the patch: http://cr.openjdk.java.net/~fyang/8235762/webrev.00/ > Tested tier1-3 with both aarch64 and x86_64 linux release build. > Newly added test case fail without the patch and pass with the patch. Thanks for creating a webrev and adding the test. I have some questions: - Shouldn't we at least add an assert to verify that after SuperWord::find_adjacent_refs() best_align_to_mem_ref != NULL if _packset.length() > 0? - Why do you need max_idx? Isn't is the case that if find_align_to_ref returns NULL, there aren't any comparable memory operations left and it essentially doesn't matter which one you chose? Also, is it guaranteed that max_idx is always initialized in SuperWord::find_align_to_ref? 
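For the first point, a minimal sketch of the kind of assert meant here (placement and exact wording illustrative only):

    // after the pack creation loop in SuperWord::find_adjacent_refs()
    assert(_packset.length() == 0 || best_align_to_mem_ref != NULL,
           "non-empty packset but no best_align_to_mem_ref selected");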
Thanks, Tobias From felix.yang at huawei.com Mon Dec 16 10:14:24 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 16 Dec 2019 10:14:24 +0000 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: Hi Tobias, Thanks for reviewing this. > > Hi Felix, > > On 12.12.19 07:24, Yangfei (Felix) wrote: > > I have created a webrev for the patch: > http://cr.openjdk.java.net/~fyang/8235762/webrev.00/ > > Tested tier1-3 with both aarch64 and x86_64 linux release build. > > Newly added test case fail without the patch and pass with the patch. > > Thanks for creating a webrev and adding the test. I have some questions: > - Shouldn't we at least add an assert to verify that after > SuperWord::find_adjacent_refs() best_align_to_mem_ref != NULL if > _packset.length() > 0? -- OK. I will add one assertion immediately before breaking the loop. > - Why do you need max_idx? Isn't is the case that if find_align_to_ref returns > NULL, there aren't any comparable memory operations left and it essentially > doesn't matter which one you chose? -- I was considering minimizing the performance impact of the patch here. best_align_to_mem_ref is used in SuperWord::align_initial_loop_index for adjusting the pre-loop limit. For the test case in the webrev, best_align_to_mem_ref was chosen from node 470 (StoreB) and node 431 (StoreL). The vector width for these two memory operations are different on aarch64 platform: vw = 16 bytes for node 431 and 2 bytes for node 470. SuperWord::align_initial_loop_index will emit different code sequences for the test case. The max_idx tells us which memory operation has the biggest vector size. > Also, is it guaranteed that max_idx is > always initialized in SuperWord::find_align_to_ref? -- Yes, I think so. When memops is not empty and the memory ops in memops are not comparable, find_align_to_ref will always sets its max_idx. From robbin.ehn at oracle.com Mon Dec 16 10:20:39 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 16 Dec 2019 11:20:39 +0100 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: Message-ID: <2fbd9ac0-1cb0-98c4-c2c4-becd1cf49fef@oracle.com> Hi Richard, as mentioned it would be better if you could do this with handshakes, instead of using _suspend_flag (since they are going away). But I can't think of a way doing it without blocking safepoints, so we need to add some more features in handshakes first. When possible I hope you are willing to move this code to handshakes instead. You could stop one thread with, e.g.: class EscapeBarrierSuspendHandshake : public HandshakeClosure { Semaphore _is_waiting; Semaphore _wait; bool _started; public: EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), _wait(0), _started(false) { } void do_thread(Thread* th) { _is_waiting.signal(); _wait.wait(); Atomic::store(&_started, true); } void wait_until_eb_stopped() { _is_waiting.wait(); } void start_thread() { _wait.signal(); while(!Atomic::load(&_started)) { os::naked_yield(); } } }; But it would block safepoints. 
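For illustration, the usage I have in mind on the requesting thread is roughly the sketch below; it assumes an asynchronous Handshake::execute() variant that only queues the closure for the target thread and returns, and the names and error handling are made up:

    EscapeBarrierSuspendHandshake eb;
    Handshake::execute(&eb, target_thread);  // assumed asynchronous: just queues the closure
    eb.wait_until_eb_stopped();              // returns once do_thread() has started on the target
    // ... revert the EA based optimizations of target_thread here, i.e.
    // reallocate scalar replaced objects and relock eliminated locks ...
    eb.start_thread();                       // let the target leave do_thread() and continue

The handshake operation stays in progress for as long as the target sits in do_thread(), so no safepoint can be reached in the meantime; whatever the requester does between wait_until_eb_stopped() and start_thread() therefore must not need one.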
Thanks, Robbin On 12/10/19 10:45 PM, Reingruber, Richard wrote: > Hi, > > I would like to get reviews please for > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ > > Corresponding RFE: > https://bugs.openjdk.java.net/browse/JDK-8227745 > > Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 > And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] > > Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the > change is being tested at SAP since I posted the first RFR some months ago. > > The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI > agents request capabilities that allow them to access local variable values. E.g. if you start-up > with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right > from the beginning, well before a debugger attaches -- if ever one should do so. With the > enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based > optimizations are reverted just before an agent acquires the reference to an object. In the JBS item > you'll find more details. > > Thanks, > Richard. > > [1] Experimental fix for JDK-8214584 based on JDK-8227745 > http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch > From goetz.lindenmaier at sap.com Mon Dec 16 13:38:59 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 16 Dec 2019 13:38:59 +0000 Subject: RFR(M): 8235988: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. Message-ID: Hi, PrintInlining and TraceLoopPredicate allocate stringStreams with new and relied on the fact that all memory used is on the ResourceArea cleaned after the compilation. Since 8224193 the char* of the stringStream is malloced and thus must be freed. No doing so manifests a memory leak. This is only relevant if the corresponding tracing is active. To fix TraceLoopPredicate I added the destructor call. Fixing PrintInlining is a bit more complicated, as it uses several stringStreams. A row of them is in a GrowableArray which must be walked to free all of them. As the GrowableArray is on an arena no destructor is called for it. I also changed some as_string() calls to base() calls which reduced memory need of the traces, and added a comment explaining the constructor of GrowableArray that calls the copyconstructor for its elements. Please review: http://cr.openjdk.java.net/~goetz/wr19/8235988-c2_tracing_mem_leak/01/ Best regards, Goetz. From richard.reingruber at sap.com Mon Dec 16 13:41:49 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Mon, 16 Dec 2019 13:41:49 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <2fbd9ac0-1cb0-98c4-c2c4-becd1cf49fef@oracle.com> References: <2fbd9ac0-1cb0-98c4-c2c4-becd1cf49fef@oracle.com> Message-ID: Hi Robbin, first of all: thanks a lot for providing feedback. I do appreciate it. I am absolutely willing to move this to handshakes. Only I still can't see how to achieve it. Could you explain the drafted class EscapeBarrierSuspendHandshake a little bit? [1] I'd like to look at it by example of JvmtiEnv::GetOwnedMonitorStackDepthInfo() where calling_thread T1 would apply it on another thread T2. 1. L13: is wait_until_eb_stopped to be called by T1 to wait until T2 cannot move anymore? 2. Handshakes between two threads are synchronous, correct? 
If so, then T1 will block handshaking T2, because either T2 or the VMThread will block in L10. I cannot figure out, how you mean this. Only if a helper thread H would handshake T2 then T1 could continue and call wait_until_eb_stopped(). But returning from there T1 would block if reallocating objects triggers GC or attempting to execute the vm operation in JvmtiEnv::GetOwnedMonitorStackDepthInfo(). It might be impossible to replace my suspend flag with handshakes that are available today, because if it was you could replace all the suspend flags right away, couldn't you? Or I'm simply missing something... quite possible... :) Thanks, Richard. [1] Drafted by Robbin (thanks!) 1 class EscapeBarrierSuspendHandshake : public HandshakeClosure { 2 Semaphore _is_waiting; 3 Semaphore _wait; 4 bool _started; 5 public: 6 EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), 7 _wait(0), _started(false) { } 8 void do_thread(Thread* th) { 9 _is_waiting.signal(); 10 _wait.wait(); 11 Atomic::store(&_started, true); 12 } 13 void wait_until_eb_stopped() { _is_waiting.wait(); } 14 void start_thread() { 15 _wait.signal(); 16 while(!Atomic::load(&_started)) { 17 os::naked_yield(); 18 } 19 } 20 }; -----Original Message----- From: Robbin Ehn Sent: Montag, 16. Dezember 2019 11:21 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, as mentioned it would be better if you could do this with handshakes, instead of using _suspend_flag (since they are going away). But I can't think of a way doing it without blocking safepoints, so we need to add some more features in handshakes first. When possible I hope you are willing to move this code to handshakes instead. You could stop one thread with, e.g.: class EscapeBarrierSuspendHandshake : public HandshakeClosure { Semaphore _is_waiting; Semaphore _wait; bool _started; public: EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), _wait(0), _started(false) { } void do_thread(Thread* th) { _is_waiting.signal(); _wait.wait(); Atomic::store(&_started, true); } void wait_until_eb_stopped() { _is_waiting.wait(); } void start_thread() { _wait.signal(); while(!Atomic::load(&_started)) { os::naked_yield(); } } }; But it would block safepoints. Thanks, Robbin On 12/10/19 10:45 PM, Reingruber, Richard wrote: > Hi, > > I would like to get reviews please for > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ > > Corresponding RFE: > https://bugs.openjdk.java.net/browse/JDK-8227745 > > Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 > And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] > > Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the > change is being tested at SAP since I posted the first RFR some months ago. > > The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI > agents request capabilities that allow them to access local variable values. E.g. if you start-up > with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right > from the beginning, well before a debugger attaches -- if ever one should do so. With the > enhancement, escape analysis will remain enabled until and after a debugger attaches. 
EA based > optimizations are reverted just before an agent acquires the reference to an object. In the JBS item > you'll find more details. > > Thanks, > Richard. > > [1] Experimental fix for JDK-8214584 based on JDK-8227745 > http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch > From goetz.lindenmaier at sap.com Mon Dec 16 14:34:00 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 16 Dec 2019 14:34:00 +0000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. Message-ID: Hi, I'm resending this with fixed bugId ... Sorry! Best regards, Goetz > Hi, > > PrintInlining and TraceLoopPredicate allocate stringStreams with new and > relied on the fact that all memory used is on the ResourceArea cleaned > after the compilation. > > Since 8224193 the char* of the stringStream is malloced and thus > must be freed. No doing so manifests a memory leak. > This is only relevant if the corresponding tracing is active. > > To fix TraceLoopPredicate I added the destructor call > Fixing PrintInlining is a bit more complicated, as it uses several > stringStreams. A row of them is in a GrowableArray which must > be walked to free all of them. > As the GrowableArray is on an arena no destructor is called for it. > > I also changed some as_string() calls to base() calls which reduced > memory need of the traces, and added a comment explaining the > constructor of GrowableArray that calls the copyconstructor for its > elements. > > Please review: > http://cr.openjdk.java.net/~goetz/wr19/8235998-c2_tracing_mem_leak/01/ > > Best regards, > Goetz. From jatin.bhateja at intel.com Mon Dec 16 15:42:00 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Mon, 16 Dec 2019 15:42:00 +0000 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> Message-ID: Hi Tobias, Please find below the updated patch with the test case. http://cr.openjdk.java.net/~jbhateja/8230185/webrev.02/ Kindly push it to the repository if there are no further comments. Thanks, Jatin > -----Original Message----- > From: Tobias Hartmann > Sent: Monday, December 16, 2019 1:11 PM > To: Bhateja, Jatin ; Vladimir Kozlov > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class > > Hi Jatin, > > this looks good to me too but could you please add a regression test (for > example, simplified version of the JavaFuzzer generated test)? > > Also, please set the bug to "In Progress". > > Thanks, > Tobias > > On 15.12.19 09:41, Bhateja, Jatin wrote: > > Hi Vladimir, > > > > Updated patch is placed at following link. > > > > http://cr.openjdk.java.net/~jbhateja/8230185/webrev.01/ > > > > Kindly also push this to the repository. > > > > Regards, > > Jatin > > > >> -----Original Message----- > >> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Vladimir Kozlov > >> Sent: Saturday, December 14, 2019 1:13 AM > >> To: hotspot-compiler-dev at openjdk.java.net > >> Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid > >> node class > >> > >> Hi Jatin > >> > >> Yes, this fix is correct. But you added trailing spaces to modified > >> lines. Please, fix it. 
> >> > >> Thanks, > >> Vladimir > >> > >> On 12/13/19 9:22 AM, Bhateja, Jatin wrote: > >>> Hi All, > >>> > >>> Please find below a link to the patch > >>> > >>> JBS : https://bugs.openjdk.java.net/browse/JDK-8230185 > >>> WebRev : http://cr.openjdk.java.net/~jbhateja/8230185/webrev.00/ > >>> > >>> Here first level compilation for the method was done by C1 compiler > >>> since -Xcomp and -XX:+TieredCompilation (default) options were used, > >>> as the > >> back-edge taken count went beyond threshold for Inner-Loop, OSR > >> compilation request was issued to C2 compiler. > >>> > >>> Ideal construction begins from the hot loop header block and > >>> follows the > >> control flows exposed by the ciTypeFlow model (CFG created over raw > >> bytecodes) keeping branch probabilities into consideration. Loop > >> detection also begins from the hot loop header and does a DFS walk > >> till it encounters a backedge, newly detected loop containing the > >> mul-add graph pattern in this case is irreducible and not a natural > >> counted loop which is a must requirement for vector VNNI pattern > >> detection. Adding a missing check for counted loop to prevent crash here. > >>> > >>> Kindly review. > >>> > >>> Regards, > >>> Jatin > >>> From robbin.ehn at oracle.com Mon Dec 16 17:20:50 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 16 Dec 2019 18:20:50 +0100 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <2fbd9ac0-1cb0-98c4-c2c4-becd1cf49fef@oracle.com> Message-ID: <9f24ec2c-d737-f9b7-8821-5905264971a7@oracle.com> Hi Richard, On 2019-12-16 14:41, Reingruber, Richard wrote: > Hi Robbin, > > first of all: thanks a lot for providing feedback. I do appreciate it. > > I am absolutely willing to move this to handshakes. Only I still can't see how to achieve it. > > Could you explain the drafted class EscapeBarrierSuspendHandshake a little bit? [1] > > I'd like to look at it by example of JvmtiEnv::GetOwnedMonitorStackDepthInfo() where calling_thread > T1 would apply it on another thread T2. Sorry I don't immediately see what issue there is in doing a handshake instead of: VM_GetOwnedMonitorInfo op(this, calling_thread, java_thread, owned_monitors_list); > > 1. L13: is wait_until_eb_stopped to be called by T1 to wait until T2 cannot move anymore? > > 2. Handshakes between two threads are synchronous, correct? If so, then T1 will block handshaking > T2, because either T2 or the VMThread will block in L10. Yes, sorry, I forgot/confused myself about asynch handshake. (I have a test prototype for that, which removes suspend flag) > > I cannot figure out, how you mean this. Only if a helper thread H would handshake T2 then T1 could > continue and call wait_until_eb_stopped(). But returning from there T1 would block if reallocating > objects triggers GC or attempting to execute the vm operation in > JvmtiEnv::GetOwnedMonitorStackDepthInfo(). > > It might be impossible to replace my suspend flag with handshakes that are available today, because > if it was you could replace all the suspend flags right away, couldn't you? So adding asynch handshakes and a per thread handshake queue, we can. (which this test prototype does) The issue I'm thinking of is if we need selective polling first. Suspend flags are not checked in every transition, e.g. vm->native. A JVM TI agent don't expect to suspend it's own thread when suspending all threads. 
(that thread would be suspended when trying to get back to agent code when it does vm->native transition) > > Or I'm simply missing something... quite possible... :) No I think you got it right. Thanks, Robbin > > Thanks, Richard. > > [1] Drafted by Robbin (thanks!) > > 1 class EscapeBarrierSuspendHandshake : public HandshakeClosure { > 2 Semaphore _is_waiting; > 3 Semaphore _wait; > 4 bool _started; > 5 public: > 6 EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), > 7 _wait(0), _started(false) { } > 8 void do_thread(Thread* th) { > 9 _is_waiting.signal(); > 10 _wait.wait(); > 11 Atomic::store(&_started, true); > 12 } > 13 void wait_until_eb_stopped() { _is_waiting.wait(); } > 14 void start_thread() { > 15 _wait.signal(); > 16 while(!Atomic::load(&_started)) { > 17 os::naked_yield(); > 18 } > 19 } > 20 }; > > -----Original Message----- > From: Robbin Ehn > Sent: Montag, 16. Dezember 2019 11:21 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi Richard, as mentioned it would be better if you could do this with > handshakes, instead of using _suspend_flag (since they are going away). > But I can't think of a way doing it without blocking safepoints, so we need to > add some more features in handshakes first. > When possible I hope you are willing to move this code to handshakes instead. > > You could stop one thread with, e.g.: > class EscapeBarrierSuspendHandshake : public HandshakeClosure { > Semaphore _is_waiting; > Semaphore _wait; > bool _started; > public: > EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), > _wait(0), _started(false) { } > void do_thread(Thread* th) { > _is_waiting.signal(); > _wait.wait(); > Atomic::store(&_started, true); > } > void wait_until_eb_stopped() { _is_waiting.wait(); } > void start_thread() { > _wait.signal(); > while(!Atomic::load(&_started)) { > os::naked_yield(); > } > } > }; > > But it would block safepoints. > > Thanks, Robbin > > On 12/10/19 10:45 PM, Reingruber, Richard wrote: >> Hi, >> >> I would like to get reviews please for >> >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >> >> Corresponding RFE: >> https://bugs.openjdk.java.net/browse/JDK-8227745 >> >> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >> >> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the >> change is being tested at SAP since I posted the first RFR some months ago. >> >> The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI >> agents request capabilities that allow them to access local variable values. E.g. if you start-up >> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right >> from the beginning, well before a debugger attaches -- if ever one should do so. With the >> enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based >> optimizations are reverted just before an agent acquires the reference to an object. In the JBS item >> you'll find more details. >> >> Thanks, >> Richard. 
>> >> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >> From vladimir.x.ivanov at oracle.com Mon Dec 16 22:29:09 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 17 Dec 2019 01:29:09 +0300 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: <6705bf76-162e-fdd6-b771-df437fea46e0@oracle.com> References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> <6705bf76-162e-fdd6-b771-df437fea46e0@oracle.com> Message-ID: <767b492f-f2f7-ba83-308f-f3d2950fe930@oracle.com> Hi Vladimir, Updated version: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/ What's changed: * Added more comments * Fixed missing cases (Repl4B_imm() and Repl8B_imm) * Double-checked that there are no other missing cases left: http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/repl.txt I'd like to reiterate that I deliberately don't want to spend too much time polishing the current version because I consider it as an interim point and not as the optimal and desired one (though it's definitely much better than where we started both in number of instructions and clarity). The shape I want to see in the next couple of iterations is the following: (1) all the operation implementations encapsulated in MacroAssembler * CPU dispatching will happen there, not in AD file; (2) get rid of vec vs legVec separation as much as possible * one of the ways to fix it is to introduce additional operand types (for example, vecBW == {legVec when avx512() && !avx512bw(); vec, otherwise}) It would turn current ReplB_reg & ReplB_reg_leg into: instruct ReplB_reg(vecBW dst, scalar src) %{ match(Set dst (ReplicateB src)); format %{ "replicate $dst,$src" %} ins_encode %{ uint vlen = vector_length(this); __ replicate_byte($dst$$XMMRegister, $src$$Register, vlen); %} %} where MacroAssembler::replicate_byte() hides all the dispatching logic against CPU capabilities and vector length. Moreover, it opens additional merging opportunities. As an example: instruct ReplBS_reg(vecBW dst, rRegI src) %{ match(Set dst (ReplicateB src)); match(Set dst (ReplicateS src)); format %{ "replicate $dst,$src" %} ins_encode %{ uint vlen = vector_length(this); switch (ideal_Opcode()) { case Op_ReplicateB: __ replicate_byte($dst$$XMMRegister, $src$$Register, vlen); break; case Op_ReplicateS: __ replicate_short($dst$$XMMRegister, $src$$Register, vlen); break; default: ShouldNotReachHere(); } %} ins_pipe( pipe_slow ); %} If we agree that this is the direction we want to move, splitting instructions is counter-productive and just pushes more work for later iterations. Same applies to dispatching logic on CPU & vector length. Best regards, Vladimir Ivanov On 14.12.2019 01:18, Vladimir Kozlov wrote: > On 12/13/19 2:27 AM, Vladimir Ivanov wrote: >> Thanks for the feedback, Vladimir. >> >>> replicateB >>> >>> Can you fold it differently? >>> >>> ReplB_reg_leg >> >> Are you talking about ReplB_reg? > > Yes, I was talking about ReplB_reg. I thought we can combine all [8-64] > length vectors but I missed that ReplB_reg_leg uses legVec and needs > separate instructions :( > > And I wanted to separate instruction which use avx 512 (evpbroadcastb) > because it is difficult to see relation between predicate condition > (length() == 64 && VM_Version::supports_avx512bw()) and check in code > (vlen == 64 || VM_Version::supports_avx512vlbw()). First, || vs &&. > Second, avx512bw vs avx512vlbw. 
May be better to have a separate > instruction for this. > >> >>> ?? predicate(!VM_Version::supports_avx512vlbw()); >> >> 3151 instruct ReplB_reg_leg(legVec dst, rRegI src) %{ >> 3152?? predicate(n->as_Vector()->length() == 64 && >> !VM_Version::supports_avx512bw()); >> >> For ReplB_reg_leg the predicate can't be simplified: it is applicable >> only to 512bit vector when AVX512BW is absent. Otherwise, legVec >> constraint will be unnecessarily applied to other configurations. > > That is why you replaced !avx512vlbw with !avx512bw? > May be this section of code need comment which explains why one or an > other is used. > >> >> 3119 instruct ReplB_reg(vec dst, rRegI src) %{ >> 3120?? predicate((n->as_Vector()->length() <= 32) || >> 3121???????????? (n->as_Vector()->length() == 64 && >> VM_Version::supports_avx512bw())); >> >> For ReplB_reg there's a shorter version: >> >> predicate(n->as_Vector()->length() <= 32 || >> VM_Version::supports_avx512bw()); >> >> >> But do you find it easier to read? For example, when you are checking >> that all configurations are covered: >> >> predicate(n->as_Vector()->length() <= 32 || >> ?????????? VM_Version::supports_avx512bw()); >> >> instruct ReplB_reg_leg(legVec dst, rRegI src) %{ >> ?? predicate(n->as_Vector()->length() == 64 && >> ??????????? !VM_Version::supports_avx512bw()); >> >> vs >> >> instruct ReplB_reg_leg(legVec dst, rRegI src) %{ >> ?? predicate(n->as_Vector()->length() == 64 && >> !VM_Version::supports_avx512bw()); >> >> instruct ReplB_reg(vec dst, rRegI src) %{ >> ?? predicate((n->as_Vector()->length() <= 32) || >> ???????????? (n->as_Vector()->length() == 64 && >> VM_Version::supports_avx512bw())); > > I think next conditions in predicates require comment about using > avx512vlbw and avx512bw. > > !?? predicate((n->as_Vector()->length() <= 32 && > VM_Version::supports_avx512vlbw()) || > !???????????? (n->as_Vector()->length() == 64 && > VM_Version::supports_avx512bw())); > >> >> >>> ?? ins_encode %{ >>> ???? uint vlen = vector_length(this); >>> ???? __ movdl($dst$$XMMRegister, $src$$Register); >>> ???? __ punpcklbw($dst$$XMMRegister, $dst$$XMMRegister); >>> ???? __ pshuflw($dst$$XMMRegister, $dst$$XMMRegister, 0x00); >>> ???? if (vlen > 8) { >>> ?????? __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); >>> ?????? if (vlen > 16) { >>> ???????? __ vinserti128_high($dst$$XMMRegister, $dst$$XMMRegister); >>> ???????? if (vlen > 32) { >>> ?????????? assert(vlen == 64, "sanity"); >>> ?????????? __ vinserti64x4($dst$$XMMRegister, $dst$$XMMRegister, >>> $dst$$XMMRegister, 0x1); >> >> Yes, it should work as well. Do you find it easier to read though? > > Code is smaller. > >> >>> Similar ReplB_imm_leg for which I don't see new implementation. >> >> Good catch. Added it back. >> >> (FTR completeness for reg2reg variants (_reg*) is mandatory. But for >> _mem and _imm it is optional: if they some configuration isn't >> covered, _reg is used. But it was in original code, so I added it back.) > > I don't see code which was in Repl4B_imm() and Repl8B_imm() (only movdl > and movq without vpbroadcastb). > >> >> Updated version: >> ?? 
http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00 > > Should be webrev.01 > >> >> Another thing I noticed is that for ReplI/.../ReplD cases avx512vl >> checks are not necessary: >> >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.01/broadcast_vl/ >> >> >> The code assumes that evpbroadcastd/evpbroadcastq, >> vpbroadcastdvpbroadcastq, vpbroadcastss/vpbroadcastsd need AVX512VL >> for 512bit case, but Intel manual says AVX512F is enough. >> >> I plan to handle it as a separate change, but let me know if you want >> to incorporate it into 8235825. > > Yes, lets do it separately. > >> >>> It should also simplify code for avx512 which one or 2 instructions. >> >> Can you elaborate, please? Are you talking about the case when version >> for different vector sizes differ in 1-2 instructions (like ReplB_reg)? > No, I was talking about cases when evpbroadcastb and vpbroadcastb > instructions are used. I was think to have them in separate > instructions. In your latest version it would be only evpbroadcastb case > from ReplB_reg(). > > Thanks, > Vladimir > >> >>> Other types changes can be done same way. >> >> Best regards, >> Vladimir Ivanov >> >>> On 12/12/19 3:19 AM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/ >>>> https://bugs.openjdk.java.net/browse/JDK-8235825 >>>> >>>> Merge AD instructions for the following vector nodes: >>>> ?? - ReplicateB, ..., ReplicateD >>>> >>>> Individual patches: >>>> >>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual >>>> >>>> >>>> Testing: tier1-4, test run on different CPU flavors (KNL, CLX) >>>> >>>> Contributed-by: Jatin Bhateja >>>> Reviewed-by: vlivanov, sviswanathan, ? >>>> >>>> Best regards, >>>> Vladimir Ivanov From david.holmes at oracle.com Tue Dec 17 07:03:00 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 17 Dec 2019 17:03:00 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> Message-ID: <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> David On 17/12/2019 4:57 pm, David Holmes wrote: > Hi Richard, > > On 14/12/2019 5:01 am, Reingruber, Richard wrote: >> Hi David, >> >> ?? > Some further queries/concerns: >> ?? > >> ?? > src/hotspot/share/runtime/objectMonitor.cpp >> ?? > >> ?? > Can you please explain the changes to ObjectMonitor::wait: >> ?? > >> ?? > !?? _recursions = save????? // restore the old recursion count >> ?? > !???????????????? + jt->get_and_reset_relock_count_after_wait(); // >> ?? > increased by the deferred relock count >> ?? > >> ?? > what is the "deferred relock count"? I gather it relates to >> ?? > >> ?? > "The code was extended to be able to deoptimize objects of a >> frame that >> ?? > is not the top frame and to let another thread than the owning >> thread do >> ?? > it." >> >> Yes, these relate. Currently EA based optimizations are reverted, when >> a compiled frame is replaced >> with corresponding interpreter frames. Part of this is relocking >> objects with eliminated >> locking. New with the enhancement is that we do this also just before >> object references are acquired >> through JVMTI. In this case we deoptimize also the owning compiled >> frame C and we register >> deoptimized objects as deferred updates. 
When control returns to C it >> gets deoptimized, we notice >> that objects are already deoptimized (reallocated and relocked), so we >> don't do it again (relocking >> twice would be incorrect of course). Deferred updates are copied into >> the new interpreter frames. >> >> Problem: relocking is not possible if the target thread T is waiting >> on the monitor that needs to be >> relocked. This happens only with non-local objects with >> EliminateNestedLocks. Instead relocking is >> deferred until T owns the monitor again. This is what the piece of >> code above does. > > Sorry I need some more detail here. How can you wait() on an object > monitor if the object allocation and/or locking was optimised away? And > what is a "non-local object" in this context? Isn't EA restricted to > thread-confined objects? > > Is it just that some of the locking gets optimized away e.g. > > synchronised(obj) { > ? synchronised(obj) { > ??? synchronised(obj) { > ????? obj.wait(); > ??? } > ? } > } > > If this is reduced to a form as-if it were a single lock of the monitor > (due to EA) and the wait() triggers a JVM TI event which leads to the > escape of "obj" then we need to reconstruct the true lock state, and so > when the wait() internally unblocks and reacquires the monitor it has to > set the true recursion count to 3, not the 1 that it appeared to be when > wait() was initially called. Is that the scenario? > > If so I find this truly awful. Anyone using wait() in a realistic form > requires a notification and so the object cannot be thread confined. In > which case I would strongly argue that upon hitting the wait() the deopt > should occur unconditionally and so the lock state is correct before we > wait and so we don't need to mess with the recursion count internally > when we reacquire the monitor. > >> >> ?? > which I don't like the sound of at all when it comes to >> ObjectMonitor >> ?? > state. So I'd like to understand in detail exactly what is going >> on here >> ?? > and why.? This is a very intrusive change that seems to badly break >> ?? > encapsulation and impacts future changes to ObjectMonitor that >> are under >> ?? > investigation. >> >> I would not regard this as breaking encapsulation. Certainly not badly. >> >> I've added a property relock_count_after_wait to JavaThread. The >> property is well >> encapsulated. Future ObjectMonitor implementations have to deal with >> recursion too. They are free in >> choosing a way to do that as long as that property is taken into >> account. This is hardly a >> limitation. > > I do think this badly breaks encapsulation as you have to add a callout > from the guts of the ObjectMonitor code to reach into the thread to get > this lock count adjustment. I understand why you have had to do this but > I would much rather see a change to the EA optimisation strategy so that > this is not needed. > >> Note also that the property is a straight forward extension of the >> existing concept of deferred >> local updates. It is embedded into the structure holding them. So not >> even the footprint of a >> JavaThread is enlarged if no deferred updates are generated. >> >> ?? > --- >> ?? > >> ?? > src/hotspot/share/runtime/thread.cpp >> ?? > >> ?? > Can you please explain why >> JavaThread::wait_for_object_deoptimization >> ?? > has to be handcrafted in this way rather than using proper >> transitions. >> ?? > >> >> I wrote wait_for_object_deoptimization taking >> JavaThread::java_suspend_self_with_safepoint_check >> as template. 
So in short: for the same reasons :) >> >> Threads reach both methods as part of thread state transitions, >> therefore special handling is >> required to change thread state on top of ongoing transitions. >> >> ?? > We got rid of "deopt suspend" some time ago and it is disturbing >> to see >> ?? > it being added back (effectively). This seems like it may be >> something >> ?? > that handshakes could be used for. >> >> Deopt suspend used to be something rather different with a similar >> name[1]. It is not being added back. > > I stand corrected. Despite comments in the code to the contrary > deopt_suspend didn't actually cause a self-suspend. I was doing a lot of > cleanup in this area 13 years ago :) > >> >> I'm actually duplicating the existing external suspend mechanism, >> because a thread can be suspended >> at most once. And hey, and don't like that either! But it seems not >> unlikely that the duplicate can >> be removed together with the original and the new type of handshakes >> that will be used for >> thread suspend can be used for object deoptimization too. See today's >> discussion in JDK-8227745 [2]. > > I hope that discussion bears some fruit, at the moment it seems not to > be possible to use handshakes here. :( > > The external suspend mechanism is a royal pain in the proverbial that we > have to carefully live with. The idea that we're duplicating that for > use in another fringe area of functionality does not thrill me at all. > > To be clear, I understand the problem that exists and that you wish to > solve, but for the runtime parts I balk at the complexity cost of > solving it. > > Thanks, > David > ----- > >> Thanks, Richard. >> >> [1] Deopt suspend was something like an async. handshake for >> architectures with register windows, >> ???? where patching the return pc for deoptimization of a compiled >> frame was racy if the owner thread >> ???? was in native code. Instead a "deopt" suspend flag was set on >> which the thread patched its own >> ???? frame upon return from native. So no thread was suspended. It got >> its name only from the name of >> ???? the flags. >> >> [2] Discussion about using handshakes to sync. with the target thread: >> >> https://bugs.openjdk.java.net/browse/JDK-8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14306727 >> >> >> -----Original Message----- >> From: David Holmes >> Sent: Freitag, 13. Dezember 2019 00:56 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net; >> hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >> Performance in the Presence of JVMTI Agents >> >> Hi Richard, >> >> Some further queries/concerns: >> >> src/hotspot/share/runtime/objectMonitor.cpp >> >> Can you please explain the changes to ObjectMonitor::wait: >> >> !?? _recursions = save????? // restore the old recursion count >> !???????????????? + jt->get_and_reset_relock_count_after_wait(); // >> increased by the deferred relock count >> >> what is the "deferred relock count"? I gather it relates to >> >> "The code was extended to be able to deoptimize objects of a frame that >> is not the top frame and to let another thread than the owning thread do >> it." >> >> which I don't like the sound of at all when it comes to ObjectMonitor >> state. So I'd like to understand in detail exactly what is going on here >> and why.? 
This is a very intrusive change that seems to badly break >> encapsulation and impacts future changes to ObjectMonitor that are under >> investigation. >> >> --- >> >> src/hotspot/share/runtime/thread.cpp >> >> Can you please explain why JavaThread::wait_for_object_deoptimization >> has to be handcrafted in this way rather than using proper transitions. >> >> We got rid of "deopt suspend" some time ago and it is disturbing to see >> it being added back (effectively). This seems like it may be something >> that handshakes could be used for. >> >> Thanks, >> David >> ----- >> >> On 12/12/2019 7:02 am, David Holmes wrote: >>> On 12/12/2019 1:07 am, Reingruber, Richard wrote: >>>> Hi David, >>>> >>>> ??? > Most of the details here are in areas I can comment on in detail, >>>> but I >>>> ??? > did take an initial general look at things. >>>> >>>> Thanks for taking the time! >>> >>> Apologies the above should read: >>> >>> "Most of the details here are in areas I *can't* comment on in detail >>> ..." >>> >>> David >>> >>>> ??? > The only thing that jumped out at me is that I think the >>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. >>>> ??? > >>>> ??? > +? bool is_hidden_from_external_view() const { return true; } >>>> >>>> Yes, it should. Will add the method like above. >>>> >>>> ??? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>> Without >>>> ??? > active testing this will just bit-rot. >>>> >>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>> workload. I will add a minimal test >>>> to keep it fresh. >>>> >>>> ??? > Also on the tests I don't understand your @requires clause: >>>> ??? > >>>> ??? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>> ??? > (vm.opt.TieredCompilation != true)) >>>> ??? > >>>> ??? > This seems to require that TieredCompilation is disabled, but >>>> tiered is >>>> ??? > our normal mode of operation. ?? >>>> ??? > >>>> >>>> I removed the clause. I guess I wanted to target the tests towards the >>>> code they are supposed to >>>> test, and it's easier to analyze failures w/o tiered compilation and >>>> with just one compiler thread. >>>> >>>> Additionally I will make use of >>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>> >>>> Thanks, >>>> Richard. >>>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>> To: Reingruber, Richard ; >>>> serviceability-dev at openjdk.java.net; >>>> hotspot-compiler-dev at openjdk.java.net; >>>> hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>> Performance in the Presence of JVMTI Agents >>>> >>>> Hi Richard, >>>> >>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>> Hi, >>>>> >>>>> I would like to get reviews please for >>>>> >>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>> >>>>> Corresponding RFE: >>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>> >>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>> >>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>> issues (thanks!). In addition the >>>>> change is being tested at SAP since I posted the first RFR some >>>>> months ago. 
>>>>> >>>>> The intention of this enhancement is to benefit performance wise from >>>>> escape analysis even if JVMTI >>>>> agents request capabilities that allow them to access local variable >>>>> values. E.g. if you start-up >>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>> escape analysis is disabled right >>>>> from the beginning, well before a debugger attaches -- if ever one >>>>> should do so. With the >>>>> enhancement, escape analysis will remain enabled until and after a >>>>> debugger attaches. EA based >>>>> optimizations are reverted just before an agent acquires the >>>>> reference to an object. In the JBS item >>>>> you'll find more details. >>>> >>>> Most of the details here are in areas I can comment on in detail, but I >>>> did take an initial general look at things. >>>> >>>> The only thing that jumped out at me is that I think the >>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>> >>>> +? bool is_hidden_from_external_view() const { return true; } >>>> >>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>> Without >>>> active testing this will just bit-rot. >>>> >>>> Also on the tests I don't understand your @requires clause: >>>> >>>> ??? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>> (vm.opt.TieredCompilation != true)) >>>> >>>> This seems to require that TieredCompilation is disabled, but tiered is >>>> our normal mode of operation. ?? >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>> >>>>> >>>>> From tobias.hartmann at oracle.com Tue Dec 17 07:39:42 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 17 Dec 2019 08:39:42 +0100 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> Message-ID: Hi Jatin, On 16.12.19 16:42, Bhateja, Jatin wrote: > Please find below the updated patch with the test case. > > http://cr.openjdk.java.net/~jbhateja/8230185/webrev.02/ Thanks for adding the test. Some comments: - We try to avoid bug ids in test names. I would suggest a more descriptive name like "TestIrreducibleLoopWithVNNI". - Please also add the test to package compiler.loopopts - For Java code, we use 4 whitespace indentation - The variable 'c' is not used in 'mainTest' Thanks, Tobias From richard.reingruber at sap.com Tue Dec 17 10:24:53 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 17 Dec 2019 10:24:53 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <9f24ec2c-d737-f9b7-8821-5905264971a7@oracle.com> References: <2fbd9ac0-1cb0-98c4-c2c4-becd1cf49fef@oracle.com> <9f24ec2c-d737-f9b7-8821-5905264971a7@oracle.com> Message-ID: Hi Robbin, > Sorry I don't immediately see what issue there is in doing a handshake > instead of: > VM_GetOwnedMonitorInfo op(this, calling_thread, java_thread, > owned_monitors_list); VM_GetOwnedMonitorInfo /can/ be replaced by a handshake, but the calling_thread T1 needs to walk java_thread T2's stack /before/ to reallocate and relock objects, because the GC interface does not allow the VMThread to allocate from the java heap. T1: 1. reallocate scalar replaced objects of T2 // not possible as part of handshake/vmop, // because GC interface does not allow VMThread // to allocate from heap 2. 
execute VM_GetOwnedMonitorInfo() or equivalent handshake while T2 is /not/ pushing new frames with EA based optimizations. > > > > 1. L13: is wait_until_eb_stopped to be called by T1 to wait until T2 cannot move anymore? > > > > 2. Handshakes between two threads are synchronous, correct? If so, then T1 will block handshaking > > T2, because either T2 or the VMThread will block in L10. > > Yes, sorry, I forgot/confused myself about asynch handshake. > (I have a test prototype for that, which removes suspend flag) > > > > > I cannot figure out, how you mean this. Only if a helper thread H would handshake T2 then T1 could > > continue and call wait_until_eb_stopped(). But returning from there T1 would block if reallocating > > objects triggers GC or attempting to execute the vm operation in > > JvmtiEnv::GetOwnedMonitorStackDepthInfo(). > > > > It might be impossible to replace my suspend flag with handshakes that are available today, because > > if it was you could replace all the suspend flags right away, couldn't you? > > So adding asynch handshakes and a per thread handshake queue, we can. > (which this test prototype does) Yes, should work for my use case too. > The issue I'm thinking of is if we need selective polling first. > Suspend flags are not checked in every transition, e.g. vm->native. > A JVM TI agent don't expect to suspend it's own thread when suspending > all threads. > (that thread would be suspended when trying to get back to agent code > when it does vm->native transition) Note that JVM TI doesn't offer "suspending all threads" directly. It offers SuspendThreadList [1] which can be used to self-suspend: "If the calling thread is specified in the request_list array, this function will not return until some other thread resumes it" Thanks, Richard. [1] https://docs.oracle.com/en/java/javase/13/docs/specs/jvmti.html#SuspendThreadList -----Original Message----- From: Robbin Ehn Sent: Montag, 16. Dezember 2019 18:21 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, On 2019-12-16 14:41, Reingruber, Richard wrote: > Hi Robbin, > > first of all: thanks a lot for providing feedback. I do appreciate it. > > I am absolutely willing to move this to handshakes. Only I still can't see how to achieve it. > > Could you explain the drafted class EscapeBarrierSuspendHandshake a little bit? [1] > > I'd like to look at it by example of JvmtiEnv::GetOwnedMonitorStackDepthInfo() where calling_thread > T1 would apply it on another thread T2. Sorry I don't immediately see what issue there is in doing a handshake instead of: VM_GetOwnedMonitorInfo op(this, calling_thread, java_thread, owned_monitors_list); > > 1. L13: is wait_until_eb_stopped to be called by T1 to wait until T2 cannot move anymore? > > 2. Handshakes between two threads are synchronous, correct? If so, then T1 will block handshaking > T2, because either T2 or the VMThread will block in L10. Yes, sorry, I forgot/confused myself about asynch handshake. (I have a test prototype for that, which removes suspend flag) > > I cannot figure out, how you mean this. Only if a helper thread H would handshake T2 then T1 could > continue and call wait_until_eb_stopped(). 
But returning from there T1 would block if reallocating > objects triggers GC or attempting to execute the vm operation in > JvmtiEnv::GetOwnedMonitorStackDepthInfo(). > > It might be impossible to replace my suspend flag with handshakes that are available today, because > if it was you could replace all the suspend flags right away, couldn't you? So adding asynch handshakes and a per thread handshake queue, we can. (which this test prototype does) The issue I'm thinking of is if we need selective polling first. Suspend flags are not checked in every transition, e.g. vm->native. A JVM TI agent don't expect to suspend it's own thread when suspending all threads. (that thread would be suspended when trying to get back to agent code when it does vm->native transition) > > Or I'm simply missing something... quite possible... :) No I think you got it right. Thanks, Robbin > > Thanks, Richard. > > [1] Drafted by Robbin (thanks!) > > 1 class EscapeBarrierSuspendHandshake : public HandshakeClosure { > 2 Semaphore _is_waiting; > 3 Semaphore _wait; > 4 bool _started; > 5 public: > 6 EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), > 7 _wait(0), _started(false) { } > 8 void do_thread(Thread* th) { > 9 _is_waiting.signal(); > 10 _wait.wait(); > 11 Atomic::store(&_started, true); > 12 } > 13 void wait_until_eb_stopped() { _is_waiting.wait(); } > 14 void start_thread() { > 15 _wait.signal(); > 16 while(!Atomic::load(&_started)) { > 17 os::naked_yield(); > 18 } > 19 } > 20 }; > > -----Original Message----- > From: Robbin Ehn > Sent: Montag, 16. Dezember 2019 11:21 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi Richard, as mentioned it would be better if you could do this with > handshakes, instead of using _suspend_flag (since they are going away). > But I can't think of a way doing it without blocking safepoints, so we need to > add some more features in handshakes first. > When possible I hope you are willing to move this code to handshakes instead. > > You could stop one thread with, e.g.: > class EscapeBarrierSuspendHandshake : public HandshakeClosure { > Semaphore _is_waiting; > Semaphore _wait; > bool _started; > public: > EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), > _wait(0), _started(false) { } > void do_thread(Thread* th) { > _is_waiting.signal(); > _wait.wait(); > Atomic::store(&_started, true); > } > void wait_until_eb_stopped() { _is_waiting.wait(); } > void start_thread() { > _wait.signal(); > while(!Atomic::load(&_started)) { > os::naked_yield(); > } > } > }; > > But it would block safepoints. > > Thanks, Robbin > > On 12/10/19 10:45 PM, Reingruber, Richard wrote: >> Hi, >> >> I would like to get reviews please for >> >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >> >> Corresponding RFE: >> https://bugs.openjdk.java.net/browse/JDK-8227745 >> >> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >> >> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the >> change is being tested at SAP since I posted the first RFR some months ago. 
>> >> The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI >> agents request capabilities that allow them to access local variable values. E.g. if you start-up >> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right >> from the beginning, well before a debugger attaches -- if ever one should do so. With the >> enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based >> optimizations are reverted just before an agent acquires the reference to an object. In the JBS item >> you'll find more details. >> >> Thanks, >> Richard. >> >> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >> From robbin.ehn at oracle.com Tue Dec 17 11:01:03 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 17 Dec 2019 12:01:03 +0100 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <2fbd9ac0-1cb0-98c4-c2c4-becd1cf49fef@oracle.com> <9f24ec2c-d737-f9b7-8821-5905264971a7@oracle.com> Message-ID: <08d4f482-36a0-6499-0546-2a888da2a094@oracle.com> Hi Richard, On 12/17/19 11:24 AM, Reingruber, Richard wrote: > > So adding asynch handshakes and a per thread handshake queue, we can. > > (which this test prototype does) > > Yes, should work for my use case too. Great. > > > The issue I'm thinking of is if we need selective polling first. > > Suspend flags are not checked in every transition, e.g. vm->native. > > A JVM TI agent don't expect to suspend it's own thread when suspending > > all threads. > > (that thread would be suspended when trying to get back to agent code > > when it does vm->native transition) > > Note that JVM TI doesn't offer "suspending all threads" directly. It offers SuspendThreadList [1] > which can be used to self-suspend: "If the calling thread is specified in the request_list array, > this function will not return until some other thread resumes it" Maybe there is a test-bug here or it was more complicated scenario. I have to investigate, but suspending threads in all transitions causes a chunk of test failure in jdi/jvmti. The issue was suspending threads going vm->native (back to agent code). Thanks, Robbin > > Thanks, Richard. > > [1] https://docs.oracle.com/en/java/javase/13/docs/specs/jvmti.html#SuspendThreadList > > -----Original Message----- > From: Robbin Ehn > Sent: Montag, 16. Dezember 2019 18:21 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi Richard, > > On 2019-12-16 14:41, Reingruber, Richard wrote: >> Hi Robbin, >> >> first of all: thanks a lot for providing feedback. I do appreciate it. >> >> I am absolutely willing to move this to handshakes. Only I still can't see how to achieve it. >> >> Could you explain the drafted class EscapeBarrierSuspendHandshake a little bit? [1] >> >> I'd like to look at it by example of JvmtiEnv::GetOwnedMonitorStackDepthInfo() where calling_thread >> T1 would apply it on another thread T2. > > Sorry I don't immediately see what issue there is in doing a handshake > instead of: > VM_GetOwnedMonitorInfo op(this, calling_thread, java_thread, > owned_monitors_list); > >> >> 1. 
L13: is wait_until_eb_stopped to be called by T1 to wait until T2 cannot move anymore? >> >> 2. Handshakes between two threads are synchronous, correct? If so, then T1 will block handshaking >> T2, because either T2 or the VMThread will block in L10. > > Yes, sorry, I forgot/confused myself about asynch handshake. > (I have a test prototype for that, which removes suspend flag) > >> >> I cannot figure out, how you mean this. Only if a helper thread H would handshake T2 then T1 could >> continue and call wait_until_eb_stopped(). But returning from there T1 would block if reallocating >> objects triggers GC or attempting to execute the vm operation in >> JvmtiEnv::GetOwnedMonitorStackDepthInfo(). >> >> It might be impossible to replace my suspend flag with handshakes that are available today, because >> if it was you could replace all the suspend flags right away, couldn't you? > > So adding asynch handshakes and a per thread handshake queue, we can. > (which this test prototype does) > The issue I'm thinking of is if we need selective polling first. > Suspend flags are not checked in every transition, e.g. vm->native. > A JVM TI agent don't expect to suspend it's own thread when suspending > all threads. > (that thread would be suspended when trying to get back to agent code > when it does vm->native transition) > >> >> Or I'm simply missing something... quite possible... :) > > No I think you got it right. > > Thanks, Robbin > >> >> Thanks, Richard. >> >> [1] Drafted by Robbin (thanks!) >> >> 1 class EscapeBarrierSuspendHandshake : public HandshakeClosure { >> 2 Semaphore _is_waiting; >> 3 Semaphore _wait; >> 4 bool _started; >> 5 public: >> 6 EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), >> 7 _wait(0), _started(false) { } >> 8 void do_thread(Thread* th) { >> 9 _is_waiting.signal(); >> 10 _wait.wait(); >> 11 Atomic::store(&_started, true); >> 12 } >> 13 void wait_until_eb_stopped() { _is_waiting.wait(); } >> 14 void start_thread() { >> 15 _wait.signal(); >> 16 while(!Atomic::load(&_started)) { >> 17 os::naked_yield(); >> 18 } >> 19 } >> 20 }; >> >> -----Original Message----- >> From: Robbin Ehn >> Sent: Montag, 16. Dezember 2019 11:21 >> To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents >> >> Hi Richard, as mentioned it would be better if you could do this with >> handshakes, instead of using _suspend_flag (since they are going away). >> But I can't think of a way doing it without blocking safepoints, so we need to >> add some more features in handshakes first. >> When possible I hope you are willing to move this code to handshakes instead. >> >> You could stop one thread with, e.g.: >> class EscapeBarrierSuspendHandshake : public HandshakeClosure { >> Semaphore _is_waiting; >> Semaphore _wait; >> bool _started; >> public: >> EscapeBarrierSuspendHandshake() : HandshakeClosure("EscapeBarrierSuspend"), >> _wait(0), _started(false) { } >> void do_thread(Thread* th) { >> _is_waiting.signal(); >> _wait.wait(); >> Atomic::store(&_started, true); >> } >> void wait_until_eb_stopped() { _is_waiting.wait(); } >> void start_thread() { >> _wait.signal(); >> while(!Atomic::load(&_started)) { >> os::naked_yield(); >> } >> } >> }; >> >> But it would block safepoints. 
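For readers following the thread, a rough sketch of the protocol the drafted closure implies: the
requester arms the handshake, waits until the target is parked in do_thread(), reverts the EA-based
optimizations, and then releases the target. Handshake::execute_async() below is a hypothetical
asynchronous entry point (today's two-thread handshakes are synchronous, which is exactly the
limitation discussed here); none of this is code from the webrev.

  // Illustration only; assumes an asynchronous handshake variant exists.
  void deoptimize_objects_of(JavaThread* target) {
    EscapeBarrierSuspendHandshake hs;
    Handshake::execute_async(&hs, target); // hypothetical: must not block the requester
    hs.wait_until_eb_stopped();            // target is now parked in do_thread()
    // ... reallocate scalar-replaced objects and relock eliminated locks ...
    hs.start_thread();                     // let the target continue
  }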
>> >> Thanks, Robbin >> >> On 12/10/19 10:45 PM, Reingruber, Richard wrote: >>> Hi, >>> >>> I would like to get reviews please for >>> >>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>> >>> Corresponding RFE: >>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>> >>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>> >>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the >>> change is being tested at SAP since I posted the first RFR some months ago. >>> >>> The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI >>> agents request capabilities that allow them to access local variable values. E.g. if you start-up >>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right >>> from the beginning, well before a debugger attaches -- if ever one should do so. With the >>> enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based >>> optimizations are reverted just before an agent acquires the reference to an object. In the JBS item >>> you'll find more details. >>> >>> Thanks, >>> Richard. >>> >>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>> From adinn at redhat.com Tue Dec 17 11:33:40 2019 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 17 Dec 2019 11:33:40 +0000 Subject: [aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset In-Reply-To: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> References: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> Message-ID: <87eb867b-18f6-6b06-7b6b-450391066b7f@redhat.com> On 12/12/2019 19:14, Andrew Haley wrote: > This assertion failure happens because load/store patterns in > aarch64.ad are incorrect. > . . . > Andrew Dinn will probably have a cow when he sees this patch. :-) I am currently suffering birth pangs. I will get back to you with a proper review asap. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From vladimir.x.ivanov at oracle.com Tue Dec 17 12:50:13 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 17 Dec 2019 15:50:13 +0300 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 Message-ID: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-7175279 There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in x86-64 code base. Proposed patch removes the rest and makes x86-64 code x87-free. The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM for JEP 306 [1] and related enhancements [2]. Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). x87 instructions are made available only on x86-32. C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator [2] x86-32-only. 
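As an aside on the java_lang_math_abs case mentioned above: the point of
StubRoutines::x86::double_sign_mask() is that |x| of an IEEE-754 double is just the value with its
sign bit cleared, so a single SSE2 ANDPD against a 0x7FFFFFFFFFFFFFFF mask replaces the old x87
fabs. A minimal C++ sketch of the idea follows; it is an illustration only, not the
template-interpreter code from the webrev.

  #include <cstdint>
  #include <cstring>

  // Clear the IEEE-754 sign bit, which is all "abs" has to do for doubles.
  // This is the idea the ANDPD against double_sign_mask() expresses in the stub.
  static inline double abs_via_sign_mask(double x) {
    uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);      // type-pun safely
    bits &= UINT64_C(0x7FFFFFFFFFFFFFFF);     // drop the sign bit
    std::memcpy(&x, &bits, sizeof bits);
    return x;
  }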
Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8175916 [2] https://bugs.openjdk.java.net/browse/JDK-8136414 [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From jzaugg at gmail.com Tue Dec 17 13:15:42 2019 From: jzaugg at gmail.com (Jason Zaugg) Date: Tue, 17 Dec 2019 23:15:42 +1000 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com> References: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com> Message-ID: On Fri, 13 Dec 2019 at 15:07, John Rose wrote: > I?m intrigued that you are interested in this, and I encourage you to consider > pulling on this string some more. I?ll help you pull if you want. > > I think this is doable, to a degree, as a starter project, to install new parameters > and do initial exploration of their settings. Actually dialing in the settings > and testing them across a range of workloads is a very specialized job, which > few of us are good at, but if the optimization seems to pan out we can find > the necessary kind of expert. > > Your first step, should you choose to accept this mission, would be to join > the community. Your name should be on http://openjdk.java.net/census. > See http://openjdk.java.net/contribute/. Thanks for the encouragement. I've submitted an OCA and will work on a patch. I'm past some initial tooling issues -- I can build an image and run jtreg.. I've added [1] a simple version of the analysis to the existing the flow analysis and hooked this into InlineTree::try_to_inline. I've added a new test in the same manner as inlining/InlineAccessors.java. I'll flesh this out and report back after Christmas. -jason [1] https://github.com/retronym/jdk/pull/1/ From jesper.wilhelmsson at oracle.com Tue Dec 17 14:26:29 2019 From: jesper.wilhelmsson at oracle.com (Jesper Wilhelmsson) Date: Tue, 17 Dec 2019 15:26:29 +0100 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> Message-ID: <1A324EE8-6965-4F76-B55B-D16184DE2ED9@oracle.com> Hi, This is a fairly large (wide spread) change. Is there any risk for conflicts with remaining work in JDK 14? In the interest of keeping forwardports as conflict free as possible, would it make sense to hold this change until the number of changes in 14 has dropped? Thanks, /Jesper > On 17 Dec 2019, at 13:50, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-7175279 > > There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in x86-64 code base. > > Proposed patch removes the rest and makes x86-64 code x87-free. > > The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM for JEP 306 [1] and related enhancements [2]. > > Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). > > x87 instructions are made available only on x86-32. > > C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator [2] x86-32-only. 
> > Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8175916 > > [2] https://bugs.openjdk.java.net/browse/JDK-8136414 > > [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From richard.reingruber at sap.com Tue Dec 17 14:47:37 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 17 Dec 2019 14:47:37 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> Message-ID: Hi David, > > > Some further queries/concerns: > > > > > > src/hotspot/share/runtime/objectMonitor.cpp > > > > > > Can you please explain the changes to ObjectMonitor::wait: > > > > > > ! _recursions = save // restore the old recursion count > > > ! + jt->get_and_reset_relock_count_after_wait(); // > > > increased by the deferred relock count > > > > > > what is the "deferred relock count"? I gather it relates to > > > > > > "The code was extended to be able to deoptimize objects of a > > frame that > > > is not the top frame and to let another thread than the owning > > thread do > > > it." > > > > Yes, these relate. Currently EA based optimizations are reverted, when a compiled frame is > > replaced with corresponding interpreter frames. Part of this is relocking objects with eliminated > > locking. New with the enhancement is that we do this also just before object references are > > acquired through JVMTI. In this case we deoptimize also the owning compiled frame C and we > > register deoptimized objects as deferred updates. When control returns to C it gets deoptimized, > > we notice that objects are already deoptimized (reallocated and relocked), so we don't do it again > > (relocking twice would be incorrect of course). Deferred updates are copied into the new > > interpreter frames. > > > > Problem: relocking is not possible if the target thread T is waiting on the monitor that needs to > > be relocked. This happens only with non-local objects with EliminateNestedLocks. Instead relocking > > is deferred until T owns the monitor again. This is what the piece of code above does. > > Sorry I need some more detail here. How can you wait() on an object > monitor if the object allocation and/or locking was optimised away? And > what is a "non-local object" in this context? Isn't EA restricted to > thread-confined objects? "Non-local object" is an object that escapes its thread. The issue I'm addressing with the changes in ObjectMonitor::wait are almost unrelated to EA. They are caused by EliminateNestedLocks, where C2 eliminates recursive locking of an already owned lock. The lock owning object exists on the heap, it is locked and you can call wait() on it. EliminateLocks is the C2 option that controls lock elimination based on EA. Both optimizations have in common that objects with eliminated locking need to be relocked when deoptimizing a frame, i.e. when replacing a compiled frame with equivalent interpreter frames. Deoptimization::relock_objects does that job for /all/ eliminated locks in scope. /All/ can be a mix of eliminated nested locks and locks of not-escaping objects. New with the enhancement: I call relock_objects earlier, just before objects pontentially escape. 
But then later when the owning compiled frame gets deoptimized, I must not do it again: See call to EscapeBarrier::objs_are_deoptimized in deoptimization.cpp: 373 if ((jvmci_enabled || ((DoEscapeAnalysis || EliminateNestedLocks) && EliminateLocks)) 374 && !EscapeBarrier::objs_are_deoptimized(thread, deoptee.id())) { 375 bool unused; 376 eliminate_locks(thread, chunk, realloc_failures, deoptee, exec_mode, unused); 377 } Now when calling relock_objects early it is quiet possible that I have to relock an object the target thread currently waits for. Obviously I cannot relock in this case, instead I chose to introduce relock_count_after_wait to JavaThread. > Is it just that some of the locking gets optimized away e.g. > > synchronised(obj) { > synchronised(obj) { > synchronised(obj) { > obj.wait(); > } > } > } > > If this is reduced to a form as-if it were a single lock of the monitor > (due to EA) and the wait() triggers a JVM TI event which leads to the > escape of "obj" then we need to reconstruct the true lock state, and so > when the wait() internally unblocks and reacquires the monitor it has to > set the true recursion count to 3, not the 1 that it appeared to be when > wait() was initially called. Is that the scenario? Kind of... except that the locking is not eliminated due to EA and there is no JVM TI event triggered by wait. Add LocalObject l1 = new LocalObject(); in front of the synchrnized blocks and assume a JVM TI agent acquires l1. This triggers the code in question. See that relocking/reallocating is transactional. If it is done then for /all/ objects in scope and it is done at most once. It wouldn't be quite so easy to split this in relocking of nested/EA-based eliminated locks. > If so I find this truly awful. Anyone using wait() in a realistic form > requires a notification and so the object cannot be thread confined. In It is not thread confined. > which case I would strongly argue that upon hitting the wait() the deopt > should occur unconditionally and so the lock state is correct before we > wait and so we don't need to mess with the recursion count internally > when we reacquire the monitor. > > > > > > which I don't like the sound of at all when it comes to ObjectMonitor > > > state. So I'd like to understand in detail exactly what is going on here > > > and why. This is a very intrusive change that seems to badly break > > > encapsulation and impacts future changes to ObjectMonitor that are under > > > investigation. > > > > I would not regard this as breaking encapsulation. Certainly not badly. > > > > I've added a property relock_count_after_wait to JavaThread. The property is well > > encapsulated. Future ObjectMonitor implementations have to deal with recursion too. They are free > > in choosing a way to do that as long as that property is taken into account. This is hardly a > > limitation. > > I do think this badly breaks encapsulation as you have to add a callout > from the guts of the ObjectMonitor code to reach into the thread to get > this lock count adjustment. I understand why you have had to do this but > I would much rather see a change to the EA optimisation strategy so that > this is not needed. > > > Note also that the property is a straight forward extension of the existing concept of deferred > > local updates. It is embedded into the structure holding them. So not even the footprint of a > > JavaThread is enlarged if no deferred updates are generated. > > [...] 
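As a compact summary of the bookkeeping described above, here is a minimal sketch of the deferred
relock counter. The accessor name follows the quoted patch; the simplified bodies are an
illustration, not the actual webrev code (where the counter lives on JavaThread next to the other
deferred updates).

  // Sketch only: relocks that had to be postponed because the target thread
  // was waiting on the monitor in question.
  class DeferredRelockCount {
    int _relock_count_after_wait = 0;
  public:
    void inc_relock_count_after_wait() { ++_relock_count_after_wait; }
    int  get_and_reset_relock_count_after_wait() {
      int count = _relock_count_after_wait;
      _relock_count_after_wait = 0;   // consumed exactly once, on monitor reacquisition
      return count;
    }
  };

  // ObjectMonitor::wait() then restores the recursion count as
  //   _recursions = save + jt->get_and_reset_relock_count_after_wait();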
> > > > > I'm actually duplicating the existing external suspend mechanism, because a thread can be > > suspended at most once. And hey, and don't like that either! But it seems not unlikely that the > > duplicate can be removed together with the original and the new type of handshakes that will be > > used for thread suspend can be used for object deoptimization too. See today's discussion in > > JDK-8227745 [2]. > > I hope that discussion bears some fruit, at the moment it seems not to > be possible to use handshakes here. :( > > The external suspend mechanism is a royal pain in the proverbial that we > have to carefully live with. The idea that we're duplicating that for > use in another fringe area of functionality does not thrill me at all. > > To be clear, I understand the problem that exists and that you wish to > solve, but for the runtime parts I balk at the complexity cost of > solving it. I know it's complex, but by far no rocket science. Also I find it hard to imagine another fix for JDK-8233915 besides changing the JVM TI specification. Thanks, Richard. -----Original Message----- From: David Holmes Sent: Dienstag, 17. Dezember 2019 08:03 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; Vladimir Kozlov (vladimir.kozlov at oracle.com) Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents David On 17/12/2019 4:57 pm, David Holmes wrote: > Hi Richard, > > On 14/12/2019 5:01 am, Reingruber, Richard wrote: >> Hi David, >> >> ?? > Some further queries/concerns: >> ?? > >> ?? > src/hotspot/share/runtime/objectMonitor.cpp >> ?? > >> ?? > Can you please explain the changes to ObjectMonitor::wait: >> ?? > >> ?? > !?? _recursions = save????? // restore the old recursion count >> ?? > !???????????????? + jt->get_and_reset_relock_count_after_wait(); // >> ?? > increased by the deferred relock count >> ?? > >> ?? > what is the "deferred relock count"? I gather it relates to >> ?? > >> ?? > "The code was extended to be able to deoptimize objects of a >> frame that >> ?? > is not the top frame and to let another thread than the owning >> thread do >> ?? > it." >> >> Yes, these relate. Currently EA based optimizations are reverted, when >> a compiled frame is replaced >> with corresponding interpreter frames. Part of this is relocking >> objects with eliminated >> locking. New with the enhancement is that we do this also just before >> object references are acquired >> through JVMTI. In this case we deoptimize also the owning compiled >> frame C and we register >> deoptimized objects as deferred updates. When control returns to C it >> gets deoptimized, we notice >> that objects are already deoptimized (reallocated and relocked), so we >> don't do it again (relocking >> twice would be incorrect of course). Deferred updates are copied into >> the new interpreter frames. >> >> Problem: relocking is not possible if the target thread T is waiting >> on the monitor that needs to be >> relocked. This happens only with non-local objects with >> EliminateNestedLocks. Instead relocking is >> deferred until T owns the monitor again. This is what the piece of >> code above does. > > Sorry I need some more detail here. How can you wait() on an object > monitor if the object allocation and/or locking was optimised away? And > what is a "non-local object" in this context? Isn't EA restricted to > thread-confined objects? 
> > Is it just that some of the locking gets optimized away e.g. > > synchronised(obj) { > ? synchronised(obj) { > ??? synchronised(obj) { > ????? obj.wait(); > ??? } > ? } > } > > If this is reduced to a form as-if it were a single lock of the monitor > (due to EA) and the wait() triggers a JVM TI event which leads to the > escape of "obj" then we need to reconstruct the true lock state, and so > when the wait() internally unblocks and reacquires the monitor it has to > set the true recursion count to 3, not the 1 that it appeared to be when > wait() was initially called. Is that the scenario? > > If so I find this truly awful. Anyone using wait() in a realistic form > requires a notification and so the object cannot be thread confined. In > which case I would strongly argue that upon hitting the wait() the deopt > should occur unconditionally and so the lock state is correct before we > wait and so we don't need to mess with the recursion count internally > when we reacquire the monitor. > >> >> ?? > which I don't like the sound of at all when it comes to >> ObjectMonitor >> ?? > state. So I'd like to understand in detail exactly what is going >> on here >> ?? > and why.? This is a very intrusive change that seems to badly break >> ?? > encapsulation and impacts future changes to ObjectMonitor that >> are under >> ?? > investigation. >> >> I would not regard this as breaking encapsulation. Certainly not badly. >> >> I've added a property relock_count_after_wait to JavaThread. The >> property is well >> encapsulated. Future ObjectMonitor implementations have to deal with >> recursion too. They are free in >> choosing a way to do that as long as that property is taken into >> account. This is hardly a >> limitation. > > I do think this badly breaks encapsulation as you have to add a callout > from the guts of the ObjectMonitor code to reach into the thread to get > this lock count adjustment. I understand why you have had to do this but > I would much rather see a change to the EA optimisation strategy so that > this is not needed. > >> Note also that the property is a straight forward extension of the >> existing concept of deferred >> local updates. It is embedded into the structure holding them. So not >> even the footprint of a >> JavaThread is enlarged if no deferred updates are generated. >> >> ?? > --- >> ?? > >> ?? > src/hotspot/share/runtime/thread.cpp >> ?? > >> ?? > Can you please explain why >> JavaThread::wait_for_object_deoptimization >> ?? > has to be handcrafted in this way rather than using proper >> transitions. >> ?? > >> >> I wrote wait_for_object_deoptimization taking >> JavaThread::java_suspend_self_with_safepoint_check >> as template. So in short: for the same reasons :) >> >> Threads reach both methods as part of thread state transitions, >> therefore special handling is >> required to change thread state on top of ongoing transitions. >> >> ?? > We got rid of "deopt suspend" some time ago and it is disturbing >> to see >> ?? > it being added back (effectively). This seems like it may be >> something >> ?? > that handshakes could be used for. >> >> Deopt suspend used to be something rather different with a similar >> name[1]. It is not being added back. > > I stand corrected. Despite comments in the code to the contrary > deopt_suspend didn't actually cause a self-suspend. I was doing a lot of > cleanup in this area 13 years ago :) > >> >> I'm actually duplicating the existing external suspend mechanism, >> because a thread can be suspended >> at most once. 
And hey, and don't like that either! But it seems not >> unlikely that the duplicate can >> be removed together with the original and the new type of handshakes >> that will be used for >> thread suspend can be used for object deoptimization too. See today's >> discussion in JDK-8227745 [2]. > > I hope that discussion bears some fruit, at the moment it seems not to > be possible to use handshakes here. :( > > The external suspend mechanism is a royal pain in the proverbial that we > have to carefully live with. The idea that we're duplicating that for > use in another fringe area of functionality does not thrill me at all. > > To be clear, I understand the problem that exists and that you wish to > solve, but for the runtime parts I balk at the complexity cost of > solving it. > > Thanks, > David > ----- > >> Thanks, Richard. >> >> [1] Deopt suspend was something like an async. handshake for >> architectures with register windows, >> ???? where patching the return pc for deoptimization of a compiled >> frame was racy if the owner thread >> ???? was in native code. Instead a "deopt" suspend flag was set on >> which the thread patched its own >> ???? frame upon return from native. So no thread was suspended. It got >> its name only from the name of >> ???? the flags. >> >> [2] Discussion about using handshakes to sync. with the target thread: >> >> https://bugs.openjdk.java.net/browse/JDK-8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14306727 >> >> >> -----Original Message----- >> From: David Holmes >> Sent: Freitag, 13. Dezember 2019 00:56 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net; >> hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >> Performance in the Presence of JVMTI Agents >> >> Hi Richard, >> >> Some further queries/concerns: >> >> src/hotspot/share/runtime/objectMonitor.cpp >> >> Can you please explain the changes to ObjectMonitor::wait: >> >> !?? _recursions = save????? // restore the old recursion count >> !???????????????? + jt->get_and_reset_relock_count_after_wait(); // >> increased by the deferred relock count >> >> what is the "deferred relock count"? I gather it relates to >> >> "The code was extended to be able to deoptimize objects of a frame that >> is not the top frame and to let another thread than the owning thread do >> it." >> >> which I don't like the sound of at all when it comes to ObjectMonitor >> state. So I'd like to understand in detail exactly what is going on here >> and why.? This is a very intrusive change that seems to badly break >> encapsulation and impacts future changes to ObjectMonitor that are under >> investigation. >> >> --- >> >> src/hotspot/share/runtime/thread.cpp >> >> Can you please explain why JavaThread::wait_for_object_deoptimization >> has to be handcrafted in this way rather than using proper transitions. >> >> We got rid of "deopt suspend" some time ago and it is disturbing to see >> it being added back (effectively). This seems like it may be something >> that handshakes could be used for. >> >> Thanks, >> David >> ----- >> >> On 12/12/2019 7:02 am, David Holmes wrote: >>> On 12/12/2019 1:07 am, Reingruber, Richard wrote: >>>> Hi David, >>>> >>>> ??? > Most of the details here are in areas I can comment on in detail, >>>> but I >>>> ??? > did take an initial general look at things. >>>> >>>> Thanks for taking the time! 
>>> >>> Apologies the above should read: >>> >>> "Most of the details here are in areas I *can't* comment on in detail >>> ..." >>> >>> David >>> >>>> ??? > The only thing that jumped out at me is that I think the >>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. >>>> ??? > >>>> ??? > +? bool is_hidden_from_external_view() const { return true; } >>>> >>>> Yes, it should. Will add the method like above. >>>> >>>> ??? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>> Without >>>> ??? > active testing this will just bit-rot. >>>> >>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>> workload. I will add a minimal test >>>> to keep it fresh. >>>> >>>> ??? > Also on the tests I don't understand your @requires clause: >>>> ??? > >>>> ??? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>> ??? > (vm.opt.TieredCompilation != true)) >>>> ??? > >>>> ??? > This seems to require that TieredCompilation is disabled, but >>>> tiered is >>>> ??? > our normal mode of operation. ?? >>>> ??? > >>>> >>>> I removed the clause. I guess I wanted to target the tests towards the >>>> code they are supposed to >>>> test, and it's easier to analyze failures w/o tiered compilation and >>>> with just one compiler thread. >>>> >>>> Additionally I will make use of >>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>> >>>> Thanks, >>>> Richard. >>>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>> To: Reingruber, Richard ; >>>> serviceability-dev at openjdk.java.net; >>>> hotspot-compiler-dev at openjdk.java.net; >>>> hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>> Performance in the Presence of JVMTI Agents >>>> >>>> Hi Richard, >>>> >>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>> Hi, >>>>> >>>>> I would like to get reviews please for >>>>> >>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>> >>>>> Corresponding RFE: >>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>> >>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>> >>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>> issues (thanks!). In addition the >>>>> change is being tested at SAP since I posted the first RFR some >>>>> months ago. >>>>> >>>>> The intention of this enhancement is to benefit performance wise from >>>>> escape analysis even if JVMTI >>>>> agents request capabilities that allow them to access local variable >>>>> values. E.g. if you start-up >>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>> escape analysis is disabled right >>>>> from the beginning, well before a debugger attaches -- if ever one >>>>> should do so. With the >>>>> enhancement, escape analysis will remain enabled until and after a >>>>> debugger attaches. EA based >>>>> optimizations are reverted just before an agent acquires the >>>>> reference to an object. In the JBS item >>>>> you'll find more details. >>>> >>>> Most of the details here are in areas I can comment on in detail, but I >>>> did take an initial general look at things. >>>> >>>> The only thing that jumped out at me is that I think the >>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>> >>>> +? 
bool is_hidden_from_external_view() const { return true; } >>>> >>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>> Without >>>> active testing this will just bit-rot. >>>> >>>> Also on the tests I don't understand your @requires clause: >>>> >>>> ??? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>> (vm.opt.TieredCompilation != true)) >>>> >>>> This seems to require that TieredCompilation is disabled, but tiered is >>>> our normal mode of operation. ?? >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>> >>>>> >>>>> From vladimir.x.ivanov at oracle.com Tue Dec 17 15:02:38 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 17 Dec 2019 18:02:38 +0300 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <1A324EE8-6965-4F76-B55B-D16184DE2ED9@oracle.com> References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> <1A324EE8-6965-4F76-B55B-D16184DE2ED9@oracle.com> Message-ID: Hi Jesper, > This is a fairly large (wide spread) change. Is there any risk for conflicts with remaining work in JDK 14? > In the interest of keeping forwardports as conflict free as possible, would it make sense to hold this change until the number of changes in 14 has dropped? I consider the risk of merge conflicts as low, since the bulk of changes touch C1 (and it isn't usually changed much). But I'm fine with waiting until the rate of fixes in JDK 14 drops (after RDP2?). Best regards, Vladimir Ivanov >> On 17 Dec 2019, at 13:50, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-7175279 >> >> There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in x86-64 code base. >> >> Proposed patch removes the rest and makes x86-64 code x87-free. >> >> The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM for JEP 306 [1] and related enhancements [2]. >> >> Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). >> >> x87 instructions are made available only on x86-32. >> >> C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator [2] x86-32-only. >> >> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >> >> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >> >> [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp > From jesper.wilhelmsson at oracle.com Tue Dec 17 15:38:55 2019 From: jesper.wilhelmsson at oracle.com (Jesper Wilhelmsson) Date: Tue, 17 Dec 2019 16:38:55 +0100 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> <1A324EE8-6965-4F76-B55B-D16184DE2ED9@oracle.com> Message-ID: <722D0395-9B89-490C-A6C4-5604AFFDA94B@oracle.com> > On 17 Dec 2019, at 16:02, Vladimir Ivanov wrote: > > Hi Jesper, > >> This is a fairly large (wide spread) change. Is there any risk for conflicts with remaining work in JDK 14? 
>> In the interest of keeping forwardports as conflict free as possible, would it make sense to hold this change until the number of changes in 14 has dropped? > > I consider the risk of merge conflicts as low, since the bulk of changes touch C1 (and it isn't usually changed much). > > But I'm fine with waiting until the rate of fixes in JDK 14 drops (after RDP2?). Yes, once we enter RDP2 there shouldn't be many changes left for JDK 14. Thank you! /Jesper > > Best regards, > Vladimir Ivanov > >>> On 17 Dec 2019, at 13:50, Vladimir Ivanov wrote: >>> >>> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-7175279 >>> >>> There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in x86-64 code base. >>> >>> Proposed patch removes the rest and makes x86-64 code x87-free. >>> >>> The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM for JEP 306 [1] and related enhancements [2]. >>> >>> Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). >>> >>> x87 instructions are made available only on x86-32. >>> >>> C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator [2] x86-32-only. >>> >>> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >>> >>> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >>> >>> [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From adinn at redhat.com Tue Dec 17 16:39:41 2019 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 17 Dec 2019 16:39:41 +0000 Subject: [aarch64-port-dev ] RFR: 8235385: AArch64: Crash on aarch64 JDK due to long offset In-Reply-To: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> References: <154c82af-865f-d250-395d-6cb9b41b2780@redhat.com> Message-ID: On 12/12/2019 19:14, Andrew Haley wrote: > In a rather belt-and-braces way I've also added some code that fixes > up illegal addresses. I did consider removing it, but I left it in > because it doesn't hurt. If we ever do generate similar illegal > addresses, debug builds will assert. I'm not sure whether to keep this > or not. > > Andrew Dinn will probably have a cow when he sees this patch. :-) > > OK for HEAD? Well, that wasn't quite so painful as it sounded. Code Review: The repeated warning comments are a very good idea. I spotted two things in aarch64.ad: 1) Range rule: 7115 // Load Range 7116 instruct loadRange(iRegINoSp dst, memory8 mem) 7117 %{ 7118 match(Set dst (LoadRange mem)); 7119 7120 ins_cost(4 * INSN_COST); 7121 format %{ "ldrw $dst, $mem\t# range" %} 7122 7123 ins_encode(aarch64_enc_ldrw(dst, mem)); 7124 7125 ins_pipe(iload_reg_mem); 7126 %} I think that should be memory4? Also, should you maybe change the comment to // Load Range (32 bit signed) 2) Pop count rules The following rule uses 'memory4' to declare the memory op but passes 'sizeof (jfloat)' to the ldrs call. Would the latter not be better passed as 4? (n.b. you use the matching numeric literal as argument to loadStore in the encoding class definitions). 
8161 instruct popCountI_mem(iRegINoSp dst, memory4 mem, vRegF tmp) %{8162 predicate(UsePopCountInstruction); 8163 match(Set dst (PopCountI (LoadI mem))); 8164 effect(TEMP tmp); 8165 ins_cost(INSN_COST * 13); 8166 8167 format %{ "ldrs $tmp, $mem\n\t" 8168 "cnt $tmp, $tmp\t# vector (8B)\n\t" 8169 "addv $tmp, $tmp\t# vector (8B)\n\t" 8170 "mov $dst, $tmp\t# vector (1D)" %} 8171 ins_encode %{ 8172 FloatRegister tmp_reg = as_FloatRegister($tmp$$reg); 8173 loadStore(MacroAssembler(&cbuf), &MacroAssembler::ldrs, tmp_reg, $mem->opcode(), 8174 as_Register($mem$$base), $mem$$index, $mem$$scale, $mem$$disp, sizeof (jfloat)); 8175 __ cnt($tmp$$FloatRegister, __ T8B, $tmp$$FloatRegister); 8176 __ addv($tmp$$FloatRegister, __ T8B, $tmp$$FloatRegister); 8177 __ mov($dst$$Register, $tmp$$FloatRegister, __ T1D, 0); 8178 %} . . . The same question applied for the next definition 8204 instruct popCountL_mem(iRegINoSp dst, memory8 mem, vRegD tmp) %{ . . . Yes, I too do not like the look of legitimize_address and it ought not to be necessary, given that the assert in debug mode ought to stop us hitting this in product mode. Still belt and braces is always a good thing so I'm happy for to stay (and do no harm). Testing: Was there a rationale for picking those specific offsets for the accesses? Anyway, I'm assuming the test didn't crash the JVM ;-) So, the only real issue is the size in the Range rule. I guess you didn't trigger a case where that mattered. Modulo that it is good to push. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From vladimir.kozlov at oracle.com Tue Dec 17 17:45:18 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Dec 2019 09:45:18 -0800 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> Message-ID: Finally! Very good cleanup. Few notes. c1_CodeStubs.hpp - I think it should be stronger than assert to catch it in product too (we can do check in product because it is not performance critical code). c1_LinearScan.cpp - I think IA64 is used for Itanium. For 64-bit x86 we use AMD64: https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 Thanks, Vladimir On 12/17/19 4:50 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-7175279 > > There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in x86-64 > code base. > > Proposed patch removes the rest and makes x86-64 code x87-free. > > The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM for > JEP 306 [1] and related enhancements [2]. > > Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses > StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). > > x87 instructions are made available only on x86-32. > > C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator [2] > x86-32-only. > > Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. 
> > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8175916 > > [2] https://bugs.openjdk.java.net/browse/JDK-8136414 > > [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From jatin.bhateja at intel.com Tue Dec 17 18:58:38 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Tue, 17 Dec 2019 18:58:38 +0000 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> Message-ID: Hi Tobias, Please find updated patch at following link. http://cr.openjdk.java.net/~jbhateja/8230185/webrev.03/ Thanks, Jatin > -----Original Message----- > From: Tobias Hartmann > Sent: Tuesday, December 17, 2019 1:10 PM > To: Bhateja, Jatin > Cc: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov > > Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class > > Hi Jatin, > > On 16.12.19 16:42, Bhateja, Jatin wrote: > > Please find below the updated patch with the test case. > > > > http://cr.openjdk.java.net/~jbhateja/8230185/webrev.02/ > > Thanks for adding the test. Some comments: > - We try to avoid bug ids in test names. I would suggest a more descriptive > name like "TestIrreducibleLoopWithVNNI". > - Please also add the test to package compiler.loopopts > - For Java code, we use 4 whitespace indentation > - The variable 'c' is not used in 'mainTest' > > Thanks, > Tobias From john.r.rose at oracle.com Tue Dec 17 22:00:09 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 17 Dec 2019 14:00:09 -0800 Subject: RFR: 8234863: Increase default value of MaxInlineLevel In-Reply-To: References: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com> Message-ID: On Dec 13, 2019, at 2:44 AM, Vladimir Ivanov wrote: > > Also, I'd like to add that it's fine to fix it in incrementally: start with something simple and reliable, and then explore more complex extensions on top of it. +1 > I don't think it's necessary to bytecode analysis too far. I agree with you that some point it becomes simpler to just parse the method and observe the effects than trying to derive them directly from bytecode. The reason I point out ciTypeFlow is that it already parses the bytecodes. It does this mainly to build a CFG, but given the framework it is easy to add other small chores. I see counting instructions, by kind, as a small chore. Another place where such counting could be added is the construction of ciMethodData. It seems reasonable to me to associate static metrics with a method at the same time as setting up dynamic metric collection. Jason noted JDK-8056071 as an example of bytecode instruction scanning. The fix in that bug is to extend the specialized recognizers of bytecodes.*pp to detect ?tiny methods?. If I had to choose between adding more of those recognizers and adding extra chores to the ciMethodData or ciTypeFlow passes, I?d prefer to do the latter, because the instruction classification is already written and debugged for the full passes. The recognizers in bytecodes.*pp always bothered me a little, that they are thrown off by tiny variations in the code. A counting technique that looks at all instructions is more robust, since it can more easily discount simple instructions like data movement and constant materialization. > (I fully agree that it requires significant effort to enable IR-based analysis in C2. 
> But also there are ways to workaround that and cache the analysis results
> across compilations: gather data during stand-alone compilation and then
> reuse it when doing inlining.)

I'd like to mine out more of the potential in adding those "extra chores" to
existing passes before doing IR caching. In fact, I see IR caching as one of
those big investments that we should do in C2's Java-based successor. Caching
is a near cousin to serialization, and that is much more easily done on
reflective Java data than on C++ data. (Or maybe I missed your point?)
The C2 IR is not garbage collected, but rather thread-confined and deleted
wholesale after each compilation task. Sharing across compilation tasks is a
hard problem, as we found when we built the CI.

-- John

From john.r.rose at oracle.com Tue Dec 17 22:13:50 2019
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 17 Dec 2019 14:13:50 -0800
Subject: RFR: 8234863: Increase default value of MaxInlineLevel
In-Reply-To:
References: <62A1DE3C-3072-4ACE-A495-94C9281B4888@oracle.com>
Message-ID: <059C0946-7C30-43C1-BBD8-AE496360C2A1@oracle.com>

On Dec 17, 2019, at 5:15 AM, Jason Zaugg wrote:
>
> Thanks for the encouragement. I've submitted an OCA and will work on a
> patch.

Very good!

> I'm past some initial tooling issues -- I can build an image and run jtreg.. I've
> added [1] a simple version of the analysis to the existing the flow analysis and
> hooked this into InlineTree::try_to_inline. I've added a new test in the same
> manner as inlining/InlineAccessors.java.

This is on the right track. Try to find the simple wins, of course.
And please see my previous message in this thread to Vladimir.

One place to consider piggy-backing the counting techniques in ciTypeFlow is
in apply_one_bytecode. (Counts should only be accumulated the first time
through each block.) The tricky part would be factoring the code so that the
counting logic would not obscure the type flowing logic, but I think this is
doable.

The reason I like this possibility is that ciTypeFlow ignores some unreached
instructions. That potentially covers some cases of assertion code which is
turned off, and exception processing code which has never (yet) been called,
and would force a deoptimization if it were called into service. These cases
are notorious for harming inlining. If we could say "this is a tiny method,
except for assertion and exception processing code that never runs", we could
build a more robust inlining policy.

Something similar might be doable with MethodData, earlier in execution; I
haven't thought it through. Perhaps MethodData could associate instruction
kind counts with reachable points in the bytecode, which other clients,
including ciTypeFlow, could make use of. I think ciTF is a better cut point.

> I'll flesh this out and report back after Christmas.

Nice; let's see where it goes. Despite all my musings about abstract
interpretation, do start small. Even small is hard.

-- John

From vladimir.kozlov at oracle.com Wed Dec 18 00:09:10 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 17 Dec 2019 16:09:10 -0800
Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class
In-Reply-To:
References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com>
Message-ID: <8edd881e-259a-c479-b9ac-0afb99d8511e@oracle.com>

This looks good to me. I will leave final review and sponsoring to Tobias.

Thanks,
Vladimir

On 12/17/19 10:58 AM, Bhateja, Jatin wrote:
> Hi Tobias,
>
> Please find updated patch at following link.
> > http://cr.openjdk.java.net/~jbhateja/8230185/webrev.03/ > > Thanks, > Jatin > >> -----Original Message----- >> From: Tobias Hartmann >> Sent: Tuesday, December 17, 2019 1:10 PM >> To: Bhateja, Jatin >> Cc: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov >> >> Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class >> >> Hi Jatin, >> >> On 16.12.19 16:42, Bhateja, Jatin wrote: >>> Please find below the updated patch with the test case. >>> >>> http://cr.openjdk.java.net/~jbhateja/8230185/webrev.02/ >> >> Thanks for adding the test. Some comments: >> - We try to avoid bug ids in test names. I would suggest a more descriptive >> name like "TestIrreducibleLoopWithVNNI". >> - Please also add the test to package compiler.loopopts >> - For Java code, we use 4 whitespace indentation >> - The variable 'c' is not used in 'mainTest' >> >> Thanks, >> Tobias From vladimir.kozlov at oracle.com Wed Dec 18 01:06:30 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Dec 2019 17:06:30 -0800 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: <767b492f-f2f7-ba83-308f-f3d2950fe930@oracle.com> References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> <6705bf76-162e-fdd6-b771-df437fea46e0@oracle.com> <767b492f-f2f7-ba83-308f-f3d2950fe930@oracle.com> Message-ID: <3bc591f0-6041-2b95-bcb2-26a00c190922@oracle.com> On 12/16/19 2:29 PM, Vladimir Ivanov wrote: > Hi Vladimir, > > Updated version: > ? http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/ Good. > > What's changed: > ? * Added more comments > ? * Fixed missing cases (Repl4B_imm() and Repl8B_imm) > ??? * Double-checked that there are no other missing cases left: > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/repl.txt > > I'd like to reiterate that I deliberately don't want to spend too much time polishing the current version because I > consider it as an interim point and not as the optimal and desired one (though it's definitely much better than where we > started both in number of instructions and clarity). I was about to comment changes for Long values but I agree that what is dine is enough. > > The shape I want to see in the next couple of iterations is the following: > > ? (1) all the operation implementations encapsulated in MacroAssembler > ??? * CPU dispatching will happen there, not in AD file; Yes, we discussed it before - asm instructions selection should be done in MacroAssembler (it is here exactly for that purpose). > > ? (2) get rid of vec vs legVec separation as much as possible > ??? * one of the ways to fix it is to introduce additional operand types (for example, vecBW == {legVec when avx512() > && !avx512bw(); vec, otherwise}) It would be nice. > > It would turn current ReplB_reg & ReplB_reg_leg into: > > instruct ReplB_reg(vecBW dst, scalar src) %{ > ? match(Set dst (ReplicateB src)); > ? format %{ "replicate $dst,$src" %} > ? ins_encode %{ > ??? uint vlen = vector_length(this); > ??? __ replicate_byte($dst$$XMMRegister, $src$$Register, vlen); > ? %} > %} > > where MacroAssembler::replicate_byte() hides all the dispatching logic against CPU capabilities and vector length. > > Moreover, it opens additional merging opportunities. As an example: > > instruct ReplBS_reg(vecBW dst, rRegI src) %{ > ? match(Set dst (ReplicateB src)); > ? match(Set dst (ReplicateS src)); > ? format %{ "replicate $dst,$src" %} > ? ins_encode %{ > ??? uint vlen = vector_length(this); > ??? switch (ideal_Opcode()) { > ?????? 
case Op_ReplicateB: __ replicate_byte($dst$$XMMRegister, $src$$Register, vlen); break; > ?????? case Op_ReplicateS: __ replicate_short($dst$$XMMRegister, $src$$Register, vlen); break; > ?????? default: ShouldNotReachHere(); > ??? } > ? %} > ? ins_pipe( pipe_slow ); > %} > > If we agree that this is the direction we want to move, splitting instructions is counter-productive and just pushes > more work for later iterations. Same applies to dispatching logic on CPU & vector length. Yes, I agree. Thanks, Vladimir > > Best regards, > Vladimir Ivanov > > On 14.12.2019 01:18, Vladimir Kozlov wrote: >> On 12/13/19 2:27 AM, Vladimir Ivanov wrote: >>> Thanks for the feedback, Vladimir. >>> >>>> replicateB >>>> >>>> Can you fold it differently? >>>> >>>> ReplB_reg_leg >>> >>> Are you talking about ReplB_reg? >> >> Yes, I was talking about ReplB_reg. I thought we can combine all [8-64] length vectors but I missed that ReplB_reg_leg >> uses legVec and needs separate instructions :( >> >> And I wanted to separate instruction which use avx 512 (evpbroadcastb) because it is difficult to see relation between >> predicate condition (length() == 64 && VM_Version::supports_avx512bw()) and check in code (vlen == 64 || >> VM_Version::supports_avx512vlbw()). First, || vs &&. Second, avx512bw vs avx512vlbw. May be better to have a separate >> instruction for this. >> >>> >>>> ?? predicate(!VM_Version::supports_avx512vlbw()); >>> >>> 3151 instruct ReplB_reg_leg(legVec dst, rRegI src) %{ >>> 3152?? predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); >>> >>> For ReplB_reg_leg the predicate can't be simplified: it is applicable only to 512bit vector when AVX512BW is absent. >>> Otherwise, legVec constraint will be unnecessarily applied to other configurations. >> >> That is why you replaced !avx512vlbw with !avx512bw? >> May be this section of code need comment which explains why one or an other is used. >> >>> >>> 3119 instruct ReplB_reg(vec dst, rRegI src) %{ >>> 3120?? predicate((n->as_Vector()->length() <= 32) || >>> 3121???????????? (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); >>> >>> For ReplB_reg there's a shorter version: >>> >>> predicate(n->as_Vector()->length() <= 32 || VM_Version::supports_avx512bw()); >>> >>> >>> But do you find it easier to read? For example, when you are checking that all configurations are covered: >>> >>> predicate(n->as_Vector()->length() <= 32 || >>> ?????????? VM_Version::supports_avx512bw()); >>> >>> instruct ReplB_reg_leg(legVec dst, rRegI src) %{ >>> ?? predicate(n->as_Vector()->length() == 64 && >>> ??????????? !VM_Version::supports_avx512bw()); >>> >>> vs >>> >>> instruct ReplB_reg_leg(legVec dst, rRegI src) %{ >>> ?? predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw()); >>> >>> instruct ReplB_reg(vec dst, rRegI src) %{ >>> ?? predicate((n->as_Vector()->length() <= 32) || >>> ???????????? (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); >> >> I think next conditions in predicates require comment about using avx512vlbw and avx512bw. >> >> !?? predicate((n->as_Vector()->length() <= 32 && VM_Version::supports_avx512vlbw()) || >> !???????????? (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw())); >> >>> >>> >>>> ?? ins_encode %{ >>>> ???? uint vlen = vector_length(this); >>>> ???? __ movdl($dst$$XMMRegister, $src$$Register); >>>> ???? __ punpcklbw($dst$$XMMRegister, $dst$$XMMRegister); >>>> ???? __ pshuflw($dst$$XMMRegister, $dst$$XMMRegister, 0x00); >>>> ???? 
if (vlen > 8) { >>>> ?????? __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); >>>> ?????? if (vlen > 16) { >>>> ???????? __ vinserti128_high($dst$$XMMRegister, $dst$$XMMRegister); >>>> ???????? if (vlen > 32) { >>>> ?????????? assert(vlen == 64, "sanity"); >>>> ?????????? __ vinserti64x4($dst$$XMMRegister, $dst$$XMMRegister, $dst$$XMMRegister, 0x1); >>> >>> Yes, it should work as well. Do you find it easier to read though? >> >> Code is smaller. >> >>> >>>> Similar ReplB_imm_leg for which I don't see new implementation. >>> >>> Good catch. Added it back. >>> >>> (FTR completeness for reg2reg variants (_reg*) is mandatory. But for _mem and _imm it is optional: if they some >>> configuration isn't covered, _reg is used. But it was in original code, so I added it back.) >> >> I don't see code which was in Repl4B_imm() and Repl8B_imm() (only movdl and movq without vpbroadcastb). >> >>> >>> Updated version: >>> ?? http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00 >> >> Should be webrev.01 >> >>> >>> Another thing I noticed is that for ReplI/.../ReplD cases avx512vl checks are not necessary: >>> >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.01/broadcast_vl/ >>> >>> The code assumes that evpbroadcastd/evpbroadcastq, vpbroadcastdvpbroadcastq, vpbroadcastss/vpbroadcastsd need >>> AVX512VL for 512bit case, but Intel manual says AVX512F is enough. >>> >>> I plan to handle it as a separate change, but let me know if you want to incorporate it into 8235825. >> >> Yes, lets do it separately. >> >>> >>>> It should also simplify code for avx512 which one or 2 instructions. >>> >>> Can you elaborate, please? Are you talking about the case when version for different vector sizes differ in 1-2 >>> instructions (like ReplB_reg)? >> No, I was talking about cases when evpbroadcastb and vpbroadcastb instructions are used. I was think to have them in >> separate instructions. In your latest version it would be only evpbroadcastb case from ReplB_reg(). >> >> Thanks, >> Vladimir >> >>> >>>> Other types changes can be done same way. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> On 12/12/19 3:19 AM, Vladimir Ivanov wrote: >>>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8235825 >>>>> >>>>> Merge AD instructions for the following vector nodes: >>>>> ?? - ReplicateB, ..., ReplicateD >>>>> >>>>> Individual patches: >>>>> >>>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual >>>>> >>>>> Testing: tier1-4, test run on different CPU flavors (KNL, CLX) >>>>> >>>>> Contributed-by: Jatin Bhateja >>>>> Reviewed-by: vlivanov, sviswanathan, ? >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov From john.r.rose at oracle.com Wed Dec 18 01:40:17 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 17 Dec 2019 17:40:17 -0800 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: <767b492f-f2f7-ba83-308f-f3d2950fe930@oracle.com> References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> <6705bf76-162e-fdd6-b771-df437fea46e0@oracle.com> <767b492f-f2f7-ba83-308f-f3d2950fe930@oracle.com> Message-ID: Reviewed again. It reads well. I agree about future steps. ? 
John On Dec 16, 2019, at 2:29 PM, Vladimir Ivanov wrote: > > Updated version: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/ > > What's changed: > * Added more comments > * Fixed missing cases (Repl4B_imm() and Repl8B_imm) > * Double-checked that there are no other missing cases left: > http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/repl.txt From vladimir.kozlov at oracle.com Wed Dec 18 02:40:08 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Dec 2019 18:40:08 -0800 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: Message-ID: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> CCing to Runtime group. For me the use of `_print_inlining_stream->~stringStream()` is not obvious. I would definitively miss to do that if I use stringStreams in some new code. May be someone can suggest some C++ trick to do that automatically. Thanks, Vladimir On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: > Hi, > > I'm resending this with fixed bugId ... > Sorry! > > Best regards, > Goetz > >> Hi, >> >> PrintInlining and TraceLoopPredicate allocate stringStreams with new and >> relied on the fact that all memory used is on the ResourceArea cleaned >> after the compilation. >> >> Since 8224193 the char* of the stringStream is malloced and thus >> must be freed. No doing so manifests a memory leak. >> This is only relevant if the corresponding tracing is active. >> >> To fix TraceLoopPredicate I added the destructor call >> Fixing PrintInlining is a bit more complicated, as it uses several >> stringStreams. A row of them is in a GrowableArray which must >> be walked to free all of them. >> As the GrowableArray is on an arena no destructor is called for it. >> >> I also changed some as_string() calls to base() calls which reduced >> memory need of the traces, and added a comment explaining the >> constructor of GrowableArray that calls the copyconstructor for its >> elements. >> >> Please review: >> http://cr.openjdk.java.net/~goetz/wr19/8235998-c2_tracing_mem_leak/01/ >> >> Best regards, >> Goetz. > From vladimir.kozlov at oracle.com Wed Dec 18 03:18:35 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Dec 2019 19:18:35 -0800 Subject: [14] RFR (S) 8236000: VM build without C2 fails Message-ID: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> https://cr.openjdk.java.net/~kvn/8236000/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8236000 C2 flags should be checked only when C2 is present. Running debug VM build with JVMCI but without C2 found other issues which were fixed too: # Internal Error (src/hotspot/share/jvmci/jvmci_globals.cpp:121), pid=2398, tid=2403 # assert(MaxVectorSizechecked) failed: MaxVectorSize flag not checked and # Internal Error (src/hotspot/share/gc/serial/genMarkSweep.cpp:98), pid=10108, tid=10118 # assert(DerivedPointerTable::is_active()) failed: Sanity Tested build with --with-jvm-features=-compiler2 and --with-jvm-features=-compiler2,-jvmci and tier1. Thanks, Vladimir From david.holmes at oracle.com Wed Dec 18 03:33:26 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 18 Dec 2019 13:33:26 +1000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. 
In-Reply-To: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: > CCing to Runtime group. > > For me the use of `_print_inlining_stream->~stringStream()` is not obvious. > I would definitively miss to do that if I use stringStreams in some new > code. But that is not a problem added by this changeset, the problem is that we're not deallocating these stringStreams even though we should be. If you use a stringStream in new code you have to manage its lifecycle. That said why is this: if (_print_inlining_stream != NULL) _print_inlining_stream->~stringStream(); not just: delete _print_inlining_stream; ? Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly calling the destructor rather than calling delete? Cheers, David > May be someone can suggest some C++ trick to do that automatically. > Thanks, > Vladimir > > On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: >> Hi, >> >> I'm resending this with fixed bugId ... >> Sorry! >> >> Best regards, >> ?? Goetz >> >>> Hi, >>> >>> PrintInlining and TraceLoopPredicate allocate stringStreams with new and >>> relied on the fact that all memory used is on the ResourceArea cleaned >>> after the compilation. >>> >>> Since 8224193 the char* of the stringStream is malloced and thus >>> must be freed. No doing so manifests a memory leak. >>> This is only relevant if the corresponding tracing is active. >>> >>> To fix TraceLoopPredicate I added the destructor call >>> Fixing PrintInlining is a bit more complicated, as it uses several >>> stringStreams. A row of them is in a GrowableArray which must >>> be walked to free all of them. >>> As the GrowableArray is on an arena no destructor is called for it. >>> >>> I also changed some as_string() calls to base() calls which reduced >>> memory need of the traces, and added a comment explaining the >>> constructor of GrowableArray that calls the copyconstructor for its >>> elements. >>> >>> Please review: >>> http://cr.openjdk.java.net/~goetz/wr19/8235998-c2_tracing_mem_leak/01/ >>> >>> Best regards, >>> ?? Goetz. >> From vladimir.kozlov at oracle.com Wed Dec 18 03:38:38 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Dec 2019 19:38:38 -0800 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops In-Reply-To: <87o8w8ptsq.fsf@redhat.com> References: <87o8w8ptsq.fsf@redhat.com> Message-ID: Very nice! cfgnode.cpp - should we also check for is_top() to set `doit = false` and bailout? Why you added check in must_be_not_null()? It is used only in library_call.cpp and should not relate to this changes. Did you find some issues? Thanks, Vladimir On 12/16/19 12:18 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8231291/webrev.01/ > > As discussed before: > > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-September/035094.html > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036171.html > > Fully unrolling loops early helps EA. The change to cfgnode.cpp is > required because full unroll sometimes needs peeling which may add a phi > between a memory access and its AddP, a pattern that EA doesn't > recognize. > > Roland. 
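A minimal sketch of the stringStream lifecycle being discussed in the JDK-8235998 thread above, assuming (as Goetz's description indicates) that the stream object itself is resource- or arena-allocated while, since JDK-8224193, its character buffer is malloc'ed; the names and the print call are illustrative only, not taken from the webrev:

    stringStream* st = new stringStream();   // ResourceObj: the object itself lands on the resource area
    st->print("inlining trace");
    tty->print_raw(st->base());              // base() avoids the extra copy that as_string() would make
    st->~stringStream();                     // releases only the malloc'ed character buffer

Calling delete on such a stream would also try to reclaim the object's own storage, which belongs to the resource area or arena, so running just the destructor appears to be the way to free the buffer without breaking the allocation scheme.
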
> From kim.barrett at oracle.com Wed Dec 18 04:09:16 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 17 Dec 2019 23:09:16 -0500 Subject: [14] RFR (S) 8236000: VM build without C2 fails In-Reply-To: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> References: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> Message-ID: > On Dec 17, 2019, at 10:18 PM, Vladimir Kozlov wrote: > > https://cr.openjdk.java.net/~kvn/8236000/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8236000 > > C2 flags should be checked only when C2 is present. Running debug VM build with JVMCI but without C2 found other issues which were fixed too: > > # Internal Error (src/hotspot/share/jvmci/jvmci_globals.cpp:121), pid=2398, tid=2403 > # assert(MaxVectorSizechecked) failed: MaxVectorSize flag not checked > > and > > # Internal Error (src/hotspot/share/gc/serial/genMarkSweep.cpp:98), pid=10108, tid=10118 > # assert(DerivedPointerTable::is_active()) failed: Sanity > > Tested build with --with-jvm-features=-compiler2 and --with-jvm-features=-compiler2,-jvmci > and tier1. > > Thanks, > Vladimir Looks good. From vladimir.kozlov at oracle.com Wed Dec 18 04:34:34 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Dec 2019 20:34:34 -0800 Subject: [14] RFR (S) 8236000: VM build without C2 fails In-Reply-To: References: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> Message-ID: <3819390f-c0fe-5752-b52e-9f957ab04830@oracle.com> Thank you, Kim Vladimir On 12/17/19 8:09 PM, Kim Barrett wrote: >> On Dec 17, 2019, at 10:18 PM, Vladimir Kozlov wrote: >> >> https://cr.openjdk.java.net/~kvn/8236000/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8236000 >> >> C2 flags should be checked only when C2 is present. Running debug VM build with JVMCI but without C2 found other issues which were fixed too: >> >> # Internal Error (src/hotspot/share/jvmci/jvmci_globals.cpp:121), pid=2398, tid=2403 >> # assert(MaxVectorSizechecked) failed: MaxVectorSize flag not checked >> >> and >> >> # Internal Error (src/hotspot/share/gc/serial/genMarkSweep.cpp:98), pid=10108, tid=10118 >> # assert(DerivedPointerTable::is_active()) failed: Sanity >> >> Tested build with --with-jvm-features=-compiler2 and --with-jvm-features=-compiler2,-jvmci >> and tier1. >> >> Thanks, >> Vladimir > > Looks good. > From tobias.hartmann at oracle.com Wed Dec 18 06:59:29 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 18 Dec 2019 07:59:29 +0100 Subject: [14] RFR (S) 8236000: VM build without C2 fails In-Reply-To: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> References: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> Message-ID: Hi Vladimir, looks good to me. Best regards, Tobias On 18.12.19 04:18, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8236000/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8236000 > > C2 flags should be checked only when C2 is present. Running debug VM build with JVMCI but without C2 > found other issues which were fixed too: > > #? Internal Error (src/hotspot/share/jvmci/jvmci_globals.cpp:121), pid=2398, tid=2403 > #? assert(MaxVectorSizechecked) failed: MaxVectorSize flag not checked > > and > > # Internal Error (src/hotspot/share/gc/serial/genMarkSweep.cpp:98), pid=10108, tid=10118 > # assert(DerivedPointerTable::is_active()) failed: Sanity > > Tested build with --with-jvm-features=-compiler2 and --with-jvm-features=-compiler2,-jvmci > and tier1. 
> > Thanks, > Vladimir From tobias.hartmann at oracle.com Wed Dec 18 07:06:57 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 18 Dec 2019 08:06:57 +0100 Subject: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class In-Reply-To: <8edd881e-259a-c479-b9ac-0afb99d8511e@oracle.com> References: <8d725705-27a2-8955-70de-d3ee248fc0a5@oracle.com> <8edd881e-259a-c479-b9ac-0afb99d8511e@oracle.com> Message-ID: Looks good to me too. I'll sponsor. Best regards, Tobias On 18.12.19 01:09, Vladimir Kozlov wrote: > This looks good to me. I will leave final review and sponsoring to Tobias. > > Thanks, > Vladimir > > On 12/17/19 10:58 AM, Bhateja, Jatin wrote: >> Hi Tobias, >> >> Please find updated patch at following link. >> >> http://cr.openjdk.java.net/~jbhateja/8230185/webrev.03/ >> >> Thanks, >> Jatin >> >>> -----Original Message----- >>> From: Tobias Hartmann >>> Sent: Tuesday, December 17, 2019 1:10 PM >>> To: Bhateja, Jatin >>> Cc: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov >>> >>> Subject: Re: RFR(XXS): 8230185 - assert(is_Loop()) failed: invalid node class >>> >>> Hi Jatin, >>> >>> On 16.12.19 16:42, Bhateja, Jatin wrote: >>>> Please find below the updated patch with the test case. >>>> >>>> http://cr.openjdk.java.net/~jbhateja/8230185/webrev.02/ >>> >>> Thanks for adding the test. Some comments: >>> - We try to avoid bug ids in test names. I would suggest a more descriptive >>> name like "TestIrreducibleLoopWithVNNI". >>> - Please also add the test to package compiler.loopopts >>> - For Java code, we use 4 whitespace indentation >>> - The variable 'c' is not used in 'mainTest' >>> >>> Thanks, >>> Tobias From vladimir.x.ivanov at oracle.com Wed Dec 18 09:34:18 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 18 Dec 2019 12:34:18 +0300 Subject: [15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes In-Reply-To: References: <07bc55c8-22fe-d330-835d-79d47898d27e@oracle.com> <6705bf76-162e-fdd6-b771-df437fea46e0@oracle.com> <767b492f-f2f7-ba83-308f-f3d2950fe930@oracle.com> Message-ID: Thanks for the reviews, Vladimir and John. Best regards, Vladimir Ivanov On 18.12.2019 04:40, John Rose wrote: > Reviewed again. ?It reads well. ?I agree about future steps. ?? John > > On Dec 16, 2019, at 2:29 PM, Vladimir Ivanov > > wrote: >> >> Updated version: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/ >> >> What's changed: >> ?* Added more comments >> ?* Fixed missing cases (Repl4B_imm() and Repl8B_imm) >> ???* Double-checked that there are no other missing cases left: >> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.02/repl.txt > From vladimir.x.ivanov at oracle.com Wed Dec 18 10:02:01 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 18 Dec 2019 13:02:01 +0300 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops In-Reply-To: <87o8w8ptsq.fsf@redhat.com> References: <87o8w8ptsq.fsf@redhat.com> Message-ID: <3dd0917e-f307-5f22-dd68-786c90ec47a5@oracle.com> Hi Roland, > http://cr.openjdk.java.net/~roland/8231291/webrev.01/ As I understand, the intention of the change you propose is to perform complete loop unrolling earlier so EA can benefit from it. 
Some comments: src/hotspot/share/opto/loopnode.cpp: + if (mode == LoopOptsMaxUnroll) { + for (LoopTreeIterator iter(_ltree_root); !iter.done(); iter.next()) { + IdealLoopTree* lpt = iter.current(); + if (lpt->is_innermost() && lpt->_allow_optimizations && !lpt->_has_call && lpt->is_counted()) { + lpt->compute_trip_count(this); + if (!lpt->do_one_iteration_loop(this) && + !lpt->do_remove_empty_loop(this)) { + AutoNodeBudget node_budget(this); + if (lpt->policy_maximally_unroll(this)) { + memset( worklist.adr(), 0, worklist.Size()*sizeof(Node*) ); + do_maximally_unroll(lpt, worklist); + } + } + } + } It looks like LoopOptsMaxUnroll is a shortened version of IdealLoopTree::iteration_split/iteration_split_impl(). Have you considered factoring out the common code? Right now, its hard to correlate the checks for LoopOptsMaxUnroll with iteration_split() and there's a risk they'll diverge eventually. Do you need the following steps from the original version? ================================================ // Look for loop-exit tests with my 50/50 guesses from the Parsing stage. // Replace with a 1-in-10 exit guess. if (!is_root() && is_loop()) { adjust_loop_exit_prob(phase); } // Compute loop trip count from profile data compute_profile_trip_cnt(phase); No use of profiling data since full unrolling is happening anyway? ======================== if (!cl->is_valid_counted_loop()) return true; // Ignore various kinds of broken loops ======================== // Do nothing special to pre- and post- loops if (cl->is_pre_loop() || cl->is_post_loop()) return true; I assume there are no pre-/post-loops exist at that point, so these checks are redundant. Turn them into asserts? ======================== if (cl->is_normal_loop()) { if (policy_unswitching(phase)) { phase->do_unswitching(this, old_new); return true; } if (policy_maximally_unroll(phase)) { // Here we did some unrolling and peeling. Eventually we will // completely unroll this loop and it will no longer be a loop. phase->do_maximally_unroll(this, old_new); return true; } You don't perform loop unswitching at all. So, the order of operations changes. Do you see any problems with that? ======================== src/hotspot/share/opto/compile.cpp // Perform escape analysis if (_do_escape_analysis && ConnectionGraph::has_candidates(this)) { if (has_loops()) { // Cleanup graph (remove dead nodes). TracePhase tp("idealLoop", &timers[_t_idealLoop]); - PhaseIdealLoop::optimize(igvn, LoopOptsNone); + PhaseIdealLoop::optimize(igvn, LoopOptsMaxUnroll); if (major_progress()) print_method(PHASE_PHASEIDEAL_BEFORE_EA, 2); if (failing()) return; } ConnectionGraph::do_analysis(this, &igvn); Does it make sense to do more elaborate checks before performing early full loop unrolling? Like whether candidates are used inside loop bodies? Best regards, Vladimir Ivanov > > As discussed before: > > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-September/035094.html > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036171.html > > Fully unrolling loops early helps EA. The change to cfgnode.cpp is > required because full unroll sometimes needs peeling which may add a phi > between a memory access and its AddP, a pattern that EA doesn't > recognize. > > Roland. 
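As a rough illustration of the "turn them into asserts" suggestion above, and assuming (as stated there) that no pre- or post-loops can exist yet when the LoopOptsMaxUnroll pass runs, the check inside the lpt->is_counted() branch quoted earlier could become something like the following sketch, which is not part of the webrev under review:

    CountedLoopNode* cl = lpt->_head->as_CountedLoop();
    // Pre-/post-loops are only produced by later iteration splitting,
    // so none are expected this early in the pipeline.
    assert(!cl->is_pre_loop() && !cl->is_post_loop(),
           "pre-/post-loops not expected before the first round of loop opts");
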
> From rkennke at redhat.com Wed Dec 18 12:40:48 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 18 Dec 2019 13:40:48 +0100 Subject: RFR: 8236181: C2: Remove useless step_over_gc_barrier() in int->bool conversion Message-ID: In cfgnode.cpp, in is_x2logic() that converts a diamond-shape if/else to simple bool patterns, we have a step_over_gc_barrier() at the end. This has been introduced by Shenandoah. I believe the intention was to convert obj vs null check to a simple boolean expression and eliminate the barrier on the unneeded path. However, it is not needed because Shenandoah we already eliminate barriers when the only user is a null-check, and it might actually be counter-productive if the barrier is needed on other paths, because it keeps the input of the barrier alive. This is probably a left-over from pre-LRB. Bug: https://bugs.openjdk.java.net/browse/JDK-8236181 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8236181/webrev.00/ Testing: hotspot_gc_shenandoah, submit-repo (in-progress) Can I please get a review? Thanks, Roman From christian.hagedorn at oracle.com Wed Dec 18 14:44:15 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 18 Dec 2019 15:44:15 +0100 Subject: RFR(S): 8235762: JVM crash in SWPointer during C2 compilation In-Reply-To: References: <4ef39906-839b-9a08-3225-f99b974f08de@oracle.com> Message-ID: <9b7da1b1-976d-7c57-7238-e7d38f30930c@oracle.com> Hi Felix > Yes, orig_msize is 0 for the test case in my webrev. > For the else case, I find it hard to manually create a test case for it. > Another choice is asserting that this else case never happens. I am not going that way as I haven't got a strong reason for that. Ok, maybe we could really just assert and bailout if the else case happens if you cannot find a test case which covers it? Might be best if someone else can comment on that, too, what to do in this case. Best regards, Christian >> Thanks for explaining. Following your analysis with the provided test case, >> orig_msize is 0 in the end. Can you also provide a test case or show an example >> which covers the else case in this test: >> >> 717 if (orig_msize == 0) { >> 718 best_align_to_mem_ref = >> memops.at(max_idx)->as_Mem(); >> 719 } else { >> 720 for (uint i = 0; i < orig_msize; i++) { >> 721 memops.remove(0); >> 722 } >> 723 best_align_to_mem_ref = find_align_to_ref(memops, >> max_idx); >> 724 assert(best_align_to_mem_ref == NULL, "sanity"); >> 725 best_align_to_mem_ref = >> memops.at(max_idx)->as_Mem(); >> 726 } From gromero at linux.vnet.ibm.com Wed Dec 18 15:34:25 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 18 Dec 2019 12:34:25 -0300 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> Message-ID: <56adbff7-ed96-5e54-8899-06b0efef212b@linux.vnet.ibm.com> Hi Matthias, On 12/11/2019 11:31 AM, Baesken, Matthias wrote: > Hi Gustavo, thanks for posting this . > I put your change into our internal build+test queue . Thanks a lot for testing it and catching the fastdebug assert() error. I'll send a fix to it in v2 in conjunction to Martin's requests. > We currently do not have something like you described ( a P9 QEMU VM (emulation) with NVDIMM support ) in our test landscape, but > It does not hurt to have the patch in our builds/tests anyway .... The change must work the same as on a POWER9 LPAR w/ vPMEM support, as described in [1] (try to ignore the sales pitch heh). 
So in effect it works just like the POWER9 QEMU VM I've tested: on QEMU it's file-backed on the host side, whilst vPMEM it's DRAM/DIMM-backed on PowerVM side, and vPMEM is really performant since it's a real HW. Best regards, Gustavo [1] https://ibmsystemsmag.com/Power-Systems/8/2019/Delivering-Persistence-Performance From gromero at linux.vnet.ibm.com Wed Dec 18 15:45:36 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 18 Dec 2019 12:45:36 -0300 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> Message-ID: <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> Hi Martin, On 12/11/2019 01:55 PM, Doerr, Martin wrote: > Hi Gustavo, > > thanks for implementing it. Unfortunately, we can't test it at the moment. Thanks a lot for the review. > I have a few change requests: > > > macroAssembler_ppc.cpp > I don't like silently emitting nothing in case !VM_Version::supports_data_cache_line_flush(). > If you want to check for it, I suggest to assert VM_Version::supports_data_cache_line_flush() and avoid generating the stub otherwise (stubGenerator_ppc). Fixed. > > ppc.ad > The predicates are redundant and should better get removed (useless evaluation). oh ... Fixed. > cacheWBPreSync could use cost 0 for clearity. (The costs don't have any effect because there is no choice for the matcher.) Fixed. > stubGenerator_ppc.cpp > I think checking cmpwi(... is_presync, 1) is ok because the ABI specifies that "bool true" is one Byte with the value 1 and the C calling convention enforces extension to 8 Byte. > I would have used andi_ + bne to be on the safe side, but I believe your version is ok. I decided for the safe side as you suggested :) > Comment "// post sync => emit 'lwsync'" is wrong. We use 'sync'. Sorry, it was a "thinko" when placing the comment. Indeed, the comment is wrong and the code is correct. Fixed. I've also fixed the assert() compilation error on fastdebug accordingly to Matthias' comments. Finally I tweaked a bit the 'format' strings in ppc.add to show a better output on +PrintAssembly. For instance, previously it would print something like: 090 B7: # out( B7 B8 ) <- in( B6 B7 ) Loop( B7-B7 inner ) Freq: 3.99733 090 MR R17, R15 // Long->Ptr 094 cache wb [R17] for the cache writeback. Now: 094 cache writeback, address = [R17] Please find v2 at: http://cr.openjdk.java.net/~gromero/8234599/v2/ Best regards, Gustavo From adityam at microsoft.com Wed Dec 18 16:44:26 2019 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Wed, 18 Dec 2019 16:44:26 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS Message-ID: Hi all, I encountered a crashing bug when running a jcstress test with Shenandoah, and tracked down the problem to how C1 generates code for call nodes that use T_ADDRESS operands. Specifically, in my failure case, the C1-generated code for a load reference barrier was treating the load address parameter (which is typed T_ADDRESS in the BasicTypeList for the runtime call) as a 32-bit value and therefore missing the upper bits when calling into the native code for the barrier on my x86-64 machine. The fix modifies the FrameMap logic to use address operands for T_ADDRESS, and also modifies the x86 LIR assembler to use pointer-sized movs when moving address-typed stack values to and from registers. 
Bug: bugs.openjdk.java.net/browse/JDK-8236179 Webrev: adityamandaleeka.github.io/webrevs/c1_address_64_bit/webrev The rest of this email contains more information about the issue and the analysis that led to the fix. Thank you, Aditya Mandaleeka ======== How this was found As mentioned I was running the jcstress suite when I ran into this bug. I was running it on Windows, but the problem and the fix are OS-agnostic. I was trying to reproduce a separate issue at the time, which involved setting several options such as aggressive Shenandoah heuristics and disabling C2 (limiting tiering to level 1). I was never able to reproduce that other bug but I noticed a crash on WeakCASTest.WeakCompareAndSetPlainString, which was consistently hitting access violations on an atomic cmpxchg. I decided to investigate, running this test as my reproducer. ======== Analysis Looking at the dump for this crash under a debugger, it was apparent that it was something to do with the LRB code; there was an access violation when trying to do the CAS operation on the reference location after we determined the new location of the object. The reference address was indeed bogus, and I tracked down where it was coming from. Here is some code from the caller (Intel syntax ahead): 00000261`9beaf4da 488b4978 mov rcx,qword ptr [rcx+78h] 00000261`9beaf4de 488b11 mov rdx,qword ptr [rcx] 00000261`9beaf4e1 488d09 lea rcx,[rcx] 00000261`9beaf4e4 48898c24b8010000 mov qword ptr [rsp+1B8h],rcx 00000261`9beaf4ec 488bca mov rcx,rdx 00000261`9beaf4ef 8b9424b8010000 mov edx,dword ptr [rsp+1B8h] 00000261`9beaf4f6 49baa0139ecef87f0000 mov r10,offset jvm!ShenandoahRuntime::load_reference_barrier_native (00007ff8`ce9e13a0) 00000261`9beaf500 41ffd2 call r10 The second argument (which goes in *DX) for the load_reference_barrier_native is the load address. I noticed in the code above that, in the process of moving that value around, we stored it as 64-bit to the stack but then restored it as a 32-bit value when we put it in EDX. And sure enough, when I look at the memory in that location, there are a few extra bits which go with the address to make it valid. I found the following in the IR which seemed suspicious: wide_move [Base:[R653|M] Disp: 0|L] [R654|L] leal [Base:[R653|M] Disp: 0|L] [R655|L] move [R654|L] [rcx|L] move [R655|L] [rdx|I] rtcall ShenandoahRuntime::load_reference_barrier_native As you can see, the move to RDX is done as I while the load was done as L. This explained the reason for the code being the way it was, so the next step was to figure out why the discrepancy exists in the IR itself. In ShenandoahBarrierSetC1::load_at_resolved, we are treating this second argument as T_ADDRESS which seemed appropriate. I experimented by changing the argument type to T_OBJECT and verified that correct code was generated, though being new to C1 (and OpenJDK in general :)) it wasn't clear to me whether that was an appropriate fix or whether T_ADDRESS should indeed be fixed to work correctly. Thankfully at this point I was put into contact with Roland Westrelin who guided me through fixing C1 to make it correctly handle T_ADDRESS, which is what this patch does. Thanks Roland! 
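A rough sketch of the pointer-sized stack-to-register move described above, in the style of LIR_Assembler::stack2reg on x86; the exact guard and the symmetric reg2stack change are in the webrev, so treat this as illustrative only:

    // single-cpu destination case inside LIR_Assembler::stack2reg(src, dest, type)
    Address addr = frame_map()->address_for_slot(src->single_stack_ix());
    if (type == T_ADDRESS || type == T_OBJECT || type == T_ARRAY) {
      __ movptr(dest->as_register(), addr);  // full pointer width on x86_64
    } else {
      __ movl(dest->as_register(), addr);    // ordinary 32-bit value
    }

With T_ADDRESS routed through movptr, the spilled load address is reloaded into RDX as a 64-bit value and the truncation shown in the disassembly above goes away.
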
======== Testing done Apart from verifying that the codegen issue in my jcstress repro is fixed, I've run these tests with this patch: tier1 tier2 hotspot_gc_shenandoah From vladimir.kozlov at oracle.com Wed Dec 18 17:40:50 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 18 Dec 2019 09:40:50 -0800 Subject: [14] RFR (S) 8236000: VM build without C2 fails In-Reply-To: References: <41aa561d-56a7-a7f9-2460-f2d20b6721a1@oracle.com> Message-ID: Thank you, Tobias Vladimir On 12/17/19 10:59 PM, Tobias Hartmann wrote: > Hi Vladimir, > > looks good to me. > > Best regards, > Tobias > > On 18.12.19 04:18, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8236000/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8236000 >> >> C2 flags should be checked only when C2 is present. Running debug VM build with JVMCI but without C2 >> found other issues which were fixed too: >> >> #? Internal Error (src/hotspot/share/jvmci/jvmci_globals.cpp:121), pid=2398, tid=2403 >> #? assert(MaxVectorSizechecked) failed: MaxVectorSize flag not checked >> >> and >> >> # Internal Error (src/hotspot/share/gc/serial/genMarkSweep.cpp:98), pid=10108, tid=10118 >> # assert(DerivedPointerTable::is_active()) failed: Sanity >> >> Tested build with --with-jvm-features=-compiler2 and --with-jvm-features=-compiler2,-jvmci >> and tier1. >> >> Thanks, >> Vladimir From rkennke at redhat.com Wed Dec 18 18:02:21 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 18 Dec 2019 19:02:21 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: Message-ID: Thanks, Aditya! I'll sponsor this change for you, once you've got the necessary reviews. Thank you for your contribution! Roman > Hi all, > > I encountered a crashing bug when running a jcstress test with Shenandoah, and tracked down the problem to how C1 generates code for call nodes that use T_ADDRESS operands. Specifically, in my failure case, the C1-generated code for a load reference barrier was treating the load address parameter (which is typed T_ADDRESS in the BasicTypeList for the runtime call) as a 32-bit value and therefore missing the upper bits when calling into the native code for the barrier on my x86-64 machine. > > The fix modifies the FrameMap logic to use address operands for T_ADDRESS, and also modifies the x86 LIR assembler to use pointer-sized movs when moving address-typed stack values to and from registers. > > Bug: bugs.openjdk.java.net/browse/JDK-8236179 > Webrev: adityamandaleeka.github.io/webrevs/c1_address_64_bit/webrev > > The rest of this email contains more information about the issue and the analysis that led to the fix. > > Thank you, > Aditya Mandaleeka > > ======== > How this was found > > As mentioned I was running the jcstress suite when I ran into this bug. I was running it on Windows, but the problem and the fix are OS-agnostic. I was trying to reproduce a separate issue at the time, which involved setting several options such as aggressive Shenandoah heuristics and disabling C2 (limiting tiering to level 1). I was never able to reproduce that other bug but I noticed a crash on WeakCASTest.WeakCompareAndSetPlainString, which was consistently hitting access violations on an atomic cmpxchg. I decided to investigate, running this test as my reproducer. 
> > ======== > Analysis > > Looking at the dump for this crash under a debugger, it was apparent that it was something to do with the LRB code; there was an access violation when trying to do the CAS operation on the reference location after we determined the new location of the object. The reference address was indeed bogus, and I tracked down where it was coming from. Here is some code from the caller (Intel syntax ahead): > > 00000261`9beaf4da 488b4978 mov rcx,qword ptr [rcx+78h] > 00000261`9beaf4de 488b11 mov rdx,qword ptr [rcx] > 00000261`9beaf4e1 488d09 lea rcx,[rcx] > 00000261`9beaf4e4 48898c24b8010000 mov qword ptr [rsp+1B8h],rcx > 00000261`9beaf4ec 488bca mov rcx,rdx > 00000261`9beaf4ef 8b9424b8010000 mov edx,dword ptr [rsp+1B8h] > 00000261`9beaf4f6 49baa0139ecef87f0000 mov r10,offset jvm!ShenandoahRuntime::load_reference_barrier_native (00007ff8`ce9e13a0) > 00000261`9beaf500 41ffd2 call r10 > > The second argument (which goes in *DX) for the load_reference_barrier_native is the load address. I noticed in the code above that, in the process of moving that value around, we stored it as 64-bit to the stack but then restored it as a 32-bit value when we put it in EDX. And sure enough, when I look at the memory in that location, there are a few extra bits which go with the address to make it valid. > > I found the following in the IR which seemed suspicious: > > wide_move [Base:[R653|M] Disp: 0|L] [R654|L] > leal [Base:[R653|M] Disp: 0|L] [R655|L] > move [R654|L] [rcx|L] > move [R655|L] [rdx|I] > rtcall ShenandoahRuntime::load_reference_barrier_native > > As you can see, the move to RDX is done as I while the load was done as L. This explained the reason for the code being the way it was, so the next step was to figure out why the discrepancy exists in the IR itself. > > In ShenandoahBarrierSetC1::load_at_resolved, we are treating this second argument as T_ADDRESS which seemed appropriate. I experimented by changing the argument type to T_OBJECT and verified that correct code was generated, though being new to C1 (and OpenJDK in general :)) it wasn't clear to me whether that was an appropriate fix or whether T_ADDRESS should indeed be fixed to work correctly. Thankfully at this point I was put into contact with Roland Westrelin who guided me through fixing C1 to make it correctly handle T_ADDRESS, which is what this patch does. Thanks Roland! > > ======== > Testing done > > Apart from verifying that the codegen issue in my jcstress repro is fixed, I've run these tests with this patch: > tier1 > tier2 > hotspot_gc_shenandoah > From augustnagro at gmail.com Wed Dec 18 19:51:54 2019 From: augustnagro at gmail.com (August Nagro) Date: Wed, 18 Dec 2019 13:51:54 -0600 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com> <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> Message-ID: John, Apologies for the late response, school has been busy lately. I do like the idea of growing by a 'size table' or similar, especially if the growth rate decreases as the hashmap size increases. Since the probability of fragmentation increases with the number of resizes, bumping down the resize factor could be a good solution. To remove the bounds-checking in C2, I've taken a look at the subop class. 
I think I understand the change, but I'm wondering about the best way to handle a negative hash value. In fast range the hash & size must be less than 2^32, so that the result of their multiplication fits in a long. So one way is to do hash * (long) size, or similar. However, there's no unsigned multiply in Java, so if hash is negative it is widened to a negative long. One could use Integer::toUnsignedLong, or just directly `hash & 0xffffffffL`. However, this complicates the code shape. So I wonder if there is a better way. One option might be to make a Math::fastRange(int hash, int size) method that is a hotspot intrinsic and directly uses the unsigned instructions. Would appreciate some guidance on this. - August On Tue, Dec 3, 2019 at 12:06 AM John Rose wrote: > On Nov 18, 2019, at 12:17 PM, Florian Weimer wrote: > > > int bucket = fr(h * M); // M = 0x2357BD or something > > or maybe something fast and sloppy like: > > int bucket = fr(h + (h << 8)); > > > Surely this one works, since fr is the final operation. > The shift/add is just a mixing step to precondition the input. > > Just for the record, I?d like to keep brainstorming a bit more, > though surely this sort of thing is of limited interest. So, > just a little more from me on this. > > If we had BITR I?d want to try something like fr(h - bitr(h)). > > But what I keep wishing for a good one- or two-cycle instruction > that will mix all input bits into all the output bits, so that any > change in one input bit is likely to cause cascading changes in > many output bits, perhaps even 50% on average. A pair of AES > steps is a good example of this. I think AES would be superior to > multiply (for mixing) when used on 128 bit payloads or larger, so > it looks appealing (to me) for vectorizable hashing applications. > Though it is overkill on scalars, I think it points in promising > directions for scalars also. > > > or even: > > int bucket = fr(h) ^ (h & (N-1)); > > Does this really work? I don't think so. > > > Oops, you are right. Something like it might work, though. > > The idea, on paper, is that h & (N-1) is less than N, for any N >=1. > And if N-1 has a high enough pop-count the information content is close to > 50% of h (though maybe 50% of the bits are masked out). The xor of two > quasi-independent values both less than N is, well, less than 2^(ceil lg > N), > not N, which is a bug. Oops. There are ways to quickly combine two values > less than N and reduce the result to less than N: You do a conditional > subtract of N if the sum is >= N. > > So the tactical area I?m trying to explore here is to take two reduced > hashes developed in parallel, which depend on different bits of the > input, and combine them into a single stronger hash (not by ^). > > Maybe (I can?t resist hacking at it some more): > > int h1 = H1(h), h2 = H2(h); > int bucket = CCA(h1 - h2, N); > // where H1 := fr, H2(h) := (h & (N-1)) > // where CCA(x, N) := x + ((x >> 31) & N) // Conditional Compensating Add > > In this framework, a better H2 for favoring the low bits of h might be > H2(h) := ((1< the number of low bits of h that feed into the final bucket selection, > while fr (H1) arguably maximizes the number of influential high bits. > > I think this kind of perturbation is quite expensive. Arm's BITR should > be helpful here. > > > Yes, BITR is a helpful building block. 
If I understand you correctly, it > needs to be combined with other operations, such as multiply, shift, xor, > etc., > and can overcome biases towards high bits or towards low bits that come > from the simple arithmetic definitions of the other mixing operations. > > The hack with CCA(h1 - h2, N) seems competitive with a BITR-based > mixing step, since H2 can be very simple. > > A scalar variant of two AES steps (with xor of a second register or > constant > parameter at both stages) would be a better building block for strongly > mixing bits. > Or some other shallow linear network with a layer of non-linear S-boxes. > > But even though this operation is commonly needed and > easily implemented in hardware, it's rarely found in CPUs. > > > Yep; the whole cottage industry of building clever mixing functions > out of hand calculator functions could be sidelined if CPUs gave us good > cheap mixing primitives out of the box. The crypto literature is full of > them, and many are designed to be easy to implement in silicon. > > ? John > > P.S. I mention AES because I?m familiar with that bit of crypto tech, and > also because I actually tried it out once on the smhasher quality > benchmark. > No surprise in hindsight; it passes the quality tests with just two rounds. > Given that it is as cheap as multiplication, and handles twice as many > bits at a time, but requires two steps for full mixing, it would seem to > be competitive with multiplication as a mixing step. It has no built-in > biases towards high or low bits, so that?s an advantage over > multiplication. > > Why two rounds? The one-round version has flaws, as a hash function, > which are obvious on inspection of the simple structure of an AES round. > Not every output bit is data-dependent on every input bit of one round, > but two rounds swirls them all together. Are back-to-back AES rounds > expensive? Maybe, although that?s how the instructions are designed to > be used, about 10 of them back to back to do real crypto. > > From rkennke at redhat.com Wed Dec 18 23:03:30 2019 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 19 Dec 2019 00:03:30 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: Message-ID: <93a1525a-5560-c9c7-e0ba-be3131d2dda3@redhat.com> Hi all, Testing via jdk/submit returned with PASSED. (btw, your URLs lack the http:// makes it slightler harder to follow them) Thanks, Roman On 12/18/19 5:44 PM, Aditya Mandaleeka wrote: > Hi all, > > I encountered a crashing bug when running a jcstress test with Shenandoah, and tracked down the problem to how C1 generates code for call nodes that use T_ADDRESS operands. Specifically, in my failure case, the C1-generated code for a load reference barrier was treating the load address parameter (which is typed T_ADDRESS in the BasicTypeList for the runtime call) as a 32-bit value and therefore missing the upper bits when calling into the native code for the barrier on my x86-64 machine. > > The fix modifies the FrameMap logic to use address operands for T_ADDRESS, and also modifies the x86 LIR assembler to use pointer-sized movs when moving address-typed stack values to and from registers. > > Bug: bugs.openjdk.java.net/browse/JDK-8236179 > Webrev: adityamandaleeka.github.io/webrevs/c1_address_64_bit/webrev > > The rest of this email contains more information about the issue and the analysis that led to the fix. 
> > Thank you, > Aditya Mandaleeka > > ======== > How this was found > > As mentioned I was running the jcstress suite when I ran into this bug. I was running it on Windows, but the problem and the fix are OS-agnostic. I was trying to reproduce a separate issue at the time, which involved setting several options such as aggressive Shenandoah heuristics and disabling C2 (limiting tiering to level 1). I was never able to reproduce that other bug but I noticed a crash on WeakCASTest.WeakCompareAndSetPlainString, which was consistently hitting access violations on an atomic cmpxchg. I decided to investigate, running this test as my reproducer. > > ======== > Analysis > > Looking at the dump for this crash under a debugger, it was apparent that it was something to do with the LRB code; there was an access violation when trying to do the CAS operation on the reference location after we determined the new location of the object. The reference address was indeed bogus, and I tracked down where it was coming from. Here is some code from the caller (Intel syntax ahead): > > 00000261`9beaf4da 488b4978 mov rcx,qword ptr [rcx+78h] > 00000261`9beaf4de 488b11 mov rdx,qword ptr [rcx] > 00000261`9beaf4e1 488d09 lea rcx,[rcx] > 00000261`9beaf4e4 48898c24b8010000 mov qword ptr [rsp+1B8h],rcx > 00000261`9beaf4ec 488bca mov rcx,rdx > 00000261`9beaf4ef 8b9424b8010000 mov edx,dword ptr [rsp+1B8h] > 00000261`9beaf4f6 49baa0139ecef87f0000 mov r10,offset jvm!ShenandoahRuntime::load_reference_barrier_native (00007ff8`ce9e13a0) > 00000261`9beaf500 41ffd2 call r10 > > The second argument (which goes in *DX) for the load_reference_barrier_native is the load address. I noticed in the code above that, in the process of moving that value around, we stored it as 64-bit to the stack but then restored it as a 32-bit value when we put it in EDX. And sure enough, when I look at the memory in that location, there are a few extra bits which go with the address to make it valid. > > I found the following in the IR which seemed suspicious: > > wide_move [Base:[R653|M] Disp: 0|L] [R654|L] > leal [Base:[R653|M] Disp: 0|L] [R655|L] > move [R654|L] [rcx|L] > move [R655|L] [rdx|I] > rtcall ShenandoahRuntime::load_reference_barrier_native > > As you can see, the move to RDX is done as I while the load was done as L. This explained the reason for the code being the way it was, so the next step was to figure out why the discrepancy exists in the IR itself. > > In ShenandoahBarrierSetC1::load_at_resolved, we are treating this second argument as T_ADDRESS which seemed appropriate. I experimented by changing the argument type to T_OBJECT and verified that correct code was generated, though being new to C1 (and OpenJDK in general :)) it wasn't clear to me whether that was an appropriate fix or whether T_ADDRESS should indeed be fixed to work correctly. Thankfully at this point I was put into contact with Roland Westrelin who guided me through fixing C1 to make it correctly handle T_ADDRESS, which is what this patch does. Thanks Roland! 
> > ======== > Testing done > > Apart from verifying that the codegen issue in my jcstress repro is fixed, I've run these tests with this patch: > tier1 > tier2 > hotspot_gc_shenandoah > From david.holmes at oracle.com Thu Dec 19 02:11:59 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 19 Dec 2019 12:11:59 +1000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> Message-ID: <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Hi Richard, I think my issue is with the way EliminateNestedLocks works so I'm going to look into that more deeply. Thanks for the explanations. David On 18/12/2019 12:47 am, Reingruber, Richard wrote: > Hi David, > > > > > Some further queries/concerns: > > > > > > > > src/hotspot/share/runtime/objectMonitor.cpp > > > > > > > > Can you please explain the changes to ObjectMonitor::wait: > > > > > > > > ! _recursions = save // restore the old recursion count > > > > ! + jt->get_and_reset_relock_count_after_wait(); // > > > > increased by the deferred relock count > > > > > > > > what is the "deferred relock count"? I gather it relates to > > > > > > > > "The code was extended to be able to deoptimize objects of a > > > frame that > > > > is not the top frame and to let another thread than the owning > > > thread do > > > > it." > > > > > > Yes, these relate. Currently EA based optimizations are reverted, when a compiled frame is > > > replaced with corresponding interpreter frames. Part of this is relocking objects with eliminated > > > locking. New with the enhancement is that we do this also just before object references are > > > acquired through JVMTI. In this case we deoptimize also the owning compiled frame C and we > > > register deoptimized objects as deferred updates. When control returns to C it gets deoptimized, > > > we notice that objects are already deoptimized (reallocated and relocked), so we don't do it again > > > (relocking twice would be incorrect of course). Deferred updates are copied into the new > > > interpreter frames. > > > > > > Problem: relocking is not possible if the target thread T is waiting on the monitor that needs to > > > be relocked. This happens only with non-local objects with EliminateNestedLocks. Instead relocking > > > is deferred until T owns the monitor again. This is what the piece of code above does. > > > > Sorry I need some more detail here. How can you wait() on an object > > monitor if the object allocation and/or locking was optimised away? And > > what is a "non-local object" in this context? Isn't EA restricted to > > thread-confined objects? > > "Non-local object" is an object that escapes its thread. The issue I'm addressing with the changes > in ObjectMonitor::wait are almost unrelated to EA. They are caused by EliminateNestedLocks, where C2 > eliminates recursive locking of an already owned lock. The lock owning object exists on the heap, it > is locked and you can call wait() on it. > > EliminateLocks is the C2 option that controls lock elimination based on EA. Both optimizations have > in common that objects with eliminated locking need to be relocked when deoptimizing a frame, > i.e. when replacing a compiled frame with equivalent interpreter > frames. Deoptimization::relock_objects does that job for /all/ eliminated locks in scope. /All/ can > be a mix of eliminated nested locks and locks of not-escaping objects. 
> > New with the enhancement: I call relock_objects earlier, just before objects pontentially > escape. But then later when the owning compiled frame gets deoptimized, I must not do it again: > > See call to EscapeBarrier::objs_are_deoptimized in deoptimization.cpp: > > 373 if ((jvmci_enabled || ((DoEscapeAnalysis || EliminateNestedLocks) && EliminateLocks)) > 374 && !EscapeBarrier::objs_are_deoptimized(thread, deoptee.id())) { > 375 bool unused; > 376 eliminate_locks(thread, chunk, realloc_failures, deoptee, exec_mode, unused); > 377 } > > Now when calling relock_objects early it is quiet possible that I have to relock an object the > target thread currently waits for. Obviously I cannot relock in this case, instead I chose to > introduce relock_count_after_wait to JavaThread. > > > Is it just that some of the locking gets optimized away e.g. > > > > synchronised(obj) { > > synchronised(obj) { > > synchronised(obj) { > > obj.wait(); > > } > > } > > } > > > > If this is reduced to a form as-if it were a single lock of the monitor > > (due to EA) and the wait() triggers a JVM TI event which leads to the > > escape of "obj" then we need to reconstruct the true lock state, and so > > when the wait() internally unblocks and reacquires the monitor it has to > > set the true recursion count to 3, not the 1 that it appeared to be when > > wait() was initially called. Is that the scenario? > > Kind of... except that the locking is not eliminated due to EA and there is no JVM TI event > triggered by wait. > > Add > > LocalObject l1 = new LocalObject(); > > in front of the synchrnized blocks and assume a JVM TI agent acquires l1. This triggers the code in > question. > > See that relocking/reallocating is transactional. If it is done then for /all/ objects in scope and it is > done at most once. It wouldn't be quite so easy to split this in relocking of nested/EA-based > eliminated locks. > > > If so I find this truly awful. Anyone using wait() in a realistic form > > requires a notification and so the object cannot be thread confined. In > > It is not thread confined. > > > which case I would strongly argue that upon hitting the wait() the deopt > > should occur unconditionally and so the lock state is correct before we > > wait and so we don't need to mess with the recursion count internally > > when we reacquire the monitor. > > > > > > > > > which I don't like the sound of at all when it comes to ObjectMonitor > > > > state. So I'd like to understand in detail exactly what is going on here > > > > and why. This is a very intrusive change that seems to badly break > > > > encapsulation and impacts future changes to ObjectMonitor that are under > > > > investigation. > > > > > > I would not regard this as breaking encapsulation. Certainly not badly. > > > > > > I've added a property relock_count_after_wait to JavaThread. The property is well > > > encapsulated. Future ObjectMonitor implementations have to deal with recursion too. They are free > > > in choosing a way to do that as long as that property is taken into account. This is hardly a > > > limitation. > > > > I do think this badly breaks encapsulation as you have to add a callout > > from the guts of the ObjectMonitor code to reach into the thread to get > > this lock count adjustment. I understand why you have had to do this but > > I would much rather see a change to the EA optimisation strategy so that > > this is not needed. 
> > > > > Note also that the property is a straight forward extension of the existing concept of deferred > > > local updates. It is embedded into the structure holding them. So not even the footprint of a > > > JavaThread is enlarged if no deferred updates are generated. > > > > [...] > > > > > > > > I'm actually duplicating the existing external suspend mechanism, because a thread can be > > > suspended at most once. And hey, and don't like that either! But it seems not unlikely that the > > > duplicate can be removed together with the original and the new type of handshakes that will be > > > used for thread suspend can be used for object deoptimization too. See today's discussion in > > > JDK-8227745 [2]. > > > > I hope that discussion bears some fruit, at the moment it seems not to > > be possible to use handshakes here. :( > > > > The external suspend mechanism is a royal pain in the proverbial that we > > have to carefully live with. The idea that we're duplicating that for > > use in another fringe area of functionality does not thrill me at all. > > > > To be clear, I understand the problem that exists and that you wish to > > solve, but for the runtime parts I balk at the complexity cost of > > solving it. > > I know it's complex, but by far no rocket science. > > Also I find it hard to imagine another fix for JDK-8233915 besides changing the JVM TI specification. > > Thanks, Richard. > > -----Original Message----- > From: David Holmes > Sent: Dienstag, 17. Dezember 2019 08:03 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; Vladimir Kozlov (vladimir.kozlov at oracle.com) > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > > > David > > On 17/12/2019 4:57 pm, David Holmes wrote: >> Hi Richard, >> >> On 14/12/2019 5:01 am, Reingruber, Richard wrote: >>> Hi David, >>> >>> ?? > Some further queries/concerns: >>> ?? > >>> ?? > src/hotspot/share/runtime/objectMonitor.cpp >>> ?? > >>> ?? > Can you please explain the changes to ObjectMonitor::wait: >>> ?? > >>> ?? > !?? _recursions = save????? // restore the old recursion count >>> ?? > !???????????????? + jt->get_and_reset_relock_count_after_wait(); // >>> ?? > increased by the deferred relock count >>> ?? > >>> ?? > what is the "deferred relock count"? I gather it relates to >>> ?? > >>> ?? > "The code was extended to be able to deoptimize objects of a >>> frame that >>> ?? > is not the top frame and to let another thread than the owning >>> thread do >>> ?? > it." >>> >>> Yes, these relate. Currently EA based optimizations are reverted, when >>> a compiled frame is replaced >>> with corresponding interpreter frames. Part of this is relocking >>> objects with eliminated >>> locking. New with the enhancement is that we do this also just before >>> object references are acquired >>> through JVMTI. In this case we deoptimize also the owning compiled >>> frame C and we register >>> deoptimized objects as deferred updates. When control returns to C it >>> gets deoptimized, we notice >>> that objects are already deoptimized (reallocated and relocked), so we >>> don't do it again (relocking >>> twice would be incorrect of course). Deferred updates are copied into >>> the new interpreter frames. >>> >>> Problem: relocking is not possible if the target thread T is waiting >>> on the monitor that needs to be >>> relocked. 
This happens only with non-local objects with >>> EliminateNestedLocks. Instead relocking is >>> deferred until T owns the monitor again. This is what the piece of >>> code above does. >> >> Sorry I need some more detail here. How can you wait() on an object >> monitor if the object allocation and/or locking was optimised away? And >> what is a "non-local object" in this context? Isn't EA restricted to >> thread-confined objects? >> >> Is it just that some of the locking gets optimized away e.g. >> >> synchronised(obj) { >> ? synchronised(obj) { >> ??? synchronised(obj) { >> ????? obj.wait(); >> ??? } >> ? } >> } >> >> If this is reduced to a form as-if it were a single lock of the monitor >> (due to EA) and the wait() triggers a JVM TI event which leads to the >> escape of "obj" then we need to reconstruct the true lock state, and so >> when the wait() internally unblocks and reacquires the monitor it has to >> set the true recursion count to 3, not the 1 that it appeared to be when >> wait() was initially called. Is that the scenario? >> >> If so I find this truly awful. Anyone using wait() in a realistic form >> requires a notification and so the object cannot be thread confined. In >> which case I would strongly argue that upon hitting the wait() the deopt >> should occur unconditionally and so the lock state is correct before we >> wait and so we don't need to mess with the recursion count internally >> when we reacquire the monitor. >> >>> >>> ?? > which I don't like the sound of at all when it comes to >>> ObjectMonitor >>> ?? > state. So I'd like to understand in detail exactly what is going >>> on here >>> ?? > and why.? This is a very intrusive change that seems to badly break >>> ?? > encapsulation and impacts future changes to ObjectMonitor that >>> are under >>> ?? > investigation. >>> >>> I would not regard this as breaking encapsulation. Certainly not badly. >>> >>> I've added a property relock_count_after_wait to JavaThread. The >>> property is well >>> encapsulated. Future ObjectMonitor implementations have to deal with >>> recursion too. They are free in >>> choosing a way to do that as long as that property is taken into >>> account. This is hardly a >>> limitation. >> >> I do think this badly breaks encapsulation as you have to add a callout >> from the guts of the ObjectMonitor code to reach into the thread to get >> this lock count adjustment. I understand why you have had to do this but >> I would much rather see a change to the EA optimisation strategy so that >> this is not needed. >> >>> Note also that the property is a straight forward extension of the >>> existing concept of deferred >>> local updates. It is embedded into the structure holding them. So not >>> even the footprint of a >>> JavaThread is enlarged if no deferred updates are generated. >>> >>> ?? > --- >>> ?? > >>> ?? > src/hotspot/share/runtime/thread.cpp >>> ?? > >>> ?? > Can you please explain why >>> JavaThread::wait_for_object_deoptimization >>> ?? > has to be handcrafted in this way rather than using proper >>> transitions. >>> ?? > >>> >>> I wrote wait_for_object_deoptimization taking >>> JavaThread::java_suspend_self_with_safepoint_check >>> as template. So in short: for the same reasons :) >>> >>> Threads reach both methods as part of thread state transitions, >>> therefore special handling is >>> required to change thread state on top of ongoing transitions. >>> >>> ?? > We got rid of "deopt suspend" some time ago and it is disturbing >>> to see >>> ?? > it being added back (effectively). 
This seems like it may be >>> something >>> ?? > that handshakes could be used for. >>> >>> Deopt suspend used to be something rather different with a similar >>> name[1]. It is not being added back. >> >> I stand corrected. Despite comments in the code to the contrary >> deopt_suspend didn't actually cause a self-suspend. I was doing a lot of >> cleanup in this area 13 years ago :) >> >>> >>> I'm actually duplicating the existing external suspend mechanism, >>> because a thread can be suspended >>> at most once. And hey, and don't like that either! But it seems not >>> unlikely that the duplicate can >>> be removed together with the original and the new type of handshakes >>> that will be used for >>> thread suspend can be used for object deoptimization too. See today's >>> discussion in JDK-8227745 [2]. >> >> I hope that discussion bears some fruit, at the moment it seems not to >> be possible to use handshakes here. :( >> >> The external suspend mechanism is a royal pain in the proverbial that we >> have to carefully live with. The idea that we're duplicating that for >> use in another fringe area of functionality does not thrill me at all. >> >> To be clear, I understand the problem that exists and that you wish to >> solve, but for the runtime parts I balk at the complexity cost of >> solving it. >> >> Thanks, >> David >> ----- >> >>> Thanks, Richard. >>> >>> [1] Deopt suspend was something like an async. handshake for >>> architectures with register windows, >>> ???? where patching the return pc for deoptimization of a compiled >>> frame was racy if the owner thread >>> ???? was in native code. Instead a "deopt" suspend flag was set on >>> which the thread patched its own >>> ???? frame upon return from native. So no thread was suspended. It got >>> its name only from the name of >>> ???? the flags. >>> >>> [2] Discussion about using handshakes to sync. with the target thread: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14306727 >>> >>> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Freitag, 13. Dezember 2019 00:56 >>> To: Reingruber, Richard ; >>> serviceability-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net; >>> hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>> Performance in the Presence of JVMTI Agents >>> >>> Hi Richard, >>> >>> Some further queries/concerns: >>> >>> src/hotspot/share/runtime/objectMonitor.cpp >>> >>> Can you please explain the changes to ObjectMonitor::wait: >>> >>> !?? _recursions = save????? // restore the old recursion count >>> !???????????????? + jt->get_and_reset_relock_count_after_wait(); // >>> increased by the deferred relock count >>> >>> what is the "deferred relock count"? I gather it relates to >>> >>> "The code was extended to be able to deoptimize objects of a frame that >>> is not the top frame and to let another thread than the owning thread do >>> it." >>> >>> which I don't like the sound of at all when it comes to ObjectMonitor >>> state. So I'd like to understand in detail exactly what is going on here >>> and why.? This is a very intrusive change that seems to badly break >>> encapsulation and impacts future changes to ObjectMonitor that are under >>> investigation. 
>>> >>> --- >>> >>> src/hotspot/share/runtime/thread.cpp >>> >>> Can you please explain why JavaThread::wait_for_object_deoptimization >>> has to be handcrafted in this way rather than using proper transitions. >>> >>> We got rid of "deopt suspend" some time ago and it is disturbing to see >>> it being added back (effectively). This seems like it may be something >>> that handshakes could be used for. >>> >>> Thanks, >>> David >>> ----- >>> >>> On 12/12/2019 7:02 am, David Holmes wrote: >>>> On 12/12/2019 1:07 am, Reingruber, Richard wrote: >>>>> Hi David, >>>>> >>>>> ??? > Most of the details here are in areas I can comment on in detail, >>>>> but I >>>>> ??? > did take an initial general look at things. >>>>> >>>>> Thanks for taking the time! >>>> >>>> Apologies the above should read: >>>> >>>> "Most of the details here are in areas I *can't* comment on in detail >>>> ..." >>>> >>>> David >>>> >>>>> ??? > The only thing that jumped out at me is that I think the >>>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. >>>>> ??? > >>>>> ??? > +? bool is_hidden_from_external_view() const { return true; } >>>>> >>>>> Yes, it should. Will add the method like above. >>>>> >>>>> ??? > Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>>> Without >>>>> ??? > active testing this will just bit-rot. >>>>> >>>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>>> workload. I will add a minimal test >>>>> to keep it fresh. >>>>> >>>>> ??? > Also on the tests I don't understand your @requires clause: >>>>> ??? > >>>>> ??? >?? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>> ??? > (vm.opt.TieredCompilation != true)) >>>>> ??? > >>>>> ??? > This seems to require that TieredCompilation is disabled, but >>>>> tiered is >>>>> ??? > our normal mode of operation. ?? >>>>> ??? > >>>>> >>>>> I removed the clause. I guess I wanted to target the tests towards the >>>>> code they are supposed to >>>>> test, and it's easier to analyze failures w/o tiered compilation and >>>>> with just one compiler thread. >>>>> >>>>> Additionally I will make use of >>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>>> To: Reingruber, Richard ; >>>>> serviceability-dev at openjdk.java.net; >>>>> hotspot-compiler-dev at openjdk.java.net; >>>>> hotspot-runtime-dev at openjdk.java.net >>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>>> Performance in the Presence of JVMTI Agents >>>>> >>>>> Hi Richard, >>>>> >>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>>> Hi, >>>>>> >>>>>> I would like to get reviews please for >>>>>> >>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>>> >>>>>> Corresponding RFE: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>>> >>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] >>>>>> >>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without >>>>>> issues (thanks!). In addition the >>>>>> change is being tested at SAP since I posted the first RFR some >>>>>> months ago. >>>>>> >>>>>> The intention of this enhancement is to benefit performance wise from >>>>>> escape analysis even if JVMTI >>>>>> agents request capabilities that allow them to access local variable >>>>>> values. E.g. 
if you start-up >>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then >>>>>> escape analysis is disabled right >>>>>> from the beginning, well before a debugger attaches -- if ever one >>>>>> should do so. With the >>>>>> enhancement, escape analysis will remain enabled until and after a >>>>>> debugger attaches. EA based >>>>>> optimizations are reverted just before an agent acquires the >>>>>> reference to an object. In the JBS item >>>>>> you'll find more details. >>>>> >>>>> Most of the details here are in areas I can comment on in detail, but I >>>>> did take an initial general look at things. >>>>> >>>>> The only thing that jumped out at me is that I think the >>>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>>> >>>>> +? bool is_hidden_from_external_view() const { return true; } >>>>> >>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>>> Without >>>>> active testing this will just bit-rot. >>>>> >>>>> Also on the tests I don't understand your @requires clause: >>>>> >>>>> ??? @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled & >>>>> (vm.opt.TieredCompilation != true)) >>>>> >>>>> This seems to require that TieredCompilation is disabled, but tiered is >>>>> our normal mode of operation. ?? >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch >>>>>> >>>>>> >>>>>> From smita.kamath at intel.com Thu Dec 19 02:33:08 2019 From: smita.kamath at intel.com (Kamath, Smita) Date: Thu, 19 Dec 2019 02:33:08 +0000 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> Message-ID: Hi Vladimir, I have made the code changes you suggested (please look at the email below). I have also enabled the intrinsic to run only when VBMI2 feature is available. The intrinsic shows gains of >1.5x above 4k bit BigInteger. Webrev link: https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ Thanks, Smita -----Original Message----- From: Vladimir Kozlov Sent: Wednesday, December 11, 2019 10:55 AM To: Kamath, Smita ; 'hotspot compiler' ; Viswanathan, Sandhya Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 Hi Kamath, First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code generated by C2. What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we would need test this including performance testing. In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. Smita >>>done I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. 
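For readers following the intrinsic discussion: vpshldv/vpshrdv implement a "concatenate two neighbouring words and shift" step per vector lane, which is exactly the scalar work a multi-word (BigInteger-style) shift does for each word. A stand-alone scalar sketch of that inner loop, as an illustration of the operation only and not code from the webrev:

#include <cstdint>
#include <cstddef>

// Left-shift a big-endian array of 32-bit words by n bits, with 0 < n < 32.
// Each output word combines two adjacent input words -- the per-element
// operation that vpshldv applies to many lanes at once.
void shift_left_words(const uint32_t* src, uint32_t* dst, size_t words, unsigned n) {
    if (words == 0) return;
    for (size_t i = 0; i + 1 < words; i++) {
        dst[i] = (src[i] << n) | (src[i + 1] >> (32 - n));  // double-word shift per element
    }
    dst[words - 1] = src[words - 1] << n;                    // last word has no right neighbour
}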
We don't have flag for VNNI or other avx512 instructions subset. Smita >> removed UseVBMI2 flag In vm_version_x86.cpp you need to add more %s in print statement for new output. Smita >>> done You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. Smita >>> done You need to add intrinsics to Graal's test to ignore them: http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 Smita >>>done Thanks, Vladimir On 12/10/19 5:41 PM, Kamath, Smita wrote: > Hi, > > > As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. > > Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 > > Link to webrev : > http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ > > > > I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. > > > [1] > https://software.intel.com/sites/default/files/managed/39/c5/325462-sd > m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> Vol. > 2C 5-471) > > [2] > https://software.intel.com/en-us/articles/intel-software-development-e > mulator > > > Regards, > > Smita Kamath > From goetz.lindenmaier at sap.com Thu Dec 19 09:27:27 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 19 Dec 2019 09:27:27 +0000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: Hi David, Vladimir, stringStream is a ResourceObj, thus it lives on an arena. This is uncritical, as it does not resize. 8224193 only changed the allocation of the internal char*, which always caused problems with resizing under ResourceMarks that were not placed for the string but to free other memory. Thus stringStream must not be deallocated, and also there was no mem leak before that change. But we need to call the destructor to free the char*. Best regards, Goetz. > -----Original Message----- > From: hotspot-runtime-dev > On Behalf Of David Holmes > Sent: Mittwoch, 18. Dezember 2019 04:33 > To: Vladimir Kozlov ; hotspot-compiler- > dev at openjdk.java.net > Cc: hotspot-runtime-dev at openjdk.java.net runtime dev at openjdk.java.net> > Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > '8224193: stringStream should not use Resouce Area'. > > On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: > > CCing to Runtime group. > > > > For me the use of `_print_inlining_stream->~stringStream()` is not obvious. > > I would definitively miss to do that if I use stringStreams in some new > > code. > > But that is not a problem added by this changeset, the problem is that > we're not deallocating these stringStreams even though we should be. If > you use a stringStream in new code you have to manage its lifecycle. > > That said why is this: > > if (_print_inlining_stream != NULL) > _print_inlining_stream->~stringStream(); > > not just: > > delete _print_inlining_stream; > > ? > > Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly > calling the destructor rather than calling delete? 
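To make the distinction concrete for readers who stop at this question: a resource/arena-allocated object must not be freed with delete, because its own storage belongs to the arena and is reclaimed in bulk; only the C-heap buffer it owns needs releasing, and that is what the explicit destructor call achieves. A minimal stand-alone C++ sketch of the pattern (simplified stand-ins, not HotSpot code):

#include <cstdlib>
#include <cstddef>
#include <new>

struct Buffer {                                    // stand-in for stringStream
    char* data;
    explicit Buffer(std::size_t n) : data(static_cast<char*>(std::malloc(n))) {}
    ~Buffer() { std::free(data); }                 // frees only what the object owns
};

int main() {
    alignas(Buffer) char arena[sizeof(Buffer)];    // stand-in for resource-area storage
    Buffer* b = new (arena) Buffer(64);            // placement new: object memory is not C-heap
    // delete b;    // wrong: delete would try to release storage the object doesn't own
    b->~Buffer();   // right: run the destructor so the malloc'd buffer is freed; the
                    // arena storage itself is reclaimed in bulk, independently
    return 0;
}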
> > Cheers, > David > > > May be someone can suggest some C++ trick to do that automatically. > > Thanks, > > Vladimir > > > > On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: > >> Hi, > >> > >> I'm resending this with fixed bugId ... > >> Sorry! > >> > >> Best regards, > >> ?? Goetz > >> > >>> Hi, > >>> > >>> PrintInlining and TraceLoopPredicate allocate stringStreams with new and > >>> relied on the fact that all memory used is on the ResourceArea cleaned > >>> after the compilation. > >>> > >>> Since 8224193 the char* of the stringStream is malloced and thus > >>> must be freed. No doing so manifests a memory leak. > >>> This is only relevant if the corresponding tracing is active. > >>> > >>> To fix TraceLoopPredicate I added the destructor call > >>> Fixing PrintInlining is a bit more complicated, as it uses several > >>> stringStreams. A row of them is in a GrowableArray which must > >>> be walked to free all of them. > >>> As the GrowableArray is on an arena no destructor is called for it. > >>> > >>> I also changed some as_string() calls to base() calls which reduced > >>> memory need of the traces, and added a comment explaining the > >>> constructor of GrowableArray that calls the copyconstructor for its > >>> elements. > >>> > >>> Please review: > >>> http://cr.openjdk.java.net/~goetz/wr19/8235998- > c2_tracing_mem_leak/01/ > >>> > >>> Best regards, > >>> ?? Goetz. > >> From aph at redhat.com Thu Dec 19 10:31:44 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 19 Dec 2019 11:31:44 +0100 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com> <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> Message-ID: <2ca24471-a5e8-580a-66bc-37b0eece8898@redhat.com> On 12/3/19 6:06 AM, John Rose wrote: > But what I keep wishing for a good one- or two-cycle instruction > that will mix all input bits into all the output bits, so that any > change in one input bit is likely to cause cascading changes in > many output bits, perhaps even 50% on average. A pair of AES > steps is a good example of this. I've searched for something like this too. However, I experimented with two rounds of AES and I didn't get very good results. From what I remember, it took at least three or four rounds to get decent mixing, let alone full avalanche. Also, the latency of AES instructions tends to be a few cycles and the latency of moving data from integer to vector registers is as many as five cycles. So I gave up. This was a while ago, so probably badly remembered... -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Thu Dec 19 10:37:39 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 19 Dec 2019 10:37:39 +0000 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> Message-ID: Hi Gustavo, thanks for the update. Looks good. Please remove the whitespaces between the instructions and '(' in generate_data_cache_writeback_sync() before pushing. 
Marked here by 'X': + __ andi_X(temp, is_presync, 1); + __ bneX(CCR0, SKIP); + __ cache_wbsync(false); // post sync => emit 'sync' + __ bindX(SKIP); // pre sync => emit nothing Best regards, Martin > -----Original Message----- > From: Gustavo Romero > Sent: Mittwoch, 18. Dezember 2019 16:46 > To: Doerr, Martin ; Baesken, Matthias > > Cc: Andrew Dinn ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux > for JEP-352 > > Hi Martin, > > On 12/11/2019 01:55 PM, Doerr, Martin wrote: > > Hi Gustavo, > > > > thanks for implementing it. Unfortunately, we can't test it at the moment. > > Thanks a lot for the review. > > > > I have a few change requests: > > > > > > macroAssembler_ppc.cpp > > I don't like silently emitting nothing in case > !VM_Version::supports_data_cache_line_flush(). > > If you want to check for it, I suggest to assert > VM_Version::supports_data_cache_line_flush() and avoid generating the > stub otherwise (stubGenerator_ppc). > > Fixed. > > > > > > ppc.ad > > The predicates are redundant and should better get removed (useless > evaluation). > > oh ... Fixed. > > > > cacheWBPreSync could use cost 0 for clearity. (The costs don't have any > effect because there is no choice for the matcher.) > > Fixed. > > > > stubGenerator_ppc.cpp > > I think checking cmpwi(... is_presync, 1) is ok because the ABI specifies that > "bool true" is one Byte with the value 1 and the C calling convention enforces > extension to 8 Byte. > > I would have used andi_ + bne to be on the safe side, but I believe your > version is ok. > > I decided for the safe side as you suggested :) > > > > Comment "// post sync => emit 'lwsync'" is wrong. We use 'sync'. > > Sorry, it was a "thinko" when placing the comment. Indeed, the comment is > wrong > and the code is correct. Fixed. > > I've also fixed the assert() compilation error on fastdebug accordingly to > Matthias' comments. > > Finally I tweaked a bit the 'format' strings in ppc.add to show a better output > on +PrintAssembly. For instance, previously it would print something like: > > 090 B7: # out( B7 B8 ) <- in( B6 B7 ) Loop( B7-B7 inner ) Freq: 3.99733 > 090 MR R17, R15 // Long->Ptr > 094 cache wb [R17] > > for the cache writeback. Now: > > 094 cache writeback, address = [R17] > > > Please find v2 at: > > http://cr.openjdk.java.net/~gromero/8234599/v2/ > > > Best regards, > Gustavo From david.holmes at oracle.com Thu Dec 19 10:38:41 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 19 Dec 2019 20:38:41 +1000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: > Hi David, Vladimir, > > stringStream is a ResourceObj, thus it lives on an arena. > This is uncritical, as it does not resize. > 8224193 only changed the allocation of the internal char*, > which always caused problems with resizing under > ResourceMarks that were not placed for the string but to > free other memory. > Thus stringStream must not be deallocated, and > also there was no mem leak before that change. > But we need to call the destructor to free the char*. I think we have a confusing mix of arena and C_heap usage with stringStream. Not clear to me why stringStream remains a resourceObj now? In many cases the stringStream is just local on the stack. 
In other cases if it is new'd then it should be C-heap same as the array and then you could delete it too. What you have may suffice to initially address the leak but I think this whole thing needs revisiting. Thanks, David > > Best regards, > Goetz. > >> -----Original Message----- >> From: hotspot-runtime-dev >> On Behalf Of David Holmes >> Sent: Mittwoch, 18. Dezember 2019 04:33 >> To: Vladimir Kozlov ; hotspot-compiler- >> dev at openjdk.java.net >> Cc: hotspot-runtime-dev at openjdk.java.net runtime > dev at openjdk.java.net> >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >> '8224193: stringStream should not use Resouce Area'. >> >> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: >>> CCing to Runtime group. >>> >>> For me the use of `_print_inlining_stream->~stringStream()` is not obvious. >>> I would definitively miss to do that if I use stringStreams in some new >>> code. >> >> But that is not a problem added by this changeset, the problem is that >> we're not deallocating these stringStreams even though we should be. If >> you use a stringStream in new code you have to manage its lifecycle. >> >> That said why is this: >> >> if (_print_inlining_stream != NULL) >> _print_inlining_stream->~stringStream(); >> >> not just: >> >> delete _print_inlining_stream; >> >> ? >> >> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly >> calling the destructor rather than calling delete? >> >> Cheers, >> David >> >>> May be someone can suggest some C++ trick to do that automatically. >>> Thanks, >>> Vladimir >>> >>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: >>>> Hi, >>>> >>>> I'm resending this with fixed bugId ... >>>> Sorry! >>>> >>>> Best regards, >>>> ?? Goetz >>>> >>>>> Hi, >>>>> >>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with new and >>>>> relied on the fact that all memory used is on the ResourceArea cleaned >>>>> after the compilation. >>>>> >>>>> Since 8224193 the char* of the stringStream is malloced and thus >>>>> must be freed. No doing so manifests a memory leak. >>>>> This is only relevant if the corresponding tracing is active. >>>>> >>>>> To fix TraceLoopPredicate I added the destructor call >>>>> Fixing PrintInlining is a bit more complicated, as it uses several >>>>> stringStreams. A row of them is in a GrowableArray which must >>>>> be walked to free all of them. >>>>> As the GrowableArray is on an arena no destructor is called for it. >>>>> >>>>> I also changed some as_string() calls to base() calls which reduced >>>>> memory need of the traces, and added a comment explaining the >>>>> constructor of GrowableArray that calls the copyconstructor for its >>>>> elements. >>>>> >>>>> Please review: >>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- >> c2_tracing_mem_leak/01/ >>>>> >>>>> Best regards, >>>>> ?? Goetz. >>>> From goetz.lindenmaier at sap.com Thu Dec 19 11:35:54 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 19 Dec 2019 11:35:54 +0000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: Hi, yes, it is confusing that parts are on the arena, other parts are allocated in the C-heap. But usages which allocate the stringStream with new() are rare, usually it's allocated on the stack making all this more simple. And the previous design was even more error-prone. 
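For completeness, the common stack-allocated case mentioned above needs no manual cleanup at all: the destructor runs at scope exit and frees the C-heap buffer regardless of any ResourceMark. Roughly, as an illustrative HotSpot-style fragment rather than code from the patch (method_name is a placeholder):

{
  stringStream ss;                        // internal buffer is malloc'd since 8224193
  ss.print("inlining %s", method_name);   // method_name: placeholder for whatever is traced
  tty->print_raw(ss.base());              // base() avoids the extra copy that as_string() makes
}                                         // ~stringStream() runs here and frees the buffer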
Also, the whole way to print the inlining information is quite complex, with strange usage of the copy constructor of PrintInliningBuffer ... which reaches into GrowableArray which should have a constructor that does not use the copy constructor to initialize the elements ... I do not intend to change stringStream in this change. So can I consider this reviewed from your side? Or at least that there is no veto :)? Thanks and best regards, Goetz. > -----Original Message----- > From: David Holmes > Sent: Donnerstag, 19. Dezember 2019 11:39 > To: Lindenmaier, Goetz ; Vladimir Kozlov > ; hotspot-compiler-dev at openjdk.java.net > Cc: hotspot-runtime-dev at openjdk.java.net runtime dev at openjdk.java.net> > Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > '8224193: stringStream should not use Resouce Area'. > > On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: > > Hi David, Vladimir, > > > > stringStream is a ResourceObj, thus it lives on an arena. > > This is uncritical, as it does not resize. > > 8224193 only changed the allocation of the internal char*, > > which always caused problems with resizing under > > ResourceMarks that were not placed for the string but to > > free other memory. > > Thus stringStream must not be deallocated, and > > also there was no mem leak before that change. > > But we need to call the destructor to free the char*. > > I think we have a confusing mix of arena and C_heap usage with > stringStream. Not clear to me why stringStream remains a resourceObj > now? In many cases the stringStream is just local on the stack. In other > cases if it is new'd then it should be C-heap same as the array and then > you could delete it too. > > What you have may suffice to initially address the leak but I think this > whole thing needs revisiting. > > Thanks, > David > > > > > Best regards, > > Goetz. > > > >> -----Original Message----- > >> From: hotspot-runtime-dev bounces at openjdk.java.net> > >> On Behalf Of David Holmes > >> Sent: Mittwoch, 18. Dezember 2019 04:33 > >> To: Vladimir Kozlov ; hotspot-compiler- > >> dev at openjdk.java.net > >> Cc: hotspot-runtime-dev at openjdk.java.net runtime >> dev at openjdk.java.net> > >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > >> '8224193: stringStream should not use Resouce Area'. > >> > >> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: > >>> CCing to Runtime group. > >>> > >>> For me the use of `_print_inlining_stream->~stringStream()` is not obvious. > >>> I would definitively miss to do that if I use stringStreams in some new > >>> code. > >> > >> But that is not a problem added by this changeset, the problem is that > >> we're not deallocating these stringStreams even though we should be. If > >> you use a stringStream in new code you have to manage its lifecycle. > >> > >> That said why is this: > >> > >> if (_print_inlining_stream != NULL) > >> _print_inlining_stream->~stringStream(); > >> > >> not just: > >> > >> delete _print_inlining_stream; > >> > >> ? > >> > >> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly > >> calling the destructor rather than calling delete? > >> > >> Cheers, > >> David > >> > >>> May be someone can suggest some C++ trick to do that automatically. > >>> Thanks, > >>> Vladimir > >>> > >>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: > >>>> Hi, > >>>> > >>>> I'm resending this with fixed bugId ... > >>>> Sorry! > >>>> > >>>> Best regards, > >>>> ?? 
Goetz > >>>> > >>>>> Hi, > >>>>> > >>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with new > and > >>>>> relied on the fact that all memory used is on the ResourceArea cleaned > >>>>> after the compilation. > >>>>> > >>>>> Since 8224193 the char* of the stringStream is malloced and thus > >>>>> must be freed. No doing so manifests a memory leak. > >>>>> This is only relevant if the corresponding tracing is active. > >>>>> > >>>>> To fix TraceLoopPredicate I added the destructor call > >>>>> Fixing PrintInlining is a bit more complicated, as it uses several > >>>>> stringStreams. A row of them is in a GrowableArray which must > >>>>> be walked to free all of them. > >>>>> As the GrowableArray is on an arena no destructor is called for it. > >>>>> > >>>>> I also changed some as_string() calls to base() calls which reduced > >>>>> memory need of the traces, and added a comment explaining the > >>>>> constructor of GrowableArray that calls the copyconstructor for its > >>>>> elements. > >>>>> > >>>>> Please review: > >>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- > >> c2_tracing_mem_leak/01/ > >>>>> > >>>>> Best regards, > >>>>> ?? Goetz. > >>>> From goetz.lindenmaier at sap.com Thu Dec 19 11:37:24 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 19 Dec 2019 11:37:24 +0000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: One more thing: I think I should push this to jdk14, right? It's a P3 bug. Best regards, Goetz > -----Original Message----- > From: David Holmes > Sent: Donnerstag, 19. Dezember 2019 11:39 > To: Lindenmaier, Goetz ; Vladimir Kozlov > ; hotspot-compiler-dev at openjdk.java.net > Cc: hotspot-runtime-dev at openjdk.java.net runtime dev at openjdk.java.net> > Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > '8224193: stringStream should not use Resouce Area'. > > On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: > > Hi David, Vladimir, > > > > stringStream is a ResourceObj, thus it lives on an arena. > > This is uncritical, as it does not resize. > > 8224193 only changed the allocation of the internal char*, > > which always caused problems with resizing under > > ResourceMarks that were not placed for the string but to > > free other memory. > > Thus stringStream must not be deallocated, and > > also there was no mem leak before that change. > > But we need to call the destructor to free the char*. > > I think we have a confusing mix of arena and C_heap usage with > stringStream. Not clear to me why stringStream remains a resourceObj > now? In many cases the stringStream is just local on the stack. In other > cases if it is new'd then it should be C-heap same as the array and then > you could delete it too. > > What you have may suffice to initially address the leak but I think this > whole thing needs revisiting. > > Thanks, > David > > > > > Best regards, > > Goetz. > > > >> -----Original Message----- > >> From: hotspot-runtime-dev bounces at openjdk.java.net> > >> On Behalf Of David Holmes > >> Sent: Mittwoch, 18. Dezember 2019 04:33 > >> To: Vladimir Kozlov ; hotspot-compiler- > >> dev at openjdk.java.net > >> Cc: hotspot-runtime-dev at openjdk.java.net runtime >> dev at openjdk.java.net> > >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > >> '8224193: stringStream should not use Resouce Area'. 
> >> > >> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: > >>> CCing to Runtime group. > >>> > >>> For me the use of `_print_inlining_stream->~stringStream()` is not obvious. > >>> I would definitively miss to do that if I use stringStreams in some new > >>> code. > >> > >> But that is not a problem added by this changeset, the problem is that > >> we're not deallocating these stringStreams even though we should be. If > >> you use a stringStream in new code you have to manage its lifecycle. > >> > >> That said why is this: > >> > >> if (_print_inlining_stream != NULL) > >> _print_inlining_stream->~stringStream(); > >> > >> not just: > >> > >> delete _print_inlining_stream; > >> > >> ? > >> > >> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly > >> calling the destructor rather than calling delete? > >> > >> Cheers, > >> David > >> > >>> May be someone can suggest some C++ trick to do that automatically. > >>> Thanks, > >>> Vladimir > >>> > >>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: > >>>> Hi, > >>>> > >>>> I'm resending this with fixed bugId ... > >>>> Sorry! > >>>> > >>>> Best regards, > >>>> ?? Goetz > >>>> > >>>>> Hi, > >>>>> > >>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with new > and > >>>>> relied on the fact that all memory used is on the ResourceArea cleaned > >>>>> after the compilation. > >>>>> > >>>>> Since 8224193 the char* of the stringStream is malloced and thus > >>>>> must be freed. No doing so manifests a memory leak. > >>>>> This is only relevant if the corresponding tracing is active. > >>>>> > >>>>> To fix TraceLoopPredicate I added the destructor call > >>>>> Fixing PrintInlining is a bit more complicated, as it uses several > >>>>> stringStreams. A row of them is in a GrowableArray which must > >>>>> be walked to free all of them. > >>>>> As the GrowableArray is on an arena no destructor is called for it. > >>>>> > >>>>> I also changed some as_string() calls to base() calls which reduced > >>>>> memory need of the traces, and added a comment explaining the > >>>>> constructor of GrowableArray that calls the copyconstructor for its > >>>>> elements. > >>>>> > >>>>> Please review: > >>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- > >> c2_tracing_mem_leak/01/ > >>>>> > >>>>> Best regards, > >>>>> ?? Goetz. > >>>> From vladimir.x.ivanov at oracle.com Thu Dec 19 11:38:46 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 19 Dec 2019 14:38:46 +0300 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: Message-ID: <6e19116b-a94e-1634-488c-b1573da0d707@oracle.com> Aditya, > I'll sponsor this change for you, once you've got the necessary reviews. Please, either post the webrev on cr.openjdk.java.net or just include the patch inline. Best regards, Vladimir Ivanov > > Thank you for your contribution! > Roman > > >> Hi all, >> >> I encountered a crashing bug when running a jcstress test with Shenandoah, and tracked down the problem to how C1 generates code for call nodes that use T_ADDRESS operands. Specifically, in my failure case, the C1-generated code for a load reference barrier was treating the load address parameter (which is typed T_ADDRESS in the BasicTypeList for the runtime call) as a 32-bit value and therefore missing the upper bits when calling into the native code for the barrier on my x86-64 machine. 
>> >> The fix modifies the FrameMap logic to use address operands for T_ADDRESS, and also modifies the x86 LIR assembler to use pointer-sized movs when moving address-typed stack values to and from registers. >> >> Bug: bugs.openjdk.java.net/browse/JDK-8236179 >> Webrev: adityamandaleeka.github.io/webrevs/c1_address_64_bit/webrev >> >> The rest of this email contains more information about the issue and the analysis that led to the fix. >> >> Thank you, >> Aditya Mandaleeka >> >> ======== >> How this was found >> >> As mentioned I was running the jcstress suite when I ran into this bug. I was running it on Windows, but the problem and the fix are OS-agnostic. I was trying to reproduce a separate issue at the time, which involved setting several options such as aggressive Shenandoah heuristics and disabling C2 (limiting tiering to level 1). I was never able to reproduce that other bug but I noticed a crash on WeakCASTest.WeakCompareAndSetPlainString, which was consistently hitting access violations on an atomic cmpxchg. I decided to investigate, running this test as my reproducer. >> >> ======== >> Analysis >> >> Looking at the dump for this crash under a debugger, it was apparent that it was something to do with the LRB code; there was an access violation when trying to do the CAS operation on the reference location after we determined the new location of the object. The reference address was indeed bogus, and I tracked down where it was coming from. Here is some code from the caller (Intel syntax ahead): >> >> 00000261`9beaf4da 488b4978 mov rcx,qword ptr [rcx+78h] >> 00000261`9beaf4de 488b11 mov rdx,qword ptr [rcx] >> 00000261`9beaf4e1 488d09 lea rcx,[rcx] >> 00000261`9beaf4e4 48898c24b8010000 mov qword ptr [rsp+1B8h],rcx >> 00000261`9beaf4ec 488bca mov rcx,rdx >> 00000261`9beaf4ef 8b9424b8010000 mov edx,dword ptr [rsp+1B8h] >> 00000261`9beaf4f6 49baa0139ecef87f0000 mov r10,offset jvm!ShenandoahRuntime::load_reference_barrier_native (00007ff8`ce9e13a0) >> 00000261`9beaf500 41ffd2 call r10 >> >> The second argument (which goes in *DX) for the load_reference_barrier_native is the load address. I noticed in the code above that, in the process of moving that value around, we stored it as 64-bit to the stack but then restored it as a 32-bit value when we put it in EDX. And sure enough, when I look at the memory in that location, there are a few extra bits which go with the address to make it valid. >> >> I found the following in the IR which seemed suspicious: >> >> wide_move [Base:[R653|M] Disp: 0|L] [R654|L] >> leal [Base:[R653|M] Disp: 0|L] [R655|L] >> move [R654|L] [rcx|L] >> move [R655|L] [rdx|I] >> rtcall ShenandoahRuntime::load_reference_barrier_native >> >> As you can see, the move to RDX is done as I while the load was done as L. This explained the reason for the code being the way it was, so the next step was to figure out why the discrepancy exists in the IR itself. >> >> In ShenandoahBarrierSetC1::load_at_resolved, we are treating this second argument as T_ADDRESS which seemed appropriate. I experimented by changing the argument type to T_OBJECT and verified that correct code was generated, though being new to C1 (and OpenJDK in general :)) it wasn't clear to me whether that was an appropriate fix or whether T_ADDRESS should indeed be fixed to work correctly. Thankfully at this point I was put into contact with Roland Westrelin who guided me through fixing C1 to make it correctly handle T_ADDRESS, which is what this patch does. Thanks Roland! 
>> >> ======== >> Testing done >> >> Apart from verifying that the codegen issue in my jcstress repro is fixed, I've run these tests with this patch: >> tier1 >> tier2 >> hotspot_gc_shenandoah >> > From david.holmes at oracle.com Thu Dec 19 12:52:22 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 19 Dec 2019 22:52:22 +1000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: On 19/12/2019 9:35 pm, Lindenmaier, Goetz wrote: > Hi, > > yes, it is confusing that parts are on the arena, other parts > are allocated in the C-heap. > But usages which allocate the stringStream with new() are > rare, usually it's allocated on the stack making all this > more simple. And the previous design was even more > error-prone. > Also, the whole way to print the inlining information > is quite complex, with strange usage of the copy constructor > of PrintInliningBuffer ... which reaches into GrowableArray > which should have a constructor that does not use the > copy constructor to initialize the elements ... > > I do not intend to change stringStream in this change. > So can I consider this reviewed from your side? Or at > least that there is no veto :)? Sorry I was trying to convey this is Reviewed, but I do think this needs further work in the future. Thanks, David > Thanks and best regards, > Goetz. > > >> -----Original Message----- >> From: David Holmes >> Sent: Donnerstag, 19. Dezember 2019 11:39 >> To: Lindenmaier, Goetz ; Vladimir Kozlov >> ; hotspot-compiler-dev at openjdk.java.net >> Cc: hotspot-runtime-dev at openjdk.java.net runtime > dev at openjdk.java.net> >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >> '8224193: stringStream should not use Resouce Area'. >> >> On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: >>> Hi David, Vladimir, >>> >>> stringStream is a ResourceObj, thus it lives on an arena. >>> This is uncritical, as it does not resize. >>> 8224193 only changed the allocation of the internal char*, >>> which always caused problems with resizing under >>> ResourceMarks that were not placed for the string but to >>> free other memory. >>> Thus stringStream must not be deallocated, and >>> also there was no mem leak before that change. >>> But we need to call the destructor to free the char*. >> >> I think we have a confusing mix of arena and C_heap usage with >> stringStream. Not clear to me why stringStream remains a resourceObj >> now? In many cases the stringStream is just local on the stack. In other >> cases if it is new'd then it should be C-heap same as the array and then >> you could delete it too. >> >> What you have may suffice to initially address the leak but I think this >> whole thing needs revisiting. >> >> Thanks, >> David >> >>> >>> Best regards, >>> Goetz. >>> >>>> -----Original Message----- >>>> From: hotspot-runtime-dev > bounces at openjdk.java.net> >>>> On Behalf Of David Holmes >>>> Sent: Mittwoch, 18. Dezember 2019 04:33 >>>> To: Vladimir Kozlov ; hotspot-compiler- >>>> dev at openjdk.java.net >>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>> dev at openjdk.java.net> >>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >>>> '8224193: stringStream should not use Resouce Area'. >>>> >>>> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: >>>>> CCing to Runtime group. 
>>>>> >>>>> For me the use of `_print_inlining_stream->~stringStream()` is not obvious. >>>>> I would definitively miss to do that if I use stringStreams in some new >>>>> code. >>>> >>>> But that is not a problem added by this changeset, the problem is that >>>> we're not deallocating these stringStreams even though we should be. If >>>> you use a stringStream in new code you have to manage its lifecycle. >>>> >>>> That said why is this: >>>> >>>> if (_print_inlining_stream != NULL) >>>> _print_inlining_stream->~stringStream(); >>>> >>>> not just: >>>> >>>> delete _print_inlining_stream; >>>> >>>> ? >>>> >>>> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly >>>> calling the destructor rather than calling delete? >>>> >>>> Cheers, >>>> David >>>> >>>>> May be someone can suggest some C++ trick to do that automatically. >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: >>>>>> Hi, >>>>>> >>>>>> I'm resending this with fixed bugId ... >>>>>> Sorry! >>>>>> >>>>>> Best regards, >>>>>> ?? Goetz >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with new >> and >>>>>>> relied on the fact that all memory used is on the ResourceArea cleaned >>>>>>> after the compilation. >>>>>>> >>>>>>> Since 8224193 the char* of the stringStream is malloced and thus >>>>>>> must be freed. No doing so manifests a memory leak. >>>>>>> This is only relevant if the corresponding tracing is active. >>>>>>> >>>>>>> To fix TraceLoopPredicate I added the destructor call >>>>>>> Fixing PrintInlining is a bit more complicated, as it uses several >>>>>>> stringStreams. A row of them is in a GrowableArray which must >>>>>>> be walked to free all of them. >>>>>>> As the GrowableArray is on an arena no destructor is called for it. >>>>>>> >>>>>>> I also changed some as_string() calls to base() calls which reduced >>>>>>> memory need of the traces, and added a comment explaining the >>>>>>> constructor of GrowableArray that calls the copyconstructor for its >>>>>>> elements. >>>>>>> >>>>>>> Please review: >>>>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- >>>> c2_tracing_mem_leak/01/ >>>>>>> >>>>>>> Best regards, >>>>>>> ?? Goetz. >>>>>> From david.holmes at oracle.com Thu Dec 19 12:54:15 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 19 Dec 2019 22:54:15 +1000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: <0392412b-93e1-d99d-f919-96c90c895e30@oracle.com> On 19/12/2019 9:37 pm, Lindenmaier, Goetz wrote: > One more thing: > > I think I should push this to jdk14, right? > It's a P3 bug. Yes this can go to 14 (and will forward port to 15 automatically). Thanks, David > Best regards, > Goetz > >> -----Original Message----- >> From: David Holmes >> Sent: Donnerstag, 19. Dezember 2019 11:39 >> To: Lindenmaier, Goetz ; Vladimir Kozlov >> ; hotspot-compiler-dev at openjdk.java.net >> Cc: hotspot-runtime-dev at openjdk.java.net runtime > dev at openjdk.java.net> >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >> '8224193: stringStream should not use Resouce Area'. >> >> On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: >>> Hi David, Vladimir, >>> >>> stringStream is a ResourceObj, thus it lives on an arena. >>> This is uncritical, as it does not resize. 
>>> 8224193 only changed the allocation of the internal char*, >>> which always caused problems with resizing under >>> ResourceMarks that were not placed for the string but to >>> free other memory. >>> Thus stringStream must not be deallocated, and >>> also there was no mem leak before that change. >>> But we need to call the destructor to free the char*. >> >> I think we have a confusing mix of arena and C_heap usage with >> stringStream. Not clear to me why stringStream remains a resourceObj >> now? In many cases the stringStream is just local on the stack. In other >> cases if it is new'd then it should be C-heap same as the array and then >> you could delete it too. >> >> What you have may suffice to initially address the leak but I think this >> whole thing needs revisiting. >> >> Thanks, >> David >> >>> >>> Best regards, >>> Goetz. >>> >>>> -----Original Message----- >>>> From: hotspot-runtime-dev > bounces at openjdk.java.net> >>>> On Behalf Of David Holmes >>>> Sent: Mittwoch, 18. Dezember 2019 04:33 >>>> To: Vladimir Kozlov ; hotspot-compiler- >>>> dev at openjdk.java.net >>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>> dev at openjdk.java.net> >>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >>>> '8224193: stringStream should not use Resouce Area'. >>>> >>>> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: >>>>> CCing to Runtime group. >>>>> >>>>> For me the use of `_print_inlining_stream->~stringStream()` is not obvious. >>>>> I would definitively miss to do that if I use stringStreams in some new >>>>> code. >>>> >>>> But that is not a problem added by this changeset, the problem is that >>>> we're not deallocating these stringStreams even though we should be. If >>>> you use a stringStream in new code you have to manage its lifecycle. >>>> >>>> That said why is this: >>>> >>>> if (_print_inlining_stream != NULL) >>>> _print_inlining_stream->~stringStream(); >>>> >>>> not just: >>>> >>>> delete _print_inlining_stream; >>>> >>>> ? >>>> >>>> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly >>>> calling the destructor rather than calling delete? >>>> >>>> Cheers, >>>> David >>>> >>>>> May be someone can suggest some C++ trick to do that automatically. >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: >>>>>> Hi, >>>>>> >>>>>> I'm resending this with fixed bugId ... >>>>>> Sorry! >>>>>> >>>>>> Best regards, >>>>>> ?? Goetz >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with new >> and >>>>>>> relied on the fact that all memory used is on the ResourceArea cleaned >>>>>>> after the compilation. >>>>>>> >>>>>>> Since 8224193 the char* of the stringStream is malloced and thus >>>>>>> must be freed. No doing so manifests a memory leak. >>>>>>> This is only relevant if the corresponding tracing is active. >>>>>>> >>>>>>> To fix TraceLoopPredicate I added the destructor call >>>>>>> Fixing PrintInlining is a bit more complicated, as it uses several >>>>>>> stringStreams. A row of them is in a GrowableArray which must >>>>>>> be walked to free all of them. >>>>>>> As the GrowableArray is on an arena no destructor is called for it. >>>>>>> >>>>>>> I also changed some as_string() calls to base() calls which reduced >>>>>>> memory need of the traces, and added a comment explaining the >>>>>>> constructor of GrowableArray that calls the copyconstructor for its >>>>>>> elements. 
>>>>>>> >>>>>>> Please review: >>>>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- >>>> c2_tracing_mem_leak/01/ >>>>>>> >>>>>>> Best regards, >>>>>>> ?? Goetz. >>>>>> From gromero at linux.vnet.ibm.com Thu Dec 19 13:43:50 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Thu, 19 Dec 2019 10:43:50 -0300 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> Message-ID: Hi Martin, On 12/19/2019 07:37 AM, Doerr, Martin wrote: > Hi Gustavo, > > thanks for the update. Looks good. > > Please remove the whitespaces between the instructions and '(' in generate_data_cache_writeback_sync() before pushing. > Marked here by 'X': > + __ andi_X(temp, is_presync, 1); > + __ bneX(CCR0, SKIP); > + __ cache_wbsync(false); // post sync => emit 'sync' > + __ bindX(SKIP); // pre sync => emit nothing Just for records, I uploaded v3 without the whitespaces to: http://cr.openjdk.java.net/~gromero/8234599/v3/ instead of fixing it in place. Should I wait any test to complete at SAP side? Thanks a lot. Best regards, Gustavo From adinn at redhat.com Thu Dec 19 14:05:46 2019 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 19 Dec 2019 14:05:46 +0000 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> Message-ID: <5a48bb64-2ed8-cfaa-0577-7db5200be4a1@redhat.com> Hi Gustavo, On 19/12/2019 13:43, Gustavo Romero wrote: > http://cr.openjdk.java.net/~gromero/8234599/v3/ Disclaimer: I reviewed an early version of this patch offline before anything arrived on hotspot-compiler-dev. Just for the record I am happy with this final version. I make no claim to have understood the ppc-specific aspects correctly but I am happy that Matthias and Martin will have covered that part of the review adequately. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From rwestrel at redhat.com Thu Dec 19 14:14:51 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 19 Dec 2019 15:14:51 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: Message-ID: <87h81wl7vo.fsf@redhat.com> Hi Aditya, AFAIK, it's a requirement that the patch be posted on the openjdk infrastructure. So here it is: http://cr.openjdk.java.net/~roland/8236179/webrev.00/ The change looks good to me but it would be good to check whether architectures other than x86 need a similar change. Roland. 
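For readers who want to see the failure mode outside of C1: the bug is simply that a 64-bit address spilled to the stack was reloaded through a 32-bit move, so everything above 2^32 was lost; the fix routes T_ADDRESS values through pointer-sized moves instead. A stand-alone illustration of that difference on a little-endian machine such as x86_64 (my own sketch with an example value, not code from the webrev):

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    uint64_t addr = 0x000002619beaf550ULL;   // example heap address with bits above 2^32
    uint8_t slot[8];
    std::memcpy(slot, &addr, 8);             // 64-bit spill: mov qword ptr [rsp+...], rcx

    uint32_t lo;
    std::memcpy(&lo, slot, 4);               // 32-bit reload: mov edx, dword ptr [rsp+...]
    uint64_t full;
    std::memcpy(&full, slot, 8);             // pointer-sized reload, as required for T_ADDRESS

    std::printf("original       %#llx\n", (unsigned long long)addr);
    std::printf("32-bit reload  %#llx\n", (unsigned long long)lo);    // upper bits are gone
    std::printf("64-bit reload  %#llx\n", (unsigned long long)full);
    return 0;
}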
From gromero at linux.vnet.ibm.com Thu Dec 19 14:19:30 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Thu, 19 Dec 2019 11:19:30 -0300 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: <5a48bb64-2ed8-cfaa-0577-7db5200be4a1@redhat.com> References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> <5a48bb64-2ed8-cfaa-0577-7db5200be4a1@redhat.com> Message-ID: <5baa6af8-d48c-3a51-f851-f0e27e3eeb79@linux.vnet.ibm.com> Hi Andrew, On 12/19/2019 11:05 AM, Andrew Dinn wrote: > Hi Gustavo, > > On 19/12/2019 13:43, Gustavo Romero wrote: >> http://cr.openjdk.java.net/~gromero/8234599/v3/ > Disclaimer: I reviewed an early version of this patch offline before > anything arrived on hotspot-compiler-dev. > > Just for the record I am happy with this final version. I make no claim > to have understood the ppc-specific aspects correctly but I am happy > that Matthias and Martin will have covered that part of the review > adequately. Thanks for the reviews. Also, thanks a lot for all the discussions and suggestions on how to test the change on different scenarios on Power, w/ and w/o pmem support. Best regards, Gustavo From matthias.baesken at sap.com Thu Dec 19 14:07:34 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Thu, 19 Dec 2019 14:07:34 +0000 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> Message-ID: Hi Gustavo , please remove the blank after __ bind too, as suggested by Martin . http://cr.openjdk.java.net/~gromero/8234599/v3/src/hotspot/cpu/ppc/stubGenerator_ppc.cpp.frames.html 3070 __ bind (SKIP); // pre sync => emit nothing Otherwise looks good to me . Thanks, Matthias > > Hi Martin, > > On 12/19/2019 07:37 AM, Doerr, Martin wrote: > > Hi Gustavo, > > > > thanks for the update. Looks good. > > > > Please remove the whitespaces between the instructions and '(' in > generate_data_cache_writeback_sync() before pushing. > > Marked here by 'X': > > + __ andi_X(temp, is_presync, 1); > > + __ bneX(CCR0, SKIP); > > + __ cache_wbsync(false); // post sync => emit 'sync' > > + __ bindX(SKIP); // pre sync => emit nothing <-------------------------------------------------------- > > Just for records, I uploaded v3 without the whitespaces to: > > http://cr.openjdk.java.net/~gromero/8234599/v3/ > > instead of fixing it in place. > > Should I wait any test to complete at SAP side? > > Thanks a lot. > > > Best regards, > Gustavo From gromero at linux.vnet.ibm.com Thu Dec 19 14:39:10 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Thu, 19 Dec 2019 11:39:10 -0300 Subject: RFR(M): 8234599: PPC64: Add support on recent CPUs and Linux for JEP-352 In-Reply-To: References: <414ff627-4816-4f71-80d9-7b1d398ca8e4@linux.vnet.ibm.com> <1cb79ee7-8700-660d-fb06-b234f556f489@linux.vnet.ibm.com> Message-ID: <212d785f-13fa-1c77-26d4-f973d032fb56@linux.vnet.ibm.com> Hi Matthias, On 12/19/2019 11:07 AM, Baesken, Matthias wrote: > Hi Gustavo , please remove the blank after __ bind too, as suggested by Martin . > > http://cr.openjdk.java.net/~gromero/8234599/v3/src/hotspot/cpu/ppc/stubGenerator_ppc.cpp.frames.html > > 3070 __ bind (SKIP); // pre sync => emit nothing > > Otherwise looks good to me . Amazing how many times I was able to mess with the spaces. 
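Stepping back from the formatting nits: generate_data_cache_writeback[_sync] backs MappedByteBuffer.force() on NVRAM-backed mappings (JEP 352) -- write the touched cache lines back to the persistence domain, then order those writebacks. As a rough point of comparison only, the x86_64 counterpart of what the PPC64 stub emits is a cache-line writeback plus a store fence; a small compiler-intrinsics sketch of that, assuming CLWB support (-mclwb) and not taken from JDK code:

#include <immintrin.h>

// Write one cache line back towards memory without evicting it.
static inline void writeback_line(void* p) {
    _mm_clwb(p);
}

// Order all preceding writebacks (the "post sync" step discussed above).
static inline void writeback_sync() {
    _mm_sfence();
}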
v4 with Andrew added as Reviewer: http://cr.openjdk.java.net/~gromero/8234599/v4/ I plan push it to jdk/jdk today. Thanks! Best regards, Gustavo From augustnagro at gmail.com Thu Dec 19 14:38:49 2019 From: augustnagro at gmail.com (August Nagro) Date: Thu, 19 Dec 2019 08:38:49 -0600 Subject: Bounds Check Elimination with Fast-Range In-Reply-To: <2ca24471-a5e8-580a-66bc-37b0eece8898@redhat.com> References: <19544187-6670-4A65-AF47-297F38E8D555@gmail.com> <87r22549fg.fsf@oldenburg2.str.redhat.com> <1AB809E9-27B3-423E-AB1A-448D6B4689F6@oracle.com> <877e3x0wji.fsf@oldenburg2.str.redhat.com> <28CF6CF4-3E20-489C-9659-C50E4222EC22@oracle.com> <2ca24471-a5e8-580a-66bc-37b0eece8898@redhat.com> Message-ID: <2F5736F4-3B57-4E97-9A0D-9B32B2EDD82A@gmail.com> One thing I?ve come to realize is that the bit-mixing step in fast-range does not require a high quality hash function. It just needs to distribute the bits so that the probability of falling into the buckets [0, 2^32), [2^32, 2 * 2^32), [2*2^32, 3*2^32), ?, is even. This is why I?m so enthusiastic about fibonacci hashing for this operation, since it costs only a single multiply. To prove it works, take a look at this sample. The integer gFactor is equal to 2^32 / phi, where phi is the Golden Ratio. Note that hash * gFactor is 32-bit multiplication, and we AND with 0xffffffffL for the unsigned multiply with int size. Set::add returns false if the element is present. import java.util.HashSet; import java.util.Set; class Scratch { public static void main(String[] args) { int gFactor = -1640531527; int size = 1_000; int range = 4_000; int collisions = 0; Set set = new HashSet<>(); for (int hash = 0; hash < range; ++hash) { int reduced = (int) (((hash * gFactor) & 0xffffffffL) * size >>> 32); if (!set.add(reduced)) collisions++; } System.out.println("Optimal collisions: " + (range - size)); System.out.println("Actual collisions: " + collisions); } } > On Dec 19, 2019, at 4:31 AM, Andrew Haley wrote: > > On 12/3/19 6:06 AM, John Rose wrote: >> But what I keep wishing for a good one- or two-cycle instruction >> that will mix all input bits into all the output bits, so that any >> change in one input bit is likely to cause cascading changes in >> many output bits, perhaps even 50% on average. A pair of AES >> steps is a good example of this. > > I've searched for something like this too. However, I > experimented with two rounds of AES and I didn't get very good > results. From what I remember, it took at least three or four > rounds to get decent mixing, let alone full avalanche. > > Also, the latency of AES instructions tends to be a few cycles > and the latency of moving data from integer to vector registers > is as many as five cycles. So I gave up. > > This was a while ago, so probably badly remembered... > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From martin.doerr at sap.com Thu Dec 19 16:31:05 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 19 Dec 2019 16:31:05 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: <87h81wl7vo.fsf@redhat.com> References: <87h81wl7vo.fsf@redhat.com> Message-ID: Hi everybody, thanks for fixing this issue. I guess it's currently used on some platforms, but I think we should fix it for all platforms. Otherwise it will break when using the parts which were only fixed for x86. 
Here's my proposal: http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.01/ I'll run tests on more platforms. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Roland Westrelin > Sent: Donnerstag, 19. Dezember 2019 15:15 > To: Aditya Mandaleeka ; hotspot compiler > > Cc: shenandoah-dev > Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS > > > Hi Aditya, > > AFAIK, it's a requirement that the patch be posted on the openjdk > infrastructure. So here it is: > > http://cr.openjdk.java.net/~roland/8236179/webrev.00/ > > The change looks good to me but it would be good to check whether > architectures other than x86 need a similar change. > > Roland. From adityam at microsoft.com Thu Dec 19 17:37:47 2019 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Thu, 19 Dec 2019 17:37:47 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: <6e19116b-a94e-1634-488c-b1573da0d707@oracle.com> References: <6e19116b-a94e-1634-488c-b1573da0d707@oracle.com> Message-ID: Thanks for the feedback Vladimir. I don't yet have access to cr.openjdk.java.net, but will paste inline diffs in the future. For now, it appears Martin Doerr has posted an updated webrev on cr.openjdk.java.net in another fork of this mail thread. Thanks, Aditya -----Original Message----- From: Vladimir Ivanov Sent: Thursday, December 19, 2019 3:39 AM To: Aditya Mandaleeka ; hotspot compiler Cc: Roman Kennke ; shenandoah-dev Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS Aditya, > I'll sponsor this change for you, once you've got the necessary reviews. Please, either post the webrev on cr.openjdk.java.net or just include the patch inline. Best regards, Vladimir Ivanov > > Thank you for your contribution! > Roman > > >> Hi all, >> >> I encountered a crashing bug when running a jcstress test with Shenandoah, and tracked down the problem to how C1 generates code for call nodes that use T_ADDRESS operands. Specifically, in my failure case, the C1-generated code for a load reference barrier was treating the load address parameter (which is typed T_ADDRESS in the BasicTypeList for the runtime call) as a 32-bit value and therefore missing the upper bits when calling into the native code for the barrier on my x86-64 machine. >> >> The fix modifies the FrameMap logic to use address operands for T_ADDRESS, and also modifies the x86 LIR assembler to use pointer-sized movs when moving address-typed stack values to and from registers. >> >> Bug: bugs.openjdk.java.net/browse/JDK-8236179 >> Webrev: adityamandaleeka.github.io/webrevs/c1_address_64_bit/webrev >> >> The rest of this email contains more information about the issue and the analysis that led to the fix. >> >> Thank you, >> Aditya Mandaleeka >> >> ======== >> How this was found >> >> As mentioned I was running the jcstress suite when I ran into this bug. I was running it on Windows, but the problem and the fix are OS-agnostic. I was trying to reproduce a separate issue at the time, which involved setting several options such as aggressive Shenandoah heuristics and disabling C2 (limiting tiering to level 1). I was never able to reproduce that other bug but I noticed a crash on WeakCASTest.WeakCompareAndSetPlainString, which was consistently hitting access violations on an atomic cmpxchg. I decided to investigate, running this test as my reproducer. 
>> >> ======== >> Analysis >> >> Looking at the dump for this crash under a debugger, it was apparent that it was something to do with the LRB code; there was an access violation when trying to do the CAS operation on the reference location after we determined the new location of the object. The reference address was indeed bogus, and I tracked down where it was coming from. Here is some code from the caller (Intel syntax ahead): >> >> 00000261`9beaf4da 488b4978 mov rcx,qword ptr [rcx+78h] >> 00000261`9beaf4de 488b11 mov rdx,qword ptr [rcx] >> 00000261`9beaf4e1 488d09 lea rcx,[rcx] >> 00000261`9beaf4e4 48898c24b8010000 mov qword ptr [rsp+1B8h],rcx >> 00000261`9beaf4ec 488bca mov rcx,rdx >> 00000261`9beaf4ef 8b9424b8010000 mov edx,dword ptr [rsp+1B8h] >> 00000261`9beaf4f6 49baa0139ecef87f0000 mov r10,offset jvm!ShenandoahRuntime::load_reference_barrier_native (00007ff8`ce9e13a0) >> 00000261`9beaf500 41ffd2 call r10 >> >> The second argument (which goes in *DX) for the load_reference_barrier_native is the load address. I noticed in the code above that, in the process of moving that value around, we stored it as 64-bit to the stack but then restored it as a 32-bit value when we put it in EDX. And sure enough, when I look at the memory in that location, there are a few extra bits which go with the address to make it valid. >> >> I found the following in the IR which seemed suspicious: >> >> wide_move [Base:[R653|M] Disp: 0|L] [R654|L] >> leal [Base:[R653|M] Disp: 0|L] [R655|L] >> move [R654|L] [rcx|L] >> move [R655|L] [rdx|I] >> rtcall ShenandoahRuntime::load_reference_barrier_native >> >> As you can see, the move to RDX is done as I while the load was done as L. This explained the reason for the code being the way it was, so the next step was to figure out why the discrepancy exists in the IR itself. >> >> In ShenandoahBarrierSetC1::load_at_resolved, we are treating this second argument as T_ADDRESS which seemed appropriate. I experimented by changing the argument type to T_OBJECT and verified that correct code was generated, though being new to C1 (and OpenJDK in general :)) it wasn't clear to me whether that was an appropriate fix or whether T_ADDRESS should indeed be fixed to work correctly. Thankfully at this point I was put into contact with Roland Westrelin who guided me through fixing C1 to make it correctly handle T_ADDRESS, which is what this patch does. Thanks Roland! >> >> ======== >> Testing done >> >> Apart from verifying that the codegen issue in my jcstress repro is fixed, I've run these tests with this patch: >> tier1 >> tier2 >> hotspot_gc_shenandoah >> > From adityam at microsoft.com Thu Dec 19 17:49:19 2019 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Thu, 19 Dec 2019 17:49:19 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> Message-ID: Thanks for updating the other platforms Martin. Those changes look right to me. -Aditya -----Original Message----- From: Doerr, Martin Sent: Thursday, December 19, 2019 8:31 AM To: Roland Westrelin ; Aditya Mandaleeka ; hotspot compiler Cc: shenandoah-dev Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS Hi everybody, thanks for fixing this issue. I guess it's currently used on some platforms, but I think we should fix it for all platforms. Otherwise it will break when using the parts which were only fixed for x86. 
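As an aside, a minimal standalone C++ sketch (hypothetical address value, not HotSpot code) of the failure mode described in the analysis above: a pointer spilled as a 64-bit value but reloaded through a 32-bit move keeps only its low half, which is what the "mov edx, dword ptr [rsp+1B8h]" in the disassembly does to the reference address.

#include <cstdint>
#include <cstdio>

int main() {
  uint64_t ref_addr   = 0x000002619beaf500ULL;   // example 64-bit heap address (above 4 GB)
  uint64_t stack_slot = ref_addr;                // 64-bit spill:  mov qword ptr [rsp+...], rcx
  uint32_t reloaded   = (uint32_t)stack_slot;    // 32-bit reload: mov edx, dword ptr [rsp+...]
  std::printf("spilled : 0x%016llx\n", (unsigned long long)stack_slot);
  std::printf("reloaded: 0x%016llx\n", (unsigned long long)(uint64_t)reloaded);
  // reloaded is 0x000000009beaf500: the high 32 bits are gone, so the barrier
  // receives a bogus reference address and the subsequent CAS faults.
  return 0;
}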
Here's my proposal: https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.openjdk.java.net%2F~mdoerr%2F8236179_C1_T_ADDRESS%2Fwebrev.01%2F&data=02%7C01%7Cadityam%40microsoft.com%7C3d2013a8ff1d4ffbafaa08d784a0dcc1%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637123698748787948&sdata=r11YVMnHSLm1Ms1Ipbq4vPDOhIwlrM8fz1QlAl%2BUWGY%3D&reserved=0 I'll run tests on more platforms. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Roland Westrelin > Sent: Donnerstag, 19. Dezember 2019 15:15 > To: Aditya Mandaleeka ; hotspot compiler > > Cc: shenandoah-dev > Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS > > > Hi Aditya, > > AFAIK, it's a requirement that the patch be posted on the openjdk > infrastructure. So here it is: > > https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.open > jdk.java.net%2F~roland%2F8236179%2Fwebrev.00%2F&data=02%7C01%7Cadi > tyam%40microsoft.com%7C3d2013a8ff1d4ffbafaa08d784a0dcc1%7C72f988bf86f1 > 41af91ab2d7cd011db47%7C1%7C0%7C637123698748787948&sdata=vQ1xR87EjA > bf%2Bnwscs1c%2BpTqWLfeVODLz%2FleIsdmthU%3D&reserved=0 > > The change looks good to me but it would be good to check whether > architectures other than x86 need a similar change. > > Roland. From vladimir.kozlov at oracle.com Thu Dec 19 18:06:41 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 10:06:41 -0800 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> Message-ID: <74f551e7-c418-5c7c-a422-abe9df07bb12@oracle.com> Please, file RFE for refactoring stringStream. Yes, the fix can go into JDK 14. But before that, I see the same pattern used 3 times in compile.cpp: + if (_print_inlining_stream != NULL) _print_inlining_stream->~stringStream(); _print_inlining_stream = ; Can you use one function for that? Also our coding style requires to put body of 'if' on separate line and use {}. thanks, Vladimir On 12/19/19 4:52 AM, David Holmes wrote: > On 19/12/2019 9:35 pm, Lindenmaier, Goetz wrote: >> Hi, >> >> yes, it is confusing that parts are on the arena, other parts >> are allocated in the C-heap. >> But usages which allocate the stringStream with new() are >> rare, usually it's allocated on the stack making all this >> more simple.? And the previous design was even more >> error-prone. >> Also, the whole way to print the inlining information >> is quite complex, with strange usage of the copy constructor >> of PrintInliningBuffer ... which reaches into GrowableArray >> which should have a constructor that does not use the >> copy constructor to initialize the elements ... >> >> I do not intend to change stringStream in this change. >> So can I consider this reviewed from your side? Or at >> least that there is no veto :)? > > Sorry I was trying to convey this is Reviewed, but I do think this needs further work in the future. > > Thanks, > David > >> Thanks and best regards, >> ?? Goetz. >> >> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Donnerstag, 19. Dezember 2019 11:39 >>> To: Lindenmaier, Goetz ; Vladimir Kozlov >>> ; hotspot-compiler-dev at openjdk.java.net >>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >> dev at openjdk.java.net> >>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >>> '8224193: stringStream should not use Resouce Area'. 
>>> >>> On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: >>>> Hi David, Vladimir, >>>> >>>> stringStream is a ResourceObj, thus it lives on an arena. >>>> This is uncritical, as it does not resize. >>>> 8224193 only changed the allocation of the internal char*, >>>> which always caused problems with resizing under >>>> ResourceMarks that were not placed for the string but to >>>> free other memory. >>>> Thus stringStream must not be deallocated, and >>>> also there was no mem leak before that change. >>>> But we need to call the destructor to free the char*. >>> >>> I think we have a confusing mix of arena and C_heap usage with >>> stringStream. Not clear to me why stringStream remains a resourceObj >>> now? In many cases the stringStream is just local on the stack. In other >>> cases if it is new'd then it should be C-heap same as the array and then >>> you could delete it too. >>> >>> What you have may suffice to initially address the leak but I think this >>> whole thing needs revisiting. >>> >>> Thanks, >>> David >>> >>>> >>>> Best regards, >>>> ??? Goetz. >>>> >>>>> -----Original Message----- >>>>> From: hotspot-runtime-dev >> bounces at openjdk.java.net> >>>>> On Behalf Of David Holmes >>>>> Sent: Mittwoch, 18. Dezember 2019 04:33 >>>>> To: Vladimir Kozlov ; hotspot-compiler- >>>>> dev at openjdk.java.net >>>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>>> dev at openjdk.java.net> >>>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >>>>> '8224193: stringStream should not use Resouce Area'. >>>>> >>>>> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: >>>>>> CCing to Runtime group. >>>>>> >>>>>> For me the use of `_print_inlining_stream->~stringStream()` is not obvious. >>>>>> I would definitively miss to do that if I use stringStreams in some new >>>>>> code. >>>>> >>>>> But that is not a problem added by this changeset, the problem is that >>>>> we're not deallocating these stringStreams even though we should be.? If >>>>> you use a stringStream in new code you have to manage its lifecycle. >>>>> >>>>> That said why is this: >>>>> >>>>> ??? if (_print_inlining_stream != NULL) >>>>> _print_inlining_stream->~stringStream(); >>>>> >>>>> not just: >>>>> >>>>> delete _print_inlining_stream; >>>>> >>>>> ? >>>>> >>>>> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we explicitly >>>>> calling the destructor rather than calling delete? >>>>> >>>>> Cheers, >>>>> David >>>>> >>>>>> May be someone can suggest some C++ trick to do that automatically. >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I'm resending this with fixed bugId ... >>>>>>> Sorry! >>>>>>> >>>>>>> Best regards, >>>>>>> ? ?? Goetz >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with new >>> and >>>>>>>> relied on the fact that all memory used is on the ResourceArea cleaned >>>>>>>> after the compilation. >>>>>>>> >>>>>>>> Since 8224193 the char* of the stringStream is malloced and thus >>>>>>>> must be freed. No doing so manifests a memory leak. >>>>>>>> This is only relevant if the corresponding tracing is active. >>>>>>>> >>>>>>>> To fix TraceLoopPredicate I added the destructor call >>>>>>>> Fixing PrintInlining is a bit more complicated, as it uses several >>>>>>>> stringStreams. A row of them is in a GrowableArray which must >>>>>>>> be walked to free all of them. >>>>>>>> As the GrowableArray is on an arena no destructor is called for it. 
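To make the lifecycle question concrete, here is a minimal generic C++ sketch (toy types, not the actual HotSpot classes) of the pattern under discussion: an arena never runs destructors for the objects placed on it, so when such an object owns a C-heap buffer the buffer can only be reclaimed by an explicit destructor call, unless the object itself is moved to the C heap so that a plain delete works.

#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

struct Arena {                                   // toy bump allocator; freed wholesale, runs no destructors
  alignas(std::max_align_t) char buf[4096];
  size_t used = 0;
  void* alloc(size_t n) { void* p = buf + used; used += n; return p; }
};

struct Stream {                                  // stand-in for a stream that owns a malloc'd buffer
  char* data;
  explicit Stream(const char* s) {
    data = static_cast<char*>(std::malloc(std::strlen(s) + 1));
    std::strcpy(data, s);
  }
  ~Stream() { std::free(data); }                 // must run, or the buffer leaks
};

int main() {
  Arena arena;
  Stream* s = new (arena.alloc(sizeof(Stream))) Stream("inlining trace");
  // ... use s ...
  s->~Stream();    // explicit destructor call frees the C-heap buffer; the Stream's
  return 0;        // own memory simply disappears with the arena, so delete would be wrong here
}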
>>>>>>>> >>>>>>>> I also changed some as_string() calls to base() calls which reduced >>>>>>>> memory need of the traces, and added a comment explaining the >>>>>>>> constructor of GrowableArray that calls the copyconstructor for its >>>>>>>> elements. >>>>>>>> >>>>>>>> Please review: >>>>>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- >>>>> c2_tracing_mem_leak/01/ >>>>>>>> >>>>>>>> Best regards, >>>>>>>> ? ?? Goetz. >>>>>>> From vladimir.x.ivanov at oracle.com Thu Dec 19 18:15:26 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 19 Dec 2019 21:15:26 +0300 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> Message-ID: Thanks for the feedback, Vladimir. > c1_CodeStubs.hpp - I think it should be stronger than assert to catch it > in product too (we can do check in product because it is not performance > critical code). Do you prefer to see guarantee/fatal instead? Frankly speaking, even the assert doesn't look warranted enough. I put it there mainly to validate my own changes. I could have introduced ConversionStubs for x86-64 as well, but decided to simplify the implementation. Considering x86-32 is the only consumer, I'd prefer to have ConversionStub x86_32-specific instead and completely hide it from other platforms, but it requires putting more #ifdefs in shared code which I don't like either. So, if you see a value in having a runtime check in product binaries there, I'll put it there. > c1_LinearScan.cpp - I think IA64 is used for Itanium. For 64-bit x86 we > use AMD64: > > https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 Yes, you are right. Good catch! :-) Best regards, Vladimir Ivanov > On 12/17/19 4:50 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-7175279 >> >> There was a major rewrite of math intrinsics which in 9 time frame >> which almost completely eliminated x87 code in x86-64 code base. >> >> Proposed patch removes the rest and makes x86-64 code x87-free. >> >> The main motivation for the patch is to completely eliminate >> non-strictfp behaving code in order to prepare the JVM for JEP 306 [1] >> and related enhancements [2]. >> >> Most of the changes are in C1, but there is one case in template >> interpreter (java_lang_math_abs) which now uses >> StubRoutines::x86::double_sign_mask(). It forces its initialization to >> be moved to StubRoutines::initialize1(). >> >> x87 instructions are made available only on x86-32. >> >> C1 changes involve removing FPU support on x86-64 and effectively make >> x86-specific support in linear scan allocator [2] x86-32-only. >> >> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >> >> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >> >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp >> From vladimir.kozlov at oracle.com Thu Dec 19 18:34:25 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 10:34:25 -0800 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> Message-ID: <536139cd-53a1-f697-9c83-296f2d311536@oracle.com> On 12/19/19 10:15 AM, Vladimir Ivanov wrote: > Thanks for the feedback, Vladimir. 
> >> c1_CodeStubs.hpp - I think it should be stronger than assert to catch it in product too (we can do check in product >> because it is not performance critical code). > > Do you prefer to see guarantee/fatal instead? > > Frankly speaking, even the assert doesn't look warranted enough. > I put it there mainly to validate my own changes. I could have introduced ConversionStubs for x86-64 as well, but > decided to simplify the implementation. > > Considering x86-32 is the only consumer, I'd prefer to have ConversionStub x86_32-specific instead and completely hide > it from other platforms, but it requires putting more #ifdefs in shared code which I don't like either. > > So, if you see a value in having a runtime check in product binaries there, I'll put it there. Make ConversionStub x86_32-specific only if possible. From what I see it is only LIR_OpConvert in c1_LIR.hpp we have to deal with. I actually can't see how it could be only 32-specific. Hmm? Vladimir K > >> c1_LinearScan.cpp - I think IA64 is used for Itanium. For 64-bit x86 we use AMD64: >> >> https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 > > Yes, you are right. Good catch! :-) > > Best regards, > Vladimir Ivanov > >> On 12/17/19 4:50 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-7175279 >>> >>> There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in >>> x86-64 code base. >>> >>> Proposed patch removes the rest and makes x86-64 code x87-free. >>> >>> The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM >>> for JEP 306 [1] and related enhancements [2]. >>> >>> Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses >>> StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). >>> >>> x87 instructions are made available only on x86-32. >>> >>> C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator >>> [2] x86-32-only. >>> >>> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >>> >>> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >>> >>> [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From vladimir.x.ivanov at oracle.com Thu Dec 19 18:40:46 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 19 Dec 2019 21:40:46 +0300 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <536139cd-53a1-f697-9c83-296f2d311536@oracle.com> References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> <536139cd-53a1-f697-9c83-296f2d311536@oracle.com> Message-ID: <14878e70-359e-0716-1fb2-cc54bc77f093@oracle.com> > Make ConversionStub x86_32-specific only if possible. From what I see it > is only LIR_OpConvert in c1_LIR.hpp we have to deal with. I actually > can't see how it could be only 32-specific. Hmm? I experimented with it, but it requires #ifdefs in c1_LIR.cpp/hpp which I don't like. So, I don't consider it as an option right now. Best regards, Vladimir Ivanov >>> c1_LinearScan.cpp - I think IA64 is used for Itanium. 
For 64-bit x86 >>> we use AMD64: >>> >>> https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 >> >> >> Yes, you are right. Good catch! :-) >> >> Best regards, >> Vladimir Ivanov >> >>> On 12/17/19 4:50 AM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-7175279 >>>> >>>> There was a major rewrite of math intrinsics which in 9 time frame >>>> which almost completely eliminated x87 code in x86-64 code base. >>>> >>>> Proposed patch removes the rest and makes x86-64 code x87-free. >>>> >>>> The main motivation for the patch is to completely eliminate >>>> non-strictfp behaving code in order to prepare the JVM for JEP 306 >>>> [1] and related enhancements [2]. >>>> >>>> Most of the changes are in C1, but there is one case in template >>>> interpreter (java_lang_math_abs) which now uses >>>> StubRoutines::x86::double_sign_mask(). It forces its initialization >>>> to be moved to StubRoutines::initialize1(). >>>> >>>> x87 instructions are made available only on x86-32. >>>> >>>> C1 changes involve removing FPU support on x86-64 and effectively >>>> make x86-specific support in linear scan allocator [2] x86-32-only. >>>> >>>> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >>>> >>>> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >>>> >>>> [3] >>>> http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp >>>> From vladimir.kozlov at oracle.com Thu Dec 19 18:52:53 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 10:52:53 -0800 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <14878e70-359e-0716-1fb2-cc54bc77f093@oracle.com> References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> <536139cd-53a1-f697-9c83-296f2d311536@oracle.com> <14878e70-359e-0716-1fb2-cc54bc77f093@oracle.com> Message-ID: <660d8e71-6fcd-3e71-fac3-7a4caa787872@oracle.com> On 12/19/19 10:40 AM, Vladimir Ivanov wrote: > >> Make ConversionStub x86_32-specific only if possible. From what I see it is only LIR_OpConvert in c1_LIR.hpp we have >> to deal with. I actually can't see how it could be only 32-specific. Hmm? > > I experimented with it, but it requires #ifdefs in c1_LIR.cpp/hpp which I don't like. So, I don't consider it as an > option right now. Okay, NOT_IA32( ShouldNotReachHere() ) with comment in c1_CodeStubs.hpp should be enough for now. Vladimir K > > Best regards, > Vladimir Ivanov > >>>> c1_LinearScan.cpp - I think IA64 is used for Itanium. For 64-bit x86 we use AMD64: >>>> >>>> https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 >>> >>> >>> Yes, you are right. Good catch! :-) >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> On 12/17/19 4:50 AM, Vladimir Ivanov wrote: >>>>> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >>>>> https://bugs.openjdk.java.net/browse/JDK-7175279 >>>>> >>>>> There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in >>>>> x86-64 code base. >>>>> >>>>> Proposed patch removes the rest and makes x86-64 code x87-free. >>>>> >>>>> The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM >>>>> for JEP 306 [1] and related enhancements [2]. 
>>>>> >>>>> Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses >>>>> StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). >>>>> >>>>> x87 instructions are made available only on x86-32. >>>>> >>>>> C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan >>>>> allocator [2] x86-32-only. >>>>> >>>>> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >>>>> >>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >>>>> >>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From igor.veresov at oracle.com Thu Dec 19 19:59:53 2019 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 19 Dec 2019 11:59:53 -0800 Subject: [14] RFR(M) 8235927: Update Graal Message-ID: JBS: https://bugs.openjdk.java.net/browse/JDK-8235927 Webrev: http://cr.openjdk.java.net/~iveresov/8235927/webrev/ Please find the list of changes in the JBS issue. Thanks, igor From ekaterina.pavlova at oracle.com Thu Dec 19 20:34:35 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 19 Dec 2019 12:34:35 -0800 Subject: [14] RFR(T/XXS) 8236139: [Graal] java/lang/RuntimeTests/exec/LotsOfOutput.java fails with JVMCI enabled Message-ID: <63b09d2d-d0be-3051-dad7-dc29221eb7eb@oracle.com> Hi, please review very trivial fix which returns LotsOfOutput.java test back to Graal specific problem list file. The test was recently moved from java/lang/Runtime/exec directory to java/lang/RuntimeTests but corresponding entry in test/jdk/ProblemList-graal.txt was not patched. This is why the test started to fail in latest build. JBS: https://bugs.openjdk.java.net/browse/JDK-8236139 webrev: http://cr.openjdk.java.net/~epavlova//8236139/webrev.00/index.html thanks, -katya From vladimir.kozlov at oracle.com Thu Dec 19 20:43:39 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 12:43:39 -0800 Subject: [14] RFR(T/XXS) 8236139: [Graal] java/lang/RuntimeTests/exec/LotsOfOutput.java fails with JVMCI enabled In-Reply-To: <63b09d2d-d0be-3051-dad7-dc29221eb7eb@oracle.com> References: <63b09d2d-d0be-3051-dad7-dc29221eb7eb@oracle.com> Message-ID: <6a44546d-1933-b8c1-82d1-21eb6c382e88@oracle.com> Good and trivial. Thanks, Vladimir On 12/19/19 12:34 PM, Ekaterina Pavlova wrote: > Hi, > > please review very trivial fix which returns LotsOfOutput.java test back to Graal specific problem list file. > The test was recently moved from java/lang/Runtime/exec directory to java/lang/RuntimeTests but corresponding > entry in test/jdk/ProblemList-graal.txt was not patched. This is why the test started to fail in latest build. > > > ??? JBS: https://bugs.openjdk.java.net/browse/JDK-8236139 > ?webrev: http://cr.openjdk.java.net/~epavlova//8236139/webrev.00/index.html > > > thanks, > -katya From vladimir.kozlov at oracle.com Thu Dec 19 20:54:02 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 12:54:02 -0800 Subject: [14] RFR(M) 8235927: Update Graal In-Reply-To: References: Message-ID: <285e0384-3c94-c0fd-466f-2a0faaa1f3b0@oracle.com> Looks good. There seem no new testing failures, only old. 
Thanks, Vladimir On 12/19/19 11:59 AM, Igor Veresov wrote: > JBS: https://bugs.openjdk.java.net/browse/JDK-8235927 > Webrev: http://cr.openjdk.java.net/~iveresov/8235927/webrev/ > > Please find the list of changes in the JBS issue. > > Thanks, > igor > > > From ekaterina.pavlova at oracle.com Thu Dec 19 21:41:02 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 19 Dec 2019 13:41:02 -0800 Subject: [14] RFR(T/XXS) 8236139: [Graal] java/lang/RuntimeTests/exec/LotsOfOutput.java fails with JVMCI enabled In-Reply-To: <6a44546d-1933-b8c1-82d1-21eb6c382e88@oracle.com> References: <63b09d2d-d0be-3051-dad7-dc29221eb7eb@oracle.com> <6a44546d-1933-b8c1-82d1-21eb6c382e88@oracle.com> Message-ID: <9471ece1-75b9-e6b9-3aa5-9759cad0c001@oracle.com> Thanks Vladimir for prompt review, integrated. -katya On 12/19/19 12:43 PM, Vladimir Kozlov wrote: > Good and trivial. > > Thanks, > Vladimir > > On 12/19/19 12:34 PM, Ekaterina Pavlova wrote: >> Hi, >> >> please review very trivial fix which returns LotsOfOutput.java test back to Graal specific problem list file. >> The test was recently moved from java/lang/Runtime/exec directory to java/lang/RuntimeTests but corresponding >> entry in test/jdk/ProblemList-graal.txt was not patched. This is why the test started to fail in latest build. >> >> >> ???? JBS: https://bugs.openjdk.java.net/browse/JDK-8236139 >> ??webrev: http://cr.openjdk.java.net/~epavlova//8236139/webrev.00/index.html >> >> >> thanks, >> -katya From vladimir.x.ivanov at oracle.com Thu Dec 19 21:58:19 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 20 Dec 2019 00:58:19 +0300 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <660d8e71-6fcd-3e71-fac3-7a4caa787872@oracle.com> References: <02b93959-9ec3-45c1-9ba7-ebd5682548e4@oracle.com> <536139cd-53a1-f697-9c83-296f2d311536@oracle.com> <14878e70-359e-0716-1fb2-cc54bc77f093@oracle.com> <660d8e71-6fcd-3e71-fac3-7a4caa787872@oracle.com> Message-ID: <0b0897e0-1dbc-306d-b2cb-31de13fb8b34@oracle.com> >>> Make ConversionStub x86_32-specific only if possible. From what I see >>> it is only LIR_OpConvert in c1_LIR.hpp we have to deal with. I >>> actually can't see how it could be only 32-specific. Hmm? >> >> I experimented with it, but it requires #ifdefs in c1_LIR.cpp/hpp >> which I don't like. So, I don't consider it as an? > option right now. > > Okay, NOT_IA32( ShouldNotReachHere() ) with comment in c1_CodeStubs.hpp > should be enough for now. Incremental diff: http://cr.openjdk.java.net/~vlivanov/7175279/webrev.01-00/ Best regards, Vladimir Ivanov >>>>> c1_LinearScan.cpp - I think IA64 is used for Itanium. For 64-bit >>>>> x86 we use AMD64: >>>>> >>>>> https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 >>>> >>>> >>>> >>>> Yes, you are right. Good catch! :-) >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> On 12/17/19 4:50 AM, Vladimir Ivanov wrote: >>>>>> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >>>>>> https://bugs.openjdk.java.net/browse/JDK-7175279 >>>>>> >>>>>> There was a major rewrite of math intrinsics which in 9 time frame >>>>>> which almost completely eliminated x87 code in x86-64 code base. >>>>>> >>>>>> Proposed patch removes the rest and makes x86-64 code x87-free. >>>>>> >>>>>> The main motivation for the patch is to completely eliminate >>>>>> non-strictfp behaving code in order to prepare the JVM for JEP 306 >>>>>> [1] and related enhancements [2]. 
>>>>>> >>>>>> Most of the changes are in C1, but there is one case in template >>>>>> interpreter (java_lang_math_abs) which now uses >>>>>> StubRoutines::x86::double_sign_mask(). It forces its >>>>>> initialization to be moved to StubRoutines::initialize1(). >>>>>> >>>>>> x87 instructions are made available only on x86-32. >>>>>> >>>>>> C1 changes involve removing FPU support on x86-64 and effectively >>>>>> make x86-specific support in linear scan allocator [2] x86-32-only. >>>>>> >>>>>> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >>>>>> >>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >>>>>> >>>>>> [3] >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp >>>>>> From vladimir.kozlov at oracle.com Thu Dec 19 22:00:02 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 14:00:02 -0800 Subject: [15] RFR (L): 7175279: Don't use x87 FPU on x86-64 In-Reply-To: <0b0897e0-1dbc-306d-b2cb-31de13fb8b34@oracle.com> References: <0b0897e0-1dbc-306d-b2cb-31de13fb8b34@oracle.com> Message-ID: <7063EB29-D415-4A48-BA4F-B16C5A1F52F8@oracle.com> Good Thanks Vladimir > On Dec 19, 2019, at 1:58 PM, Vladimir Ivanov wrote: > > ? >>>> Make ConversionStub x86_32-specific only if possible. From what I see it is only LIR_OpConvert in c1_LIR.hpp we have to deal with. I actually can't see how it could be only 32-specific. Hmm? >>> >>> I experimented with it, but it requires #ifdefs in c1_LIR.cpp/hpp which I don't like. So, I don't consider it as an > option right now. >> Okay, NOT_IA32( ShouldNotReachHere() ) with comment in c1_CodeStubs.hpp should be enough for now. > > Incremental diff: > http://cr.openjdk.java.net/~vlivanov/7175279/webrev.01-00/ > > Best regards, > Vladimir Ivanov > >>>>>> c1_LinearScan.cpp - I think IA64 is used for Itanium. For 64-bit x86 we use AMD64: >>>>>> >>>>>> https://hg.openjdk.java.net/jdk/jdk/file/cfaa2457a60a/src/hotspot/share/utilities/macros.hpp#l456 >>>>> >>>>> >>>>> >>>>> Yes, you are right. Good catch! :-) >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>>> On 12/17/19 4:50 AM, Vladimir Ivanov wrote: >>>>>>> http://cr.openjdk.java.net/~vlivanov/7175279/webrev.00/ >>>>>>> https://bugs.openjdk.java.net/browse/JDK-7175279 >>>>>>> >>>>>>> There was a major rewrite of math intrinsics which in 9 time frame which almost completely eliminated x87 code in x86-64 code base. >>>>>>> >>>>>>> Proposed patch removes the rest and makes x86-64 code x87-free. >>>>>>> >>>>>>> The main motivation for the patch is to completely eliminate non-strictfp behaving code in order to prepare the JVM for JEP 306 [1] and related enhancements [2]. >>>>>>> >>>>>>> Most of the changes are in C1, but there is one case in template interpreter (java_lang_math_abs) which now uses StubRoutines::x86::double_sign_mask(). It forces its initialization to be moved to StubRoutines::initialize1(). >>>>>>> >>>>>>> x87 instructions are made available only on x86-32. >>>>>>> >>>>>>> C1 changes involve removing FPU support on x86-64 and effectively make x86-specific support in linear scan allocator [2] x86-32-only. >>>>>>> >>>>>>> Testing: tier1-6, build on x86-23, linux-aarch64, linux-arm32. 
>>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8175916 >>>>>>> >>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8136414 >>>>>>> >>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/ff7cd49f2aef/src/hotspot/cpu/x86/c1_LinearScan_x86.cpp From igor.veresov at oracle.com Thu Dec 19 23:12:06 2019 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 19 Dec 2019 15:12:06 -0800 Subject: [14] RFR(M) 8235927: Update Graal In-Reply-To: <285e0384-3c94-c0fd-466f-2a0faaa1f3b0@oracle.com> References: <285e0384-3c94-c0fd-466f-2a0faaa1f3b0@oracle.com> Message-ID: <6AC6A96E-4055-4629-862D-256152EB7E6C@oracle.com> Thanks, Vladimir! igor > On Dec 19, 2019, at 12:54 PM, Vladimir Kozlov wrote: > > Looks good. There seem no new testing failures, only old. > > Thanks, > Vladimir > > On 12/19/19 11:59 AM, Igor Veresov wrote: >> JBS: https://bugs.openjdk.java.net/browse/JDK-8235927 >> Webrev: http://cr.openjdk.java.net/~iveresov/8235927/webrev/ >> Please find the list of changes in the JBS issue. >> Thanks, >> igor From sandhya.viswanathan at intel.com Fri Dec 20 00:39:40 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 20 Dec 2019 00:39:40 +0000 Subject: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) Message-ID: TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) With Generic Operands (JDK-8234391), TEMP is now specialized to a vector register of max_vector_size(). There is a correctness issue with this. Say if we have an ad file instruct which had TEMP as vecX or vecY and it was required that the vector register be limited to xmm0-15 for KNL but not for SKX. Now we replaced that TEMP by max_vector_size(), the Matcher::specialize_generic_vector_operand() sets the temp to vecZ which has the range xmm0-31. As a background: vecX/vecY is the entire range (xmm0-xmm31) for SKX and (xmm0-xmm15) for KNL. vecZ is entire range (xmm0-xmm31) for both SKX and KNL. Such cases arise for add and mul reduction rules and the long replicates in x86.ad. The fix proposed for JDK 14 is to specialize the TEMP to legVecZ instead for KNL, thereby limiting it to xmm0-15. JBS: https://bugs.openjdk.java.net/browse/JDK-8236364 Webrev: http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.00/ Best Regards, Sandhya From vladimir.kozlov at oracle.com Fri Dec 20 01:17:13 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 19 Dec 2019 17:17:13 -0800 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> Message-ID: <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record StubRoutines pointers: StubRoutines::_bigIntegerRightShiftWorker StubRoutines::_bigIntegerLeftShiftWorker In the test add an other @run command with default setting (without -XX:-TieredCompilation -Xbatch). Thanks, Vladimir [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/ On 12/18/19 6:33 PM, Kamath, Smita wrote: > Hi Vladimir, > > I have made the code changes you suggested (please look at the email below). > I have also enabled the intrinsic to run only when VBMI2 feature is available. > The intrinsic shows gains of >1.5x above 4k bit BigInteger. 
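For readers unfamiliar with the term, a small scalar C++ sketch of the per-word "double precision" shift that a multi-word left shift needs at every word boundary: each output word funnels in bits from its lower neighbour, which is essentially the per-lane operation the VPSHLDV/VPSHRDV instructions referenced above perform across a whole vector of words. This is simplified for brevity (shift counts limited to 1..63, the result is not widened, and 64-bit words are used even though BigInteger itself stores 32-bit words).

#include <cstdint>
#include <cstdio>

// Shift a little-endian array of 64-bit words left by s bits (0 < s < 64);
// bits shifted out of the top word are discarded in this simplified version.
static void shift_left(const uint64_t* src, uint64_t* dst, int n, unsigned s) {
  for (int i = n - 1; i > 0; --i) {
    dst[i] = (src[i] << s) | (src[i - 1] >> (64 - s));  // funnel bits in from the lower neighbour
  }
  dst[0] = src[0] << s;
}

int main() {
  uint64_t src[2] = { 0xfedcba9876543210ULL, 0x0123456789abcdefULL };  // src[0] is the low word
  uint64_t dst[2];
  shift_left(src, dst, 2, 8);
  std::printf("%016llx %016llx\n", (unsigned long long)dst[1], (unsigned long long)dst[0]);
  return 0;
}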
> > Webrev link: https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ > > Thanks, > Smita > > -----Original Message----- > From: Vladimir Kozlov > Sent: Wednesday, December 11, 2019 10:55 AM > To: Kamath, Smita ; 'hotspot compiler' ; Viswanathan, Sandhya > Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 > > Hi Kamath, > > First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code generated by C2. > What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. > > Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? > > Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we would need test this including performance testing. > > In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. > For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. > Smita >>>done > > I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. We don't have flag for VNNI or other avx512 instructions subset. > Smita >> removed UseVBMI2 flag > > In vm_version_x86.cpp you need to add more %s in print statement for new output. > Smita >>> done > > You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. > Smita >>> done > > You need to add intrinsics to Graal's test to ignore them: > > http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 > Smita >>>done > > Thanks, > Vladimir > > On 12/10/19 5:41 PM, Kamath, Smita wrote: >> Hi, >> >> >> As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. >> >> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 >> >> Link to webrev : >> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ >> >> >> >> I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. >> >> >> [1] >> https://software.intel.com/sites/default/files/managed/39/c5/325462-sd >> m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> Vol. >> 2C 5-471) >> >> [2] >> https://software.intel.com/en-us/articles/intel-software-development-e >> mulator >> >> >> Regards, >> >> Smita Kamath >> From xxinliu at amazon.com Fri Dec 20 03:47:38 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Fri, 20 Dec 2019 03:47:38 +0000 Subject: RFR(XXS): clean up BarrierSet headers in c1_LIRAssembler Message-ID: <80614DEF-BE53-4A0C-A517-F28EF734C180@amazon.com> Hi, Reviewers, Could you take a look at my webrev? I feel that those barrisetSet interfaces have nothing with c1_LIRAssembler. Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ I try to build on aarch64 and x86_64 and it?s fine. 
Thanks, --lx From goetz.lindenmaier at sap.com Fri Dec 20 10:12:53 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 20 Dec 2019 10:12:53 +0000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: <74f551e7-c418-5c7c-a422-abe9df07bb12@oracle.com> References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> <74f551e7-c418-5c7c-a422-abe9df07bb12@oracle.com> Message-ID: Hi Vladimir, I refactored it a bit, see new webrev: http://cr.openjdk.java.net/~goetz/wr19/8235998-c2_tracing_mem_leak/02/ Also, I filed 8236414: stringStream allocates on ResourceArea and C-heap https://bugs.openjdk.java.net/browse/JDK-8236414 ... but I'm not sure how to solve it, as stringStream inherits this capability, and having the char* on the ResourceArea as before is not good, either. Anyways, there are very few places where new is used with the stringStream, and the PrintInlining implementation is the only one where it's problematic. Maybe it would be better to simplify PrintInlining. Best regards, Goetz. > -----Original Message----- > From: Vladimir Kozlov > Sent: Donnerstag, 19. Dezember 2019 19:07 > To: Lindenmaier, Goetz ; hotspot-compiler- > dev at openjdk.java.net > Cc: hotspot-runtime-dev at openjdk.java.net runtime dev at openjdk.java.net> > Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > '8224193: stringStream should not use Resouce Area'. > > Please, file RFE for refactoring stringStream. > > Yes, the fix can go into JDK 14. > > But before that, I see the same pattern used 3 times in compile.cpp: > > + if (_print_inlining_stream != NULL) _print_inlining_stream- > >~stringStream(); > _print_inlining_stream = ; > > Can you use one function for that? Also our coding style requires to put body of > 'if' on separate line and use {}. > > thanks, > Vladimir > > On 12/19/19 4:52 AM, David Holmes wrote: > > On 19/12/2019 9:35 pm, Lindenmaier, Goetz wrote: > >> Hi, > >> > >> yes, it is confusing that parts are on the arena, other parts > >> are allocated in the C-heap. > >> But usages which allocate the stringStream with new() are > >> rare, usually it's allocated on the stack making all this > >> more simple.? And the previous design was even more > >> error-prone. > >> Also, the whole way to print the inlining information > >> is quite complex, with strange usage of the copy constructor > >> of PrintInliningBuffer ... which reaches into GrowableArray > >> which should have a constructor that does not use the > >> copy constructor to initialize the elements ... > >> > >> I do not intend to change stringStream in this change. > >> So can I consider this reviewed from your side? Or at > >> least that there is no veto :)? > > > > Sorry I was trying to convey this is Reviewed, but I do think this needs further > work in the future. > > > > Thanks, > > David > > > >> Thanks and best regards, > >> ?? Goetz. > >> > >> > >>> -----Original Message----- > >>> From: David Holmes > >>> Sent: Donnerstag, 19. Dezember 2019 11:39 > >>> To: Lindenmaier, Goetz ; Vladimir Kozlov > >>> ; hotspot-compiler-dev at openjdk.java.net > >>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>> dev at openjdk.java.net> > >>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > >>> '8224193: stringStream should not use Resouce Area'. > >>> > >>> On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: > >>>> Hi David, Vladimir, > >>>> > >>>> stringStream is a ResourceObj, thus it lives on an arena. 
> >>>> This is uncritical, as it does not resize. > >>>> 8224193 only changed the allocation of the internal char*, > >>>> which always caused problems with resizing under > >>>> ResourceMarks that were not placed for the string but to > >>>> free other memory. > >>>> Thus stringStream must not be deallocated, and > >>>> also there was no mem leak before that change. > >>>> But we need to call the destructor to free the char*. > >>> > >>> I think we have a confusing mix of arena and C_heap usage with > >>> stringStream. Not clear to me why stringStream remains a resourceObj > >>> now? In many cases the stringStream is just local on the stack. In other > >>> cases if it is new'd then it should be C-heap same as the array and then > >>> you could delete it too. > >>> > >>> What you have may suffice to initially address the leak but I think this > >>> whole thing needs revisiting. > >>> > >>> Thanks, > >>> David > >>> > >>>> > >>>> Best regards, > >>>> ??? Goetz. > >>>> > >>>>> -----Original Message----- > >>>>> From: hotspot-runtime-dev >>> bounces at openjdk.java.net> > >>>>> On Behalf Of David Holmes > >>>>> Sent: Mittwoch, 18. Dezember 2019 04:33 > >>>>> To: Vladimir Kozlov ; hotspot-compiler- > >>>>> dev at openjdk.java.net > >>>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>>>> dev at openjdk.java.net> > >>>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > >>>>> '8224193: stringStream should not use Resouce Area'. > >>>>> > >>>>> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: > >>>>>> CCing to Runtime group. > >>>>>> > >>>>>> For me the use of `_print_inlining_stream->~stringStream()` is not > obvious. > >>>>>> I would definitively miss to do that if I use stringStreams in some new > >>>>>> code. > >>>>> > >>>>> But that is not a problem added by this changeset, the problem is that > >>>>> we're not deallocating these stringStreams even though we should be.? If > >>>>> you use a stringStream in new code you have to manage its lifecycle. > >>>>> > >>>>> That said why is this: > >>>>> > >>>>> ??? if (_print_inlining_stream != NULL) > >>>>> _print_inlining_stream->~stringStream(); > >>>>> > >>>>> not just: > >>>>> > >>>>> delete _print_inlining_stream; > >>>>> > >>>>> ? > >>>>> > >>>>> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we > explicitly > >>>>> calling the destructor rather than calling delete? > >>>>> > >>>>> Cheers, > >>>>> David > >>>>> > >>>>>> May be someone can suggest some C++ trick to do that automatically. > >>>>>> Thanks, > >>>>>> Vladimir > >>>>>> > >>>>>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>> I'm resending this with fixed bugId ... > >>>>>>> Sorry! > >>>>>>> > >>>>>>> Best regards, > >>>>>>> ? ?? Goetz > >>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with > new > >>> and > >>>>>>>> relied on the fact that all memory used is on the ResourceArea > cleaned > >>>>>>>> after the compilation. > >>>>>>>> > >>>>>>>> Since 8224193 the char* of the stringStream is malloced and thus > >>>>>>>> must be freed. No doing so manifests a memory leak. > >>>>>>>> This is only relevant if the corresponding tracing is active. > >>>>>>>> > >>>>>>>> To fix TraceLoopPredicate I added the destructor call > >>>>>>>> Fixing PrintInlining is a bit more complicated, as it uses several > >>>>>>>> stringStreams. A row of them is in a GrowableArray which must > >>>>>>>> be walked to free all of them. 
> >>>>>>>> As the GrowableArray is on an arena no destructor is called for it. > >>>>>>>> > >>>>>>>> I also changed some as_string() calls to base() calls which reduced > >>>>>>>> memory need of the traces, and added a comment explaining the > >>>>>>>> constructor of GrowableArray that calls the copyconstructor for its > >>>>>>>> elements. > >>>>>>>> > >>>>>>>> Please review: > >>>>>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- > >>>>> c2_tracing_mem_leak/01/ > >>>>>>>> > >>>>>>>> Best regards, > >>>>>>>> ? ?? Goetz. > >>>>>>> From vladimir.x.ivanov at oracle.com Fri Dec 20 10:19:41 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 20 Dec 2019 13:19:41 +0300 Subject: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) In-Reply-To: References: Message-ID: <63862b62-3dd4-4a56-3545-7631f77d81fb@oracle.com> Hi Sandhya, > Webrev: http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.00/ I'd prefer to see the check as a special case with a comment: MachOper* Matcher::specialize_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg) { assert(Matcher::is_generic_vector(generic_opnd), "not generic"); bool legacy = (generic_opnd->opcode() == LEGVEC); + if (!VM_Version::supports_avx512vlbwdq() && // KNL + is_temp && !legacy && (ideal_reg == Op_VecZ)) { + // Conservatively specialize 512bit vec TEMP operands to legVecZ (zmm0-15) on KNL. + return new legVecZOper(); + } if (legacy) { switch (ideal_reg) { case Op_VecS: return new legVecSOper(); Otherwise, looks good. I consider it as a stop-the-gap solution for 14. In 15 we need to get rid of it and adjust TEMP operand types in x86.ad instead. Please, file an RFE for it. Best regards, Vladimir Ivanov > > Best Regards, > Sandhya > > > From Pengfei.Li at arm.com Fri Dec 20 10:21:43 2019 From: Pengfei.Li at arm.com (Pengfei Li) Date: Fri, 20 Dec 2019 10:21:43 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> Message-ID: Hi, I'm back for this patch. > That is starting to sound very attractive. With a 64-bit address space I'm > finding it very hard to imagine a scenario in which we don't find a suitable > address. I think AOT-compiled code would still be OK, because it generates > different code, but we'd have to do some testing. Since Nick's recent metaspace reservation fix [1] has completely removed the use of r27, my patch becomes much simpler now. I have removed the condition of UseCompressedClassPointers, rebased the code and created a new webrev. Could you please help review again? Webrev: http://cr.openjdk.java.net/~pli/rfr/8233743/webrev.02/ [1] http://hg.openjdk.java.net/jdk/jdk/rev/dd4b4f273274 -- Thanks, Pengfei From martin.doerr at sap.com Fri Dec 20 10:58:13 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 20 Dec 2019 10:58:13 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> Message-ID: Hi, builds were successful on all the platforms I have added. A lot of tests were running over night and I haven't seen any new issues. Can I push this version? 
http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.01/ Best regards, Martin > -----Original Message----- > From: Aditya Mandaleeka > Sent: Donnerstag, 19. Dezember 2019 18:49 > To: Doerr, Martin ; Roland Westrelin > ; hotspot compiler dev at openjdk.java.net> > Cc: shenandoah-dev > Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS > > Thanks for updating the other platforms Martin. Those changes look right to > me. > > -Aditya > > -----Original Message----- > From: Doerr, Martin > Sent: Thursday, December 19, 2019 8:31 AM > To: Roland Westrelin ; Aditya Mandaleeka > ; hotspot compiler dev at openjdk.java.net> > Cc: shenandoah-dev > Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS > > Hi everybody, > > thanks for fixing this issue. > > I guess it's currently used on some platforms, but I think we should fix it for > all platforms. Otherwise it will break when using the parts which were only > fixed for x86. > > Here's my proposal: > https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.openj > dk.java.net%2F~mdoerr%2F8236179_C1_T_ADDRESS%2Fwebrev.01%2F&am > p;data=02%7C01%7Cadityam%40microsoft.com%7C3d2013a8ff1d4ffbafaa08d > 784a0dcc1%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637123698 > 748787948&sdata=r11YVMnHSLm1Ms1Ipbq4vPDOhIwlrM8fz1QlAl%2BU > WGY%3D&reserved=0 > > I'll run tests on more platforms. > > Best regards, > Martin > > > > -----Original Message----- > > From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Roland Westrelin > > Sent: Donnerstag, 19. Dezember 2019 15:15 > > To: Aditya Mandaleeka ; hotspot compiler > > > > Cc: shenandoah-dev > > Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS > > > > > > Hi Aditya, > > > > AFAIK, it's a requirement that the patch be posted on the openjdk > > infrastructure. So here it is: > > > > https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.open > > > jdk.java.net%2F~roland%2F8236179%2Fwebrev.00%2F&data=02%7C01 > %7Cadi > > > tyam%40microsoft.com%7C3d2013a8ff1d4ffbafaa08d784a0dcc1%7C72f988bf > 86f1 > > > 41af91ab2d7cd011db47%7C1%7C0%7C637123698748787948&sdata=vQ1 > xR87EjA > > bf%2Bnwscs1c%2BpTqWLfeVODLz%2FleIsdmthU%3D&reserved=0 > > > > The change looks good to me but it would be good to check whether > > architectures other than x86 need a similar change. > > > > Roland. From aph at redhat.com Fri Dec 20 11:09:13 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 20 Dec 2019 12:09:13 +0100 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> Message-ID: <379d9c30-a139-4544-9339-a6ddec913b78@redhat.com> On 12/20/19 10:21 AM, Pengfei Li wrote: > Since Nick's recent metaspace reservation fix [1] has completely removed the use of r27, my patch becomes much simpler now. I have removed the condition of UseCompressedClassPointers, rebased the code and created a new webrev. Could you please help review again? What happens when we use Graal as a replacement for C2, particularly when Graal needs a heap base register? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Fri Dec 20 11:10:59 2019 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 20 Dec 2019 12:10:59 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> Message-ID: <509cc49c-4e68-f2a0-6d11-4ece4ddfdd9e@redhat.com> Fine by me. Thanks, Roman > Hi, > > builds were successful on all the platforms I have added. A lot of tests were running over night and I haven't seen any new issues. > > Can I push this version? > http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.01/ > > Best regards, > Martin > > >> -----Original Message----- >> From: Aditya Mandaleeka >> Sent: Donnerstag, 19. Dezember 2019 18:49 >> To: Doerr, Martin ; Roland Westrelin >> ; hotspot compiler > dev at openjdk.java.net> >> Cc: shenandoah-dev >> Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS >> >> Thanks for updating the other platforms Martin. Those changes look right to >> me. >> >> -Aditya >> >> -----Original Message----- >> From: Doerr, Martin >> Sent: Thursday, December 19, 2019 8:31 AM >> To: Roland Westrelin ; Aditya Mandaleeka >> ; hotspot compiler > dev at openjdk.java.net> >> Cc: shenandoah-dev >> Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS >> >> Hi everybody, >> >> thanks for fixing this issue. >> >> I guess it's currently used on some platforms, but I think we should fix it for >> all platforms. Otherwise it will break when using the parts which were only >> fixed for x86. >> >> Here's my proposal: >> https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.openj >> dk.java.net%2F~mdoerr%2F8236179_C1_T_ADDRESS%2Fwebrev.01%2F&am >> p;data=02%7C01%7Cadityam%40microsoft.com%7C3d2013a8ff1d4ffbafaa08d >> 784a0dcc1%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637123698 >> 748787948&sdata=r11YVMnHSLm1Ms1Ipbq4vPDOhIwlrM8fz1QlAl%2BU >> WGY%3D&reserved=0 >> >> I'll run tests on more platforms. >> >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Roland Westrelin >>> Sent: Donnerstag, 19. Dezember 2019 15:15 >>> To: Aditya Mandaleeka ; hotspot compiler >>> >>> Cc: shenandoah-dev >>> Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS >>> >>> >>> Hi Aditya, >>> >>> AFAIK, it's a requirement that the patch be posted on the openjdk >>> infrastructure. So here it is: >>> >>> https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.open >>> >> jdk.java.net%2F~roland%2F8236179%2Fwebrev.00%2F&data=02%7C01 >> %7Cadi >>> >> tyam%40microsoft.com%7C3d2013a8ff1d4ffbafaa08d784a0dcc1%7C72f988bf >> 86f1 >>> >> 41af91ab2d7cd011db47%7C1%7C0%7C637123698748787948&sdata=vQ1 >> xR87EjA >>> bf%2Bnwscs1c%2BpTqWLfeVODLz%2FleIsdmthU%3D&reserved=0 >>> >>> The change looks good to me but it would be good to check whether >>> architectures other than x86 need a similar change. >>> >>> Roland. 
> From tobias.hartmann at oracle.com Fri Dec 20 11:13:15 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 20 Dec 2019 12:13:15 +0100 Subject: [14] RFR(S): 8233164: C2 fails with assert(phase->C->get_alias_index(t) == phase->C->get_alias_index(t_adr)) failed: correct memory chain Message-ID: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8233164 http://cr.openjdk.java.net/~thartmann/8233164/webrev.00/ The problem is that the arraycopy ideal transformation does not correctly wire memory inputs on individual loads from a non-escaping src array. I was able to extract a test from the intermittently failing application that depends on indify string concat (test1). From that, I've created another test that does not depend on Strings (test2). Gory details based on test2: Two subsequent arraycopies copy data from two sources into the same destination. The ideal transformation replaces the first arraycopy by a forward and backward copy (because ArrayCopyNode::array_copy_test_overlap determines that the arrays may overlap). A PhiNode is added to select between the memory outputs of the forward and backward stores: http://hg.openjdk.java.net/jdk/jdk/file/2fbc66ef1a1d/src/hotspot/share/opto/arraycopynode.cpp#l617 574 StoreB === 559 572 553 573 [[ 568 576 ]] @byte[int:0..max-2]:NotNull:exact+any ... 567 StoreB === 558 562 565 566 [[ 560 576 ]] @byte[int:0..max-2]:NotNull:exact+any ... 575 Region === 575 558 559 [[ 575 576 ]] 576 Phi === 575 567 574 [[]] #memory Memory: @byte[int:>=0]:exact+any *, idx=6; The second arraycopy can't be optimized (yet) because the src operand and its length are not known but once Escape Analysis is executed, it determines that both source arrays are non-escaping. Now the ideal transformation is able to replace the second arraycopy by loads/stores as well. The memory slice for the first load is selected from the MergeMem based on its address type ('atp_src') which is the general byte[int:>=0] slice due to a CastPP that hides the type of the non-escaping source array. As a result, we wire it to the byte[int:>=0] memory Phi that was created when optimizing the first arraycopy: 305 CheckCastPP === 302 300 [[ 385 316 420 494 ]] #byte[int:1]:NotNull:exact *,iid=288 494 CastPP === 486 305 [[ 526 514 626 626 ]] #byte[int:>=0]:NotNull:exact * 626 AddP === _ 494 494 68 [[ 629 ]] 576 Phi === 575 567 574 [[ 560 628 629 ]] #memory Memory: @byte[int:>=0]:exact+any 629 LoadB === 508 576 626 [[]] @byte[int:>=0]:NotNull:exact+any *, idx=6; #byte That's obviously incorrect because now the LoadB has an address input into a non-escaping source with type byte[int:1], iid=288 and a memory input from the independent byte[int:>=0] slice: http://cr.openjdk.java.net/~thartmann/8233164/graph_Failure.png Basically, 629 LoadB uses memory from 47 AllocateArray when loading from an address into non-escaping 288 AllocateArray. We then hit an assert in MemNode::optimize_memory_chain. The fix is to use _src_type/_dest_type (introduced by JDK-8076188) as address types for the loads and stores. These will have the correct type if the source or destination array is non-escaping. The affected code is in there for a long time but I can only reproduce this back until JDK 13 b08 when JDK-8217990 was fixed. 
I don't think that fix is related but since the crash greatly depends on the order in which nodes are processed by IGVN, it probably just triggers the issue (for example, if we process a CastPP node first, it goes away and we get the correct type from the CheckCastPP). The code was also changed significantly by JDK-8210887 in JDK 12. Thanks, Tobias From vladimir.x.ivanov at oracle.com Fri Dec 20 11:25:30 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 20 Dec 2019 14:25:30 +0300 Subject: [14] RFR(S): 8233164: C2 fails with assert(phase->C->get_alias_index(t) == phase->C->get_alias_index(t_adr)) failed: correct memory chain In-Reply-To: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> References: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> Message-ID: <9f1a459c-2f46-eb5d-ff3d-0258e6bd4888@oracle.com> > http://cr.openjdk.java.net/~thartmann/8233164/webrev.00/ Looks good. Best regards, Vladimir Ivanov > The problem is that the arraycopy ideal transformation does not correctly wire memory inputs on > individual loads from a non-escaping src array. > > I was able to extract a test from the intermittently failing application that depends on indify > string concat (test1). From that, I've created another test that does not depend on Strings (test2). > > Gory details based on test2: > Two subsequent arraycopies copy data from two sources into the same destination. The ideal > transformation replaces the first arraycopy by a forward and backward copy (because > ArrayCopyNode::array_copy_test_overlap determines that the arrays may overlap). A PhiNode is added > to select between the memory outputs of the forward and backward stores: > http://hg.openjdk.java.net/jdk/jdk/file/2fbc66ef1a1d/src/hotspot/share/opto/arraycopynode.cpp#l617 > > 574 StoreB === 559 572 553 573 [[ 568 576 ]] @byte[int:0..max-2]:NotNull:exact+any ... > 567 StoreB === 558 562 565 566 [[ 560 576 ]] @byte[int:0..max-2]:NotNull:exact+any ... > 575 Region === 575 558 559 [[ 575 576 ]] > 576 Phi === 575 567 574 [[]] #memory Memory: @byte[int:>=0]:exact+any *, idx=6; > > The second arraycopy can't be optimized (yet) because the src operand and its length are not known > but once Escape Analysis is executed, it determines that both source arrays are non-escaping. Now > the ideal transformation is able to replace the second arraycopy by loads/stores as well. > > The memory slice for the first load is selected from the MergeMem based on its address type > ('atp_src') which is the general byte[int:>=0] slice due to a CastPP that hides the type of the > non-escaping source array. As a result, we wire it to the byte[int:>=0] memory Phi that was created > when optimizing the first arraycopy: > > 305 CheckCastPP === 302 300 [[ 385 316 420 494 ]] #byte[int:1]:NotNull:exact *,iid=288 > 494 CastPP === 486 305 [[ 526 514 626 626 ]] #byte[int:>=0]:NotNull:exact * > 626 AddP === _ 494 494 68 [[ 629 ]] > 576 Phi === 575 567 574 [[ 560 628 629 ]] #memory Memory: @byte[int:>=0]:exact+any > 629 LoadB === 508 576 626 [[]] @byte[int:>=0]:NotNull:exact+any *, idx=6; #byte > > That's obviously incorrect because now the LoadB has an address input into a non-escaping source > with type byte[int:1], iid=288 and a memory input from the independent byte[int:>=0] slice: > http://cr.openjdk.java.net/~thartmann/8233164/graph_Failure.png > > Basically, 629 LoadB uses memory from 47 AllocateArray when loading from an address into > non-escaping 288 AllocateArray. We then hit an assert in MemNode::optimize_memory_chain. 
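To make that fix a little more concrete: the idea is that the expansion takes the address types for the generated loads and stores from the ArrayCopyNode's own _src_type/_dest_type fields, which carry the exact, instance-id-bearing array type once escape analysis has proven the array non-escaping, instead of re-deriving them from the CastPP-widened address. A rough sketch of the shape, with approximate names since the exact code is in the webrev rather than quoted here:

  // Illustrative fragment only (opto/arraycopynode.cpp area); field and helper
  // names are approximate, not the committed patch.
  const TypePtr* atp_src  = _src_type;   // e.g. byte[int:1]:NotNull:exact, iid=288 after EA
  const TypePtr* atp_dest = _dest_type;  // likewise for the destination array
  int src_alias  = phase->C->get_alias_index(atp_src);
  int dest_alias = phase->C->get_alias_index(atp_dest);
  // The per-element loads are then wired to mem->memory_at(src_alias) instead of the
  // generic byte[int:>=0] slice, so they can no longer latch onto the memory Phi that
  // was created when an unrelated arraycopy into the same destination was expanded.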
> > The fix is to use _src_type/_dest_type (introduced by JDK-8076188) as address types for the loads > and stores. These will have the correct type if the source or destination array is non-escaping. > > The affected code is in there for a long time but I can only reproduce this back until JDK 13 b08 > when JDK-8217990 was fixed. I don't think that fix is related but since the crash greatly depends on > the order in which nodes are processed by IGVN, it probably just triggers the issue (for example, if > we process a CastPP node first, it goes away and we get the correct type from the CheckCastPP). The > code was also changed significantly by JDK-8210887 in JDK 12. > > Thanks, > Tobias > From tobias.hartmann at oracle.com Fri Dec 20 11:34:50 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 20 Dec 2019 12:34:50 +0100 Subject: [14] RFR(S): 8233164: C2 fails with assert(phase->C->get_alias_index(t) == phase->C->get_alias_index(t_adr)) failed: correct memory chain In-Reply-To: <9f1a459c-2f46-eb5d-ff3d-0258e6bd4888@oracle.com> References: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> <9f1a459c-2f46-eb5d-ff3d-0258e6bd4888@oracle.com> Message-ID: Hi Vladimir, thanks for the quick review! Best regards, Tobias On 20.12.19 12:25, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8233164/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> The problem is that the arraycopy ideal transformation does not correctly wire memory inputs on >> individual loads from a non-escaping src array. >> >> I was able to extract a test from the intermittently failing application that depends on indify >> string concat (test1). From that, I've created another test that does not depend on Strings (test2). >> >> Gory details based on test2: >> Two subsequent arraycopies copy data from two sources into the same destination. The ideal >> transformation replaces the first arraycopy by a forward and backward copy (because >> ArrayCopyNode::array_copy_test_overlap determines that the arrays may overlap). A PhiNode is added >> to select between the memory outputs of the forward and backward stores: >> http://hg.openjdk.java.net/jdk/jdk/file/2fbc66ef1a1d/src/hotspot/share/opto/arraycopynode.cpp#l617 >> >> ? 574??? StoreB??? ===? 559? 572? 553? 573? [[ 568? 576 ]]? @byte[int:0..max-2]:NotNull:exact+any ... >> ? 567??? StoreB??? ===? 558? 562? 565? 566? [[ 560? 576 ]]? @byte[int:0..max-2]:NotNull:exact+any ... >> ? 575??? Region??? ===? 575? 558? 559? [[ 575? 576 ]] >> ? 576??? Phi??? ===? 575? 567? 574? [[]]? #memory? Memory: @byte[int:>=0]:exact+any *, idx=6; >> >> The second arraycopy can't be optimized (yet) because the src operand and its length are not known >> but once Escape Analysis is executed, it determines that both source arrays are non-escaping. Now >> the ideal transformation is able to replace the second arraycopy by loads/stores as well. >> >> The memory slice for the first load is selected from the MergeMem based on its address type >> ('atp_src') which is the general byte[int:>=0] slice due to a CastPP that hides the type of the >> non-escaping source array. As a result, we wire it to the byte[int:>=0] memory Phi that was created >> when optimizing the first arraycopy: >> >> ? 305??? CheckCastPP ===? 302? 300? [[ 385? 316? 420? 494 ]]? #byte[int:1]:NotNull:exact *,iid=288 >> ? 494??? CastPP??? ===? 486? 305? [[ 526? 514? 626? 626 ]]? #byte[int:>=0]:NotNull:exact * >> ? 626??? AddP??? === _? 494? 494? 68? [[ 629 ]] >> ? 576??? Phi??? ===? 575? 567? 574? 
[[ 560? 628? 629 ]]? #memory? Memory: @byte[int:>=0]:exact+any >> ? 629??? LoadB??? ===? 508? 576? 626? [[]]? @byte[int:>=0]:NotNull:exact+any *, idx=6; #byte >> >> That's obviously incorrect because now the LoadB has an address input into a non-escaping source >> with type byte[int:1], iid=288 and a memory input from the independent byte[int:>=0] slice: >> http://cr.openjdk.java.net/~thartmann/8233164/graph_Failure.png >> >> Basically, 629 LoadB uses memory from 47 AllocateArray when loading from an address into >> non-escaping 288 AllocateArray. We then hit an assert in MemNode::optimize_memory_chain. >> >> The fix is to use _src_type/_dest_type (introduced by JDK-8076188) as address types for the loads >> and stores. These will have the correct type if the source or destination array is non-escaping. >> >> The affected code is in there for a long time but I can only reproduce this back until JDK 13 b08 >> when JDK-8217990 was fixed. I don't think that fix is related but since the crash greatly depends on >> the order in which nodes are processed by IGVN, it probably just triggers the issue (for example, if >> we process a CastPP node first, it goes away and we get the correct type from the CheckCastPP). The >> code was also changed significantly by JDK-8210887 in JDK 12. >> >> Thanks, >> Tobias >> From rwestrel at redhat.com Fri Dec 20 14:01:20 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 20 Dec 2019 15:01:20 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> Message-ID: <87bls3ksen.fsf@redhat.com> Hi Martin, > http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.01/ c1_LIRAssembler_aarch64.cpp and c1_LIRAssembler_s390.cpp: shouldn't stack2reg be fixed too? Roland. From martin.doerr at sap.com Fri Dec 20 14:47:59 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 20 Dec 2019 14:47:59 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: <87bls3ksen.fsf@redhat.com> References: <87h81wl7vo.fsf@redhat.com> <87bls3ksen.fsf@redhat.com> Message-ID: Hi Roland, good catch. I'll push this version if there are no objections: http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.02/ Best regards, Martin > -----Original Message----- > From: Roland Westrelin > Sent: Freitag, 20. Dezember 2019 15:01 > To: Doerr, Martin ; Aditya Mandaleeka > ; hotspot compiler dev at openjdk.java.net> > Cc: shenandoah-dev > Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS > > > Hi Martin, > > > http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.01/ > > c1_LIRAssembler_aarch64.cpp and c1_LIRAssembler_s390.cpp: shouldn't > stack2reg be fixed too? > > Roland. From martin.doerr at sap.com Fri Dec 20 15:03:44 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 20 Dec 2019 15:03:44 +0000 Subject: RFR(XXS): clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: <80614DEF-BE53-4A0C-A517-F28EF734C180@amazon.com> References: <80614DEF-BE53-4A0C-A517-F28EF734C180@amazon.com> Message-ID: Hi lx, PPC and s390 parts are ok. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Liu, Xin > Sent: Freitag, 20. Dezember 2019 04:48 > To: 'hotspot-compiler-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in > c1_LIRAssembler > > Hi, Reviewers, > > Could you take a look at my webrev? 
I feel that those BarrierSet interfaces > have nothing with c1_LIRAssembler. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 > Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ > > I try to build on aarch64 and x86_64 and it's fine. > > Thanks, > --lx From rwestrel at redhat.com Fri Dec 20 15:53:15 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 20 Dec 2019 16:53:15 +0100 Subject: [14] RFR(S): 8233164: C2 fails with assert(phase->C->get_alias_index(t) == phase->C->get_alias_index(t_adr)) failed: correct memory chain In-Reply-To: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> References: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> Message-ID: <878sn7kn84.fsf@redhat.com> > http://cr.openjdk.java.net/~thartmann/8233164/webrev.00/ I'm wondering whether this problem could show up elsewhere and if a more generic fix would be needed (maybe EA should set the type of the CastPP so there's no inconsistency when IGVN runs). The fix looks good to address this particular problem but I think this deserves a follow up bug to investigate this further. Roland. From rwestrel at redhat.com Fri Dec 20 15:58:53 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 20 Dec 2019 16:58:53 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> <87bls3ksen.fsf@redhat.com> Message-ID: <875zibkmyq.fsf@redhat.com> > http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.02/ That looks good to me. Roland. From rwestrel at redhat.com Fri Dec 20 16:48:24 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 20 Dec 2019 17:48:24 +0100 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops In-Reply-To: References: <87o8w8ptsq.fsf@redhat.com> Message-ID: <8736dfkko7.fsf@redhat.com> Thanks for reviewing this, Vladimir. > cfgnode.cpp - should we also check for is_top() to set `doit = false` and bailout? This: http://cr.openjdk.java.net/~roland/8231291/webrev.02/ > Why you added check in must_be_not_null()? It is used only in library_call.cpp and should not relate to this changes. > Did you find some issues? I hit a crash when doing some CTW testing because an argument to must_be_not_null() was already known to be not null. I suppose it's unrelated to that change and something changed in the libraries instead. Roland. From rwestrel at redhat.com Fri Dec 20 16:56:37 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 20 Dec 2019 17:56:37 +0100 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops In-Reply-To: <3dd0917e-f307-5f22-dd68-786c90ec47a5@oracle.com> References: <87o8w8ptsq.fsf@redhat.com> <3dd0917e-f307-5f22-dd68-786c90ec47a5@oracle.com> Message-ID: <87zhfnj5q2.fsf@redhat.com> Hi Vladimir, Thanks for reviewing this. > As I understand, the intention of the change you propose is to perform > complete loop unrolling earlier so EA can benefit from it. Yes. > It looks like LoopOptsMaxUnroll is a shortened version of > IdealLoopTree::iteration_split/iteration_split_impl(). Yes. > Have you considered factoring out the common code? Right now, it's hard > to correlate the checks for LoopOptsMaxUnroll with iteration_split() and > there's a risk they'll diverge eventually. No I haven't. But it seems it's either we duplicate the code as I did or we add some flag to iteration_split() and make some of the code there conditional. Wouldn't that affect clarity, and for what's overall a fairly simple change?
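For readers following this exchange (which continues with a few more questions below), the overall shape being discussed is roughly the following sketch, pieced together from the checks named in the thread — is_valid_counted_loop(), is_normal_loop(), policy_maximally_unroll(), do_maximally_unroll() — and not copied from webrev.02:

  // Sketch of an early "maximally unroll only" loop pass run before Escape Analysis
  // (illustrative; the real code lives in IdealLoopTree/PhaseIdealLoop and differs in detail).
  bool IdealLoopTree::iteration_split_max_unroll_only(PhaseIdealLoop* phase, Node_List& old_new) {
    // Recurse into inner loops and siblings first, as iteration_split() does.
    if (_child != NULL && !_child->iteration_split_max_unroll_only(phase, old_new)) return false;
    if (_next  != NULL && !_next->iteration_split_max_unroll_only(phase, old_new))  return false;

    if (!_head->is_CountedLoop())     return true;   // only counted loops can be fully unrolled
    CountedLoopNode* cl = _head->as_CountedLoop();
    if (!cl->is_valid_counted_loop()) return true;   // ignore various kinds of broken loops
    if (!cl->is_normal_loop())        return true;   // no pre-/post-loops are expected this early

    if (policy_maximally_unroll(phase)) {
      // Fully unroll now so Escape Analysis sees the flattened body; profile-based
      // exit-probability adjustment and unswitching are deliberately skipped here.
      phase->do_maximally_unroll(this, old_new);
    }
    return true;
  }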
> Do you need the following steps from the original version? > > ================================================ > // Look for loop-exit tests with my 50/50 guesses from the Parsing stage. > // Replace with a 1-in-10 exit guess. > if (!is_root() && is_loop()) { > adjust_loop_exit_prob(phase); > } > > // Compute loop trip count from profile data > compute_profile_trip_cnt(phase); > > No use of profiling data since full unrolling is happening anyway? Right. > ======================== > if (!cl->is_valid_counted_loop()) return true; // Ignore various > kinds of broken loops policy_maximally_unroll() has that check. > ======================== > // Do nothing special to pre- and post- loops > if (cl->is_pre_loop() || cl->is_post_loop()) return true; > > I assume there are no pre-/post-loops exist at that point, so these > checks are redundant. Turn them into asserts? I added a check for is_normal_loop() to be safe. > ======================== > if (cl->is_normal_loop()) { > if (policy_unswitching(phase)) { > phase->do_unswitching(this, old_new); > return true; > } > if (policy_maximally_unroll(phase)) { > // Here we did some unrolling and peeling. Eventually we will > // completely unroll this loop and it will no longer be a loop. > phase->do_maximally_unroll(this, old_new); > return true; > } > > You don't perform loop unswitching at all. So, the order of operations > changes. Do you see any problems with that? I'm not sure what to think about that one. Yes, I suppose it could be that unswitching helps but if we unswitch we have to perform 2 passes of loop opts to get to the maximally unrolled loop. Isn't that more disruptive that we would want? > ======================== > > > src/hotspot/share/opto/compile.cpp > // Perform escape analysis > if (_do_escape_analysis && ConnectionGraph::has_candidates(this)) { > if (has_loops()) { > // Cleanup graph (remove dead nodes). > TracePhase tp("idealLoop", &timers[_t_idealLoop]); > - PhaseIdealLoop::optimize(igvn, LoopOptsNone); > + PhaseIdealLoop::optimize(igvn, LoopOptsMaxUnroll); > if (major_progress()) print_method(PHASE_PHASEIDEAL_BEFORE_EA, 2); > if (failing()) return; > } > ConnectionGraph::do_analysis(this, &igvn); > > Does it make sense to do more elaborate checks before performing early > full loop unrolling? Like whether candidates are used inside loop bodies? We would maximally unroll the loop anyway, only a bit later so I don't think we need to make this more involved that it needs to be. new webrev: http://cr.openjdk.java.net/~roland/8231291/webrev.02/ Roland. From navy.xliu at gmail.com Fri Dec 20 17:24:35 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Fri, 20 Dec 2019 09:24:35 -0800 Subject: RFR(XXS): clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: References: <80614DEF-BE53-4A0C-A517-F28EF734C180@amazon.com> Message-ID: Martin? Thank you very much. May I know how to validate Sparc? I don't have any SPARC machine to access. Thanks, --lx On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin wrote: > Hi lx, > > PPC and s390 parts are ok. > > Best regards, > Martin > > > > -----Original Message----- > > From: hotspot-compiler-dev > bounces at openjdk.java.net> On Behalf Of Liu, Xin > > Sent: Freitag, 20. Dezember 2019 04:48 > > To: 'hotspot-compiler-dev at openjdk.java.net' > dev at openjdk.java.net> > > Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in > > c1_LIRAssembler > > > > Hi, Reviewers, > > > > Could you take a look at my webrev? I feel that those barrisetSet > interfaces > > have nothing with c1_LIRAssembler. 
> > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 > > Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ > > > > I try to build on aarch64 and x86_64 and it?s fine. > > > > Thanks, > > --lx > > From martin.doerr at sap.com Fri Dec 20 18:54:47 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 20 Dec 2019 18:54:47 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: <875zibkmyq.fsf@redhat.com> References: <87h81wl7vo.fsf@redhat.com> <87bls3ksen.fsf@redhat.com> <875zibkmyq.fsf@redhat.com> Message-ID: Hi Roland, thanks for reviewing it. Pushed to jdk/jdk. I guess we'll have to backport it after some testing time. Best regards, Martin > -----Original Message----- > From: Roland Westrelin > Sent: Freitag, 20. Dezember 2019 16:59 > To: Doerr, Martin ; Aditya Mandaleeka > ; hotspot compiler dev at openjdk.java.net> > Cc: shenandoah-dev > Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS > > > > http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.02/ > > That looks good to me. > > Roland. From sandhya.viswanathan at intel.com Fri Dec 20 19:35:13 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 20 Dec 2019 19:35:13 +0000 Subject: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) In-Reply-To: <63862b62-3dd4-4a56-3545-7631f77d81fb@oracle.com> References: <63862b62-3dd4-4a56-3545-7631f77d81fb@oracle.com> Message-ID: Hi Vladimir, Thanks for the review. Please find the updated webrev below for JDK 14: JBS: https://bugs.openjdk.java.net/browse/JDK-8236364 Webrev: http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.01/ RFE filed for JDK 15: https://bugs.openjdk.java.net/browse/JDK-8236446 Best Regards, Sandhya -----Original Message----- From: Vladimir Ivanov Sent: Friday, December 20, 2019 2:20 AM To: Viswanathan, Sandhya ; hotspot compiler Subject: Re: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) Hi Sandhya, > Webrev: http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.00/ I'd prefer to see the check as a special case with a comment: MachOper* Matcher::specialize_generic_vector_operand(MachOper* generic_opnd, uint ideal_reg) { assert(Matcher::is_generic_vector(generic_opnd), "not generic"); bool legacy = (generic_opnd->opcode() == LEGVEC); + if (!VM_Version::supports_avx512vlbwdq() && // KNL + is_temp && !legacy && (ideal_reg == Op_VecZ)) { + // Conservatively specialize 512bit vec TEMP operands to legVecZ (zmm0-15) on KNL. + return new legVecZOper(); + } if (legacy) { switch (ideal_reg) { case Op_VecS: return new legVecSOper(); Otherwise, looks good. I consider it as a stop-the-gap solution for 14. In 15 we need to get rid of it and adjust TEMP operand types in x86.ad instead. Please, file an RFE for it. Best regards, Vladimir Ivanov > > Best Regards, > Sandhya > > > From rkennke at redhat.com Fri Dec 20 19:58:42 2019 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 20 Dec 2019 20:58:42 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> <87bls3ksen.fsf@redhat.com> <875zibkmyq.fsf@redhat.com> Message-ID: <0e22c4e6-74b9-a50b-df44-83c332457f14@redhat.com> Hi Martin, > thanks for reviewing it. Pushed to jdk/jdk. Thanks a lot! > I guess we'll have to backport it after some testing time. 
Yes, we're gonna need it in 11u and 8u. Thanks and have a nice weekend (and xmas, etc, if you're also taking time off)! Cheers, Roman > Best regards, > Martin > > >> -----Original Message----- >> From: Roland Westrelin >> Sent: Freitag, 20. Dezember 2019 16:59 >> To: Doerr, Martin ; Aditya Mandaleeka >> ; hotspot compiler > dev at openjdk.java.net> >> Cc: shenandoah-dev >> Subject: RE: RFR: 8236179: C1 register allocation error with T_ADDRESS >> >> >>> http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.02/ >> >> That looks good to me. >> >> Roland. > From adityam at microsoft.com Fri Dec 20 20:06:14 2019 From: adityam at microsoft.com (Aditya Mandaleeka) Date: Fri, 20 Dec 2019 20:06:14 +0000 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: <0e22c4e6-74b9-a50b-df44-83c332457f14@redhat.com> References: <87h81wl7vo.fsf@redhat.com> <87bls3ksen.fsf@redhat.com> <875zibkmyq.fsf@redhat.com> <0e22c4e6-74b9-a50b-df44-83c332457f14@redhat.com> Message-ID: Thanks again to everyone who helped get this change in! I am happy to help backport it as well if it can wait until January. Thanks, Aditya -----Original Message----- From: Roman Kennke Sent: Friday, December 20, 2019 11:59 AM To: Doerr, Martin ; Roland Westrelin ; Aditya Mandaleeka ; hotspot compiler Cc: shenandoah-dev Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS Hi Martin, > thanks for reviewing it. Pushed to jdk/jdk. Thanks a lot! > I guess we'll have to backport it after some testing time. Yes, we're gonna need it in 11u and 8u. Thanks and have a nice weekend (and xmas, etc, if you're also taking time off)! Cheers, Roman > Best regards, > Martin > > >> -----Original Message----- >> From: Roland Westrelin >> Sent: Freitag, 20. Dezember 2019 16:59 >> To: Doerr, Martin ; Aditya Mandaleeka >> ; hotspot compiler > dev at openjdk.java.net> >> Cc: shenandoah-dev >> Subject: RE: RFR: 8236179: C1 register allocation error with >> T_ADDRESS >> >> >>> http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.02/ >> >> That looks good to me. >> >> Roland. > From rkennke at redhat.com Fri Dec 20 20:37:24 2019 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 20 Dec 2019 21:37:24 +0100 Subject: RFR: 8236179: C1 register allocation error with T_ADDRESS In-Reply-To: References: <87h81wl7vo.fsf@redhat.com> <87bls3ksen.fsf@redhat.com> <875zibkmyq.fsf@redhat.com> <0e22c4e6-74b9-a50b-df44-83c332457f14@redhat.com> Message-ID: Hi Aditya, > Thanks again to everyone who helped get this change in! I am happy to help backport it as well if it can wait until January. Thank *you* for figuring this out in the first place. This seems a rather serious bug for Shenandoah GC (and I'm still a bit surprised how we haven't seen it yet). I just realized we're also gonna need it in JDK14. I am not even quite sure what the process for this would be. We'll figure it out. Thank you! Roman > Thanks, > Aditya > > -----Original Message----- > From: Roman Kennke > Sent: Friday, December 20, 2019 11:59 AM > To: Doerr, Martin ; Roland Westrelin ; Aditya Mandaleeka ; hotspot compiler > Cc: shenandoah-dev > Subject: Re: RFR: 8236179: C1 register allocation error with T_ADDRESS > > Hi Martin, > >> thanks for reviewing it. Pushed to jdk/jdk. > > Thanks a lot! > >> I guess we'll have to backport it after some testing time. > > Yes, we're gonna need it in 11u and 8u. > > Thanks and have a nice weekend (and xmas, etc, if you're also taking time off)! 
> > Cheers, > Roman > >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: Roland Westrelin >>> Sent: Freitag, 20. Dezember 2019 16:59 >>> To: Doerr, Martin ; Aditya Mandaleeka >>> ; hotspot compiler >> dev at openjdk.java.net> >>> Cc: shenandoah-dev >>> Subject: RE: RFR: 8236179: C1 register allocation error with >>> T_ADDRESS >>> >>> >>>> http://cr.openjdk.java.net/~mdoerr/8236179_C1_T_ADDRESS/webrev.02/ >>> >>> That looks good to me. >>> >>> Roland. >> > From vladimir.kozlov at oracle.com Fri Dec 20 21:14:54 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 13:14:54 -0800 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> <74f551e7-c418-5c7c-a422-abe9df07bb12@oracle.com> Message-ID: On 12/20/19 2:12 AM, Lindenmaier, Goetz wrote: > Hi Vladimir, > > I refactored it a bit, see new webrev: > http://cr.openjdk.java.net/~goetz/wr19/8235998-c2_tracing_mem_leak/02/ Looks good to me. > > Also, I filed > 8236414: stringStream allocates on ResourceArea and C-heap > https://bugs.openjdk.java.net/browse/JDK-8236414 > ... but I'm not sure how to solve it, as stringStream inherits > this capability, and having the char* on the ResourceArea > as before is not good, either. > Anyways, there are very few places where new is used > with the stringStream, and the PrintInlining implementation > is the only one where it's problematic. Maybe it would be > better to simplify PrintInlining. Yes, simplifying PrintInlining is also option. Thanks, Vladimir > > Best regards, > Goetz. > >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Donnerstag, 19. Dezember 2019 19:07 >> To: Lindenmaier, Goetz ; hotspot-compiler- >> dev at openjdk.java.net >> Cc: hotspot-runtime-dev at openjdk.java.net runtime > dev at openjdk.java.net> >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >> '8224193: stringStream should not use Resouce Area'. >> >> Please, file RFE for refactoring stringStream. >> >> Yes, the fix can go into JDK 14. >> >> But before that, I see the same pattern used 3 times in compile.cpp: >> >> + if (_print_inlining_stream != NULL) _print_inlining_stream- >>> ~stringStream(); >> _print_inlining_stream = ; >> >> Can you use one function for that? Also our coding style requires to put body of >> 'if' on separate line and use {}. >> >> thanks, >> Vladimir >> >> On 12/19/19 4:52 AM, David Holmes wrote: >>> On 19/12/2019 9:35 pm, Lindenmaier, Goetz wrote: >>>> Hi, >>>> >>>> yes, it is confusing that parts are on the arena, other parts >>>> are allocated in the C-heap. >>>> But usages which allocate the stringStream with new() are >>>> rare, usually it's allocated on the stack making all this >>>> more simple.? And the previous design was even more >>>> error-prone. >>>> Also, the whole way to print the inlining information >>>> is quite complex, with strange usage of the copy constructor >>>> of PrintInliningBuffer ... which reaches into GrowableArray >>>> which should have a constructor that does not use the >>>> copy constructor to initialize the elements ... >>>> >>>> I do not intend to change stringStream in this change. >>>> So can I consider this reviewed from your side? Or at >>>> least that there is no veto :)? >>> >>> Sorry I was trying to convey this is Reviewed, but I do think this needs further >> work in the future. 
>>> >>> Thanks, >>> David >>> >>>> Thanks and best regards, >>>> ?? Goetz. >>>> >>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Donnerstag, 19. Dezember 2019 11:39 >>>>> To: Lindenmaier, Goetz ; Vladimir Kozlov >>>>> ; hotspot-compiler-dev at openjdk.java.net >>>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>>> dev at openjdk.java.net> >>>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >>>>> '8224193: stringStream should not use Resouce Area'. >>>>> >>>>> On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: >>>>>> Hi David, Vladimir, >>>>>> >>>>>> stringStream is a ResourceObj, thus it lives on an arena. >>>>>> This is uncritical, as it does not resize. >>>>>> 8224193 only changed the allocation of the internal char*, >>>>>> which always caused problems with resizing under >>>>>> ResourceMarks that were not placed for the string but to >>>>>> free other memory. >>>>>> Thus stringStream must not be deallocated, and >>>>>> also there was no mem leak before that change. >>>>>> But we need to call the destructor to free the char*. >>>>> >>>>> I think we have a confusing mix of arena and C_heap usage with >>>>> stringStream. Not clear to me why stringStream remains a resourceObj >>>>> now? In many cases the stringStream is just local on the stack. In other >>>>> cases if it is new'd then it should be C-heap same as the array and then >>>>> you could delete it too. >>>>> >>>>> What you have may suffice to initially address the leak but I think this >>>>> whole thing needs revisiting. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> >>>>>> Best regards, >>>>>> ??? Goetz. >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: hotspot-runtime-dev >>>> bounces at openjdk.java.net> >>>>>>> On Behalf Of David Holmes >>>>>>> Sent: Mittwoch, 18. Dezember 2019 04:33 >>>>>>> To: Vladimir Kozlov ; hotspot-compiler- >>>>>>> dev at openjdk.java.net >>>>>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime >>>>>> dev at openjdk.java.net> >>>>>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after >>>>>>> '8224193: stringStream should not use Resouce Area'. >>>>>>> >>>>>>> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: >>>>>>>> CCing to Runtime group. >>>>>>>> >>>>>>>> For me the use of `_print_inlining_stream->~stringStream()` is not >> obvious. >>>>>>>> I would definitively miss to do that if I use stringStreams in some new >>>>>>>> code. >>>>>>> >>>>>>> But that is not a problem added by this changeset, the problem is that >>>>>>> we're not deallocating these stringStreams even though we should be.? If >>>>>>> you use a stringStream in new code you have to manage its lifecycle. >>>>>>> >>>>>>> That said why is this: >>>>>>> >>>>>>> ??? if (_print_inlining_stream != NULL) >>>>>>> _print_inlining_stream->~stringStream(); >>>>>>> >>>>>>> not just: >>>>>>> >>>>>>> delete _print_inlining_stream; >>>>>>> >>>>>>> ? >>>>>>> >>>>>>> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we >> explicitly >>>>>>> calling the destructor rather than calling delete? >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> >>>>>>>> May be someone can suggest some C++ trick to do that automatically. >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I'm resending this with fixed bugId ... >>>>>>>>> Sorry! >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> ? ?? 
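The leak pattern being debated in this 8235998 thread boils down to a few lines. The following is a simplified illustration of the point, not the committed fix: the stringStream object itself is a ResourceObj on an arena, but since JDK-8224193 its character buffer is C-heap allocated, so the destructor has to run even though the object itself is never freed.

  // Simplified illustration of the 8235998 situation (not the actual compile.cpp code).
  stringStream* ss = new stringStream();             // ResourceObj: the object lands on an arena
  ss->print("inlining trace for %s", "someMethod");  // internal char* is malloc'ed since 8224193
  // ...
  // The arena is simply reset at the end of the compile, so nothing ever runs the
  // destructor; 'delete' is not an option for an arena-allocated ResourceObj, hence
  // the explicit destructor call that frees the C-heap buffer:
  ss->~stringStream();
  // A GrowableArray of such streams kept on an arena runs no element destructors either,
  // which is why each element needs the same explicit cleanup before the array is dropped.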
Goetz >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> PrintInlining and TraceLoopPredicate allocate stringStreams with >> new >>>>> and >>>>>>>>>> relied on the fact that all memory used is on the ResourceArea >> cleaned >>>>>>>>>> after the compilation. >>>>>>>>>> >>>>>>>>>> Since 8224193 the char* of the stringStream is malloced and thus >>>>>>>>>> must be freed. No doing so manifests a memory leak. >>>>>>>>>> This is only relevant if the corresponding tracing is active. >>>>>>>>>> >>>>>>>>>> To fix TraceLoopPredicate I added the destructor call >>>>>>>>>> Fixing PrintInlining is a bit more complicated, as it uses several >>>>>>>>>> stringStreams. A row of them is in a GrowableArray which must >>>>>>>>>> be walked to free all of them. >>>>>>>>>> As the GrowableArray is on an arena no destructor is called for it. >>>>>>>>>> >>>>>>>>>> I also changed some as_string() calls to base() calls which reduced >>>>>>>>>> memory need of the traces, and added a comment explaining the >>>>>>>>>> constructor of GrowableArray that calls the copyconstructor for its >>>>>>>>>> elements. >>>>>>>>>> >>>>>>>>>> Please review: >>>>>>>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- >>>>>>> c2_tracing_mem_leak/01/ >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> ? ?? Goetz. >>>>>>>>> From vladimir.kozlov at oracle.com Fri Dec 20 21:36:07 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 13:36:07 -0800 Subject: RFR(XXS): clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: References: <80614DEF-BE53-4A0C-A517-F28EF734C180@amazon.com> Message-ID: <2b56976e-a705-ce77-4dee-aa3611c83f67@oracle.com> Hi Lx First, when you post RFR, please, include bug id in email's Subject. You can use jdk/submit testing to verify builds on SPARC. We are still building on it, with warning. Changes seems fine to me but make sure verify that it builds without PCH. Regards, Vladimir On 12/20/19 9:24 AM, Liu Xin wrote: > Martin? > > Thank you very much. May I know how to validate Sparc? I don't have any > SPARC machine to access. > > Thanks, > --lx > > > On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin wrote: > >> Hi lx, >> >> PPC and s390 parts are ok. >> >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Liu, Xin >>> Sent: Freitag, 20. Dezember 2019 04:48 >>> To: 'hotspot-compiler-dev at openjdk.java.net' >> dev at openjdk.java.net> >>> Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in >>> c1_LIRAssembler >>> >>> Hi, Reviewers, >>> >>> Could you take a look at my webrev? I feel that those barrisetSet >> interfaces >>> have nothing with c1_LIRAssembler. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 >>> Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ >>> >>> I try to build on aarch64 and x86_64 and it?s fine. >>> >>> Thanks, >>> --lx >> >> From vladimir.kozlov at oracle.com Fri Dec 20 21:42:31 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 13:42:31 -0800 Subject: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) In-Reply-To: References: <63862b62-3dd4-4a56-3545-7631f77d81fb@oracle.com> Message-ID: On 12/20/19 11:35 AM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Thanks for the review. 
Please find the updated webrev below for JDK 14: > JBS: https://bugs.openjdk.java.net/browse/JDK-8236364 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.01/ Looks good. Thanks, Vladimir K > > > RFE filed for JDK 15: > https://bugs.openjdk.java.net/browse/JDK-8236446 > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Ivanov > Sent: Friday, December 20, 2019 2:20 AM > To: Viswanathan, Sandhya ; hotspot compiler > Subject: Re: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) > > Hi Sandhya, > >> Webrev: http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.00/ > > I'd prefer to see the check as a special case with a comment: > > MachOper* Matcher::specialize_generic_vector_operand(MachOper* > generic_opnd, uint ideal_reg) { > assert(Matcher::is_generic_vector(generic_opnd), "not generic"); > bool legacy = (generic_opnd->opcode() == LEGVEC); > + if (!VM_Version::supports_avx512vlbwdq() && // KNL > + is_temp && !legacy && (ideal_reg == Op_VecZ)) { > + // Conservatively specialize 512bit vec TEMP operands to legVecZ > (zmm0-15) on KNL. > + return new legVecZOper(); > + } > if (legacy) { > switch (ideal_reg) { > case Op_VecS: return new legVecSOper(); > > Otherwise, looks good. > > I consider it as a stop-the-gap solution for 14. In 15 we need to get rid of it and adjust TEMP operand types in x86.ad instead. Please, file an RFE for it. > > Best regards, > Vladimir Ivanov > >> >> Best Regards, >> Sandhya >> >> >> From vladimir.x.ivanov at oracle.com Fri Dec 20 21:51:12 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 21 Dec 2019 00:51:12 +0300 Subject: [14] RFR(XXS):8236364:TEMP vector registers could be incorrectly assigned upper bank xmm registers after Generic Operands (JDK-8234391) In-Reply-To: References: <63862b62-3dd4-4a56-3545-7631f77d81fb@oracle.com> Message-ID: <242a71d4-093e-513b-44d9-0fa119a86dd1@oracle.com> > Webrev:http://cr.openjdk.java.net/~sviswanathan/8236364/webrev.01/ Looks good. > RFE filed for JDK 15: > https://bugs.openjdk.java.net/browse/JDK-8236446 Thanks! Best regards, Vladimir Ivanov From smita.kamath at intel.com Fri Dec 20 21:52:17 2019 From: smita.kamath at intel.com (Kamath, Smita) Date: Fri, 20 Dec 2019 21:52:17 +0000 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> Message-ID: Hi Vladimir, Thank you for reviewing the code. I have updated the code as per your recommendations ( please look at the email below). Link to the updated webrev: https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev02/ Regards, Smita -----Original Message----- From: Vladimir Kozlov Sent: Thursday, December 19, 2019 5:17 PM To: Kamath, Smita Cc: Viswanathan, Sandhya ; 'hotspot compiler' Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record StubRoutines pointers: StubRoutines::_bigIntegerRightShiftWorker StubRoutines::_bigIntegerLeftShiftWorker Smita>>>done In the test add an other @run command with default setting (without -XX:-TieredCompilation -Xbatch). 
Smita>>>done Thanks, Vladimir [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/ On 12/18/19 6:33 PM, Kamath, Smita wrote: > Hi Vladimir, > > I have made the code changes you suggested (please look at the email below). > I have also enabled the intrinsic to run only when VBMI2 feature is available. > The intrinsic shows gains of >1.5x above 4k bit BigInteger. > > Webrev link: > https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ > > Thanks, > Smita > > -----Original Message----- > From: Vladimir Kozlov > Sent: Wednesday, December 11, 2019 10:55 AM > To: Kamath, Smita ; 'hotspot compiler' > ; Viswanathan, Sandhya > > Subject: Re: RFR(M):8167065: Add intrinsic support for double > precision shifting on x86_64 > > Hi Kamath, > > First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code generated by C2. > What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. > > Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? > > Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we would need test this including performance testing. > > In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. > For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. > Smita >>>done > > I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. We don't have flag for VNNI or other avx512 instructions subset. > Smita >> removed UseVBMI2 flag > > In vm_version_x86.cpp you need to add more %s in print statement for new output. > Smita >>> done > > You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. > Smita >>> done > > You need to add intrinsics to Graal's test to ignore them: > > http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal. > vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/gr > aalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 > Smita >>>done > > Thanks, > Vladimir > > On 12/10/19 5:41 PM, Kamath, Smita wrote: >> Hi, >> >> >> As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. >> >> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 >> >> Link to webrev : >> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ >> >> >> >> I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. >> >> >> [1] >> https://software.intel.com/sites/default/files/managed/39/c5/325462-s >> d m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> >> Vol. 
>> 2C 5-471) >> >> [2] >> https://software.intel.com/en-us/articles/intel-software-development- >> e >> mulator >> >> >> Regards, >> >> Smita Kamath >> From mikael.vidstedt at oracle.com Fri Dec 20 21:54:47 2019 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Fri, 20 Dec 2019 13:54:47 -0800 Subject: RFR(T): 8236449: Problem list compiler/jsr292/ContinuousCallSiteTargetChange.java on solaris-sparcv9 Message-ID: compiler/jsr292/ContinuousCallSiteTargetChange.java is timing out intermittently on solaris-sparcv9, frequently enough to cause some unwanted noise. Let?s problem list it: Bug: https://bugs.openjdk.java.net/browse/JDK-8236449 Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8236449/webrev.00/open/webrev/ Cheers, Mikael From igor.ignatyev at oracle.com Fri Dec 20 22:02:20 2019 From: igor.ignatyev at oracle.com (Igor Ignatev) Date: Fri, 20 Dec 2019 14:02:20 -0800 Subject: RFR(T): 8236449: Problem list compiler/jsr292/ContinuousCallSiteTargetChange.java on solaris-sparcv9 In-Reply-To: References: Message-ID: <35692795-874C-40DA-8789-7FE449E7B906@oracle.com> LGTM ? Igor > On Dec 20, 2019, at 1:55 PM, Mikael Vidstedt wrote: > > ? > compiler/jsr292/ContinuousCallSiteTargetChange.java is timing out intermittently on solaris-sparcv9, frequently enough to cause some unwanted noise. Let?s problem list it: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8236449 > Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8236449/webrev.00/open/webrev/ > > Cheers, > Mikael From vladimir.kozlov at oracle.com Fri Dec 20 22:10:02 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 14:10:02 -0800 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops In-Reply-To: <8736dfkko7.fsf@redhat.com> References: <87o8w8ptsq.fsf@redhat.com> <8736dfkko7.fsf@redhat.com> Message-ID: <7f4bcc83-e049-20ad-deef-6c5e15819d10@oracle.com> Looks good. Thanks, Vladimir K On 12/20/19 8:48 AM, Roland Westrelin wrote: > > Thanks for reviewing this, Vladimir. > >> cfgnode.cpp - should we also check for is_top() to set `doit = false` and bailout? > > This: > > http://cr.openjdk.java.net/~roland/8231291/webrev.02/ > >> Why you added check in must_be_not_null()? It is used only in library_call.cpp and should not relate to this changes. >> Did you find some issues? > > I hit a crash when doing some CTW testing because an argument to > must_be_not_null() was already known to be not null. I suppose it's > unrelated to that change and something changed in the libraries instead. > > Roland. > From vladimir.kozlov at oracle.com Fri Dec 20 22:19:54 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 14:19:54 -0800 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> Message-ID: <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> We should have added core-libs to review since you modified BigInteger.java. webrev02 looks good to me. Let me test it. Thanks, Vladimir On 12/20/19 1:52 PM, Kamath, Smita wrote: > Hi Vladimir, > > Thank you for reviewing the code. I have updated the code as per your recommendations ( please look at the email below). 
> Link to the updated webrev: https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev02/ > > Regards, > Smita > > -----Original Message----- > From: Vladimir Kozlov > Sent: Thursday, December 19, 2019 5:17 PM > To: Kamath, Smita > Cc: Viswanathan, Sandhya ; 'hotspot compiler' > Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 > > We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record StubRoutines pointers: > > StubRoutines::_bigIntegerRightShiftWorker > StubRoutines::_bigIntegerLeftShiftWorker > Smita>>>done > > In the test add an other @run command with default setting (without -XX:-TieredCompilation -Xbatch). > Smita>>>done > > Thanks, > Vladimir > > [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/ > > On 12/18/19 6:33 PM, Kamath, Smita wrote: >> Hi Vladimir, >> >> I have made the code changes you suggested (please look at the email below). >> I have also enabled the intrinsic to run only when VBMI2 feature is available. >> The intrinsic shows gains of >1.5x above 4k bit BigInteger. >> >> Webrev link: >> https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ >> >> Thanks, >> Smita >> >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Wednesday, December 11, 2019 10:55 AM >> To: Kamath, Smita ; 'hotspot compiler' >> ; Viswanathan, Sandhya >> >> Subject: Re: RFR(M):8167065: Add intrinsic support for double >> precision shifting on x86_64 >> >> Hi Kamath, >> >> First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code generated by C2. >> What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. >> >> Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? >> >> Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we would need test this including performance testing. >> >> In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. >> For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. >> Smita >>>done >> >> I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. We don't have flag for VNNI or other avx512 instructions subset. >> Smita >> removed UseVBMI2 flag >> >> In vm_version_x86.cpp you need to add more %s in print statement for new output. >> Smita >>> done >> >> You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. >> Smita >>> done >> >> You need to add intrinsics to Graal's test to ignore them: >> >> http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal. >> vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/gr >> aalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 >> Smita >>>done >> >> Thanks, >> Vladimir >> >> On 12/10/19 5:41 PM, Kamath, Smita wrote: >>> Hi, >>> >>> >>> As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. 
>>> >>> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 >>> >>> Link to webrev : >>> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ >>> >>> >>> >>> I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly implemented. >>> >>> >>> [1] >>> https://software.intel.com/sites/default/files/managed/39/c5/325462-s >>> d m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> >>> Vol. >>> 2C 5-471) >>> >>> [2] >>> https://software.intel.com/en-us/articles/intel-software-development- >>> e >>> mulator >>> >>> >>> Regards, >>> >>> Smita Kamath >>> From vladimir.x.ivanov at oracle.com Fri Dec 20 22:22:13 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 21 Dec 2019 01:22:13 +0300 Subject: RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops In-Reply-To: <8736dfkko7.fsf@redhat.com> References: <87o8w8ptsq.fsf@redhat.com> <8736dfkko7.fsf@redhat.com> Message-ID: <7bc32dc2-1612-cb77-a467-ac67987b148c@oracle.com> >> Why you added check in must_be_not_null()? It is used only in library_call.cpp and should not relate to this changes. >> Did you find some issues? > > I hit a crash when doing some CTW testing because an argument to > must_be_not_null() was already known to be not null. I suppose it's > unrelated to that change and something changed in the libraries instead. That's interesting. I initially thought it's just to simplify the IR and avoid keeping redundant null check until loop opts are over. Doesn't it signal about a more general problem with GraphKit::must_be_not_null()? What if a value becomes non-null after GraphKit::must_be_not_null()? Is there a chance to hit the very same problem you observed? Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Fri Dec 20 23:47:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 15:47:21 -0800 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> Message-ID: Hi Smita, You have typo (should be supports_vbmi2): src/hotspot/cpu/x86/assembler_x86.cpp:6547:22: error: 'support_vbmi2' is not a member of 'VM_Version' assert(VM_Version::support_vbmi2(), "requires vbmi2"); ^~~~~~~~~~~~~ Debug build failed. I am retesting with local fix. Regards, Vladimir K On 12/20/19 2:19 PM, Vladimir Kozlov wrote: > We should have added core-libs to review since you modified BigInteger.java. > > webrev02 looks good to me. Let me test it. > > Thanks, > Vladimir > > On 12/20/19 1:52 PM, Kamath, Smita wrote: >> Hi Vladimir, >> >> Thank you for reviewing the code. I have updated the code as per your recommendations ( please look at the email below). 
>> Link to the updated webrev: https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev02/ >> >> Regards, >> Smita >> >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Thursday, December 19, 2019 5:17 PM >> To: Kamath, Smita >> Cc: Viswanathan, Sandhya ; 'hotspot compiler' >> Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 >> >> We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record StubRoutines pointers: >> >> StubRoutines::_bigIntegerRightShiftWorker >> StubRoutines::_bigIntegerLeftShiftWorker >> Smita>>>done >> >> In the test add an other @run command with default setting (without -XX:-TieredCompilation -Xbatch). >> Smita>>>done >> >> Thanks, >> Vladimir >> >> [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/ >> >> On 12/18/19 6:33 PM, Kamath, Smita wrote: >>> Hi Vladimir, >>> >>> I have made the code changes you suggested (please look at the email below). >>> I have also enabled the intrinsic to run only when VBMI2 feature is available. >>> The intrinsic shows gains of >1.5x above 4k bit BigInteger. >>> >>> Webrev link: >>> https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ >>> >>> Thanks, >>> Smita >>> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Wednesday, December 11, 2019 10:55 AM >>> To: Kamath, Smita ; 'hotspot compiler' >>> ; Viswanathan, Sandhya >>> >>> Subject: Re: RFR(M):8167065: Add intrinsic support for double >>> precision shifting on x86_64 >>> >>> Hi Kamath, >>> >>> First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code >>> generated by C2. >>> What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. >>> >>> Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does you >>> change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? >>> >>> Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we >>> would need test this including performance testing. >>> >>> In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. >>> For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use >>> supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. >>> Smita >>>done >>> >>> I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. We don't have flag for VNNI or >>> other avx512 instructions subset. >>> Smita >> removed UseVBMI2 flag >>> >>> In vm_version_x86.cpp you need to add more %s in print statement for new output. >>> Smita? >>> done >>> >>> You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. >>> Smita >>> done >>> >>> You need to add intrinsics to Graal's test to ignore them: >>> >>> http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal. >>> vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/gr >>> aalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 >>> Smita >>>done >>> >>> Thanks, >>> Vladimir >>> >>> On 12/10/19 5:41 PM, Kamath, Smita wrote: >>>> Hi, >>>> >>>> >>>> As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. 
I >>>> would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This >>>> optimization is for x86_64 architecture enabled. >>>> >>>> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 >>>> >>>> Link to webrev : >>>> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ >>>> >>>> >>>> >>>> I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly >>>> implemented. >>>> >>>> >>>> [1] >>>> https://software.intel.com/sites/default/files/managed/39/c5/325462-s >>>> d m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> >>>> Vol. >>>> 2C 5-471) >>>> >>>> [2] >>>> https://software.intel.com/en-us/articles/intel-software-development- >>>> e >>>> mulator >>>> >>>> >>>> Regards, >>>> >>>> Smita Kamath >>>> From vladimir.kozlov at oracle.com Sat Dec 21 05:12:15 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Dec 2019 21:12:15 -0800 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> Message-ID: Testing results are good after fixing the typo. We should consider implementing this intrinsic in Graal too. We have to upload AOT and Graal test changes anyway. Thanks, Vladimir On 12/20/19 3:47 PM, Vladimir Kozlov wrote: > Hi Smita, > > You have typo (should be supports_vbmi2): > > src/hotspot/cpu/x86/assembler_x86.cpp:6547:22: error: 'support_vbmi2' is not a member of 'VM_Version' > ??? assert(VM_Version::support_vbmi2(), "requires vbmi2"); > ?????????????????????? ^~~~~~~~~~~~~ > > Debug build failed. I am retesting with local fix. > > Regards, > Vladimir K > > On 12/20/19 2:19 PM, Vladimir Kozlov wrote: >> We should have added core-libs to review since you modified BigInteger.java. >> >> webrev02 looks good to me. Let me test it. >> >> Thanks, >> Vladimir >> >> On 12/20/19 1:52 PM, Kamath, Smita wrote: >>> Hi Vladimir, >>> >>> Thank you for reviewing the code. I have updated the code as per your recommendations ( please look at the email below). >>> Link to the updated webrev: https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev02/ >>> >>> Regards, >>> Smita >>> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Thursday, December 19, 2019 5:17 PM >>> To: Kamath, Smita >>> Cc: Viswanathan, Sandhya ; 'hotspot compiler' >>> Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 >>> >>> We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record StubRoutines pointers: >>> >>> StubRoutines::_bigIntegerRightShiftWorker >>> StubRoutines::_bigIntegerLeftShiftWorker >>> Smita>>>done >>> >>> In the test add an other @run command with default setting (without -XX:-TieredCompilation -Xbatch). >>> Smita>>>done >>> >>> Thanks, >>> Vladimir >>> >>> [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/ >>> >>> On 12/18/19 6:33 PM, Kamath, Smita wrote: >>>> Hi Vladimir, >>>> >>>> I have made the code changes you suggested (please look at the email below). >>>> I have also enabled the intrinsic to run only when VBMI2 feature is available. >>>> The intrinsic shows gains of >1.5x above 4k bit BigInteger. 
>>>> >>>> Webrev link: >>>> https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ >>>> >>>> Thanks, >>>> Smita >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov >>>> Sent: Wednesday, December 11, 2019 10:55 AM >>>> To: Kamath, Smita ; 'hotspot compiler' >>>> ; Viswanathan, Sandhya >>>> >>>> Subject: Re: RFR(M):8167065: Add intrinsic support for double >>>> precision shifting on x86_64 >>>> >>>> Hi Kamath, >>>> >>>> First, general question. What performance you see when VBMI2 instructions are *not* used with your new code vs code >>>> generated by C2. >>>> What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. >>>> >>>> Second. Sandhya recently pushed 8235510 changes to rollback avx512 code for CRC32 due to performance issues. Does >>>> you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? >>>> >>>> Third. I would suggest to wait after we fork JDK 14 with this changes. I think it may be too late for 14 because we >>>> would need test this including performance testing. >>>> >>>> In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. >>>> For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit when UseAVX < 3 ( < avx512). You can also use >>>> supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. >>>> Smita >>>done >>>> >>>> I don't think we need separate flag UseVBMI2 - it could be controlled by UseAVX flag. We don't have flag for VNNI or >>>> other avx512 instructions subset. >>>> Smita >> removed UseVBMI2 flag >>>> >>>> In vm_version_x86.cpp you need to add more %s in print statement for new output. >>>> Smita? >>> done >>>> >>>> You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. >>>> Smita >>> done >>>> >>>> You need to add intrinsics to Graal's test to ignore them: >>>> >>>> http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal. >>>> vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/gr >>>> aalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 >>>> Smita >>>done >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/10/19 5:41 PM, Kamath, Smita wrote: >>>>> Hi, >>>>> >>>>> >>>>> As per Intel Architecture Instruction Set Reference [1] VBMI2 Operations will be supported in future Intel ISA. I >>>>> would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This >>>>> optimization is for x86_64 architecture enabled. >>>>> >>>>> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 >>>>> >>>>> Link to webrev : >>>>> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ >>>>> >>>>> >>>>> >>>>> I ran jtreg test suite with the algorithm on Intel SDE [2] to confirm that encoding and semantics are correctly >>>>> implemented. >>>>> >>>>> >>>>> [1] >>>>> https://software.intel.com/sites/default/files/managed/39/c5/325462-s >>>>> d m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> >>>>> Vol. >>>>> 2C 5-471) >>>>> >>>>> [2] >>>>> https://software.intel.com/en-us/articles/intel-software-development- >>>>> e >>>>> mulator >>>>> >>>>> >>>>> Regards, >>>>> >>>>> Smita Kamath >>>>> From goetz.lindenmaier at sap.com Sat Dec 21 08:14:57 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Sat, 21 Dec 2019 08:14:57 +0000 Subject: RFR(M): 8235998: [c2] Memory leaks during tracing after '8224193: stringStream should not use Resouce Area'. 
In-Reply-To: References: <0817fb9b-8e81-049a-5ee3-f30831cd3e2c@oracle.com> <74f551e7-c418-5c7c-a422-abe9df07bb12@oracle.com> Message-ID: Thanks Vladimir! Best regards, Goetz. > -----Original Message----- > From: Vladimir Kozlov > Sent: Friday, December 20, 2019 10:15 PM > To: Lindenmaier, Goetz ; hotspot-compiler- > dev at openjdk.java.net > Cc: hotspot-runtime-dev at openjdk.java.net runtime dev at openjdk.java.net> > Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > '8224193: stringStream should not use Resouce Area'. > > On 12/20/19 2:12 AM, Lindenmaier, Goetz wrote: > > Hi Vladimir, > > > > I refactored it a bit, see new webrev: > > http://cr.openjdk.java.net/~goetz/wr19/8235998- > c2_tracing_mem_leak/02/ > > Looks good to me. > > > > > Also, I filed > > 8236414: stringStream allocates on ResourceArea and C-heap > > https://bugs.openjdk.java.net/browse/JDK-8236414 > > ... but I'm not sure how to solve it, as stringStream inherits > > this capability, and having the char* on the ResourceArea > > as before is not good, either. > > Anyways, there are very few places where new is used > > with the stringStream, and the PrintInlining implementation > > is the only one where it's problematic. Maybe it would be > > better to simplify PrintInlining. > > Yes, simplifying PrintInlining is also option. > > Thanks, > Vladimir > > > > > Best regards, > > Goetz. > > > >> -----Original Message----- > >> From: Vladimir Kozlov > >> Sent: Donnerstag, 19. Dezember 2019 19:07 > >> To: Lindenmaier, Goetz ; hotspot- > compiler- > >> dev at openjdk.java.net > >> Cc: hotspot-runtime-dev at openjdk.java.net runtime >> dev at openjdk.java.net> > >> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > >> '8224193: stringStream should not use Resouce Area'. > >> > >> Please, file RFE for refactoring stringStream. > >> > >> Yes, the fix can go into JDK 14. > >> > >> But before that, I see the same pattern used 3 times in compile.cpp: > >> > >> + if (_print_inlining_stream != NULL) _print_inlining_stream- > >>> ~stringStream(); > >> _print_inlining_stream = ; > >> > >> Can you use one function for that? Also our coding style requires to put > body of > >> 'if' on separate line and use {}. > >> > >> thanks, > >> Vladimir > >> > >> On 12/19/19 4:52 AM, David Holmes wrote: > >>> On 19/12/2019 9:35 pm, Lindenmaier, Goetz wrote: > >>>> Hi, > >>>> > >>>> yes, it is confusing that parts are on the arena, other parts > >>>> are allocated in the C-heap. > >>>> But usages which allocate the stringStream with new() are > >>>> rare, usually it's allocated on the stack making all this > >>>> more simple.? And the previous design was even more > >>>> error-prone. > >>>> Also, the whole way to print the inlining information > >>>> is quite complex, with strange usage of the copy constructor > >>>> of PrintInliningBuffer ... which reaches into GrowableArray > >>>> which should have a constructor that does not use the > >>>> copy constructor to initialize the elements ... > >>>> > >>>> I do not intend to change stringStream in this change. > >>>> So can I consider this reviewed from your side? Or at > >>>> least that there is no veto :)? > >>> > >>> Sorry I was trying to convey this is Reviewed, but I do think this needs > further > >> work in the future. > >>> > >>> Thanks, > >>> David > >>> > >>>> Thanks and best regards, > >>>> ?? Goetz. > >>>> > >>>> > >>>>> -----Original Message----- > >>>>> From: David Holmes > >>>>> Sent: Donnerstag, 19. 
Dezember 2019 11:39 > >>>>> To: Lindenmaier, Goetz ; Vladimir > Kozlov > >>>>> ; hotspot-compiler- > dev at openjdk.java.net > >>>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime runtime- > >>>>> dev at openjdk.java.net> > >>>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing after > >>>>> '8224193: stringStream should not use Resouce Area'. > >>>>> > >>>>> On 19/12/2019 7:27 pm, Lindenmaier, Goetz wrote: > >>>>>> Hi David, Vladimir, > >>>>>> > >>>>>> stringStream is a ResourceObj, thus it lives on an arena. > >>>>>> This is uncritical, as it does not resize. > >>>>>> 8224193 only changed the allocation of the internal char*, > >>>>>> which always caused problems with resizing under > >>>>>> ResourceMarks that were not placed for the string but to > >>>>>> free other memory. > >>>>>> Thus stringStream must not be deallocated, and > >>>>>> also there was no mem leak before that change. > >>>>>> But we need to call the destructor to free the char*. > >>>>> > >>>>> I think we have a confusing mix of arena and C_heap usage with > >>>>> stringStream. Not clear to me why stringStream remains a > resourceObj > >>>>> now? In many cases the stringStream is just local on the stack. In > other > >>>>> cases if it is new'd then it should be C-heap same as the array and > then > >>>>> you could delete it too. > >>>>> > >>>>> What you have may suffice to initially address the leak but I think this > >>>>> whole thing needs revisiting. > >>>>> > >>>>> Thanks, > >>>>> David > >>>>> > >>>>>> > >>>>>> Best regards, > >>>>>> ??? Goetz. > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: hotspot-runtime-dev >>>>> bounces at openjdk.java.net> > >>>>>>> On Behalf Of David Holmes > >>>>>>> Sent: Mittwoch, 18. Dezember 2019 04:33 > >>>>>>> To: Vladimir Kozlov ; hotspot- > compiler- > >>>>>>> dev at openjdk.java.net > >>>>>>> Cc: hotspot-runtime-dev at openjdk.java.net runtime runtime- > >>>>>>> dev at openjdk.java.net> > >>>>>>> Subject: Re: RFR(M): 8235998: [c2] Memory leaks during tracing > after > >>>>>>> '8224193: stringStream should not use Resouce Area'. > >>>>>>> > >>>>>>> On 18/12/2019 12:40 pm, Vladimir Kozlov wrote: > >>>>>>>> CCing to Runtime group. > >>>>>>>> > >>>>>>>> For me the use of `_print_inlining_stream->~stringStream()` is not > >> obvious. > >>>>>>>> I would definitively miss to do that if I use stringStreams in some > new > >>>>>>>> code. > >>>>>>> > >>>>>>> But that is not a problem added by this changeset, the problem is > that > >>>>>>> we're not deallocating these stringStreams even though we should > be.? If > >>>>>>> you use a stringStream in new code you have to manage its > lifecycle. > >>>>>>> > >>>>>>> That said why is this: > >>>>>>> > >>>>>>> ??? if (_print_inlining_stream != NULL) > >>>>>>> _print_inlining_stream->~stringStream(); > >>>>>>> > >>>>>>> not just: > >>>>>>> > >>>>>>> delete _print_inlining_stream; > >>>>>>> > >>>>>>> ? > >>>>>>> > >>>>>>> Ditto for src/hotspot/share/opto/loopPredicate.cpp why are we > >> explicitly > >>>>>>> calling the destructor rather than calling delete? > >>>>>>> > >>>>>>> Cheers, > >>>>>>> David > >>>>>>> > >>>>>>>> May be someone can suggest some C++ trick to do that > automatically. > >>>>>>>> Thanks, > >>>>>>>> Vladimir > >>>>>>>> > >>>>>>>> On 12/16/19 6:34 AM, Lindenmaier, Goetz wrote: > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> I'm resending this with fixed bugId ... > >>>>>>>>> Sorry! > >>>>>>>>> > >>>>>>>>> Best regards, > >>>>>>>>> ? ?? 
Goetz > >>>>>>>>> > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> PrintInlining and TraceLoopPredicate allocate stringStreams > with > >> new > >>>>> and > >>>>>>>>>> relied on the fact that all memory used is on the ResourceArea > >> cleaned > >>>>>>>>>> after the compilation. > >>>>>>>>>> > >>>>>>>>>> Since 8224193 the char* of the stringStream is malloced and > thus > >>>>>>>>>> must be freed. No doing so manifests a memory leak. > >>>>>>>>>> This is only relevant if the corresponding tracing is active. > >>>>>>>>>> > >>>>>>>>>> To fix TraceLoopPredicate I added the destructor call > >>>>>>>>>> Fixing PrintInlining is a bit more complicated, as it uses several > >>>>>>>>>> stringStreams. A row of them is in a GrowableArray which must > >>>>>>>>>> be walked to free all of them. > >>>>>>>>>> As the GrowableArray is on an arena no destructor is called for > it. > >>>>>>>>>> > >>>>>>>>>> I also changed some as_string() calls to base() calls which > reduced > >>>>>>>>>> memory need of the traces, and added a comment explaining > the > >>>>>>>>>> constructor of GrowableArray that calls the copyconstructor for > its > >>>>>>>>>> elements. > >>>>>>>>>> > >>>>>>>>>> Please review: > >>>>>>>>>> http://cr.openjdk.java.net/~goetz/wr19/8235998- > >>>>>>> c2_tracing_mem_leak/01/ > >>>>>>>>>> > >>>>>>>>>> Best regards, > >>>>>>>>>> ? ?? Goetz. > >>>>>>>>> From dean.long at oracle.com Mon Dec 23 07:22:47 2019 From: dean.long at oracle.com (Dean Long) Date: Sun, 22 Dec 2019 23:22:47 -0800 Subject: [15] Review Request: 8235975 Update copyright year to match last edit in jdk repository for 2014/15/16/17/18 In-Reply-To: <1e7d0395-fc57-4d5b-9cfa-c33e0f6462d5@oracle.com> References: <1e7d0395-fc57-4d5b-9cfa-c33e0f6462d5@oracle.com> Message-ID: The changes to the src/jdk.internal.vm.compiler tree is going to complicate our automated sync with upstream Graal.? Our sync script sets the date based on changes in the Graal repo, so a file that was last modified in 2018 (in upstream Graal) but was added to JDK in 2019 would still have 2018 as the date. dl On 12/22/19 12:24 PM, Sergey Bylokhov wrote: > Hello. > Please review the fix for JDK 15. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8235975 > Patch (2 Mb): > http://cr.openjdk.java.net/~serb/8235975/webrev.02/open.patch > Fix: http://cr.openjdk.java.net/~serb/8235975/webrev.02 > > I have updated the source code copyrights by the > "update_copyright_year.sh" > script for 2014/15/16/18/19 years, unfortunately, cannot run it for 2017 > because of: "JDK-8187443: Forest Consolidation: Move files to unified > layout" > which touched all files. > > From Pengfei.Li at arm.com Mon Dec 23 07:53:52 2019 From: Pengfei.Li at arm.com (Pengfei Li) Date: Mon, 23 Dec 2019 07:53:52 +0000 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: <379d9c30-a139-4544-9339-a6ddec913b78@redhat.com> References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> <379d9c30-a139-4544-9339-a6ddec913b78@redhat.com> Message-ID: Hi Andrew, > On 12/20/19 10:21 AM, Pengfei Li wrote: > > Since Nick's recent metaspace reservation fix [1] has completely removed > the use of r27, my patch becomes much simpler now. I have removed the > condition of UseCompressedClassPointers, rebased the code and created a > new webrev. Could you please help review again? 
> > What happens when we use Graal as a replacement for C2, particularly when > Graal needs a heap base register? Regarding your question, the Graal compiler (particularly on AArch64) uses r27 for compressing and uncompressing oops. Neither compressing nor uncompressing klass pointers uses r27. See code at [1], [2]. In Graal, we also wanted to make the heap base register allocatable when UseCompressedOops is off. Since before, AArch64 HotSpot didn't support r27 as an allocatable register, Graal patch e4d9c5f [3] marked r27 as non-allocatable and JVMCI patch JDK-8231754 [4] reserved r27 unconditionally. That's why I would like to revert JDK-8231754 in my patch. [1] https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.hotspot.aarch64/src/org/graalvm/compiler/hotspot/aarch64/AArch64HotSpotLIRGenerator.java#L256 [2] https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.hotspot.aarch64/src/org/graalvm/compiler/hotspot/aarch64/AArch64HotSpotLIRGenerator.java#L286 [3] https://github.com/oracle/graal/commit/e4d9c5f09a3c9be9f3c66ff0feff787519875a12 [4] http://hg.openjdk.java.net/jdk/jdk/rev/d068b1e534de -- Thanks, Pengfei From tobias.hartmann at oracle.com Mon Dec 23 08:42:08 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 23 Dec 2019 09:42:08 +0100 Subject: [14] RFR(S): 8233164: C2 fails with assert(phase->C->get_alias_index(t) == phase->C->get_alias_index(t_adr)) failed: correct memory chain In-Reply-To: <878sn7kn84.fsf@redhat.com> References: <687db4af-8572-1e8c-f397-0f48fe136fa7@oracle.com> <878sn7kn84.fsf@redhat.com> Message-ID: <8d800afc-4d98-c882-2005-5e3fc1bd69d4@oracle.com> Hi Roland, Thanks for the review. On 20.12.19 16:53, Roland Westrelin wrote: > I'm wondering whether this problem could show up elsewhere and if a more > generic fix would be needed (maybe EA should set the type of the CastPP > so there's no inconsistency when IGVN runs). The fix looks good to > address this particular problem but I think this deserves a follow up > bug to investigate this further. Yes, I agree. As we've discussed offline, I've filed 8236493 [1]. Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8236493 From christian.hagedorn at oracle.com Mon Dec 23 09:30:25 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 23 Dec 2019 10:30:25 +0100 Subject: [14] 8235984: C2: assert(out->in(PhiNode::Region) == head || out->in(PhiNode::Region) == slow_head) failed: phi must be either part of the slow or the fast loop Message-ID: Hi Please review the following options for: https://bugs.openjdk.java.net/browse/JDK-8235984 The problem: The original fix for JDK-8233033 [1] assumed that partially peeled statements always have a dependency inside the loop to be unswitched (i.e. when following their outputs eventually a phi node is hit belonging to either the slow or the fast loop). However, that is not always the case. As a result the assert was hit. I suggest the following options: Option a) http://cr.openjdk.java.net/~chagedorn/8235984/webrev.a.00/ We first check at the start of loop unswitching if there are loop predicates for the loop to be unswitched and if they have an additional control dependency to partially peeled statements (outcnt > 1). Then we explicitly check the assumption that partially peeled statements have a dependency in the loop to be unswitched (in order to keep the fix for JDK-8233033). If that is not the case, we bailout. 
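For readers less familiar with the transformation: loop unswitching clones the loop and hoists a loop-invariant test out of it, and the loop predicates above the original loop are cloned for both copies. Below is a minimal Java-level sketch of the loop shape involved; the names are hypothetical and it is not taken from the failing test or the webrev, it only illustrates the transformation itself.

    class UnswitchSketch {
        // The invariant test inside the body makes this loop an unswitching candidate.
        static int sum(int[] a, boolean flag) {
            int s = 0;
            for (int i = 0; i < a.length; i++) {
                if (flag) {          // loop-invariant: does not depend on i
                    s += a[i];
                } else {
                    s -= a[i];
                }
            }
            return s;
            // Conceptually C2 rewrites this as:
            //   if (flag) { for (...) s += a[i]; }   // fast loop
            //   else      { for (...) s -= a[i]; }   // slow loop
            // Statements that were partially peeled ahead of the original loop can
            // remain control dependent on the cloned predicates, which is the
            // situation the assert is checking.
        }
    }
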
We could then file an RFE for JDK-15 to handle this missing case properly and remove the bailout. Option b) http://cr.openjdk.java.net/~chagedorn/8235984/webrev.b.00/ We only check at the start of loop unswitching if there are loop predicates for the loop to be unswitched and if they have an additional control dependency to partially peeled statements (outcnt > 1). If that's the case we bailout without having the fix from JDK-8233033. We could then file an RFE for JDK-15 to properly handle partially peeled statements all together (kinda a REDO of the fix for JDK-8233033). Option c) Trying to fix the missing cases from JDK-8233033 for JDK-14 without a bailout. I've tried to come up with a fix (option c) last week but without success so far. The idea was to also clone the partially peeled statements without a dependency in the loop to be unswitched, change their control input to the correct cloned predicates and then add a phi node on each loop exit, where we merge the slow and fast loop, to select the correct value. However, this has not worked properly, yet and also involves a higher risk due to its complexity. I think we should not target that option for JDK-14 but do it for JDK-15 in an RFE. Thus, I'd opt for either option a) or b). I tested Tier 1-7 for the complete bailout b) and Tier 1-8 for the "conditioned" bailout a). Both look fine. I also ran some standard benchmarks comparing a) and b) to a baseline where I excluded the fix for JDK-8233033 (without bailing out and trying to fix the problem). I could not see any difference in performance. Therefore, it suggests to go with the low risk option b) for JDK-14 and do the entire fix in an RFE for JDK-15. What do you think? Thank you! Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8233033 From richard.reingruber at sap.com Mon Dec 23 09:40:52 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Mon, 23 Dec 2019 09:40:52 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: Message-ID: Hi, webrev.3 didn't apply anymore after 8236000 [1]. I've rebased and updated in place: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ The change was minimal. Cheers, Richard. [1] JDK-8236000: VM build without C2 fails -----Original Message----- From: Reingruber, Richard Sent: Dienstag, 10. Dezember 2019 22:45 To: serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi, I would like to get reviews please for http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ Corresponding RFE: https://bugs.openjdk.java.net/browse/JDK-8227745 Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1] Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without issues (thanks!). In addition the change is being tested at SAP since I posted the first RFR some months ago. The intention of this enhancement is to benefit performance wise from escape analysis even if JVMTI agents request capabilities that allow them to access local variable values. E.g. if you start-up with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then escape analysis is disabled right from the beginning, well before a debugger attaches -- if ever one should do so. 
With the enhancement, escape analysis will remain enabled until and after a debugger attaches. EA based optimizations are reverted just before an agent acquires the reference to an object. In the JBS item you'll find more details. Thanks, Richard. [1] Experimental fix for JDK-8214584 based on JDK-8227745 http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch From aph at redhat.com Mon Dec 23 09:43:16 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 23 Dec 2019 10:43:16 +0100 Subject: [aarch64-port-dev ] RFR(M): 8233743: AArch64: Make r27 conditionally allocatable In-Reply-To: References: <93b1dff3-b760-e305-1422-d21178e9ed75@redhat.com> <28d32c06-9a8e-6f4b-8425-3f07a4fbfe82@redhat.com> <34081dab-0049-dda7-dead-3b2934f8c41b@arm.com> <97c0b309-e911-ff9f-4cea-b6f00bd4962d@redhat.com> <70f2fb20-f72d-07f6-b8ff-7d1e467c3c12@redhat.com> <379d9c30-a139-4544-9339-a6ddec913b78@redhat.com> Message-ID: <57e85dcb-0d1c-135f-27dd-ec81534f0ad6@redhat.com> On 12/23/19 8:53 AM, Pengfei Li wrote: > Regarding your question, the Graal compiler (particularly on AArch64) uses r27 for compressing and uncompressing oops. Neither compressing nor uncompressing klass pointers uses r27. See code at [1], [2]. > > In Graal, we also wanted to make the heap base register allocatable when UseCompressedOops is off. Since before, AArch64 HotSpot didn't support r27 as an allocatable register, Graal patch e4d9c5f [3] marked r27 as non-allocatable and JVMCI patch JDK-8231754 [4] reserved r27 unconditionally. That's why I would like to revert JDK-8231754 in my patch. > > [1] https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.hotspot.aarch64/src/org/graalvm/compiler/hotspot/aarch64/AArch64HotSpotLIRGenerator.java#L256 > [2] https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.hotspot.aarch64/src/org/graalvm/compiler/hotspot/aarch64/AArch64HotSpotLIRGenerator.java#L286 > [3] https://github.com/oracle/graal/commit/e4d9c5f09a3c9be9f3c66ff0feff787519875a12 > [4] http://hg.openjdk.java.net/jdk/jdk/rev/d068b1e534de OK. As long as you've fully tested Graal with UseJVMCICompiler after this patch was applied I'm happy. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Mon Dec 23 13:12:28 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 23 Dec 2019 14:12:28 +0100 Subject: [14] 8235984: C2: assert(out->in(PhiNode::Region) == head || out->in(PhiNode::Region) == slow_head) failed: phi must be either part of the slow or the fast loop In-Reply-To: References: Message-ID: <08f66862-fd59-feb1-5f3b-a0df81c07721@oracle.com> Hi Christian, version b) looks good to me and I would go with that for JDK 14. Best regards, Tobias On 23.12.19 10:30, Christian Hagedorn wrote: > Hi > > Please review the following options for: > https://bugs.openjdk.java.net/browse/JDK-8235984 > > The problem: > The original fix for JDK-8233033 [1] assumed that partially peeled statements always have a > dependency inside the loop to be unswitched (i.e. when following their outputs eventually a phi node > is hit belonging to either the slow or the fast loop). However, that is not always the case. As a > result the assert was hit. 
> > I suggest the following options: > > Option a) http://cr.openjdk.java.net/~chagedorn/8235984/webrev.a.00/ > We first check at the start of loop unswitching if there are loop predicates for the loop to be > unswitched and if they have an additional control dependency to partially peeled statements (outcnt >> 1). Then we explicitly check the assumption that partially peeled statements have a dependency in > the loop to be unswitched (in order to keep the fix for JDK-8233033). If that is not the case, we > bailout. We could then file an RFE for JDK-15 to handle this missing case properly and remove the > bailout. > > Option b) http://cr.openjdk.java.net/~chagedorn/8235984/webrev.b.00/ > We only check at the start of loop unswitching if there are loop predicates for the loop to be > unswitched and if they have an additional control dependency to partially peeled statements (outcnt >> 1). If that's the case we bailout without having the fix from JDK-8233033. We could then file an > RFE for JDK-15 to properly handle partially peeled statements all together (kinda a REDO of the fix > for JDK-8233033). > > Option c) > Trying to fix the missing cases from JDK-8233033 for JDK-14 without a bailout. > > > I've tried to come up with a fix (option c) last week but without success so far. The idea was to > also clone the partially peeled statements without a dependency in the loop to be unswitched, change > their control input to the correct cloned predicates and then add a phi node on each loop exit, > where we merge the slow and fast loop, to select the correct value. However, this has not worked > properly, yet and also involves a higher risk due to its complexity. I think we should not target > that option for JDK-14 but do it for JDK-15 in an RFE. > > Thus, I'd opt for either option a) or b). I tested Tier 1-7 for the complete bailout b) and Tier 1-8 > for the "conditioned" bailout a). Both look fine. I also ran some standard benchmarks comparing a) > and b) to a baseline where I excluded the fix for JDK-8233033 (without bailing out and trying to fix > the problem). I could not see any difference in performance. Therefore, it suggests to go with the > low risk option b) for JDK-14 and do the entire fix in an RFE for JDK-15. > > What do you think? > > > Thank you! > > Best regards, > Christian > > > [1] https://bugs.openjdk.java.net/browse/JDK-8233033 From christian.hagedorn at oracle.com Mon Dec 23 13:20:33 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 23 Dec 2019 14:20:33 +0100 Subject: [14] 8235984: C2: assert(out->in(PhiNode::Region) == head || out->in(PhiNode::Region) == slow_head) failed: phi must be either part of the slow or the fast loop In-Reply-To: <08f66862-fd59-feb1-5f3b-a0df81c07721@oracle.com> References: <08f66862-fd59-feb1-5f3b-a0df81c07721@oracle.com> Message-ID: <8fcdc2b1-0e5e-2059-d1f9-b8fa84f8ece1@oracle.com> Hi Tobias Thank you for your review and your estimate! I also think that option b) is probably the best choice for JDK-14. Best regards, Christian On 23.12.19 14:12, Tobias Hartmann wrote: > Hi Christian, > > version b) looks good to me and I would go with that for JDK 14. > > Best regards, > Tobias > > On 23.12.19 10:30, Christian Hagedorn wrote: >> Hi >> >> Please review the following options for: >> https://bugs.openjdk.java.net/browse/JDK-8235984 >> >> The problem: >> The original fix for JDK-8233033 [1] assumed that partially peeled statements always have a >> dependency inside the loop to be unswitched (i.e. 
when following their outputs eventually a phi node >> is hit belonging to either the slow or the fast loop). However, that is not always the case. As a >> result the assert was hit. >> >> I suggest the following options: >> >> Option a) http://cr.openjdk.java.net/~chagedorn/8235984/webrev.a.00/ >> We first check at the start of loop unswitching if there are loop predicates for the loop to be >> unswitched and if they have an additional control dependency to partially peeled statements (outcnt >>> 1). Then we explicitly check the assumption that partially peeled statements have a dependency in >> the loop to be unswitched (in order to keep the fix for JDK-8233033). If that is not the case, we >> bailout. We could then file an RFE for JDK-15 to handle this missing case properly and remove the >> bailout. >> >> Option b) http://cr.openjdk.java.net/~chagedorn/8235984/webrev.b.00/ >> We only check at the start of loop unswitching if there are loop predicates for the loop to be >> unswitched and if they have an additional control dependency to partially peeled statements (outcnt >>> 1). If that's the case we bailout without having the fix from JDK-8233033. We could then file an >> RFE for JDK-15 to properly handle partially peeled statements all together (kinda a REDO of the fix >> for JDK-8233033). >> >> Option c) >> Trying to fix the missing cases from JDK-8233033 for JDK-14 without a bailout. >> >> >> I've tried to come up with a fix (option c) last week but without success so far. The idea was to >> also clone the partially peeled statements without a dependency in the loop to be unswitched, change >> their control input to the correct cloned predicates and then add a phi node on each loop exit, >> where we merge the slow and fast loop, to select the correct value. However, this has not worked >> properly, yet and also involves a higher risk due to its complexity. I think we should not target >> that option for JDK-14 but do it for JDK-15 in an RFE. >> >> Thus, I'd opt for either option a) or b). I tested Tier 1-7 for the complete bailout b) and Tier 1-8 >> for the "conditioned" bailout a). Both look fine. I also ran some standard benchmarks comparing a) >> and b) to a baseline where I excluded the fix for JDK-8233033 (without bailing out and trying to fix >> the problem). I could not see any difference in performance. Therefore, it suggests to go with the >> low risk option b) for JDK-14 and do the entire fix in an RFE for JDK-15. >> >> What do you think? >> >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8233033 From smita.kamath at intel.com Mon Dec 23 18:25:48 2019 From: smita.kamath at intel.com (Kamath, Smita) Date: Mon, 23 Dec 2019 18:25:48 +0000 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> Message-ID: Hi Vladimir, Thanks for reviewing the code. Can you please sponsor and push the changes? Regards, Smita -----Original Message----- From: Vladimir Kozlov Sent: Friday, December 20, 2019 9:12 PM To: Kamath, Smita Cc: 'hotspot compiler' ; core-libs-dev at openjdk.java.net Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 Testing results are good after fixing the typo. 
We should consider implementing this intrinsic in Graal too. We have to upload AOT and Graal test changes anyway. Thanks, Vladimir On 12/20/19 3:47 PM, Vladimir Kozlov wrote: > Hi Smita, > > You have typo (should be supports_vbmi2): > > src/hotspot/cpu/x86/assembler_x86.cpp:6547:22: error: 'support_vbmi2' is not a member of 'VM_Version' > ??? assert(VM_Version::support_vbmi2(), "requires vbmi2"); > ?????????????????????? ^~~~~~~~~~~~~ > > Debug build failed. I am retesting with local fix. > > Regards, > Vladimir K > > On 12/20/19 2:19 PM, Vladimir Kozlov wrote: >> We should have added core-libs to review since you modified BigInteger.java. >> >> webrev02 looks good to me. Let me test it. >> >> Thanks, >> Vladimir >> >> On 12/20/19 1:52 PM, Kamath, Smita wrote: >>> Hi Vladimir, >>> >>> Thank you for reviewing the code. I have updated the code as per your recommendations ( please look at the email below). >>> Link to the updated webrev: >>> https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev02/ >>> >>> Regards, >>> Smita >>> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Thursday, December 19, 2019 5:17 PM >>> To: Kamath, Smita >>> Cc: Viswanathan, Sandhya ; 'hotspot >>> compiler' >>> Subject: Re: RFR(M):8167065: Add intrinsic support for double >>> precision shifting on x86_64 >>> >>> We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record StubRoutines pointers: >>> >>> StubRoutines::_bigIntegerRightShiftWorker >>> StubRoutines::_bigIntegerLeftShiftWorker >>> Smita>>>done >>> >>> In the test add an other @run command with default setting (without -XX:-TieredCompilation -Xbatch). >>> Smita>>>done >>> >>> Thanks, >>> Vladimir >>> >>> [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/ >>> >>> On 12/18/19 6:33 PM, Kamath, Smita wrote: >>>> Hi Vladimir, >>>> >>>> I have made the code changes you suggested (please look at the email below). >>>> I have also enabled the intrinsic to run only when VBMI2 feature is available. >>>> The intrinsic shows gains of >1.5x above 4k bit BigInteger. >>>> >>>> Webrev link: >>>> https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/ >>>> >>>> Thanks, >>>> Smita >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov >>>> Sent: Wednesday, December 11, 2019 10:55 AM >>>> To: Kamath, Smita ; 'hotspot compiler' >>>> ; Viswanathan, Sandhya >>>> >>>> Subject: Re: RFR(M):8167065: Add intrinsic support for double >>>> precision shifting on x86_64 >>>> >>>> Hi Kamath, >>>> >>>> First, general question. What performance you see when VBMI2 >>>> instructions are *not* used with your new code vs code generated by C2. >>>> What improvement you see when VBMI2 is used. This is to understand if we need only VBMI2 version of intrinsic or not. >>>> >>>> Second. Sandhya recently pushed 8235510 changes to rollback avx512 >>>> code for CRC32 due to performance issues. Does you change has any issues on some Intel's CPU too? Should it be excluded on such CPUs? >>>> >>>> Third. I would suggest to wait after we fork JDK 14 with this >>>> changes. I think it may be too late for 14 because we would need test this including performance testing. >>>> >>>> In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert. >>>> For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit >>>> when UseAVX < 3 ( < avx512). You can also use >>>> supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp combinations with that. 
>>>> Smita >>>done >>>> >>>> I don't think we need separate flag UseVBMI2 - it could be >>>> controlled by UseAVX flag. We don't have flag for VNNI or other avx512 instructions subset. >>>> Smita >> removed UseVBMI2 flag >>>> >>>> In vm_version_x86.cpp you need to add more %s in print statement for new output. >>>> Smita? >>> done >>>> >>>> You need to add @requires vm.compiler2.enabled to new test's commands to run it only with C2. >>>> Smita >>> done >>>> >>>> You need to add intrinsics to Graal's test to ignore them: >>>> >>>> http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal. >>>> vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org >>>> /gr >>>> aalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416 >>>> Smita >>>done >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/10/19 5:41 PM, Kamath, Smita wrote: >>>>> Hi, >>>>> >>>>> >>>>> As per Intel Architecture Instruction Set Reference [1] VBMI2 >>>>> Operations will be supported in future Intel ISA. I would like to >>>>> contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is for x86_64 architecture enabled. >>>>> >>>>> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065 >>>>> >>>>> Link to webrev : >>>>> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/ >>>>> >>>>> >>>>> >>>>> I ran jtreg test suite with the algorithm on Intel SDE [2] to >>>>> confirm that encoding and semantics are correctly implemented. >>>>> >>>>> >>>>> [1] >>>>> https://software.intel.com/sites/default/files/managed/39/c5/32546 >>>>> 2-s d m-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and >>>>> vpshldv -> Vol. >>>>> 2C 5-471) >>>>> >>>>> [2] >>>>> https://software.intel.com/en-us/articles/intel-software-developme >>>>> nt- >>>>> e >>>>> mulator >>>>> >>>>> >>>>> Regards, >>>>> >>>>> Smita Kamath >>>>> From Sergey.Bylokhov at oracle.com Tue Dec 24 18:22:15 2019 From: Sergey.Bylokhov at oracle.com (Sergey Bylokhov) Date: Tue, 24 Dec 2019 21:22:15 +0300 Subject: [15] Review Request: 8235975 Update copyright year to match last edit in jdk repository for 2014/15/16/17/18 In-Reply-To: <1e7d0395-fc57-4d5b-9cfa-c33e0f6462d5@oracle.com> References: <1e7d0395-fc57-4d5b-9cfa-c33e0f6462d5@oracle.com> Message-ID: <3460a6f6-6178-cc45-5840-0f215eebc53f@oracle.com> Hello. Here is an updated version: Bug: https://bugs.openjdk.java.net/browse/JDK-8235975 Patch (2 Mb): http://cr.openjdk.java.net/~serb/8235975/webrev.03/open.patch Fix: http://cr.openjdk.java.net/~serb/8235975/webrev.03/ - "jdk.internal.vm.compiler" is removed from the patch. - "Aes128CtsHmacSha2EType.java" is updated to "Copyright (c) 2018" On 12/22/19 11:24 pm, Sergey Bylokhov wrote: > Hello. > Please review the fix for JDK 15. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8235975 > Patch (2 Mb): http://cr.openjdk.java.net/~serb/8235975/webrev.02/open.patch > Fix: http://cr.openjdk.java.net/~serb/8235975/webrev.02 > > I have updated the source code copyrights by the "update_copyright_year.sh" > script for 2014/15/16/18/19 years, unfortunately, cannot run it for 2017 > because of: "JDK-8187443: Forest Consolidation: Move files to unified layout" > which touched all files. > > -- Best regards, Sergey. 
From xxinliu at amazon.com Tue Dec 24 20:56:53 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 24 Dec 2019 20:56:53 +0000 Subject: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler Message-ID: <1982726B-9702-4087-B962-354D2B9FAD92@amazon.com> I updated the subject and switched back to the corporation email. Sorry, I didn't realize that there's a code of conduct. I will pay more attention to it. I validated the patch without PCH using --disable-precompiled-headers on both x86_64 and aarch64. Both of them built well. I still have difficulty verifying SPARC. I tried the submit repo but I don't have permission to push an experimental branch. May I ask a sponsor to submit it on my behalf? Thanks, --lx Hi Lx First, when you post an RFR, please include the bug id in the email's Subject. You can use jdk/submit testing to verify builds on SPARC. We are still building on it, with warnings. Changes seem fine to me but make sure to verify that it builds without PCH. Regards, Vladimir On 12/20/19 9:24 AM, Liu Xin wrote: > Martin, > > Thank you very much. May I know how to validate SPARC? I don't have any > SPARC machine to access. > > Thanks, > --lx > > > On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin > wrote: > >> Hi lx, >> >> PPC and s390 parts are ok. >> >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: hotspot-compiler-dev >> bounces at openjdk.java.net> On Behalf Of Liu, Xin >>> Sent: Freitag, 20. Dezember 2019 04:48 >>> To: 'hotspot-compiler-dev at openjdk.java.net' >> dev at openjdk.java.net> >>> Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in >>> c1_LIRAssembler >>> >>> Hi, Reviewers, >>> >>> Could you take a look at my webrev? I feel that those BarrierSet >> interfaces >>> have nothing to do with c1_LIRAssembler. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 >>> Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ >>> >>> I tried to build on aarch64 and x86_64 and it's fine. >>> >>> Thanks, >>> --lx >> >>
From Ningsheng.Jian at arm.com Wed Dec 25 05:52:03 2019 From: Ningsheng.Jian at arm.com (Ningsheng Jian) Date: Wed, 25 Dec 2019 05:52:03 +0000 Subject: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: <1982726B-9702-4087-B962-354D2B9FAD92@amazon.com> References: <1982726B-9702-4087-B962-354D2B9FAD92@amazon.com> Message-ID: Hi lx, I've submitted it for you: http://hg.openjdk.java.net/jdk/submit/rev/9d61b00d5982 I've also verified the aarch64 and arm builds locally, and they look fine. Thanks, Ningsheng > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Liu, Xin > Sent: Wednesday, December 25, 2019 4:57 AM > To: Vladimir Kozlov ; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: Re: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler > > I updated the subject and switched back to the corporation email. Sorry, I didn't > realize that there's a code of conduct. I will pay more attention to it. > > I validated the patch without PCH using --disable-precompiled-headers on both > x86_64 and aarch64. Both of them built well. > I still have difficulty verifying SPARC. I tried the submit repo but I don't have > permission to push an experimental branch. May I ask a sponsor to submit it on > my behalf? > > Thanks, > --lx > > > > > Hi Lx > > First, when you post an RFR, please include the bug id in the email's Subject. > > You can use jdk/submit testing to verify builds on SPARC. We are still building on > it, with warnings. > Changes seem fine to me but make sure to verify that it builds without PCH.
> > Regards, > Vladimir > > On 12/20/19 9:24 AM, Liu Xin wrote: > > Martin? > > > > Thank you very much. May I know how to validate Sparc? I don't have > > any SPARC machine to access. > > > > Thanks, > > --lx > > > > > > On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin > > wrote: > > > >> Hi lx, > >> > >> PPC and s390 parts are ok. > >> > >> Best regards, > >> Martin > >> > >> > >>> -----Original Message----- > >>> From: hotspot-compiler-dev >>> bounces at openjdk.java.net> On Behalf > >>> Of Liu, Xin > >>> Sent: Freitag, 20. Dezember 2019 04:48 > >>> To: > >>> 'hotspot-compiler-dev at openjdk.java.net >>> penjdk.java.net>' >>> dev at openjdk.java.net> > >>> Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in > >>> c1_LIRAssembler > >>> > >>> Hi, Reviewers, > >>> > >>> Could you take a look at my webrev? I feel that those barrisetSet > >> interfaces > >>> have nothing with c1_LIRAssembler. > >>> > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 > >>> Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ > >>> > >>> I try to build on aarch64 and x86_64 and it?s fine. > >>> > >>> Thanks, > >>> --lx > >> > >> From xxinliu at amazon.com Fri Dec 27 03:04:30 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Fri, 27 Dec 2019 03:04:30 +0000 Subject: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: References: <1982726B-9702-4087-B962-354D2B9FAD92@amazon.com> Message-ID: <3C45D717-C0BF-40A4-93F5-EDA78FE91350@amazon.com> Hi, Ningsheng, Thank you for submitting the trial repo. I got the result email and it has passed all 80 tests. Here is the updated webrev. I updated the reviewers. https://cr.openjdk.java.net/~xliu/8236228/01/webrev/ This is a low-risk patch. As long as we can compile it, it won't have any side-effect at runtime. thanks, --lx ?On 12/24/19, 9:52 PM, "Ningsheng Jian" wrote: Hi lx, I've submitted it for you: http://hg.openjdk.java.net/jdk/submit/rev/9d61b00d5982 I've also verified aarch64 and arm build locally, and it looks fine. Thanks, Ningsheng > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Liu, Xin > Sent: Wednesday, December 25, 2019 4:57 AM > To: Vladimir Kozlov ; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: Re: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler > > I update the subject and switch back the corporation email. Sorry, I didn?t > realize that there?s code of conduct. I will pay more attention on it. > > I validate the patch without PCH using --disable-precompiled-headers on both > x86_64 and aarch64. Both of them built well. > I still have difficulty to verify SPARC. I try the submit repo but I don?t have > permission to push an experimental branch. May I ask a sponsor submit it on > behalf of me? > > Thanks, > --lx > > > > > Hi Lx > > First, when you post RFR, please, include bug id in email's Subject. > > You can use jdk/submit testing to verify builds on SPARC. We are still building on > it, with warning. > Changes seems fine to me but make sure verify that it builds without PCH. > > Regards, > Vladimir > > On 12/20/19 9:24 AM, Liu Xin wrote: > > Martin? > > > > Thank you very much. May I know how to validate Sparc? I don't have > > any SPARC machine to access. > > > > Thanks, > > --lx > > > > > > On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin > > wrote: > > > >> Hi lx, > >> > >> PPC and s390 parts are ok. 
> >> > >> Best regards, > >> Martin > >> > >> > >>> -----Original Message----- > >>> From: hotspot-compiler-dev >>> bounces at openjdk.java.net> On Behalf > >>> Of Liu, Xin > >>> Sent: Freitag, 20. Dezember 2019 04:48 > >>> To: > >>> 'hotspot-compiler-dev at openjdk.java.net >>> penjdk.java.net>' >>> dev at openjdk.java.net> > >>> Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in > >>> c1_LIRAssembler > >>> > >>> Hi, Reviewers, > >>> > >>> Could you take a look at my webrev? I feel that those BarrierSet > >> interfaces > >>> have nothing to do with c1_LIRAssembler. > >>> > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 > >>> Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ > >>> > >>> I tried to build on aarch64 and x86_64 and it's fine. > >>> > >>> Thanks, > >>> --lx > >> > >>
From Alan.Bateman at oracle.com Sat Dec 28 08:22:05 2019 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Sat, 28 Dec 2019 08:22:05 +0000 Subject: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64 In-Reply-To: <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> References: <6563F381B547594081EF9DE181D07912B2D6B7A7@fmsmsx123.amr.corp.intel.com> <70b5bff1-a2ad-6fea-24ee-f0748e2ccae6@oracle.com> <3914890d-74e3-c69e-6688-55cf9f0c8551@oracle.com> <8919a5ce-727a-fb25-e739-4c14da108a7a@oracle.com> Message-ID: On 20/12/2019 22:19, Vladimir Kozlov wrote: > We should have added core-libs to review since you modified > BigInteger.java. > This adds Objects.checkFromToIndex checks in the middle of several supporting methods. Is IOOBE really possible in these cases or are these stand-ins for always-on asserts to ensure the intrinsic is never used when the preconditions aren't satisfied? -Alan
From jatin.bhateja at intel.com Mon Dec 30 03:11:25 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Mon, 30 Dec 2019 03:11:25 +0000 Subject: [14] RFR(S): 8236443 : Issues with specializing vector register type for phi operand with generic operands Message-ID: Hi All, Please find the patch at the following link: JBS : https://bugs.openjdk.java.net/browse/JDK-8236443 Webrev: http://cr.openjdk.java.net/~jbhateja/8236443/webrev.02/ Generic operand processing has been skipped for non-machine nodes (e.g. Phi) since they are skipped by the matcher. Non-definition operand resolution of a machine node will be able to pull the type information from a non-machine node. Re-organized the code by adding a target-specific routine which returns the operand types for special ideal nodes, e.g. RShiftCntV/LShiftCntV. For such nodes, the definition machine operand varies across targets and vector lengths, so a type-based generic operand resolution is not possible in such cases. Best Regards, Jatin
From hohensee at amazon.com Mon Dec 30 19:16:55 2019 From: hohensee at amazon.com (Hohensee, Paul) Date: Mon, 30 Dec 2019 19:16:55 +0000 Subject: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: <3C45D717-C0BF-40A4-93F5-EDA78FE91350@amazon.com> References: <1982726B-9702-4087-B962-354D2B9FAD92@amazon.com> <3C45D717-C0BF-40A4-93F5-EDA78FE91350@amazon.com> Message-ID: <3CC88DEC-56BE-4FDA-B775-7678F44E252A@amazon.com> Lgtm. Seems very low risk to me too. Paul On 12/26/19, 7:06 PM, "hotspot-compiler-dev on behalf of Liu, Xin" wrote: Hi, Ningsheng, Thank you for submitting the trial repo. I got the result email and it has passed all 80 tests. Here is the updated webrev. I updated the reviewers. 
https://cr.openjdk.java.net/~xliu/8236228/01/webrev/ This is a low-risk patch. As long as we can compile it, it won't have any side-effect at runtime. thanks, --lx On 12/24/19, 9:52 PM, "Ningsheng Jian" wrote: Hi lx, I've submitted it for you: http://hg.openjdk.java.net/jdk/submit/rev/9d61b00d5982 I've also verified aarch64 and arm build locally, and it looks fine. Thanks, Ningsheng > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Liu, Xin > Sent: Wednesday, December 25, 2019 4:57 AM > To: Vladimir Kozlov ; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: Re: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler > > I update the subject and switch back the corporation email. Sorry, I didn?t > realize that there?s code of conduct. I will pay more attention on it. > > I validate the patch without PCH using --disable-precompiled-headers on both > x86_64 and aarch64. Both of them built well. > I still have difficulty to verify SPARC. I try the submit repo but I don?t have > permission to push an experimental branch. May I ask a sponsor submit it on > behalf of me? > > Thanks, > --lx > > > > > Hi Lx > > First, when you post RFR, please, include bug id in email's Subject. > > You can use jdk/submit testing to verify builds on SPARC. We are still building on > it, with warning. > Changes seems fine to me but make sure verify that it builds without PCH. > > Regards, > Vladimir > > On 12/20/19 9:24 AM, Liu Xin wrote: > > Martin? > > > > Thank you very much. May I know how to validate Sparc? I don't have > > any SPARC machine to access. > > > > Thanks, > > --lx > > > > > > On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin > > wrote: > > > >> Hi lx, > >> > >> PPC and s390 parts are ok. > >> > >> Best regards, > >> Martin > >> > >> > >>> -----Original Message----- > >>> From: hotspot-compiler-dev >>> bounces at openjdk.java.net> On Behalf > >>> Of Liu, Xin > >>> Sent: Freitag, 20. Dezember 2019 04:48 > >>> To: > >>> 'hotspot-compiler-dev at openjdk.java.net >>> penjdk.java.net>' >>> dev at openjdk.java.net> > >>> Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in > >>> c1_LIRAssembler > >>> > >>> Hi, Reviewers, > >>> > >>> Could you take a look at my webrev? I feel that those barrisetSet > >> interfaces > >>> have nothing with c1_LIRAssembler. > >>> > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 > >>> Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ > >>> > >>> I try to build on aarch64 and x86_64 and it?s fine. > >>> > >>> Thanks, > >>> --lx > >> > >> From xxinliu at amazon.com Mon Dec 30 22:39:33 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Mon, 30 Dec 2019 22:39:33 +0000 Subject: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler In-Reply-To: <3CC88DEC-56BE-4FDA-B775-7678F44E252A@amazon.com> References: <1982726B-9702-4087-B962-354D2B9FAD92@amazon.com> <3C45D717-C0BF-40A4-93F5-EDA78FE91350@amazon.com> <3CC88DEC-56BE-4FDA-B775-7678F44E252A@amazon.com> Message-ID: Hi, Reviewers, Thanks for reviewing it. Paul is a reviewer. Martin reviewed PPC and s390 and Ningsheng reviewed Arm&Aarch64. Is that good to push? Thanks, --lx ?On 12/30/19, 11:16 AM, "Hohensee, Paul" wrote: Lgtm. Seems very low risk to me too. Paul On 12/26/19, 7:06 PM, "hotspot-compiler-dev on behalf of Liu, Xin" wrote: Hi, Ningsheng, Thank you for submitting the trial repo. I got the result email and it has passed all 80 tests. Here is the updated webrev. I updated the reviewers. 
https://cr.openjdk.java.net/~xliu/8236228/01/webrev/ This is a low-risk patch. As long as we can compile it, it won't have any side-effect at runtime. thanks, --lx On 12/24/19, 9:52 PM, "Ningsheng Jian" wrote: Hi lx, I've submitted it for you: http://hg.openjdk.java.net/jdk/submit/rev/9d61b00d5982 I've also verified aarch64 and arm build locally, and it looks fine. Thanks, Ningsheng > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Liu, Xin > Sent: Wednesday, December 25, 2019 4:57 AM > To: Vladimir Kozlov ; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: Re: RFR(XXS): 8236228: clean up BarrierSet headers in c1_LIRAssembler > > I update the subject and switch back the corporation email. Sorry, I didn?t > realize that there?s code of conduct. I will pay more attention on it. > > I validate the patch without PCH using --disable-precompiled-headers on both > x86_64 and aarch64. Both of them built well. > I still have difficulty to verify SPARC. I try the submit repo but I don?t have > permission to push an experimental branch. May I ask a sponsor submit it on > behalf of me? > > Thanks, > --lx > > > > > Hi Lx > > First, when you post RFR, please, include bug id in email's Subject. > > You can use jdk/submit testing to verify builds on SPARC. We are still building on > it, with warning. > Changes seems fine to me but make sure verify that it builds without PCH. > > Regards, > Vladimir > > On 12/20/19 9:24 AM, Liu Xin wrote: > > Martin? > > > > Thank you very much. May I know how to validate Sparc? I don't have > > any SPARC machine to access. > > > > Thanks, > > --lx > > > > > > On Fri, Dec 20, 2019, 7:05 AM Doerr, Martin > > wrote: > > > >> Hi lx, > >> > >> PPC and s390 parts are ok. > >> > >> Best regards, > >> Martin > >> > >> > >>> -----Original Message----- > >>> From: hotspot-compiler-dev >>> bounces at openjdk.java.net> On Behalf > >>> Of Liu, Xin > >>> Sent: Freitag, 20. Dezember 2019 04:48 > >>> To: > >>> 'hotspot-compiler-dev at openjdk.java.net >>> penjdk.java.net>' >>> dev at openjdk.java.net> > >>> Subject: [DMARC FAILURE] RFR(XXS): clean up BarrierSet headers in > >>> c1_LIRAssembler > >>> > >>> Hi, Reviewers, > >>> > >>> Could you take a look at my webrev? I feel that those barrisetSet > >> interfaces > >>> have nothing with c1_LIRAssembler. > >>> > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236228 > >>> Webrev: https://cr.openjdk.java.net/~xliu/8236228/00/webrev/ > >>> > >>> I try to build on aarch64 and x86_64 and it?s fine. > >>> > >>> Thanks, > >>> --lx > >> > >>