From OGATAK at jp.ibm.com Thu Aug 1 02:55:54 2019 From: OGATAK at jp.ibm.com (Kazunori Ogata) Date: Thu, 1 Aug 2019 11:55:54 +0900 Subject: [Ping] Re: [8u-dev, ppc] RFR for (almost clean) backport of 8188868: PPC64: Support AES intrinsics on Big Endian In-Reply-To: <2d8d8d35-c781-cc0c-2673-c8f7eea057bd@redhat.com> References: <2d8d8d35-c781-cc0c-2673-c8f7eea057bd@redhat.com> Message-ID: Hi Andrew, Thank you for reviewing the webrev. Regards, Ogata Andrew John Hughes wrote on 2019/08/01 01:05:48: > From: Andrew John Hughes > To: Kazunori Ogata , hotspot-compiler- > dev at openjdk.java.net, jdk8u-dev at openjdk.java.net > Date: 2019/08/01 01:13 > Subject: [EXTERNAL] Re: [Ping] Re: [8u-dev, ppc] RFR for (almost clean) > backport of 8188868: PPC64: Support AES intrinsics on Big Endian > > > > On 31/07/2019 10:30, Kazunori Ogata wrote: > > Ping. > > > > May I get review for the almost clean backport? > > > > Regards, > > Ogata > > > > Kazunori Ogata/Japan/IBM wrote on 2019/07/24 17:48:23: > > > >> From: Kazunori Ogata/Japan/IBM > >> To: hotspot-compiler-dev at openjdk.java.net, jdk8u-dev at openjdk.java.net > >> Date: 2019/07/24 17:48 > >> Subject: [8u-dev, ppc] RFR for (almost clean) backport of 8188868: > > PPC64: > >> Support AES intrinsics on Big Endian > >> > >> Hi, > >> > >> May I get review for backport of 8188868: PPC64: Support AES intrinsics > > on > >> Big Endian? > >> > >> The original patch itself can be applied cleanly (besides difference of > >> the source directory structure). However, one chunk failed because the > >> code just after the patched code was modified, so I manually applied the > > > >> chunk and renewed the patch. > >> > >> Bug: https://bugs.openjdk.java.net/browse/JDK-8188868 > >> Webrev: > > http://cr.openjdk.java.net/~ogatak/jdk8u_aes_be/8188868/webrev.02/ > >> > >> This backport is low risk and affects only PPC64 only. I verified there > >> was no degradation in "make test" results and SPECjbb 2015 ran fine. 
The > > > >> intrinsics added in this changeset improved max jOPS by 5% and critical > > jOPS by 4%. > >> > >> Regards, > >> Ogata > > > > Sorry, I started looking at this yesterday, but didn't get chance to finish. > > It looks fine to me. The stubGenerator_ppc.cpp changes were a little > hard to follow, but comparing the patched version with the 11u version > looked ok. > > Good to go. > -- > Andrew :) > > Senior Free Java Software Engineer > Red Hat, Inc. (http://www.redhat.com) > > PGP Key: ed25519/0xCFDA0F9B35964222 (hkp://keys.gnupg.net) > Fingerprint = 5132 579D D154 0ED2 3E04 C5A0 CFDA 0F9B 3596 4222 > https://keybase.io/gnu_andrew > > [attachment "signature.asc" deleted by Kazunori Ogata/Japan/IBM] From shade at redhat.com Thu Aug 1 11:16:05 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 1 Aug 2019 13:16:05 +0200 Subject: RFR: 8224974: Implement JEP 352 In-Reply-To: <81098801-31b2-e3a8-5c25-043ab47c80bc@redhat.com> References: <80da32b2-7acb-7b94-b82c-5dcd5cf95539@redhat.com> <36326F5E-12CD-487A-8FE2-1049631FE908@oracle.com> <8B1A8EDC-F9D7-4085-A34F-69100DBD7D5C@oracle.com> <5c78b5d7-dc9e-fbbd-41a9-5139ed6ee32c@redhat.com> <86d000f3-9eca-6089-5f7e-5698444d99ce@oracle.com> <7fe8d17d-6170-fcc9-b87a-eccfda2bb546@redhat.com> <7b39f68d-5698-4079-425c-c86ec161e361@oracle.com> <72810832-eb88-3418-f336-b95af95d9dcc@redhat.com> <20c8bbcb-2cd3-77af-cd40-29f7bd752166@oracle.com> <68566847-de80-82af-a928-d30c75f2b1b2@redhat.com> <892f3b0b-55f1-872d-15f4-5af907fa8437@redhat.com> <8f72d3b5-3a01-a7de-3b09-35571266a87d@redhat.com> <671c7ba1-9597-a990-190b-145d9cb5349e@redhat.com> <81098801-31b2-e3a8-5c25-043ab47c80bc@redhat.com> Message-ID: <3a3f9422-f81d-8ee2-5c07-687f2abe18df@redhat.com> On 7/31/19 12:55 PM, Andrew Dinn wrote: >> So if pre wbsync is no-op, why do we need to handle it everywhere? We seem to be falling through all >> the way to the stub to do nothing there, maybe we should instead cut much earlier, e.g. 
when >> inlining Unsafe.writeBackPresync0? Would it be better to not emit CacheWBPreSync at all? > > The pre sync is definitely not needed at present. However, I put it > there because I didn't know for sure if some future port of this > capability (e.g. to ppc) might need to sync prior writes before writing > back cache lines. [Indeed, the old Intel documentation stated that > pre-sync was needed on x86 for clflush to be safe but it is definitely > not needed.] I am more concerned that the writeback call enters the pre sync stub unnecessarily. I had the idea to do this more efficiently, and simplify code at the same time: how about emitting CacheWBPreSync nodes, but emitting nothing for them in .ad matchers? That would leave generic code generic, and architectures would then be able to avoid the stub altogether for pre sync code. This would simplify current stub generators too, I think: you don't need to pass arguments to them. This leaves calling via Unsafe. I believe pulling up the isPre choice to the stub generation time would be beneficial. That is, generate *two* stubs: StubRoutines::data_cache_writeback_pre_sync() and StubRoutines::data_cache_writeback_post_sync(). If arch does not need the pre_sync, generate nop version of pre_sync(). This is not a strong requirement from my side. I do believe it would make code a bit more straight-forward. >> === src/hotspot/cpu/x86/assembler_x86.cpp >> >> It feels like these comments are redundant, especially L8630 and L8646 which mention magic values >> "6" and "7", not present in the code: ... > 8624 // 0x66 is instruction prefix > > 8627 // 0x0f 0xAE is opcode family > > 8630 // rdi == 7 is extended opcode byte > . . . > > Given that the code is simply stuffing numbers (whether supplied as > literals or as symbolic constants) into a byte stream I think these > comments are a help when it comes to cross-checking each specific > assembly against the corresponding numbers declared in the Intel > manuals. 
So, I don't really want to remove them. Would you prefer me to > reverse the wording as above? I was merely commenting on the style: the rest of the file does not have comments like that. The positions of prefixes, opcode families, etc is kinda implied by the code shape. >> === src/hotspot/cpu/x86/macroAssembler_x86.cpp > // prefer clwb (potentially parallel writeback without evict) > // otherwise prefer clflushopt (potentially parallel writeback > // with evict) > // otherwise fallback on clflush (serial writeback with evict) > > In the second case the comment is redundant because the need for an > sfence is covered by the existing comment inside the if: > > // need an sfence for post flush when using clflushopt or clwb > // otherwise no no need for any synchroniaztion Yes, this would be good to add. >> === src/hotspot/cpu/x86/stubGenerator_x86_64.cpp >> >> Is it really "cmpl" here, not "cmpb"? I think aarch64 code tests for byte. >> >> 2942 __ cmpl(is_pre, 0); > > This is a Java boolean input. I believe that means the value will be > loaded into c_arg0 as an int so this test ought to be adequate. Okay. >> === src/hotspot/share/opto/c2compiler.cpp >> >> Why inject new cases here, instead of at the end of switch? Saves sudden "break": >> >> 578 break; >> 579 case vmIntrinsics::_writeback0: >> 580 if (!Matcher::match_rule_supported(Op_CacheWB)) return false; >> 581 break; >> 582 case vmIntrinsics::_writebackPreSync0: >> 583 if (!Matcher::match_rule_supported(Op_CacheWBPreSync)) return false; >> 584 break; >> 585 case vmIntrinsics::_writebackPostSync0: >> 586 if (!Matcher::match_rule_supported(Op_CacheWBPostSync)) return false; >> 587 break; > > I placed them here so they were close to the other Unsafe intrinsics. In > particular they precede _allocateInstance, an ordering which is also the > case in the declarations in vmSymbols.hpp. > > In what sense do you mean that an extra 'break' is saved? That would be > true as regards the textual layout. 
It wouldn't affect the logic of > folding different ranges of values into branching range tests (which is > only determined by the numeric values of the intrinsics). If you are > concerned about the former then I would argue that placing the values in > declaration order seems to me to be the more important concern. I don't think we have to follow whatever ordering mess in vmSymbols.hpp. New code cuts into the last case block in that switch, which is mostly about "we know about these symbols, they are falling-through to the break". Adding cases with Matcher::match_rule_supported seems odd there. If anything, those new cases should be moved upwards to other cases, e.g. after vmIntrinsics::_minD. >> === src/hotspot/share/prims/unsafe.cpp >> >> Do we really need this function pointer mess? >> >> 457 void (*wb)(void *); >> 458 void *a = addr_from_java(line); >> 459 wb = (void (*)(void *)) StubRoutines::data_cache_writeback(); >> 460 assert(wb != NULL, "generate writeback stub!"); >> 461 (*wb)(a); >> >> Seems easier to: >> >> assert(StubRoutines::data_cache_writeback() != NULL, "sanity"); >> StubRoutines::data_cache_writeback()(addr_from_java(line)); > Hmm, "that whole brevity thing" again? Well, I guess you must now call > me "El Duderino". Well, that is, like, your opinion, man. C++ is messy if we allow it to be! > Are you sure that all the compilers used to build openJDK will happily > eat the second line of your replacement? If you can guarantee that I'll > happily remove the type declarations. 
I think they do: there are uses like that in the same file already, for example:

  if (StubRoutines::unsafe_arraycopy() != NULL) {
    StubRoutines::UnsafeArrayCopy_stub()(src, dst, sz);
  } else {
    Copy::conjoint_memory_atomic(src, dst, sz);
  }

-Aleksey

From adinn at redhat.com Thu Aug 1 11:48:37 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 1 Aug 2019 12:48:37 +0100
Subject: RFR: 8224974: Implement JEP 352
In-Reply-To:
References: <80da32b2-7acb-7b94-b82c-5dcd5cf95539@redhat.com>
 <36326F5E-12CD-487A-8FE2-1049631FE908@oracle.com>
 <8B1A8EDC-F9D7-4085-A34F-69100DBD7D5C@oracle.com>
 <5c78b5d7-dc9e-fbbd-41a9-5139ed6ee32c@redhat.com>
 <86d000f3-9eca-6089-5f7e-5698444d99ce@oracle.com>
 <7fe8d17d-6170-fcc9-b87a-eccfda2bb546@redhat.com>
 <7b39f68d-5698-4079-425c-c86ec161e361@oracle.com>
 <72810832-eb88-3418-f336-b95af95d9dcc@redhat.com>
 <20c8bbcb-2cd3-77af-cd40-29f7bd752166@oracle.com>
 <68566847-de80-82af-a928-d30c75f2b1b2@redhat.com>
 <892f3b0b-55f1-872d-15f4-5af907fa8437@redhat.com>
 <8f72d3b5-3a01-a7de-3b09-35571266a87d@redhat.com>
Message-ID: <04ac9cb2-4d7b-d316-9031-b4ce68c3b58e@redhat.com>

Hi Boris,

On 31/07/2019 13:01, Boris Ulasevich wrote:
> I did a quick check of the change across our platforms. Arm32 and x86_64
> built successfully. But I see it fails due to minor issues on aarch64
> and x86_32 with webrev.09.
> Can you please have a look at this?
>
>> src/hotspot/cpu/aarch64/aarch64.ad:2202:1: error: expected ';' before '}' token
>> src/hotspot/cpu/x86/macroAssembler_x86.cpp:9925: undefined reference to `Assembler::clflush(Address)'

The AArch64 error was simply a missing semi-colon. With that corrected
AArch64 now builds and runs as expected (i.e. it fails the PMem support
test with an UnsupportedOperationException).

The second error is happening because the calling method
MacroAssembler::cache_wb has not been guarded with #ifdef _LP64 (the
same applies for MacroAssembler::cache_wbsync).
Note that cache line writeback via Unsafe.writebackMemory is only
expected to work on Intel x86_64 so these two methods only get called
from x86_64-specific code (x86_64.ad and stubGenerator_x86_64.cpp). So,
the solution to this specific problem is to add #ifdef _LP64 around the
declaration and implementation of these two methods. At the same time it
would be helpful to remove the redundant #ifdef _LP64/#endif that I for
some strange reason inserted around the definitions, but not the
declarations, of clflushopt and clwb (that didn't help when I was trying
to work out what was going wrong).

However, a related problem also needs fixing. The Java code for method
Unsafe.writebackMemory only proceeds when the data cache line writeback
unit size (value of field UnsafeConstants.DATA_CACHE_LINE_FLUSH_SIZE) is
non-zero. Otherwise it throws an exception. On x86_32 that field /must/
be zero. The native methods which Unsafe calls out to and the intrinsics
which replace the native calls are not implemented on x86_32.

The field from which the value of the Java constant is derived is
currently initialised using CPU-specific information in
vm_version_x86.cpp as follows

  if (os::supports_map_sync()) {
    // publish data cache line flush size to generic field, otherwise
    // let it default to zero thereby disabling writeback
    _data_cache_line_flush_size = _cpuid_info.std_cpuid1_ebx.bits.clflush_size * 8;
  }

i.e. writeback is enabled on x86 when the operating system is known to
be capable of supporting MAP_SYNC. os_linux.cpp returns true for that
call, irrespective of whether this is 32 or 64 bit linux. The rationale
is that any Linux is capable of supporting map_sync (by contrast
Windows, Solaris, AIX etc currently return false). So, the above
assignment also needs to be guarded by #ifdef _LP64 in order to ensure
that writeback is never attempted on x86_32.

Thank you for spotting these errors. I will add the relevant fixes to
the next patch and add you as a reviewer.
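
For illustration only, the guard on the flush-size assignment described
above can be modelled roughly as follows. This is a standalone sketch
and not the HotSpot patch: supports_map_sync(), clflush_size and
compute_flush_size() are hypothetical stand-ins for
os::supports_map_sync(), the CPUID field and the initialisation code in
vm_version_x86.cpp, and _LP64 is assumed to be predefined by the
compiler on 64-bit targets (as GCC and Clang do on Linux).

```cpp
// Hypothetical stand-in for os::supports_map_sync(); assumed to
// return true, as described for Linux in the message above.
static bool supports_map_sync() { return true; }

// Hypothetical stand-in for _cpuid_info.std_cpuid1_ebx.bits.clflush_size.
// CPUID reports the line size in 8-byte units, so 8 means 64 bytes.
static const unsigned clflush_size = 8;

// Returns the data cache line writeback unit size in bytes,
// or 0 to signal that writeback is disabled.
static unsigned compute_flush_size() {
  unsigned size = 0;  // default: writeback disabled
#ifdef _LP64
  // Only 64-bit builds may publish a non-zero flush size.
  if (supports_map_sync()) {
    size = clflush_size * 8;
  }
#endif
  return size;
}
```

With the guard in place a 32-bit build always reports zero, so the Java
side would throw rather than reach the unimplemented native methods or
intrinsics.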
regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From christian.hagedorn at oracle.com Fri Aug 2 07:06:27 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 2 Aug 2019 09:06:27 +0200 Subject: [14] RFR(S): 6394013: C2: Remove VerifyOpto In-Reply-To: <342271D9-8342-465C-ACAE-E45A8AAC78E1@oracle.com> References: <535cb7e6-ac4c-f1ed-9e41-42dfae9f3d18@oracle.com> <13b5e407-41fc-617a-a3e1-863d2c07dfb2@oracle.com> <342271D9-8342-465C-ACAE-E45A8AAC78E1@oracle.com> Message-ID: <3a2fd848-d14a-7d35-d6d2-57716ce04d44@oracle.com> Thank you Vladimir. I created a new RFE [1]. Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8229015 On 31.07.19 19:00, Vladimir Kozlov wrote: > > >> On Jul 31, 2019, at 12:27 AM, Christian Hagedorn wrote: >> >> Hi Vladimir >> >> Thanks for taking a look at it and pointing that out. I just thought this line was there due to the VerifyOpto flag. But in that case I rather undo the change and keep the code there as before and change the comment. I updated the webrev: >> http://cr.openjdk.java.net/~thartmann/6394013/webrev.01/ > > Good. > >> >> Should I file a new RFE for investigating the effect of removing that line? > > Yes please. > > Thanks > Vladimir > >> >> Best regards, >> Christian >> >> >>> On 29.07.19 18:39, Vladimir Kozlov wrote: >>> Hi Christian >>> I am not sure about removing code in loopopts.cpp. Yes, comment have to be adjusted but we need investigate more how removing this code can affect optimization. >>> The issue here is not false-positive report from VerifyOpto but that optimization could be undone. >>> We do run _igvn.optimize() at the end of each loop opts iteration. 
>>> Thanks, >>> Vladimir >>>> On 7/29/19 6:18 AM, Christian Hagedorn wrote: >>>> Hi >>>> >>>> Please review the following enhancement: >>>> https://bugs.openjdk.java.net/browse/JDK-6394013 >>>> http://cr.openjdk.java.net/~thartmann/6394013/webrev.00/ >>>> >>>> This kills the VerifyOpto flag. >>>> >>>> Thanks! >>>> >>>> Best regards, >>>> Christian > From thomas.schatzl at oracle.com Sat Aug 3 19:27:07 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Sat, 3 Aug 2019 12:27:07 -0700 Subject: RFR (XS): Optimize branch frequency of G1's write post-barrier in C2 In-Reply-To: References: Message-ID: <41520e0a-671a-de55-24ab-6615fc456459@oracle.com> ping at compiler team to have a quick look. Thanks, Thomas On 11.07.19 16:35, Man Cao wrote: > Thanks Thomas for the review and running experiments! > > > - can you share the code changes to generate the statistics? It would > > be nice to confirm these on a few more applications and play around > > with them a bit :) > > I would like to confirm some very old numbers we have for other older > > benchmarks that this is indeed the best probabibility distribution. > > Particularly I do not understand that from these numbers we did not > > change the probabilities as you suggested :( There were other changes > > mostly related to barrier elision in that time frame, but it seems > > likelihood changes were not attempted. > > It is here: http://cr.openjdk.java.net/~manc/8225776/branch_profiling/ > I also added a comment in > https://bugs.openjdk.java.net/browse/JDK-8225776 to clarify the methodology. > > > - these numbers (and yours) also indicate that the not-young check is > > very likely to be not taken (i.e. you jump over the storeload). Did you > > also perform some experiments changing the order a bit? 
> > It might be detrimental for this particular case where the StoreLoad is > > expensive, and the xor/non-null filter out at least some additional of > > those, but maybe > > if (young) -> exit > > if (different-region) -> exit > > if (non-null) -> exit > > StoreLoad > > ... > > may be better to do? I am aware that the "young" check adds a load, > > which is also expensive (but not as much as the StoreLoad), but it > > seems to be an interesting case to look at. > > > > In our old results (as far as I can interpret them) it did not seem to > > have any advantage/disadvantage, so I am just curious whether you did > > such tests and their conclusion. > > Yes, I did this experiment. The load from card table on the fast path > turns out to be expensive for several benchmarks: > https://cr.openjdk.java.net/~manc/8225776/20190516-jdk11G1WriteBarrier-dacapoDefault4G-YoungCheckFirst.html > For this experiment, I was setting 4G heap with -XX:NewRatio=1, so most > writes happen to young object, and GC happens very infrequently. > The implementation had some bug that some benchmarks crashed while > running. I didn't look into fixing the bug, as this direction does not > seem worthwhile. > > > - internal (quick) perf testing showed no overall score changes, except > > that maxJOPS on SpecJBB2015 seemed to improve by ~1.2% (only had time > > for very few experiments at this time, will rerun, so there is some > > chance that this has been a fluke) which is definitely nice. > > Good to hear that! 
> -Man From tobias.hartmann at oracle.com Mon Aug 5 06:09:21 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 5 Aug 2019 08:09:21 +0200 Subject: [14] RFR(T): 8224957: C2 compilation fails with assert: Bad graph detected in build_loop_late In-Reply-To: <9b9b9df8-d5d0-2851-dc15-74786b6cd5a8@oracle.com> References: <6cb331b3-b303-89fa-205a-4cbd7900b068@oracle.com> <87v9vp0wys.fsf@redhat.com> <7c04dfc9-8373-0535-731c-60fbd8470ade@oracle.com> <9b9b9df8-d5d0-2851-dc15-74786b6cd5a8@oracle.com> Message-ID: <1d04560d-d472-d3ab-a86c-8b7430c95890@oracle.com> Hi, here's an updated webrev for JDK 14 that re-enables AggressiveUnboxing: http://cr.openjdk.java.net/~thartmann/8224957/webrev.01/ I'll push once the fix for 8228772 is in. Thanks, Tobias On 29.07.19 09:14, Tobias Hartmann wrote: > Roland, Vladimir, thanks for the reviews! > > Unfortunately, testing revealed another spurious crash (non-schedulable graph) that only happens > with my patch. I suspect that this issue is unrelated but only triggers due to the changes to > Node::dominates. > > Since I need more time to investigate and due to concerns about other potential issues that might be > triggered by this fix, I've decided to disable AggressiveUnboxing for JDK 13. I've filed 8228710 [1] > and will send a RFR soon. > > Best regards, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8228710 > > On 27.07.19 03:36, Vladimir Kozlov wrote: >> +1 >> >> Vladimir >> >> On 7/26/19 6:03 AM, Roland Westrelin wrote: >>> >>>> http://cr.openjdk.java.net/~thartmann/8224957/webrev.00/ >>> >>> Looks good to me. >>> >>> Roland. 
>>> From tobias.hartmann at oracle.com Mon Aug 5 06:23:52 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 5 Aug 2019 08:23:52 +0200 Subject: [14] RFR(S): 8228772: C2 compilation fails due to unschedulable graph if DominatorSearchLimit is reached In-Reply-To: References: <2db8d336-cefd-ab1a-6283-4496b3607700@oracle.com> Message-ID: Hi Vladimir, thanks for the review. On 31.07.19 18:42, Vladimir Kozlov wrote: > Please compare with casted node mstore. > The test needs IgnoreUrecognizedVMOption for cases when C2 flags are used. Updated webrev: http://cr.openjdk.java.net/~thartmann/8228772/webrev.01/ > And please run it with Graal. I did. It passed with normal execution time (5-6s). Thanks, Tobias From tobias.hartmann at oracle.com Mon Aug 5 07:25:50 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 5 Aug 2019 09:25:50 +0200 Subject: [14] RFR(S): 8228772: C2 compilation fails due to unschedulable graph if DominatorSearchLimit is reached In-Reply-To: <64153d28-13b0-818e-599e-30ebfee07728@oracle.com> References: <2db8d336-cefd-ab1a-6283-4496b3607700@oracle.com> <64153d28-13b0-818e-599e-30ebfee07728@oracle.com> Message-ID: <01518816-a632-48d0-b375-c2d4aa7a8034@oracle.com> Hi Nils, On 31.07.19 11:21, Nils Eliasson wrote: > Your patch looks good, and I seek no change to the patch, but I would like to take the opportunity > to gain some additional insight. Thanks for the review! > Doesn't the memory graph look a bit strange? If the load is control dependent on the membar, and the > membar creates a new memory state, shouldn't the load actually use that state? 
Yes, that's the case before EA: http://cr.openjdk.java.net/~thartmann/8228772/8228772_graph_before_EA.png But once EA figures out that the array allocation does not escape, it creates a memory Phi for that slice (6006), and re-wires the load to use that memory: http://cr.openjdk.java.net/~thartmann/8228772/8228772_graph_after_EA.png I think the code that does this is ConnectionGraph::find_inst_mem which steps through membars. > The basic assumption is that for all memory ops using the same memory state, all loads must precede > any store. This case breaks that. We should investigate why the IR ends up like this and if it is > actually correct. Right. Semantically, the IR seems correct to me though. Best regards, Tobias From nils.eliasson at oracle.com Mon Aug 5 14:51:07 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 5 Aug 2019 16:51:07 +0200 Subject: [14] RFR(S): 8228772: C2 compilation fails due to unschedulable graph if DominatorSearchLimit is reached In-Reply-To: <01518816-a632-48d0-b375-c2d4aa7a8034@oracle.com> References: <2db8d336-cefd-ab1a-6283-4496b3607700@oracle.com> <64153d28-13b0-818e-599e-30ebfee07728@oracle.com> <01518816-a632-48d0-b375-c2d4aa7a8034@oracle.com> Message-ID: On 2019-08-05 09:25, Tobias Hartmann wrote: > Hi Nils, > > On 31.07.19 11:21, Nils Eliasson wrote: >> Your patch looks good, and I seek no change to the patch, but I would like to take the opportunity >> to gain some additional insight. > Thanks for the review! > >> Doesn't the memory graph look a bit strange? If the load is control dependent on the membar, and the >> membar creates a new memory state, shouldn't the load actually use that state? 
> Yes, that's the case before EA: > http://cr.openjdk.java.net/~thartmann/8228772/8228772_graph_before_EA.png > > But once EA figures out that the array allocation does not escape, it creates a memory Phi for that > slice (6006), and re-wires the load to use that memory: > http://cr.openjdk.java.net/~thartmann/8228772/8228772_graph_after_EA.png > > I think the code that does this is ConnectionGraph::find_inst_mem which steps through membars. ok, but then the store and the load doesn't alias, and no anti-dependence edge is actually needed. Perhaps that info isn't handled correctly by the anti-dep-checker. That is something we need to revisit for 14. // N > >> The basic assumption is that for all memory ops using the same memory state, all loads must precede >> any store. This case breaks that. We should investigate why the IR ends up like this and if it is >> actually correct. > Right. Semantically, the IR seems correct to me though. > > Best regards, > Tobias From tobias.hartmann at oracle.com Mon Aug 5 13:45:29 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 5 Aug 2019 15:45:29 +0200 Subject: [14] RFR(S): 8228772: C2 compilation fails due to unschedulable graph if DominatorSearchLimit is reached In-Reply-To: References: <2db8d336-cefd-ab1a-6283-4496b3607700@oracle.com> <64153d28-13b0-818e-599e-30ebfee07728@oracle.com> <01518816-a632-48d0-b375-c2d4aa7a8034@oracle.com> Message-ID: On 05.08.19 16:51, Nils Eliasson wrote: > ok, but then the store and the load doesn't alias, and no anti-dependence edge is actually needed. It's a membar and they are handled as being anti-dependent on everything. See this comment: http://hg.openjdk.java.net/jdk/jdk/file/90dcbeb8455e/src/hotspot/share/opto/gcm.cpp#l635 > Perhaps that info isn't handled correctly by the anti-dep-checker. That is something we need to > revisit for 14. Are you okay with filing a JDK 14 RFE for that? 
Thanks,
Tobias

From tobias.hartmann at oracle.com Mon Aug 5 13:48:46 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 5 Aug 2019 15:48:46 +0200
Subject: [14] RFR(S): 8229016: C2 scalarization crashes with assert(node->Opcode() == Op_CastP2X) failed: ConvP2XNode required
Message-ID: <4cf5ec00-782f-64bb-7abe-90f8fd342617@oracle.com>

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8229016
http://cr.openjdk.java.net/~thartmann/8229016/webrev.00/

When processing safepoint uses of a non-escaping array allocation during scalar replacement, we try
to determine array element values from memory. In this case, a copy to the array is replaced by
individual loads from the source [1]. Because the array copy has src == dst, we end up adding new
loads from the to-be-eliminated array which confuses/crashes the following removal code.

We should detect this case and try to determine the value from memory instead of adding a new load.

Thanks,
Tobias

[1] see PhaseMacroExpand::scalar_replacement -> PhaseMacroExpand::value_from_mem ->
PhaseMacroExpand::make_arraycopy_load

From dean.long at oracle.com Mon Aug 5 20:04:09 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 5 Aug 2019 13:04:09 -0700
Subject: RFR (XS): Optimize branch frequency of G1's write post-barrier in C2
In-Reply-To: <41520e0a-671a-de55-24ab-6615fc456459@oracle.com>
References: <41520e0a-671a-de55-24ab-6615fc456459@oracle.com>
Message-ID: <70d36c8e-4730-e58a-a186-57bd4ad2728d@oracle.com>

Looks OK to me

dl

On 8/3/19 12:27 PM, Thomas Schatzl wrote:
> ping at compiler team to have a quick look.
>
> Thanks,
>   Thomas
>
> On 11.07.19 16:35, Man Cao wrote:
>> Thanks Thomas for the review and running experiments!
>>
>> > - can you share the code changes to generate the statistics?
It would
>> > be nice to confirm these on a few more applications and play around
>> > with them a bit :)
>> > I would like to confirm some very old numbers we have for other older
>> > benchmarks that this is indeed the best probabibility distribution.
>> > Particularly I do not understand that from these numbers we did not
>> > change the probabilities as you suggested :( There were other changes
>> > mostly related to barrier elision in that time frame, but it seems
>> > likelihood changes were not attempted.
>>
>> It is here: http://cr.openjdk.java.net/~manc/8225776/branch_profiling/
>> I also added a comment in
>> https://bugs.openjdk.java.net/browse/JDK-8225776 to clarify the
>> methodology.
>>
>> > - these numbers (and yours) also indicate that the not-young check is
>> > very likely to be not taken (i.e. you jump over the storeload). Did you
>> > also perform some experiments changing the order a bit?
>> > It might be detrimental for this particular case where the StoreLoad is
>> > expensive, and the xor/non-null filter out at least some additional of
>> > those, but maybe
>> >   if (young) -> exit
>> >   if (different-region) -> exit
>> >   if (non-null) -> exit
>> >   StoreLoad
>> >   ...
>> > may be better to do? I am aware that the "young" check adds a load,
>> > which is also expensive (but not as much as the StoreLoad), but it
>> > seems to be an interesting case to look at.
>> >
>> > In our old results (as far as I can interpret them) it did not seem to
>> > have any advantage/disadvantage, so I am just curious whether you did
>> > such tests and their conclusion.
>>
>> Yes, I did this experiment.
The load from card table on the fast path
>> turns out to be expensive for several benchmarks:
>> https://cr.openjdk.java.net/~manc/8225776/20190516-jdk11G1WriteBarrier-dacapoDefault4G-YoungCheckFirst.html
>>
>> For this experiment, I was setting 4G heap with -XX:NewRatio=1, so
>> most writes happen to young object, and GC happens very infrequently.
>> The implementation had some bug that some benchmarks crashed while
>> running. I didn't look into fixing the bug, as this direction does
>> not seem worthwhile.
>>
>> > - internal (quick) perf testing showed no overall score changes, except
>> > that maxJOPS on SpecJBB2015 seemed to improve by ~1.2% (only had time
>> > for very few experiments at this time, will rerun, so there is some
>> > chance that this has been a fluke) which is definitely nice.
>>
>> Good to hear that!
>> -Man
>

From manc at google.com Mon Aug 5 20:15:58 2019
From: manc at google.com (Man Cao)
Date: Mon, 5 Aug 2019 13:15:58 -0700
Subject: RFR (XS): Optimize branch frequency of G1's write post-barrier in C2
In-Reply-To: <70d36c8e-4730-e58a-a186-57bd4ad2728d@oracle.com>
References: <41520e0a-671a-de55-24ab-6615fc456459@oracle.com>
 <70d36c8e-4730-e58a-a186-57bd4ad2728d@oracle.com>
Message-ID:

Thanks for the reviews!

-Man

On Mon, Aug 5, 2019 at 1:04 PM wrote:

> Looks OK to me
>
> dl
>
> On 8/3/19 12:27 PM, Thomas Schatzl wrote:
> > ping at compiler team to have a quick look.
> >
> > Thanks,
> > Thomas
> >
> > On 11.07.19 16:35, Man Cao wrote:
> >> Thanks Thomas for the review and running experiments!
> >>
> >> > - can you share the code changes to generate the statistics? It would
> >> > be nice to confirm these on a few more applications and play around
> >> > with them a bit :)
> >> > I would like to confirm some very old numbers we have for other older
> >> > benchmarks that this is indeed the best probabibility distribution.
> >> > Particularly I do not understand that from these numbers we did not > >> > change the probabilities as you suggested :( There were other changes > >> > mostly related to barrier elision in that time frame, but it seems > >> > likelihood changes were not attempted. > >> > >> It is here: http://cr.openjdk.java.net/~manc/8225776/branch_profiling/ > >> I also added a comment in > >> https://bugs.openjdk.java.net/browse/JDK-8225776 to clarify the > >> methodology. > >> > >> > - these numbers (and yours) also indicate that the not-young check is > >> > very likely to be not taken (i.e. you jump over the storeload). > >> Did you > >> > also perform some experiments changing the order a bit? > >> > It might be detrimental for this particular case where the > >> StoreLoad is > >> > expensive, and the xor/non-null filter out at least some > >> additional of > >> > those, but maybe > >> > if (young) -> exit > >> > if (different-region) -> exit > >> > if (non-null) -> exit > >> > StoreLoad > >> > ... > >> > may be better to do? I am aware that the "young" check adds a load, > >> > which is also expensive (but not as much as the StoreLoad), but it > >> > seems to be an interesting case to look at. > >> > > >> > In our old results (as far as I can interpret them) it did not > >> seem to > >> > have any advantage/disadvantage, so I am just curious whether you did > >> > such tests and their conclusion. > >> > >> Yes, I did this experiment. The load from card table on the fast path > >> turns out to be expensive for several benchmarks: > >> > https://cr.openjdk.java.net/~manc/8225776/20190516-jdk11G1WriteBarrier-dacapoDefault4G-YoungCheckFirst.html > >> > >> For this experiment, I was setting 4G heap with -XX:NewRatio=1, so > >> most writes happen to young object, and GC happens very infrequently. > >> The implementation had some bug that some benchmarks crashed while > >> running. I didn't look into fixing the bug, as this direction does > >> not seem worthwhile. 
> >> > >> > - internal (quick) perf testing showed no overall score changes, > >> except > >> > that maxJOPS on SpecJBB2015 seemed to improve by ~1.2% (only had time > >> > for very few experiments at this time, will rerun, so there is some > >> > chance that this has been a fluke) which is definitely nice. > >> > >> Good to hear that! > >> -Man > > > > From vladimir.kozlov at oracle.com Mon Aug 5 23:26:59 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 5 Aug 2019 16:26:59 -0700 Subject: [14] RFR(S): 8229016: C2 scalarization crashes with assert(node->Opcode() == Op_CastP2X) failed: ConvP2XNode required In-Reply-To: <4cf5ec00-782f-64bb-7abe-90f8fd342617@oracle.com> References: <4cf5ec00-782f-64bb-7abe-90f8fd342617@oracle.com> Message-ID: <9812243a-63f2-7d77-1007-7e08d4881314@oracle.com> Good. Thanks, Vladimir On 8/5/19 6:48 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8229016 > http://cr.openjdk.java.net/~thartmann/8229016/webrev.00/ > > When processing safepoint uses of a non-escaping array allocation during scalar replacement, we try > to determine array element values from memory. In this case, a copy to the array is replaced by > individual loads from the source [1]. Because the array copy has src == dst, we end up adding new > loads from the to-be-eliminated array which confuses/crashes the following removal code. > > We should detect this case and try to determine the value from memory instead of adding a new load. 
> > Thanks, > Tobias > > [1] see PhaseMacroExpand::scalar_replacement -> PhaseMacroExpand::value_from_mem -> > PhaseMacroExpand::make_arraycopy_load > From vladimir.kozlov at oracle.com Mon Aug 5 23:31:33 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 5 Aug 2019 16:31:33 -0700 Subject: [14] RFR(T): 8224957: C2 compilation fails with assert: Bad graph detected in build_loop_late In-Reply-To: <1d04560d-d472-d3ab-a86c-8b7430c95890@oracle.com> References: <6cb331b3-b303-89fa-205a-4cbd7900b068@oracle.com> <87v9vp0wys.fsf@redhat.com> <7c04dfc9-8373-0535-731c-60fbd8470ade@oracle.com> <9b9b9df8-d5d0-2851-dc15-74786b6cd5a8@oracle.com> <1d04560d-d472-d3ab-a86c-8b7430c95890@oracle.com> Message-ID: Looks good. Vladimir On 8/4/19 11:09 PM, Tobias Hartmann wrote: > Hi, > > here's an updated webrev for JDK 14 that re-enables AggressiveUnboxing: > http://cr.openjdk.java.net/~thartmann/8224957/webrev.01/ > > I'll push once the fix for 8228772 is in. > > Thanks, > Tobias > > On 29.07.19 09:14, Tobias Hartmann wrote: >> Roland, Vladimir, thanks for the reviews! >> >> Unfortunately, testing revealed another spurious crash (non-schedulable graph) that only happens >> with my patch. I suspect that this issue is unrelated but only triggers due to the changes to >> Node::dominates. >> >> Since I need more time to investigate and due to concerns about other potential issues that might be >> triggered by this fix, I've decided to disable AggressiveUnboxing for JDK 13. I've filed 8228710 [1] >> and will send a RFR soon. >> >> Best regards, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8228710 >> >> On 27.07.19 03:36, Vladimir Kozlov wrote: >>> +1 >>> >>> Vladimir >>> >>> On 7/26/19 6:03 AM, Roland Westrelin wrote: >>>> >>>>> http://cr.openjdk.java.net/~thartmann/8224957/webrev.00/ >>>> >>>> Looks good to me. >>>> >>>> Roland. 
>>>> From tobias.hartmann at oracle.com Tue Aug 6 05:41:03 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 6 Aug 2019 07:41:03 +0200 Subject: [14] RFR(T): 8224957: C2 compilation fails with assert: Bad graph detected in build_loop_late In-Reply-To: References: <6cb331b3-b303-89fa-205a-4cbd7900b068@oracle.com> <87v9vp0wys.fsf@redhat.com> <7c04dfc9-8373-0535-731c-60fbd8470ade@oracle.com> <9b9b9df8-d5d0-2851-dc15-74786b6cd5a8@oracle.com> <1d04560d-d472-d3ab-a86c-8b7430c95890@oracle.com> Message-ID: Thanks Vladimir. Best regards, Tobias On 06.08.19 01:31, Vladimir Kozlov wrote: > Looks good. > > Vladimir > > On 8/4/19 11:09 PM, Tobias Hartmann wrote: >> Hi, >> >> here's an updated webrev for JDK 14 that re-enables AggressiveUnboxing: >> http://cr.openjdk.java.net/~thartmann/8224957/webrev.01/ >> >> I'll push once the fix for 8228772 is in. >> >> Thanks, >> Tobias >> >> On 29.07.19 09:14, Tobias Hartmann wrote: >>> Roland, Vladimir, thanks for the reviews! >>> >>> Unfortunately, testing revealed another spurious crash (non-schedulable graph) that only happens >>> with my patch. I suspect that this issue is unrelated but only triggers due to the changes to >>> Node::dominates. >>> >>> Since I need more time to investigate and due to concerns about other potential issues that might be >>> triggered by this fix, I've decided to disable AggressiveUnboxing for JDK 13. I've filed 8228710 [1] >>> and will send a RFR soon. >>> >>> Best regards, >>> Tobias >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8228710 >>> >>> On 27.07.19 03:36, Vladimir Kozlov wrote: >>>> +1 >>>> >>>> Vladimir >>>> >>>> On 7/26/19 6:03 AM, Roland Westrelin wrote: >>>>> >>>>>> http://cr.openjdk.java.net/~thartmann/8224957/webrev.00/ >>>>> >>>>> Looks good to me. >>>>> >>>>> Roland. 
>>>>> From tobias.hartmann at oracle.com Tue Aug 6 05:41:31 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 6 Aug 2019 07:41:31 +0200 Subject: [14] RFR(S): 8229016: C2 scalarization crashes with assert(node->Opcode() == Op_CastP2X) failed: ConvP2XNode required In-Reply-To: <9812243a-63f2-7d77-1007-7e08d4881314@oracle.com> References: <4cf5ec00-782f-64bb-7abe-90f8fd342617@oracle.com> <9812243a-63f2-7d77-1007-7e08d4881314@oracle.com> Message-ID: <19f7ba8d-5eaf-e1a7-b262-6a5c05bd85bb@oracle.com> Thanks Vladimir. Best regards, Tobias On 06.08.19 01:26, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 8/5/19 6:48 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8229016 >> http://cr.openjdk.java.net/~thartmann/8229016/webrev.00/ >> >> When processing safepoint uses of a non-escaping array allocation during scalar replacement, we try >> to determine array element values from memory. In this case, a copy to the array is replaced by >> individual loads from the source [1]. Because the array copy has src == dst, we end up adding new >> loads from the to-be-eliminated array which confuses/crashes the following removal code. >> >> We should detect this case and try to determine the value from memory instead of adding a new load. 
>> >> Thanks, >> Tobias >> >> [1] see PhaseMacroExpand::scalar_replacement -> PhaseMacroExpand::value_from_mem -> >> PhaseMacroExpand::make_arraycopy_load >> From christian.hagedorn at oracle.com Tue Aug 6 12:04:18 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 6 Aug 2019 14:04:18 +0200 Subject: [14] RFR(XS): 8229158: make UseSwitchProfiling non-experimental or false by-default In-Reply-To: <39490af4-3fcf-7849-b8a7-a394a3ee5c61@oracle.com> References: <3eda5642-17f2-75d5-62b7-e118eaed2e90@oracle.com> <39490af4-3fcf-7849-b8a7-a394a3ee5c61@oracle.com> Message-ID: <67fe2238-66bb-3464-32c3-685571b18be4@oracle.com> Hi David On 06.08.19 13:17, David Holmes wrote: > Hi Christian, > > On 6/08/2019 8:28 pm, Christian Hagedorn wrote: >> Hi >> >> Please review the following enhancement: >> https://bugs.openjdk.java.net/browse/JDK-8229158 >> http://cr.openjdk.java.net/~thartmann/8229158/webrev.00/ >> >> This just changes the flag UseSwitchProfiling from experimental to >> diagnostic. > > hotspot-compiler-dev seems like a better list for this change as that is > where the original code was reviewed. In particular I think you need to > get buy-in from the people that provided this code in the first place - > who would be Aleksey Shipilev and Roland Westrelin. > > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-March/028595.html Yes, you are right, would have been the better list! > To be clear, the rationale for this change is that there is a general > expectation that an experimental flag of the form UseX is used to turn > on an experimental feature X (that would be off by default). Here we > appear to have functionality that should always be on but we want a flag > to allow it to be turned off "just in case". Such a flag (if not > product) would therefore be better as diagnostic. This sounds reasonable, thanks for the detailed explanation! 
Best regards, Christian From adinn at redhat.com Tue Aug 6 12:12:48 2019 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 6 Aug 2019 13:12:48 +0100 Subject: RFR: 8224974: Implement JEP 352 In-Reply-To: <3a3f9422-f81d-8ee2-5c07-687f2abe18df@redhat.com> References: <80da32b2-7acb-7b94-b82c-5dcd5cf95539@redhat.com> <36326F5E-12CD-487A-8FE2-1049631FE908@oracle.com> <8B1A8EDC-F9D7-4085-A34F-69100DBD7D5C@oracle.com> <5c78b5d7-dc9e-fbbd-41a9-5139ed6ee32c@redhat.com> <86d000f3-9eca-6089-5f7e-5698444d99ce@oracle.com> <7fe8d17d-6170-fcc9-b87a-eccfda2bb546@redhat.com> <7b39f68d-5698-4079-425c-c86ec161e361@oracle.com> <72810832-eb88-3418-f336-b95af95d9dcc@redhat.com> <20c8bbcb-2cd3-77af-cd40-29f7bd752166@oracle.com> <68566847-de80-82af-a928-d30c75f2b1b2@redhat.com> <892f3b0b-55f1-872d-15f4-5af907fa8437@redhat.com> <8f72d3b5-3a01-a7de-3b09-35571266a87d@redhat.com> <671c7ba1-9597-a990-190b-145d9cb5349e@redhat.com> <81098801-31b2-e3a8-5c25-043ab47c80bc@redhat.com> <3a3f9422-f81d-8ee2-5c07-687f2abe18df@redhat.com> Message-ID: Hi Aleksey/Boris, This is a response to both your last review posts. New webrev link is at the end. On 01/08/2019 12:16, Aleksey Shipilev wrote: > On 7/31/19 12:55 PM, Andrew Dinn wrote: . . . > I am more concerned that the writeback call enters the pre sync stub unnecessarily. The stub? I hope you mean when executing the native call as opposed to the JITted intrinsic? The stub is only called on a cold path when a native call proper happens. By contrast, the intrinsic translation for CacheWBPreSync on AArch64 and x86_64 does not insert any instructions into the generated code (and most especially not a call to the stub). 
Here it is:

  instruct cacheWBPreSync()
  %{
    predicate(VM_Version::supports_data_cache_line_flush());
    match(CacheWBPreSync);

    ins_cost(100);
    format %{"cache wb presync" %}
    ins_encode %{
      __ cache_wbsync(true);
    %}
    ins_pipe(pipe_slow); // XXX
  %}

  void MacroAssembler::cache_wbsync(bool is_pre)
  {
    assert(VM_Version::supports_clflush(), "clflush should be available");
    bool optimized = VM_Version::supports_clflushopt();
    bool no_evict = VM_Version::supports_clwb();

    // pick the correct implementation

    if (!is_pre && (optimized || no_evict)) {
      // need an sfence for post flush when using clflushopt or clwb
      // otherwise no need for any synchronization

      sfence();
    }
  }

> I had the idea to do this more efficiently, and simplify code at the same time: how about emitting
> CacheWBPreSync nodes, but emitting nothing for them in .ad matchers? That would leave generic code
> generic, and architectures would then be able to avoid the stub altogether for pre sync code. This
> would simplify current stub generators too, I think: you don't need to pass arguments to them.

I believe the intrinsic behaviour you are asking for is effectively what
is implemented (as shown above). The .ad match rules for the PreSync and
PostSync nodes both call MacroAssembler::cache_wbsync. For pre-sync it
emits nothing. For post-sync it emits sfence when the writeback is
implemented using clwb or clflushopt and nothing if writeback relies on
clflush.

> This leaves calling via Unsafe. I believe pulling up the isPre choice to the stub generation time
> would be beneficial. That is, generate *two* stubs: StubRoutines::data_cache_writeback_pre_sync()
> and StubRoutines::data_cache_writeback_post_sync(). If arch does not need the pre_sync, generate nop
> version of pre_sync().

I don't really see any point to doing this. We can generate two stubs,
one executing a nop and one executing a nop/sfence according to need.
Or we can have one stub with a branch on the sync type and branch targets that execute either a nop or a nop/sfence as needed. The difference in performance of the stub is minor and irrelevant. The difference in generation time and memory use is minor and irrelevant. What are you trying to gain here? > This is not a strong requirement from my side. I do believe it would make code a bit more > straight-forward. Am I missing something here? Or did you simply miss that the intrinsic translation inserts no code for the presync? >>> === src/hotspot/cpu/x86/assembler_x86.cpp >>> >>> It feels like these comments are redundant, especially L8630 and L8646 which mention magic values >>> "6" and "7", not present in the code: > > ... > >> 8624 // 0x66 is instruction prefix >> >> 8627 // 0x0f 0xAE is opcode family >> >> 8630 // rdi == 7 is extended opcode byte >> . . . >> >> Given that the code is simply stuffing numbers (whether supplied as >> literals or as symbolic constants) into a byte stream I think these >> comments are a help when it comes to cross-checking each specific >> assembly against the corresponding numbers declared in the Intel >> manuals. So, I don't really want to remove them. Would you prefer me to >> reverse the wording as above? > > I was merely commenting on the style: the rest of the file does not have comments like that. The > positions of prefixes, opcode families, etc is kinda implied by the code shape. Yes, I too noticed that the rest of the file does not have any such comments :-] Given the highly variable shape of x86 machine code, I don't see any reason not to start remedying that omission, even if the remedy is only piecemeal. Commenting may not be a great help to maintainers who know the code and ISA really well but they are not the only audience. Even in that specific case the comments provide a sanity check. 
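
As a cross-check of the byte layout those comments describe, here is a small standalone sketch. It is not the HotSpot assembler code: the helper names (modrm_indirect, encode_writeback_rax, WritebackKind) are made up for illustration, and the operand is fixed to a plain [rax]. The opcode values are the ones given in the Intel manuals: clflush is 0F AE /7, clflushopt is 66 0F AE /7 and clwb is 66 0F AE /6.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of HotSpot: ModRM byte for a register-indirect
// operand [base] with mod == 00. 'ext' is the extended opcode (the "/digit").
// Only valid for base registers whose low three bits are not 100 (rsp) or 101 (rbp).
static uint8_t modrm_indirect(uint8_t ext, uint8_t base) {
  return static_cast<uint8_t>(((ext & 7) << 3) | (base & 7));
}

enum WritebackKind { CLFLUSH, CLFLUSHOPT, CLWB };

// Encode the chosen writeback instruction for the operand [rax] (register number 0).
static std::vector<uint8_t> encode_writeback_rax(WritebackKind kind) {
  std::vector<uint8_t> bytes;
  if (kind != CLFLUSH) {
    bytes.push_back(0x66);               // instruction prefix (clflushopt and clwb only)
  }
  bytes.push_back(0x0F);                 // opcode family, first byte
  bytes.push_back(0xAE);                 // opcode family, second byte
  uint8_t ext = (kind == CLWB) ? 6 : 7;  // extended opcode: /6 for clwb, /7 otherwise
  bytes.push_back(modrm_indirect(ext, 0));
  return bytes;
}
```

Under these assumptions clwb [rax] comes out as 66 0F AE 30 and clflushopt [rax] as 66 0F AE 38, which is where the "6" and "7" in the comments come from.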
>>> === src/hotspot/cpu/x86/macroAssembler_x86.cpp
>> // prefer clwb (potentially parallel writeback without evict)
>> // otherwise prefer clflushopt (potentially parallel writeback
>> // with evict)
>> // otherwise fallback on clflush (serial writeback with evict)
>>
>> In the second case the comment is redundant because the need for an
>> sfence is covered by the existing comment inside the if:
>>
>> // need an sfence for post flush when using clflushopt or clwb
>> // otherwise no need for any synchronization
>
> Yes, this would be good to add.

Ok, done.

>>> === src/hotspot/share/opto/c2compiler.cpp
>>>
>>> Why inject new cases here, instead of at the end of switch? Saves sudden "break":
>>>
>>> 578 break;
>>> 579 case vmIntrinsics::_writeback0:
>>> 580 if (!Matcher::match_rule_supported(Op_CacheWB)) return false;
>>> 581 break;
>>> 582 case vmIntrinsics::_writebackPreSync0:
>>> 583 if (!Matcher::match_rule_supported(Op_CacheWBPreSync)) return false;
>>> 584 break;
>>> 585 case vmIntrinsics::_writebackPostSync0:
>>> 586 if (!Matcher::match_rule_supported(Op_CacheWBPostSync)) return false;
>>> 587 break;
>>
>> I placed them here so they were close to the other Unsafe intrinsics. In
>> particular they precede _allocateInstance, an ordering which is also the
>> case in the declarations in vmSymbols.hpp.
>>
>> In what sense do you mean that an extra 'break' is saved? That would be
>> true as regards the textual layout. It wouldn't affect the logic of
>> folding different ranges of values into branching range tests (which is
>> only determined by the numeric values of the intrinsics). If you are
>> concerned about the former then I would argue that placing the values in
>> declaration order seems to me to be the more important concern.
>
> I don't think we have to follow whatever ordering mess in vmSymbols.hpp. New code cuts into the last
> case block in that switch, which is mostly about "we know about these symbols, they are
> falling-through to the break".
Adding cases with Matcher::match_rule_supported seems odd there. If > anything, those new cases should be moved upwards to other cases, e.g. after vmIntrinsics::_minD. As you wish. I have moved them to immediately preceding the large unbroken block. >>> === src/hotspot/share/prims/unsafe.cpp >>> >>> Do we really need this function pointer mess? >>> >>> 457 void (*wb)(void *); >>> 458 void *a = addr_from_java(line); >>> 459 wb = (void (*)(void *)) StubRoutines::data_cache_writeback(); >>> 460 assert(wb != NULL, "generate writeback stub!"); >>> 461 (*wb)(a); >>> >>> Seems easier to: >>> >>> assert(StubRoutines::data_cache_writeback() != NULL, "sanity"); >>> StubRoutines::data_cache_writeback()(addr_from_java(line)); >> Hmm, "that whole brevity thing" again? Well, I guess you must now call >> me "El Duderino". > > Well, that is, like, your opinion, man. C++ is messy if we allow it to be! > >> Are you sure that all the compilers used to build openJDK will happily >> eat the second line of your replacement? If you can guarantee that I'll >> happily remove the type declarations. > > I think they do: there are uses like that in the same file already, for example: > > if (StubRoutines::unsafe_arraycopy() != NULL) { > StubRoutines::UnsafeArrayCopy_stub()(src, dst, sz); > } else { > Copy::conjoint_memory_atomic(src, dst, sz); > } Hmm, your suggested replacement does not in fact compile. Indeed, the example you cite is comparing apples with pears -- note that the getter employed in call differs from the getter employed in the check. The following macro magic is provided for that example in stubRoutines.hpp: static address unsafe_arraycopy() { return _unsafe_arraycopy; } typedef void (*UnsafeArrayCopyStub)(const void* src, void* dst, size_t count); static UnsafeArrayCopyStub UnsafeArrayCopy_stub() { return CAST_TO_FN_PTR(UnsafeArrayCopyStub, _unsafe_arraycopy); } I have provided similar magic to hide the function pointer details for the writeback and writeback_sync stubs. 
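
For readers unfamiliar with the idiom, the pattern can be shown outside HotSpot in a few lines. This is only a sketch under stated assumptions: StubTable, install_writeback, demo_writeback and writeback_line are made-up names, the getter names merely mimic the style of the stubRoutines.hpp declarations quoted above, and a reinterpret_cast stands in for the CAST_TO_FN_PTR macro.

```cpp
// HotSpot-style raw code address (assumed alias for this sketch).
typedef unsigned char* address;

// Stand-in for generated stub code: records the line address it was asked to write back.
static void* last_written_back = nullptr;
static void demo_writeback(void* line) { last_written_back = line; }

class StubTable {
  static address _data_cache_writeback;
public:
  // Raw getter, suitable for null checks and asserts.
  static address data_cache_writeback() { return _data_cache_writeback; }

  // Typed getter: the function pointer cast lives here, not at the call site.
  typedef void (*DataCacheWritebackStub)(void*);
  static DataCacheWritebackStub DataCacheWriteback_stub() {
    return reinterpret_cast<DataCacheWritebackStub>(_data_cache_writeback);
  }

  // Hypothetical installer standing in for stub generation.
  static void install_writeback(DataCacheWritebackStub fn) {
    _data_cache_writeback = reinterpret_cast<address>(fn);
  }
};
address StubTable::_data_cache_writeback = nullptr;

// Call site: check with the raw getter, call through the typed getter.
static void writeback_line(void* line) {
  if (StubTable::data_cache_writeback() != nullptr) {
    StubTable::DataCacheWriteback_stub()(line);
  }
}
```

The check and the call deliberately use two different getters, which is the "apples with pears" distinction noted above: the raw address is convenient for asserts, while only the typed getter is callable.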
Latest webrev against current jdk/jdk including

  changes agreed for the review discussion above
  fixes for AArch64/x86_32 build issues reported by Boris Ulasevich
  the new test promised for Boris Ulasevich

http://cr.openjdk.java.net/~adinn/8224974/webrev.11

Testing:

The x86_32 and aarch64 builds now build and run ok.

The pmem-specific tests (PmemTest, MapFail) pass with the expected
outcomes on x86_64 and AArch64. PmemTest is skipped on x86_32 (as
expected). MapFail passes on x86_32 (it expects the map to be
unsupported).

I have passed the patch on for more thorough testing on simulated and
real NVRAM using our middleware stack.

I am still waiting for confirmation of a submit job.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From shade at redhat.com Tue Aug 6 12:44:19 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 6 Aug 2019 14:44:19 +0200
Subject: RFR: 8224974: Implement JEP 352
In-Reply-To:
References: <80da32b2-7acb-7b94-b82c-5dcd5cf95539@redhat.com> <36326F5E-12CD-487A-8FE2-1049631FE908@oracle.com> <8B1A8EDC-F9D7-4085-A34F-69100DBD7D5C@oracle.com> <5c78b5d7-dc9e-fbbd-41a9-5139ed6ee32c@redhat.com> <86d000f3-9eca-6089-5f7e-5698444d99ce@oracle.com> <7fe8d17d-6170-fcc9-b87a-eccfda2bb546@redhat.com> <7b39f68d-5698-4079-425c-c86ec161e361@oracle.com> <72810832-eb88-3418-f336-b95af95d9dcc@redhat.com> <20c8bbcb-2cd3-77af-cd40-29f7bd752166@oracle.com> <68566847-de80-82af-a928-d30c75f2b1b2@redhat.com> <892f3b0b-55f1-872d-15f4-5af907fa8437@redhat.com> <8f72d3b5-3a01-a7de-3b09-35571266a87d@redhat.com> <671c7ba1-9597-a990-190b-145d9cb5349e@redhat.com> <81098801-31b2-e3a8-5c25-043ab47c80bc@redhat.com> <3a3f9422-f81d-8ee2-5c07-687f2abe18df@redhat.com>
Message-ID:

On 8/6/19 2:12 PM, Andrew Dinn wrote:
> On 01/08/2019 12:16, Aleksey Shipilev wrote:
>> On 7/31/19 12:55 PM, Andrew Dinn wrote:
>> I am more concerned that the writeback call enters the pre sync stub unnecessarily. > > The stub? I hope you mean when executing the native call as opposed to > the JITted intrinsic? The stub is only called on a cold path when a > native call proper happens. By contrast, the intrinsic translation for > CacheWBPreSync on AArch64 and x86_64 does not insert any instructions > into the generated code (and most especially not a call to the stub). Ah, that is exactly what I wanted. Good then, scratch the rest of my comments. >> This is not a strong requirement from my side. I do believe it would make code a bit more >> straight-forward. > > Am I missing something here? Or did you simply miss that the intrinsic > translation inserts no code for the presync? I thought that translating two separate (and statically bound) Unsafe calls, hooking them up to separate Unsafe leaf entries, and then suddenly going into a single StubRoutine call with dynamic argument that dispatches at runtime is a bit awkward. I would have expected it to end up with two separate StubRoutines as well. Again, I have no strong opinion about this. >>>> === src/hotspot/share/prims/unsafe.cpp >>>> >>>> Do we really need this function pointer mess? >>>> >>>> 457 void (*wb)(void *); >>>> 458 void *a = addr_from_java(line); >>>> 459 wb = (void (*)(void *)) StubRoutines::data_cache_writeback(); >>>> 460 assert(wb != NULL, "generate writeback stub!"); >>>> 461 (*wb)(a); >>>> >>>> Seems easier to: ... > static address unsafe_arraycopy() { return _unsafe_arraycopy; } > typedef void (*UnsafeArrayCopyStub)(const void* src, void* dst, size_t > count); > static UnsafeArrayCopyStub UnsafeArrayCopy_stub() { return > CAST_TO_FN_PTR(UnsafeArrayCopyStub, _unsafe_arraycopy); } > > I have provided similar magic to hide the function pointer details for > the writeback and writeback_sync stubs. Yes, this looks cleaner. 
The declarations can be a bit less crowded: static address data_cache_writeback() { return _data_cache_writeback; } static address data_cache_writeback_sync() { return _data_cache_writeback_sync; } typedef void (*DataCacheWritebackStub)(void *); static DataCacheWritebackStub DataCacheWriteback_stub() { return ... typedef void (*DataCacheWritebackSyncStub)(bool); static DataCacheWritebackSyncStub DataCacheWritebackSync_stub() { return ... > http://cr.openjdk.java.net/~adinn/8224974/webrev.11 Looks good. Minor nits (no need for another webrev): *) Not sure if the only copyright line change is needed in src/hotspot/cpu/aarch64/globals_aarch64.hpp. *) Indenting is a bit off at L109 in src/hotspot/cpu/aarch64/vm_version_aarch64.hpp: 108 static int cpu_revision() { return _revision; } 109 static bool supports_dcpop() { return _dcpop; } *) Excess new line added at the end of src/hotspot/os/bsd/os_bsd.cpp? *) Indenting is off in backslashes in src/hotspot/share/runtime/globals.hpp: 2444 \ 2445 develop(bool, TraceMemoryWriteback, false, \ 2446 "Trace memory writeback operations") \ 2447 \ *) Unnecessary newline at L827 in src/hotspot/share/runtime/os.hpp? 826 // support for mapping non-volatile memory using MAP_SYNC 827 828 static bool supports_map_sync(); *) These declarations are too dense in src/java.base/share/classes/jdk/internal/misc/Unsafe.java: 998 /** 999 * primitive operation forcing writeback of a single cache line. 
1000 * 1001 * @param address 1002 * the start address of the cache line to be written back 1003 */ 1004 // native used to write back an individual cache line starting at 1005 // the supplied address 1006 @HotSpotIntrinsicCandidate 1007 private native void writeback0(long address); 1008 // native used to serialise writeback operations relative to 1009 // preceding memory writes 1010 @HotSpotIntrinsicCandidate 1011 private native void writebackPreSync0(); 1012 // native used to serialise writeback operations relative to 1013 // following memory writes 1014 @HotSpotIntrinsicCandidate 1015 private native void writebackPostSync0(); Suggestion: /** * Force write back an individual cache line. * * @param address * the start address of the cache line to be written back */ @HotSpotIntrinsicCandidate private native void writeback0(long address); /** * Serialize writeback operations relative to preceding memory writes. */ @HotSpotIntrinsicCandidate private native void writebackPreSync0(); /** * Serialize writeback operations relative to following memory writes. 
*/ @HotSpotIntrinsicCandidate private native void writebackPostSync0(); -- Thanks, -Aleksey From adinn at redhat.com Tue Aug 6 13:57:05 2019 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 6 Aug 2019 14:57:05 +0100 Subject: RFR: 8224974: Implement JEP 352 In-Reply-To: References: <80da32b2-7acb-7b94-b82c-5dcd5cf95539@redhat.com> <36326F5E-12CD-487A-8FE2-1049631FE908@oracle.com> <8B1A8EDC-F9D7-4085-A34F-69100DBD7D5C@oracle.com> <5c78b5d7-dc9e-fbbd-41a9-5139ed6ee32c@redhat.com> <86d000f3-9eca-6089-5f7e-5698444d99ce@oracle.com> <7fe8d17d-6170-fcc9-b87a-eccfda2bb546@redhat.com> <7b39f68d-5698-4079-425c-c86ec161e361@oracle.com> <72810832-eb88-3418-f336-b95af95d9dcc@redhat.com> <20c8bbcb-2cd3-77af-cd40-29f7bd752166@oracle.com> <68566847-de80-82af-a928-d30c75f2b1b2@redhat.com> <892f3b0b-55f1-872d-15f4-5af907fa8437@redhat.com> <8f72d3b5-3a01-a7de-3b09-35571266a87d@redhat.com> <671c7ba1-9597-a990-190b-145d9cb5349e@redhat.com> <81098801-31b2-e3a8-5c25-043ab47c80bc@redhat.com> <3a3f9422-f81d-8ee2-5c07-687f2abe18df@redhat.com> Message-ID: <67c3888c-9058-9e5c-273e-7810b1c66e29@redhat.com> On 06/08/2019 13:44, Aleksey Shipilev wrote: > Ah, that is exactly what I wanted. Good then, scratch the rest of my comments. > . . . > I thought that translating two separate (and statically bound) Unsafe calls, hooking them up to > separate Unsafe leaf entries, and then suddenly going into a single StubRoutine call with dynamic > argument that dispatches at runtime is a bit awkward. I would have expected it to end up with two > separate StubRoutines as well. Again, I have no strong opinion about this. Ok, thanks for clarifying. Inertia dictates I leave the stubs as is :-) > Yes, this looks cleaner. 
The declarations can be a bit less crowded: > > static address data_cache_writeback() { return _data_cache_writeback; } > static address data_cache_writeback_sync() { return _data_cache_writeback_sync; } > > typedef void (*DataCacheWritebackStub)(void *); > static DataCacheWritebackStub DataCacheWriteback_stub() { return ... > > typedef void (*DataCacheWritebackSyncStub)(bool); > static DataCacheWritebackSyncStub DataCacheWritebackSync_stub() { return ... > . . . >> http://cr.openjdk.java.net/~adinn/8224974/webrev.11 > > Looks good. Ok, I'll fold this and the other format errors you identified into the next patch. If I could please get a nod from Alan Bateman (and assuming I don't receive further comments from other reviewers) I'll push that next patch. Any more for any more ... ? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From dmitry.chuyko at bell-sw.com Tue Aug 6 14:25:40 2019 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Tue, 6 Aug 2019 17:25:40 +0300 Subject: RFR: 8224974: Implement JEP 352 In-Reply-To: <67c3888c-9058-9e5c-273e-7810b1c66e29@redhat.com> References: <80da32b2-7acb-7b94-b82c-5dcd5cf95539@redhat.com> <8B1A8EDC-F9D7-4085-A34F-69100DBD7D5C@oracle.com> <5c78b5d7-dc9e-fbbd-41a9-5139ed6ee32c@redhat.com> <86d000f3-9eca-6089-5f7e-5698444d99ce@oracle.com> <7fe8d17d-6170-fcc9-b87a-eccfda2bb546@redhat.com> <7b39f68d-5698-4079-425c-c86ec161e361@oracle.com> <72810832-eb88-3418-f336-b95af95d9dcc@redhat.com> <20c8bbcb-2cd3-77af-cd40-29f7bd752166@oracle.com> <68566847-de80-82af-a928-d30c75f2b1b2@redhat.com> <892f3b0b-55f1-872d-15f4-5af907fa8437@redhat.com> <8f72d3b5-3a01-a7de-3b09-35571266a87d@redhat.com> <671c7ba1-9597-a990-190b-145d9cb5349e@redhat.com> <81098801-31b2-e3a8-5c25-043ab47c80bc@redhat.com> <3a3f9422-f81d-8ee2-5c07-687f2abe18df@redhat.com> 
<67c3888c-9058-9e5c-273e-7810b1c66e29@redhat.com> Message-ID: Hi Andrew, One quick question about synchronization in unmappers. One of preliminary steps for Loom was to replace monitor usage by j.u.c locks for I/O to let fibers release carrier threads. For instance see JDK-8222774. Does it make sense to do the same in your new unmappers code? -Dmitry [1] https://bugs.openjdk.java.net/browse/JDK-8222774 On 8/6/19 4:57 PM, Andrew Dinn wrote: > On 06/08/2019 13:44, Aleksey Shipilev wrote: >> Ah, that is exactly what I wanted. Good then, scratch the rest of my comments. >> . . . >> I thought that translating two separate (and statically bound) Unsafe calls, hooking them up to >> separate Unsafe leaf entries, and then suddenly going into a single StubRoutine call with dynamic >> argument that dispatches at runtime is a bit awkward. I would have expected it to end up with two >> separate StubRoutines as well. Again, I have no strong opinion about this. > Ok, thanks for clarifying. Inertia dictates I leave the stubs as is :-) > >> Yes, this looks cleaner. The declarations can be a bit less crowded: >> >> static address data_cache_writeback() { return _data_cache_writeback; } >> static address data_cache_writeback_sync() { return _data_cache_writeback_sync; } >> >> typedef void (*DataCacheWritebackStub)(void *); >> static DataCacheWritebackStub DataCacheWriteback_stub() { return ... >> >> typedef void (*DataCacheWritebackSyncStub)(bool); >> static DataCacheWritebackSyncStub DataCacheWritebackSync_stub() { return ... >> . . . >>> http://cr.openjdk.java.net/~adinn/8224974/webrev.11 >> Looks good. > Ok, I'll fold this and the other format errors you identified into the > next patch. > > > If I could please get a nod from Alan Bateman (and assuming I don't > receive further comments from other reviewers) I'll push that next patch. > > Any more for any more ... ? 
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From adinn at redhat.com Tue Aug 6 16:09:46 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Tue, 6 Aug 2019 17:09:46 +0100
Subject: RFR: 8224974: Implement JEP 352
In-Reply-To:
References: <80da32b2-7acb-7b94-b82c-5dcd5cf95539@redhat.com> <5c78b5d7-dc9e-fbbd-41a9-5139ed6ee32c@redhat.com> <86d000f3-9eca-6089-5f7e-5698444d99ce@oracle.com> <7fe8d17d-6170-fcc9-b87a-eccfda2bb546@redhat.com> <7b39f68d-5698-4079-425c-c86ec161e361@oracle.com> <72810832-eb88-3418-f336-b95af95d9dcc@redhat.com> <20c8bbcb-2cd3-77af-cd40-29f7bd752166@oracle.com> <68566847-de80-82af-a928-d30c75f2b1b2@redhat.com> <892f3b0b-55f1-872d-15f4-5af907fa8437@redhat.com> <8f72d3b5-3a01-a7de-3b09-35571266a87d@redhat.com> <671c7ba1-9597-a990-190b-145d9cb5349e@redhat.com> <81098801-31b2-e3a8-5c25-043ab47c80bc@redhat.com> <3a3f9422-f81d-8ee2-5c07-687f2abe18df@redhat.com> <67c3888c-9058-9e5c-273e-7810b1c66e29@redhat.com>
Message-ID: <3fb6e11f-4c95-8d88-3144-0c46c455a5db@redhat.com>

Hello Dmitry,

On 06/08/2019 15:25, Dmitry Chuyko wrote:
> One quick question about synchronization in unmappers. One of
> preliminary steps for Loom was to replace monitor usage by j.u.c locks
> for I/O to let fibers release carrier threads. For instance see
> JDK-8222774. Does it make sense to do the same in your new unmappers code?
> . . .
> [1] https://bugs.openjdk.java.net/browse/JDK-8222774

The unmapper code is not strictly 'new' as regards its reliance on
synchronization. It merely follows and repeats the pattern employed in
the prior code that it has generalized (by splitting the original
Unmapper into two distinct flavours of subclass). If this poses a
problem for Loom then it is a separate issue from the one this JEP
addresses.
I think you should raise a new issue for that change (just as you
would have had to do before this change).

I am sure Alan Bateman will be happy to consider your proposal. Indeed,
I would be happy to implement it given his approval -- or leave it to
you to do so if you prefer.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From vladimir.kozlov at oracle.com Tue Aug 6 21:12:40 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 6 Aug 2019 14:12:40 -0700
Subject: RFR(XS): 8227384: C2 compilation fails with "graph should be schedulable" when running with -XX:-EliminateLocks
In-Reply-To: <871ryf3c37.fsf@redhat.com>
References: <87lfx657u8.fsf@redhat.com> <288a892d-5486-6547-df77-cc39337b1a29@oracle.com> <87a7dl57vu.fsf@redhat.com> <871ryf3c37.fsf@redhat.com>
Message-ID: <2264efd4-a7e4-25ee-493b-fb058a3d3bf9@oracle.com>

> The mark word load for unlocking is created after the release lock
> membar with both control and memory set to projections of the
> membar. Because the allocation is non escaping, when the load is later
> processed, its memory is changed to be above the membar while the
> control is unchanged. A precedence edge is added by anti dependence
> checking to force the load above the membar while its control is
> below. As a result, the graph is not schedulable.

This reminds me of the bug Tobias is working on, 8228772 [1].

On 7/24/19 4:29 AM, Roland Westrelin wrote:
>
> Here is a new fix:
>
> http://cr.openjdk.java.net/~roland/8227384/webrev.01/

Expanding Locks before Allocations is a good idea. We do eliminate Locks before eliminating Allocations.

Will a load after IGVN optimization fold with the load generated in PhaseMacroExpand::initialize_object()?

I don't see an offset check in is_new_object_mark_load(). How is it known that it is a load from the *mark word*?
> > This time the fix makes sure the load of the mark word that causes the > graph to be unschedulable is properly eliminated. This is achieved by > proceeding with macro expansions in 2 steps: first all macro nodes > except allocations and then only allocations. A pass of igvn is performed > between the 2 steps. That's where the load from a newly allocated object > is eliminated by a new Ideal or Value transformation. > > Roland. > Thanks, Vladimir [1] https://bugs.openjdk.java.net/browse/JDK-8228772 From vladimir.kozlov at oracle.com Tue Aug 6 21:58:39 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Aug 2019 14:58:39 -0700 Subject: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling In-Reply-To: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn> References: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn> Message-ID: <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com> Hi Jie Very interesting observation. I am concerned that webrev.01 does check for general loop which may not be vectorized. Even if your optimization helps in particular case it may make some loop regress due to executing more branches. On 7/11/19 1:20 AM, Jie Fu wrote: > Hi all, > > With more experiments, the loop's trip_count seems a good feature to detect over loop unrolling. > And on some platforms, the branch-miss rate had been observed increasing dramatically with small > loop trip count. Why? With more unrolling you should have fewer branches. > It seems that we shouldn't unroll if the trip count becomes too small. Maybe there is a different explanation for this. Maybe the big loop body does not fit into code buffer in X86 cpu - or something like that. And we should watch for body size instead. Thanks, Vladimir > > I've updated the webrev here: http://cr.openjdk.java.net/~jiefu/8227505/webrev.01/ > > Please review it and give me some advice. > > Thanks a lot. > Best regards, > Jie > > On 2019/7/10 4:38, Jie Fu wrote: >> Hi all, >> >> JBS:
https://bugs.openjdk.java.net/browse/JDK-8227505 >> Webrev: http://cr.openjdk.java.net/~jiefu/8227505/webrev.00/ >> >> The patch fixes the over loop unrolling problem caused by SuperWordLoopUnrollAnalysis. >> For more info., please refer to the JBS. >> >> Could you please review it and give me some advice? >> >> Thanks a lot. >> Best regards, >> Jie >> >> > From fujie at loongson.cn Wed Aug 7 01:59:44 2019 From: fujie at loongson.cn (Jie Fu) Date: Wed, 7 Aug 2019 09:59:44 +0800 Subject: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling In-Reply-To: <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com> References: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn> <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com> Message-ID: Hi Vladimir, Thanks for your review and valuable comments. Please see comments below. On 2019/8/7 5:58 AM, Vladimir Kozlov wrote: > Hi Jie > > Even if your optimization helps in particular case it may make some > loop regress due to executing more branches. Yes, I agree. > > On 7/11/19 1:20 AM, Jie Fu wrote: >> Hi all, >> >> With more experiments, the loop's trip_count seems a good feature to >> detect over loop unrolling. >> And on some platforms, the branch-miss rate had been observed >> increasing dramatically with small loop trip count. > > Why? With more unrolling you should have fewer branches. I had just asked my kernel colleagues to explain the strange perf results. They found that I had used an old version of perf which had some bugs in it. So I'm sorry for that noise. > >> It seems that we shouldn't unroll if the trip count becomes too small. > > Maybe there is a different explanation for this. Maybe the big loop body > does not fit into code buffer in X86 cpu - or something like that. And > we should watch for body size instead. OK. I will try to find a better solution. Thanks a lot.
Best regards, Jie From tobias.hartmann at oracle.com Wed Aug 7 08:10:58 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 7 Aug 2019 10:10:58 +0200 Subject: [13] RFR(T): 8229219: C2 compilation fails with assert: Bad graph detected in build_loop_late Message-ID: <472dc269-0585-56b1-36ae-4428a2e2af5f@oracle.com> Hi, please review the following patch that backs out the fix for JDK-8173196 [1] due to intermittent C2 crashes in loopopts with a JCK test (details are in the bug comments): https://bugs.openjdk.java.net/browse/JDK-8229219 http://cr.openjdk.java.net/~thartmann/8229219/webrev.00/ I'll file a REDO enhancement for JDK 14 once this is in. Thanks, Tobias [1] http://hg.openjdk.java.net/jdk/jdk/rev/9691a169f1dd From shade at redhat.com Wed Aug 7 08:16:48 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 7 Aug 2019 10:16:48 +0200 Subject: [13] RFR(T): 8229219: C2 compilation fails with assert: Bad graph detected in build_loop_late In-Reply-To: <472dc269-0585-56b1-36ae-4428a2e2af5f@oracle.com> References: <472dc269-0585-56b1-36ae-4428a2e2af5f@oracle.com> Message-ID: <4e7b92e4-216b-ebb8-90eb-818e4ddcbf52@redhat.com> On 8/7/19 10:10 AM, Tobias Hartmann wrote: > please review the following patch that backs out the fix for JDK-8173196 [1] due to intermittent C2 > crashes in loopopts with a JCK test (details are in the bug comments): > https://bugs.openjdk.java.net/browse/JDK-8229219 > http://cr.openjdk.java.net/~thartmann/8229219/webrev.00/ Looks like clean reversal. Looks good and trivial. 
-- Thanks, -Aleksey From tobias.hartmann at oracle.com Wed Aug 7 08:33:18 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 7 Aug 2019 10:33:18 +0200 Subject: [13] RFR(T): 8229219: C2 compilation fails with assert: Bad graph detected in build_loop_late In-Reply-To: <4e7b92e4-216b-ebb8-90eb-818e4ddcbf52@redhat.com> References: <472dc269-0585-56b1-36ae-4428a2e2af5f@oracle.com> <4e7b92e4-216b-ebb8-90eb-818e4ddcbf52@redhat.com> Message-ID: Hi Aleksey, thanks for the review! Best regards, Tobias On 07.08.19 10:16, Aleksey Shipilev wrote: > On 8/7/19 10:10 AM, Tobias Hartmann wrote: >> please review the following patch that backs out the fix for JDK-8173196 [1] due to intermittent C2 >> crashes in loopopts with a JCK test (details are in the bug comments): >> https://bugs.openjdk.java.net/browse/JDK-8229219 >> http://cr.openjdk.java.net/~thartmann/8229219/webrev.00/ > > Looks like clean reversal. Looks good and trivial. > From dmitry.chuyko at bell-sw.com Wed Aug 7 09:44:51 2019 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Wed, 7 Aug 2019 12:44:51 +0300 Subject: RFR: 8224974: Implement JEP 352 In-Reply-To: <3fb6e11f-4c95-8d88-3144-0c46c455a5db@redhat.com> References: <80da32b2-7acb-7b94-b82c-5dcd5cf95539@redhat.com> <86d000f3-9eca-6089-5f7e-5698444d99ce@oracle.com> <7fe8d17d-6170-fcc9-b87a-eccfda2bb546@redhat.com> <7b39f68d-5698-4079-425c-c86ec161e361@oracle.com> <72810832-eb88-3418-f336-b95af95d9dcc@redhat.com> <20c8bbcb-2cd3-77af-cd40-29f7bd752166@oracle.com> <68566847-de80-82af-a928-d30c75f2b1b2@redhat.com> <892f3b0b-55f1-872d-15f4-5af907fa8437@redhat.com> <8f72d3b5-3a01-a7de-3b09-35571266a87d@redhat.com> <671c7ba1-9597-a990-190b-145d9cb5349e@redhat.com> <81098801-31b2-e3a8-5c25-043ab47c80bc@redhat.com> <3a3f9422-f81d-8ee2-5c07-687f2abe18df@redhat.com> <67c3888c-9058-9e5c-273e-7810b1c66e29@redhat.com> <3fb6e11f-4c95-8d88-3144-0c46c455a5db@redhat.com> Message-ID: <83466f7a-0436-940c-ca87-2a8b4a7c3e7d@bell-sw.com> On 8/6/19 7:09 PM, 
Andrew Dinn wrote: > Hello Dmitry, > > On 06/08/2019 15:25, Dmitry Chuyko wrote: >> One quick question about synchronization in unmappers. One of >> preliminary steps for Loom was to replace monitor usage by j.u.c locks >> for I/O to let fibers release carrier threads. For instance see >> JDK-8222774. Does it make sense to do the same in your new unmappers code? >> . . . >> [1] https://bugs.openjdk.java.net/browse/JDK-8222774 > The unmapper code is not strictly 'new' as regards its reliance on > synchronization. It merely follows and repeats the pattern employed in > the prior code that it has generalized (by splitting the original > Unmapper into two distinct flavours of subclass). > > If this poses a problem for Loom then it is a separate issue from the > one this JEP addresses. I think you should raise a new issue for that > change (just as you would have had to do before this change). I am sure > Alan Bateman will be happy to consider your proposal. Indeed, I would be > happy to implement it given his approval -- or leave it to you to do so > if you prefer. Agree, Loom has a long road to go. So I suppose such a change will be a part of larger work in sun.nio, and I or one of my colleagues will be happy to participate. Changes will probably be straightforward (e.g. JDK-8222882) but this synchronization is not covered by regression tests so I believe in this case you'll help to retry some of your ad-hoc testing or maybe some application tests you know about. -Dmitry > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No.
03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From rickard.backman at oracle.com Wed Aug 7 09:51:08 2019 From: rickard.backman at oracle.com (Rickard Bäckman) Date: Wed, 7 Aug 2019 11:51:08 +0200 Subject: [13] RFR(T): 8229219: C2 compilation fails with assert: Bad graph detected in build_loop_late In-Reply-To: <472dc269-0585-56b1-36ae-4428a2e2af5f@oracle.com> References: <472dc269-0585-56b1-36ae-4428a2e2af5f@oracle.com> Message-ID: <20190807095108.yipritwzb4c5sm3z@rbackman> Looks good. /R On 08/07, Tobias Hartmann wrote: > Hi, > > please review the following patch that backs out the fix for JDK-8173196 [1] due to intermittent C2 > crashes in loopopts with a JCK test (details are in the bug comments): > https://bugs.openjdk.java.net/browse/JDK-8229219 > http://cr.openjdk.java.net/~thartmann/8229219/webrev.00/ > > I'll file a REDO enhancement for JDK 14 once this is in. > > Thanks, > Tobias > > [1] http://hg.openjdk.java.net/jdk/jdk/rev/9691a169f1dd From tobias.hartmann at oracle.com Wed Aug 7 09:52:43 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 7 Aug 2019 11:52:43 +0200 Subject: [13] RFR(T): 8229219: C2 compilation fails with assert: Bad graph detected in build_loop_late In-Reply-To: <20190807095108.yipritwzb4c5sm3z@rbackman> References: <472dc269-0585-56b1-36ae-4428a2e2af5f@oracle.com> <20190807095108.yipritwzb4c5sm3z@rbackman> Message-ID: <621da4d7-3960-7080-d64f-bc936a828da0@oracle.com> Thanks Rickard! Best regards, Tobias On 07.08.19 11:51, Rickard Bäckman wrote: > Looks good. > > /R > > On 08/07, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch that backs out the fix for JDK-8173196 [1] due to intermittent C2 >> crashes in loopopts with a JCK test (details are in the bug comments): >> https://bugs.openjdk.java.net/browse/JDK-8229219 >> http://cr.openjdk.java.net/~thartmann/8229219/webrev.00/ >> >> I'll file a REDO enhancement for JDK 14 once this is in.
>> >> Thanks, >> Tobias >> >> [1] http://hg.openjdk.java.net/jdk/jdk/rev/9691a169f1dd From Alan.Bateman at oracle.com Wed Aug 7 10:21:31 2019 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 7 Aug 2019 11:21:31 +0100 Subject: RFR: 8224974: Implement JEP 352 In-Reply-To: <3fb6e11f-4c95-8d88-3144-0c46c455a5db@redhat.com> References: <80da32b2-7acb-7b94-b82c-5dcd5cf95539@redhat.com> <86d000f3-9eca-6089-5f7e-5698444d99ce@oracle.com> <7fe8d17d-6170-fcc9-b87a-eccfda2bb546@redhat.com> <7b39f68d-5698-4079-425c-c86ec161e361@oracle.com> <72810832-eb88-3418-f336-b95af95d9dcc@redhat.com> <20c8bbcb-2cd3-77af-cd40-29f7bd752166@oracle.com> <68566847-de80-82af-a928-d30c75f2b1b2@redhat.com> <892f3b0b-55f1-872d-15f4-5af907fa8437@redhat.com> <8f72d3b5-3a01-a7de-3b09-35571266a87d@redhat.com> <671c7ba1-9597-a990-190b-145d9cb5349e@redhat.com> <81098801-31b2-e3a8-5c25-043ab47c80bc@redhat.com> <3a3f9422-f81d-8ee2-5c07-687f2abe18df@redhat.com> <67c3888c-9058-9e5c-273e-7810b1c66e29@redhat.com> <3fb6e11f-4c95-8d88-3144-0c46c455a5db@redhat.com> Message-ID: On 06/08/2019 09:09, Andrew Dinn wrote: > : > The unmapper code is not strictly 'new' as regards its reliance on > synchronization. It merely follows and repeats the pattern employed in > the prior code that it has generalized (by splitting the original > Unmapper into two distinct flavours of subclass). > > If this poses a problem for Loom then it is a separate issue from the > one this JEP addresses. I think you should raise a new issue for that > change (just as you would have had to do before this change). I am sure > Alan Bateman will be happy to consider your proposal. Indeed, I would be > happy to implement it given his approval -- or leave it to you to do so > if you prefer. > I don't think we need to be concerned with any of this at this time. The unmapper is run by the reference handler thread.
Also the synchronization here is for the counters so not the same thing as doing a blocking I/O operation while holding a monitor. At some point we'll examine all the file I/O operations as some of these are candidates for managed blockers, others are candidates for alternative implementations - there are bigger issues to resolve first and we've been trying to avoid carrying too many changes due to the complexity and effort needed to keep them in sync with the main line. -Alan From tobias.hartmann at oracle.com Wed Aug 7 14:13:44 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 7 Aug 2019 16:13:44 +0200 Subject: [14] RFR(S): 8228888: C2 compilation fails with assert "m has strange control" Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8228888 http://cr.openjdk.java.net/~thartmann/8228888/webrev.00/ I found this while trying to write a regression test for another bug. The assert triggers when OSR compiling an infinite loop with two back branches (see StrangeControl.jasm): http://cr.openjdk.java.net/~thartmann/8228888/8228888_graph.png PhaseIdealLoop::has_local_phi_input() tries to determine if all inputs of n (118 Phi) are block local phis. When looking at input m (108 StoreI), the assert fires because m is not a Phi and control of m (102 IfFalse) does not dominate control of n (83 Region). I think the assert which was added by [1] is too strong. If n is a Phi itself, control of all its inputs does not need to dominate its own control. 
Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8187822 From tobias.hartmann at oracle.com Thu Aug 8 07:38:01 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 8 Aug 2019 09:38:01 +0200 Subject: RFR(XS): 8227384: C2 compilation fails with "graph should be schedulable" when running with -XX:-EliminateLocks In-Reply-To: <2264efd4-a7e4-25ee-493b-fb058a3d3bf9@oracle.com> References: <87lfx657u8.fsf@redhat.com> <288a892d-5486-6547-df77-cc39337b1a29@oracle.com> <87a7dl57vu.fsf@redhat.com> <871ryf3c37.fsf@redhat.com> <2264efd4-a7e4-25ee-493b-fb058a3d3bf9@oracle.com> Message-ID: On 06.08.19 23:12, Vladimir Kozlov wrote: >> The mark word load for unlocking is created after the release lock >> membar with both control and memory set to projections of the >> membar. Because the allocation is non escaping, when the load is later >> processed, its memory is changed to be above the membar while the >> control is unchanged. A precedence edge is added by anti dependence >> checking to force the load above the membar while its control is >> below. As a result, the graph is not schedulable. > > This reminds me bug Tobias is working on 8228772 [1]. Yes, I think it's the same problem (Roland's regression test does not crash anymore with my fix for 8228772). However, with 8228772, the load is not from the mark word but from the array contents and therefore Roland's fix does not help in this case. But in any case, it's nice to fold such mark word loads from newly allocated objects. Please add a comment to macro.cpp:2594 that describes why that additional IGVN run was added. 
Best regards, Tobias From rahul.v.raghavan at oracle.com Thu Aug 8 09:51:33 2019 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Thu, 8 Aug 2019 15:21:33 +0530 Subject: [14]RFR: 8227439: Turn off AOT by default In-Reply-To: <221042DC-6997-4595-80BF-B0F69207AC13@oracle.com> References: <5470c108-92ae-f5d8-5cdc-319b2e646ee5@oracle.com> <4938ce5a-232b-0ef5-8982-e5bfb0e32a1e@oracle.com> <4713f2c4-8de6-cb10-61e3-b15e05a4b146@oracle.com> <221042DC-6997-4595-80BF-B0F69207AC13@oracle.com> Message-ID: <6d71067d-822f-71bd-2c8a-cd1900a861e6@oracle.com> Thanks for review. Pushed latest webrev - http://cr.openjdk.java.net/~rraghavan/8227439/webrev.02/ -Rahul On 27/07/19 8:00 AM, Igor Ignatyev wrote: > > >> On Jul 26, 2019, at 6:56 PM, Vladimir Kozlov >> > wrote: >> >> On 7/25/19 12:11 AM, Rahul Raghavan wrote: >>> Hi, >>> Thanks Igor, Vladimir for the review comments. >>> Please review following updates. >>> >> - http://cr.openjdk.java.net/~rraghavan/8227439/webrev.01/ >>> > You don't need to add -J-XX:+UnlockExperimentalVMOptions to JAOTC_OPTS >>> > in aot/scripts/ scripts because you added it to jaotc launcher >>> > (Launcher-jdk.aot.gmk). >>> > >>> Okay, will removed -J-XX:+UnlockExperimentalVMOptions added to >>> JAOTC_OPTS in - >>> aot/scripts/ - build-bootmodules.sh, build-jdk.vm-modules.sh and >>> test-jaotc.sh >>> > As UseAOT used to be true by default, >>> > you should add to all places where AOTLibrary is used, >>> > e.g. in make/RunTests.gmk, >>> > and as it seems like an easy error (for our users) to make, >>> > I think we need to check that AOTLibrary has value >>> > only if UseAOT is true >>> > and generate an error at initialization time if it's not a case. >>> > >>> Understood the review points by Igor. >>> Found following current implementation of AOTLoader::initialize() >>> http://hg.openjdk.java.net/jdk/jdk/file/9b6d4e64778c/src/hotspot/share/aot/aotLoader.cpp#l110 >>> void AOTLoader::initialize() { >>> ...... 
>>> if (FLAG_IS_DEFAULT(UseAOT) && AOTLibrary != NULL) { >>> // Don't need to set UseAOT on command line when AOTLibrary is specified >>> FLAG_SET_DEFAULT(UseAOT, true); >> >> Please keep current implementation. No need to change it. >> Examples in AOT JEP use AOTLibrary flag without UseAOT. Lets keep it >> this way. > > Agree, I somehow overlooked this piece in AOTLoader::initialize. > > -- Igor >> >> Vladimir >> >>> } >>> if (UseAOT) { >>> ....... >>> warning("EagerInitialization is not compatible with AOT (switching >>> AOT off)"); >>> ....... >>> warning("JVMTI capability to post breakpoint is not compatible with >>> AOT (switching AOT off)"); >>> ....... >>> warning("-Xint is not compatible with AOT (switching AOT off)"); >>> // Scan the AOTLibrary option. >>> if (AOTLibrary != NULL) { >>> ....... >>> } >>> // Load well-know AOT libraries from Java installation directory. >>> ....... >>> } >>> } >>> } >>> So current design with my webrev.01 is to make UseAOT automatically >>> true if some AOTLibrary is specified and no explicit -UseAOT. >>> Should we change this to - >>> [src/hotspot/share/aot/aotLoader.cpp] >>> if (FLAG_IS_DEFAULT(UseAOT) && AOTLibrary != NULL) { >>> -    // Don't need to set UseAOT on command line when AOTLibrary is >>> specified >>> -    FLAG_SET_DEFAULT(UseAOT, true); >>> +    if (UseAOT == true) { >>> +      // Don't need to set UseAOT on command line when AOTLibrary is >>> specified >>> +      FLAG_SET_DEFAULT(UseAOT, true); >>> +    } >>> +    else { >>> +      warning("AOTLibrary specified without explicitly switching on >>> UseAOT (ignoring AOTLibrary)"); >>> +    } >>> OR instead of new warning above, throw error - >>> +    else { >>> +      fatal("AOTLibrary specified without explicitly switching on >>> UseAOT"); >>> +      vm_exit(1); >>> +    } >>> And then also make sure +UseAOT is added to all places where >>> AOTLibrary is used.
>>> Thanks, >>> Rahul >>> On 19/07/19 10:02 PM, Igor Ignatyev wrote: >>>> Hi Rahul, >>>> >>>> thanks for taking care of this and all the tests, appreciate that. >>>> As UseAOT used to be true by default, you should add to all places >>>> where AOTLibrary is used, e.g. in make/RunTests.gmk, and as it seems >>>> like an easy error (for our users) to make, I think we need to check >>>> that AOTLibrary has value only if UseAOT is true and generate an >>>> error at initialization time if it's not a case. this will help us >>>> and all the users to spot places where AOT was expected to kick in. >>>> >>>> (I have not looked at all the changed files yet) >>>> >>>> Thanks, >>>> -- Igor >>>> >>>>> On Jul 19, 2019, at 12:52 AM, Rahul Raghavan >>>>> > >>>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> Please review the following fix changeset and >>>>> related release-note task (8228418). >>>>> >>>>> >>>>> - http://cr.openjdk.java.net/~rraghavan/8227439/webrev.01/ >>>>> >>>>> >>>>> # https://bugs.openjdk.java.net/browse/JDK-8227439 >>>>> (Turn off AOT by default) >>>>> >>>>> # CSR - https://bugs.openjdk.java.net/browse/JDK-8227833 >>>>> >>>>> # RN - https://bugs.openjdk.java.net/browse/JDK-8228418 >>>>> >>>>> >>>>> -- AOT support related flags `UseAOT`, `PrintAOT` and `AOTLibrary` >>>>> are made experimental; >>>>> and `UseAOT` flag is turned off by default. >>>>> Also added required -XX:+UnlockExperimentalVMOptions, related >>>>> changes in tests. >>>>> >>>>> -- Got approval for related CSR - 8227833 >>>>> and created Release-Note task as commented - 8228418. >>>>> >>>>> -- tried tests --job hs-tier4,hs-tier4-graal,hs-tier6,hs-tier6-graal. >>>>> Could not find any issues due to proposed changes. >>>>> >>>>> >>>>> Please let me know if missed any changes or testing. 
>>>>> >>>>> >>>>> Thanks, >>>>> Rahul > From christian.hagedorn at oracle.com Thu Aug 8 11:38:10 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 8 Aug 2019 13:38:10 +0200 Subject: [14] RFR(S): 8225670: compiler/types/correctness/* tests fail with "assert(recv == __null || recv->is_klass()) failed: wrong type" Message-ID: <678f82db-e5b9-c5ad-9f6c-439b4c7cecd7@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8225670 http://cr.openjdk.java.net/~thartmann/8225670/webrev.00/ As I reproduced the bug with [1], recv still pointed to an InstanceKlass object at some point which was not translated to a ciInstanceKlass and therefore the is_klass() check was false which made the assert fail. The problem can be traced back to the concurrent forced clearing of method data [6] using the whitebox API while compilation uses this profile data to create a ciProfileData wrapper. The profile data, lets say 'pd', is first completely copied [2] and then translated [3] to a ci version 'cipd'. Therefore, 'cipd' initially only contains InstanceKlass entries. During this translation, we read instanceKlasses from 'pd' [4] and translate them into ciInstanceKlasses to store them in 'cipd'. However, while translating, [6] can delete a non-NULL InstanceKlass entry in 'pd' by setting it to NULL. As a result, the non-NULL check [5] fails and nothing is updated in 'cipd'. 'cipd' still contains a non-translated non-NULL InstanceKlass entry from 'pd' which later triggers the assertion failure. The fix is straight forward to also clear an entry in the ciProfileData object if a klass is NULL in the ProfileData object. One question remains that I could not figure out yet: Can the method profile data be cleared while a method is compiled or is this only a problem specific to this test using a forced clear through the whitebox API? Thanks! 
Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/41f2f2829a09/test/hotspot/jtreg/compiler/types/correctness/CorrectnessTest.java [2] http://hg.openjdk.java.net/jdk/jdk/file/41f2f2829a09/src/hotspot/share/ci/ciMethodData.cpp#l219 [3] http://hg.openjdk.java.net/jdk/jdk/file/41f2f2829a09/src/hotspot/share/ci/ciMethodData.cpp#l229 [4] http://hg.openjdk.java.net/jdk/jdk/file/41f2f2829a09/src/hotspot/share/ci/ciMethodData.cpp#l260 [5] http://hg.openjdk.java.net/jdk/jdk/file/41f2f2829a09/src/hotspot/share/ci/ciMethodData.cpp#l261 [6] http://hg.openjdk.java.net/jdk/jdk/file/41f2f2829a09/test/hotspot/jtreg/compiler/types/correctness/CorrectnessTest.java#l172 From erik.osterlund at oracle.com Fri Aug 9 09:21:17 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Fri, 9 Aug 2019 11:21:17 +0200 Subject: [14] RFR(S): 8225670: compiler/types/correctness/* tests fail with "assert(recv == __null || recv->is_klass()) failed: wrong type" In-Reply-To: <678f82db-e5b9-c5ad-9f6c-439b4c7cecd7@oracle.com> References: <678f82db-e5b9-c5ad-9f6c-439b4c7cecd7@oracle.com> Message-ID: <5b38061a-2069-8772-2f42-86fa1749ef15@oracle.com> Hi Christian, Looks good - well spotted. To answer your question - yes the GC (ZGC in particular, and probably soon Shenandoah when they hook in to the concurrent class unloading framework) cleans the extra data section of MDOs concurrently, under the extra data lock. However, they clear whole rows from the extra data section, under the extra data lock of the MDOs; they never write that the Klass is NULL. So I believe this bug only relates to the use of the WhiteBox API. The row clearing of concurrent GCs synchronizes with a metadata preparation phase for unpacking MDOs to ciMDOs. The preparation phase will in a fixed-point iteration try to create ci handles for all encountered metadata under the extradata lock.
Every time it encounters an uncached metadata instance, it has to release the lock due to ranking issues, and may also run into safepoints then. Such situations are detected, triggering a restart of the fixed-point iteration. Once the fixed-point iteration has finished, we know that we under the lock walked all metadata in the extra data section without ever releasing the lock, have ci handles keeping all metadata alive, and can't have gotten any safepoints due to being in VM state. After that, the rows are copied and translated, and now we are guaranteed that the translation will always already have the ci handles cached. There is some random original copy of the raw MDO extra data that is performed before preparing the metadata. I don't think it is really used or needed. Might be interesting to remove in a future RFE. It gets overwritten by the subsequent row-by-row processing after metadata preparation. Thanks, /Erik On 2019-08-08 13:38, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8225670 > http://cr.openjdk.java.net/~thartmann/8225670/webrev.00/ > > As I reproduced the bug with [1], recv still pointed to an InstanceKlass > object at some point which was not translated to a ciInstanceKlass and > therefore the is_klass() check was false which made the assert fail. The > problem can be traced back to the concurrent forced clearing of method > data [6] using the whitebox API while compilation uses this profile data > to create a ciProfileData wrapper. The profile data, lets say 'pd', is > first completely copied [2] and then translated [3] to a ci version > 'cipd'. Therefore, 'cipd' initially only contains InstanceKlass entries. > During this translation, we read instanceKlasses from 'pd' [4] and > translate them into ciInstanceKlasses to store them in 'cipd'. However, > while translating, [6] can delete a non-NULL InstanceKlass entry in 'pd' > by setting it to NULL.
As a result, the non-NULL check [5] fails and > nothing is updated in 'cipd'. 'cipd' still contains a non-translated > non-NULL InstanceKlass entry from 'pd' which later triggers the > assertion failure. > > The fix is straight forward to also clear an entry in the ciProfileData > object if a klass is NULL in the ProfileData object. One question > remains that I could not figure out yet: Can the method profile data be > cleared while a method is compiled or is this only a problem specific to > this test using a forced clear through the whitebox API? > > > Thanks! > > Best regards, > Christian > > > [1] > http://hg.openjdk.java.net/jdk/jdk/file/41f2f2829a09/test/hotspot/jtreg/compiler/types/correctness/CorrectnessTest.java > > [2] > http://hg.openjdk.java.net/jdk/jdk/file/41f2f2829a09/src/hotspot/share/ci/ciMethodData.cpp#l219 > > [3] > http://hg.openjdk.java.net/jdk/jdk/file/41f2f2829a09/src/hotspot/share/ci/ciMethodData.cpp#l229 > > [4] > http://hg.openjdk.java.net/jdk/jdk/file/41f2f2829a09/src/hotspot/share/ci/ciMethodData.cpp#l260 > > [5] > http://hg.openjdk.java.net/jdk/jdk/file/41f2f2829a09/src/hotspot/share/ci/ciMethodData.cpp#l261 > > [6] > http://hg.openjdk.java.net/jdk/jdk/file/41f2f2829a09/test/hotspot/jtreg/compiler/types/correctness/CorrectnessTest.java#l172 > From martin.doerr at sap.com Fri Aug 9 09:57:56 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 9 Aug 2019 09:57:56 +0000 Subject: RFR(XS): 8229236: CriticalJNINatives: dll handling should be done in native thread state In-Reply-To: <237b0665-0896-2ac5-3ac3-151fa7f6849f@oracle.com> References: <423c927d-f894-8c15-5000-90c204149b99@oracle.com> <14fb7a79-5a80-d0b7-4261-4a03203403f7@oracle.com> <237b0665-0896-2ac5-3ac3-151fa7f6849f@oracle.com> Message-ID: Hi David, thank you for reviewing. > Okay. I'm not certain if this is really runtime code or compiler code, > but it seems okay to me, so if Dean is okay with it then that's fine. 
I would consider the lookup part to belong to runtime and the wrapper code generation to belong to compiler. I've CC'ed hotspot-compiler-dev, but I think your reviews are sufficient. > Aside: I'm wondering why ARM does not use this critical lookup > functionality? I was wondering, too. It's an optional functionality. I believe it's rarely used. My answer to "Potentially unnecessarily" was a little short in my previous email. With the latest webrev, the lookup is still unnecessarily done when somebody beat us after acquiring the lock. But that's an unavoidable consequence of the decision to perform the lookup outside of the lock. And it's unnecessarily done when the code cache is full and we can't allocate a BufferBlob. I think these cases are ok. Best regards, Martin From tobias.hartmann at oracle.com Fri Aug 9 10:52:19 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 9 Aug 2019 12:52:19 +0200 Subject: [14] RFR(S): 8225670: compiler/types/correctness/* tests fail with "assert(recv == __null || recv->is_klass()) failed: wrong type" In-Reply-To: <5b38061a-2069-8772-2f42-86fa1749ef15@oracle.com> References: <678f82db-e5b9-c5ad-9f6c-439b4c7cecd7@oracle.com> <5b38061a-2069-8772-2f42-86fa1749ef15@oracle.com> Message-ID: <1211175c-022a-dd34-015c-fd1e783c590c@oracle.com> On 09.08.19 11:21, Erik Österlund wrote: > Looks good - well spotted. +1 > There is some random original copy of the raw MDO extra data that is performed before preparing the > metadata. I don't think it is really used or needed. Might be interesting to remove in a future RFE. > It gets overwritten by the subsequent row-by-row processing after metadata preparation. Yes, Christian please file an RFE for that.
Best regards, Tobias From christian.hagedorn at oracle.com Fri Aug 9 11:55:57 2019 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 9 Aug 2019 13:55:57 +0200 Subject: [14] RFR(S): 8225670: compiler/types/correctness/* tests fail with "assert(recv == __null || recv->is_klass()) failed: wrong type" In-Reply-To: <5b38061a-2069-8772-2f42-86fa1749ef15@oracle.com> References: <678f82db-e5b9-c5ad-9f6c-439b4c7cecd7@oracle.com> <5b38061a-2069-8772-2f42-86fa1749ef15@oracle.com> Message-ID: <47a506e5-f5c6-40ce-4c89-2a06853eff71@oracle.com> Hi Erik, hi Tobias Thank you for your reviews! On 09.08.19 11:21, Erik Österlund wrote: > Hi Christian, > > Looks good - well spotted. > > To answer your question - yes the GC (ZGC in particular, and probably > soon Shenandoah when they hook in to the concurrent class unloading > framework) cleans the extra data section of MDOs concurrently, under the > extra data lock. > > However, they clear whole rows from the extra data section, under the > extra data lock of the MDOs; they never write that the Klass is NULL. So > I believe this bug only relates to the use of the WhiteBox API. That was my guess, too. Thanks for clearing that up and answering the question. > The row clearing of concurrent GCs synchronizes with a metadata > preparation phase for unpacking MDOs to ciMDOs. The preparation phase > will in a fixed-point iteration try to create ci handles for all > encountered metadata under the extradata lock. Every time it encounters > an uncached metadata instance, it has to release the lock due to ranking > issues, and may also run into safepoints then. Such situations are > detected, triggering a restart of the fixed-point iteration. > > Once the fixed-point iteration has finished, we know that we under the > lock walked all metadata in the extra data section without ever > releasing the lock, have ci handles keeping all metadata alive, and > can't have gotten any safepoints due to being in VM state.
After that, > the rows are copied and translated, and now we are guaranteed that the > translation will always already have the ci handles cached. Thanks for the detailed explanation! > There is some random original copy of the raw MDO extra data that is > performed before preparing the metadata. I don't think it is really used > or needed. Might be interesting to remove in a future RFE. It gets > overwritten by the subsequent row-by-row processing after metadata > preparation. I created a new RFE [1] and referenced this conversation. Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8229353 From dean.long at oracle.com Fri Aug 9 22:37:03 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 9 Aug 2019 15:37:03 -0700 Subject: RFR: 8195600: [Graal] jdi tests timeouts with Graal because debuggee vm is not resumed In-Reply-To: <95eb05da-cf13-f14b-74a6-9e8bf604b29b@oracle.com> References: <855244c9-014c-59d1-cda0-b5f38057f588@oracle.com> <8df66d4e-df9d-c502-3510-30c22ba58445@oracle.com> <15699850-37f5-93f6-6a55-525a4d099bd3@oracle.com> <95eb05da-cf13-f14b-74a6-9e8bf604b29b@oracle.com> Message-ID: <3c292c47-2962-d79b-e99d-7c3b7376ac5e@oracle.com> Good question. When we have libgraal, there will still be an option (at least for debugging) to turn it off and use Graal the same way we do now, so it seems like the @requires would need to take that into account once we have libgraal. Maybe we will need a new "vm.libgraal.enabled" or make "vm.graal.enabled" be false for libgraal? It does seem a little backwards to require tests to know about the OOM handling details of different JVM features. Instead, how about if we let the test assert that it requires "vm.no-background-oom" or whatever, and let the JVM decide if it supports it. CC'ing hotspot-compiler-dev. 
dl On 8/8/19 7:42 PM, Chris Plummer wrote: > Actually looking at JDK-8207267 a little closer, it looks like it's > job is to re-enable tests that have been disabled with @requires > !vm.graal.enabled, so it looks like we have two different approaches > going in here. Which is preferred? If the preference is to problem > list, do we want to undo JDK-8207261 (except use JDK-8196611 as the CR). > > Chris > > On 8/8/19 5:08 PM, Chris Plummer wrote: >> That sounds like a better approach to me. >> >> thanks, >> >> Chris >> >> On 8/8/19 4:33 PM, dean.long at oracle.com wrote: >>> This is the kind of failure that is expected to go away with >>> libgraal. You can add the tests to the Graal-specific problem list >>> (see JDK-8196611) and they should be re-enabled with libgraal (see >>> JDK-JDK-8207267). >>> >>> dl >>> >>> On 8/8/19 10:21 AM, Chris Plummer wrote: >>>> Hi Daniil, >>>> >>>> My only objection is at some point it seems we need to be able to >>>> run these tests with graal (and other tests that have been disabled >>>> due to graal) because graal might be the only compiler, and we'll >>>> lose test coverage without these tests. Currently we have 260 jtreg >>>> tests disabled due to graal. I'm not sure to what extent they are >>>> waiting on graal fixes or otherwise have a bug filed to eventually >>>> fix them. Would be nice if we had a process in place to make sure >>>> these issues are eventually addressed. That fact that tests that >>>> exhaust memory in general seem to be incompatible with graal would >>>> to be the bigger issue that needs to be addressed. >>>> >>>> thanks, >>>> >>>> Chris >>>> >>>> On 8/7/19 3:38 PM, Daniil Titov wrote: >>>>> Please review the change that fixes the failing tests when running >>>>> with Graal. 
The issue originally >>>>> included several vmTestbase/nsk/jdi tests but only 2 of them still >>>>> fail: >>>>> - >>>>> vmTestbase/nsk/jdi/VirtualMachine/instanceCounts/instancecounts003/instancecounts003.java >>>>> - >>>>> vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects002/referringObjects002.java >>>>> >>>>> The problem with these two tests is that they consume all memory >>>>> to force the class unloading that >>>>> results in the exception during JVMCI compiler initialization and >>>>> the test failure. >>>>> ? The fix filters these tests out to not run with Graal compiler. >>>>> >>>>> Webrev: http://cr.openjdk.java.net/~dtitov/8195600/webrev.01/ >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8195600 >>>>> >>>>> Thanks, >>>>> Daniil >>>>> >>>>> >>>> >>> >> >> > > From gromero at linux.vnet.ibm.com Sun Aug 11 23:34:35 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Sun, 11 Aug 2019 20:34:35 -0300 Subject: [Ping] Re: [8u-dev, ppc] RFR for (almost clean) backport of 8188868: PPC64: Support AES intrinsics on Big Endian In-Reply-To: References: <2d8d8d35-c781-cc0c-2673-c8f7eea057bd@redhat.com> Message-ID: <163f9fd2-95ed-512f-c67e-55de4c4f51d1@linux.vnet.ibm.com> Hi, Thanks Ogata for adjusting the change and evaluating performance on 8u + Power BE. Thanks Andrew for reviewing and approving it. Pushed to jdk8u-dev: http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/42118db355f5 Best regards, Gustavo On 07/31/2019 11:55 PM, Kazunori Ogata wrote: > Hi Andrew, > > Thank you for reviewing the webrev. 
> > Regards, > Ogata > > Andrew John Hughes wrote on 2019/08/01 01:05:48: > >> From: Andrew John Hughes >> To: Kazunori Ogata , hotspot-compiler- >> dev at openjdk.java.net, jdk8u-dev at openjdk.java.net >> Date: 2019/08/01 01:13 >> Subject: [EXTERNAL] Re: [Ping] Re: [8u-dev, ppc] RFR for (almost clean) >> backport of 8188868: PPC64: Support AES intrinsics on Big Endian >> >> >> >> On 31/07/2019 10:30, Kazunori Ogata wrote: >>> Ping. >>> >>> May I get review for the almost clean backport? >>> >>> Regards, >>> Ogata >>> >>> Kazunori Ogata/Japan/IBM wrote on 2019/07/24 17:48:23: >>> >>>> From: Kazunori Ogata/Japan/IBM >>>> To: hotspot-compiler-dev at openjdk.java.net, jdk8u-dev at openjdk.java.net >>>> Date: 2019/07/24 17:48 >>>> Subject: [8u-dev, ppc] RFR for (almost clean) backport of 8188868: >>> PPC64: >>>> Support AES intrinsics on Big Endian >>>> >>>> Hi, >>>> >>>> May I get review for backport of 8188868: PPC64: Support AES > intrinsics >>> on >>>> Big Endian? >>>> >>>> The original patch itself can be applied cleanly (besides difference > of >>>> the source directory structure). However, one chunk failed because > the >>>> code just after the patched code was modified, so I manually applied > the >>> >>>> chunk and renewed the patch. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8188868 >>>> Webrev: >>> http://cr.openjdk.java.net/~ogatak/jdk8u_aes_be/8188868/webrev.02/ >>>> >>>> This backport is low risk and affects only PPC64 only. I verified > there >>>> was no degradation in "make test" results and SPECjbb 2015 ran fine. > The >>> >>>> intrinsics added in this changeset improved max jOPS by 5% and > critical >>> jOPS by 4%. >>>> >>>> Regards, >>>> Ogata >>> >> >> Sorry, I started looking at this yesterday, but didn't get chance to > finish. >> >> It looks fine to me. The stubGenerator_ppc.cpp changes were a little >> hard to follow, but comparing the patched version with the 11u version >> looked ok. >> >> Good to go. 
>> --
>> Andrew :)
>>
>> Senior Free Java Software Engineer
>> Red Hat, Inc. (http://www.redhat.com)
>>
>> PGP Key: ed25519/0xCFDA0F9B35964222 (hkp://keys.gnupg.net)
>> Fingerprint = 5132 579D D154 0ED2 3E04 C5A0 CFDA 0F9B 3596 4222
>> https://keybase.io/gnu_andrew
>>
>> [attachment "signature.asc" deleted by Kazunori Ogata/Japan/IBM]
>

From fujie at loongson.cn Mon Aug 12 09:26:37 2019
From: fujie at loongson.cn (Jie Fu)
Date: Mon, 12 Aug 2019 17:26:37 +0800
Subject: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling
In-Reply-To: <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com>
References: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn>
 <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com>
Message-ID: <3033ff07-93b2-c99b-f5fb-6b35bc03b4b9@loongson.cn>

Hi Vladimir and all,

Updated: http://cr.openjdk.java.net/~jiefu/8227505/webrev.02/

*Analysis*

The performance drop is caused by the over loop unrolling of
SuperWordLoopUnrollAnalysis, which does not consider the negative effect
of the pre/post-loops at all.

The following is the perf stat data for different loop unrolling factors.
Please note that the number of branches increased by ~47% (from
19,854,995,391 to 29,229,714,991) when the unroll-factor increased from
8 to 16. And the total instructions increased by ~58% (from
108,849,151,185 to 171,334,733,346), which was even worse.

perf stat for unroll-factor=8:
----------------------------------------------------------------------------
       5429.117030      task-clock (msec)         #    1.006 CPUs utilized
               620      context-switches          #    0.114 K/sec
                11      cpu-migrations            #    0.002 K/sec
            41,905      page-faults               #    0.008 M/sec
    24,176,919,686      cycles                    #    4.453 GHz
   108,849,151,185      instructions              #    4.50  insn per cycle
    19,854,995,391      branches                  # 3657.132 M/sec
        17,788,819      branch-misses             #    0.09% of all branches

       5.396099347 seconds time elapsed
----------------------------------------------------------------------------

perf stat for unroll-factor=16:
----------------------------------------------------------------------------
       9158.323771      task-clock (msec)         #    1.005 CPUs utilized
               763      context-switches          #    0.083 K/sec
                16      cpu-migrations            #    0.002 K/sec
            41,884      page-faults               #    0.005 M/sec
    40,831,102,837      cycles                    #    4.458 GHz
   171,334,733,346      instructions              #    4.20  insn per cycle
    29,229,714,991      branches                  # 3191.601 M/sec
        16,455,010      branch-misses             #    0.06% of all branches

       9.115309970 seconds time elapsed
----------------------------------------------------------------------------

The increase in branches and total instructions was mainly introduced by
the pre- and post-loop.

1) Higher unroll-factor may lead to more iterations in the pre-loop due
   to the alignment requirement in the main-loop.
   For example, with unroll-factor=16, 16 iterations may be required in
   the pre-loop since 16-byte vector instructions were used in the
   main-loop.
   However, no more than 8 iterations when unroll-factor=8.

2) Higher unroll-factor may lead to more iterations in the post-loop
   since the range of it is [0, unroll-factor - 1).

As for the particular case, the distribution of iterations for the
pre-/main-/post-loops seems to be:
-----------------------------------------------------------------------
           | pre-lp iters | main-lp iters | post-lp iters | total iters
-----------------------------------------------------------------------
unroll(8)  |      8       |       6       |       5       |     19
-----------------------------------------------------------------------
unroll(16) |     16       |       2       |      13       |     31
-----------------------------------------------------------------------

So it's harmful to unroll with 16.

*Fix*

The loop body size seems unable to detect this case.
When the VM tries to decide whether to unroll 16, the loop body size is
just 64, which seems quite reasonable for SuperWordLoopUnrollAnalysis.
And the generated loop body in assembly is small enough with unroll 16.
----------------------------------------------------------------------------
 ;; B18: #      out( B18 B19 ) <- in( B17 B18 ) Loop( B18-B18 inner main of N59) Freq: 891835
  0x00007f62187bfba0:   movslq %r8d,%r11
  0x00007f62187bfba3:   vmovdqu 0x10(%r9,%r11,1),%xmm0
  0x00007f62187bfbaa:   vmovdqu %xmm0,0x10(%rsi,%r11,1)  ;*bastore {reexecute=0 rethrow=0 return_oop=0}
                                                         ; - TestSuperWordOverunrolling::execute at 56 (line 21)
  0x00007f62187bfbb1:   add    $0x10,%r8d                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                         ; - TestSuperWordOverunrolling::execute at 57 (line 20)
  0x00007f62187bfbb5:   cmp    $0x2f,%r8d
  0x00007f62187bfbb9:   jl     0x00007f62187bfba0        ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                         ; - TestSuperWordOverunrolling::execute at 38 (line 20)
----------------------------------------------------------------------------

To fix it, the possible negative effect of the pre-/post-loop should be
considered. The unrolling may increase the performance if the total
iterations of the pre/main/post loops could be decreased.

However, the precise number of iterations in the pre-/post-loop is
really hard to predict since it depends on many factors such as
alignment requirement, number of data&type, object layout, and allocated
addresses. To simplify the problem, the number of the pre&post-loop
iterations is just assumed to be the same as the unroll-factor.
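
The iteration accounting above can be checked with a few lines of Java. This is only an illustrative sketch: the class and method names are made up for this note and are not part of the webrev. It adds up the per-loop counts from the table and also applies the simplifying assumption that pre- plus post-loop iterations equal the unroll-factor.

```java
public class UnrollIterationEstimate {
    /** Total iterations across the pre-, main- and post-loops. */
    public static int total(int pre, int main, int post) {
        return pre + main + post;
    }

    /**
     * Simplified estimate from the message: pre- plus post-loop iterations
     * are assumed to equal the unroll-factor, so only the main-loop
     * iteration count is needed as an additional input.
     */
    public static int estimate(int unrollFactor, int mainIters) {
        return mainIters + unrollFactor;
    }

    public static void main(String[] args) {
        // Per-loop counts are taken from the table in the message.
        System.out.println("unroll(8):  reported=" + total(8, 6, 5)
                + " estimated=" + estimate(8, 6));
        System.out.println("unroll(16): reported=" + total(16, 2, 13)
                + " estimated=" + estimate(16, 2));
    }
}
```

In this particular case both the reported totals and the simplified estimate rank unroll(16) as more expensive than unroll(8), which is the only property the simplification needs to preserve here.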

A heuristic is introduced to protect against over-unrolling with
SuperWordLoopUnrollAnalysis:
  - Let's assume the unroll-factor is x and the main-loop iteration is y
    in the previous unrolling round, then the total iterations of
    pre/main/post loops is y + x.
  - In the next round, the unroll-factor becomes 2x and the main-loop
    iteration is y/2, then the total iterations of pre/main/post loops
    is y/2 + 2x.
  - We'd better not unroll if: y/2 + 2x > y + x, that is 2x > y
----------------------------------------------------------------------------
           | unroll-factor | main-lp iters | pre&post-lp iters | total iters
----------------------------------------------------------------------------
pre-round  |       x       |       y       |         x         | y + x
----------------------------------------------------------------------------
next-round |      2x       |      y/2      |        2x         | y/2 + 2x
----------------------------------------------------------------------------

*Testing*

No performance regression in SPECjvm2008

Any comments?

Thanks a lot.
Best regards,
Jie

On 2019/8/7 5:58, Vladimir Kozlov wrote:
> Hi Jie
>
> Very interesting observation. I am concern that webrev.01 does check
> for general loop which may not be vectorized. Even if your
> optimization helps in particular case it may make some loop regress
> due to executing more branches.
>
> On 7/11/19 1:20 AM, Jie Fu wrote:
>> Hi all,
>>
>> With more experiments, the loop's trip_count seems a good feature to
>> detect over loop unrolling.
>> And on some platforms, the branch-miss rate had been observed
>> increasing dramatically with small loop trip count.
>
> Why? With more unrolling you should have less number of branches.
>
>> It seems that we shouldn't unroll if the trip count becomes too small.
>
> May be there is different explanation for this. May be big loop body
> does not fit into code buffer in X86 cpu - or something like that. End
> we should watch for body size instead.
> > Thanks, > Vladimir > >> >> I've updated the webrev here: >> http://cr.openjdk.java.net/~jiefu/8227505/webrev.01/ >> >> Please review it and give me some advice. >> >> Thanks a lot. >> Best regards, >> Jie >> >> On 2019/7/10 ??4:38, Jie Fu wrote: >>> Hi all, >>> >>> JBS:??? https://bugs.openjdk.java.net/browse/JDK-8227505 >>> Webrev: http://cr.openjdk.java.net/~jiefu/8227505/webrev.00/ >>> >>> The patch fix the over loop unrolling problem caused by >>> SuperWordLoopUnrollAnalysis. >>> For more info., please refer to the JBS. >>> >>> Could you please review it and give me some advice? >>> >>> Thanks a lot. >>> Best regards, >>> Jie >>> >>> >> From adinn at redhat.com Mon Aug 12 10:30:17 2019 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 12 Aug 2019 11:30:17 +0100 Subject: Weird behaviour with tests for: JDK-8213134 AArch64: vector shift failed with MaxVectorSize=8 Message-ID: I am currently trying to test a backport of the above patch for JDK11 on AArch64 and I noticed that tests Test{Short/Int/...}Vect etc all run much slower than TestByteVect. Waaaay slower. The difference is roughly 10-20 seconds vs 10-20 minutes. I also noticed when I ran 'top' in thread view that the C1 Compiler thread runs flat out while the Main thread eats only a few percent of CPU. This is weird because all the tests have essentially the same code structure: Call a whole slew of sub-test methods (there really are a lot of them) in a loop to ensure they are C2 compiled. Then call each sub-test method and loop over the resultant array verifying that each entry is as expected. I reran with -XX:+PrintCompilation and observed a significant difference in the compiler behaviour. In the Byte test method TestByteVect::test and the many sub-test methods it calls in its warmup loop get compiled at level 3 and then level 4. They *stay* that way as the main routine goes on to call each sub-test method in the block that follows the loop. 
By contrast: in the Short test method TestShortVect::test still gets compiled at level 3 then level 4 in the warmup loop as does each sub-test method called in the loop but then ... Between successive sub-test calls in the following block the level 4 and level 3 versions of TestShortVect::test are repeatedly made not entrant. The method is deoptimized and then re-optimized first to level 3 and then to level 4 (you can tell because after each call a message is printed). Finally, towards the end of the Short Test test there is a whole series of 'made zombie' notices for the many different level 3 and level 4 versions of TestShortVect::test as well as a pair of 'made zombie' notices for the level 3 versions of each of the sub-test methods. So, it seemed that this weird oscillating behaviour constitutes repeated OSR compilation then deopt of the main test method. It was not clear to me why the de-opt is happening (also why it is not happening in the Byte case). The way the top level test routine is routine seems to pose a severe challenge to some of the assumptions the OSR compiler is making. Anyway, it is bizarre as well as extremely inconvenient that each test takes such a very, very long time to run. Especially inconvenient given that each test has to be run for each of the 4 available MaxVectorSize settings. In order to verify that it is the top level test method causing the problem I reran the TestShortVect test with a compiler restriction as follows -XX:CompileCommand=compileonly,TestShortVect::test_* i.e. only compile the sub-test methods called from the main loop (which all start with prefix test_). As expected, the run time came back down into the expected 10-20 second range. Clearly, the fact that these tests constitute a basket case for the compiler needs further investigation. I will look into that -- unless anyone already knows why this will be happening. Meanwhile, there may well be some easy way to avoid the compiler issue by refactoring the code (e.g. 
installing the loops in a separate verify routine). If not then a CompileCommand added to the @run arguments for the test would make the tests much more useful (well, it would make them useful). regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From adinn at redhat.com Mon Aug 12 15:23:35 2019 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 12 Aug 2019 16:23:35 +0100 Subject: Weird behaviour with tests for: JDK-8213134 AArch64: vector shift failed with MaxVectorSize=8 In-Reply-To: References: Message-ID: <683b1257-65c9-5d2d-57ab-9073149e6018@redhat.com> On 12/08/2019 11:30, Andrew Dinn wrote: > I am currently trying to test a backport of the above patch for JDK11 on > AArch64 and I noticed that tests Test{Short/Int/...}Vect etc all run > much slower than TestByteVect. Waaaay slower. The difference is roughly > 10-20 seconds vs 10-20 minutes. I forgot to mention the figures cited above were obtained running with a fastdebug build. The problem is less severe with a release build although there is still quite a noticeable slowdown (from 3 seconds to 33 seconds). It turns out that the problem when executing the TestShortVect relates to the sequence of loops at the end of the top level method. Each of them iterates over a call to a different sub-test method (there are 72 of these methods!). When each loop is entered the top-level method first gets C1 OSR compiled and then C2 OSR compiled. The C2 compile includes only the code for the loop. After the loop completes it terminates very quickly with an unconditional uncommon trap that reverts to interpreted. The C1 code runs particularly slowly as it includes lots of profiling (and for non-product code also includes various debug checks). The C1 compile also takes a long time as it compiles the whole method every time rather than just the loop code. 
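
The test shape discussed in this thread, and the refactoring floated earlier (moving the verification loops into a separate routine), can be sketched roughly as below. This is a hypothetical, much reduced stand-in with two sub-tests instead of 72; the class name, constants and sub-test bodies are assumptions made for illustration and are not the actual TestShortVect source. Whether the refactored form really avoids the repeated OSR compile and deopt cycle has not been verified here.

```java
public class VectTestShape {
    static final int LEN = 1000;
    static final int WARMUP = 2000;

    // Two illustrative sub-tests; the real tests have many more.
    static void test_addc(short[] out, short[] in) {
        for (int i = 0; i < in.length; i++) out[i] = (short) (in[i] + 1);
    }

    static void test_subc(short[] out, short[] in) {
        for (int i = 0; i < in.length; i++) out[i] = (short) (in[i] - 1);
    }

    static short[] input() {
        short[] a = new short[LEN];
        for (int i = 0; i < LEN; i++) a[i] = (short) i;
        return a;
    }

    // Shape described in the thread: a warmup loop followed by one
    // verification loop per sub-test, all inside the top-level method.
    public static int testInline() {
        short[] a = input();
        short[] r = new short[LEN];
        for (int i = 0; i < WARMUP; i++) {
            test_addc(r, a);
            test_subc(r, a);
        }
        int errn = 0;
        test_addc(r, a);
        for (int i = 0; i < LEN; i++) {
            if (r[i] != (short) (a[i] + 1)) errn++;
        }
        test_subc(r, a);
        for (int i = 0; i < LEN; i++) {
            if (r[i] != (short) (a[i] - 1)) errn++;
        }
        return errn;
    }

    // Possible refactoring: the verification loop lives in its own method,
    // so the top-level method contains no long-running loops after warmup.
    static int verify(short[] actual, short[] in, int delta) {
        int errn = 0;
        for (int i = 0; i < in.length; i++) {
            if (actual[i] != (short) (in[i] + delta)) errn++;
        }
        return errn;
    }

    public static int testRefactored() {
        short[] a = input();
        short[] r = new short[LEN];
        for (int i = 0; i < WARMUP; i++) {
            test_addc(r, a);
            test_subc(r, a);
        }
        int errn = 0;
        test_addc(r, a);
        errn += verify(r, a, 1);
        test_subc(r, a);
        errn += verify(r, a, -1);
        return errn;
    }

    public static void main(String[] args) {
        System.out.println("inline errors: " + testInline());
        System.out.println("refactored errors: " + testRefactored());
    }
}
```

Running both variants with -XX:+PrintCompilation, as was done for the original tests, would be one way to compare how often the top-level method is OSR compiled in each form.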
So, the slowdown seems to result from a combination of two things. First, it takes a long time to deliver not very well optimized C1 code to replace interpreted execution, and there is very little gain when the C1 code finally gets run because of the associated profiling (and debug verify) costs. Second, the assumed gain from doing that compilation is foiled by an almost immediate reversion to interpreted once the C2 code is delivered. Anyway, the important thing is that it doesn't appear to be a problem with the patch side-effecting the test which was what I really had to check. The puzzling thing is why this same problem does not cause a slowdown for TestByteVect. There are a similar 72 loops at the end of the top level method but they don't lead to OSR compiles of the top level method. Given that the code is pretty much identical except for using byte[] in place of short[] that's still something of a mystery. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From OGATAK at jp.ibm.com Mon Aug 12 17:45:28 2019 From: OGATAK at jp.ibm.com (Kazunori Ogata) Date: Tue, 13 Aug 2019 02:45:28 +0900 Subject: [Ping] Re: [8u-dev, ppc] RFR for (almost clean) backport of 8188868: PPC64: Support AES intrinsics on Big Endian In-Reply-To: <163f9fd2-95ed-512f-c67e-55de4c4f51d1@linux.vnet.ibm.com> References: <2d8d8d35-c781-cc0c-2673-c8f7eea057bd@redhat.com> <163f9fd2-95ed-512f-c67e-55de4c4f51d1@linux.vnet.ibm.com> Message-ID: Hi Gustavo, Thank you for pushing the code. 
Regards, Ogata "Gustavo Romero" wrote on 2019/08/12 08:34:35: > From: "Gustavo Romero" > To: Kazunori Ogata/Japan/IBM at IBMJP, "Andrew John Hughes" > Cc: hotspot-compiler-dev at openjdk.java.net, jdk8u-dev at openjdk.java.net > Date: 2019/08/12 08:34 > Subject: Re: [Ping] Re: [8u-dev, ppc] RFR for (almost clean) backport of > 8188868: PPC64: Support AES intrinsics on Big Endian > > Hi, > > Thanks Ogata for adjusting the change and evaluating performance on 8u + Power BE. > > Thanks Andrew for reviewing and approving it. > > Pushed to jdk8u-dev: > http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/42118db355f5 > > Best regards, > Gustavo > > On 07/31/2019 11:55 PM, Kazunori Ogata wrote: > > Hi Andrew, > > > > Thank you for reviewing the webrev. > > > > Regards, > > Ogata > > > > Andrew John Hughes wrote on 2019/08/01 01:05:48: > > > >> From: Andrew John Hughes > >> To: Kazunori Ogata , hotspot-compiler- > >> dev at openjdk.java.net, jdk8u-dev at openjdk.java.net > >> Date: 2019/08/01 01:13 > >> Subject: [EXTERNAL] Re: [Ping] Re: [8u-dev, ppc] RFR for (almost clean) > >> backport of 8188868: PPC64: Support AES intrinsics on Big Endian > >> > >> > >> > >> On 31/07/2019 10:30, Kazunori Ogata wrote: > >>> Ping. > >>> > >>> May I get review for the almost clean backport? > >>> > >>> Regards, > >>> Ogata > >>> > >>> Kazunori Ogata/Japan/IBM wrote on 2019/07/24 17:48:23: > >>> > >>>> From: Kazunori Ogata/Japan/IBM > >>>> To: hotspot-compiler-dev at openjdk.java.net, jdk8u-dev at openjdk.java.net > >>>> Date: 2019/07/24 17:48 > >>>> Subject: [8u-dev, ppc] RFR for (almost clean) backport of 8188868: > >>> PPC64: > >>>> Support AES intrinsics on Big Endian > >>>> > >>>> Hi, > >>>> > >>>> May I get review for backport of 8188868: PPC64: Support AES > > intrinsics > >>> on > >>>> Big Endian? > >>>> > >>>> The original patch itself can be applied cleanly (besides difference > > of > >>>> the source directory structure). 
However, one chunk failed because > > the > >>>> code just after the patched code was modified, so I manually applied > > the > >>> > >>>> chunk and renewed the patch. > >>>> > >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8188868 > >>>> Webrev: > >>> http://cr.openjdk.java.net/~ogatak/jdk8u_aes_be/8188868/webrev.02/ > >>>> > >>>> This backport is low risk and affects only PPC64 only. I verified > > there > >>>> was no degradation in "make test" results and SPECjbb 2015 ran fine. > > The > >>> > >>>> intrinsics added in this changeset improved max jOPS by 5% and > > critical > >>> jOPS by 4%. > >>>> > >>>> Regards, > >>>> Ogata > >>> > >> > >> Sorry, I started looking at this yesterday, but didn't get chance to > > finish. > >> > >> It looks fine to me. The stubGenerator_ppc.cpp changes were a little > >> hard to follow, but comparing the patched version with the 11u version > >> looked ok. > >> > >> Good to go. > >> -- > >> Andrew :) > >> > >> Senior Free Java Software Engineer > >> Red Hat, Inc. 
(http://www.redhat.com) > >> > >> PGP Key: ed25519/0xCFDA0F9B35964222 (hkp://keys.gnupg.net) > >> Fingerprint = 5132 579D D154 0ED2 3E04 C5A0 CFDA 0F9B 3596 4222 > >> https://keybase.io/gnu_andrew > >> > >> [attachment "signature.asc" deleted by Kazunori Ogata/Japan/IBM] > > From tobias.hartmann at oracle.com Tue Aug 13 06:17:25 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 13 Aug 2019 08:17:25 +0200 Subject: [14] RFR(T): 8229447: Problem list compiler/unsafe/UnsafeGetConstantField.java on Sparc until JDK-8229446 is fixed Message-ID: <63bafc57-a719-50e0-c880-15c62581f01f@oracle.com> Hi, please review the following patch that problem lists UnsafeGetConstantField.java on Sparc until JDK-8229446 is fixed: https://bugs.openjdk.java.net/browse/JDK-8229447 http://cr.openjdk.java.net/~thartmann/8229447/webrev.00/ Thanks, Tobias From david.holmes at oracle.com Tue Aug 13 06:20:01 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 13 Aug 2019 16:20:01 +1000 Subject: [14] RFR(T): 8229447: Problem list compiler/unsafe/UnsafeGetConstantField.java on Sparc until JDK-8229446 is fixed In-Reply-To: <63bafc57-a719-50e0-c880-15c62581f01f@oracle.com> References: <63bafc57-a719-50e0-c880-15c62581f01f@oracle.com> Message-ID: <632a27f5-e3b7-9f19-586d-935b095a930f@oracle.com> > Hi, > > please review the following patch that problem lists UnsafeGetConstantField.java on Sparc until > JDK-8229446 is fixed: > https://bugs.openjdk.java.net/browse/JDK-8229447 > http://cr.openjdk.java.net/~thartmann/8229447/webrev.00/ Looks good. Thanks for dealing with this. 
David > Thanks, > Tobias From tobias.hartmann at oracle.com Tue Aug 13 06:24:19 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 13 Aug 2019 08:24:19 +0200 Subject: [14] RFR(T): 8229447: Problem list compiler/unsafe/UnsafeGetConstantField.java on Sparc until JDK-8229446 is fixed In-Reply-To: <632a27f5-e3b7-9f19-586d-935b095a930f@oracle.com> References: <63bafc57-a719-50e0-c880-15c62581f01f@oracle.com> <632a27f5-e3b7-9f19-586d-935b095a930f@oracle.com> Message-ID: <3a65eff0-6af3-e139-7331-8cdf2dd04fd0@oracle.com> Thanks for the quick review! Best regards, Tobias On 13.08.19 08:20, David Holmes wrote: >> Hi, >> >> please review the following patch that problem lists UnsafeGetConstantField.java on Sparc until >> JDK-8229446 is fixed: >> https://bugs.openjdk.java.net/browse/JDK-8229447 >> http://cr.openjdk.java.net/~thartmann/8229447/webrev.00/ > > Looks good. > > Thanks for dealing with this. > > David > >> Thanks, >> Tobias > From nils.eliasson at oracle.com Tue Aug 13 07:26:25 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 13 Aug 2019 09:26:25 +0200 Subject: [14] RFR(S): 8228772: C2 compilation fails due to unschedulable graph if DominatorSearchLimit is reached In-Reply-To: References: <2db8d336-cefd-ab1a-6283-4496b3607700@oracle.com> <64153d28-13b0-818e-599e-30ebfee07728@oracle.com> <01518816-a632-48d0-b375-c2d4aa7a8034@oracle.com> Message-ID: ok, lets follow up on that in another bug. Consider this one reviewed! Regards, Nils On 2019-08-05 15:45, Tobias Hartmann wrote: > On 05.08.19 16:51, Nils Eliasson wrote: >> ok, but then the store and the load doesn't alias, and no anti-dependence edge is actually needed. > It's a membar and they are handled as being anti-dependent on everything. See this comment: > http://hg.openjdk.java.net/jdk/jdk/file/90dcbeb8455e/src/hotspot/share/opto/gcm.cpp#l635 > >> Perhaps that info isn't handled correctly by the anti-dep-checker. That is something we need to >> revisit for 14. 
> Are you okay with filing a JDK 14 RFE for that? > > Thanks, > Tobias From tobias.hartmann at oracle.com Tue Aug 13 07:50:27 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 13 Aug 2019 09:50:27 +0200 Subject: [14] RFR(S): 8228772: C2 compilation fails due to unschedulable graph if DominatorSearchLimit is reached In-Reply-To: References: <2db8d336-cefd-ab1a-6283-4496b3607700@oracle.com> <64153d28-13b0-818e-599e-30ebfee07728@oracle.com> <01518816-a632-48d0-b375-c2d4aa7a8034@oracle.com> Message-ID: <7706fec0-1fe7-3127-ba63-daefe7fd8279@oracle.com> Thanks Nils, I've filed JDK-8229449 [1]. Best regards, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8229449 On 13.08.19 09:26, Nils Eliasson wrote: > ok, lets follow up on that in another bug. > > Consider this one reviewed! > > Regards, > > Nils > > On 2019-08-05 15:45, Tobias Hartmann wrote: >> On 05.08.19 16:51, Nils Eliasson wrote: >>> ok, but then the store and the load doesn't alias, and no anti-dependence edge is actually needed. >> It's a membar and they are handled as being anti-dependent on everything. See this comment: >> http://hg.openjdk.java.net/jdk/jdk/file/90dcbeb8455e/src/hotspot/share/opto/gcm.cpp#l635 >> >>> Perhaps that info isn't handled correctly by the anti-dep-checker. That is something we need to >>> revisit for 14. >> Are you okay with filing a JDK 14 RFE for that? 
>> >> Thanks, >> Tobias From vladimir.x.ivanov at oracle.com Tue Aug 13 15:40:31 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 13 Aug 2019 18:40:31 +0300 Subject: [14] RFR(XS): 8227236: assert(singleton != __null && singleton != declared_interface) failed Message-ID: <4a7f557d-e212-f2ab-e580-b9aef0b291a6@oracle.com> http://cr.openjdk.java.net/~vlivanov/8227236/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8227236 There's a race between ciInstanceKlass::nof_implementors()/implementor() and class loading for shared classes which may manifest as inconsistency between consecutive calls to nof_implementors() and implementor() (ciInstanceKlass::implementor() doesn't cache InstanceKlass::implementor() for shared classes [1]). That's what triggers the assert: the check sees declared_interface->nof_implementors() == 1, but concurrent class loading introduces new implementor class before the assert. Assert hits singleton == declared_interface case since declared_interface->implementor() == declared_interface when declared_interface->nof_implementors() > 1 [2]. The race has been there long before JDK-6986483 [3] (same sequence of nof_implementors()/implementor() calls and the assert in C1 code), but it seems recent JDK changes made it more likely to occur. Proposed fix is to check for unique implementor using a single implementor() call. If there's any concurrent class loading happening which introduces more implementors, corresponding nmethod dependency will invalidate the nmethod during installation attempt. Testing: hs-precheckin-comp, tier1, tier2 Thanks! 
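
The difference between the racy two-call check and the proposed single-call check can be illustrated with a small Java model. This is a sketch under stated assumptions: `Iface`, `addImplementor` and the helper methods are hypothetical names invented for this note and only mimic the behaviour described above (no implementor, a unique implementor, or the interface itself once there is more than one). It is not the HotSpot CI API.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ImplementorSnapshot {
    /** Hypothetical stand-in for an interface's implementor state. */
    public static final class Iface {
        // null: no implementor; this: more than one; otherwise: the unique one.
        private final AtomicReference<Object> impl = new AtomicReference<>(null);

        public Object implementor() {
            return impl.get();
        }

        public int nofImplementors() {
            Object i = impl.get();
            return i == null ? 0 : (i == this ? 2 : 1);
        }

        /** Models class loading that registers another implementor. */
        public void addImplementor(Object klass) {
            impl.updateAndGet(cur -> (cur == null || cur == klass) ? klass : this);
        }
    }

    // Racy shape: two separate reads. Another thread may call addImplementor()
    // between them, so the returned value can be the interface itself.
    public static Object uniqueImplementorRacy(Iface iface) {
        if (iface.nofImplementors() == 1) {
            return iface.implementor();
        }
        return null;
    }

    // Single-read shape: the decision and the result come from one snapshot.
    public static Object uniqueImplementor(Iface iface) {
        Object singleton = iface.implementor();
        return (singleton != null && singleton != iface) ? singleton : null;
    }

    public static void main(String[] args) {
        Iface iface = new Iface();
        Object a = new Object();
        iface.addImplementor(a);
        System.out.println(uniqueImplementor(iface) == a);
        iface.addImplementor(new Object());
        System.out.println(uniqueImplementor(iface) == null);
    }
}
```

In the model, a stale answer from the single-read version is still possible if an implementor is added afterwards; in the actual change that case is covered by the nmethod dependency check at installation time, as described above.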
Best regards, Vladimir Ivanov [1] http://hg.openjdk.java.net/jdk/jdk/file/9c0715c5bbf3/src/hotspot/share/ci/ciInstanceKlass.cpp#l617 [2] http://hg.openjdk.java.net/jdk/jdk/file/9c0715c5bbf3/src/hotspot/share/ci/ciInstanceKlass.hpp#l72 [3] https://jbs.oracle.com/browse/JDK-6986483 From dean.long at oracle.com Tue Aug 13 22:36:52 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 13 Aug 2019 15:36:52 -0700 Subject: [14] RFR(XS): 8227236: assert(singleton != __null && singleton != declared_interface) failed In-Reply-To: <4a7f557d-e212-f2ab-e580-b9aef0b291a6@oracle.com> References: <4a7f557d-e212-f2ab-e580-b9aef0b291a6@oracle.com> Message-ID: <9983f77d-c14d-ddc6-aed2-d9ac36c2a447@oracle.com> Looks OK, but what's the harm in memoizing for shared classes? dl On 8/13/19 8:40 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8227236/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8227236 > > There's a race between ciInstanceKlass::nof_implementors()/implementor() > and class loading for shared classes which may manifest as > inconsistency between consecutive calls to nof_implementors() and > implementor() (ciInstanceKlass::implementor() doesn't cache > InstanceKlass::implementor() for shared classes [1]). That's what > triggers the assert: the check sees > declared_interface->nof_implementors() == 1, > but concurrent class loading introduces new implementor class before the > assert. Assert hits singleton == declared_interface case since > declared_interface->implementor() == declared_interface when > declared_interface->nof_implementors() > 1 [2]. > > The race has been there long before JDK-6986483 [3] (same sequence of > nof_implementors()/implementor() calls and the assert in C1 code), but > it seems recent JDK changes made it more likely to occur. > > Proposed fix is to check for unique implementor using a single > implementor() call. 
If there's any concurrent class loading happening
> which introduces more implementors, corresponding nmethod dependency
> will invalidate the nmethod during installation attempt.
>
> Testing: hs-precheckin-comp, tier1, tier2
>
> Thanks!
>
> Best regards,
> Vladimir Ivanov
>
> [1]
> http://hg.openjdk.java.net/jdk/jdk/file/9c0715c5bbf3/src/hotspot/share/ci/ciInstanceKlass.cpp#l617
>
> [2]
> http://hg.openjdk.java.net/jdk/jdk/file/9c0715c5bbf3/src/hotspot/share/ci/ciInstanceKlass.hpp#l72
>
> [3] https://jbs.oracle.com/browse/JDK-6986483

From igor.veresov at oracle.com Wed Aug 14 20:49:09 2019
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 14 Aug 2019 13:49:09 -0700
Subject: [14] RFR(XS): 8227236: assert(singleton != __null && singleton != declared_interface) failed
In-Reply-To: <9983f77d-c14d-ddc6-aed2-d9ac36c2a447@oracle.com>
References: <4a7f557d-e212-f2ab-e580-b9aef0b291a6@oracle.com> <9983f77d-c14d-ddc6-aed2-d9ac36c2a447@oracle.com>
Message-ID:

It'd be nice if CI provided a consistent snapshot of the world. Having inconsistencies like that is a bit scary.
Is it because we don't want to eagerly snapshot the whole hierarchy of the given class?

igor

> On Aug 13, 2019, at 3:36 PM, dean.long at oracle.com wrote:
>
> Looks OK, but what's the harm in memoizing for shared classes?
>
> dl
>
> On 8/13/19 8:40 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8227236/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8227236
>>
>> There's a race between ciInstanceKlass::nof_implementors()/implementor()
>> and class loading for shared classes which may manifest as
>> inconsistency between consecutive calls to nof_implementors() and
>> implementor() (ciInstanceKlass::implementor() doesn't cache
>> InstanceKlass::implementor() for shared classes [1]). That's what
>> triggers the assert: the check sees declared_interface->nof_implementors() == 1,
>> but concurrent class loading introduces new implementor class before the
>> assert.
Assert hits singleton == declared_interface case since
>> declared_interface->implementor() == declared_interface when
>> declared_interface->nof_implementors() > 1 [2].
>>
>> The race has been there long before JDK-6986483 [3] (same sequence of
>> nof_implementors()/implementor() calls and the assert in C1 code), but
>> it seems recent JDK changes made it more likely to occur.
>>
>> Proposed fix is to check for unique implementor using a single
>> implementor() call. If there's any concurrent class loading happening
>> which introduces more implementors, corresponding nmethod dependency
>> will invalidate the nmethod during installation attempt.
>>
>> Testing: hs-precheckin-comp, tier1, tier2
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] http://hg.openjdk.java.net/jdk/jdk/file/9c0715c5bbf3/src/hotspot/share/ci/ciInstanceKlass.cpp#l617
>>
>> [2] http://hg.openjdk.java.net/jdk/jdk/file/9c0715c5bbf3/src/hotspot/share/ci/ciInstanceKlass.hpp#l72
>>
>> [3] https://jbs.oracle.com/browse/JDK-6986483
>

From bob.vandette at oracle.com Thu Aug 15 18:19:28 2019
From: bob.vandette at oracle.com (Bob Vandette)
Date: Thu, 15 Aug 2019 14:19:28 -0400
Subject: RFR: 8229699 - [Graal] jck tests fail on windows with AOTed Graal
Message-ID: <7AA0E78C-A603-416A-AA28-944D0325342C@oracle.com>

BUG: https://bugs.openjdk.java.net/browse/JDK-8229699

A recent change (https://bugs.openjdk.java.net/browse/JDK-8227439) disabled UseAOT by default.
UseAOT is automatically enabled if an AOTLibrary is specified but the check for UseAOT in os::init_2
occurs before this happens.

This fix will install the exception handler if a library is added and the user hasn't manually disabled AOT.


diff --git a/src/hotspot/os/windows/os_windows.cpp b/src/hotspot/os/windows/os_windows.cpp
--- a/src/hotspot/os/windows/os_windows.cpp
+++ b/src/hotspot/os/windows/os_windows.cpp
@@ -4122,7 +4122,7 @@
   // in order to forward implicit exceptions from code in AOT
   // generated DLLs. This is necessary since these DLLs are not
   // registered for structured exceptions like codecache methods are.
-  if (UseAOT) {
+  if (AOTLibrary != NULL && (UseAOT || FLAG_IS_DEFAULT(UseAOT))) {
     topLevelVectoredExceptionHandler = AddVectoredExceptionHandler( 1, topLevelVectoredExceptionFilter);
   }
 #endif

Bob.

From vladimir.kozlov at oracle.com Thu Aug 15 18:59:29 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 15 Aug 2019 11:59:29 -0700
Subject: RFR: 8229699 - [Graal] jck tests fail on windows with AOTed Graal
In-Reply-To: <7AA0E78C-A603-416A-AA28-944D0325342C@oracle.com>
References: <7AA0E78C-A603-416A-AA28-944D0325342C@oracle.com>
Message-ID:

Looks good.

Thanks
Vladimir

> On Aug 15, 2019, at 11:19 AM, Bob Vandette wrote:
>
> BUG: https://bugs.openjdk.java.net/browse/JDK-8229699
>
> A recent change (https://bugs.openjdk.java.net/browse/JDK-8227439) disabled UseAOT by default.
> UseAOT is automatically enabled if an AOTLibrary is specified but the check for UseAOT in os::init_2
> occurs before this happens.
>
> This fix will install the exception handler if a library is added and the user hasn't manually disabled AOT.
>
>
> diff --git a/src/hotspot/os/windows/os_windows.cpp b/src/hotspot/os/windows/os_windows.cpp
> --- a/src/hotspot/os/windows/os_windows.cpp
> +++ b/src/hotspot/os/windows/os_windows.cpp
> @@ -4122,7 +4122,7 @@
>    // in order to forward implicit exceptions from code in AOT
>    // generated DLLs. This is necessary since these DLLs are not
>    // registered for structured exceptions like codecache methods are.
> -  if (UseAOT) {
> +  if (AOTLibrary != NULL && (UseAOT || FLAG_IS_DEFAULT(UseAOT))) {
>      topLevelVectoredExceptionHandler = AddVectoredExceptionHandler( 1, topLevelVectoredExceptionFilter);
>    }
>  #endif
>
> Bob.
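The condition introduced by the 8229699 patch can be read as a small truth table. The following is only a minimal sketch and not HotSpot code: `should_install_handler`, `old_condition`, `check_examples` and the three boolean parameters are hypothetical stand-ins for `AOTLibrary != NULL`, the current value of `UseAOT`, and `FLAG_IS_DEFAULT(UseAOT)`.

```cpp
#include <cassert>

// Hypothetical model of the patched condition (names are illustrative only).
//   library_given   ~ AOTLibrary != NULL
//   use_aot         ~ current value of the UseAOT flag
//   use_aot_default ~ FLAG_IS_DEFAULT(UseAOT), i.e. the user did not set the flag
bool should_install_handler(bool library_given, bool use_aot, bool use_aot_default) {
  return library_given && (use_aot || use_aot_default);
}

// Model of the pre-patch condition, which looked at UseAOT alone.
bool old_condition(bool use_aot) {
  return use_aot;
}

// Walks through the cases discussed in the mail.
void check_examples() {
  // A library is given and UseAOT is still at its (false) default:
  // the old check skips the handler, the new one installs it.
  assert(!old_condition(false));
  assert(should_install_handler(true, false, true));

  // The user explicitly disabled AOT: no handler, even with a library.
  assert(!should_install_handler(true, false, false));

  // No library was given: no handler.
  assert(!should_install_handler(false, false, true));
}
```

Calling `check_examples()` from a test driver exercises all three cases; the point of the sketch is that the new test no longer depends on `UseAOT` having already been switched on by the time it runs.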
From vladimir.x.ivanov at oracle.com Fri Aug 16 15:06:31 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 16 Aug 2019 18:06:31 +0300
Subject: [14] RFR(XS): 8227236: assert(singleton != __null && singleton != declared_interface) failed
In-Reply-To:
References: <4a7f557d-e212-f2ab-e580-b9aef0b291a6@oracle.com> <9983f77d-c14d-ddc6-aed2-d9ac36c2a447@oracle.com>
Message-ID:

I'm not sure how practical it would be to provide a consistent snapshot of the entire world through CI. I agree that it's ugly when CI exposes contradictory info, but compilers can't fully trust the info anyway (irrespective of whether it is globally consistent or not) and all the observations should be validated either using nmethod dependencies or runtime checks in generated code.

Speaking of the particular problem, there is a set of ciInstanceKlass instances which are shared across all compilations and there's no good place to update the cached state once they are instantiated. So, on CI level caching is disabled for them and queries always go into runtime.

Frankly speaking, I'd prefer to see shared classes go away: it looks like a pure optimization with questionable results. But there may be some bootstrapping subtleties (e.g., see ciObjectFactory::initialize()) I'm missing.

Best regards,
Vladimir Ivanov

PS: this particular bug was exposed after Iterator was turned into a well-known [1] class (and hence became shared).

[1] http://hg.openjdk.java.net/jdk/jdk/rev/a0d4e61acb6b#l1.8

On 14/08/2019 13:49, Igor Veresov wrote:
> It'd be nice if CI provided a consistent snapshot of the world. Having inconsistencies like that is a bit scary.
> Is it because we don't want to eagerly snapshot the whole hierarchy of the given class?
>
> igor
>
>
>
>> On Aug 13, 2019, at 3:36 PM, dean.long at oracle.com wrote:
>>
>> Looks OK, but what's the harm in memoizing for shared classes?
>> >> dl >> >> On 8/13/19 8:40 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8227236/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8227236 >>> >>> There's a race between ciInstanceKlass::nof_implementors()/implementor() >>> and class loading for shared classes which may manifest as >>> inconsistency between consecutive calls to nof_implementors() and >>> implementor() (ciInstanceKlass::implementor() doesn't cache >>> InstanceKlass::implementor() for shared classes [1]). That's what >>> triggers the assert: the check sees declared_interface->nof_implementors() == 1, >>> but concurrent class loading introduces new implementor class before the >>> assert. Assert hits singleton == declared_interface case since >>> declared_interface->implementor() == declared_interface when >>> declared_interface->nof_implementors() > 1 [2]. >>> >>> The race has been there long before JDK-6986483 [3] (same sequence of >>> nof_implementors()/implementor() calls and the assert in C1 code), but >>> it seems recent JDK changes made it more likely to occur. >>> >>> Proposed fix is to check for unique implementor using a single >>> implementor() call. If there's any concurrent class loading happening >>> which introduces more implementors, corresponding nmethod dependency >>> will invalidate the nmethod during installation attempt. >>> >>> Testing: hs-precheckin-comp, tier1, tier2 >>> >>> Thanks! 
>>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] http://hg.openjdk.java.net/jdk/jdk/file/9c0715c5bbf3/src/hotspot/share/ci/ciInstanceKlass.cpp#l617 >>> >>> [2] http://hg.openjdk.java.net/jdk/jdk/file/9c0715c5bbf3/src/hotspot/share/ci/ciInstanceKlass.hpp#l72 >>> >>> [3] https://jbs.oracle.com/browse/JDK-6986483 >> > From navy.xliu at gmail.com Mon Aug 19 08:26:44 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Mon, 19 Aug 2019 01:26:44 -0700 Subject: RFR(XS): 8229450: C2 compilation fails with assert(found_sfpt) failed Message-ID: Hi, Maillist, Please review my change to fix IfSplitBlocks for a corner case. JBS: https://bugs.openjdk.java.net/browse/JDK-8229450 webrev: https://cr.openjdk.java.net/~xliu/8229450/webrev/ As JDK1894.before shown, if prevdom is 959 IfFalse === 958 [[ 964 ]] and 964 is a safepoint node in a CountedLoop, PhaseIdealLoop::dominated_by will modify it to 959 IfFalse === 958 [[ 964 905 ]]. After then, It can't pass LoopNode::verify_strip_mined. I still have 2 questions. Could reviewers help me out? 1. I'd like to add a testcase, but JVM won't hit it even though I repeat invoke that function thousands of times. I can see that function is compiled by C2, but C2 succeeds to compile it. The only way to reproduce that problem is using replay file. Is that possible to build a testcase from a replay file? or What kinda of information should I pull out of the replay file? 2. I found you guys sometimes paste a sub-graph of Ideal nodes in JBS issues. Do you have a script to render a IdealLoopTree? So far, I only have idealgraphvisualizer. It renders the whole function.Too big to understand. 
thanks, --lx From rwestrel at redhat.com Mon Aug 19 15:28:00 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 19 Aug 2019 17:28:00 +0200 Subject: [14] RFR(S): 8228888: C2 compilation fails with assert "m has strange control" In-Reply-To: References: Message-ID: <87pnl1rx8f.fsf@redhat.com> > http://cr.openjdk.java.net/~thartmann/8228888/webrev.00/ Looks good to me. Roland. From vladimir.kozlov at oracle.com Mon Aug 19 16:50:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 19 Aug 2019 09:50:21 -0700 Subject: [14] RFR(S): 8228888: C2 compilation fails with assert "m has strange control" In-Reply-To: References: Message-ID: <81277d90-f64f-88c2-8d8a-116c8074e80f@oracle.com> On 8/7/19 7:13 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8228888 > http://cr.openjdk.java.net/~thartmann/8228888/webrev.00/ Good. > > I found this while trying to write a regression test for another bug. The assert triggers when OSR > compiling an infinite loop with two back branches (see StrangeControl.jasm): > http://cr.openjdk.java.net/~thartmann/8228888/8228888_graph.png > > PhaseIdealLoop::has_local_phi_input() tries to determine if all inputs of n (118 Phi) are block > local phis. When looking at input m (108 StoreI), the assert fires because m is not a Phi and > control of m (102 IfFalse) does not dominate control of n (83 Region). > > I think the assert which was added by [1] is too strong. If n is a Phi itself, control of all its > inputs does not need to dominate its own control. Correct - loops are examples. 
Thanks, Vladimir > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8187822 > From vivek.r.deshpande at intel.com Tue Aug 20 03:44:30 2019 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Tue, 20 Aug 2019 03:44:30 +0000 Subject: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling In-Reply-To: <3033ff07-93b2-c99b-f5fb-6b35bc03b4b9@loongson.cn> References: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn> <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com> <3033ff07-93b2-c99b-f5fb-6b35bc03b4b9@loongson.cn> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2AAB724CC7@fmsmsx121.amr.corp.intel.com> Hi All I tested this patch with small test which adds byte arrays. for (int i = 0; i < NUM; i++) { data[i] = (byte)(data2[i] + data3[i]); } Since the loop unrolled to half than earlier, the maximum vector length could not be used and generated 256 bit long vector instructions instead of maximum available 512 bits. Also the loop did not get unrolled after vectorization. I have given the generated code below. Please find the small test attached with the mail. 
Regards,
Vivek

Previous code:
  0x00007f833c8691d6: vmovdqu32 0x10(%rsi,%r14,1),%zmm3
  0x00007f833c8691e1: vpaddb 0x10(%rdx,%r14,1),%zmm3,%zmm3
  0x00007f833c8691ec: vmovdqu32 %zmm3,0x10(%rcx,%r14,1)
  0x00007f833c8691f7: vmovdqu32 0x50(%rsi,%rbp,1),%zmm3
  0x00007f833c869202: vpaddb 0x50(%rdx,%rbp,1),%zmm3,%zmm3
  0x00007f833c86920d: vmovdqu32 %zmm3,0x50(%rcx,%rbp,1)
  0x00007f833c869218: vmovdqu32 0x90(%rsi,%rbp,1),%zmm3
  0x00007f833c869223: vpaddb 0x90(%rdx,%rbp,1),%zmm3,%zmm3
  0x00007f833c86922e: vmovdqu32 %zmm3,0x90(%rcx,%rbp,1)
  0x00007f833c869239: vmovdqu32 0xd0(%rsi,%rbp,1),%zmm3
  0x00007f833c869244: vpaddb 0xd0(%rdx,%rbp,1),%zmm3,%zmm3
  0x00007f833c86924f: vmovdqu32 %zmm3,0xd0(%rcx,%rbp,1)

After applying the patch:
  0x00007f833c8660a3: vmovdqu 0x10(%rdi,%r11,1),%ymm0
  0x00007f833c8660aa: vpaddb 0x10(%rsi,%r11,1),%ymm0,%ymm0
  0x00007f833c8660b1: vmovdqu %ymm0,0x10(%rdx,%r11,1)

-----Original Message-----
From: Jie Fu [mailto:fujie at loongson.cn]
Sent: Monday, August 12, 2019 2:27 AM
To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net
Cc: Deshpande, Vivek R
Subject: Re: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling

Hi Vladimir and all,

Updated: http://cr.openjdk.java.net/~jiefu/8227505/webrev.02/

*Analysis*

The performance drop is caused by the over loop unrolling of SuperWordLoopUnrollAnalysis, which does not consider the negative effect of the pre/post-loop at all.

The following is the perf stat data for different loop unrolling factors.
Please note that the number of branches increased by ~47% (from 19,854,995,391 to 29,229,714,991) when the unroll-factor increased from 8 to 16.
And the total instructions increased by ~57% (from 108,849,151,185 to 171,334,733,346), which was even worse.

perf stat for unroll-factor=8:
----------------------------------------------------------------------------
       5429.117030      task-clock (msec)         #    1.006 CPUs utilized
               620      context-switches          #    0.114 K/sec
                11      cpu-migrations            #    0.002 K/sec
            41,905      page-faults               #    0.008 M/sec
    24,176,919,686      cycles                    #    4.453 GHz
   108,849,151,185      instructions              #    4.50  insn per cycle
    19,854,995,391      branches                  # 3657.132 M/sec
        17,788,819      branch-misses             #    0.09% of all branches

       5.396099347 seconds time elapsed
----------------------------------------------------------------------------

perf stat for unroll-factor=16:
----------------------------------------------------------------------------
       9158.323771      task-clock (msec)         #    1.005 CPUs utilized
               763      context-switches          #    0.083 K/sec
                16      cpu-migrations            #    0.002 K/sec
            41,884      page-faults               #    0.005 M/sec
    40,831,102,837      cycles                    #    4.458 GHz
   171,334,733,346      instructions              #    4.20  insn per cycle
    29,229,714,991      branches                  # 3191.601 M/sec
        16,455,010      branch-misses             #    0.06% of all branches

       9.115309970 seconds time elapsed
----------------------------------------------------------------------------

The increment of branches and total instructions was mainly introduced by the pre- and post-loop.

1) Higher unroll-factor may lead to more iterations in the pre-loop due to the alignment requirement in the main-loop.
   For example, with unroll-factor=16, 16 iterations may be required in the pre-loop since 16-byte vector instructions were used in the main-loop.
   However, no more than 8 iterations when unroll-factor=8.

2) Higher unroll-factor may lead to more iterations in the post-loop since the range of it is [0, unroll-factor - 1).

As for the particular case, the distribution of iterations for the pre-/main-/post- loops seem to be:
-----------------------------------------------------------------------
          | pre-lp iters | main-lp iters | post-lp iters | total iters
-----------------------------------------------------------------------
unroll(8) |      8       |       6       |        5      |     19
-----------------------------------------------------------------------
unroll(16)|     16       |       2       |       13      |     31
-----------------------------------------------------------------------
So it's harmful to unroll with 16.

*Fix*

The loop body size seems unable to detect this case.
When the VM tries to decide whether to unroll 16, the loop body size is just 64, which seems quite reasonable for SuperWordLoopUnrollAnalysis.
And the generated loop body in assembly is small enough with unroll 16.
----------------------------------------------------------------------------
 ;; B18: #      out( B18 B19 ) <- in( B17 B18 ) Loop( B18-B18 inner main of N59) Freq: 891835
  0x00007f62187bfba0:   movslq %r8d,%r11
  0x00007f62187bfba3:   vmovdqu 0x10(%r9,%r11,1),%xmm0
  0x00007f62187bfbaa:   vmovdqu %xmm0,0x10(%rsi,%r11,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - TestSuperWordOverunrolling::execute@56 (line 21)
  0x00007f62187bfbb1:   add    $0x10,%r8d                   ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - TestSuperWordOverunrolling::execute@57 (line 20)
  0x00007f62187bfbb5:   cmp    $0x2f,%r8d
  0x00007f62187bfbb9:   jl     0x00007f62187bfba0 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - TestSuperWordOverunrolling::execute@38 (line 20)
----------------------------------------------------------------------------

To fix it, the possible negative effect of pre-/post-loop should be considered.
The unrolling may increase the performance if the total iterations of pre/main/post loops could be decreased.
However, the precise number of iterations in the pre-/post-loop is really hard to predict since it depends on many factors such as align requirement, number of data&type, object layout, and allocated addresses.
To simplify the problem, the number of the pre&post-loop iterations is just assumed to be the same as the unroll-factor.

A heuristic is introduced to protect against over-unrolling with SuperWordLoopUnrollAnalysis:
  - Let's assume the unroll-factor is x and the main-loop iteration is y in the previous unrolling round, then the total iterations of pre/main/post loops is y + x.
  - In the next round, the unroll-factor becomes 2x and the main-loop iteration is y/2, then the total iterations of pre/main/post loops is y/2 + 2x.
  - We'd better not unroll if: y/2 + 2x > y + x, that is 2x > y
----------------------------------------------------------------------------
          | unroll-factor | main-lp iters | pre&post-lp iters | total iters
----------------------------------------------------------------------------
pre-round |       x       |       y       |        x          | y + x
----------------------------------------------------------------------------
next-round|      2x       |      y/2      |       2x          | y/2 + 2x
----------------------------------------------------------------------------

*Testing*

No performance regression in SPECjvm2008

Any comments?

Thanks a lot.
Best regards,
Jie

On 2019/8/7 5:58, Vladimir Kozlov wrote:
> Hi Jie
>
> Very interesting observation. I am concern that webrev.01 does check
> for general loop which may not be vectorized.
Even if your
> optimization helps in particular case it may make some loop regress
> due to executing more branches.
>
> On 7/11/19 1:20 AM, Jie Fu wrote:
>> Hi all,
>>
>> With more experiments, the loop's trip_count seems a good feature to
>> detect over loop unrolling.
>> And on some platforms, the branch-miss rate had been observed
>> increasing dramatically with small loop trip count.
>
> Why? With more unrolling you should have less number of branches.
>
>> It seems that we shouldn't unroll if the trip count becomes too small.
>
> May be there is different explanation for this. May be big loop body
> does not fit into code buffer in X86 cpu - or something like that. And
> we should watch for body size instead.
>
> Thanks,
> Vladimir
>
>>
>> I've updated the webrev here:
>> http://cr.openjdk.java.net/~jiefu/8227505/webrev.01/
>>
>> Please review it and give me some advice.
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>> On 2019/7/10 4:38, Jie Fu wrote:
>>> Hi all,
>>>
>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8227505
>>> Webrev: http://cr.openjdk.java.net/~jiefu/8227505/webrev.00/
>>>
>>> The patch fixes the over loop unrolling problem caused by
>>> SuperWordLoopUnrollAnalysis.
>>> For more info., please refer to the JBS.
>>>
>>> Could you please review it and give me some advice?
>>>
>>> Thanks a lot.
>>> Best regards,
>>> Jie
>>>
>>>
>>

From tobias.hartmann at oracle.com Tue Aug 20 05:43:10 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 20 Aug 2019 07:43:10 +0200
Subject: [14] RFR(S): 8228888: C2 compilation fails with assert "m has strange control"
In-Reply-To: <87pnl1rx8f.fsf@redhat.com>
References: <87pnl1rx8f.fsf@redhat.com>
Message-ID: <091fb7ab-ac93-08f2-2ec3-ce52713dda82@oracle.com>

Hi Roland,

thanks for the review!

Best regards,
Tobias

On 19.08.19 17:28, Roland Westrelin wrote:
>
>> http://cr.openjdk.java.net/~thartmann/8228888/webrev.00/
>
> Looks good to me.
>
> Roland.
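The over-unrolling heuristic proposed in the 8227505 thread (do not double the unroll factor when 2x > y) can be sketched in a few lines. This is an illustration under the mail's own simplifying assumption that the pre- plus post-loop iterations are about equal to the unroll factor; `estimated_total_iters` and `worth_doubling_unroll` are hypothetical names and are not taken from the webrev.

```cpp
#include <cassert>

// Estimated total iterations of the pre/main/post loops, using the mail's
// simplification: pre&post-loop iterations ~ unroll factor.
//   unroll_factor   = x
//   main_loop_iters = y
long estimated_total_iters(long unroll_factor, long main_loop_iters) {
  return main_loop_iters + unroll_factor;
}

// Hypothetical helper: returns false when doubling the unroll factor is
// expected to increase the estimated total, i.e. when
//   y/2 + 2x > y + x   <=>   2x > y
// The comparison is done on 2x and y directly to avoid integer division.
bool worth_doubling_unroll(long unroll_factor, long main_loop_iters) {
  return !(2 * unroll_factor > main_loop_iters);
}
```

With the numbers from the mail's table (unroll factor 8, six main-loop iterations), `worth_doubling_unroll(8, 6)` is false, which matches the conclusion that unrolling by 16 is harmful in that case. The estimate is deliberately coarse, so the absolute totals it produces need not match the measured ones.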
> From tobias.hartmann at oracle.com Tue Aug 20 05:43:33 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 20 Aug 2019 07:43:33 +0200 Subject: [14] RFR(S): 8228888: C2 compilation fails with assert "m has strange control" In-Reply-To: <81277d90-f64f-88c2-8d8a-116c8074e80f@oracle.com> References: <81277d90-f64f-88c2-8d8a-116c8074e80f@oracle.com> Message-ID: <73d8dcc1-a3e0-e8c2-e48d-b3ec54a13f38@oracle.com> Hi Vladimir, thanks for the review! Best regards, Tobias On 19.08.19 18:50, Vladimir Kozlov wrote: > On 8/7/19 7:13 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8228888 >> http://cr.openjdk.java.net/~thartmann/8228888/webrev.00/ > > Good. > >> >> I found this while trying to write a regression test for another bug. The assert triggers when OSR >> compiling an infinite loop with two back branches (see StrangeControl.jasm): >> http://cr.openjdk.java.net/~thartmann/8228888/8228888_graph.png >> >> PhaseIdealLoop::has_local_phi_input() tries to determine if all inputs of n (118 Phi) are block >> local phis. When looking at input m (108 StoreI), the assert fires because m is not a Phi and >> control of m (102 IfFalse) does not dominate control of n (83 Region). >> >> I think the assert which was added by [1] is too strong. If n is a Phi itself, control of all its >> inputs does not need to dominate its own control. > > Correct - loops are examples. > > Thanks, > Vladimir > >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8187822 >> From tobias.hartmann at oracle.com Tue Aug 20 06:46:05 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 20 Aug 2019 08:46:05 +0200 Subject: RFR(XS): 8229450: C2 compilation fails with assert(found_sfpt) failed In-Reply-To: References: Message-ID: On 19.08.19 10:26, Liu Xin wrote: > I still have 2 questions. Could reviewers help me out? > > 1. 
I'd like to add a testcase, but JVM won't hit it even though I repeat > invoke that function thousands of times. I can see that function is > compiled by C2, but C2 succeeds to compile it. > The only way to reproduce that problem is using replay file. Is that > possible to build a testcase from a replay file? or What kinda of > information should I pull out of the replay file? Generating a simple regression test is a science in itself but such tests are extremely valuable. If a replay compilation file reproduces the failure, I usually attempt the following to create a simple regression test: - Try to simplify the replay compilation file as far as possible by removing inline statements - Disable as many C1/C2 optimizations [1] as possible so that the issue still reproduces (this will simplify the graph) - Debug the issue using replay compilation until you get a good understanding of the root cause and the required conditions (inlining, optimistic optimizations based on profiling, IR shape, ...) - Look at the Java code that is compiled and the way it is invoked/profiled. Not being able to reproduce with a normal run usually suggests that the profiling is different. - Write a test that invokes the same method in the same way (same arguments, same inlining). Use CompileCommands to control inling if necessary. Compare the IR generated for your test to the IR generated with the replay file (flags like -XX:+TraceLoopOpts also help) and modify your test to get closer. Sometimes this takes days but it's worth it. - Once the failure triggers, simplify the test as good as possible. - If you don't have a fix yet, use the simple regression test to debug further. > 2. I found you guys sometimes paste a sub-graph of Ideal nodes in JBS > issues. Do you have a script to render a IdealLoopTree? So far, I only have > idealgraphvisualizer. It renders the whole function.Too big to understand. Do you mean sub-graphs like [2]? That graph is created by the IdealGraphVisualizer. 
You can search for nodes or apply filters to find the interesting parts of the usually large graph and then click on nodes to expand/collapse other parts of the graph. I usually print the ids of offending nodes by modifying the sources and then look at the surrounding nodes with the IdealGraphVisualizer or directly at the -XX:+PrintIdeal output. Stepping through the optimization phases might help as well. Best regards, Tobias [1] Here are some C2 optimizations you might want to try to disable -XX:-OptimizePtrCompare -XX:-OptoPeephole -XX:LoopUnrollLimit=0 -XX:LoopMaxUnroll=0 -XX:-SuperWordLoopUnrollAnalysis -XX:-UseCountedLoopSafepoints -XX:-UseLoopPredicate -XX:-PartialPeelAtUnsignedTests -XX:-LoopUnswitching -XX:-UseSuperWord -XX:-SubsumeLoads -XX:-OptimizeStringConcat -XX:-SplitIfBlocks -XX:-RangeCheckElimination [2] http://cr.openjdk.java.net/~thartmann/8228888/8228888_graph.png From navy.xliu at gmail.com Tue Aug 20 07:52:12 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Tue, 20 Aug 2019 00:52:12 -0700 Subject: RFR(XS): 8229450: C2 compilation fails with assert(found_sfpt) failed In-Reply-To: References: Message-ID: hi, Tobias, Thank you for providing this information. I will work on it with this guideline. thanks, --lx On Mon, Aug 19, 2019 at 11:46 PM Tobias Hartmann wrote: > > On 19.08.19 10:26, Liu Xin wrote: > > I still have 2 questions. Could reviewers help me out? > > > > 1. I'd like to add a testcase, but JVM won't hit it even though I > repeat > > invoke that function thousands of times. I can see that function is > > compiled by C2, but C2 succeeds to compile it. > > The only way to reproduce that problem is using replay file. Is that > > possible to build a testcase from a replay file? or What kinda of > > information should I pull out of the replay file? > > Generating a simple regression test is a science in itself but such tests > are extremely valuable. 
> > If a replay compilation file reproduces the failure, I usually attempt the > following to create a > simple regression test: > - Try to simplify the replay compilation file as far as possible by > removing inline statements > - Disable as many C1/C2 optimizations [1] as possible so that the issue > still reproduces (this will > simplify the graph) > - Debug the issue using replay compilation until you get a good > understanding of the root cause and > the required conditions (inlining, optimistic optimizations based on > profiling, IR shape, ...) > - Look at the Java code that is compiled and the way it is > invoked/profiled. Not being able to > reproduce with a normal run usually suggests that the profiling is > different. > - Write a test that invokes the same method in the same way (same > arguments, same inlining). Use > CompileCommands to control inling if necessary. Compare the IR generated > for your test to the IR > generated with the replay file (flags like -XX:+TraceLoopOpts also help) > and modify your test to get > closer. Sometimes this takes days but it's worth it. > - Once the failure triggers, simplify the test as good as possible. > - If you don't have a fix yet, use the simple regression test to debug > further. > > > 2. I found you guys sometimes paste a sub-graph of Ideal nodes in JBS > > issues. Do you have a script to render a IdealLoopTree? So far, I only > have > > idealgraphvisualizer. It renders the whole function.Too big to > understand. > > Do you mean sub-graphs like [2]? That graph is created by the > IdealGraphVisualizer. You can search > for nodes or apply filters to find the interesting parts of the usually > large graph and then click > on nodes to expand/collapse other parts of the graph. > > I usually print the ids of offending nodes by modifying the sources and > then look at the surrounding > nodes with the IdealGraphVisualizer or directly at the -XX:+PrintIdeal > output. 
Stepping through the > optimization phases might help as well. > > Best regards, > Tobias > > [1] Here are some C2 optimizations you might want to try to disable > -XX:-OptimizePtrCompare -XX:-OptoPeephole -XX:LoopUnrollLimit=0 > -XX:LoopMaxUnroll=0 > -XX:-SuperWordLoopUnrollAnalysis -XX:-UseCountedLoopSafepoints > -XX:-UseLoopPredicate > -XX:-PartialPeelAtUnsignedTests -XX:-LoopUnswitching -XX:-UseSuperWord > -XX:-SubsumeLoads > -XX:-OptimizeStringConcat -XX:-SplitIfBlocks -XX:-RangeCheckElimination > [2] http://cr.openjdk.java.net/~thartmann/8228888/8228888_graph.png > From fujie at loongson.cn Tue Aug 20 08:44:02 2019 From: fujie at loongson.cn (Jie Fu) Date: Tue, 20 Aug 2019 16:44:02 +0800 Subject: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2AAB724CC7@fmsmsx121.amr.corp.intel.com> References: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn> <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com> <3033ff07-93b2-c99b-f5fb-6b35bc03b4b9@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB724CC7@fmsmsx121.amr.corp.intel.com> Message-ID: <19cea482-4a23-cd66-31b0-39872d403baa@loongson.cn> Hi Vivek, Thanks for your review and valuable comments. On 2019/8/20 11:44, Deshpande, Vivek R wrote: > Hi All > > I tested this patch with small test which adds byte arrays. > for (int i = 0; i < NUM; i++) { > data[i] = (byte)(data2[i] + data3[i]); > } > > Since the loop unrolled to half than earlier, the maximum vector length could not be used and generated 256 bit long vector instructions instead of maximum available 512 bits. As for your particular case, the following are the performance results on my i7-8700 machine, which can support vector-256 at most. (Running with: time java SmallByteAdd) Original Code: --------------------- 26641963.050 iter/sec real 0m0.553s user 0m0.595s sys 0m0.008s --------------------- After the patch: --------------------- 29152473.435 iter/sec real 0m0.521s
user 0m0.562s sys 0m0.012s --------------------- It seems that the patched version is a little better. I don't have a machine which supports AVX-512 at hand. And I'm trying to find one to analyze your case. Could you please show me the performance of your case with vector-512 and vector-256 on your machine? > Also the loop did not get unrolled after vectorization. > Yes. I can reproduce it on my computer. I'm surprised that the unrolled version seems a bit slow (on my computer) again. I'll investigate it soon. Thanks a lot. Best regards, Jie From rwestrel at redhat.com Tue Aug 20 11:47:45 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 20 Aug 2019 13:47:45 +0200 Subject: RFR(XS): 8227384: C2 compilation fails with "graph should be schedulable" when running with -XX:-EliminateLocks In-Reply-To: <2264efd4-a7e4-25ee-493b-fb058a3d3bf9@oracle.com> References: <87lfx657u8.fsf@redhat.com> <288a892d-5486-6547-df77-cc39337b1a29@oracle.com> <87a7dl57vu.fsf@redhat.com> <871ryf3c37.fsf@redhat.com> <2264efd4-a7e4-25ee-493b-fb058a3d3bf9@oracle.com> Message-ID: <87ftlwrrby.fsf@redhat.com> Hi Vladimir, Thanks for looking at this. > Expanding Locks before Allocations is good idea. We do eliminate Locks before eliminating > Allocations. Will a load after IGVN optimization folds with load generated in > PhaseMacroExpand::initialize_object() ? Can you clarify your concern? Are you talking about a load of the mark word? Or is it the load of the prototype header in PhaseMacroExpand::initialize_object()? > I don't see offset check in is_new_object_mark_load(). How it known it is load from *mark word*? It checks that the address input of the load is an allocation. If it is not at offset 0, there is an AddP between the load and the allocation and is_new_object_mark_load() returns false on this graph shape. Roland.
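[Editorial sketch, not part of the original thread.] The graph-shape argument in Roland's reply can be illustrated with a toy model. The code below is only a sketch under stated assumptions: `Kind`, `Node`, and `is_new_object_mark_load_sketch` are hypothetical names invented for illustration and are not the HotSpot classes or the actual `is_new_object_mark_load()` implementation. It shows only why a direct "address input is an allocation" test already rejects loads at a non-zero offset, since those are addressed through an intervening AddP.

```cpp
#include <cassert>

// Toy stand-ins for C2 IR nodes. These are hypothetical and are NOT the
// HotSpot Node/AllocateNode/AddPNode/LoadNode classes.
enum class Kind { Allocate, AddP, Load };

struct Node {
  Kind kind;
  const Node* input;  // Load: address input; AddP: base pointer; Allocate: unused
  long offset;        // AddP only: byte offset added to the base
};

// Assumption for this sketch: a load reads the mark word of a new object
// only when its address input is the allocation itself. A load at any
// non-zero offset goes through an AddP, so the direct check rejects it
// without needing an explicit offset comparison.
bool is_new_object_mark_load_sketch(const Node* load) {
  assert(load != nullptr && load->kind == Kind::Load);
  const Node* address = load->input;
  return address != nullptr && address->kind == Kind::Allocate;
}
```

In this model, a load whose address is the allocation node is accepted, while a load whose address is an AddP on top of that allocation is rejected, which matches the behaviour described above for the real check.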
From tobias.hartmann at oracle.com Tue Aug 20 13:03:20 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 20 Aug 2019 15:03:20 +0200 Subject: [14] RFR(S): 8224624: Inefficiencies in CodeStrings::add_comment cause timeouts Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8224624 http://cr.openjdk.java.net/~thartmann/8224624/webrev.00/ The test that was added with the fix for JDK-8207355 [1] forces C1 to generate hundreds of exception handlers. If CommentedAssembly is enabled, LIR_Assembler::emit_exception_entries adds comments to the assembly code that is generated for these exception adapter blocks [2]. To add the comments to the right offset, CodeStrings::add_comment searches through *all* _code_strings in the CodeBuffer until it finds the last comment with that offset. This is extremely slow with a large number of code strings (see compile times for the single test method [3]) and is repeated for every new comment that is added. I've fixed this by changing CodeStrings to a doubly-linked-list and searching for the comment with the right offset in reverse (because especially for these exception handlers, we add comments with increasing offset). In addition, the code now maintains the invariant that comments in the linked list are sorted by increasing offset. When reverse-searching the list, we can therefore bail out if we encounter a comment with offset <= the offset we are searching for. This improves C1 compile time of the test method dramatically from 92s to 1,8s [4]. Thanks, Tobias [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-August/029973.html [2] Commented assembly code for exception adapter blocks: [...]
;; Exception adapter block 0x00007f2aed9885de: mov 0x248(%rsp),%rsi ;; branch [AL] [B61] 0x00007f2aed9885e6: jmpq 0x00007f2aed974f0c ;; Exception adapter block 0x00007f2aed9885eb: mov 0x240(%rsp),%rsi ;; branch [AL] [B63] 0x00007f2aed9885f3: jmpq 0x00007f2aed974f94 ;; Exception adapter block [...] [3] Compile times without patch: Individual compiler times (for compiled methods only) ------------------------------------------------ C1 {speed: 0 bytes/s; standard: 0,000 s, 0 bytes, 0 methods; osr: 0,000 s, 0 bytes, 0 methods; nmethods_size: 0 bytes; nmethods_code_size: 0 bytes} C1 Compile Time: 92,966 s Setup time: 0,000 s Build HIR: 1,651 s Parse: 0,039 s Optimize blocks: 1,558 s GVN: 0,034 s Null checks elim: 0,002 s Range checks elim: 0,000 s Other: 0,019 s Emit LIR: 0,237 s LIR Gen: 0,002 s Linear Scan: 0,235 s Other: 0,000 s Code Emission: 91,078 s Code Installation: 0,000 s Other: 0,000 s JVMCI code install time: 0,000 s [4] Compile times with patch: Individual compiler times (for compiled methods only) ------------------------------------------------ C1 {speed: 0 bytes/s; standard: 0,000 s, 0 bytes, 0 methods; osr: 0,000 s, 0 bytes, 0 methods; nmethods_size: 0 bytes; nmethods_code_size: 0 bytes} C1 Compile Time: 1,849 s Setup time: 0,000 s Build HIR: 1,575 s Parse: 0,038 s Optimize blocks: 1,483 s GVN: 0,033 s Null checks elim: 0,002 s Range checks elim: 0,000 s Other: 0,019 s Emit LIR: 0,228 s LIR Gen: 0,002 s Linear Scan: 0,226 s Other: 0,000 s Code Emission: 0,047 s Code Installation: 0,000 s Other: 0,000 s JVMCI code install time: 0,000 s From vladimir.kozlov at oracle.com Tue Aug 20 15:24:09 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 20 Aug 2019 08:24:09 -0700 Subject: [14] RFR(S): 8224624: Inefficiencies in CodeStrings::add_comment cause timeouts In-Reply-To: References: Message-ID: <74ca5b89-c0b4-1270-3aa9-f496f677f1fe@oracle.com> Good. 
thanks, Vladimir On 8/20/19 6:03 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8224624 > http://cr.openjdk.java.net/~thartmann/8224624/webrev.00/ > > The test that was added with the fix for JDK-8207355 [1] forces C1 to generate hundreds of exception > handlers. If CommentedAssembly is enabled, LIR_Assembler::emit_exception_entries adds comments to > the assembly code that is generated for these exception adapter blocks [2]. To add the comments to > the right offset, CodeStrings::add_comment searches through *all* _code_strings in the CodeBuffer > until it finds the last comment with that offset. This is extremely slow with a large amount of code > strings (see compile times for the single test method [3]) and is repeated for every new comment > that is added. > > I've fixed this by changing CodeStrings to a doubly-linked-list and searching for the comment with > the right offset in reverse (because especially for these exception handlers, we add comments with > increasing offset). In addition, the code now maintains the invariant that comments in the linked > list a sorted by increasing offset. When reverse-searching the list, we can therefore bail out if we > encounter a comment with offset <= the offset we are searching for. This improves C1 compiled time > of the test method dramatically from 92s to 1,8s [4]. > > Thanks, > Tobias > > > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-August/029973.html > > [2] Commented assembly code for exception adapter blocks: > [...] > ;; Exception adapter block > 0x00007f2aed9885de: mov 0x248(%rsp),%rsi > ;; branch [AL] [B61] > 0x00007f2aed9885e6: jmpq 0x00007f2aed974f0c > ;; Exception adapter block > 0x00007f2aed9885eb: mov 0x240(%rsp),%rsi > ;; branch [AL] [B63] > 0x00007f2aed9885f3: jmpq 0x00007f2aed974f94 > ;; Exception adapter block > [...] 
> > [3] Compile times without patch: > > Individual compiler times (for compiled methods only) > ------------------------------------------------ > > C1 {speed: 0 bytes/s; standard: 0,000 s, 0 bytes, 0 methods; osr: 0,000 s, 0 bytes, 0 methods; > nmethods_size: 0 bytes; nmethods_code_size: 0 bytes} > C1 Compile Time: 92,966 s > Setup time: 0,000 s > Build HIR: 1,651 s > Parse: 0,039 s > Optimize blocks: 1,558 s > GVN: 0,034 s > Null checks elim: 0,002 s > Range checks elim: 0,000 s > Other: 0,019 s > Emit LIR: 0,237 s > LIR Gen: 0,002 s > Linear Scan: 0,235 s > Other: 0,000 s > Code Emission: 91,078 s > Code Installation: 0,000 s > Other: 0,000 s > JVMCI code install time: 0,000 s > > [4] Compile times with patch: > Individual compiler times (for compiled methods only) > ------------------------------------------------ > > C1 {speed: 0 bytes/s; standard: 0,000 s, 0 bytes, 0 methods; osr: 0,000 s, 0 bytes, 0 methods; > nmethods_size: 0 bytes; nmethods_code_size: 0 bytes} > C1 Compile Time: 1,849 s > Setup time: 0,000 s > Build HIR: 1,575 s > Parse: 0,038 s > Optimize blocks: 1,483 s > GVN: 0,033 s > Null checks elim: 0,002 s > Range checks elim: 0,000 s > Other: 0,019 s > Emit LIR: 0,228 s > LIR Gen: 0,002 s > Linear Scan: 0,226 s > Other: 0,000 s > Code Emission: 0,047 s > Code Installation: 0,000 s > Other: 0,000 s > JVMCI code install time: 0,000 s > From tobias.hartmann at oracle.com Tue Aug 20 15:35:29 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 20 Aug 2019 17:35:29 +0200 Subject: [14] RFR(S): 8224624: Inefficiencies in CodeStrings::add_comment cause timeouts In-Reply-To: <74ca5b89-c0b4-1270-3aa9-f496f677f1fe@oracle.com> References: <74ca5b89-c0b4-1270-3aa9-f496f677f1fe@oracle.com> Message-ID: <99b0dbd7-ecc2-84e4-2ad9-674f260d1334@oracle.com> Thanks, Vladimir. Best regards, Tobias On 20.08.19 17:24, Vladimir Kozlov wrote: > Good. 
> > thanks, > Vladimir > > On 8/20/19 6:03 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8224624 >> http://cr.openjdk.java.net/~thartmann/8224624/webrev.00/ >> >> The test that was added with the fix for JDK-8207355 [1] forces C1 to generate hundreds of exception >> handlers. If CommentedAssembly is enabled, LIR_Assembler::emit_exception_entries adds comments to >> the assembly code that is generated for these exception adapter blocks [2]. To add the comments to >> the right offset, CodeStrings::add_comment searches through *all* _code_strings in the CodeBuffer >> until it finds the last comment with that offset. This is extremely slow with a large amount of code >> strings (see compile times for the single test method [3]) and is repeated for every new comment >> that is added. >> >> I've fixed this by changing CodeStrings to a doubly-linked-list and searching for the comment with >> the right offset in reverse (because especially for these exception handlers, we add comments with >> increasing offset). In addition, the code now maintains the invariant that comments in the linked >> list a sorted by increasing offset. When reverse-searching the list, we can therefore bail out if we >> encounter a comment with offset <= the offset we are searching for. This improves C1 compiled time >> of the test method dramatically from 92s to 1,8s [4]. >> >> Thanks, >> Tobias >> >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-August/029973.html >> >> [2] Commented assembly code for exception adapter blocks: >> [...] >> ;; Exception adapter block >> 0x00007f2aed9885de: mov 0x248(%rsp),%rsi >> ;; branch [AL] [B61] >> 0x00007f2aed9885e6: jmpq 0x00007f2aed974f0c >> ;; Exception adapter block >> 0x00007f2aed9885eb: mov 0x240(%rsp),%rsi >> ;; branch [AL] [B63] >> 0x00007f2aed9885f3: jmpq 0x00007f2aed974f94 >>
;; Exception adapter block >> [...] >> >> [3] Compile times without patch: >> >> Individual compiler times (for compiled methods only) >> ------------------------------------------------ >> >> C1 {speed: 0 bytes/s; standard: 0,000 s, 0 bytes, 0 methods; osr: 0,000 s, 0 bytes, 0 methods; >> nmethods_size: 0 bytes; nmethods_code_size: 0 bytes} >> C1 Compile Time: 92,966 s >> Setup time: 0,000 s >> Build HIR: 1,651 s >> Parse: 0,039 s >> Optimize blocks: 1,558 s >> GVN: 0,034 s >> Null checks elim: 0,002 s >> Range checks elim: 0,000 s >> Other: 0,019 s >> Emit LIR: 0,237 s >> LIR Gen: 0,002 s >> Linear Scan: 0,235 s >> Other: 0,000 s >> Code Emission: 91,078 s >> Code Installation: 0,000 s >> Other: 0,000 s >> JVMCI code install time: 0,000 s >> >> [4] Compile times with patch: >> Individual compiler times (for compiled methods only) >> ------------------------------------------------ >> >> C1 {speed: 0 bytes/s; standard: 0,000 s, 0 bytes, 0 methods; osr: 0,000 s, 0 bytes, 0 methods; >> nmethods_size: 0 bytes; nmethods_code_size: 0 bytes} >> C1 Compile Time: 1,849 s >> Setup time: 0,000 s >> Build HIR: 1,575 s >> Parse: 0,038 s >> Optimize blocks: 1,483 s >> GVN: 0,033 s >> Null checks elim: 0,002 s >> Range checks elim: 0,000 s >> Other: 0,019 s >> Emit LIR: 0,228 s >> LIR Gen: 0,002 s >> Linear Scan: 0,226 s >> Other: 0,000 s >> Code Emission:
0,047 s >> Code Installation: 0,000 s >> Other: 0,000 s >> JVMCI code install time: 0,000 s >> From fujie at loongson.cn Wed Aug 21 07:51:04 2019 From: fujie at loongson.cn (Jie Fu) Date: Wed, 21 Aug 2019 15:51:04 +0800 Subject: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling In-Reply-To: <19cea482-4a23-cd66-31b0-39872d403baa@loongson.cn> References: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn> <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com> <3033ff07-93b2-c99b-f5fb-6b35bc03b4b9@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB724CC7@fmsmsx121.amr.corp.intel.com> <19cea482-4a23-cd66-31b0-39872d403baa@loongson.cn> Message-ID: <69ba13a0-4eb4-68ca-a90a-49f0fffdb1d5@loongson.cn> On 2019/8/20 4:44, Jie Fu wrote: >> Also the loop did not get unrolled after vectorization. >> > Yes. I can reproduce it on my computer. > I'm surprised that the unrolled version seems a bit slow (on my > computer) again. Not true. The difference of performance came from different jdk-images used in my testing. After vectorization, the unrolled main-loop is a performance bug caused by my previous patch. Thanks Vivek for correcting me. Will update the webrev to fix this bug later. Thanks a lot. Best regards, Jie From fujie at loongson.cn Wed Aug 21 08:59:24 2019 From: fujie at loongson.cn (Jie Fu) Date: Wed, 21 Aug 2019 16:59:24 +0800 Subject: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2AAB724CC7@fmsmsx121.amr.corp.intel.com> References: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn> <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com> <3033ff07-93b2-c99b-f5fb-6b35bc03b4b9@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB724CC7@fmsmsx121.amr.corp.intel.com> Message-ID: <5bf19e73-eeee-0b85-7d53-8866756072c1@loongson.cn> Hi Vivek, Updated: http://cr.openjdk.java.net/~jiefu/8227505/webrev.03/ Please see comments inline.
On 2019/8/20 11:44, Deshpande, Vivek R wrote: > Hi All > > I tested this patch with small test which adds byte arrays. > for (int i = 0; i < NUM; i++) { > data[i] = (byte)(data2[i] + data3[i]); > } > > Since the loop unrolled to half than earlier, the maximum vector length could not be used and generated 256 bit long vector instructions instead of maximum available 512 bits. I've asked my manager for an AVX-512 machine. But it will take some time to get it. So could you please share the performance of your test case on your AVX-512 machine? > Also the loop did not get unrolled after vectorization. I have given the generated code below. Fixed. Thanks. Any comments? Thanks a lot. Best regards, Jie From nils.eliasson at oracle.com Wed Aug 21 10:00:08 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 21 Aug 2019 12:00:08 +0200 Subject: RFR(S): 8229970: ZGC: C2: fixup_uses_in_catch may fail when expanding many uses Message-ID: <343acb66-620b-3aee-e9ed-b0e794ce46bd@oracle.com> Hi, This is a very small patch that fixes an issue when iterating over all uses to call fixup_uses_in_catch. The out array may be reallocated which might cause the iteration to miss some uses. This patch instead iterates over the uses until a fixed point is reached. Bug: https://bugs.openjdk.java.net/browse/JDK-8229970 Webrev: http://cr.openjdk.java.net/~neliasso/8229970/webrev.01/ Please review, // Nils From nils.eliasson at oracle.com Wed Aug 21 10:30:06 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 21 Aug 2019 12:30:06 +0200 Subject: RFR(S): 8228839: Non-CFG nodes have control edges to calls, instead of the call's control projection Message-ID: <26b7b326-d54c-cb41-ff44-52b2dc54978e@oracle.com> Hi, This is a workaround for the problem that sometimes non-CFG nodes have their ctrl (PhaseIdealLoop::get_ctrl(n)) set directly to calls, instead of the call's control projection. I'm working on a patch for that, but it might end up a bit intrusive.
This is a workaround that simply checks and adjusts for that case. Bug: https://bugs.openjdk.java.net/browse/JDK-8228839 Webrev: http://cr.openjdk.java.net/~neliasso/8228839/webrev.01/ Please review, // Nils From vladimir.kozlov at oracle.com Wed Aug 21 15:31:42 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 21 Aug 2019 08:31:42 -0700 Subject: RFR(S): 8229970: ZGC: C2: fixup_uses_in_catch may fail when expanding many uses In-Reply-To: <343acb66-620b-3aee-e9ed-b0e794ce46bd@oracle.com> References: <343acb66-620b-3aee-e9ed-b0e794ce46bd@oracle.com> Message-ID: <4e514f5e-a2f6-4526-d776-f786bdacf50d@oracle.com> Good. Thanks, Vladimir On 8/21/19 3:00 AM, Nils Eliasson wrote: > Hi, > > This is a very small patch that fixes an issue when iterating over all uses to call > fixup_uses_in_catch. The out array may be reallocated which might cause the iteration to miss some > uses. This patch fix point iterates over each use instead. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8229970 > > Webrev: http://cr.openjdk.java.net/~neliasso/8229970/webrev.01/ > > Please review, > > // Nils > From vladimir.kozlov at oracle.com Wed Aug 21 15:36:25 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 21 Aug 2019 08:36:25 -0700 Subject: RFR(S): 8228839: Non-CFG nodes have control edges to calls, instead of the call's control projection In-Reply-To: <26b7b326-d54c-cb41-ff44-52b2dc54978e@oracle.com> References: <26b7b326-d54c-cb41-ff44-52b2dc54978e@oracle.com> Message-ID: <7b254ddf-c0c0-b0b8-26f5-c30d91a15fd7@oracle.com> Hi Nils, Did you find in which cases the control edge is set to Call node directly? Since the fix for JDK 14 we have time to fix it correctly instead of workaround. Thanks, Vladimir On 8/21/19 3:30 AM, Nils Eliasson wrote: > Hi, > > This is a workaround for the problem that sometimes non-CFG nodes have their ctrl > (PhaseIdealLoop::get_ctrl(n)) set directly to calls, instead of the call's control projection. 
I'm > working on a patch for that, but it might end up a bit intrusive. This is a workaround that simply > checks and adjusts for that case. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8228839 > > Webrev: http://cr.openjdk.java.net/~neliasso/8228839/webrev.01/ > > Please review, > > // Nils > From vivek.r.deshpande at intel.com Wed Aug 21 19:37:05 2019 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Wed, 21 Aug 2019 19:37:05 +0000 Subject: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling In-Reply-To: <5bf19e73-eeee-0b85-7d53-8866756072c1@loongson.cn> References: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn> <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com> <3033ff07-93b2-c99b-f5fb-6b35bc03b4b9@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB724CC7@fmsmsx121.amr.corp.intel.com> <5bf19e73-eeee-0b85-7d53-8866756072c1@loongson.cn> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2AAB726AFB@fmsmsx121.amr.corp.intel.com> Hi Jie Thanks for working on this. I tried webrev.03. The 2nd compilation(recompilation) of doit2 generates the code which does not use full vector width and also does not unroll after vectorization. Regards, Vivek -----Original Message----- From: Jie Fu [mailto:fujie at loongson.cn] Sent: Wednesday, August 21, 2019 1:59 AM To: Deshpande, Vivek R ; Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net; Viswanathan, Sandhya Subject: Re: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling Hi Vivek, Updated: http://cr.openjdk.java.net/~jiefu/8227505/webrev.03/ Please see comments inline. On 2019/8/20 ??11:44, Deshpande, Vivek R wrote: > Hi All > > I tested this patch with small test which adds byte arrays. > for (int i = 0; i < NUM; i++) { > data[i] = (byte)(data2[i] + data3[i]); } > > Since the loop unrolled to half than earlier, the maximum vector length could not be used and generated 256 bit long vector instructions instead of maximum available 512 bits. 
I've asked my manager for an AVX-512 machine. But it will take some time to get it. So could you please share me the performance of your test case on your AVX-512 machine? > Also the loop did not get unrolled after vectorization. I have given the generated code below. Fixed. Thanks. Any comments? Thanks a lot. Best regards, Jie From nils.eliasson at oracle.com Wed Aug 21 19:56:02 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 21 Aug 2019 21:56:02 +0200 Subject: RFR(S): 8228839: Non-CFG nodes have control edges to calls, instead of the call's control projection In-Reply-To: <7b254ddf-c0c0-b0b8-26f5-c30d91a15fd7@oracle.com> References: <26b7b326-d54c-cb41-ff44-52b2dc54978e@oracle.com> <7b254ddf-c0c0-b0b8-26f5-c30d91a15fd7@oracle.com> Message-ID: <0cb0be58-e35e-c85e-72cf-179a566f559a@oracle.com> Hi Vladimir, I want the workaround in promptly because the issue causes noise in the bigapps testing which is really annoying, and might be hiding other problems. The core issue - the PhaseIdealLoop::get_ctrl problem, might turn out to be quite large. Many types of MultiNodes, like calls and Membars are at times set incorrectly (often in loopopts). SafepointNodes are a direct subclass of MultiNodes but don't seem to use a control projection anywhere. (But calls that are a subclass of SafepointNodes expect it - very confusing). It is also quite common that the node retrieved from a PhaseIdealLoop::get_ctrl(..) is set as control in another node, which propagates the problem. I have tried to sanitize the ctrl-mapping by intercepting all MultiNodes (except SafePointNodes and StartNodes) in PhaseIdealLoop::set_ctrl(). That seems to solve all downstream issues (without causing new problems). But for a permanent fix I need to go through all the call sites of set_ctrl() and find out why they don't use the projection. I also need to investigate why SafepointNodes don't use control projections, and what it would take to fix it.
Regards, Nils On 2019-08-21 17:36, Vladimir Kozlov wrote: > Hi Nils, > > Did you find in which cases the control edge is set to Call node > directly? > > Since the fix for JDK 14 we have time to fix it correctly instead of > workaround. > > Thanks, > Vladimir > > On 8/21/19 3:30 AM, Nils Eliasson wrote: >> Hi, >> >> This is a workaround for the problem that sometimes non-CFG nodes >> have their ctrl (PhaseIdealLoop::get_ctrl(n)) set directly to calls, >> instead of the call's control projection. I'm working on a patch for >> that, but it might end up a bit intrusive. This is a workaround that >> simply checks and adjusts for that case. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8228839 >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8228839/webrev.01/ >> >> Please review, >> >> // Nils >> From nils.eliasson at oracle.com Wed Aug 21 19:56:34 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 21 Aug 2019 21:56:34 +0200 Subject: RFR(S): 8229970: ZGC: C2: fixup_uses_in_catch may fail when expanding many uses In-Reply-To: <4e514f5e-a2f6-4526-d776-f786bdacf50d@oracle.com> References: <343acb66-620b-3aee-e9ed-b0e794ce46bd@oracle.com> <4e514f5e-a2f6-4526-d776-f786bdacf50d@oracle.com> Message-ID: Thank you! // Nils On 2019-08-21 17:31, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 8/21/19 3:00 AM, Nils Eliasson wrote: >> Hi, >> >> This is a very small patch that fixes an issue when iterating over >> all uses to call fixup_uses_in_catch. The out array may be >> reallocated which might cause the iteration to miss some uses. This >> patch fix point iterates over each use instead. 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8229970 >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8229970/webrev.01/ >> >> Please review, >> >> // Nils >> From vladimir.kozlov at oracle.com Wed Aug 21 20:11:12 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 21 Aug 2019 13:11:12 -0700 Subject: RFR(S): 8228839: Non-CFG nodes have control edges to calls, instead of the call's control projection In-Reply-To: <0cb0be58-e35e-c85e-72cf-179a566f559a@oracle.com> References: <26b7b326-d54c-cb41-ff44-52b2dc54978e@oracle.com> <7b254ddf-c0c0-b0b8-26f5-c30d91a15fd7@oracle.com> <0cb0be58-e35e-c85e-72cf-179a566f559a@oracle.com> Message-ID: Thank you for explaining, Nils I am okay with this temporary workaround then. Please, file bug to fix main issue. Thanks, Vladimir On 8/21/19 12:56 PM, Nils Eliasson wrote: > Hi Vladimir, > > I want the workaround in promtly because the issue causes noise in the bigapps testing which is > really annoying, and might be hiding other problems. > > The core issue - the PhaseIdealLoop::get_ctrl problem, might turn up quite large. Many types of > MultiNodes, like calls and Membars are at times set incorrectly (often in loopopts). SafepointNodes > are a direct subclass of MultiNodes but doesn't seem to use a control projection anywhere. (But > calls that are a subclass of SafepointNodes expect it - very confusing). > > It is also quite common that the node retrieved from a PhaseIdealLoop::get_ctrl(..) is set as > control in another node, which propagates the problem. > > I have tried to sanitize the ctrl-mapping by intercepting all MultiNodes (except SafePointNodes and > StartNodes) in PhaseIdealLoop::set_ctrl(). That seems to solve all downstream issues (without > causing new problems). But for a permanent fix I need to go through all the call sites of set_ctrl() > and find out why they don't use the projection. 
> > I also need to investigate why SafepointNodes don't use control projections, and what it would take > to fix it. > > Regards, > > Nils > > > On 2019-08-21 17:36, Vladimir Kozlov wrote: >> Hi Nils, >> >> Did you find in which cases the control edge is set to Call node directly? >> >> Since the fix for JDK 14 we have time to fix it correctly instead of workaround. >> >> Thanks, >> Vladimir >> >> On 8/21/19 3:30 AM, Nils Eliasson wrote: >>> Hi, >>> >>> This is a workaround for the problem that sometimes non-CFG nodes have their ctrl >>> (PhaseIdealLoop::get_ctrl(n)) set directly to calls, instead of the call's control projection. >>> I'm working on a patch for that, but it might end up a bit intrusive. This is a workaround that >>> simply checks and adjusts for that case. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8228839 >>> >>> Webrev: http://cr.openjdk.java.net/~neliasso/8228839/webrev.01/ >>> >>> Please review, >>> >>> // Nils >>> From fujie at loongson.cn Thu Aug 22 03:53:23 2019 From: fujie at loongson.cn (Jie Fu) Date: Thu, 22 Aug 2019 11:53:23 +0800 Subject: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2AAB726AFB@fmsmsx121.amr.corp.intel.com> References: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn> <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com> <3033ff07-93b2-c99b-f5fb-6b35bc03b4b9@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB724CC7@fmsmsx121.amr.corp.intel.com> <5bf19e73-eeee-0b85-7d53-8866756072c1@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB726AFB@fmsmsx121.amr.corp.intel.com> Message-ID: <1fa86506-797b-6f18-bcc3-1d7f8d2a32ca@loongson.cn> Hi Vivek, Thanks for your review and comments. 
On 2019/8/22 3:37, Deshpande, Vivek R wrote: > The 2nd compilation(recompilation) of doit2 generates the code which does not use full vector width I don't understand why we must compile with vector-512 since you didn't share the performance of vector-256 and vector-512 respectively. As for my reproducer[1], when vector-128 is used, performance becomes much slower than vector-64. Does vector-512 run much faster than vector-256 on your machine? Could you please share the detailed performance data? > and also does not unroll after vectorization. Sorry. I can't reproduce this issue on my computer. Here is the compilation log[2] of my testing. Probably this issue can be only triggered on an AVX-512 machine. I will investigate it once my AVX-512 machine is ready. If I missed anything please let me know. Thanks a lot. Best regards, Jie [1] http://cr.openjdk.java.net/~jiefu/8227505/TestSuperWordOverunrolling.java [2] http://cr.openjdk.java.net/~jiefu/8227505/log/SmallByteAdd.log256 From jatin.bhateja at intel.com Thu Aug 22 06:49:58 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Thu, 22 Aug 2019 06:49:58 +0000 Subject: 8230015: [instruction selector] generic vector operands support. Message-ID: Hi All, Please find below a patch for generic vector operands[1] support during instruction selection. Motivation behind the patch is to reduce the number of vector selection patterns whose operands differ only in vector length. This will not only result in less code being generated by ADLC, which effectively translates to size reduction in libjvm.so, but also help in better maintenance of AD files. Using generic operands we were able to collapse multiple vector patterns over mainline. Initial number of vector instruction patterns (vec[XYZSD] + legVec[ZXYSD]) : 510 Reduced vector instruction patterns (vecG + legVecG) : 222 With this we could see around 1MB size reduction in libjvm.so.
In order to have minimal impact on downstream compiler passes, a post-selection pass has been introduced (currently enabled only for the X86 target) which replaces these generic operands with their corresponding concrete vector-length variants. JBS : https://bugs.openjdk.java.net/browse/JDK-8230015 Patch : http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/webrev.00/ Kindly review and share your feedback. Best Regards, Jatin Bhateja [1] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf From tobias.hartmann at oracle.com Thu Aug 22 06:51:23 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 22 Aug 2019 08:51:23 +0200 Subject: RFR(S): 8229970: ZGC: C2: fixup_uses_in_catch may fail when expanding many uses In-Reply-To: <4e514f5e-a2f6-4526-d776-f786bdacf50d@oracle.com> References: <343acb66-620b-3aee-e9ed-b0e794ce46bd@oracle.com> <4e514f5e-a2f6-4526-d776-f786bdacf50d@oracle.com> Message-ID: +1 Best regards, Tobias On 21.08.19 17:31, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 8/21/19 3:00 AM, Nils Eliasson wrote: >> Hi, >> >> This is a very small patch that fixes an issue when iterating over all uses to call >> fixup_uses_in_catch. The out array may be reallocated which might cause the iteration to miss some >> uses. This patch iterates over the uses to a fixed point instead.
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8229970 >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8229970/webrev.01/ >> >> Please review, >> >> // Nils >> From tobias.hartmann at oracle.com Thu Aug 22 06:53:14 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 22 Aug 2019 08:53:14 +0200 Subject: RFR(S): 8228839: Non-CFG nodes have control edges to calls, instead of the call's control projection In-Reply-To: <26b7b326-d54c-cb41-ff44-52b2dc54978e@oracle.com> References: <26b7b326-d54c-cb41-ff44-52b2dc54978e@oracle.com> Message-ID: <94d0110d-a333-8c58-e1c8-d80d6a9ac4bc@oracle.com> Hi Nils, looks good to me but please fix the comment in zBarrierSetC2.cpp:884 "wants" -> "want", "projetion" -> "projection". Best regards, Tobias On 21.08.19 12:30, Nils Eliasson wrote: > Hi, > > This is a workaround for the problem that sometimes non-CFG nodes have their ctrl > (PhaseIdealLoop::get_ctrl(n)) set directly to calls, instead of the call's control projection. I'm > working on a patch for that, but it might end up a bit intrusive. This is a workaround that simply > checks and adjusts for that case. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8228839 > > Webrev: http://cr.openjdk.java.net/~neliasso/8228839/webrev.01/ > > Please review, > > // Nils > From nils.eliasson at oracle.com Thu Aug 22 07:29:11 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 22 Aug 2019 09:29:11 +0200 Subject: RFR(S): 8228839: Non-CFG nodes have control edges to calls, instead of the call's control projection In-Reply-To: <94d0110d-a333-8c58-e1c8-d80d6a9ac4bc@oracle.com> References: <26b7b326-d54c-cb41-ff44-52b2dc54978e@oracle.com> <94d0110d-a333-8c58-e1c8-d80d6a9ac4bc@oracle.com> Message-ID: <0ec1e90c-8a3c-d91b-401b-2127b5d1b167@oracle.com> Fixed. Thanks! /Nils On 2019-08-22 08:53, Tobias Hartmann wrote: > Hi Nils, > > looks good to me but please fix the comment in zBarrierSetC2.cpp:884 "wants" -> "want", "projetion" > -> "projection". 
> > Best regards, > Tobias > > On 21.08.19 12:30, Nils Eliasson wrote: >> Hi, >> >> This is a workaround for the problem that sometimes non-CFG nodes have their ctrl >> (PhaseIdealLoop::get_ctrl(n)) set directly to calls, instead of the call's control projection. I'm >> working on a patch for that, but it might end up a bit intrusive. This is a workaround that simply >> checks and adjusts for that case. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8228839 >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8228839/webrev.01/ >> >> Please review, >> >> // Nils >> From nils.eliasson at oracle.com Thu Aug 22 07:29:27 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 22 Aug 2019 09:29:27 +0200 Subject: RFR(S): 8229970: ZGC: C2: fixup_uses_in_catch may fail when expanding many uses In-Reply-To: References: <343acb66-620b-3aee-e9ed-b0e794ce46bd@oracle.com> <4e514f5e-a2f6-4526-d776-f786bdacf50d@oracle.com> Message-ID: <19185d1f-a247-311a-60df-688469546d2d@oracle.com> Thanks Tobias! // Nils On 2019-08-22 08:51, Tobias Hartmann wrote: > +1 > > Best regards, > Tobias > > On 21.08.19 17:31, Vladimir Kozlov wrote: >> Good. >> >> Thanks, >> Vladimir >> >> On 8/21/19 3:00 AM, Nils Eliasson wrote: >>> Hi, >>> >>> This is a very small patch that fixes an issue when iterating over all uses to call >>> fixup_uses_in_catch. The out array may be reallocated which might cause the iteration to miss some >>> uses. This patch fix point iterates over each use instead. 
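[Editorial sketch, not part of the original message.] The fix-point iteration described in the quoted paragraph above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the code from the webrev: `Node`, `fixup_use` and `fixup_all_uses` are hypothetical stand-ins, and a `std::vector` of pointers plays the role of the out array. The point is only that when fixing one use can grow (and reallocate) the list being walked, restarting the scan until nothing is left to fix avoids missing uses.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for an IR node with a list of uses ("outs").
struct Node {
    bool needs_fixup = false;   // this use still has to be processed
    int clones_to_emit = 0;     // new uses created when this one is fixed
    std::vector<Node*> outs;    // uses of this node; may grow during fix-up
};

// Hypothetical fix-up: handles one use and, as a side effect, appends new
// uses to def.outs. Growing the vector may reallocate its storage, so a
// caller must not keep iterators or references into def.outs across it.
void fixup_use(Node& def, Node& use, std::deque<Node>& arena) {
    use.needs_fixup = false;
    for (int k = 0; k < use.clones_to_emit; ++k) {
        arena.emplace_back();            // deque keeps element addresses stable
        Node& clone = arena.back();
        clone.needs_fixup = true;
        def.outs.push_back(&clone);
    }
    use.clones_to_emit = 0;
}

// Fix-point iteration: rescan the use list from the start after every
// change, and stop only when a full scan finds nothing left to fix.
// Returns the number of fix-ups performed.
int fixup_all_uses(Node& def, std::deque<Node>& arena) {
    int fixed = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t i = 0; i < def.outs.size(); ++i) {
            Node* use = def.outs[i];
            assert(use != nullptr);
            if (use->needs_fixup) {
                fixup_use(def, *use, arena);
                ++fixed;
                progress = true;
                break;                   // list changed: restart the scan
            }
        }
    }
    return fixed;
}
```

Restarting the scan costs extra passes over the list, which seems acceptable for a sketch; a worklist would be another way to get the same guarantee.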
>>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8229970 >>> >>> Webrev: http://cr.openjdk.java.net/~neliasso/8229970/webrev.01/ >>> >>> Please review, >>> >>> // Nils >>> From tobias.hartmann at oracle.com Thu Aug 22 08:17:41 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 22 Aug 2019 10:17:41 +0200 Subject: [14] RFR(T): 8230020: [BACKOUT] compiler/types/correctness/* tests fail with "assert(recv == __null || recv->is_klass()) failed: wrong type" Message-ID: Hi, please review the following patch that backs out the fix for JDK-8225670 because it caused a performance regression with several benchmarks. https://bugs.openjdk.java.net/browse/JDK-8230020 http://cr.openjdk.java.net/~thartmann/8230020/webrev.00/ I've also closed 8229446 as duplicate and removed the corresponding entry in the problem list. Thanks, Tobias From rwestrel at redhat.com Thu Aug 22 09:21:55 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 22 Aug 2019 11:21:55 +0200 Subject: [14] RFR(T): 8230020: [BACKOUT] compiler/types/correctness/* tests fail with "assert(recv == __null || recv->is_klass()) failed: wrong type" In-Reply-To: References: Message-ID: <87pnkxr1vw.fsf@redhat.com> > http://cr.openjdk.java.net/~thartmann/8230020/webrev.00/ Looks good to me. Roland. From tobias.hartmann at oracle.com Thu Aug 22 10:00:03 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 22 Aug 2019 12:00:03 +0200 Subject: [14] RFR(T): 8230020: [BACKOUT] compiler/types/correctness/* tests fail with "assert(recv == __null || recv->is_klass()) failed: wrong type" In-Reply-To: <87pnkxr1vw.fsf@redhat.com> References: <87pnkxr1vw.fsf@redhat.com> Message-ID: <51109b4a-5756-f690-be23-0ec6beeebc6c@oracle.com> Thanks Roland. Best regards, Tobias On 22.08.19 11:21, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~thartmann/8230020/webrev.00/ > > Looks good to me. > > Roland. 
> From patrick at os.amperecomputing.com Thu Aug 22 11:20:00 2019 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Thu, 22 Aug 2019 11:20:00 +0000 Subject: [14] RFR(S): 8225670: compiler/types/correctness/* tests fail with "assert(recv == __null || recv->is_klass()) failed: wrong type" In-Reply-To: <47a506e5-f5c6-40ce-4c89-2a06853eff71@oracle.com> References: <678f82db-e5b9-c5ad-9f6c-439b4c7cecd7@oracle.com> <5b38061a-2069-8772-2f42-86fa1749ef15@oracle.com> <47a506e5-f5c6-40ce-4c89-2a06853eff71@oracle.com> Message-ID: Hi Christian, My test system shows this fix probably introduced a significant performance loss with a couple of apps/benchmarks on both aarch64 (Ampere eMAG) and x86 systems; I filed this ticket to track it: https://bugs.openjdk.java.net/browse/JDK-8230036 Regards Patrick -----Original Message----- From: hotspot-compiler-dev On Behalf Of Christian Hagedorn Sent: Friday, August 9, 2019 7:56 PM To: Erik Österlund ; hotspot-compiler-dev at openjdk.java.net; Tobias Hartmann Subject: Re: [14] RFR(S): 8225670: compiler/types/correctness/* tests fail with "assert(recv == __null || recv->is_klass()) failed: wrong type" Hi Erik, hi Tobias Thank you for your reviews! On 09.08.19 11:21, Erik Österlund wrote: > Hi Christian, > > Looks good - well spotted. > > To answer your question - yes the GC (ZGC in particular, and probably > soon Shenandoah when they hook in to the concurrent class unloading > framework) cleans the extra data section of MDOs concurrently, under > the extra data lock. > > However, they clear whole rows from the extra data section, under the > extra data lock of the MDOs; they never write that the Klass is NULL. > So I believe this bug only relates to the use of the WhiteBox API. That was my guess, too. Thanks for clearing that up and answering the question. > The row clearing of concurrent GCs synchronizes with a metadata > preparation phase for unpacking MDOs to ciMDOs. The preparation phase
The preparation phase > will in a fixed-point iteration try to create ci handles for all > encountered metadata under the extra data lock. Every time it > encounters an uncached metadata instance, it has to release the lock > due to ranking issues, and may also run into safepoints then. Such > situations are detected, triggering a restart of the fixed-point iteration. > > Once the fixed-point iteration has finished, we know that we under the > lock walked all metadata in the extra data section without ever > releasing the lock, have ci handles keeping all metadata alive, and > can't have gotten any safepoints due to being in VM state. After that, > the rows are copied and translated, and now we are guaranteed that the > translation will always already have the ci handles cached. Thanks for the detailed explanation! > There is some random original copy of the raw MDO extra data that is > performed before preparing the metadata. I don't think it is really > used or needed. Might be interesting to remove in a future RFE. It > gets overwritten by the subsequent row-by-row processing after > metadata preparation. I created a new RFE [1] and referenced this conversation.
Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8229353 From tobias.hartmann at oracle.com Thu Aug 22 11:39:49 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 22 Aug 2019 13:39:49 +0200 Subject: [14] RFR(S): 8225670: compiler/types/correctness/* tests fail with "assert(recv == __null || recv->is_klass()) failed: wrong type" In-Reply-To: References: <678f82db-e5b9-c5ad-9f6c-439b4c7cecd7@oracle.com> <5b38061a-2069-8772-2f42-86fa1749ef15@oracle.com> <47a506e5-f5c6-40ce-4c89-2a06853eff71@oracle.com> Message-ID: <8b9cc13d-8969-603d-cadf-c698d07a14fe@oracle.com> Hi Patrick, thanks, we've already noticed that regression and backed the fix out: https://bugs.openjdk.java.net/browse/JDK-8230020 Best regards, Tobias On 22.08.19 13:20, Patrick Zhang OS wrote: > Hi Christian, > > My test system shows this fix probably introduced a significant performance loss with a couple of apps/benchmarks on both aarch64 (Ampere eMAG) and x86 systems; I filed this ticket to track it: https://bugs.openjdk.java.net/browse/JDK-8230036 > > Regards > Patrick > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Christian Hagedorn > Sent: Friday, August 9, 2019 7:56 PM > To: Erik Österlund ; hotspot-compiler-dev at openjdk.java.net; Tobias Hartmann > Subject: Re: [14] RFR(S): 8225670: compiler/types/correctness/* tests fail with "assert(recv == __null || recv->is_klass()) failed: wrong type" > > Hi Erik, hi Tobias > > Thank you for your reviews! > > On 09.08.19 11:21, Erik Österlund wrote: >> Hi Christian, >> >> Looks good - well spotted. >> >> To answer your question - yes the GC (ZGC in particular, and probably >> soon Shenandoah when they hook in to the concurrent class unloading >> framework) cleans the extra data section of MDOs concurrently, under >> the extra data lock. >> >> However, they clear whole rows from the extra data section, under the >> extra data lock of the MDOs; they never write that the Klass is NULL.
>> So I believe this bug only relates to the use of the WhiteBox API. > > That was my guess, too. Thanks for clearing that up and answering the question. > >> The row clearing of concurrent GCs synchronizes with a metadata >> preparation phase for unpacking MDOs to ciMDOs. The preparation phase >> will in a fixed-point iteration try to create ci handles for all >> encountered metadata under the extra data lock. Every time it >> encounters an uncached metadata instance, it has to release the lock >> due to ranking issues, and may also run into safepoints then. Such >> situations are detected, triggering a restart of the fixed-point iteration. >> >> Once the fixed-point iteration has finished, we know that we under the >> lock walked all metadata in the extra data section without ever >> releasing the lock, have ci handles keeping all metadata alive, and >> can't have gotten any safepoints due to being in VM state. After that, >> the rows are copied and translated, and now we are guaranteed that the >> translation will always already have the ci handles cached. > > Thanks for the detailed explanation! > >> There is some random original copy of the raw MDO extra data that is >> performed before preparing the metadata. I don't think it is really >> used or needed. Might be interesting to remove in a future RFE. It >> gets overwritten by the subsequent row-by-row processing after >> metadata preparation. > > I created a new RFE [1] and referenced this conversation. > > Best regards, > Christian > > > [1] https://bugs.openjdk.java.net/browse/JDK-8229353 > From fujie at loongson.cn Thu Aug 22 11:50:29 2019 From: fujie at loongson.cn (Jie Fu) Date: Thu, 22 Aug 2019 19:50:29 +0800 Subject: RFR(trivial): 8230037: Confused MetaData dumped by PrintOptoAssembly Message-ID: Hi all, JBS: https://bugs.openjdk.java.net/browse/JDK-8230037 Webrev: http://cr.openjdk.java.net/~jiefu/8230037/webrev.00/ People may get confused by the 'MetaData' dumped by PrintOptoAssembly.
For detailed info, please see the JBS. It might be better to make it clearer. Testing: - make test TEST="test/hotspot/jtreg:tier1" CONF=fastdebug on Linux/x64 - make test TEST="tier1" CONF=release on Linux/x64 Thanks a lot. Best regards, Jie From vladimir.kozlov at oracle.com Thu Aug 22 16:16:33 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 22 Aug 2019 09:16:33 -0700 Subject: RFR(trivial): 8230037: Confused MetaData dumped by PrintOptoAssembly In-Reply-To: References: Message-ID: <7a124106-6c12-0f3c-4b64-7a7de13e22b9@oracle.com> To avoid confusion I normally use -Xbatch and CICompilerCount=1 (or =2 for tiered compilation). The output is under ttyLocker so it should be one block. I see that there is a mix of tty and xtty streams in the code. Maybe that is the reason it is not together. If we use xtty it should be passed to print_metadata() too. Please investigate more. Instead of "Last Normal Compilation" I would print Compile_id. Thanks, Vladimir On 8/22/19 4:50 AM, Jie Fu wrote: > Hi all, > > JBS: https://bugs.openjdk.java.net/browse/JDK-8230037 > Webrev: http://cr.openjdk.java.net/~jiefu/8230037/webrev.00/ > > People may get confused by the 'MetaData' dumped by PrintOptoAssembly. > For detailed info, please see the JBS. > It might be better to make it clearer. > > Testing: > - make test TEST="test/hotspot/jtreg:tier1" CONF=fastdebug on Linux/x64 > - make test TEST="tier1" CONF=release on Linux/x64 > > Thanks a lot.
> Best regards, > Jie > > From vivek.r.deshpande at intel.com Thu Aug 22 19:26:11 2019 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 22 Aug 2019 19:26:11 +0000 Subject: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling In-Reply-To: <1fa86506-797b-6f18-bcc3-1d7f8d2a32ca@loongson.cn> References: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn> <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com> <3033ff07-93b2-c99b-f5fb-6b35bc03b4b9@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB724CC7@fmsmsx121.amr.corp.intel.com> <5bf19e73-eeee-0b85-7d53-8866756072c1@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB726AFB@fmsmsx121.amr.corp.intel.com> <1fa86506-797b-6f18-bcc3-1d7f8d2a32ca@loongson.cn> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2AAB72806A@fmsmsx121.amr.corp.intel.com> Hi Jie On AVX2 (256 bit vector) machine I did not observe the difference in the generated code, same as your observation. But on AVX3(512 bit/ 64 byte vector) machine the generated code with the patch was generating the AVX2 (256 bit) instructions instead of AVX3 (512 bit) instructions. So it is not able to use the complete vector width with the patch. As far as performance is concerned with this particular benchmark, that I have shared, and with given number of iterations in the benchmark, I did not observe any difference with the patch from original. So it's the difference in the generated code which is not using full vector width. Regards, Vivek -----Original Message----- From: Jie Fu [mailto:fujie at loongson.cn] Sent: Wednesday, August 21, 2019 8:53 PM To: Deshpande, Vivek R ; Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net; Viswanathan, Sandhya Subject: Re: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling Hi Vivek, Thanks for your review and comments. 
On 2019/8/22 3:37 AM, Deshpande, Vivek R wrote: > The 2nd compilation(recompilation) of doit2 generates the code which > does not use full vector width I don't understand why we must compile with vector-512 since you didn't share the performance of vector-256 and vector-512 respectively. As for my reproducer[1], when vector-128 is used, performance becomes much slower than vector-64. Does vector-512 run much faster than vector-256 on your machine? Could you please share the detailed performance data? > and also does not unroll after vectorization. Sorry. I can't reproduce this issue on my computer. Here is the compilation log[2] of my testing. Probably this issue can be only triggered on an AVX-512 machine. I will investigate it once my AVX-512 machine is ready. If I missed anything please let me know. Thanks a lot. Best regards, Jie [1] http://cr.openjdk.java.net/~jiefu/8227505/TestSuperWordOverunrolling.java [2] http://cr.openjdk.java.net/~jiefu/8227505/log/SmallByteAdd.log256 From fujie at loongson.cn Fri Aug 23 00:53:56 2019 From: fujie at loongson.cn (Jie Fu) Date: Fri, 23 Aug 2019 08:53:56 +0800 Subject: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2AAB72806A@fmsmsx121.amr.corp.intel.com> References: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn> <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com> <3033ff07-93b2-c99b-f5fb-6b35bc03b4b9@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB724CC7@fmsmsx121.amr.corp.intel.com> <5bf19e73-eeee-0b85-7d53-8866756072c1@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB726AFB@fmsmsx121.amr.corp.intel.com> <1fa86506-797b-6f18-bcc3-1d7f8d2a32ca@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB72806A@fmsmsx121.amr.corp.intel.com> Message-ID: <3724faa6-c57c-744a-d7e8-22daa4231078@loongson.cn> Hi Vivek, Thanks for your clarification. Please see comments inline.
On 2019/8/23 3:26 AM, Deshpande, Vivek R wrote: > Hi Jie > > On AVX2 (256 bit vector) machine I did not observe the difference in the generated code, same as your observation. > > But on AVX3(512 bit/ 64 byte vector) machine the generated code with the patch was generating the AVX2 (256 bit) instructions instead of AVX3 (512 bit) instructions. > So it is not able to use the complete vector width with the patch. > As far as performance is concerned with this particular benchmark, that I have shared, and with given number of iterations in the benchmark, I did not observe any difference with the patch from original. As for your particular case, I don't think it's a problem to compile with vector-256 since there is no performance drop compared with vector-512. Instead, I'd prefer using vector-256 to lower the risk of over loop unrolling. Also I'm not sure whether the power consumption will increase if vector-512 is used on your machine. > So it's the difference in the generated code which is not using full vector width. According to your performance analysis, vector-256 is good enough for your test case. What's the benefit of generating vector-512 for your case? Well, the patch doesn't disable the generation of vector-512 at all. You can increase the NUM in your program from 1024 to 2048 or more and try again. Thanks. What do you think? Any comments? Thanks a lot. Best regards, Jie > > Regards, > Vivek From fujie at loongson.cn Fri Aug 23 01:57:19 2019 From: fujie at loongson.cn (Jie Fu) Date: Fri, 23 Aug 2019 09:57:19 +0800 Subject: RFR(trivial): 8230037: Confused MetaData dumped by PrintOptoAssembly In-Reply-To: <7a124106-6c12-0f3c-4b64-7a7de13e22b9@oracle.com> References: <7a124106-6c12-0f3c-4b64-7a7de13e22b9@oracle.com> Message-ID: Hi Vladimir, Thanks for your review and valuable comments. Please see comments inline. On 2019/8/23 12:16 AM, Vladimir Kozlov wrote: > The output is under ttyLocker so it should be one block. Yes. It's in the same block with OptoAssembly.
Although they are in the same block, the MetaData is always older than the OptoAssembly. The MetaData belongs to the last normal compilation of the method, not the current OptoAssembly. > I see that there is a mix of tty and xtty streams in the code. Maybe that > is the reason it is not together. If we use xtty it should be passed > to print_metadata() too. Please investigate more. > I didn't notice this problem before. Will do. Thanks. > Instead of "Last Normal Compilation" I would print Compile_id. Good suggestion. Thanks. Thanks a lot. Best regards, Jie > > Thanks, > Vladimir > > On 8/22/19 4:50 AM, Jie Fu wrote: >> Hi all, >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8230037 >> Webrev: http://cr.openjdk.java.net/~jiefu/8230037/webrev.00/ >> >> People may get confused by the 'MetaData' dumped by PrintOptoAssembly. >> For detailed info, please see the JBS. >> It might be better to make it clearer. >> >> Testing: >> - make test TEST="test/hotspot/jtreg:tier1" CONF=fastdebug on >> Linux/x64 >> - make test TEST="tier1" CONF=release on >> Linux/x64 >> >> Thanks a lot. >> Best regards, >> Jie >> >> From nils.eliasson at oracle.com Fri Aug 23 08:30:04 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 23 Aug 2019 10:30:04 +0200 Subject: [Enh]: 8230091: Add verification of clean_catch_blocks Message-ID: Hi, This patch adds some verification of clean_catch_blocks in zBarrierSetC2. This would have caught the bug I recently fixed: JDK-8229970 Bug: https://bugs.openjdk.java.net/browse/JDK-8230091 Webrev: http://cr.openjdk.java.net/~neliasso/8230091/webrev.01/ Regards, Nils From rickard.backman at oracle.com Fri Aug 23 08:49:12 2019 From: rickard.backman at oracle.com (Rickard Bäckman) Date: Fri, 23 Aug 2019 10:49:12 +0200 Subject: [Enh]: 8230091: Add verification of clean_catch_blocks In-Reply-To: References: Message-ID: <20190823084911.j3okkjvolopouge7@rbackman> Looks good!
/R On 08/23, Nils Eliasson wrote: > Hi, > > This patch adds some verification of clean_catch_blocks in zBarrierSetC2. > This would have caught the bug I recently fixed: JDK-8229970 > > Bug: https://bugs.openjdk.java.net/browse/JDK-8230091 > > Webrev: http://cr.openjdk.java.net/~neliasso/8230091/webrev.01/ > > Regards, > > Nils > From nils.eliasson at oracle.com Fri Aug 23 08:49:38 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 23 Aug 2019 10:49:38 +0200 Subject: [Enh]: 8230091: Add verification of clean_catch_blocks In-Reply-To: <20190823084911.j3okkjvolopouge7@rbackman> References: <20190823084911.j3okkjvolopouge7@rbackman> Message-ID: Thank you! //N On 2019-08-23 10:49, Rickard Bäckman wrote: > Looks good! > > /R > > On 08/23, Nils Eliasson wrote: >> Hi, >> >> This patch adds some verification of clean_catch_blocks in zBarrierSetC2. >> This would have caught the bug I recently fixed: JDK-8229970 >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8230091 >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8230091/webrev.01/ >> >> Regards, >> >> Nils >> From vladimir.kozlov at oracle.com Fri Aug 23 17:19:00 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 23 Aug 2019 10:19:00 -0700 Subject: [Enh]: 8230091: Add verification of clean_catch_blocks In-Reply-To: References: Message-ID: Hi Nils Why do you execute it twice?: clean_catch_blocks(phase); + DEBUG_ONLY(clean_catch_blocks(phase, true /* verify */);) Did you mean something like the following?: clean_catch_blocks(phase, DEBUG_ONLY(true) NOT_DEBUG(false) /* verify */); Thanks, Vladimir On 8/23/19 1:30 AM, Nils Eliasson wrote: > Hi, > > This patch adds some verification of clean_catch_blocks in zBarrierSetC2.
This would have caught the bug I recently > fixed: JDK-8229970 > > Bug: https://bugs.openjdk.java.net/browse/JDK-8230091 > > Webrev: http://cr.openjdk.java.net/~neliasso/8230091/webrev.01/ > > Regards, > > Nils > From nils.eliasson at oracle.com Fri Aug 23 18:53:44 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 23 Aug 2019 20:53:44 +0200 Subject: [Enh]: 8230091: Add verification of clean_catch_blocks In-Reply-To: References: Message-ID: <6ab7df06-0e12-591c-07d6-0258aff25433@oracle.com> Hi Vladimir, Sorry - I should have explained in the RFR. The first call moves/clones any loads that are found at an improper place. The second call, with verify set to true, will do the same IR traversal but assert if it can find a load still needing the transform. (So the second call never transforms anything.) This will catch problems with the transform not actually doing its full job (like missing some uses in 8229970), or transforming into a shape that is not ok. // Nils On 2019-08-23 19:19, Vladimir Kozlov wrote: > Hi Nils > > Why do you execute it twice?: > > clean_catch_blocks(phase); > + DEBUG_ONLY(clean_catch_blocks(phase, true /* verify */);) > > Did you mean something like the following?: > > clean_catch_blocks(phase, DEBUG_ONLY(true) NOT_DEBUG(false) /* verify > */); > > Thanks, > Vladimir > > On 8/23/19 1:30 AM, Nils Eliasson wrote: >> Hi, >> >> This patch adds some verification of clean_catch_blocks in >> zBarrierSetC2.
This would have caught the bug I recently fixed: >> JDK-8229970 >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8230091 >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8230091/webrev.01/ >> >> Regards, >> >> Nils >> From vladimir.kozlov at oracle.com Fri Aug 23 20:11:10 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 23 Aug 2019 13:11:10 -0700 Subject: [Enh]: 8230091: Add verification of clean_catch_blocks In-Reply-To: <6ab7df06-0e12-591c-07d6-0258aff25433@oracle.com> References: <6ab7df06-0e12-591c-07d6-0258aff25433@oracle.com> Message-ID: Okay. Good. Thanks, Vladimir On 8/23/19 11:53 AM, Nils Eliasson wrote: > Hi Vladimir, > > Sorry - I should have explained in the RFR. > > The first call moves/clones any loads that are found at an improper place. The second call, with verify set to true, will > do the same IR traversal but assert if it can find a load still needing the transform. (So the second call never > transforms anything.) > > This will catch problems with the transform not actually doing its full job (like missing some uses in 8229970), or > transforming into a shape that is not ok. > > // Nils > > On 2019-08-23 19:19, Vladimir Kozlov wrote: >> Hi Nils >> >> Why do you execute it twice?: >> >> clean_catch_blocks(phase); >> + DEBUG_ONLY(clean_catch_blocks(phase, true /* verify */);) >> >> Did you mean something like the following?: >> >> clean_catch_blocks(phase, DEBUG_ONLY(true) NOT_DEBUG(false) /* verify */); >> >> Thanks, >> Vladimir >> >> On 8/23/19 1:30 AM, Nils Eliasson wrote: >>> Hi, >>> >>> This patch adds some verification of clean_catch_blocks in zBarrierSetC2.
This would have catched the bug i recently >>> fixed: JDK-8229970 >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230091 >>> >>> Webrev: http://cr.openjdk.java.net/~neliasso/8230091/webrev.01/ >>> >>> Regards, >>> >>> Nils >>> From martin.doerr at sap.com Mon Aug 26 13:04:27 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 26 Aug 2019 13:04:27 +0000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <7035ccb8-000c-3a58-b5ac-fb0a3b949784@oracle.com> <381f185e-ca2e-50c4-fe35-1e5e62ff88f6@oracle.com> Message-ID: Hi all, I had noticed that the platforms selection which need a fence in taskqueue.inline.hpp should get updated. My initial webrev http://cr.openjdk.java.net/~mdoerr/8229422_multi-copy-atomic/webrev.00/ was already reviewed on hotspot-gc-dev. It is an attempt to make things more consistent, especially the property "CPU_MULTI_COPY_ATOMIC". Also the compiler constant "support_IRIW_for_not_multiple_copy_atomic_cpu" depends on this property (currently only used on PPC64). We could go one step further and move even more #defines into the platform files to give platform maintainers more control. I haven't got feedback from arm/aarch64 folks about this addition, yet: http://cr.openjdk.java.net/~mdoerr/8229422_multi-copy-atomic/webrev.01/ With this proposal, each platform which is "CPU_MULTI_COPY_ATOMIC" is supposed to define this macro. Other platforms must define SUPPORT_IRIW_FOR_NOT_MULTI_COPY_ATOMIC_CPU and IRIW_WITH_RELEASE_VOLATILE_IN_CONSTRUCTOR for fine-grained control of the memory ordering behavior. We can even control them dynamically (added an experimental switch for PPC64 as an example). Note that neither webrev.00 nor webrev.01 contain any functional changes other than the taskqueue update for s390 (and the experimental switch for PPC64 in webrev.01). Feedback is welcome. Also if you have a preference wrt. webrev.00 vs. 
webrev.01. Best regards, Martin From Xiaohong.Gong at arm.com Tue Aug 27 07:14:11 2019 From: Xiaohong.Gong at arm.com (Xiaohong Gong (Arm Technology China)) Date: Tue, 27 Aug 2019 07:14:11 +0000 Subject: RFR: 8230129: Add jtreg "serviceability/sa/ClhsdbInspect.java" to graal problem list. Message-ID: Hi, Please help to review this small patch: Webrev: http://cr.openjdk.java.net/~pli/rfr/8230129/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8230129 Jtreg test "serviceability/sa/ClhsdbInspect.java" fails when running with Graal. It fails when inspecting an address to check whether it points to an expected oop or method; the address is obtained by first running "jstack -v". When running with Graal, more Java heap is needed for JVMCI initialization and for the compiler's own work, so a GC during the run of the application is inevitable. If a GC happens after running "jstack", the actual addresses of the oops and methods may be different by the time "inspect" is run, and the inspected address may point to another object or to nothing. A simple fix is to add this test to the graal problem list. Thanks, Xiaohong Gong From navy.xliu at gmail.com Tue Aug 27 08:34:00 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Tue, 27 Aug 2019 01:34:00 -0700 Subject: RFR(XS): 8229450: C2 compilation fails with assert(found_sfpt) failed In-Reply-To: References: Message-ID: Hello, MailList, I made a new revision of the bugfix for JDK-8229450. Could you review it? It's very much a corner case. In the graph https://bugs.openjdk.java.net/secure/attachment/84402/JDK-8229450.png, 364 Bool is shared by two if blocks. One happens to dominate the other, and the dominated one happens to be the CountedLoopEnd of a strip-mined loop. A safepoint node is in the middle of it. I followed Tobias' guideline and made a testcase: TestMe.java. It's very similar to the graph generated from the replay file. I intended to force lines 40 and 53 to share the boolean expression 'len < BINARY_VERSION_MARKER_SIZE', but still no luck.
IGVN always removes one of them. I am pretty sure that replay file compiles the same graph, but some profile data keep 364 Bool like that. Could you give me a hint? thanks, --lx On Tue, Aug 20, 2019 at 12:52 AM Liu Xin wrote: > hi, Tobias, > > Thank you for providing this information. I will work on it with this > guideline. > > thanks, > --lx > > > > On Mon, Aug 19, 2019 at 11:46 PM Tobias Hartmann < > tobias.hartmann at oracle.com> wrote: > >> >> On 19.08.19 10:26, Liu Xin wrote: >> > I still have 2 questions. Could reviewers help me out? >> > >> > 1. I'd like to add a testcase, but JVM won't hit it even though I >> repeat >> > invoke that function thousands of times. I can see that function is >> > compiled by C2, but C2 succeeds to compile it. >> > The only way to reproduce that problem is using replay file. Is that >> > possible to build a testcase from a replay file? or What kinda of >> > information should I pull out of the replay file? >> >> Generating a simple regression test is a science in itself but such tests >> are extremely valuable. >> >> If a replay compilation file reproduces the failure, I usually attempt >> the following to create a >> simple regression test: >> - Try to simplify the replay compilation file as far as possible by >> removing inline statements >> - Disable as many C1/C2 optimizations [1] as possible so that the issue >> still reproduces (this will >> simplify the graph) >> - Debug the issue using replay compilation until you get a good >> understanding of the root cause and >> the required conditions (inlining, optimistic optimizations based on >> profiling, IR shape, ...) >> - Look at the Java code that is compiled and the way it is >> invoked/profiled. Not being able to >> reproduce with a normal run usually suggests that the profiling is >> different. >> - Write a test that invokes the same method in the same way (same >> arguments, same inlining). Use >> CompileCommands to control inling if necessary. 
Compare the IR generated >> for your test to the IR >> generated with the replay file (flags like -XX:+TraceLoopOpts also help) >> and modify your test to get >> closer. Sometimes this takes days but it's worth it. >> - Once the failure triggers, simplify the test as good as possible. >> - If you don't have a fix yet, use the simple regression test to debug >> further. >> >> > 2. I found you guys sometimes paste a sub-graph of Ideal nodes in JBS >> > issues. Do you have a script to render a IdealLoopTree? So far, I only >> have >> > idealgraphvisualizer. It renders the whole function.Too big to >> understand. >> >> Do you mean sub-graphs like [2]? That graph is created by the >> IdealGraphVisualizer. You can search >> for nodes or apply filters to find the interesting parts of the usually >> large graph and then click >> on nodes to expand/collapse other parts of the graph. >> >> I usually print the ids of offending nodes by modifying the sources and >> then look at the surrounding >> nodes with the IdealGraphVisualizer or directly at the -XX:+PrintIdeal >> output. Stepping through the >> optimization phases might help as well. 
>> >> Best regards, >> Tobias >> >> [1] Here are some C2 optimizations you might want to try to disable >> -XX:-OptimizePtrCompare -XX:-OptoPeephole -XX:LoopUnrollLimit=0 >> -XX:LoopMaxUnroll=0 >> -XX:-SuperWordLoopUnrollAnalysis -XX:-UseCountedLoopSafepoints >> -XX:-UseLoopPredicate >> -XX:-PartialPeelAtUnsignedTests -XX:-LoopUnswitching -XX:-UseSuperWord >> -XX:-SubsumeLoads >> -XX:-OptimizeStringConcat -XX:-SplitIfBlocks -XX:-RangeCheckElimination >> [2] http://cr.openjdk.java.net/~thartmann/8228888/8228888_graph.png >> > From rwestrel at redhat.com Tue Aug 27 09:23:57 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 27 Aug 2019 11:23:57 +0200 Subject: RFR(S): 8229483: Sinking load out of loop may trigger: assert(found_sfpt) failed: no node in loop that's not input to safepoint Message-ID: <87mufvotaq.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8229483/webrev.00/ The field load in the test case has an early control right below: barrier = 1; (it's a volatile store) and a late control that's conservatively set to be right above: array[0] = j; by anti dependence analysis. The store array[0] is sunk out of the inner loop in the outer strip mined loop right above the strip mined loop's safepoint. The load initially scheduled in the outer loop is a candidate for being cloned and sunk out of loop at the load usages. Because of the anti-dependence with the array[0], it is sunk into the outer strip mined loop eventhough it is not referenced by the safepoint. That breaks loop strip mining verification because the expectation is that any data node in the outer strip mined loop is there because it's referenced by the safepoint. The fix simply recognizes this special case when the load is being sunk out of loop and makes sure it is not moved in the outer strip mined loop. 
Unrelated to this fix, I wonder if the code at: http://hg.openjdk.java.net/jdk/jdk/file/cb836bd08d58/src/hotspot/share/opto/loopopts.cpp#l1346 really does what the comment says it does for loads (and really does anything useful actually). Late control for the load is computed with: late_load_ctrl = get_late_ctrl(n, n_ctrl); to make sure anti dependences are taken into account. But if a load is in a loop, it's because it has uses outside the loop. So late control for the load is in the loop too. Since the restriction when sinking the load is that clones should not float below late control, the clones are going to stay in the loop, and that code doesn't do anything for loads. Or am I missing something? Roland. From rwestrel at redhat.com Tue Aug 27 11:45:15 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 27 Aug 2019 13:45:15 +0200 Subject: RFR(XS): aarch64: C2 OSR compilation fails with "shouldn't process one node several times" in final graph reshaping Message-ID: <87k1ayq1bo.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8229701/webrev.00/ In the compiled method of the test case, there are 2 ConvI2L nodes with the same input but different types. One of them is used twice as input to a single AddL node. The other is from the address calculation of the array access. The logic where the assert fires is specific to aarch64 and replaces ConvI2L nodes with the same inputs but different types with a single one with a wide type. That logic finds the array access ConvI2L first and tries to replace the other ConvI2L with it. It then hits the assert because that ConvI2L has 2 uses which are the same node, the AddL. That's perfectly legal and the assert is too strong. So I removed it and used a Unique_Node_List instead. The test case is a reduced version of the fuzzer test case: order of node processing in final graph reshaping matters so a straightforward test doesn't trigger a failure. Roland. 
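The point about the assert can be illustrated with a small, self-contained Java sketch. This is not HotSpot code: the class `UniqueWorklistSketch`, its `Node` type and the `uniqueUsers` helper are hypothetical names made up for illustration, and the real change is in the webrev above. The sketch only assumes what the mail describes: one node (the AddL) may legally use the same input (a ConvI2L) on two edges, so a per-edge walk sees the user twice, while a collection that ignores duplicates, in the spirit of a Unique_Node_List, sees it once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueWorklistSketch {
    // Hypothetical stand-in for an IR node; not the HotSpot Node class.
    static final class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();
        final List<Node> uses = new ArrayList<>();   // one entry per use edge

        Node(String name, Node... ins) {
            this.name = name;
            for (Node in : ins) {
                inputs.add(in);
                in.uses.add(this);   // the same user is recorded once per edge
            }
        }
    }

    // Collects the users of n; each user appears once, however many edges it has.
    static Set<Node> uniqueUsers(Node n) {
        return new LinkedHashSet<>(n.uses);
    }

    public static void main(String[] args) {
        Node index = new Node("index");
        Node conv = new Node("ConvI2L", index);
        Node add = new Node("AddL", conv, conv);   // AddL uses the same ConvI2L twice

        System.out.println(add.name + " input edges: " + add.inputs.size());
        System.out.println(conv.name + " use edges:   " + conv.uses.size());
        System.out.println(conv.name + " unique users: " + uniqueUsers(conv).size());
    }
}
```

A walk over `conv.uses` that asserts "never seen before" on each entry would trip on the second AddL edge, which is the situation described above; iterating over the de-duplicated set avoids that without rejecting a legal graph shape.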
From shade at redhat.com Tue Aug 27 14:19:46 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 27 Aug 2019 16:19:46 +0200 Subject: RFR (S) 8230238: Add another regression test for JDK-8134739 Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8230238 8u fails without JDK-8134739 fix. Unfortunately, it does not reproduce on compiler/loopopts/superword/TestVectorizationWithInvariant from there. It does, however, reproduce on fuzzer tests that we luckily got (attached as 0021.tar.gz in JDK-8134739). We should really turn that into proper additional regression test for JDK-8134739. This would allow us to test 8u backports better. I am planning to backport it to 11u and 8u (where it would expose the bug we want to fix). jdk/jdk webrev: http://cr.openjdk.java.net/~shade/8230238/webrev.01/ Testing: new test with C1, C2, -Xcomp; the same in 8u (where it fails without 8134739, passes with it); jdk-submit (running) -- Thanks, -Aleksey From vladimir.kozlov at oracle.com Tue Aug 27 16:02:11 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Aug 2019 09:02:11 -0700 Subject: RFR (S) 8230238: Add another regression test for JDK-8134739 In-Reply-To: References: Message-ID: <01d3296f-89b0-8ce8-7456-ec32d8b73a1f@oracle.com> Looks good. Did you tested it with Graal? Vladimir On 8/27/19 7:19 AM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8230238 > > 8u fails without JDK-8134739 fix. Unfortunately, it does not reproduce on > compiler/loopopts/superword/TestVectorizationWithInvariant from there. It does, however, reproduce > on fuzzer tests that we luckily got (attached as 0021.tar.gz in JDK-8134739). We should really turn > that into proper additional regression test for JDK-8134739. > > This would allow us to test 8u backports better. I am planning to backport it to 11u and 8u (where > it would expose the bug we want to fix). 
> > jdk/jdk webrev: > http://cr.openjdk.java.net/~shade/8230238/webrev.01/ > > Testing: new test with C1, C2, -Xcomp; the same in 8u (where it fails without 8134739, passes with > it); jdk-submit (running) > From vladimir.kozlov at oracle.com Tue Aug 27 16:35:35 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Aug 2019 09:35:35 -0700 Subject: RFR(XS): aarch64: C2 OSR compilation fails with "shouldn't process one node several times" in final graph reshaping In-Reply-To: <87k1ayq1bo.fsf@redhat.com> References: <87k1ayq1bo.fsf@redhat.com> Message-ID: Good. Thanks, Vladimir On 8/27/19 4:45 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8229701/webrev.00/ > > In the compiled method of the test case, there are 2 ConvI2L nodes with > the same input but different types. One of them is used twice as input > to a single AddL nodes. The other is from the address calculation of the > array access. The logic where the assert fires is specific to aarch64 > and replaces convI2L nodes with the same inputs but different types with > a single one with a wide type. That logic finds the array access ConvI2L > first and tries to replace the other ConvI2L with it. It then hits the > assert because that ConvI2L has 2 uses which are the same node, the > AddL. That's perfectly legal and the assert is too strong. So I removed > it and used an Unique_Node_List instead. The test case is a reduced > version of the fuzzer test case: order of node processing in final graph > reshaping matters so a straightfoward test doesn't trigger a failure. > > Roland. 
> From shade at redhat.com Tue Aug 27 16:56:45 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 27 Aug 2019 18:56:45 +0200 Subject: RFR (S) 8230238: Add another regression test for JDK-8134739 In-Reply-To: <01d3296f-89b0-8ce8-7456-ec32d8b73a1f@oracle.com> References: <01d3296f-89b0-8ce8-7456-ec32d8b73a1f@oracle.com> Message-ID: <88551ea1-042a-0f85-91e1-e012e4bd2827@redhat.com> On 8/27/19 6:02 PM, Vladimir Kozlov wrote: > Looks good. Did you tested it with Graal? Nope, I did not test with Graal. I mostly eyeballed different C1/C2 configs to see if it runs within the appropriate time (<10s). If you want me to try Graal, I would need instructions and some more time next week. -- Thanks, -Aleksey From vladimir.kozlov at oracle.com Tue Aug 27 17:04:02 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Aug 2019 10:04:02 -0700 Subject: RFR (S) 8230238: Add another regression test for JDK-8134739 In-Reply-To: <88551ea1-042a-0f85-91e1-e012e4bd2827@redhat.com> References: <01d3296f-89b0-8ce8-7456-ec32d8b73a1f@oracle.com> <88551ea1-042a-0f85-91e1-e012e4bd2827@redhat.com> Message-ID: <39e59e95-2b01-3485-fe40-ff52d485aff7@oracle.com> On 8/27/19 9:56 AM, Aleksey Shipilev wrote: > On 8/27/19 6:02 PM, Vladimir Kozlov wrote: >> Looks good. Did you tested it with Graal? > > Nope, I did not test with Graal. I mostly eyeballed different C1/C2 configs to see if it runs within > the appropriate time (<10s). > > If you want me to try Graal, I would need instructions and some more time next week. Run with "-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Djvmci.Compiler=graal" It will use Graal instead of C2. It could be really slow if you switch off -XX:-TieredCompilation so run it in Tiered mode. 
Vladimir From shade at redhat.com Tue Aug 27 17:22:14 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 27 Aug 2019 19:22:14 +0200 Subject: RFR (S) 8230238: Add another regression test for JDK-8134739 In-Reply-To: <39e59e95-2b01-3485-fe40-ff52d485aff7@oracle.com> References: <01d3296f-89b0-8ce8-7456-ec32d8b73a1f@oracle.com> <88551ea1-042a-0f85-91e1-e012e4bd2827@redhat.com> <39e59e95-2b01-3485-fe40-ff52d485aff7@oracle.com> Message-ID: <40095bd1-cfec-ee63-e0dc-bd49815dcea6@redhat.com> On 8/27/19 7:04 PM, Vladimir Kozlov wrote: > On 8/27/19 9:56 AM, Aleksey Shipilev wrote: >> On 8/27/19 6:02 PM, Vladimir Kozlov wrote: >>> Looks good. Did you tested it with Graal? >> >> Nope, I did not test with Graal. I mostly eyeballed different C1/C2 configs to see if it runs within >> the appropriate time (<10s). >> >> If you want me to try Graal, I would need instructions and some more time next week. > > Run with "-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler > -Djvmci.Compiler=graal" > > It will use Graal instead of C2. It could be really slow if you switch off -XX:-TieredCompilation so > run it in Tiered mode. Ran with Graal and both +|-TieredCompilation, works fine. -- Thanks, -Aleksey From vladimir.kozlov at oracle.com Tue Aug 27 17:54:31 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Aug 2019 10:54:31 -0700 Subject: RFR (S) 8230238: Add another regression test for JDK-8134739 In-Reply-To: <40095bd1-cfec-ee63-e0dc-bd49815dcea6@redhat.com> References: <01d3296f-89b0-8ce8-7456-ec32d8b73a1f@oracle.com> <88551ea1-042a-0f85-91e1-e012e4bd2827@redhat.com> <39e59e95-2b01-3485-fe40-ff52d485aff7@oracle.com> <40095bd1-cfec-ee63-e0dc-bd49815dcea6@redhat.com> Message-ID: <9f99b987-ef0b-7b0a-2800-fc9b988b4198@oracle.com> Perfect. Thank you for testing. 
Vladimir On 8/27/19 10:22 AM, Aleksey Shipilev wrote: > On 8/27/19 7:04 PM, Vladimir Kozlov wrote: >> On 8/27/19 9:56 AM, Aleksey Shipilev wrote: >>> On 8/27/19 6:02 PM, Vladimir Kozlov wrote: >>>> Looks good. Did you tested it with Graal? >>> >>> Nope, I did not test with Graal. I mostly eyeballed different C1/C2 configs to see if it runs within >>> the appropriate time (<10s). >>> >>> If you want me to try Graal, I would need instructions and some more time next week. >> >> Run with "-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler >> -Djvmci.Compiler=graal" >> >> It will use Graal instead of C2. It could be really slow if you switch off -XX:-TieredCompilation so >> run it in Tiered mode. > > Ran with Graal and both +|-TieredCompilation, works fine. > From daniil.x.titov at oracle.com Tue Aug 27 21:08:57 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Tue, 27 Aug 2019 14:08:57 -0700 Subject: 8195600: [Graal] jdi tests timeouts with Graal because debuggee vm is not resumed In-Reply-To: <14CB21E2-FD4F-4424-B1F5-97F82A17E36C@oracle.com> References: <855244c9-014c-59d1-cda0-b5f38057f588@oracle.com> <8df66d4e-df9d-c502-3510-30c22ba58445@oracle.com> <15699850-37f5-93f6-6a55-525a4d099bd3@oracle.com> <95eb05da-cf13-f14b-74a6-9e8bf604b29b@oracle.com> <14CB21E2-FD4F-4424-B1F5-97F82A17E36C@oracle.com> Message-ID: Hi Dean and Chris, Just wanted to check with you whether it would be OK now to add this issue to the Graal-specific problem list, as Dean suggested in one of the previous emails, while the proposal about introducing new options for @requires is being discussed? -Thanks! --Daniil On 8/9/19, 3:37 PM, "hotspot-compiler-dev-bounces at openjdk.java.net on behalf of dean.long at oracle.com" wrote: Good question. When we have libgraal, there will still be an option (at least for debugging) to turn it off and use Graal the same way we do now, so it seems like the @requires would need to take that into account once we have libgraal. 
Maybe we will need a new "vm.libgraal.enabled" or make "vm.graal.enabled" be false for libgraal? It does seem a little backwards to require tests to know about the OOM handling details of different JVM features. Instead, how about if we let the test assert that it requires "vm.no-background-oom" or whatever, and let the JVM decide if it supports it. CC'ing hotspot-compiler-dev. dl On 8/8/19 7:42 PM, Chris Plummer wrote: > Actually looking at JDK-8207267 a little closer, it looks like its > job is to re-enable tests that have been disabled with @requires > !vm.graal.enabled, so it looks like we have two different approaches > going in here. Which is preferred? If the preference is to problem > list, do we want to undo JDK-8207261 (except use JDK-8196611 as the CR)? > > Chris > > On 8/8/19 5:08 PM, Chris Plummer wrote: >> That sounds like a better approach to me. >> >> thanks, >> >> Chris >> >> On 8/8/19 4:33 PM, dean.long at oracle.com wrote: >>> This is the kind of failure that is expected to go away with >>> libgraal. You can add the tests to the Graal-specific problem list >>> (see JDK-8196611) and they should be re-enabled with libgraal (see >>> JDK-8207267). >>> >>> dl >>> >>> On 8/8/19 10:21 AM, Chris Plummer wrote: >>>> Hi Daniil, >>>> >>>> My only objection is at some point it seems we need to be able to >>>> run these tests with graal (and other tests that have been disabled >>>> due to graal) because graal might be the only compiler, and we'll >>>> lose test coverage without these tests. Currently we have 260 jtreg >>>> tests disabled due to graal. I'm not sure to what extent they are >>>> waiting on graal fixes or otherwise have a bug filed to eventually >>>> fix them. Would be nice if we had a process in place to make sure >>>> these issues are eventually addressed. The fact that tests that >>>> exhaust memory in general seem to be incompatible with graal would >>>> seem to be the bigger issue that needs to be addressed. 
>>>> >>>> thanks, >>>> >>>> Chris >>>> >>>> On 8/7/19 3:38 PM, Daniil Titov wrote: >>>>> Please review the change that fixes the failing tests when running >>>>> with Graal. The issue originally >>>>> included several vmTestbase/nsk/jdi tests but only 2 of them still >>>>> fail: >>>>> - >>>>> vmTestbase/nsk/jdi/VirtualMachine/instanceCounts/instancecounts003/instancecounts003.java >>>>> - >>>>> vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects002/referringObjects002.java >>>>> >>>>> The problem with these two tests is that they consume all memory >>>>> to force the class unloading that >>>>> results in the exception during JVMCI compiler initialization and >>>>> the test failure. >>>>> The fix filters these tests out to not run with Graal compiler. >>>>> >>>>> Webrev: http://cr.openjdk.java.net/~dtitov/8195600/webrev.01/ >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8195600 >>>>> >>>>> Thanks, >>>>> Daniil >>>>> >>>>> >>>> >>> >> >> > > From chris.plummer at oracle.com Tue Aug 27 21:50:28 2019 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 27 Aug 2019 14:50:28 -0700 Subject: 8195600: [Graal] jdi tests timeouts with Graal because debuggee vm is not resumed In-Reply-To: References: <855244c9-014c-59d1-cda0-b5f38057f588@oracle.com> <8df66d4e-df9d-c502-3510-30c22ba58445@oracle.com> <15699850-37f5-93f6-6a55-525a4d099bd3@oracle.com> <95eb05da-cf13-f14b-74a6-9e8bf604b29b@oracle.com> <14CB21E2-FD4F-4424-B1F5-97F82A17E36C@oracle.com> Message-ID: I'm not sure. You could problem list it, but then the question is which bug to problem list it under, JDK-8195600 or JDK-8207267 (in which case JDK-8195600 would be closed). I'd hate to see a separate CR for every test that fails due to graal unexpectedly executing java code. But then JDK-8207267 seems to be more about getting rid of the use of @requires once libgraal is added, not going through the graal problemlist. 
Chris On 8/27/19 2:08 PM, Daniil Titov wrote: > Hi Dean and Chris, > > Just wanted to check with you whether it would be OK now to add this issue to the > Graal-specific problem list, as Dean suggested in one of the previous emails, > while the proposal about introducing new options for @requires is being discussed? > > -Thanks! > --Daniil > > > > On 8/9/19, 3:37 PM, "hotspot-compiler-dev-bounces at openjdk.java.net on behalf of dean.long at oracle.com" wrote: > > Good question. When we have libgraal, there will still be an option (at > least for debugging) to turn it off and use Graal the same way we do > now, so it seems like the @requires would need to take that into account > once we have libgraal. Maybe we will need a new "vm.libgraal.enabled" > or make "vm.graal.enabled" be false for libgraal? > > It does seem a little backwards to require tests to know about the OOM > handling details of different JVM features. Instead, how about if we > let the test assert that it requires "vm.no-background-oom" or whatever, > and let the JVM decide if it supports it. > > CC'ing hotspot-compiler-dev. > > dl > > On 8/8/19 7:42 PM, Chris Plummer wrote: > > Actually looking at JDK-8207267 a little closer, it looks like its > > job is to re-enable tests that have been disabled with @requires > > !vm.graal.enabled, so it looks like we have two different approaches > > going in here. Which is preferred? If the preference is to problem > > list, do we want to undo JDK-8207261 (except use JDK-8196611 as the CR)? > > > > Chris > > > > On 8/8/19 5:08 PM, Chris Plummer wrote: > >> That sounds like a better approach to me. > >> > >> thanks, > >> > >> Chris > >> > >> On 8/8/19 4:33 PM, dean.long at oracle.com wrote: > >>> This is the kind of failure that is expected to go away with > >>> libgraal. You can add the tests to the Graal-specific problem list > >>> (see JDK-8196611) and they should be re-enabled with libgraal (see > >>> JDK-8207267). 
> >>> > >>> dl > >>> > >>> On 8/8/19 10:21 AM, Chris Plummer wrote: > >>>> Hi Daniil, > >>>> > >>>> My only objection is at some point it seems we need to be able to > >>>> run these tests with graal (and other tests that have been disabled > >>>> due to graal) because graal might be the only compiler, and we'll > >>>> lose test coverage without these tests. Currently we have 260 jtreg > >>>> tests disabled due to graal. I'm not sure to what extent they are > >>>> waiting on graal fixes or otherwise have a bug filed to eventually > >>>> fix them. Would be nice if we had a process in place to make sure > >>>> these issues are eventually addressed. That fact that tests that > >>>> exhaust memory in general seem to be incompatible with graal would > >>>> to be the bigger issue that needs to be addressed. > >>>> > >>>> thanks, > >>>> > >>>> Chris > >>>> > >>>> On 8/7/19 3:38 PM, Daniil Titov wrote: > >>>>> Please review the change that fixes the failing tests when running > >>>>> with Graal. The issue originally > >>>>> included several vmTestbase/nsk/jdi tests but only 2 of them still > >>>>> fail: > >>>>> - > >>>>> vmTestbase/nsk/jdi/VirtualMachine/instanceCounts/instancecounts003/instancecounts003.java > >>>>> - > >>>>> vmTestbase/nsk/jdi/ObjectReference/referringObjects/referringObjects002/referringObjects002.java > >>>>> > >>>>> The problem with these two tests is that they consume all memory > >>>>> to force the class unloading that > >>>>> results in the exception during JVMCI compiler initialization and > >>>>> the test failure. > >>>>> The fix filters these tests out to not run with Graal compiler. 
> >>>>> > >>>>> Webrev: http://cr.openjdk.java.net/~dtitov/8195600/webrev.01/ > >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8195600 > >>>>> > >>>>> Thanks, > >>>>> Daniil > >>>>> > >>>>> > >>>> > >>> > >> > >> > > > > > > > > > From Xiaohong.Gong at arm.com Wed Aug 28 02:48:41 2019 From: Xiaohong.Gong at arm.com (Xiaohong Gong (Arm Technology China)) Date: Wed, 28 Aug 2019 02:48:41 +0000 Subject: 8229797: [JVMCI] Clean up no longer used JVMCI::dependencies_invalid value. Message-ID: Hi, Please help to review this jvmci patch: Webrew: http://cr.openjdk.java.net/~pli/rfr/8229797/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8229797 This patch fix issue: https://github.com/oracle/graal/issues/1587. The loading of new classes can cause dependencies to become false, which requires the dependent nmethods to be discarded and deoptimized. So if validating dependencies fails, it should make the result to be JVMCI::dependencies_failed, which makes jvmci throw the BailoutException. The invalid dependencies happen at the time of installation without any intervening modification of the system dictionary. So as the system dictionary modification optimization has been removed, the compiler can not know whether the failed dependencies are triggered by class reloading or not. It's better to use dependencies_failed to mark the result. Thanks, Xiaohong Gong From rwestrel at redhat.com Wed Aug 28 08:09:06 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 28 Aug 2019 10:09:06 +0200 Subject: RFR(S): 8230061: # assert(mode == ControlAroundStripMined && use == sfpt) failed: missed a node Message-ID: <87ef15pv8d.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8230061/webrev.00/ An LShiftI, that's in an outer strip mined loop because it's referenced from the safepoint node, is transformed by PhaseIdealLoop::remix_address_expressions(). This causes dead nodes to be produced. In the same loop opts pass, the inner strip mined loop is unrolled. 
When the loop body is cloned, C2 hits the dead nodes created above (uses from the loop body in the outer strip mined loop not referenced from the safepoint) and the assert fires. The fix I propose is to relax the assert so it takes dead nodes into account. Roland. From tobias.hartmann at oracle.com Wed Aug 28 09:07:15 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 28 Aug 2019 11:07:15 +0200 Subject: RFR(S): 8230061: # assert(mode == ControlAroundStripMined && use == sfpt) failed: missed a node In-Reply-To: <87ef15pv8d.fsf@redhat.com> References: <87ef15pv8d.fsf@redhat.com> Message-ID: Hi Roland, Why do you need a cast to Node* in node.cpp:711? And wouldn't it be better to bail out of the loop and return false if u == root? Thanks, Tobias On 28.08.19 10:09, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8230061/webrev.00/ > > An LShiftI, that's in an outer strip mined loop because it's referenced > from the safepoint node, is transformed by > PhaseIdealLoop::remix_address_expressions(). This causes dead nodes to > be produced. In the same loop opts pass, the inner strip mined loop is > unrolled. When the loop body is cloned, C2 hits the dead nodes created > above (uses from the loop body in the outer strip mined loop not > referenced from the safepoint) and the assert fires. The fix I propose > is to relax the assert so it takes dead nodes into account. > > Roland. > From tobias.hartmann at oracle.com Wed Aug 28 09:12:17 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 28 Aug 2019 11:12:17 +0200 Subject: RFR(XS): aarch64: C2 OSR compilation fails with "shouldn't process one node several times" in final graph reshaping In-Reply-To: References: <87k1ayq1bo.fsf@redhat.com> Message-ID: <91a93b1f-b756-ae1e-edee-9b6377af15d3@oracle.com> +1 Best regards, Tobias On 27.08.19 18:35, Vladimir Kozlov wrote: > Good. 
> > Thanks, > Vladimir > > On 8/27/19 4:45 AM, Roland Westrelin wrote: >> >> http://cr.openjdk.java.net/~roland/8229701/webrev.00/ >> >> In the compiled method of the test case, there are 2 ConvI2L nodes with >> the same input but different types. One of them is used twice as input >> to a single AddL nodes. The other is from the address calculation of the >> array access. The logic where the assert fires is specific to aarch64 >> and replaces convI2L nodes with the same inputs but different types with >> a single one with a wide type. That logic finds the array access ConvI2L >> first and tries to replace the other ConvI2L with it. It then hits the >> assert because that ConvI2L has 2 uses which are the same node, the >> AddL. That's perfectly legal and the assert is too strong. So I removed >> it and used an Unique_Node_List instead. The test case is a reduced >> version of the fuzzer test case: order of node processing in final graph >> reshaping matters so a straightfoward test doesn't trigger a failure. >> >> Roland. >> From Yang.Zhang at arm.com Wed Aug 28 09:18:45 2019 From: Yang.Zhang at arm.com (Yang Zhang (Arm Technology China)) Date: Wed, 28 Aug 2019 09:18:45 +0000 Subject: 8230015: [instruction selector] generic vector operands support. In-Reply-To: References: Message-ID: Hi Jatin The question how to reduce code size is discussed previously. I also create a JBS to track it under panama project. https://bugs.openjdk.java.net/browse/JDK-8229866 There is a more aggressive idea. http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-April/033362.html Using the idea of generic vector operands, I implement an example (vaddB/S/I) in AArch64 platform. The code size reduction in libjvm.so is ~30kb. If all the vector instructions are merged, the estimated size reduction will be ~300kb. How about using a more aggressive way to reduce code size? PS. 
When I test this patch in AArch64 platform, build fails with the log undefined reference to `Matcher::do_post_selection_processing(Compile*, Node*)'. I fix this failure by adding #ifdef X86 to these code. Regards Yang -----Original Message----- From: hotspot-compiler-dev On Behalf Of Bhateja, Jatin Sent: Thursday, August 22, 2019 2:50 PM To: hotspot-compiler-dev at openjdk.java.net Cc: Vladimir Kozlov Subject: 8230015: [instruction selector] generic vector operands support. Hi All, Please find below a patch for generic vector operands[1] support during instruction selection. Motivation behind the patch is to reduce the number of vector selection patterns whose operands meagerly differ in vector lengths. This will not only result in lesser code being generated by ADLC which effectively translates to size reduction in libjvm.so but also help in better maintenance of AD files. Using generic operands we were able to collapse multiple vector patterns over mainline Initial number of vector instruction patterns (vec[XYZSD] + legVec[ZXYSD] : 510 Reduced vector instruction patterns (vecG + legVecG) : 222 With this we could see around 1MB size reduction in libjvm.so. In order to have minimal impact over downstream compiler passes, a post-selection pass has been introduced (currently enabled only for X86 target) which replaces these generic operands with their corresponding concreter vector length variants. JBS : https://bugs.openjdk.java.net/browse/JDK-8230015 Patch : http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/webrev.00/ Kindly review and share your feedback. 
Best Regards, Jatin Bhateja [1] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf From tobias.hartmann at oracle.com Wed Aug 28 09:31:01 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 28 Aug 2019 11:31:01 +0200 Subject: RFR(S): 8229483: Sinking load out of loop may trigger: assert(found_sfpt) failed: no node in loop that's not input to safepoint In-Reply-To: <87mufvotaq.fsf@redhat.com> References: <87mufvotaq.fsf@redhat.com> Message-ID: <95293c8e-be51-ce33-f509-75aad32ac4c4@oracle.com> Hi Roland, On 27.08.19 11:23, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8229483/webrev.00/ I've noticed that field2 in the test is unused. > The field load in the test case has an early control right below: > > barrier = 1; > > (it's a volatile store) > > and a late control that's conservatively set to be right above: > > array[0] = j; > > by anti dependence analysis. > > The store array[0] is sunk out of the inner loop in the outer strip > mined loop right above the strip mined loop's safepoint. The load > initially scheduled in the outer loop is a candidate for being cloned > and sunk out of loop at the load usages. Because of the anti-dependence > with the array[0], it is sunk into the outer strip mined loop eventhough > it is not referenced by the safepoint. That breaks loop strip mining > verification because the expectation is that any data node in the outer > strip mined loop is there because it's referenced by the safepoint. I don't see a field load in the loop, so which load are you referring to? > The fix simply recognizes this special case when the load is being sunk > out of loop and makes sure it is not moved in the outer strip mined > loop. Wouldn't it be better to relax the verification code if possible? If we ever fix dependency analysis to be less restrictive (see JDK-8229449), we should move the load out of the loop, right? 
> Unrelated to this fix, I wonder if the code at: > > http://hg.openjdk.java.net/jdk/jdk/file/cb836bd08d58/src/hotspot/share/opto/loopopts.cpp#l1346 > > really does what the comment says it does for loads (and really does > anything useful actually). I'm not really familiar with that code but maybe file a RFE to investigate this later. Best regards, Tobias From jatin.bhateja at intel.com Wed Aug 28 09:44:35 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Wed, 28 Aug 2019 09:44:35 +0000 Subject: 8230015: [instruction selector] generic vector operands support. In-Reply-To: References: Message-ID: Hi Yang, Thanks for your response. We are also working over the idea of moving out the operator (uOp and bOp) into a separate ideal node, but that will need changes in shape of ideal graph. Generic operands is another level of optimization which will complement "aggressive optimization". In addition, it will also reduce the number of checks for different vector operand type (vecS, vecD, vecX, vecY, vecZ and their legacy variants) within ADLC generate DFA used while matching, since there will only be two vector operands (vecG and legVecG) now. Current patch enables this support only for x86 target, to get a feedback from community. Best Regards, Jatin > -----Original Message----- > From: Yang Zhang (Arm Technology China) > Sent: Wednesday, August 28, 2019 2:49 PM > To: Bhateja, Jatin ; hotspot-compiler- > dev at openjdk.java.net > Cc: Vladimir Kozlov > Subject: RE: 8230015: [instruction selector] generic vector operands support. > > Hi Jatin > > The question how to reduce code size is discussed previously. I also create a > JBS to track it under panama project. > https://bugs.openjdk.java.net/browse/JDK-8229866 > There is a more aggressive idea. > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019- > April/033362.html > > Using the idea of generic vector operands, I implement an example > (vaddB/S/I) in AArch64 platform. 
The code size reduction in libjvm.so is > ~30kb. If all the vector instructions are merged, the estimated size reduction > will be ~300kb. > > How about using a more aggressive way to reduce code size? > > PS. When I test this patch in AArch64 platform, build fails with the log > undefined reference to `Matcher::do_post_selection_processing(Compile*, > Node*)'. > I fix this failure by adding #ifdef X86 to these code. > > Regards > Yang > > -----Original Message----- > From: hotspot-compiler-dev bounces at openjdk.java.net> On Behalf Of Bhateja, Jatin > Sent: Thursday, August 22, 2019 2:50 PM > To: hotspot-compiler-dev at openjdk.java.net > Cc: Vladimir Kozlov > Subject: 8230015: [instruction selector] generic vector operands support. > > Hi All, > > Please find below a patch for generic vector operands[1] support during > instruction selection. > > Motivation behind the patch is to reduce the number of vector selection > patterns whose operands meagerly differ in vector lengths. > This will not only result in lesser code being generated by ADLC which > effectively translates to size reduction in libjvm.so but also help in better > maintenance of AD files. > > Using generic operands we were able to collapse multiple vector patterns > over mainline > Initial number of vector instruction patterns (vec[XYZSD] + > legVec[ZXYSD] : 510 > Reduced vector instruction patterns (vecG + legVecG) > : 222 > > With this we could see around 1MB size reduction in libjvm.so. > > In order to have minimal impact over downstream compiler passes, a post- > selection pass has been introduced (currently enabled only for X86 target) > which replaces these generic operands with their corresponding concreter > vector length variants. > > JBS : https://bugs.openjdk.java.net/browse/JDK-8230015 > Patch : > http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/webrev.00/ > > Kindly review and share your feedback. 
>
> Best Regards,
> Jatin Bhateja
>
> [1]
> http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended recipient,
> please notify the sender immediately and do not disclose the contents to any
> other person, use it for any purpose, or store or copy the information in any
> medium. Thank you.

From tobias.hartmann at oracle.com Wed Aug 28 09:42:41 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 28 Aug 2019 11:42:41 +0200
Subject: RFR(XS): 8229450: C2 compilation fails with assert(found_sfpt) failed
In-Reply-To:
References:
Message-ID: <45138947-3c8e-9925-25cd-670afb1eb7cb@oracle.com>

On 27.08.19 10:34, Liu Xin wrote:
> I followed Tobias' guideline. I make a testcase: TestMe.java. It's very similar to the graph
> generated from replay file. I intend to force line 40 and 53 share the boolean expression 'len <
> BINARY_VERSION_MARKER_SIZE', but still no luck.
> IGVN always removes one of them. I am pretty sure that replay file compiles the same graph, but some
> profile data keep 364 Bool like that. Could you give me a hint?

Just quickly looked at your test:
- line 53: How can bytes ever be NULL?
- Compared to the replay file, are you sure that the same paths are being marked as taken with your
  test? For example, is the break in line 43 executed or the return in line 54? It seems that you are
  always running with the exact same array whereas it seems likely that the original run used
  different arrays (and therefore the profile in the replay reports different paths to be taken).
- Did you compare the output of TraceLoopOpts?

Best regards,
Tobias

From tobias.hartmann at oracle.com Wed Aug 28 13:14:34 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 28 Aug 2019 15:14:34 +0200
Subject: [14] RFR(M): 8229496: SIGFPE (division by zero) in C2 OSR compiled method
Message-ID:

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8229496
http://cr.openjdk.java.net/~thartmann/8229496/webrev.00/

A DivI/ModINode loses direct control dependency to its div != 0 check (but remains being dominated
by it) and is then moved to before that check by PhaseIdealLoop::dominated_by() because
depends_only_on_test() is true.

Details:
- The inner do-while loop is unswitched based on the div != 0 check added by
  GraphKit::zero_check_int() when parsing the irem/idiv bytecode.
- The div == 0 loop is always throwing an arithmetic exception (loop is removed), the div != 0 loop
  computes 1 % div where the ModI is not directly control dependent on the div != 0 check anymore
  (but on the preceding array store check).
- Loop predicates are added to the outer loop in PhaseIdealLoop::loop_predication_impl_helper() and
  since the ModI is control dependent on the array store range check once that one is converted to a
  predicate, PhaseIdealLoop::dominated_by() moves the ModI up as well because depends_only_on_test()
  is true:
  http://hg.openjdk.java.net/jdk/jdk/file/cb836bd08d58/src/hotspot/share/opto/loopopts.cpp#l272
- As a result, ModI is moved to before the outer loop while the div != 0 check from unswitching is
  before the inner loop(s).

I've discussed this with Roland and he suggested adding a CastNode to keep the dependency between
the div/mod operation and the zero check. I had to add a CastLLNode to implement this for long
div/mod as well.
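
For readers following along, the code shape described in the details above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name ModZeroCheckSketch and the method compute() are hypothetical, this is not the regression test from the webrev, and nothing here is claimed to reproduce the crash. It only shows the Java-level contract the compiled code has to keep: the remainder may run only after div has been checked, so a zero divisor yields an ArithmeticException and a loop that never executes performs no division at all.

```java
// Hypothetical sketch, not the test from the webrev for 8229496.
final class ModZeroCheckSketch {

    // Outer loop with an array store (range checked), inner do-while with
    // a remainder whose divisor gets an implicit zero check.
    static int compute(int[] array, int div) {
        int res = 0;
        for (int i = 0; i < array.length; i++) {
            array[i] = i;           // array store in the outer loop
            int j = 0;
            do {
                res += 1 % div;     // throws ArithmeticException when div == 0
                j++;
            } while (j < 2);
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(compute(new int[4], 3));
        // With an empty array the division never executes, so div == 0 is harmless.
        System.out.println(compute(new int[0], 0));
        try {
            compute(new int[4], 0);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException as required");
        }
    }
}
```

The empty-array call is the interesting one: if the remainder were hoisted above both loops without its zero check, that call would divide by zero even though the source never does.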

Thanks,
Tobias

From rwestrel at redhat.com Wed Aug 28 13:57:58 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 28 Aug 2019 15:57:58 +0200
Subject: RFR(S): 8230061: # assert(mode == ControlAroundStripMined && use == sfpt) failed: missed a node
In-Reply-To:
References: <87ef15pv8d.fsf@redhat.com>
Message-ID: <878srdpf2x.fsf@redhat.com>

Hi Tobias,

Thanks for reviewing this.

> Why do you need a cast to Node* in node.cpp:711?

To cast const-ness out.

> And wouldn't it be better to bail out of the loop and return false if u == root?

Of course you're right. I also switched to is_reachable_from_root() (rather
than unreachable) which feels more natural.

Here is a new webrev:

http://cr.openjdk.java.net/~roland/8230061/webrev.01/

Roland.

From rwestrel at redhat.com Wed Aug 28 14:32:28 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 28 Aug 2019 16:32:28 +0200
Subject: RFR(S): 8229483: Sinking load out of loop may trigger: assert(found_sfpt) failed: no node in loop that's not input to safepoint
In-Reply-To: <95293c8e-be51-ce33-f509-75aad32ac4c4@oracle.com>
References: <87mufvotaq.fsf@redhat.com> <95293c8e-be51-ce33-f509-75aad32ac4c4@oracle.com>
Message-ID: <8736hlpdhf.fsf@redhat.com>

Thanks for reviewing this, Tobias.

>> http://cr.openjdk.java.net/~roland/8229483/webrev.00/
>
> I've noticed that field2 in the test is unused.

Good catch. I'll remove it.

>> The field load in the test case has an early control right below:
>>
>>   barrier = 1;
>>
>> (it's a volatile store)
>>
>> and a late control that's conservatively set to be right above:
>>
>>   array[0] = j;
>>
>> by anti dependence analysis.
>>
>> The store array[0] is sunk out of the inner loop in the outer strip
>> mined loop right above the strip mined loop's safepoint. The load
>> initially scheduled in the outer loop is a candidate for being cloned
>> and sunk out of loop at the load usages.
>> Because of the anti-dependence
>> with the array[0], it is sunk into the outer strip mined loop even though
>> it is not referenced by the safepoint. That breaks loop strip mining
>> verification because the expectation is that any data node in the outer
>> strip mined loop is there because it's referenced by the safepoint.
>
> I don't see a field load in the loop, so which load are you referring to?

58   return field + res + field * 2;

the load of static field "field".

>> The fix simply recognizes this special case when the load is being sunk
>> out of loop and makes sure it is not moved in the outer strip mined
>> loop.
>
> Wouldn't it be better to relax the verification code if possible? If we ever fix dependency analysis
> to be less restrictive (see JDK-8229449), we should move the load out of the loop, right?

Dependency analysis is so conservative right now that fixing it seems a
long way from happening. So it could possibly be improved but I don't
see it being "fixed" for good.

I'm not sure relaxing the verification code is better. Loop strip mining
works under the assumption that some simple invariants remain
true. That's what the verification code is here to check. Relaxing the
verification code and thus the invariants would make reasoning about
loop strip mining harder and doesn't seem like the right way to fix
this. The invariant in this case is that nothing is in the outer loop
other than the control flow for the outer loop, the safepoint and data
nodes that are referred to from the safepoint.

>> Unrelated to this fix, I wonder if the code at:
>>
>> http://hg.openjdk.java.net/jdk/jdk/file/cb836bd08d58/src/hotspot/share/opto/loopopts.cpp#l1346
>>
>> really does what the comment says it does for loads (and really does
>> anything useful actually).
>
> I'm not really familiar with that code but maybe file a RFE to investigate this later.

I'm hoping Vladimir will comment. If not, I'll file a RFE as you
suggest.

Roland.
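
As a reading aid for the exchange above, here is a minimal Java sketch of the three source-level ingredients the thread mentions: the volatile store to barrier, the store array[0] = j in a counted inner loop, and the load of the static field in the return statement. It is an assumption-laden reconstruction: the class name SinkLoadSketch, the loop bounds and the initial value of field are made up, it is not the test from the webrev, and it is not claimed to trigger the assert.

```java
// Hypothetical sketch, not the test from the webrev for 8229483.
final class SinkLoadSketch {
    static int field = 3;           // static field loaded in the return statement
    static volatile int barrier;    // target of the volatile store

    static int test(int[] array) {
        int res = 0;
        for (int i = 0; i < 4; i++) {
            barrier = 1;                    // volatile store
            for (int j = 0; j < 10; j++) {  // counted inner loop
                array[0] = j;               // array store discussed in the thread
                res += j;
            }
        }
        // The only use of 'field' is here, after the loops at source level.
        return field + res + field * 2;
    }

    public static void main(String[] args) {
        System.out.println(test(new int[1]));
    }
}
```

At source level the load sits after both loops, which matches the question of where the "field load in the loop" comes from: where the compiler schedules that load is a property of the compiled graph, not of the Java text.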
From dean.long at oracle.com Wed Aug 28 16:51:21 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 28 Aug 2019 09:51:21 -0700 Subject: RFR: 8229797: [JVMCI] Clean up no longer used JVMCI::dependencies_invalid value. In-Reply-To: References: Message-ID: Looks good. dl On 8/27/19 7:25 PM, Xiaohong Gong (Arm Technology China) wrote: > Hi, > > Please help to review this jvmci patch: > Webrew: http://cr.openjdk.java.net/~pli/rfr/8229797/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8229797 > > This patch fix issue: https://github.com/oracle/graal/issues/1587. > The loading of new classes can cause dependencies to become false, which requires the dependent > nmethods to be discarded and deoptimized. So if validating dependencies fails, it should make the > result to be JVMCI::dependencies_failed, which makes jvmci throw the BailoutException. > The invalid dependencies happen at the time of installation without any intervening modification of > the system dictionary. So as the system dictionary modification optimization has been removed, the > compiler can not know whether the failed dependencies are triggered by class reloading or not. It's > better to use dependencies_failed to mark the result. > > Thanks, > Xiaohong Gong From navy.xliu at gmail.com Thu Aug 29 06:41:29 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Wed, 28 Aug 2019 23:41:29 -0700 Subject: RFR(XS): 8229450: C2 compilation fails with assert(found_sfpt) failed In-Reply-To: <45138947-3c8e-9925-25cd-670afb1eb7cb@oracle.com> References: <45138947-3c8e-9925-25cd-670afb1eb7cb@oracle.com> Message-ID: - line 53: How can bytes ever be NULL? MyTest.java try to simulate this function. 
https://github.com/amzn/ion-java/blob/0873d5c1429a3bf6365d56ae2ae8461bd3865289/src/com/amazon/ion/impl/_Private_IonReaderFactory.java#L327 I manually inlined isIonBinary https://github.com/amzn/ion-java/blob/0873d5c1429a3bf6365d56ae2ae8461bd3865289/src/com/amazon/ion/util/IonStreamUtils.java#L70 - Compared to the replay file, are you sure that the same paths are being marked as taken with your test? yes. I tried all kinds of combination. - Did you compare the output of TraceLoopOpts? yes. I did compare them. they are very same, right? TestMe.java Loop: N0/N0 has_call has_sfpt Loop: N354/N353 limit_check sfpts={ 356 } Loop: N355/N216 limit_check counted [0,4),+1 (-1 iters) has_sfpt strip_mined Loop: N364/N362 limit_check counted [int,0),-1 (-1 iters) has_call has_sfpt Replay TraceLoopOpts Loop: N0/N0 has_call has_sfpt Loop: N681/N680 limit_check sfpts={ 683 } Loop: N682/N366 limit_check counted [0,4),+1 (-1 iters) has_sfpt strip_mined Loop: N691/N689 limit_check counted [int,0),-1 (-1 iters) has_call has_sfpt One thing is beyond my understanding. According to the inline tree of replay, it should have yet another loop but I can't find it! This loop should appear after the reverse loop 'Loop: N691/N689 limit_check counted [int,0),-1 (-1 iters)' https://github.com/amzn/ion-java/blob/0873d5c1429a3bf6365d56ae2ae8461bd3865289/src/com/amazon/ion/util/IonStreamUtils.java#L104 Do you know who can wipe out this loop? it's not dead code. I can't understand how dare optimizer remove it. Another thing is I still have a hard time to understand profile data embedded in IR. in this graph: https://bugs.openjdk.java.net/secure/attachment/84402/JDK-8229450.png 677 CountedLoopEnd, [lt] P=0.800000, C=8188.000000 520 If, P=0.000000, C=4617.000000 I guess P == possibility and C == counts. my program is same possibilities as replay's. On Wed, Aug 28, 2019 at 2:46 AM Tobias Hartmann wrote: > > On 27.08.19 10:34, Liu Xin wrote: > > I followed Tobias' guideline. 
I make a testcase: TestMe.java. It's very > similar to the graph > > generated from replay file. I intend to force line 40 and 53 share the > boolean expression 'len < > > BINARY_VERSION_MARKER_SIZE', but still no luck. > > IGVN always removes one of them. I am pretty sure that replay file > compiles the same graph, but some > > profile data keep 364 Bool like that. Could you give me a hint? > > Just quickly looked at your test: > - line 53: How can bytes ever be NULL? > - Compared to the replay file, are you sure that the same paths are being > marked as taken with your > test? For example, is the break in line 43 executed or the return in line > 54? It seems that you are > always running with the exact same array whereas it seems likely that the > original run used > different arrays (and therefore the profile in the replay reports > different paths to be taken). > - Did you compare the output of TraceLoopOpts? > > Best regards, > Tobias > From tobias.hartmann at oracle.com Thu Aug 29 07:03:05 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 29 Aug 2019 09:03:05 +0200 Subject: RFR(S): 8230061: # assert(mode == ControlAroundStripMined && use == sfpt) failed: missed a node In-Reply-To: <878srdpf2x.fsf@redhat.com> References: <87ef15pv8d.fsf@redhat.com> <878srdpf2x.fsf@redhat.com> Message-ID: Hi Roland, On 28.08.19 15:57, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8230061/webrev.01/ Looks good. Thanks, Tobias From tobias.hartmann at oracle.com Thu Aug 29 07:17:24 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 29 Aug 2019 09:17:24 +0200 Subject: RFR(S): 8229483: Sinking load out of loop may trigger: assert(found_sfpt) failed: no node in loop that's not input to safepoint In-Reply-To: <8736hlpdhf.fsf@redhat.com> References: <87mufvotaq.fsf@redhat.com> <95293c8e-be51-ce33-f509-75aad32ac4c4@oracle.com> <8736hlpdhf.fsf@redhat.com> Message-ID: Hi Roland, thanks for the explanation. 
Very unfortunate that anti-dependency analysis can not do better but your fix seems reasonable to me. Best regards, Tobias On 28.08.19 16:32, Roland Westrelin wrote: > > Thanks for reviewing this, Tobias. > >>> http://cr.openjdk.java.net/~roland/8229483/webrev.00/ >> >> I've noticed that field2 in the test is unused. > > Good catch. I'll remove it. > >>> The field load in the test case has an early control right below: >>> >>> barrier = 1; >>> >>> (it's a volatile store) >>> >>> and a late control that's conservatively set to be right above: >>> >>> array[0] = j; >>> >>> by anti dependence analysis. >>> >>> The store array[0] is sunk out of the inner loop in the outer strip >>> mined loop right above the strip mined loop's safepoint. The load >>> initially scheduled in the outer loop is a candidate for being cloned >>> and sunk out of loop at the load usages. Because of the anti-dependence >>> with the array[0], it is sunk into the outer strip mined loop eventhough >>> it is not referenced by the safepoint. That breaks loop strip mining >>> verification because the expectation is that any data node in the outer >>> strip mined loop is there because it's referenced by the safepoint. >> >> I don't see a field load in the loop, so which load are you referring to? > > 58 return field + res + field * 2; > > the load of static field "field". > >>> The fix simply recognizes this special case when the load is being sunk >>> out of loop and makes sure it is not moved in the outer strip mined >>> loop. >> >> Wouldn't it be better to relax the verification code if possible? If we ever fix dependency analysis >> to be less restrictive (see JDK-8229449), we should move the load out of the loop, right? > > Dependency analysis is so conservative right now that fixing it seems a > long way from happening. So it could possibly be improved but I don't > see it being "fixed" for good. > > I'm not sure relaxing the verification code is better. 
> Loop strip mining
> works under the assumption that some simple invariants remain
> true. That's what the verification code is here to check. Relaxing the
> verification code and thus the invariants would make reasoning about
> loop strip mining harder and doesn't seem like the right way to fix
> this. The invariant in this case is that nothing is in the outer loop
> other than the control flow for the outer loop, the safepoint and data
> nodes that are referred to from the safepoint.
>
>>> Unrelated to this fix, I wonder if the code at:
>>>
>>> http://hg.openjdk.java.net/jdk/jdk/file/cb836bd08d58/src/hotspot/share/opto/loopopts.cpp#l1346
>>>
>>> really does what the comment says it does for loads (and really does
>>> anything useful actually).
>>
>> I'm not really familiar with that code but maybe file a RFE to investigate this later.
>
> I'm hoping Vladimir will comment. If not, I'll file a RFE as you
> suggest.
>
> Roland.
>

From fujie at loongson.cn Thu Aug 29 07:52:25 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 29 Aug 2019 15:52:25 +0800
Subject: RFR(trivial): 8230037: Confused MetaData dumped by PrintOptoAssembly
In-Reply-To: <7a124106-6c12-0f3c-4b64-7a7de13e22b9@oracle.com>
References: <7a124106-6c12-0f3c-4b64-7a7de13e22b9@oracle.com>
Message-ID:

Hi Vladimir and all,

Updated: http://cr.openjdk.java.net/~jiefu/8230037/webrev.01/

Apart from the example described in the JBS[1], here is a more confusing one.
During debugging, people may think that the OptoAssembly @line 32383 should be for Compile_id=482 according to the MetaData @line 32381.

  32355 ============================= C2-compiled nmethod ==============================
  32356 ----------------------------------- MetaData -----------------------------------
  32357 {method}
  32358  - this oop:          0x00007f8b33c4a3f0
  32359  - method holder: 'spec/benchmarks/compress/Compressor'
  32360  - constants:         0x00007f8b33c497b0 constant pool [160] {0x00007f8b33c497b0} for 'spec/benchmarks/compress/Compressor' cache=0x00007f8b33c4ab38
  32361  - access:            0xc1000002  private
  32362  - name:              'output'
  32363  - signature:         '(I)V'
  32364  - max stack:         7
  32365  - max locals:        5
  32366  - size of params:    2
  32367  - method size:       13
  32368  - highest level:     4
  32369  - vtable index:      -2
  32370  - i2i entry:         0x00007f8b70781460
  32371  - adapters:          AHE at 0x00007f8b88426e90: 0xba000000 i2c: 0x00007f8b70b88800 c2i: 0x00007f8b70b8894d c2iUV: 0x00007f8b70b88910 c2iNCI: 0x00007f8b70b8898a
  32372  - compiled entry     0x00007f8b7838e0b0
  32373  - code size:         373
  32374  - code start:        0x00007f8b33c4a208
  32375  - code end (excl):   0x00007f8b33c4a37d
  32376  - method data:       0x00007f8b33c4d350
  32377  - checked ex length: 0
  32378  - linenumber start:  0x00007f8b33c4a37d
  32379  - localvar length:   5
  32380  - localvar start:    0x00007f8b33c4a3b2
  32381  - compiled code: nmethod   1929  482       4 spec.benchmarks.compress.Compressor::output (373 bytes)
  32382
  32383 --------------------------------- OptoAssembly ---------------------------------

However, the MetaData[2] is always for the last compilation of the method, not for the current compilation.
Therefore, it's wrong to match Compile_id for the OptoAssembly with the dumped MetaData.

So to find the correct Compile_id, the MetaData is *not only helpless but also quite misleading*.

The effect of the patch is the following.
To avoid confusing, the Compile_id (@line 32356 and 32383) was dumped explicitly, which was inspired by Vladimir.
Please see more comments inline.

  32355 ============================= C2-compiled nmethod ==============================
  32356 ----------------------- MetaData before Compile_id = 485 ------------------------
  32357 {method}
  32358  - this oop:          0x00007f8b33c4a3f0
  32359  - method holder: 'spec/benchmarks/compress/Compressor'
  32360  - constants:         0x00007f8b33c497b0 constant pool [160] {0x00007f8b33c497b0} for 'spec/benchmarks/compress/Compressor' cache=0x00007f8b33c4ab38
  32361  - access:            0xc1000002  private
  32362  - name:              'output'
  32363  - signature:         '(I)V'
  32364  - max stack:         7
  32365  - max locals:        5
  32366  - size of params:    2
  32367  - method size:       13
  32368  - highest level:     4
  32369  - vtable index:      -2
  32370  - i2i entry:         0x00007f8b70781460
  32371  - adapters:          AHE at 0x00007f8b88426e90: 0xba000000 i2c: 0x00007f8b70b88800 c2i: 0x00007f8b70b8894d c2iUV: 0x00007f8b70b88910 c2iNCI: 0x00007f8b70b8898a
  32372  - compiled entry     0x00007f8b7838e0b0
  32373  - code size:         373
  32374  - code start:        0x00007f8b33c4a208
  32375  - code end (excl):   0x00007f8b33c4a37d
  32376  - method data:       0x00007f8b33c4d350
  32377  - checked ex length: 0
  32378  - linenumber start:  0x00007f8b33c4a37d
  32379  - localvar length:   5
  32380  - localvar start:    0x00007f8b33c4a3b2
  32381  - compiled code: nmethod   1929  482       4 spec.benchmarks.compress.Compressor::output (373 bytes)
  32382
  32383 ------------------------ OptoAssembly for Compile_id = 485 -----------------------

Testing:
  - make test TEST="tier1 tier2 tier3" CONF=fastdebug on Linux/x64
  - make test TEST="tier1 tier2 tier3" CONF=release   on Linux/x64
  - SPECjvm2008 with fastdebug using -XX:+PrintOptoAssembly on Linux/x64

Any comments?

Thanks a lot.
Best regards,
Jie

[1] https://bugs.openjdk.java.net/browse/JDK-8230037
[2] http://hg.openjdk.java.net/jdk/jdk/file/3da1848cc39b/src/hotspot/share/opto/output.cpp#l1573

On 2019/8/23 ??12:16, Vladimir Kozlov wrote:
> To avoid confusing I normally use -Xbatch and CICompilerCount=1 (or =2
> for tiered compilation).
>
> The output is under ttyLocker so it should be one block.
> I see that
> there is mix of tty and xtty streams in code. May be that is the
> reason it is not together.

I had tested the tty and xtty logging with SPECjvm2008 on a 56-core machine.
It works well enough for both of them (except for the safepoint case).

> If we use xtty it should be passed to print_metadata() too. Please,
> investigate more.

I think there is no need to pass xtty to print_metadata() since
 1) the ttyLocker also works for xtty [3]
 2) and the tty and xtty actually will write to the same log file [4]

[3] http://hg.openjdk.java.net/jdk/jdk/file/3da1848cc39b/src/hotspot/share/opto/output.cpp#l1585
[4] http://hg.openjdk.java.net/jdk/jdk/file/3da1848cc39b/src/hotspot/share/utilities/ostream.cpp#l645

>
> Instead of "Last Normal Compilation" I would print Compile_id.

Done. Thanks.

>
> Thanks,
> Vladimir

From rwestrel at redhat.com Thu Aug 29 11:51:35 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 29 Aug 2019 13:51:35 +0200
Subject: [14] RFR(M): 8229496: SIGFPE (division by zero) in C2 OSR compiled method
In-Reply-To:
References:
Message-ID: <87r254nq9k.fsf@redhat.com>

> http://cr.openjdk.java.net/~thartmann/8229496/webrev.00/

That looks good to me. Have you verified performance?

Wouldn't the code in CastLLNode::Ideal() after line 301 belong in
CastLLNode::Value()? I realize you followed CastIINode::Ideal() which
would need to be changed too.

Roland.

From rwestrel at redhat.com Thu Aug 29 12:21:11 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 29 Aug 2019 14:21:11 +0200
Subject: RFR(XS): 8229450: C2 compilation fails with assert(found_sfpt) failed
In-Reply-To:
References:
Message-ID: <87lfvcnow8.fsf@redhat.com>

Hi,

Thanks for working on that bug.

> Please review my change to fix IfSplitBlocks for a corner case.
> JBS: https://bugs.openjdk.java.net/browse/JDK-8229450
> webrev: https://cr.openjdk.java.net/~xliu/8229450/webrev/

The fix is correct but too conservative I think.
The optimization itself is valid: the second if is redundant and should be
eliminated, but control dependent data nodes shouldn't be moved into the
inner strip mined loop. That's for 2 reasons:

1) it breaks verification code/strip mining assumptions as you've found

2) it's actually an illegal transformation because at the end of
   compilation the inner loop exit condition is adjusted to only cover a
   subset of the iterations, so the inner loop exit condition is no longer
   redundant with the if that's eliminated.

I would propose this as a fix instead:

diff -r 85fbdb87baad -r c1ff28db28e7 src/hotspot/share/opto/loopopts.cpp
--- a/src/hotspot/share/opto/loopopts.cpp Wed Aug 14 15:07:04 2019 +0200
+++ b/src/hotspot/share/opto/loopopts.cpp Thu Aug 29 13:35:15 2019 +0200
@@ -1326,6 +1326,11 @@
       if (dom->req() > 1 && dom->in(1) == bol && prevdom->in(0) == dom) {
         // Replace the dominated test with an obvious true or false.
         // Place it on the IGVN worklist for later cleanup.
+        if (prevdom->in(0)->is_CountedLoopEnd() &&
+            prevdom->in(0)->as_CountedLoopEnd()->loopnode() != NULL &&
+            prevdom->in(0)->as_CountedLoopEnd()->loopnode()->is_strip_mined()) {
+          prevdom = prevdom->in(0)->as_CountedLoopEnd()->loopnode()->in(LoopNode::EntryControl)->as_OuterStripMinedLoop()->outer_loop_exit();
+        }
         C->set_major_progress();
         dominated_by(prevdom, n, false, true);
 #ifndef PRODUCT

Remove the redundant if but replace it by the exit of the outer strip
mined loop.

I also managed to write a test case. See below. I run it with:

java -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation -XX:LoopMaxUnroll=0 -XX:CompileCommand=dontinline,LoadDependsOnIfIdenticalToLoopExit::not_inlined LoadDependsOnIfIdenticalToLoopExit

Note that removing an If when it's on the branch of a dominating equivalent
If is performed twice: once during loop opts (the one that causes this
bug) and once during igvn in IfNode::search_identical().
test2() tries to trigger the same bug during igvn, but with the current
code it fails to, because the igvn code is stricter (too strict actually):
it checks that the dominating If opcode is the same as the dominated If,
so a CountedLoopEnd can't replace an If.

My own recipe for writing regression tests is to:

1) understand the failure

2) determine the chain of events that lead to the failure

3) write a test case that reproduces the chain of events from scratch,
   ignoring the actual method that caused the bug in the first place

4) if I can't find out how to trigger a particular event, I go back to
   the initial failure, try to understand what happens and incorporate it
   in the test case

So I don't try to replicate the failing method but build my own method
from scratch from my understanding of the bug.

Roland.

public class LoadDependsOnIfIdenticalToLoopExit {
    private static int[] field = new int[1];
    private int field2;

    public static void main(String[] args) {
        LoadDependsOnIfIdenticalToLoopExit instance = new LoadDependsOnIfIdenticalToLoopExit();
        for (int i = 0; i < 20_000; i++) {
            test1(false, false);
            test1(true, true);
            test2();
        }
    }

    private static int test1(boolean flag1, boolean flag2) {
        int res = 1;
        int[] array = new int[10];
        not_inlined(array);
        int i;
        for (i = 0; i < 2000; i++) {
            res *= i;
        }

        if (flag1) {
            if (flag2) {
                res++;
            }
        }

        // Same test as the loop exit condition; the load of array[0] is
        // control dependent on it.
        if (i >= 2000) {
            res *= array[0];
        }
        return res;
    }

    private static int test2() {
        int j = 2;
        for (; j < 4; j *= 2);   // j == 4 after this loop

        int res = 1;
        int[] array = new int[10];
        not_inlined(array);
        int i;
        for (i = 1; i < 2000; i++) {
            res *= i;
        }

        // Equivalent to i >= 2000 once j is known to be 4.
        if (i >= j * 500) {
            res *= array[0];
        }
        return res;
    }

    // Kept out of line by the CompileCommand=dontinline option above.
    private static void not_inlined(int[] array) {
    }
}

From vladimir.kozlov at oracle.com Thu Aug 29 16:07:35 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 29 Aug 2019 09:07:35 -0700
Subject: RFR(trivial): 8230037: Confused MetaData dumped by PrintOptoAssembly
In-Reply-To:
References: <7a124106-6c12-0f3c-4b64-7a7de13e22b9@oracle.com>
Message-ID:

Hi
Jie,

This looks good.

Thanks,
Vladimir

On 8/29/19 12:52 AM, Jie Fu wrote:
> Hi Vladimir and all,
>
> Updated: http://cr.openjdk.java.net/~jiefu/8230037/webrev.01/
>
> Apart from the example described in the JBS[1], here is a more confusing one.
> During debugging, people may think that the OptoAssembly @line 32383 should be for Compile_id=482 according to the
> MetaData @line 32381.
>
>   32355 ============================= C2-compiled nmethod ==============================
>   32356 ----------------------------------- MetaData -----------------------------------
>   32357 {method}
>   32358  - this oop:          0x00007f8b33c4a3f0
>   32359  - method holder: 'spec/benchmarks/compress/Compressor'
>   32360  - constants:         0x00007f8b33c497b0 constant pool [160] {0x00007f8b33c497b0} for 'spec/benchmarks/compress/Compressor' cache=0x00007f8b33c4ab38
>   32361  - access:            0xc1000002  private
>   32362  - name:              'output'
>   32363  - signature:         '(I)V'
>   32364  - max stack:         7
>   32365  - max locals:        5
>   32366  - size of params:    2
>   32367  - method size:       13
>   32368  - highest level:     4
>   32369  - vtable index:      -2
>   32370  - i2i entry:         0x00007f8b70781460
>   32371  - adapters:          AHE at 0x00007f8b88426e90: 0xba000000 i2c: 0x00007f8b70b88800 c2i: 0x00007f8b70b8894d c2iUV: 0x00007f8b70b88910 c2iNCI: 0x00007f8b70b8898a
>   32372  - compiled entry     0x00007f8b7838e0b0
>   32373  - code size:         373
>   32374  - code start:        0x00007f8b33c4a208
>   32375  - code end (excl):   0x00007f8b33c4a37d
>   32376  - method data:       0x00007f8b33c4d350
>   32377  - checked ex length: 0
>   32378  - linenumber start:  0x00007f8b33c4a37d
>   32379  - localvar length:   5
>   32380  - localvar start:    0x00007f8b33c4a3b2
>   32381  - compiled code: nmethod   1929  482       4 spec.benchmarks.compress.Compressor::output (373 bytes)
>   32382
>   32383 --------------------------------- OptoAssembly ---------------------------------
>
> However, the MetaData[2] is always for the last compilation of the method, not for the current compilation.
> Therefore, it's wrong to match Compile_id for the OptoAssembly with the dumped MetaData.
>
> So to find the correct Compile_id, the MetaData is *not only unhelpful but also quite misleading*.
>
> The effect of the patch is the following.
> To avoid confusion, the Compile_id (@line 32356 and 32383) is dumped explicitly, as suggested by Vladimir.
> Please see more comments inline.
>
>   32355 ============================= C2-compiled nmethod ==============================
>   32356 ----------------------- MetaData before Compile_id = 485 ------------------------
>   32357 {method}
>   32358  - this oop:          0x00007f8b33c4a3f0
>   32359  - method holder: 'spec/benchmarks/compress/Compressor'
>   32360  - constants:         0x00007f8b33c497b0 constant pool [160] {0x00007f8b33c497b0} for 'spec/benchmarks/compress/Compressor' cache=0x00007f8b33c4ab38
>   32361  - access:            0xc1000002  private
>   32362  - name:              'output'
>   32363  - signature:         '(I)V'
>   32364  - max stack:         7
>   32365  - max locals:        5
>   32366  - size of params:    2
>   32367  - method size:       13
>   32368  - highest level:     4
>   32369  - vtable index:      -2
>   32370  - i2i entry:         0x00007f8b70781460
>   32371  - adapters:          AHE at 0x00007f8b88426e90: 0xba000000 i2c: 0x00007f8b70b88800 c2i: 0x00007f8b70b8894d c2iUV: 0x00007f8b70b88910 c2iNCI: 0x00007f8b70b8898a
>   32372  - compiled entry     0x00007f8b7838e0b0
>   32373  - code size:         373
>   32374  - code start:        0x00007f8b33c4a208
>   32375  - code end (excl):   0x00007f8b33c4a37d
>   32376  - method data:       0x00007f8b33c4d350
>   32377  - checked ex length: 0
>   32378  - linenumber start:  0x00007f8b33c4a37d
>   32379  - localvar length:   5
>   32380  - localvar start:    0x00007f8b33c4a3b2
>   32381  - compiled code: nmethod   1929  482       4 spec.benchmarks.compress.Compressor::output (373 bytes)
>   32382
>   32383 ------------------------ OptoAssembly for Compile_id = 485 -----------------------
>
> Testing:
>   - make test TEST="tier1 tier2 tier3" CONF=fastdebug on Linux/x64
>   - make test TEST="tier1 tier2 tier3" CONF=release   on Linux/x64
>   - SPECjvm2008 with fastdebug using -XX:+PrintOptoAssembly on Linux/x64
>
> Any comments?
>
> Thanks a lot.
> Best regards,
> Jie
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8230037
> [2] http://hg.openjdk.java.net/jdk/jdk/file/3da1848cc39b/src/hotspot/share/opto/output.cpp#l1573
>
> On 2019/8/23 12:16, Vladimir Kozlov wrote:
>> To avoid confusion I normally use -Xbatch and CICompilerCount=1 (or =2 for tiered compilation).
>>
>> The output is under ttyLocker so it should be one block. I see that there is a mix of tty and xtty streams in the code.
>> Maybe that is the reason it is not together.
> I tested the tty and xtty logging with SPECjvm2008 on a 56-core machine.
> It works well enough for both of them (except for the safepoint case).
>
>
>> If we use xtty it should be passed to print_metadata() too. Please, investigate more.
>
> I think there is no need to pass xtty to print_metadata() since
>  1) the ttyLocker also works for xtty [3]
>  2) and the tty and xtty actually will write to the same log file [4]
>
> [3] http://hg.openjdk.java.net/jdk/jdk/file/3da1848cc39b/src/hotspot/share/opto/output.cpp#l1585
> [4] http://hg.openjdk.java.net/jdk/jdk/file/3da1848cc39b/src/hotspot/share/utilities/ostream.cpp#l645
>
>
>>
>> Instead of "Last Normal Compilation" I would print Compile_id.
>
> Done. Thanks.
>
>
>>
>> Thanks,
>> Vladimir
>

From bsrbnd at gmail.com Thu Aug 29 19:02:05 2019
From: bsrbnd at gmail.com (B.
Blaser) Date: Thu, 29 Aug 2019 21:02:05 +0200 Subject: RFR(trivial): 8230037: Confused MetaData dumped by PrintOptoAssembly In-Reply-To: References: <7a124106-6c12-0f3c-4b64-7a7de13e22b9@oracle.com> Message-ID: Hi Jie and Vladimir, This looks useful and trivial to me too. Would you like me to push it on Jie's behalf? Bernard On Thu, 29 Aug 2019 at 18:08, Vladimir Kozlov wrote: > > Hi Jie, > > This looks good. > > Thanks, > Vladimir > > On 8/29/19 12:52 AM, Jie Fu wrote: > > Hi Vladimir and all, > > > > Updated: http://cr.openjdk.java.net/~jiefu/8230037/webrev.01/ From richard.reingruber at sap.com Thu Aug 29 20:31:54 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Thu, 29 Aug 2019 20:31:54 +0000 Subject: RFR(XXS) 8230363: C2: Let ConnectionGraph::not_global_escape(Node* n) return false if n is not in the CG Message-ID: Hi, could I please get reviews and sponsoring for Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8230363/webrev.0/ Bug: https://bugs.openjdk.java.net/browse/JDK-8230363 The fix avoids crashes if ConnectionGraph::not_global_escape(Node* n) should be called with a node n that was not added to the connection graph. Note that not all ideal nodes are added (e.g. constant numbers). This case should be handled more gracefully by returning false. This is split off from JDK-8227745[1]. There not_global_escape() is applied to the arguments of java calls. If one argument should be e.g. an integer constant not_global_escape() would crash. I've run tier1 tests. Thanks, Richard. [1] https://bugs.openjdk.java.net/browse/JDK-8227745 From vladimir.kozlov at oracle.com Thu Aug 29 20:44:03 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 29 Aug 2019 13:44:03 -0700 Subject: RFR(trivial): 8230037: Confused MetaData dumped by PrintOptoAssembly In-Reply-To: References: <7a124106-6c12-0f3c-4b64-7a7de13e22b9@oracle.com> Message-ID: <4A20CB68-AA4F-419F-A24A-C07DF5433FBA@oracle.com> > On Aug 29, 2019, at 12:02 PM, B. 
Blaser wrote:
>
> Hi Jie and Vladimir,
>
> This looks useful and trivial to me too.
> Would you like me to push it on Jie's behalf?

Yes, please

Vladimir

>
> Bernard
>
> On Thu, 29 Aug 2019 at 18:08, Vladimir Kozlov
> wrote:
>>
>> Hi Jie,
>>
>> This looks good.
>>
>> Thanks,
>> Vladimir
>>
>>> On 8/29/19 12:52 AM, Jie Fu wrote:
>>> Hi Vladimir and all,
>>>
>>> Updated: http://cr.openjdk.java.net/~jiefu/8230037/webrev.01/

From dean.long at oracle.com Thu Aug 29 23:40:24 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 29 Aug 2019 16:40:24 -0700
Subject: RFR: 8230129: Add jtreg "serviceability/sa/ClhsdbInspect.java" to graal problem list.
In-Reply-To:
References:
Message-ID: <5c3dffcc-12bc-48a8-3edf-99e0ac415ced@oracle.com>

Isn't it the correct process that there should be a master bug for
fixing the test or underlying issue, then use that bug number for the
problem list entry, and push it using a SubTask? For example, see the
relationship between JDK-8229447 and JDK-8229446.

dl

On 8/27/19 12:14 AM, Xiaohong Gong (Arm Technology China) wrote:
> Hi,
>
> Please help to review this small patch:
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8230129/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8230129
>
> Jtreg test "serviceability/sa/ClhsdbInspect.java" fails when running with Graal. It fails when inspecting an address to check whether it points to an expected oop
> or method, where the address was printed earlier by running "jstack -v".
> When running with Graal, more Java heap is needed for JVMCI initialization and for the compiler itself, so a GC is bound to happen while the application
> is running. If a GC happens after running "jstack", the actual addresses of the oops and methods may be different when running "inspect", and the inspected address may
> point to another object or to nothing.
> A simple fix is to add this test to the graal problem list.
>
> Thanks,
> Xiaohong Gong

From fujie at loongson.cn Fri Aug 30 00:22:07 2019
From: fujie at loongson.cn (Jie Fu)
Date: Fri, 30 Aug 2019 08:22:07 +0800
Subject: RFR(trivial): 8230037: Confused MetaData dumped by PrintOptoAssembly
In-Reply-To:
References: <7a124106-6c12-0f3c-4b64-7a7de13e22b9@oracle.com>
Message-ID: <568ad456-c735-0d33-aaf9-3e7c3efc65b8@loongson.cn>

Thanks Vladimir and Bernard for your review.
And special thanks to Bernard for sponsoring it.

On 2019/8/30 3:02, B. Blaser wrote:
> Hi Jie and Vladimir,
>
> This looks useful and trivial to me too.
> Would you like me to push it on Jie's behalf?
>
> Bernard
>
> On Thu, 29 Aug 2019 at 18:08, Vladimir Kozlov
> wrote:
>> Hi Jie,
>>
>> This looks good.
>>
>> Thanks,
>> Vladimir
>>
>> On 8/29/19 12:52 AM, Jie Fu wrote:
>>> Hi Vladimir and all,
>>>
>>> Updated: http://cr.openjdk.java.net/~jiefu/8230037/webrev.01/

From tobias.hartmann at oracle.com Fri Aug 30 06:36:33 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 30 Aug 2019 08:36:33 +0200
Subject: [14] RFR(M): 8229496: SIGFPE (division by zero) in C2 OSR compiled method
In-Reply-To: <87r254nq9k.fsf@redhat.com>
References: <87r254nq9k.fsf@redhat.com>
Message-ID: <76d8d371-d8da-88ff-815d-cd28d7e0964c@oracle.com>

Hi Roland,

On 29.08.19 13:51, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~thartmann/8229496/webrev.00/
>
> That looks good to me.

Thanks for the review.

> Have you verified performance?

Not yet, I wanted to wait for reviews. But I'll submit testing over the weekend.

> Wouldn't the code in CastLLNode::Ideal() after line 301 belong in
> CastLLNode::Value()? I realize you followed CastIINode::Ideal() which
> would need to be changed too.

Right, same in ConvI2LNode::Ideal.
I've filed a separate enhancement for this: https://bugs.openjdk.java.net/browse/JDK-8230382 Best regards, Tobias From tobias.hartmann at oracle.com Fri Aug 30 06:51:40 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 30 Aug 2019 08:51:40 +0200 Subject: RFR(XXS) 8230363: C2: Let ConnectionGraph::not_global_escape(Node* n) return false if n is not in the CG In-Reply-To: References: Message-ID: <71bc3355-69ae-9805-efae-ed404db17951@oracle.com> Hi Richard, looks good to me but please check ptn == NULL instead of casting to boolean before pushing. Thanks, Tobias On 29.08.19 22:31, Reingruber, Richard wrote: > Hi, > > could I please get reviews and sponsoring for > > Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8230363/webrev.0/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8230363 > > The fix avoids crashes if ConnectionGraph::not_global_escape(Node* n) should be called with a node n > that was not added to the connection graph. Note that not all ideal nodes are added (e.g. constant > numbers). This case should be handled more gracefully by returning false. > > This is split off from JDK-8227745[1]. There not_global_escape() is applied to the arguments of java > calls. If one argument should be e.g. an integer constant not_global_escape() would crash. > > I've run tier1 tests. > > Thanks, Richard. > > [1] https://bugs.openjdk.java.net/browse/JDK-8227745 > From navy.xliu at gmail.com Fri Aug 30 07:39:18 2019 From: navy.xliu at gmail.com (Liu Xin) Date: Fri, 30 Aug 2019 00:39:18 -0700 Subject: RFR(XS): 8229450: C2 compilation fails with assert(found_sfpt) failed In-Reply-To: <87lfvcnow8.fsf@redhat.com> References: <87lfvcnow8.fsf@redhat.com> Message-ID: Hi, Roland, Thank you so much! Please see the comments inline. On Thu, Aug 29, 2019 at 5:21 AM Roland Westrelin wrote: > > Hi, > > Thanks for working on that bug. > > > Please review my change to fix IfSplitBlocks for a corner case. 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8229450
> > webrev: https://cr.openjdk.java.net/~xliu/8229450/webrev/
>
> The fix is correct but too conservative I think. The optimization itself
> is valid: the second if is redundant and should be eliminated but
> control dependent data nodes shouldn't be moved in the inner strip mined
> loop. That's for 2 reasons: 1) it breaks verification code/strip mining
> assumptions as you've found 2) it's actually an illegal transformation
> because at the end of compilation the inner loop exit condition is
> adjusted to only cover a subset of the iterations so the inner loop exit
> condition is no longer redundant with the if that's eliminated.
>

For 2), yes, it's definitely an illegal transformation, but I feel that is
because it moves code in right after the inner loop exit. The new code could
1) introduce a data dependency or 2) change control flow, e.g. return early.

Do you think my comment is explanatory?
https://cr.openjdk.java.net/~xliu/8229450/03/webrev/

I only made cosmetic changes to your patch. Could you review and sponsor it?
I also verified it using hotspot:tier1.

> I would propose this as a fix instead:
>
> diff -r 85fbdb87baad -r c1ff28db28e7 src/hotspot/share/opto/loopopts.cpp
> --- a/src/hotspot/share/opto/loopopts.cpp       Wed Aug 14 15:07:04 2019 +0200
> +++ b/src/hotspot/share/opto/loopopts.cpp       Thu Aug 29 13:35:15 2019 +0200
> @@ -1326,6 +1326,11 @@
>      if (dom->req() > 1 && dom->in(1) == bol && prevdom->in(0) == dom) {
>        // Replace the dominated test with an obvious true or false.
>        // Place it on the IGVN worklist for later cleanup.
> +      if (prevdom->in(0)->is_CountedLoopEnd() &&
> +          prevdom->in(0)->as_CountedLoopEnd()->loopnode() != NULL &&
> +          prevdom->in(0)->as_CountedLoopEnd()->loopnode()->is_strip_mined()) {
> +        prevdom = prevdom->in(0)->as_CountedLoopEnd()->loopnode()->in(LoopNode::EntryControl)->as_OuterStripMinedLoop()->outer_loop_exit();
> +      }
>        C->set_major_progress();
>        dominated_by(prevdom, n, false, true);
>  #ifndef PRODUCT
>
> Remove the redundant if but replace it by the exit of the outer strip
> mined loop.
>
> I also managed to write a test case. See below. I run it with:
>
> java -XX:-TieredCompilation -XX:-UseOnStackReplacement
> -XX:-BackgroundCompilation -XX:LoopMaxUnroll=0
> -XX:CompileCommand=dontinline,LoadDependsOnIfIdenticalToLoopExit::not_inlined
> LoadDependsOnIfIdenticalToLoopExit
>
> Note that removing an If if it's the branch of a dominating equivalent
> If is performed twice: once during loop opts (the one that causes this
> bug) and once during igvn in IfNode::search_identical(). test2() tries
> to trigger the same bug but during igvn but with the current code it
> fails to because the igvn code is stricter (too strict actually) because
> it checks that the dominating If opcode is the same as the dominated If
> so a CountedLoopEnd can't replace an If.

Thanks for the IfNode::search_identical() pointer. Let's forget about
search_identical() this time because it won't hit a bug.

> My own recipe for writing regression tests is to:
>
> 1) understand the failure
> 2) determine the chain of events that lead to the failure
> 3) write a test case that reproduces the chain of events from scratch,
> ignoring the actual method that caused the bug in the first place
> 4) if I can't find out how to trigger a particular event, I go back to
> the initial failure, try to understand what happens and incorporate it
> in the test case
>
> So I don't try to replicate the failing method but build my own method
> from scratch from my understanding of the bug.
>
>

Thanks a lot! Those are very valuable takeaways for me too. Apparently, I am
learning C2 the hard way. My direction was wrong. Tweaking the original
program is difficult or sometimes even impossible. I feel your approach of
modelling a 'chain of events' is more scientific. Could you educate me more?

1. Is it a mental "chain of events", or can you dump it from the JVM?
2. What's your definition of 'event' here? Do you mean an event is an
individual optimization?
I did write down a sequence of optimizations for this case, but I still can't
write a testcase...

You are so amazing! I know how to make up code, but I don't know what event I
want to trigger. Could you tell me what 'chain of events' leads you to your
test1?

Thanks,
--lx

> Roland.
>
> public class LoadDependsOnIfIdenticalToLoopExit {
>     private static int[] field = new int[1];
>     private int field2;
>
>     public static void main(String[] args) {
>         LoadDependsOnIfIdenticalToLoopExit instance = new LoadDependsOnIfIdenticalToLoopExit();
>         for (int i = 0; i < 20_000; i++) {
>             test1(false, false);
>             test1(true, true);
>             test2();
>         }
>     }
>
>     private static int test1(boolean flag1, boolean flag2) {
>         int res = 1;
>         int[] array = new int[10];
>         not_inlined(array);
>         int i;
>         for (i = 0; i < 2000; i++) {
>             res *= i;
>         }
>
>         if (flag1) {
>             if (flag2) {
>                 res++;
>             }
>         }
>
>         if (i >= 2000) {
>             res *= array[0];
>         }
>         return res;
>     }
>
>     private static int test2() {
>         int j = 2;
>         for (; j < 4; j *= 2);
>
>         int res = 1;
>         int[] array = new int[10];
>         not_inlined(array);
>         int i;
>         for (i = 1; i < 2000; i++) {
>             res *= i;
>         }
>
>         if (i >= j * 500) {
>             res *= array[0];
>         }
>         return res;
>     }
>
>     private static void not_inlined(int[] array) {
>     }
> }
>

From tobias.hartmann at oracle.com Fri Aug 30 07:38:19 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 30 Aug 2019 09:38:19 +0200
Subject: RFR(XS): 8229450: C2 compilation fails with assert(found_sfpt) failed
In-Reply-To:
References: <45138947-3c8e-9925-25cd-670afb1eb7cb@oracle.com>
Message-ID:

On 29.08.19 08:41, Liu Xin wrote:
> - Did you compare the output of TraceLoopOpts?
> yes. I did compare them. They are very much the same, right?

Yes, overall loop layout looks the same.

> TestMe.java
> Loop: N0/N0  has_call has_sfpt
>   Loop: N354/N353  limit_check sfpts={ 356 }
>     Loop: N355/N216  limit_check counted [0,4),+1 (-1 iters)  has_sfpt strip_mined
>   Loop: N364/N362  limit_check counted [int,0),-1 (-1 iters)  has_call has_sfpt
>
> Replay TraceLoopOpts
> Loop: N0/N0  has_call has_sfpt
>   Loop: N681/N680  limit_check sfpts={ 683 }
>     Loop: N682/N366  limit_check counted [0,4),+1 (-1 iters)  has_sfpt strip_mined
>   Loop: N691/N689  limit_check counted [int,0),-1 (-1 iters)  has_call has_sfpt
>
>
> One thing is beyond my understanding. According to the inline tree of replay, it should have yet
> another loop but I can't find it!
> This loop should appear after the reverse loop 'Loop: N691/N689  limit_check counted [int,0),-1 (-1 iters)'
> https://github.com/amzn/ion-java/blob/0873d5c1429a3bf6365d56ae2ae8461bd3865289/src/com/amazon/ion/util/IonStreamUtils.java#L104
> Do you know what can wipe out this loop? It's not dead code. I can't understand how the optimizer
> dares to remove it.

There are multiple optimizations that could cause a loop to go away. If you still want to find out
and enabling more Print/Trace flags does not help, one way would be to attach a debugger and set a
watchpoint on the loop node inputs to catch when it is removed. It's usually a bit of an effort and
you have to be careful to reset the watchpoint if the node's input array is reallocated for extension
but it's helpful when everything else fails (you can also try reverse-executing with the rr debugger).

> Another thing is I still have a hard time understanding the profile data embedded in the IR.
> in this graph: https://bugs.openjdk.java.net/secure/attachment/84402/JDK-8229450.png
> 677 CountedLoopEnd, [lt] P=0.800000, C=8188.000000
> 520 If,                  P=0.000000, C=4617.000000
>
> I guess P == probability and C == counts. My program has the same probabilities as the replay's.

Right.

Hard to tell why your reproducer does not work but as I said, creating one is a science in itself
and there are multiple approaches (see Roland's suggestions).

Best regards,
Tobias

From richard.reingruber at sap.com Fri Aug 30 07:43:42 2019
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Fri, 30 Aug 2019 07:43:42 +0000
Subject: RFR(XXS) 8230363: C2: Let ConnectionGraph::not_global_escape(Node* n) return false if n is not in the CG
In-Reply-To: <71bc3355-69ae-9805-efae-ed404db17951@oracle.com>
References: <71bc3355-69ae-9805-efae-ed404db17951@oracle.com>
Message-ID:

Hi Tobias,

you're right, that's better. I've updated the webrev in place.

Thanks for your review!

Richard.

-----Original Message-----
From: Tobias Hartmann
Sent: Friday, 30 August 2019 08:52
To: Reingruber, Richard; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(XXS) 8230363: C2: Let ConnectionGraph::not_global_escape(Node* n) return false if n is not in the CG

Hi Richard,

looks good to me but please check ptn == NULL instead of casting to boolean before pushing.

Thanks,
Tobias

On 29.08.19 22:31, Reingruber, Richard wrote:
> Hi,
>
> could I please get reviews and sponsoring for
>
> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8230363/webrev.0/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8230363
>
> The fix avoids crashes if ConnectionGraph::not_global_escape(Node* n) should be called with a node n
> that was not added to the connection graph. Note that not all ideal nodes are added (e.g. constant
> numbers). This case should be handled more gracefully by returning false.
>
> This is split off from JDK-8227745[1]. There not_global_escape() is applied to the arguments of java
> calls. If one argument should be e.g. an integer constant not_global_escape() would crash.
>
> I've run tier1 tests.
>
> Thanks, Richard.
> > [1] https://bugs.openjdk.java.net/browse/JDK-8227745 > From thomas.schatzl at oracle.com Fri Aug 30 08:32:50 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 30 Aug 2019 10:32:50 +0200 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <7035ccb8-000c-3a58-b5ac-fb0a3b949784@oracle.com> <381f185e-ca2e-50c4-fe35-1e5e62ff88f6@oracle.com> Message-ID: <95f94a8d-d32f-c2e4-25a0-9d7471f74e08@oracle.com> Hi, On 26.08.19 15:04, Doerr, Martin wrote: > Hi all, > > I had noticed that the platforms selection which need a fence in taskqueue.inline.hpp should get updated. > > My initial webrev > http://cr.openjdk.java.net/~mdoerr/8229422_multi-copy-atomic/webrev.00/ > was already reviewed on hotspot-gc-dev. It is an attempt to make things more consistent, especially the property "CPU_MULTI_COPY_ATOMIC". > Also the compiler constant "support_IRIW_for_not_multiple_copy_atomic_cpu" depends on this property (currently only used on PPC64). > > We could go one step further and move even more #defines into the platform files to give platform maintainers more control. > I haven't got feedback from arm/aarch64 folks about this addition, yet: > http://cr.openjdk.java.net/~mdoerr/8229422_multi-copy-atomic/webrev.01/ > With this proposal, each platform which is "CPU_MULTI_COPY_ATOMIC" is supposed to define this macro. > Other platforms must define SUPPORT_IRIW_FOR_NOT_MULTI_COPY_ATOMIC_CPU and IRIW_WITH_RELEASE_VOLATILE_IN_CONSTRUCTOR for fine-grained control of the memory ordering behavior. > We can even control them dynamically (added an experimental switch for PPC64 as an example). > > Note that neither webrev.00 nor webrev.01 contain any functional changes other than the taskqueue update for s390 (and the experimental switch for PPC64 in webrev.01). > > Feedback is welcome. Also if you have a preference wrt. webrev.00 vs. webrev.01. 
for pushing I would prefer the minimal amount of changes to solve the original issue, and move all other changes to a different CR. Also, I would prefer if all globalDefinitions files contained all defines, commented out if needed. I.e. to try to show that not defining a particular macro has been deliberate and not an oversight. (Like in the 00 webrev where the code at least states for aarch64: 37 // aarch64 is not CPU_MULTI_COPY_ATOMIC I am aware that this is not correct given new information, but in context of the CR it is/was) Further, let's avoid "TODOs" in the sources, the correct place for those is JIRA imho. :) Thanks, Thomas From tobias.hartmann at oracle.com Fri Aug 30 09:02:31 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 30 Aug 2019 11:02:31 +0200 Subject: [14] RFR(T): 8230388: Problemlist additional compiler/rtm tests Message-ID: Hi, the fix for JDK-8226899 [1] missed to problem list some additional compiler/rtm tests due to 8183263: https://bugs.openjdk.java.net/browse/JDK-8230388 http://cr.openjdk.java.net/~thartmann/8230388/webrev.00/ Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8226899 From rwestrel at redhat.com Fri Aug 30 09:06:59 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 30 Aug 2019 11:06:59 +0200 Subject: [14] RFR(T): 8230388: Problemlist additional compiler/rtm tests In-Reply-To: References: Message-ID: <87blw7nhsc.fsf@redhat.com> > http://cr.openjdk.java.net/~thartmann/8230388/webrev.00/ Looks good. Roland. From tobias.hartmann at oracle.com Fri Aug 30 09:11:45 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 30 Aug 2019 11:11:45 +0200 Subject: [14] RFR(T): 8230388: Problemlist additional compiler/rtm tests In-Reply-To: <87blw7nhsc.fsf@redhat.com> References: <87blw7nhsc.fsf@redhat.com> Message-ID: Thanks for the quick review. Pushed. 
Best regards, Tobias On 30.08.19 11:06, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~thartmann/8230388/webrev.00/ > > Looks good. > > Roland. > From tobias.hartmann at oracle.com Fri Aug 30 10:06:03 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 30 Aug 2019 12:06:03 +0200 Subject: [14] RFR(T): 8230390: Problemlist SA tests with AOT Message-ID: <5e2090c1-f07e-7bb5-ff5e-6a0696b211a4@oracle.com> Hi, Most of the SA tests have already been problem listed with AOT: http://hg.openjdk.java.net/jdk/jdk13/file/76647c08ce0c/test/hotspot/jtreg/ProblemList-aot.txt#l30 The DebugdConnectTest.java added recently with JDK-8209790 should be problem listed as well: http://cr.openjdk.java.net/~thartmann/8230390/webrev.00/ Thanks, Tobias From martin.doerr at sap.com Fri Aug 30 11:14:45 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 30 Aug 2019 11:14:45 +0000 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: <95f94a8d-d32f-c2e4-25a0-9d7471f74e08@oracle.com> References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <7035ccb8-000c-3a58-b5ac-fb0a3b949784@oracle.com> <381f185e-ca2e-50c4-fe35-1e5e62ff88f6@oracle.com> <95f94a8d-d32f-c2e4-25a0-9d7471f74e08@oracle.com> Message-ID: Hi Thomas, good proposal. Here's the minimal version: http://cr.openjdk.java.net/~mdoerr/8229422_multi-copy-atomic/webrev.02/ I've removed the compiler part. I can create a separate issue for making C1 and C2 consistent. Arm32/aarch64 folks can create new issues if they like further changes. I don't have any further requirements for s390 and PPC64 at the moment. Can I consider it as reviewed by Thomas, David and Derek? Best regards, Martin > -----Original Message----- > From: Thomas Schatzl > Sent: Freitag, 30. 
August 2019 10:33 > To: Doerr, Martin ; hotspot-runtime- > dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' compiler-dev at openjdk.java.net> > Cc: hotspot-gc-dev at openjdk.java.net; David Holmes > (david.holmes at oracle.com) ; Derek White > > Subject: Re: RFR(S): 8229422: Taskqueue: Outdated selection of weak > memory model platforms > > Hi, > > On 26.08.19 15:04, Doerr, Martin wrote: > > Hi all, > > > > I had noticed that the platforms selection which need a fence in > taskqueue.inline.hpp should get updated. > > > > My initial webrev > > http://cr.openjdk.java.net/~mdoerr/8229422_multi-copy- > atomic/webrev.00/ > > was already reviewed on hotspot-gc-dev. It is an attempt to make things > more consistent, especially the property "CPU_MULTI_COPY_ATOMIC". > > Also the compiler constant > "support_IRIW_for_not_multiple_copy_atomic_cpu" depends on this > property (currently only used on PPC64). > > > > We could go one step further and move even more #defines into the > platform files to give platform maintainers more control. > > I haven't got feedback from arm/aarch64 folks about this addition, yet: > > http://cr.openjdk.java.net/~mdoerr/8229422_multi-copy- > atomic/webrev.01/ > > With this proposal, each platform which is "CPU_MULTI_COPY_ATOMIC" is > supposed to define this macro. > > Other platforms must define > SUPPORT_IRIW_FOR_NOT_MULTI_COPY_ATOMIC_CPU and > IRIW_WITH_RELEASE_VOLATILE_IN_CONSTRUCTOR for fine-grained control > of the memory ordering behavior. > > We can even control them dynamically (added an experimental switch for > PPC64 as an example). > > > > Note that neither webrev.00 nor webrev.01 contain any functional changes > other than the taskqueue update for s390 (and the experimental switch for > PPC64 in webrev.01). > > > > Feedback is welcome. Also if you have a preference wrt. webrev.00 vs. > webrev.01. 
> > for pushing I would prefer the minimal amount of changes to solve the > original issue, and move all other changes to a different CR. > > Also, I would prefer if all globalDefinitions files contained all > defines, commented out if needed. I.e. to try to show that not defining > a particular macro has been deliberate and not an oversight. > > (Like in the 00 webrev where the code at least states for aarch64: > 37 // aarch64 is not CPU_MULTI_COPY_ATOMIC > > I am aware that this is not correct given new information, but in > context of the CR it is/was) > > Further, let's avoid "TODOs" in the sources, the correct place for those > is JIRA imho. :) > > Thanks, > Thomas > From thomas.schatzl at oracle.com Fri Aug 30 11:34:24 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 30 Aug 2019 13:34:24 +0200 Subject: RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms In-Reply-To: References: <9d9819fe-560f-13f0-1907-794e063ee687@oracle.com> <7035ccb8-000c-3a58-b5ac-fb0a3b949784@oracle.com> <381f185e-ca2e-50c4-fe35-1e5e62ff88f6@oracle.com> <95f94a8d-d32f-c2e4-25a0-9d7471f74e08@oracle.com> Message-ID: <55b931eb-6cc9-1352-02a8-12e51d1231e9@oracle.com> Hi Martin, On 30.08.19 13:14, Doerr, Martin wrote: > Hi Thomas, > > good proposal. > > Here's the minimal version: > http://cr.openjdk.java.net/~mdoerr/8229422_multi-copy-atomic/webrev.02/ > > I've removed the compiler part. I can create a separate issue for making C1 and C2 consistent. > > Arm32/aarch64 folks can create new issues if they like further changes. > I don't have any further requirements for s390 and PPC64 at the moment. > > Can I consider it as reviewed by Thomas, David and Derek? > looks good. I filed JDK-8230392 to pick up and test by Aarch64 maintainers. I am not so knowledgeable about the other proposals made here earlier, so I defer filing and fixing these to the respective maintainers. 
Thanks, Thomas From rwestrel at redhat.com Fri Aug 30 13:50:43 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 30 Aug 2019 15:50:43 +0200 Subject: RFR(XS): 8229450: C2 compilation fails with assert(found_sfpt) failed In-Reply-To: References: <87lfvcnow8.fsf@redhat.com> Message-ID: <875zmeoj7w.fsf@redhat.com> > I just did cosmetic things for your patch. Could you review and sponsor > it? Let me run some more testing with it. > Could you tell me what 'chain of events' leads you to your test1? Something like this: 1- we need a loop that's strip mined and followed by a test with the same test as the exit condition with a dependent load (the load has to have its control input set to the right branch of the if) - One way to have a load with a control set is to have a load from an array (because then, the load is dependent on an array bound check). A field load wouldn't be directly control dependent. - but the load must be directly dependent on the if, there must be no bound check in between: one way for c2 to optimize out the bound check is it can prove it's useless which is the case for a newly allocated object - but a load from a newly allocated would also be optimized out so let's put a non inlined call in the way to defeat that optimization 2- we need the if to be optimized out during loop opts. If it's right at the exit of the loop, then IGVN optimizes it out before the strip mined loop is even created so put extra control flow between the if and the loop exit to get in the way of the IGVN optimization with a lot of trial and error and going back and forth between the test and what c2 actually generates. Roland. 
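
The chain of events described above maps fairly directly onto the test1
method of the test case posted earlier in this thread. What follows is only
an annotated sketch of that mapping: the class name StripMinedLoopSketch and
the method names are made up for illustration, and whether C2 really strip
mines the loop and removes the second test depends on the JVM build and on
flags such as the ones quoted earlier in the thread, which is assumed here
rather than checked.

```java
// Annotated sketch of the test1 pattern from the thread. The class name is
// hypothetical; the JIT behaviour described in the comments is assumed, not verified.
final class StripMinedLoopSketch {

    // Meant to stay out of line (e.g. via -XX:CompileCommand=dontinline,...),
    // so C2 cannot fold away the later load from the freshly allocated array.
    static void notInlined(int[] array) {
    }

    static int test(boolean flag1, boolean flag2) {
        int res = 1;
        // Newly allocated array: C2 can prove index 0 is in bounds, so no range
        // check sits between the final 'if' and the array load.
        int[] array = new int[10];
        notInlined(array);

        // Counted loop: the candidate for strip mining.
        int i;
        for (i = 0; i < 2000; i++) {
            res *= i;
        }

        // Extra control flow between the loop exit and the second test, so the
        // test is not removed by IGVN before the strip mined loop is created.
        if (flag1) {
            if (flag2) {
                res++;
            }
        }

        // Same condition as the loop exit; the array load is control dependent on it.
        if (i >= 2000) {
            res *= array[0];
        }
        return res;
    }

    public static void main(String[] args) {
        // Warm up so the method gets compiled, mixing both flag settings.
        for (int n = 0; n < 20_000; n++) {
            test(false, false);
            test(true, true);
        }
        System.out.println(test(true, true));
    }
}
```

Functionally the method is uninteresting (the product is zeroed in the first
loop iteration and array[0] is 0); the point is only the shape of the control
flow and of the load that C2 gets to see.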
From rwestrel at redhat.com Fri Aug 30 13:51:25 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 30 Aug 2019 15:51:25 +0200 Subject: [14] RFR(T): 8230390: Problemlist SA tests with AOT In-Reply-To: <5e2090c1-f07e-7bb5-ff5e-6a0696b211a4@oracle.com> References: <5e2090c1-f07e-7bb5-ff5e-6a0696b211a4@oracle.com> Message-ID: <8736hioj6q.fsf@redhat.com> > http://cr.openjdk.java.net/~thartmann/8230390/webrev.00/ Ok. Roland. From tobias.hartmann at oracle.com Fri Aug 30 14:16:39 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 30 Aug 2019 16:16:39 +0200 Subject: [14] RFR(T): 8230390: Problemlist SA tests with AOT In-Reply-To: <8736hioj6q.fsf@redhat.com> References: <5e2090c1-f07e-7bb5-ff5e-6a0696b211a4@oracle.com> <8736hioj6q.fsf@redhat.com> Message-ID: <3eab7376-4fd2-bdcc-8abc-54c184420bd1@oracle.com> Thanks Roland. Best regards, Tobias On 30.08.19 15:51, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~thartmann/8230390/webrev.00/ > > Ok. > > Roland. > From vivek.r.deshpande at intel.com Fri Aug 30 22:25:18 2019 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Fri, 30 Aug 2019 22:25:18 +0000 Subject: 8230015: [instruction selector] generic vector operands support. In-Reply-To: References: Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2AAB730228@fmsmsx121.amr.corp.intel.com> Hi Jatin, Could you please give me the updated patch with #ifdefs and I can submit it to submit-repo for testing. Regards, Vivek -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Bhateja, Jatin Sent: Wednesday, August 28, 2019 2:45 AM To: Yang Zhang (Arm Technology China) ; hotspot-compiler-dev at openjdk.java.net Cc: Vladimir Kozlov Subject: RE: 8230015: [instruction selector] generic vector operands support. Hi Yang, Thanks for your response. 
We are also working over the idea of moving out the operator (uOp and bOp) into a separate ideal node, but that will need changes in shape of ideal graph. Generic operands is another level of optimization which will complement "aggressive optimization". In addition, it will also reduce the number of checks for different vector operand type (vecS, vecD, vecX, vecY, vecZ and their legacy variants) within ADLC generate DFA used while matching, since there will only be two vector operands (vecG and legVecG) now. Current patch enables this support only for x86 target, to get a feedback from community. Best Regards, Jatin > -----Original Message----- > From: Yang Zhang (Arm Technology China) > Sent: Wednesday, August 28, 2019 2:49 PM > To: Bhateja, Jatin ; hotspot-compiler- > dev at openjdk.java.net > Cc: Vladimir Kozlov > Subject: RE: 8230015: [instruction selector] generic vector operands support. > > Hi Jatin > > The question how to reduce code size is discussed previously. I also > create a JBS to track it under panama project. > https://bugs.openjdk.java.net/browse/JDK-8229866 > There is a more aggressive idea. > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019- > April/033362.html > > Using the idea of generic vector operands, I implement an example > (vaddB/S/I) in AArch64 platform. The code size reduction in libjvm.so > is ~30kb. If all the vector instructions are merged, the estimated > size reduction will be ~300kb. > > How about using a more aggressive way to reduce code size? > > PS. When I test this patch in AArch64 platform, build fails with the > log undefined reference to > `Matcher::do_post_selection_processing(Compile*, > Node*)'. > I fix this failure by adding #ifdef X86 to these code. 
>
> Regards
> Yang
>
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Bhateja, Jatin
> Sent: Thursday, August 22, 2019 2:50 PM
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: Vladimir Kozlov
> Subject: 8230015: [instruction selector] generic vector operands support.
>
> Hi All,
>
> Please find below a patch for generic vector operands[1] support
> during instruction selection.
>
> The motivation behind the patch is to reduce the number of vector
> selection patterns whose operands merely differ in vector lengths.
> This will not only result in less code being generated by ADLC, which
> effectively translates to a size reduction in libjvm.so, but also help in
> better maintenance of AD files.
>
> Using generic operands we were able to collapse multiple vector
> patterns over mainline:
>   Initial number of vector instruction patterns (vec[XYZSD] + legVec[ZXYSD]) : 510
>   Reduced vector instruction patterns (vecG + legVecG)                       : 222
>
> With this we could see around 1MB size reduction in libjvm.so.
>
> In order to have minimal impact on downstream compiler passes, a
> post-selection pass has been introduced (currently enabled only for the
> X86 target) which replaces these generic operands with their
> corresponding concrete vector length variants.
>
> JBS : https://bugs.openjdk.java.net/browse/JDK-8230015
> Patch : http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/webrev.00/
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin Bhateja
>
> [1] http://cr.openjdk.java.net/~jbhateja/genericVectorOperands/generic_operands_support_v1.0.pdf
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose
> the contents to any other person, use it for any purpose, or store or
> copy the information in any medium. Thank you.
From navy.xliu at gmail.com Sat Aug 31 05:40:05 2019
From: navy.xliu at gmail.com (Liu Xin)
Date: Fri, 30 Aug 2019 22:40:05 -0700
Subject: RFR(XS): 8229450: C2 compilation fails with assert(found_sfpt) failed
In-Reply-To: <875zmeoj7w.fsf@redhat.com>
References: <87lfvcnow8.fsf@redhat.com> <875zmeoj7w.fsf@redhat.com>
Message-ID:

Hi Roland and Tobias,

I made a mistake yesterday: I missed '-XX:+UseCountedLoopSafepoints'.
https://cr.openjdk.java.net/~xliu/8229450/03/webrev/test/hotspot/jtreg/compiler/loopstripmining/LoadDependsOnIfIdenticalToLoopExit.java.html

The reason I want to explicitly declare '-XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000' is that this test needs a strip-mined loop. Yes, indeed, G1 and Shenandoah set it by default, but it's a good idea to set this explicitly.

On Fri, Aug 30, 2019 at 6:50 AM Roland Westrelin wrote:
>
> > I just did cosmetic things for your patch. Could you review and sponsor
> > it?
>
> Let me run some more testing with it.
>
> > Could you tell me what 'chain of events' leads you to your test1?
>
> Something like this:
>
> 1- we need a loop that's strip mined and followed by a test with the same
> test as the exit condition with a dependent load (the load has to have
> its control input set to the right branch of the if)
>
> - One way to have a load with a control set is to have a load from an
> array (because then, the load is dependent on an array bound check). A
> field load wouldn't be directly control dependent.
>
> - but the load must be directly dependent on the if, there must be no
> bound check in between: one way for c2 to optimize out the bound check
> is it can prove it's useless which is the case for a newly allocated
> object
>
> - but a load from a newly allocated object would also be optimized out so let's
> put a non inlined call in the way to defeat that optimization
>
> 2- we need the if to be optimized out during loop opts.
If it's right at
> the exit of the loop, then IGVN optimizes it out before the strip mined
> loop is even created so put extra control flow between the if and the
> loop exit to get in the way of the IGVN optimization
>
> with a lot of trial and error and going back and forth between the test
> and what c2 actually generates.
>
> Roland.
>

Thanks a lot. Now I know how you got test1. The events are actually C2 events. I thought there was a shortcut to get there. Actually, there isn't! One can't get whatever code shape one wants without knowing every single optimization. Clearly, I am far, far away from that level.

Tobias,
Thank you for the watchpoint tips and the rr debugger. I found a video on YouTube.

thanks,
--lx

From fujie at loongson.cn Sat Aug 31 15:04:23 2019
From: fujie at loongson.cn (Jie Fu)
Date: Sat, 31 Aug 2019 23:04:23 +0800
Subject: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling
In-Reply-To: <3724faa6-c57c-744a-d7e8-22daa4231078@loongson.cn>
References: <66fc5921-0627-468a-892a-d1ae8e8feb47@loongson.cn> <6df77af9-4dbb-264e-9fc3-d928737b82f5@oracle.com> <3033ff07-93b2-c99b-f5fb-6b35bc03b4b9@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB724CC7@fmsmsx121.amr.corp.intel.com> <5bf19e73-eeee-0b85-7d53-8866756072c1@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB726AFB@fmsmsx121.amr.corp.intel.com> <1fa86506-797b-6f18-bcc3-1d7f8d2a32ca@loongson.cn> <53E8E64DB2403849AFD89B7D4DAC8B2AAB72806A@fmsmsx121.amr.corp.intel.com> <3724faa6-c57c-744a-d7e8-22daa4231078@loongson.cn>
Message-ID: <93e239f5-a718-0d24-a799-9a4e586761c8@loongson.cn>

Hi Vivek,

Would you mind if I assign this issue[1] to you?
I can't find an AVX-512 machine in our company to do more investigation.
I'm sorry for that.

Thanks a lot.
Best regards,
Jie

[1] https://bugs.openjdk.java.net/browse/JDK-8227505

On 2019/8/23 8:53, Jie Fu wrote:
> Hi Vivek,
>
> Thanks for your clarification.
> Please see comments inline.
>
> On 2019/8/23 3:26, Deshpande, Vivek R wrote:
>> Hi Jie
>>
>> On AVX2 (256 bit vector) machine I did not observe the difference in
>> the generated code, same as your observation.
>>
>> But on AVX3(512 bit/ 64 byte vector) machine the generated code with
>> the patch was generating the AVX2 (256 bit) instructions instead of
>> AVX3 (512 bit) instructions.
>> So it is not able to use the complete vector width with the patch.
>> As far as performance is concerned with this particular benchmark,
>> that I have shared, and with given number of iterations in the
>> benchmark, I did not observe any difference with the patch from
>> original.
> As for your particular case, I don't think it's a problem to compile
> with vector-256 since there is no performance drop compared with
> vector-512.
> Instead, I'd prefer using vector-256 to lower the risk of over loop
> unrolling.
>
> Also I'm not sure whether the power consumption will increase if
> vector-512 is used on your machine.
>
>
>> So it's the difference in the generated code which is not using full
>> vector width.
> According to your performance analysis, vector-256 is good enough for
> your test case.
> What's the benefit to generate vector-512 for your case?
>
> Well, the patch doesn't disable the generation of vector-512 at all.
> You can increase the NUM in your program from 1024 to 2048 or more and
> try again.
> Thanks.
>
> What do you think?
> Any comments?
>
> Thanks a lot.
> Best regards,
> Jie
>
>>
>> Regards,
>> Vivek