From Xiaohong.Gong at arm.com Mon Jun 1 09:25:13 2020
From: Xiaohong.Gong at arm.com (Xiaohong Gong)
Date: Mon, 1 Jun 2020 09:25:13 +0000
Subject: Question about the expected behavior if JVMCI compiler is used on the jvm variant with C2 disabled
In-Reply-To:
References:
Message-ID:

Hi,

Ping again! Does anyone have any idea about this? Many thanks for any help!

Thanks,
Xiaohong Gong

From: Xiaohong Gong
Sent: Friday, May 29, 2020 2:26 PM
To: hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Cc: nd
Subject: RE: Question about the expected behavior if JVMCI compiler is used on the jvm variant with C2 disabled

Adding the hotspot-runtime-dev at openjdk.java.net channel. Thanks!

From: Xiaohong Gong
Sent: Wednesday, May 27, 2020 5:19 PM
To: hotspot-compiler-dev at openjdk.java.net
Cc: nd
Subject: Question about the expected behavior if JVMCI compiler is used on the jvm variant with C2 disabled

Hi,

Recently we found that the JVM can crash in debug mode when the JVMCI compiler is used on
a JVM variant with C2 disabled (configured with "--with-jvm-features=-compiler2").
The JVM crashes with the following assertion failure:

Internal Error (jdk/src/hotspot/share/compiler/compileBroker.cpp:891), pid=10824, tid=10825
# assert(_c2_count > 0 || _c1_count > 0) failed: No compilers?

Clearly the JVM cannot find a compiler, since both "_c2_count" and "_c1_count" are zero
due to some internal issues. Since "TieredCompilation" is turned off when C2 is disabled, the compile mode
should be "interpreter+C1" by default, and that works as expected. However, I'm confused about the
expected behavior when the JVMCI compiler is specified.

On the one hand, I thought it should use "interpreter+JVMCI" as the compile mode. If so, we have to fix the
issues.
On the other hand, I noticed that there is a VM warning when using the JVMCI compiler and disabling
tiered compilation with the normal configuration: "Disabling tiered compilation with non-native JVMCI compiler
is not recommended". So, considering that "TieredCompilation" is also turned off when C2 is disabled, I thought
it would be better to just disallow the JVMCI compiler in that case.

So my question is: which should be the expected behavior? Choose "interpreter+JVMCI" as the compile
mode, or make it invalid to use the JVMCI compiler when C2 is disabled?

I would really appreciate any opinions!

Thanks,
Xiaohong

From aph at redhat.com Mon Jun 1 12:53:13 2020
From: aph at redhat.com (Andrew Haley)
Date: Mon, 1 Jun 2020 13:53:13 +0100
Subject: RFR(XS): Provide information when hitting a HaltNode for architectures other than x86
In-Reply-To:
References: <92E14A43-E260-49D5-BF74-CB6331A2EB33@amazon.com> <0B03A385-BC1F-41B9-8B8F-02056BD5A706@amazon.com> <40eed1f3-27b9-5263-16c1-7563a6ff9082@arm.com>
Message-ID:

On 30/05/2020 00:24, Liu, Xin wrote:
> Since JDK-8245986(aarch64) has been resolved, may I ask a sponsor to push this change?

Done.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From gnu.andrew at redhat.com Mon Jun 1 12:53:43 2020
From: gnu.andrew at redhat.com (Andrew Hughes)
Date: Mon, 1 Jun 2020 13:53:43 +0100
Subject: [8u] RFR: 8237951: CTW: C2 compilation fails with "malformed control flow"
In-Reply-To: <871rp8ek1x.fsf@redhat.com>
References: <871rp8ek1x.fsf@redhat.com>
Message-ID: <1ed737d5-a71a-4df6-00c5-befea79d4754@redhat.com>

On 31/03/2020 14:22, Roland Westrelin wrote:
>
> The patch from the fix applies cleanly but it relies on
> Node::find_out_with() that's missing from 8. The backport below cherry
> picks that method from 8066312 (Add new Node* Node::find_out(int opc)
> method).
>
> http://cr.openjdk.java.net/~roland/8237951.8u/webrev.00/
>
> Initial change:
> https://bugs.openjdk.java.net/browse/JDK-8237951
> https://hg.openjdk.java.net/jdk/jdk/rev/c7152f7e01a6
>
> Tested with tier1.
>
> Roland.
>

Hmm, I'm not sure we should be cherry-picking this function for just the
one use case. For consistency, we should either bring in JDK-8066312
(which doesn't amount to much more than is cherry-picked here) or
replace the call in this patch with the function body, as is the case in
other places in the code e.g.

https://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/f7691a80458c/src/share/vm/opto/escape.cpp#l3105

Thanks,
--
Andrew :)

Senior Free Java Software Engineer
Red Hat, Inc. (http://www.redhat.com)

PGP Key: ed25519/0xCFDA0F9B35964222 (hkp://keys.gnupg.net)
Fingerprint = 5132 579D D154 0ED2 3E04 C5A0 CFDA 0F9B 3596 4222

From aph at redhat.com Mon Jun 1 12:54:29 2020
From: aph at redhat.com (Andrew Haley)
Date: Mon, 1 Jun 2020 13:54:29 +0100
Subject: RFR(XS): Provide information when hitting a HaltNode for architectures other than x86
In-Reply-To:
References: <92E14A43-E260-49D5-BF74-CB6331A2EB33@amazon.com> <0B03A385-BC1F-41B9-8B8F-02056BD5A706@amazon.com> <40eed1f3-27b9-5263-16c1-7563a6ff9082@arm.com>
Message-ID: <7a0dbb2a-80b7-b768-f8c4-3577c2e9177c@redhat.com>

On 01/06/2020 13:53, Andrew Haley wrote:
> On 30/05/2020 00:24, Liu, Xin wrote:
>> Since JDK-8245986(aarch64) has been resolved, may I ask a sponsor to push this change?
>
> Done.

Next time, please make sure that the patch in webrev/jdk.patch is a real
Hg webrev. Thanks.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From aph at redhat.com Mon Jun 1 13:40:20 2020
From: aph at redhat.com (Andrew Haley)
Date: Mon, 1 Jun 2020 14:40:20 +0100
Subject: [8u] RFR: 8237951: CTW: C2 compilation fails with "malformed control flow"
In-Reply-To: <1ed737d5-a71a-4df6-00c5-befea79d4754@redhat.com>
References: <871rp8ek1x.fsf@redhat.com> <1ed737d5-a71a-4df6-00c5-befea79d4754@redhat.com>
Message-ID: <684fa615-dc7b-f92e-6c92-a0289ed58bdc@redhat.com>

On 01/06/2020 13:53, Andrew Hughes wrote:
> Hmm, I'm not sure we should be cherry-picking this function for just the
> one use case. For consistency, we should either bring in JDK-8066312
> (which doesn't amount to much more than is cherry-picked here) or
> replace the call in this patch with the function body

I already approved the patch. Bringing in all of 8066312 would be
excessive, but if Roland prefers to use the function body rather than
cherry-pick the function I don't object.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From vladimir.kozlov at oracle.com Mon Jun 1 16:22:41 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 1 Jun 2020 09:22:41 -0700
Subject: Question about the expected behavior if JVMCI compiler is used on the jvm variant with C2 disabled
In-Reply-To:
References:
Message-ID: <555b435b-72ad-12aa-167f-67eefce8712a@oracle.com>

It is https://bugs.openjdk.java.net/browse/JDK-8241779

Will work on it later.

Regards,
Vladimir

On 6/1/20 2:25 AM, Xiaohong Gong wrote:
> Hi,
>
> Ping again! Does anyone have any idea about this? Many thanks for any help!
>
> Thanks,
> Xiaohong Gong
>
> From: Xiaohong Gong
> Sent: Friday, May 29, 2020 2:26 PM
> To: hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Cc: nd
> Subject: RE: Question about the expected behavior if JVMCI compiler is used on the jvm variant with C2 disabled
>
> Adding the hotspot-runtime-dev at openjdk.java.net channel. Thanks!
>
> From: Xiaohong Gong
> Sent: Wednesday, May 27, 2020 5:19 PM
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: nd
> Subject: Question about the expected behavior if JVMCI compiler is used on the jvm variant with C2 disabled
>
> Hi,
>
> Recently we found that the JVM can crash in debug mode when the JVMCI compiler is used on
> a JVM variant with C2 disabled (configured with "--with-jvm-features=-compiler2").
> The JVM crashes with the following assertion failure:
>
> Internal Error (jdk/src/hotspot/share/compiler/compileBroker.cpp:891), pid=10824, tid=10825
> # assert(_c2_count > 0 || _c1_count > 0) failed: No compilers?
>
> Clearly the JVM cannot find a compiler, since both "_c2_count" and "_c1_count" are zero
> due to some internal issues. Since "TieredCompilation" is turned off when C2 is disabled, the compile mode
> should be "interpreter+C1" by default, and that works as expected. However, I'm confused about the
> expected behavior when the JVMCI compiler is specified.
>
> On the one hand, I thought it should use "interpreter+JVMCI" as the compile mode. If so, we have to fix the
> issues. On the other hand, I noticed that there is a VM warning when using the JVMCI compiler and disabling
> tiered compilation with the normal configuration: "Disabling tiered compilation with non-native JVMCI compiler
> is not recommended". So, considering that "TieredCompilation" is also turned off when C2 is disabled, I thought
> it would be better to just disallow the JVMCI compiler in that case.
>
> So my question is: which should be the expected behavior? Choose "interpreter+JVMCI" as the compile
> mode, or make it invalid to use the JVMCI compiler when C2 is disabled?
>
> I would really appreciate any opinions!
>
> Thanks,
> Xiaohong
>
>

From xxinliu at amazon.com Mon Jun 1 17:33:16 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Mon, 1 Jun 2020 17:33:16 +0000
Subject: RFR(XS): Provide information when hitting a HaltNode for architectures other than x86
In-Reply-To: <7a0dbb2a-80b7-b768-f8c4-3577c2e9177c@redhat.com>
References: <92E14A43-E260-49D5-BF74-CB6331A2EB33@amazon.com> <0B03A385-BC1F-41B9-8B8F-02056BD5A706@amazon.com> <40eed1f3-27b9-5263-16c1-7563a6ff9082@arm.com> <7a0dbb2a-80b7-b768-f8c4-3577c2e9177c@redhat.com>
Message-ID: <4D1E4D62-889E-48EA-980F-55590AE850C8@amazon.com>

Hi, Andrew,

Thank you! I will make that happen.

Thanks,
--lx

On 6/1/20, 5:54 AM, "Andrew Haley" wrote:

    On 01/06/2020 13:53, Andrew Haley wrote:
    > On 30/05/2020 00:24, Liu, Xin wrote:
    >> Since JDK-8245986(aarch64) has been resolved, may I ask a sponsor to push this change?
    >
    > Done.

    Next time, please make sure that the patch in webrev/jdk.patch is a real
    Hg webrev. Thanks.

    --
    Andrew Haley (he/him)
    Java Platform Lead Engineer
    Red Hat UK Ltd.
    https://keybase.io/andrewhaley
    EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From vladimir.kozlov at oracle.com Mon Jun 1 21:36:20 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 1 Jun 2020 14:36:20 -0700
Subject: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions
In-Reply-To:
References:
Message-ID: <88e70fc5-e83c-1ae0-f4e7-ae80ce517a70@oracle.com>

Hi Shravya,

Why did you put the new CRC32 avx512 code into the macroAssembler_x86_aes.cpp file? This file is used only for AES
intrinsic code - nothing else should be there.
If you think the CRC32 code is too large for macroAssembler_x86.cpp I would suggest moving all CRC32 code, old and
new, into a new macroAssembler_x86_crc32.cpp file.

I see that you want to implement the new code only for 64-bit, which is fine, and you guarded it correctly with
#ifdef _LP64. But you forgot the guard in stubGenerator_x86_64.cpp, which will cause a build failure for 32-bit.

It is difficult to judge the implementation code. I hope you ran all tests for it.

Thanks,
Vladimir

On 5/20/20 4:01 PM, Rukmannagari, Shravya wrote:
> Hi All,
>
> We would like to contribute optimizations for CRC32 algorithm for upcoming Intel x86_64 platforms.
>
> Contributors:
>
> Shravya Rukmannagari(shravya.rukmannagari at intel.com)
>
> Greg B Tucker(greg.b.tucker at intel.com)
>
> I have tested the patch to confirm correctness and performance. The patch also passes compiler/jtreg tests.
>
> Please take a look and let me know if you have any questions or comments.
>
> Bug Id: https://bugs.openjdk.java.net/browse/JDK-8245512
>
> https://cr.openjdk.java.net/~srukmannagar/CRC32/webrev.01/
>
> Regards,
>
> Shravya Rukmannagari
>

From dean.long at oracle.com Mon Jun 1 22:10:53 2020
From: dean.long at oracle.com (Dean Long)
Date: Mon, 1 Jun 2020 15:10:53 -0700
Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods
In-Reply-To: <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com>
References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com>
Message-ID: <42d73a82-b70e-1a0d-312d-303409840392@oracle.com>

On 5/31/20 11:16 PM, serguei.spitsyn at oracle.com wrote:
> Hi Dean,
>
> To check the is_old as you suggest the target method has to be passed
> to the cache_jvmti_state() as argument. Is it what you are suggesting?

I believe you can use _task->method()->is_old(), as the ciEnv already
has the task.

> Just want to make sure I understand you correctly.
>
> The cache_jvmti_state() and cache_dtrace_flags() are called in the
> CompileBroker::init_compiler_runtime() for a ciEnv with the NULL
> CompileTask
> which looks unnecessary (or I don't understand it):
>
> bool CompileBroker::init_compiler_runtime() {
>   CompilerThread* thread = CompilerThread::current();
>   . . .
>     ciEnv ci_env((CompileTask*)NULL);
>     // Cache Jvmti state
>     ci_env.cache_jvmti_state();
>     // Cache DTrace flags
>     ci_env.cache_dtrace_flags();
>

These calls look unnecessary to me, as the ci_env will cache these again
before compiling a method. I suggest removing these calls. We should
make sure the cache fields are initialized to sane values in the ciEnv
ctor.

> The JVMCI has a separate implementation for ciEnv which is jvmciEnv and
> its own set of cache_jvmti_state() and jvmti_state_changed() functions.
> Both are not called in the JVMCI case.
> So, these checks look as broken in JVMCI now.
>

JVMCI is in better shape, because it doesn't transition out of
_thread_in_vm state, but yes it needs similar changes.

> Not sure, I have enough compiler knowledge to fix this at this stage
> of release.
> Would it be better to file a separate hotspot/compiler RFE targeted to 16?
> It can be assigned to me if it helps.
>

This is a P3 so I believe we have time to fix it for 15. Please go ahead
and let's see if we can get it in. I can help with the JVMCI changes if
they are not straightforward.

dl

> Thanks,
> Serguei
>
>
> On 5/28/20 10:54, Dean Long wrote:
>> Sure, you could just have cache_jvmti_state() return a boolean to
>> bail out immediately for is_old.
>>
>> dl
>>
>> On 5/28/20 7:23 AM, serguei.spitsyn at oracle.com wrote:
>>> Hi Dean,
>>>
>>> Thank you for looking at this!
>>> Okay. Let me check what can be done in this direction.
>>> There is no point to cache is_old. The compilation has to bail out
>>> if it is discovered to be true.
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 5/28/20 00:59, Dean Long wrote:
>>>> This seems OK as long as the memory barriers in the thread state
>>>> transitions prevent the C++ compiler from doing something like
>>>> reading is_old before reading redefinition_count. I would feel
>>>> better if both JVMCI and C1/C2 cached is_old and redefinition_count
>>>> at the same time (making sure to be in the _thread_in_vm state),
>>>> then bail out based on the cached value of is_old.
>>>>
>>>> dl
>>>>
>>>> On 5/26/20 12:04 AM, serguei.spitsyn at oracle.com wrote:
>>>>> On 5/25/20 23:39, serguei.spitsyn at oracle.com wrote:
>>>>>> Please, review a fix for:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245126
>>>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.1/
>>>>>>
>>>>>>
>>>>>> Summary:
>>>>>>   The Kitchensink stress test with the Instrumentation module enabled does
>>>>>>   a lot of class retransformations in parallel with all other stressing.
>>>>>>   It provokes the assert at the compiled code installation time:
>>>>>>     assert(!method->is_old()) failed: Should not be installing old methods
>>>>>>
>>>>>>   The problem is that the CompileBroker::invoke_compiler_on_method in C2 version
>>>>>>   (non-JVMCI tiered compilation) is missing the check that exists in the JVMCI
>>>>>>   part of implementation:
>>>>>> 2148 // Skip redefined methods
>>>>>> 2149 if (target_handle->is_old()) {
>>>>>> 2150 failure_reason = "redefined method";
>>>>>> 2151 retry_message = "not retryable";
>>>>>> 2152 compilable = ciEnv::MethodCompilable_never;
>>>>>> 2153 } else {
>>>>>> . . .
>>>>>> 2168 }
>>>>>>
>>>>>>   The fix is to add this check.
>>>>>
>>>>> Sorry, forgot to explain one thing.
>>>>> Compiler code has a special mechanism to ensure the JVMTI class redefinition did
>>>>> not happen while the method was compiled, so all the assumptions
>>>>> remain correct.
>>>>> 2190 // Cache Jvmti state
>>>>> 2191 ci_env.cache_jvmti_state();
>>>>> Part of this is a check that the value of JvmtiExport::redefinition_count() is
>>>>> cached in ciEnv variable: _jvmti_redefinition_count.
>>>>> The JvmtiExport::redefinition_count() value change means a class redefinition
>>>>> happened which also implies some of methods may become old.
>>>>> However, the method being compiled can be already old at the point where the
>>>>> redefinition counter is cached, so the redefinition counter check
>>>>> does not help much.
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>> Testing:
>>>>>> Ran Kitchensink test with the Instrumentation module enabled in mach5
>>>>>> multiple times for 100 times. Without the fix the test normally fails
>>>>>> a couple of times in 200 runs. It does not fail with the fix anymore.
>>>>>> Will also submit hs tiers1-5.
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>
>>>>
>>>
>>
>

From vladimir.kozlov at oracle.com Tue Jun 2 01:40:35 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 1 Jun 2020 18:40:35 -0700
Subject: RFR: 8245452: Clean up compressed pointer logic in lcm.cpp
In-Reply-To: <9e93fcd4-a6ba-0705-05b4-581fa9d39482@oracle.com>
References: <9e93fcd4-a6ba-0705-05b4-581fa9d39482@oracle.com>
Message-ID: <68da9289-8b07-e6e0-a642-02cd58d0bd73@oracle.com>

Looks good.

Thanks,
Vladimir

On 5/27/20 6:14 AM, Erik Österlund wrote:
> Hi,
>
> After my change enabling compressed class pointers when compressed oops is disabled,
> Vladimir Kozlov pointed out that there is potential for simplifying some code in lcm.cpp
> that uses various checks if there is any form of compressed class/oop pointers with shift 0,
> as a way of using either the base->get_ptr_type() or base->bottom_type()->is_ptr() of
> a base pointer. These tests have always had false positives, where the base->get_ptr_type()
> is used when there is no way it could be a compressed pointer with shift 0.
>
> This dance is not really necessary if we just use the base->get_ptr_type() always, instead of
> carefully figuring out when we can use the bottom type. Because it works in both cases.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8245452
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8245452/webrev.00/
>
> Thanks,
> /Erik

From vladimir.kozlov at oracle.com Tue Jun 2 01:56:16 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 1 Jun 2020 18:56:16 -0700
Subject: [15] RFR(S): 8245957: Remove unused LIR_OpBranch::type after SPARC port removal
In-Reply-To:
References:
Message-ID: <4bf6b46a-4a3f-225a-0513-ba3965c085a8@oracle.com>

+1 for 2)

Thanks,
Vladimir

On 5/27/20 6:42 AM, Doerr, Martin wrote:
> Hi Tobias,
>
>> I would prefer 2).
> +1
>
> Thanks for cleaning this up. Looks good to me.
>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Tobias Hartmann
>> Sent: Wednesday, 27 May 2020 14:04
>> To: hotspot compiler
>> Subject: [15] RFR(S): 8245957: Remove unused LIR_OpBranch::type after
>> SPARC port removal
>>
>> Hi,
>>
>> please review the following patch that removes LIR_OpBranch::type after
>> the only remaining usage [1]
>> was removed with the SPARC port removal (JDK-8244224).
>>
>> https://bugs.openjdk.java.net/browse/JDK-8245957
>>
>> We have two options:
>> 1) Keep the type asserts in LIR_OpBranch::branch:
>> http://cr.openjdk.java.net/~thartmann/8245957/webrev.v1.00/
>> 2) Remove the asserts:
>> http://cr.openjdk.java.net/~thartmann/8245957/webrev.v2.00/
>>
>> I would prefer 2).
>>
>> Best regards,
>> Tobias
>>
>> [1] https://hg.openjdk.java.net/jdk/jdk/file/ae7ed29a5f70/src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp#l597

From vladimir.kozlov at oracle.com Tue Jun 2 02:09:43 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 1 Jun 2020 19:09:43 -0700
Subject: [15] RFR(XS): 8239083: C1 assert(known_holder == NULL || (known_holder->is_instance_klass() && (!known_holder->is_interface() || ((ciInstanceKlass*)known_holder)->has_nonstatic_concrete_methods())), "should be non-static concrete method");
In-Reply-To: <7bf13c9a-a1ce-a6a1-979c-50a1ea7a80bf@oracle.com>
References: <1dd061e7-f872-877e-b574-08e578f006ba@oracle.com> <7bf13c9a-a1ce-a6a1-979c-50a1ea7a80bf@oracle.com>
Message-ID: <19b9e678-4a8d-7d70-747e-24870b9bb6d0@oracle.com>

+1

Thanks,
Vladimir

On 5/26/20 6:56 AM, Tobias Hartmann wrote:
> Hi Christian,
>
> looks reasonable to me.
>
> Best regards,
> Tobias
>
> On 15.05.20 11:11, Christian Hagedorn wrote:
>> Hi
>>
>> Please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8239083
>> http://cr.openjdk.java.net/~chagedorn/8239083/webrev.00/
>>
>> The assert fails in the test case when invoking the only static interface method with a method
>> handle. In this case, known_holder is non-NULL. However, known_holder would be set to NULL at [1]
>> since the call returns NULL when known_holder is an interface.
>>
>> In the failing test case, known_holder is non-NULL since GraphBuilder::try_method_handle_inline()
>> calls GraphBuilder::try_inline() with holder_known set to true which eventually lets profile_call()
>> to be called with a non-NULL known_holder argument.
>>
>> On the other hand, when calling a static method without a method handle, known_holder seems to be
>> always NULL:
>> profile_call() is called directly at [2] with NULL or indirectly via try_inline() [3].
>> In the latter case, cha_monomorphic_target and exact_target are always NULL for static methods and therefore
>> known_holder will also be always NULL in profile_call().
>>
>> We could therefore just remove the assert which seems to be too strong (not handling this edge
>> case). Another option would be to change the call to try_inline() in try_method_handle_inline() to
>> only set holder_known to true if the target is not static. The known_holder is eventually only used
>> in LIR_Assembler::emit_profile_call() [4] but only if op->should_profile_receiver_type() holds [5].
>> This is only true if the callee is not static [6]. The webrev uses the second approach.
>>
>> What do you think?
>>
>> Best regards,
>> Christian
>>
>>
>> [1] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_GraphBuilder.cpp#l4386
>> [2] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_GraphBuilder.cpp#l3571
>> [3] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_GraphBuilder.cpp#l2039
>> [4] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp#l3589
>> [5] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp#l3584
>> [6] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_LIR.hpp#l1916

From Xiaohong.Gong at arm.com Tue Jun 2 02:41:33 2020
From: Xiaohong.Gong at arm.com (Xiaohong Gong)
Date: Tue, 2 Jun 2020 02:41:33 +0000
Subject: Question about the expected behavior if JVMCI compiler is used on the jvm variant with C2 disabled
In-Reply-To: <555b435b-72ad-12aa-167f-67eefce8712a@oracle.com>
References: <555b435b-72ad-12aa-167f-67eefce8712a@oracle.com>
Message-ID:

Hi Vladimir,

> It is https://bugs.openjdk.java.net/browse/JDK-8241779

Thanks for looking at it! So the answer is that the JVMCI feature should work well even if C2 is disabled, is it?

> Will work on it later.

Looks great!
Thanks so much!

Thanks,
Xiaohong

From vladimir.kozlov at oracle.com Tue Jun 2 02:44:19 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 1 Jun 2020 19:44:19 -0700
Subject: Question about the expected behavior if JVMCI compiler is used on the jvm variant with C2 disabled
In-Reply-To:
References: <555b435b-72ad-12aa-167f-67eefce8712a@oracle.com>
Message-ID:

On 6/1/20 7:41 PM, Xiaohong Gong wrote:
> Hi Vladimir,
>
> > It is https://bugs.openjdk.java.net/browse/JDK-8241779
>
> Thanks for looking at it! So the answer is that the JVMCI feature should work well even if C2 is disabled, is it?

Yes.

>
> > Will work on it later.
>
> Looks great! Thanks so much!
>
> Thanks,
> Xiaohong
>

From Xiaohong.Gong at arm.com Tue Jun 2 02:47:48 2020
From: Xiaohong.Gong at arm.com (Xiaohong Gong)
Date: Tue, 2 Jun 2020 02:47:48 +0000
Subject: Question about the expected behavior if JVMCI compiler is used on the jvm variant with C2 disabled
In-Reply-To:
References: <555b435b-72ad-12aa-167f-67eefce8712a@oracle.com>
Message-ID:

> > Hi Vladimir,
> >
> > > It is https://bugs.openjdk.java.net/browse/JDK-8241779
> >
> > Thanks for looking at it! So the answer is that the JVMCI feature
> should work well even if C2 is disabled, is it?
>
> Yes.

Great! It makes sense! Thanks for the help!

Best Regards,
Xiaohong Gong

From tobias.hartmann at oracle.com Tue Jun 2 06:30:14 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 2 Jun 2020 08:30:14 +0200
Subject: [15] RFR(S): 8245957: Remove unused LIR_OpBranch::type after SPARC port removal
In-Reply-To: <4bf6b46a-4a3f-225a-0513-ba3965c085a8@oracle.com>
References: <4bf6b46a-4a3f-225a-0513-ba3965c085a8@oracle.com>
Message-ID:

Thanks Vladimir!

Best regards,
Tobias

On 02.06.20 03:56, Vladimir Kozlov wrote:
> +1 for 2)
>
> Thanks,
> Vladimir
>
> On 5/27/20 6:42 AM, Doerr, Martin wrote:
>> Hi Tobias,
>>
>>> I would prefer 2).
>> +1
>>
>> Thanks for cleaning this up. Looks good to me.
>>
>> Best regards,
>> Martin
>>
>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev <hotspot-compiler-dev-bounces at openjdk.java.net> On Behalf Of Tobias Hartmann
>>> Sent: Wednesday, 27 May 2020 14:04
>>> To: hotspot compiler
>>> Subject: [15] RFR(S): 8245957: Remove unused LIR_OpBranch::type after
>>> SPARC port removal
>>>
>>> Hi,
>>>
>>> please review the following patch that removes LIR_OpBranch::type after
>>> the only remaining usage [1]
>>> was removed with the SPARC port removal (JDK-8244224).
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8245957
>>>
>>> We have two options:
>>> 1) Keep the type asserts in LIR_OpBranch::branch:
>>> http://cr.openjdk.java.net/~thartmann/8245957/webrev.v1.00/
>>> 2) Remove the asserts:
>>> http://cr.openjdk.java.net/~thartmann/8245957/webrev.v2.00/
>>>
>>> I would prefer 2).
>>>
>>> Best regards,
>>> Tobias
>>>
>>> [1] https://hg.openjdk.java.net/jdk/jdk/file/ae7ed29a5f70/src/hotspot/cpu/sparc/c1_LIRAssembler_sparc.cpp#l597

From tobias.hartmann at oracle.com Tue Jun 2 06:30:28 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 2 Jun 2020 08:30:28 +0200
Subject: [15] RFR(XS): 8239083: C1 assert(known_holder == NULL || (known_holder->is_instance_klass() && (!known_holder->is_interface() || ((ciInstanceKlass*)known_holder)->has_nonstatic_concrete_methods())), "should be non-static concrete method");
In-Reply-To: <19b9e678-4a8d-7d70-747e-24870b9bb6d0@oracle.com>
References: <1dd061e7-f872-877e-b574-08e578f006ba@oracle.com> <7bf13c9a-a1ce-a6a1-979c-50a1ea7a80bf@oracle.com> <19b9e678-4a8d-7d70-747e-24870b9bb6d0@oracle.com>
Message-ID: <7700eb3f-9d6c-80a6-4266-31c7b84b1263@oracle.com>

Thanks Vladimir!

Best regards,
Tobias

On 02.06.20 04:09, Vladimir Kozlov wrote:
> +1
>
> Thanks,
> Vladimir
>
> On 5/26/20 6:56 AM, Tobias Hartmann wrote:
>> Hi Christian,
>>
>> looks reasonable to me.
>>
>> Best regards,
>> Tobias
>>
>> On 15.05.20 11:11, Christian Hagedorn wrote:
>>> Hi
>>>
>>> Please review the following patch:
>>> https://bugs.openjdk.java.net/browse/JDK-8239083
>>> http://cr.openjdk.java.net/~chagedorn/8239083/webrev.00/
>>>
>>> The assert fails in the test case when invoking the only static interface method with a method
>>> handle. In this case, known_holder is non-NULL. However, known_holder would be set to NULL at [1]
>>> since the call returns NULL when known_holder is an interface.
>>>
>>> In the failing test case, known_holder is non-NULL since GraphBuilder::try_method_handle_inline()
>>> calls GraphBuilder::try_inline() with holder_known set to true which eventually lets profile_call()
>>> to be called with a non-NULL known_holder argument.
>>>
>>> On the other hand, when calling a static method without a method handle, known_holder seems to be
>>> always NULL:
>>> profile_call() is called directly at [2] with NULL or indirectly via try_inline() [3]. In the latter
>>> case, cha_monomorphic_target and exact_target are always NULL for static methods and therefore
>>> known_holder will also be always NULL in profile_call().
>>>
>>> We could therefore just remove the assert which seems to be too strong (not handling this edge
>>> case). Another option would be to change the call to try_inline() in try_method_handle_inline() to
>>> only set holder_known to true if the target is not static. The known_holder is eventually only used
>>> in LIR_Assembler::emit_profile_call() [4] but only if op->should_profile_receiver_type() holds [5].
>>> This is only true if the callee is not static [6]. The webrev uses the second approach.
>>>
>>> What do you think?
>>>
>>> Best regards,
>>> Christian
>>>
>>>
>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_GraphBuilder.cpp#l4386
>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_GraphBuilder.cpp#l3571
>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_GraphBuilder.cpp#l2039
>>> [4] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp#l3589
>>> [5] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp#l3584
>>> [6] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_LIR.hpp#l1916

From tobias.hartmann at oracle.com Tue Jun 2 06:31:10 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 2 Jun 2020 08:31:10 +0200
Subject: [15] RFR(XS): 8239083: C1 assert(known_holder == NULL || (known_holder->is_instance_klass() && (!known_holder->is_interface() || ((ciInstanceKlass*)known_holder)->has_nonstatic_concrete_methods())), "should be non-static concrete method");
In-Reply-To: <7700eb3f-9d6c-80a6-4266-31c7b84b1263@oracle.com>
References: <1dd061e7-f872-877e-b574-08e578f006ba@oracle.com> <7bf13c9a-a1ce-a6a1-979c-50a1ea7a80bf@oracle.com> <19b9e678-4a8d-7d70-747e-24870b9bb6d0@oracle.com> <7700eb3f-9d6c-80a6-4266-31c7b84b1263@oracle.com>
Message-ID: <3eb25862-38c3-56a7-ed0b-aecfabb282bb@oracle.com>

Oops, pressed reply on the wrong email :D

Best regards,
Tobias

On 02.06.20 08:30, Tobias Hartmann wrote:
> Thanks Vladimir!
>
> Best regards,
> Tobias
>
> On 02.06.20 04:09, Vladimir Kozlov wrote:
>> +1
>>
>> Thanks,
>> Vladimir
>>
>> On 5/26/20 6:56 AM, Tobias Hartmann wrote:
>>> Hi Christian,
>>>
>>> looks reasonable to me.
>>>
>>> Best regards,
>>> Tobias
>>>
>>> On 15.05.20 11:11, Christian Hagedorn wrote:
>>>> Hi
>>>>
>>>> Please review the following patch:
>>>> https://bugs.openjdk.java.net/browse/JDK-8239083
>>>> http://cr.openjdk.java.net/~chagedorn/8239083/webrev.00/
>>>>
>>>> The assert fails in the test case when invoking the only static interface method with a method
>>>> handle. In this case, known_holder is non-NULL. However, known_holder would be set to NULL at [1]
>>>> since the call returns NULL when known_holder is an interface.
>>>>
>>>> In the failing test case, known_holder is non-NULL since GraphBuilder::try_method_handle_inline()
>>>> calls GraphBuilder::try_inline() with holder_known set to true which eventually lets profile_call()
>>>> to be called with a non-NULL known_holder argument.
>>>>
>>>> On the other hand, when calling a static method without a method handle, known_holder seems to be
>>>> always NULL:
>>>> profile_call() is called directly at [2] with NULL or indirectly via try_inline() [3]. In the latter
>>>> case, cha_monomorphic_target and exact_target are always NULL for static methods and therefore
>>>> known_holder will also be always NULL in profile_call().
>>>>
>>>> We could therefore just remove the assert which seems to be too strong (not handling this edge
>>>> case). Another option would be to change the call to try_inline() in try_method_handle_inline() to
>>>> only set holder_known to true if the target is not static. The known_holder is eventually only used
>>>> in LIR_Assembler::emit_profile_call() [4] but only if op->should_profile_receiver_type() holds [5].
>>>> This is only true if the callee is not static [6]. The webrev uses the second approach.
>>>>
>>>> What do you think?
>>>> >>>> Best regards, >>>> Christian >>>> >>>> >>>> [1] >>>> http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_GraphBuilder.cpp#l4386 >>>> [2] >>>> http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_GraphBuilder.cpp#l3571 >>>> [3] >>>> http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_GraphBuilder.cpp#l2039 >>>> [4] >>>> http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp#l3589 >>>> >>>> [5] >>>> http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp#l3584 >>>> >>>> [6] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_LIR.hpp#l1916 From tobias.hartmann at oracle.com Tue Jun 2 06:31:36 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 2 Jun 2020 08:31:36 +0200 Subject: [15] RFR(XS): 8246153: TestEliminateArrayCopy fails with -XX:+StressReflectiveCode In-Reply-To: References: <66126747-f74c-57f2-8960-4cfa603fbf10@oracle.com> Message-ID: <28ab29ca-df08-3640-a044-167c745b375a@oracle.com> Thanks Vladimir! Best regards, Tobias On 29.05.20 18:52, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 5/29/20 5:46 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8246153 >> http://cr.openjdk.java.net/~thartmann/8246153/webrev.00/ >> >> With -XX:+StressReflectiveCode, loads from the layout helper emitted by GraphKit::get_layout_helper >> are not folded (usually done via LoadNode::Value -> LoadNode::load_array_final_field). As a result, >> the control input of the AllocateNode does not directly point to the MemBar but to the >> initial_slow_test emitted by GraphKit::new_instance that has not been folded either. >> >> Instead of using the control input to find the MemBar when removing allocations after scalar >> replacement, we should simply use the memory input. 
>> >> Thanks, >> Tobias >> From tobias.hartmann at oracle.com Tue Jun 2 06:33:41 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 2 Jun 2020 08:33:41 +0200 Subject: RFR(M): 8244660: Code cache sweeper heuristics is broken In-Reply-To: References: <0688678b-986b-082c-425e-543c3c32b094@oracle.com> <1e06ca0e-803a-416f-2313-0f9e53aa94ba@oracle.com> <577ab253-b878-92b0-b170-14bac54173a4@oracle.com> Message-ID: <2302072f-d4f6-8c23-1eca-30392de3da64@oracle.com> Hi Nils, On 29.05.20 22:12, Nils Eliasson wrote: > New webrev: http://cr.openjdk.java.net/~neliasso/8244660/webrev.04 Looks good. Ship it! Best regards, Tobias From nils.eliasson at oracle.com Tue Jun 2 06:54:30 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 2 Jun 2020 08:54:30 +0200 Subject: [15] RFR(XS): 8246153: TestEliminateArrayCopy fails with -XX:+StressReflectiveCode In-Reply-To: References: <66126747-f74c-57f2-8960-4cfa603fbf10@oracle.com> Message-ID: <32932f75-65a6-20b1-763d-a84c8c39c969@oracle.com> +1 Best regards, Nils On 2020-05-29 18:52, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 5/29/20 5:46 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8246153 >> http://cr.openjdk.java.net/~thartmann/8246153/webrev.00/ >> >> With -XX:+StressReflectiveCode, loads from the layout helper emitted >> by GraphKit::get_layout_helper >> are not folded (usually done via LoadNode::Value -> >> LoadNode::load_array_final_field). As a result, >> the control input of the AllocateNode does not directly point to the >> MemBar but to the >> initial_slow_test emitted by GraphKit::new_instance that has not been >> folded either. >> >> Instead of using the control input to find the MemBar when removing >> allocations after scalar >> replacement, we should simply use the memory input. 
>> >> Thanks, >> Tobias >> From tobias.hartmann at oracle.com Tue Jun 2 06:55:46 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 2 Jun 2020 08:55:46 +0200 Subject: [15] RFR(XS): 8246153: TestEliminateArrayCopy fails with -XX:+StressReflectiveCode In-Reply-To: <32932f75-65a6-20b1-763d-a84c8c39c969@oracle.com> References: <66126747-f74c-57f2-8960-4cfa603fbf10@oracle.com> <32932f75-65a6-20b1-763d-a84c8c39c969@oracle.com> Message-ID: Thanks Nils! Best regards, Tobias On 02.06.20 08:54, Nils Eliasson wrote: > +1 > > Best regards, > Nils > > On 2020-05-29 18:52, Vladimir Kozlov wrote: >> Looks good. >> >> Thanks, >> Vladimir >> >> On 5/29/20 5:46 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8246153 >>> http://cr.openjdk.java.net/~thartmann/8246153/webrev.00/ >>> >>> With -XX:+StressReflectiveCode, loads from the layout helper emitted by GraphKit::get_layout_helper >>> are not folded (usually done via LoadNode::Value -> LoadNode::load_array_final_field). As a result, >>> the control input of the AllocateNode does not directly point to the MemBar but to the >>> initial_slow_test emitted by GraphKit::new_instance that has not been folded either. >>> >>> Instead of using the control input to find the MemBar when removing allocations after scalar >>> replacement, we should simply use the memory input. 
>>> >>> Thanks, >>> Tobias >>> > From christian.hagedorn at oracle.com Tue Jun 2 07:07:33 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 2 Jun 2020 09:07:33 +0200 Subject: [15] RFR(XS): 8239083: C1 assert(known_holder == NULL || (known_holder->is_instance_klass() && (!known_holder->is_interface() || ((ciInstanceKlass*)known_holder)->has_nonstatic_concrete_methods())), "should be non-static concrete method"); In-Reply-To: <3eb25862-38c3-56a7-ed0b-aecfabb282bb@oracle.com> References: <1dd061e7-f872-877e-b574-08e578f006ba@oracle.com> <7bf13c9a-a1ce-a6a1-979c-50a1ea7a80bf@oracle.com> <19b9e678-4a8d-7d70-747e-24870b9bb6d0@oracle.com> <7700eb3f-9d6c-80a6-4266-31c7b84b1263@oracle.com> <3eb25862-38c3-56a7-ed0b-aecfabb282bb@oracle.com> Message-ID: <3c2e3761-d6da-863c-c1cc-45e731bc7be2@oracle.com> Thank you Vladimir and Tobias for your reviews! Best regards, Christian On 02.06.20 08:31, Tobias Hartmann wrote: > Oops, pressed reply on the wrong email :D > > Best regards, > Tobias > > > On 02.06.20 08:30, Tobias Hartmann wrote: >> Thanks Vladimir! >> >> Best regards, >> Tobias >> >> On 02.06.20 04:09, Vladimir Kozlov wrote: >>> +1 >>> >>> Thanks, >>> Vladimir >>> >>> On 5/26/20 6:56 AM, Tobias Hartmann wrote: >>>> Hi Christian, >>>> >>>> looks reasonable to me. >>>> >>>> Best regards, >>>> Tobias >>>> >>>> On 15.05.20 11:11, Christian Hagedorn wrote: >>>>> Hi >>>>> >>>>> Please review the following patch: >>>>> https://bugs.openjdk.java.net/browse/JDK-8239083 >>>>> http://cr.openjdk.java.net/~chagedorn/8239083/webrev.00/ >>>>> >>>>> The assert fails in the test case when invoking the only static interface method with a method >>>>> handle. In this case, known_holder is non-NULL. However, known_holder would be set to NULL at [1] >>>>> since the call returns NULL when known_holder is an interface. 
>>>>> >>>>> In the failing test case, known_holder is non-NULL since GraphBuilder::try_method_handle_inline() >>>>> calls GraphBuilder::try_inline() with holder_known set to true which eventually lets profile_call() >>>>> to be called with a non-NULL known_holder argument. >>>>> >>>>> On the other hand, when calling a static method without a method handle, known_holder seems to be >>>>> always NULL: >>>>> profile_call() is called directly at [2] with NULL or indirectly via try_inline() [3]. In the latter >>>>> case, cha_monomorphic_target and exact_target are always NULL for static methods and therefore >>>>> known_holder will also be always NULL in profile_call(). >>>>> >>>>> We could therefore just remove the assert which seems to be too strong (not handling this edge >>>>> case). Another option would be to change the call to try_inline() in try_method_handle_inline() to >>>>> only set holder_known to true if the target is not static. The known_holder is eventually only used >>>>> in LIR_Assembler::emit_profile_call() [4] but only if op->should_profile_receiver_type() holds [5]. >>>>> This is only true if the callee is not static [6]. The webrev uses the second approach. >>>>> >>>>> What do you think? 
>>>>> >>>>> Best regards, >>>>> Christian >>>>> >>>>> >>>>> [1] >>>>> http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_GraphBuilder.cpp#l4386 >>>>> [2] >>>>> http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_GraphBuilder.cpp#l3571 >>>>> [3] >>>>> http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_GraphBuilder.cpp#l2039 >>>>> [4] >>>>> http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp#l3589 >>>>> >>>>> [5] >>>>> http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp#l3584 >>>>> >>>>> [6] http://hg.openjdk.java.net/jdk/jdk/file/dd0caf00b05c/src/hotspot/share/c1/c1_LIR.hpp#l1916 From nils.eliasson at oracle.com Tue Jun 2 07:51:54 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 2 Jun 2020 09:51:54 +0200 Subject: RFR(S): 8245021: Add method 'remove_if_existing' to growableArray. In-Reply-To: <243790ff-6640-8f48-b345-b195efc46ede@oracle.com> References: <054bdcb1-9543-eefc-b814-60ad5ab641d3@oracle.com> <243790ff-6640-8f48-b345-b195efc46ede@oracle.com> Message-ID: <9c722439-2b3f-a94f-baa6-2ac9aef825c4@oracle.com> +1 Best regards, Nils Eliasson On 2020-05-19 11:33, Tobias Hartmann wrote: > Hi Patric, > > Looks good to me but please add brackets around the for loop. > > Also, there are some more cases of this code pattern. For example, > JvmtiPendingMonitors::destroy/exit and > ShenandoahBarrierSetC2State::remove_enqueue_barrier/remove_load_reference_barrier. > > Best regards, > Tobias > > On 18.05.20 22:37, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8245021 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8245021/ >> >> >> 8245021: Add method 'remove_if_existing' to growableArray.
>> >> Minor improvement to simplify the code pattern "if contains then remove" found in a few places (in >> "compile.hpp"). >> >> >> Testing: hs-tier1-3 >> >> >> Best regards, >> Patric >> From nils.eliasson at oracle.com Tue Jun 2 08:19:28 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 2 Jun 2020 10:19:28 +0200 Subject: RFR: 8245452: Clean up compressed pointer logic in lcm.cpp In-Reply-To: <68da9289-8b07-e6e0-a642-02cd58d0bd73@oracle.com> References: <9e93fcd4-a6ba-0705-05b4-581fa9d39482@oracle.com> <68da9289-8b07-e6e0-a642-02cd58d0bd73@oracle.com> Message-ID: <9b4c4505-98df-98aa-237a-f577a320f496@oracle.com> +1 Best regards, Nils On 2020-06-02 03:40, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 5/27/20 6:14 AM, Erik Österlund wrote: >> Hi, >> >> After my change enabling compressed class pointers when compressed >> oops is disabled, >> Vladimir Kozlov pointed out that there is potential for simplifying >> some code in lcm.cpp >> that uses various checks if there is any form of compressed class/oop >> pointers with shift 0, >> as a way of using either the base->get_ptr_type() or >> base->bottom_type()->is_ptr() of >> a base pointer. These tests have always had false positives, where >> the base->get_ptr_type() >> is used when there is no way it could be a compressed pointer with >> shift 0. >> >> This dance is not really necessary if we just use the >> base->get_ptr_type() always, instead of >> carefully figuring out when we can use the bottom type. Because it >> works in both cases.
>> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8245452 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8245452/webrev.00/ >> >> Thanks, >> /Erik From aph at redhat.com Tue Jun 2 09:44:17 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 2 Jun 2020 10:44:17 +0100 Subject: [aarch64-port-dev ] RFR:8246051:[AArch64]SIGBUS by unaligned Unsafe compare_and_swap In-Reply-To: <6a308572-98db-4603-81c7-159833ebc15e.zhuoren.wz@alibaba-inc.com> References: <497b376c-561c-c40c-add6-a63af8736a3c@redhat.com> <6a308572-98db-4603-81c7-159833ebc15e.zhuoren.wz@alibaba-inc.com> Message-ID: <88adc311-4d0e-4037-f2fb-f581e1a9d0b8@redhat.com> On 02/06/2020 09:29, Wang Zhuo(Zhuoren) wrote: > Updated the test. Not AArch64-only now. > http://cr.openjdk.java.net/~wzhuo/8246051/webrev.02/ OK, looks good. It should work on PPC and similar as well. > BTW, the behavior of unaligned Unsafe swap on aarch64(throw > InternalError) is different from that on X86(do the swap). Not sure > whether the difference makes sense. We need to make sure that the VM doesn't error out and exit; there's not much more we can do. It's all rather horrible. "Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel® Atom™, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided." https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.pdf Aiee! Run away! :-) -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Tue Jun 2 10:40:14 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 2 Jun 2020 12:40:14 +0200 Subject: RFR(XS): 8245714: "Bad graph detected in build_loop_late" when loads are pinned on loop limit check uncommon branch In-Reply-To: <87ftbjyz7y.fsf@redhat.com> References: <87tv041ira.fsf@redhat.com> <87ftbjyz7y.fsf@redhat.com> Message-ID: <9b6aeec6-1274-55a8-4310-1f1756107823@oracle.com> Hi Roland, Looks good to me. I'll run some testing and report back. Maybe add a @requires vm.flavor == "server" to the test because the ArrayCopyLoadStoreMaxElem flag is only available with C2 (no new webrev required). Best regards, Tobias On 29.05.20 09:30, Roland Westrelin wrote: > >> https://bugs.openjdk.java.net/browse/JDK-8245714 >> http://cr.openjdk.java.net/~roland/8245714/webrev.00/ >> >> This triggers when data nodes are pinned on the uncommon trap path of a >> predicate. When a new predicate is added, a region is created to merge >> the paths coming from the place holder and the new predicate. Data >> nodes pinned on the uncommon path for the place holder are then updated >> to be pinned on the new region. That logic updates the control edge but >> not the control that loop opts keep track of. This causes a crash with >> the test case of the webrev where the predicate is a loop limit check. > > That fix is incomplete. If the Load that's pinned on the uncommon trap > path is a LoadN then there's a DecodeN between the uncommon trap and the > Load. The control of the DecodeN also needs to be updated. Here is an > updated fix: > > http://cr.openjdk.java.net/~roland/8245714/webrev.01/ > > This one uses lazy_replace. I'm concerned that other nodes (maybe an > AddP) would be assigned the projection as control and need to be > moved. With lazy_replace, all nodes are guaranteed to be properly > updated. > > Roland.
> From tobias.hartmann at oracle.com Tue Jun 2 10:47:44 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 2 Jun 2020 12:47:44 +0200 Subject: RFR(XS): 8245714: "Bad graph detected in build_loop_late" when loads are pinned on loop limit check uncommon branch In-Reply-To: <9b6aeec6-1274-55a8-4310-1f1756107823@oracle.com> References: <87tv041ira.fsf@redhat.com> <87ftbjyz7y.fsf@redhat.com> <9b6aeec6-1274-55a8-4310-1f1756107823@oracle.com> Message-ID: On 02.06.20 12:40, Tobias Hartmann wrote: > Maybe add a @requires vm.flavor == "server" to the test because the ArrayCopyLoadStoreMaxElem flag > is only available with C2 (no new webrev required). Or "requires vm.compiler2.enabled" Best regards, Tobias From tobias.hartmann at oracle.com Tue Jun 2 10:51:54 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 2 Jun 2020 12:51:54 +0200 Subject: RFR(S): 8244086: Following 8241492, strip mined loop may run extra iterations In-Reply-To: References: <87wo5y8z2v.fsf@redhat.com> <878sid8jzn.fsf@redhat.com> <87zhat6voh.fsf@redhat.com> <87wo5s6tvs.fsf@redhat.com> <87eery7sr7.fsf@redhat.com> <87imggyvzg.fsf@redhat.com> Message-ID: <689d2f08-636e-2d2d-8516-8387c346accc@oracle.com> Hi Roland, >> http://cr.openjdk.java.net/~roland/8244086/webrev.01/ Looks good to me. Style comments: - line 1687: excess whitespace after "( ". Also, I would not linebreak inside the "stride < 0" expression. - line 1688: "the the" - line 1690: remove linebreak at end Will run some testing and report back. Best regards, Tobias From tobias.hartmann at oracle.com Tue Jun 2 11:00:16 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 2 Jun 2020 13:00:16 +0200 Subject: RFR: 8245158: C2: Enable SLP for some manually unrolled loops In-Reply-To: References: <87ftbm26e8.fsf@redhat.com> <47d371c9-d647-4e97-19f5-330831181ceb@oracle.com> Message-ID: <408a3b7a-50b1-ebcf-e2fb-fe88e72089ac@oracle.com> Hi Pengfei, there was a problem with the test infra. 
I've submitted some testing for you. Will report back once it finished. Best regards, Tobias On 28.05.20 12:03, Pengfei Li wrote: > BTW: I've pushed twice to the submit repo > http://hg.openjdk.java.net/jdk/submit/rev/6ff334698002 (branch JDK-8245158) > http://hg.openjdk.java.net/jdk/submit/rev/b88caaa3f01d (branch JDK-8245158-1) > but got no report email from Mach5. > > -- > Thanks, > Pengfei > >> Thanks Roland and Tobias for looking at this. >> >> Hi Tobias, >> >> I've pushed this patch to the JDK submit repo but don't get the test report >> email. Could you or other Oracle engineer help have a check? >> >> -- >> Thanks, >> Pengfei >> >>> -----Original Message----- >>> From: Tobias Hartmann >>> Sent: Tuesday, May 26, 2020 21:29 >>> To: Roland Westrelin ; Pengfei Li >>> ; hotspot-compiler-dev at openjdk.java.net >>> Cc: Vladimir Kozlov ; nd >>> Subject: Re: RFR: 8245158: C2: Enable SLP for some manually unrolled >>> loops >>> >>> +1 >>> >>> Best regards, >>> Tobias >>> >>> On 26.05.20 15:05, Roland Westrelin wrote: >>>> >>>>> Webrev: http://cr.openjdk.java.net/~pli/rfr/8245158/webrev.00/ >>>> >>>> That looks reasonable to me. >>>> >>>> Roland. >>>> From tobias.hartmann at oracle.com Tue Jun 2 11:01:05 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 2 Jun 2020 13:01:05 +0200 Subject: RFR: 8245158: C2: Enable SLP for some manually unrolled loops In-Reply-To: <408a3b7a-50b1-ebcf-e2fb-fe88e72089ac@oracle.com> References: <87ftbm26e8.fsf@redhat.com> <47d371c9-d647-4e97-19f5-330831181ceb@oracle.com> <408a3b7a-50b1-ebcf-e2fb-fe88e72089ac@oracle.com> Message-ID: <4ec1e9af-10a3-94ac-a3a7-5c30a564be59@oracle.com> Okay, just noticed that you've already pushed it. Best regards, Tobias On 02.06.20 13:00, Tobias Hartmann wrote: > Hi Pengfei, > > there was a problem with the test infra. I've submitted some testing for you. Will report back once > it finished. 
> > Best regards, > Tobias > > On 28.05.20 12:03, Pengfei Li wrote: >> BTW: I've pushed twice to the submit repo >> http://hg.openjdk.java.net/jdk/submit/rev/6ff334698002 (branch JDK-8245158) >> http://hg.openjdk.java.net/jdk/submit/rev/b88caaa3f01d (branch JDK-8245158-1) >> but got no report email from Mach5. >> >> -- >> Thanks, >> Pengfei >> >>> Thanks Roland and Tobias for looking at this. >>> >>> Hi Tobias, >>> >>> I've pushed this patch to the JDK submit repo but don't get the test report >>> email. Could you or other Oracle engineer help have a check? >>> >>> -- >>> Thanks, >>> Pengfei >>> >>>> -----Original Message----- >>>> From: Tobias Hartmann >>>> Sent: Tuesday, May 26, 2020 21:29 >>>> To: Roland Westrelin ; Pengfei Li >>>> ; hotspot-compiler-dev at openjdk.java.net >>>> Cc: Vladimir Kozlov ; nd >>>> Subject: Re: RFR: 8245158: C2: Enable SLP for some manually unrolled >>>> loops >>>> >>>> +1 >>>> >>>> Best regards, >>>> Tobias >>>> >>>> On 26.05.20 15:05, Roland Westrelin wrote: >>>>> >>>>>> Webrev: http://cr.openjdk.java.net/~pli/rfr/8245158/webrev.00/ >>>>> >>>>> That looks reasonable to me. >>>>> >>>>> Roland. >>>>> From tobias.hartmann at oracle.com Tue Jun 2 12:50:43 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 2 Jun 2020 14:50:43 +0200 Subject: RFR(XS): 8245714: "Bad graph detected in build_loop_late" when loads are pinned on loop limit check uncommon branch In-Reply-To: <9b6aeec6-1274-55a8-4310-1f1756107823@oracle.com> References: <87tv041ira.fsf@redhat.com> <87ftbjyz7y.fsf@redhat.com> <9b6aeec6-1274-55a8-4310-1f1756107823@oracle.com> Message-ID: <363917e3-72a7-8e2a-8c27-a934f21aa45a@oracle.com> On 02.06.20 12:40, Tobias Hartmann wrote: > I'll run some testing and report back. All green. 
Best regards, Tobias From zhuoren.wz at alibaba-inc.com Tue Jun 2 08:29:53 2020 From: zhuoren.wz at alibaba-inc.com (Wang Zhuo(Zhuoren)) Date: Tue, 02 Jun 2020 16:29:53 +0800 Subject: Re: [aarch64-port-dev ] RFR:8246051:[AArch64]SIGBUS by unaligned Unsafe compare_and_swap In-Reply-To: <497b376c-561c-c40c-add6-a63af8736a3c@redhat.com> References: , <497b376c-561c-c40c-add6-a63af8736a3c@redhat.com> Message-ID: <6a308572-98db-4603-81c7-159833ebc15e.zhuoren.wz@alibaba-inc.com> Updated the test. Not AArch64-only now. http://cr.openjdk.java.net/~wzhuo/8246051/webrev.02/ BTW, the behavior of unaligned Unsafe swap on aarch64(throw InternalError) is different from that on X86(do the swap). Not sure whether the difference makes sense. Regards, Zhuoren ------------------------------------------------------------------ From:Andrew Haley Sent At:2020 May 31 (Sun.) 04:15 To:Sandler ; Nick Gasson Cc:hotspot-compiler-dev\@openjdk.java.net ; aarch64-port-dev Subject:Re: [aarch64-port-dev ] RFR:8246051:[AArch64]SIGBUS by unaligned Unsafe compare_and_swap On 29/05/2020 13:36, Wang Zhuo(Zhuoren) wrote: > Update patch. A jtreg test added > http://cr.openjdk.java.net/~wzhuo/8246051/webrev.01/ The test is AArch64-only but the patch is to shared code. This doesn't make sense to me. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Tue Jun 2 13:57:44 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 2 Jun 2020 15:57:44 +0200 Subject: RFR(S): 8244086: Following 8241492, strip mined loop may run extra iterations In-Reply-To: <689d2f08-636e-2d2d-8516-8387c346accc@oracle.com> References: <87wo5y8z2v.fsf@redhat.com> <878sid8jzn.fsf@redhat.com> <87zhat6voh.fsf@redhat.com> <87wo5s6tvs.fsf@redhat.com> <87eery7sr7.fsf@redhat.com> <87imggyvzg.fsf@redhat.com> <689d2f08-636e-2d2d-8516-8387c346accc@oracle.com> Message-ID: On 02.06.20 12:51, Tobias Hartmann wrote: > Will run some testing and report back. All green. Best regards, Tobias From rwestrel at redhat.com Tue Jun 2 14:37:29 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 02 Jun 2020 16:37:29 +0200 Subject: RFR(S): 8244086: Following 8241492, strip mined loop may run extra iterations In-Reply-To: <689d2f08-636e-2d2d-8516-8387c346accc@oracle.com> References: <87wo5y8z2v.fsf@redhat.com> <878sid8jzn.fsf@redhat.com> <87zhat6voh.fsf@redhat.com> <87wo5s6tvs.fsf@redhat.com> <87eery7sr7.fsf@redhat.com> <87imggyvzg.fsf@redhat.com> <689d2f08-636e-2d2d-8516-8387c346accc@oracle.com> Message-ID: <87a71lzg7q.fsf@redhat.com> > Looks good to me. Thanks for the review and testing. I will fix the issues you spotted below and push it. > Style comments: > - line 1687: excess whitespace after "( ". Also, I would not linebreak inside the "stride < 0" > expression. > - line 1688: "the the" > - line 1690: remove linebreak at end Roland. 
From rwestrel at redhat.com Tue Jun 2 14:37:50 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 02 Jun 2020 16:37:50 +0200 Subject: RFR(S): 8244086: Following 8241492, strip mined loop may run extra iterations In-Reply-To: References: <87wo5y8z2v.fsf@redhat.com> <878sid8jzn.fsf@redhat.com> <87zhat6voh.fsf@redhat.com> <87wo5s6tvs.fsf@redhat.com> <87eery7sr7.fsf@redhat.com> <87imggyvzg.fsf@redhat.com> Message-ID: <877dwpzg75.fsf@redhat.com> Hi Martin, > thanks for improving it and for adding the comment. I'm fine with it. Thanks for the review. Roland. From tobias.hartmann at oracle.com Tue Jun 2 14:51:21 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 2 Jun 2020 16:51:21 +0200 Subject: RFR(XS): 8245714: "Bad graph detected in build_loop_late" when loads are pinned on loop limit check uncommon branch In-Reply-To: <363917e3-72a7-8e2a-8c27-a934f21aa45a@oracle.com> References: <87tv041ira.fsf@redhat.com> <87ftbjyz7y.fsf@redhat.com> <9b6aeec6-1274-55a8-4310-1f1756107823@oracle.com> <363917e3-72a7-8e2a-8c27-a934f21aa45a@oracle.com> Message-ID: And still looks trivial to me. Best regards, Tobias On 02.06.20 14:50, Tobias Hartmann wrote: > > On 02.06.20 12:40, Tobias Hartmann wrote: >> I'll run some testing and report back. > > All green. > > Best regards, > Tobias > From rwestrel at redhat.com Tue Jun 2 15:07:18 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 02 Jun 2020 17:07:18 +0200 Subject: RFR(XS): 8245714: "Bad graph detected in build_loop_late" when loads are pinned on loop limit check uncommon branch In-Reply-To: References: <87tv041ira.fsf@redhat.com> <87ftbjyz7y.fsf@redhat.com> <9b6aeec6-1274-55a8-4310-1f1756107823@oracle.com> <363917e3-72a7-8e2a-8c27-a934f21aa45a@oracle.com> Message-ID: <874krtzeu1.fsf@redhat.com> Thanks for the review (and testing) of the reworked fix. Roland. 
From vladimir.kozlov at oracle.com Tue Jun 2 15:54:16 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 2 Jun 2020 08:54:16 -0700 Subject: RFR(XS): 8245714: "Bad graph detected in build_loop_late" when loads are pinned on loop limit check uncommon branch In-Reply-To: <9b6aeec6-1274-55a8-4310-1f1756107823@oracle.com> References: <87tv041ira.fsf@redhat.com> <87ftbjyz7y.fsf@redhat.com> <9b6aeec6-1274-55a8-4310-1f1756107823@oracle.com> Message-ID: Looks good to me too. I would suggest using an explicit check if you want to test when C2 is enabled: @requires vm.compiler2.enabled Thanks, Vladimir On 6/2/20 3:40 AM, Tobias Hartmann wrote: > Hi Roland, > > Looks good to me. I'll run some testing and report back. > > Maybe add a @requires vm.flavor == "server" to the test because the ArrayCopyLoadStoreMaxElem flag > is only available with C2 (no new webrev required). > > Best regards, > Tobias > > On 29.05.20 09:30, Roland Westrelin wrote: >> >>> https://bugs.openjdk.java.net/browse/JDK-8245714 >>> http://cr.openjdk.java.net/~roland/8245714/webrev.00/ >>> >>> This triggers when data nodes are pinned on the uncommon trap path of a >>> predicate. When a new predicate is added, a region is created to merge >>> the paths coming from the place holder and the new predicate. Data >>> nodes pinned on the uncommon path for the place holder are then updated >>> to be pinned on the new region. That logic updates the control edge but >>> not the control that loop opts keep track of. This causes a crash with >>> the test case of the webrev where the predicate is a loop limit check. >> >> That fix is incomplete. If the Load that's pinned on the uncommon trap >> path is a LoadN then there's a DecodeN between the uncommon trap and the >> Load. The control of the DecodeN also needs to be updated. Here is an >> updated fix: >> >> http://cr.openjdk.java.net/~roland/8245714/webrev.01/ >> >> This one uses lazy_replace.
I'm concerned that other nodes (maybe an >> AddP) would be assigned the projection as control and need to be >> moved. With lazy_replace, all nodes are guaranteed to be properly >> updated. >> >> Roland. >> From rwestrel at redhat.com Tue Jun 2 15:58:17 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 02 Jun 2020 17:58:17 +0200 Subject: RFR(XS): 8245714: "Bad graph detected in build_loop_late" when loads are pinned on loop limit check uncommon branch In-Reply-To: References: <87tv041ira.fsf@redhat.com> <87ftbjyz7y.fsf@redhat.com> <9b6aeec6-1274-55a8-4310-1f1756107823@oracle.com> Message-ID: <871rmxzch2.fsf@redhat.com> Hi Vladimir, > Looks good to me too. Thanks for the review but I pushed this already as Tobias said it was trivial. > I would suggest to use explicit check if you want to test when C2 is enabled: > > @requires vm.compiler2.enabled Tobias made that suggestion too and I included it. Roland. From vladimir.kozlov at oracle.com Tue Jun 2 16:01:39 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 2 Jun 2020 09:01:39 -0700 Subject: RFR(XS): 8245714: "Bad graph detected in build_loop_late" when loads are pinned on loop limit check uncommon branch In-Reply-To: <871rmxzch2.fsf@redhat.com> References: <87tv041ira.fsf@redhat.com> <87ftbjyz7y.fsf@redhat.com> <9b6aeec6-1274-55a8-4310-1f1756107823@oracle.com> <871rmxzch2.fsf@redhat.com> Message-ID: On 6/2/20 8:58 AM, Roland Westrelin wrote: > > Hi Vladimir, > >> Looks good to me too. > > Thanks for the review but I pushed this already as Tobias said it was > trivial. No problem. > >> I would suggest to use explicit check if you want to test when C2 is enabled: >> >> @requires vm.compiler2.enabled > > Tobias made that suggestion too and I included it. Good. Thanks, Vladimir > > Roland. 
> From yudi.zheng at oracle.com Tue Jun 2 16:38:46 2020 From: yudi.zheng at oracle.com (Yudi Zheng) Date: Tue, 2 Jun 2020 18:38:46 +0200 Subject: RFR: 8246347: [JVMCI] Set is_method_handle_invoke flag accordingly when describing scope in jvmciCodeInstaller Message-ID: Hello, Please review this patch that sets is_method_handle_invoke flag accordingly when describing scope at call site in jvmciCodeInstaller. http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8246347 Many thanks, Yudi From serguei.spitsyn at oracle.com Tue Jun 2 16:54:42 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 2 Jun 2020 09:54:42 -0700 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: Hi Richard, This looks good to me. Thanks, Serguei On 5/28/20 09:02, Vladimir Kozlov wrote: > Vladimir Ivanov is on break currently. > It looks good to me. > > Thanks, > Vladimir K > > On 5/26/20 7:31 AM, Reingruber, Richard wrote: >> Hi Vladimir, >> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> >>> Not an expert in JVMTI code base, so can't comment on the actual >>> changes. >> >>> From JIT-compilers perspective it looks good. >> >> I put out webrev.1 a while ago [1]: >> >> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >> Webrev(delta): >> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >> >> You originally suggested to use a handshake to switch a thread into >> interpreter mode [2]. I'm using >> a direct handshake now, because I think it is the best fit. >> >> May I ask if webrev.1 still looks good to you from JIT-compilers >> perspective? >> >> Can I list you as (partial) Reviewer?
>> >> Thanks, Richard. >> >> [1] >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >> [2] >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >> >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Friday, 7 February 2020 09:19 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(S) 8238585: Use handshake for >> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >> compiled methods on stack not_entrant >> >> >>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> >> Not an expert in JVMTI code base, so can't comment on the actual >> changes. >> >> From JIT-compilers perspective it looks good. >> >> Best regards, >> Vladimir Ivanov >> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>> >>> The change avoids making all compiled methods on stack not_entrant >>> when switching a java thread to >>> interpreter only execution for jvmti purposes. It is sufficient to >>> deoptimize the compiled frames on stack. >>> >>> Additionally a handshake is used instead of a vm operation to walk >>> the stack and do the deoptimizations. >>> >>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>> release builds on all platforms. >>> >>> Thanks, Richard.
>>> >>> See also my question if anyone knows a reason for making the >>> compiled methods not_entrant: >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>> >>> From john.r.rose at oracle.com Tue Jun 2 17:28:40 2020 From: john.r.rose at oracle.com (John Rose) Date: Tue, 2 Jun 2020 10:28:40 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87d06nyoue.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> Message-ID: <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> On May 29, 2020, at 4:15 AM, Roland Westrelin wrote: > >> >> I suggest hardwiring it to 10 locally in loopnode.cpp, and >> making it a tunable parameter later on if we actually >> run into trouble with it. But we won't; nobody is going >> to write loops with strides on the order of max_jint. >> In fact, you can leave out this suggestion altogether, >> if you are not comfortable with it, and we just take the >> odd performance hit if someone does something that >> strange. > > I thought about this a bit when I prepared the change and I left the > code as is so as many loop transformations as possible are performed to > shake out bugs thinking it could be revised later. FTR, I'm fine with this. I have no more review comments. -- 
John
From richard.reingruber at sap.com Tue Jun 2 17:57:26 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 2 Jun 2020 17:57:26 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: Hi Serguei, > This looks good to me. Thanks! From an earlier mail: > I'm thinking it would be more safe to run full tier5. I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would like to push. Thanks, Richard.
From serguei.spitsyn at oracle.com Tue Jun 2 18:01:42 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 2 Jun 2020 11:01:42 -0700 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: <4c92e183-6a3c-f8da-4330-c297ad2afef6@oracle.com> Hi Richard, On 6/2/20 10:57, Reingruber, Richard wrote: > Hi Serguei, > >> This looks good to me. > Thanks! > > From an earlier mail: > >> I'm thinking it would be more safe to run full tier5. > I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would > like to push. Okay, I'll submit a mach5 job with your fix and let you know about the results. Thanks, Serguei
From richard.reingruber at sap.com Tue Jun 2 19:14:08 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 2 Jun 2020 19:14:08 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: <4c92e183-6a3c-f8da-4330-c297ad2afef6@oracle.com> References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> <4c92e183-6a3c-f8da-4330-c297ad2afef6@oracle.com> Message-ID: Excellent. Thanks! Richard.
From xxinliu at amazon.com Tue Jun 2 20:57:20 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 2 Jun 2020 20:57:20 +0000 Subject: RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph Message-ID: Hi, Could you review this webrev? It fixes a minor problem when users only use IGVPrintLevel in Compiler Directives. Jbs: https://bugs.openjdk.java.net/browse/JDK-8139046 Webrev: http://cr.openjdk.java.net/~xliu/8139046/00/webrev/ I move "bool should_print(int level)" from idealGraphPrinter to Compile because the later has the information. In this way, Compile can allocate _printer on demand. 
If Compile::should_print(level) return true, it guarantees that Compile::printer() is not NULL. If users pass in CompileCommand="option,Hello::add,intx,IGVPrintLevel,3", printer() will only turn on for that compiler thread. Ran hotspot:tier1 using fastdebug build. Only gtest/GTestWrapper.java failed. That's another issue. Currently, Openjdk can't execute any gtest because of a linkage error. Error occurred during initialization of VM Unable to load native library: /backup/jdk/build/linux-x86_64-server-fastdebug/images/jdk/lib/libjava.so: symbol JVM_GetPermittedSubclasses version SUNWprivate_1.1 not defined in file libjvm.so with link time reference Thanks, --lx From shravya.rukmannagari at intel.com Tue Jun 2 22:56:48 2020 From: shravya.rukmannagari at intel.com (Rukmannagari, Shravya) Date: Tue, 2 Jun 2020 22:56:48 +0000 Subject: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions In-Reply-To: <88e70fc5-e83c-1ae0-f4e7-ae80ce517a70@oracle.com> References: <88e70fc5-e83c-1ae0-f4e7-ae80ce517a70@oracle.com> Message-ID: Hi Vladimir, Thanks a lot for the review. I have modified the patch as per your comments. The CRC32 code is now in macroAssembler_x86.cpp. http://cr.openjdk.java.net/~srukmannagar/CRC32/webrev.02/ The stubGenerator_x86_64.cpp would be verified only for 64-bit builds. I have verified the 32-bit builds and also ran the test cases to ensure no issues or failures. Please let me know if you have questions or comments. Thanks, Shravya. -----Original Message----- From: Vladimir Kozlov Sent: Monday, June 1, 2020 2:36 PM To: Rukmannagari, Shravya ; 'hotspot compiler' Cc: Tucker, Greg B Subject: Re: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions Hi Shravya, Why you put new CRC32 avx512 code into macroAssembler_x86_aes.cpp file? This file is used only for AES intrinsic code - nothing else should be there. 
If you think CRC32 code is too large for macroAssembler_x86.cpp I would suggest to move all CRC32 code, old and new, into new macroAssembler_x86_crc32.cpp file. I see that you want to implement new code only for 64 bit which is fine and you guarded it correctly with #ifdef _LP64. But you forgot guard in stubGenerator_x86_64.cpp which will cause build failure for 32-bit. It is difficult to judge the implementation code. I hope you ran all tests for it. Thanks, Vladimir On 5/20/20 4:01 PM, Rukmannagari, Shravya wrote: > Hi All, > > We would like to contribute optimizations for CRC32 algorithm for upcoming Intel x86_64 platforms. > > > > Contributors: > > Shravya Rukmannagari(shravya.rukmannagari at intel.com) > > Greg B Tucker(greg.b.tucker at intel.com) > > > > I have tested the patch to confirm correctness and performance. The patch also passes compiler/jtreg tests. > > > > Please take a look and let me know if you have any questions or comments. > > > > Bug Id: https://bugs.openjdk.java.net/browse/JDK-8245512 > > https://cr.openjdk.java.net/~srukmannagar/CRC32/webrev.01/ > > > > Regards, > > Shravya Rukmannagari >
From vladimir.kozlov at oracle.com Wed Jun 3 00:00:09 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 2 Jun 2020 17:00:09 -0700 Subject: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions In-Reply-To: References: <88e70fc5-e83c-1ae0-f4e7-ae80ce517a70@oracle.com> Message-ID: On 6/2/20 3:56 PM, Rukmannagari, Shravya wrote: > Hi Vladimir, > Thanks a lot for the review. I have modified the patch as per your comments. The CRC32 code is now in macroAssembler_x86.cpp. > http://cr.openjdk.java.net/~srukmannagar/CRC32/webrev.02/ Why you added UseSSE check in stubGenerator_x86_64.cpp? + if (UseSSE > 3 && VM_Version::supports_avx512_vpclmulqdq() && > > The stubGenerator_x86_64.cpp would be verified only for 64-bit builds. I have verified the 32-bit builds and also ran the test cases to ensure no issues or failures. You are right about this. Thanks, Vladimir
From Pengfei.Li at arm.com Wed Jun 3 01:07:23 2020 From: Pengfei.Li at arm.com (Pengfei Li) Date: Wed, 3 Jun 2020 01:07:23 +0000 Subject: RFR: 8245158: C2: Enable SLP for some manually unrolled loops In-Reply-To: <4ec1e9af-10a3-94ac-a3a7-5c30a564be59@oracle.com> References: <87ftbm26e8.fsf@redhat.com> <47d371c9-d647-4e97-19f5-330831181ceb@oracle.com> <408a3b7a-50b1-ebcf-e2fb-fe88e72089ac@oracle.com> <4ec1e9af-10a3-94ac-a3a7-5c30a564be59@oracle.com> Message-ID: Yes, the test report works fine now. -- Thanks, Pengfei > Okay, just noticed that you've already pushed it. > > Best regards, > Tobias > > On 02.06.20 13:00, Tobias Hartmann wrote: > > Hi Pengfei, > > > > there was a problem with the test infra. I've submitted some testing > > for you. Will report back once it finished. > > > > Best regards, > > Tobias > > > > On 28.05.20 12:03, Pengfei Li wrote: > >> BTW: I've pushed twice to the submit repo > >> http://hg.openjdk.java.net/jdk/submit/rev/6ff334698002 (branch JDK-8245158) > >> http://hg.openjdk.java.net/jdk/submit/rev/b88caaa3f01d (branch > >> JDK-8245158-1) but got no report email from Mach5. > >> > >> -- > >> Thanks, > >> Pengfei > >> > >>> Thanks Roland and Tobias for looking at this. > >>> > >>> Hi Tobias, > >>> > >>> I've pushed this patch to the JDK submit repo but don't get the test > >>> report email. Could you or other Oracle engineer help have a check? 
> >>> > >>> -- > >>> Thanks, > >>> Pengfei > >>> > >>>> -----Original Message----- > >>>> From: Tobias Hartmann > >>>> Sent: Tuesday, May 26, 2020 21:29 > >>>> To: Roland Westrelin ; Pengfei Li > >>>> ; hotspot-compiler-dev at openjdk.java.net > >>>> Cc: Vladimir Kozlov ; nd > >>>> Subject: Re: RFR: 8245158: C2: Enable SLP for some manually > >>>> unrolled loops > >>>> > >>>> +1 > >>>> > >>>> Best regards, > >>>> Tobias > >>>> > >>>> On 26.05.20 15:05, Roland Westrelin wrote: > >>>>> > >>>>>> Webrev: http://cr.openjdk.java.net/~pli/rfr/8245158/webrev.00/ > >>>>> > >>>>> That looks reasonable to me. > >>>>> > >>>>> Roland. > >>>>>
From shravya.rukmannagari at intel.com Wed Jun 3 01:15:06 2020 From: shravya.rukmannagari at intel.com (Rukmannagari, Shravya) Date: Wed, 3 Jun 2020 01:15:06 +0000 Subject: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions In-Reply-To: References: <88e70fc5-e83c-1ae0-f4e7-ae80ce517a70@oracle.com> Message-ID: Hi Vladimir, The compiler/cpuflags/TestSSE4Disabled.java jtreg test was failing without the check. This test is run with SSE=3 as: run main/othervm -Xcomp -XX:UseSSE=3 compiler.cpuflags.TestSSE4Disabled Without the UseSSE > 3 check, the JVM tries to generate the new AVX512 CRC32 stub. Thanks, Shravya.
From igor.ignatyev at oracle.com Wed Jun 3 03:48:18 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 2 Jun 2020 20:48:18 -0700 Subject: RFR(XS): 8244282: Add modules to a jtreg test. In-Reply-To: <26ab81b5-02f2-6e7b-1f71-0faa6e41a42d@oracle.com> References: <26ab81b5-02f2-6e7b-1f71-0faa6e41a42d@oracle.com> Message-ID: <33F3E0C2-3052-430C-A5D6-1E7957591435@oracle.com> Hi Evgeny, LGTM -- Igor > On May 11, 2020, at 8:07 AM, Evgeny Nikitin wrote: > > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8244282 > Webrev: http://cr.openjdk.java.net/~enikitin/8244282/webrev.00/ > > Test fails with '--illegal-access=deny' due to necessary module being not specified. Fixed, tested with jtreg (fails without the change, passes with it). Please review. > > Thanks in advance, > /Evgeny Nikitin.
From igor.ignatyev at oracle.com Wed Jun 3 03:54:37 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 2 Jun 2020 20:54:37 -0700 Subject: RFR(S): 8242923: Trigger interface MethodHandle resolve in test without Nashorn. In-Reply-To: References: Message-ID: Hi Evgeny, looks good to me, a couple editorial nits in CreatesInterfaceDotEqualsCallInfo.java: - at L#39, you have double space b/w throws and Throwable; - I don't feel like line breaks at L#41, L#42 and L#44 make it more readable; - can you use Path.resolve("/tmp/some_file") instead of new File..:toPath? - "/tmp/some_file" might confuse future readers into believe that's important that file is in /tmp or doesn't exist or smth else; so I'd prefer to just use "." 
Thanks, -- Igor > On May 28, 2020, at 12:22 PM, Evgeny Nikitin wrote: > > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8242923 > Webrev: http://cr.openjdk.java.net/~enikitin/8242923/webrev.01/ > > The test used Nashorn to trigger incorrect MethodHandle resolve in the linkResolver.cpp (which in turn caused crash on the MethodHandle invokation). > > Test's functionality have been checked via rolling back the fix made in the https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-October/012155.html, the test fails on 4 common platforms in mach5. > > The version with the bugfix reverted can be found here: http://cr.openjdk.java.net/~enikitin/8242923/webrev.00/ > > The change has been checked in mach5 for the 4 common platforms (passed). > > Please review, > /Evgeny Nikitin.
From vladimir.kozlov at oracle.com Wed Jun 3 05:09:22 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 2 Jun 2020 22:09:22 -0700 Subject: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions In-Reply-To: References: <88e70fc5-e83c-1ae0-f4e7-ae80ce517a70@oracle.com> Message-ID: <7927d1bd-501e-cc3c-06a5-3e8c7e25f38e@oracle.com> How did it fail? UseSSE setting does not affect AVX settings. It seems you are using instructions from sse4.2 but not checking for that. Vladimir
From vladimir.kozlov at oracle.com Wed Jun 3 05:12:09 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 2 Jun 2020 22:12:09 -0700 Subject: RFR: 8246347: [JVMCI] Set is_method_handle_invoke flag accordingly when describing scope in jvmciCodeInstaller In-Reply-To: References: Message-ID: <337693a8-bbd4-f968-b004-c9c042fc1c71@oracle.com> Looks good. Thanks, Vladimir
From tobias.hartmann at oracle.com Wed Jun 3 08:30:27 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 3 Jun 2020 10:30:27 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> Message-ID: <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> Hi Roland, First of all, very nice optimization! >> Tobias might wish to run some regression tests on the final changes. I've submitted some testing. Will report back once it finished. > http://cr.openjdk.java.net/~roland/8223051/webrev.02/ Looks good to me. Some style comments (no new webrev required): c2_globals.hpp: - line 785: "if > 0. convert" -> "If > 0, convert" loopnode.cpp: - line 504: "check_stride_overflow" is already in 8244504 - line 536: add assert message loopnode.hpp: - line 1462: Shouldn't _loop_invokes/_loop_work also be volatile and be incremented atomically? 
Best regards, Tobias From rwestrel at redhat.com Wed Jun 3 09:16:03 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 03 Jun 2020 11:16:03 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> Message-ID: <87y2p4y0fg.fsf@redhat.com> >>> Tobias might wish to run some regression tests on the final changes. > > I've submitted some testing. Will report back once it finished. Thanks. >> http://cr.openjdk.java.net/~roland/8223051/webrev.02/ > > Looks good to me. Thanks for the review. > loopnode.cpp: > - line 504: "check_stride_overflow" is already in 8244504 Right but the one in this webrev is for longs so doesn't have the same signature. > loopnode.hpp: > - line 1462: Shouldn't _loop_invokes/_loop_work also be volatile and be incremented atomically? I think so too. That they are not incremented atomically implies nobody uses them AFAICT. If we want all of them to be handled correctly then someone needs to go over Compile::print_statistics() and all methods it calls. Roland. 
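The point about _loop_invokes/_loop_work can be illustrated with a small standalone sketch. It uses standard C++ std::atomic as a stand-in; HotSpot has its own atomic utilities, which are not shown here. LoopStats, record_invoke and count_from_threads are hypothetical names for illustration only, not code from the webrev.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for a compiler statistics counter such as _loop_invokes.
struct LoopStats {
  std::atomic<long> loop_invokes{0};

  // Atomic read-modify-write: concurrent callers cannot lose an update.
  // Relaxed ordering is enough for a pure statistics counter.
  void record_invoke() {
    loop_invokes.fetch_add(1, std::memory_order_relaxed);
  }
};

// Bump the counter from several threads and return the final value.
long count_from_threads(int num_threads, int increments_per_thread) {
  LoopStats stats;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; t++) {
    workers.emplace_back([&stats, increments_per_thread] {
      for (int i = 0; i < increments_per_thread; i++) {
        stats.record_invoke();
      }
    });
  }
  for (std::thread& w : workers) {
    w.join();
  }
  return stats.loop_invokes.load();
}
```

With a plain (non-atomic) long and `loop_invokes++`, the same loop would be a data race and the total could come out short, which is presumably why nobody relies on these counters today.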
From tobias.hartmann at oracle.com Wed Jun 3 13:21:03 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 3 Jun 2020 15:21:03 +0200 Subject: [15] RFR(S): 8246453: TestClone crashes with "all collected exceptions must come from the same place" Message-ID: <361cf8bf-bc1c-39e5-c8a9-855040bfa807@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8246453 http://cr.openjdk.java.net/~thartmann/8246453/webrev.00/ Similar to old JDK-8033626 [1], we assert when trying to merge two JVMStates with different reexecute settings. This only happens with -XX:+StressReflectiveCode because in that case we don't fold the array guard and emit code to allocate the array to be cloned into. The fix is to simply set deoptimize_on_exception for that allocation as well (we already set it for the non-array case [2]). Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8033626 http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/00c8a1255912 [2] http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/00c8a1255912#l3.7 From zhuoren.wz at alibaba-inc.com Wed Jun 3 09:59:57 2020 From: zhuoren.wz at alibaba-inc.com (=?UTF-8?B?V2FuZyBaaHVvKFpodW9yZW4p?=) Date: Wed, 03 Jun 2020 17:59:57 +0800 Subject: =?UTF-8?B?UmU6IFthYXJjaDY0LXBvcnQtZGV2IF0gUkZSOjgyNDYwNTE6W0FBcmNoNjRdU0lHQlVTIGJ5?= =?UTF-8?B?IHVuYWxpZ25lZCBVbnNhZmUgY29tcGFyZV9hbmRfc3dhcA==?= In-Reply-To: <88adc311-4d0e-4037-f2fb-f581e1a9d0b8@redhat.com> References: <497b376c-561c-c40c-add6-a63af8736a3c@redhat.com> <6a308572-98db-4603-81c7-159833ebc15e.zhuoren.wz@alibaba-inc.com>, <88adc311-4d0e-4037-f2fb-f581e1a9d0b8@redhat.com> Message-ID: <67c97176-bbc1-4e95-a81e-0115d5de011c.zhuoren.wz@alibaba-inc.com> Andrew, thanks for the detailed explanation. Could you help push this patch, or I need to wait for other reviews? Regards, Zhuoren ------------------------------------------------------------------ From:Andrew Haley Sent At:2020 Jun. 2 (Tue.) 
17:44 To:Sandler ; Nick Gasson Cc:hotspot-compiler-dev at openjdk.java.net ; aarch64-port-dev Subject:Re: [aarch64-port-dev ] RFR:8246051:[AArch64]SIGBUS by unaligned Unsafe compare_and_swap On 02/06/2020 09:29, Wang Zhuo(Zhuoren) wrote: > Updated the test. Not AArch64-only now. > http://cr.openjdk.java.net/~wzhuo/8246051/webrev.02/ OK, looks good. It should work on PPC and similar as well. > BTW, the behavior of unaligned Unsafe swap on aarch64(throw > InternalError) are different from on X86(do the swap). Not sure > whether the difference makes sence. We need to make sure that the VM doesn't error out and exit; there's not much more we can do. It's all rather horrible. "Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel (r) AtomTM, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided." https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.pdf Aiee! Run away! :-) -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From nils.eliasson at oracle.com Wed Jun 3 13:37:40 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 3 Jun 2020 15:37:40 +0200 Subject: [15] RFR(S): 8246453: TestClone crashes with "all collected exceptions must come from the same place" In-Reply-To: <361cf8bf-bc1c-39e5-c8a9-855040bfa807@oracle.com> References: <361cf8bf-bc1c-39e5-c8a9-855040bfa807@oracle.com> Message-ID: <0209c2ed-0fdd-5501-0482-759db24f7dca@oracle.com> Hi Tobias, Looks good! Best regards, Nils On 2020-06-03 15:21, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8246453 > http://cr.openjdk.java.net/~thartmann/8246453/webrev.00/ > > Similar to old JDK-8033626 [1], we assert when trying to merge two JVMStates with different > reexecute settings. This only happens with -XX:+StressReflectiveCode because in that case we don't > fold the array guard and emit code to allocate the array to be cloned into. The fix is to simply set > deoptimize_on_exception for that allocation as well (we already set it for the non-array case [2]). > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8033626 > http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/00c8a1255912 > [2] http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/00c8a1255912#l3.7 From tobias.hartmann at oracle.com Wed Jun 3 13:39:29 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 3 Jun 2020 15:39:29 +0200 Subject: [15] RFR(S): 8246453: TestClone crashes with "all collected exceptions must come from the same place" In-Reply-To: <0209c2ed-0fdd-5501-0482-759db24f7dca@oracle.com> References: <361cf8bf-bc1c-39e5-c8a9-855040bfa807@oracle.com> <0209c2ed-0fdd-5501-0482-759db24f7dca@oracle.com> Message-ID: <48d43f24-73d9-ec99-746d-7685f7bf09b9@oracle.com> Hi Nils, thanks for the review! 
Best regards, Tobias On 03.06.20 15:37, Nils Eliasson wrote: > Hi Tobias, > > Looks good! > > Best regards, > Nils > > On 2020-06-03 15:21, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8246453 >> http://cr.openjdk.java.net/~thartmann/8246453/webrev.00/ >> >> Similar to old JDK-8033626 [1], we assert when trying to merge two JVMStates with different >> reexecute settings. This only happens with -XX:+StressReflectiveCode because in that case we don't >> fold the array guard and emit code to allocate the array to be cloned into. The fix is to simply set >> deoptimize_on_exception for that allocation as well (we already set it for the non-array case [2]). >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8033626 >> http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/00c8a1255912 >> [2] http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/00c8a1255912#l3.7 > From nils.eliasson at oracle.com Wed Jun 3 13:54:56 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 3 Jun 2020 15:54:56 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> Message-ID: Hi, I second Tobias. How often is this code path triggered? Would this be removing an optimization that is important is some cases? If this is extremely rare - I would be ok to remove it, otherwise we should evaluate Tobias suggestion. Best regards, Nils On 2020-05-26 16:23, Tobias Hartmann wrote: > Hi Felix, > > thanks for the details, makes sense to me. > > Isn't the root cause that we are loosing type information and wouldn't that be solved by selecting > the Phi with the more restrictive _adr_type? 
> > Best regards, > Tobias > > On 07.05.20 15:42, Yangfei (Felix) wrote: >> Hi Tobias, >> >>> -----Original Message----- >>> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] >>> Sent: Thursday, May 7, 2020 5:10 PM >>> To: Yangfei (Felix) ; hotspot-compiler- >>> dev at openjdk.java.net >>> Subject: Re: RFR(S): 8243670: Unexpected test result caused by C2 >>> MergeMemNode::Ideal >>> >>> Hi Felix, >>> >>> were you able to figure out how we ended up with two Phis with same input >>> but different _adr_type? >> As I remembered, there are two major transformations which leads to this: >> >> 1. During Iter GVN1, a new phi is created with narrowed memory type through PhiNode::slice_memory. >> The new phi and the old phi have different _adr_type and different input. >> >> 2. Then C2 peel the first iteration of the given loop through PhaseIdealLoop::do_peeling. >> After that, the new phi and the old phi have same input but different _adr_type. >> >> Hope this helps. >> >> Thanks, >> Felix >> From nils.eliasson at oracle.com Wed Jun 3 14:08:19 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 3 Jun 2020 16:08:19 +0200 Subject: RFR:8243615 Continuous deoptimizations with Reason=unstable_if and Action=none In-Reply-To: References: <272f8207-0b1e-4b34-b1d4-0f562b4da9d1.zhuoren.wz@alibaba-inc.com> <4dc2e0ef-315b-a72b-bb8c-6b5f418765ed@oracle.com> Message-ID: <551936a3-a279-ef7e-af4e-f2571cdeaa45@oracle.com> I suggest adding a JFR event for catching these rare situations. Regards, Nils On 2020-05-11 22:15, John Rose wrote: > On May 11, 2020, at 5:58 AM, Wang Zhuo(Zhuoren) wrote: >> Theoretically speaking other optimizations, with Action_maybe_recompile or Action_reinterpret, can be affected, because in uncommon_trap, Action_maybe_recompile and Action_reinterpret will be changed to Action_none if too many recompiles happened. >> While I have only met this issue with Reason_unstable_if so far. 
> Here's some background: > > The too_many_traps logic is like those barrels full of sand or water > at the edges of freeway intersections, or a backstop on a baseball field. > It's better to have a backstop than to have none at all, but something > is wrong if you are hitting the backstop. > > In short, the too_many_traps logic is present to prevent trap storms > from lasting forever. But even short trap storms are a problem, if > they happen often enough. Also, the too_many_traps logic has > in the past failed to terminate trap storms. > > I think the bug here is probably whatever specific factor is causing > small trap storms, which in turn are triggering too_many_traps. > Maybe there's a bytecode that is trapping too often, and that bytecode > individually is not throttling its own traps, and so the generic > backstop logic is being called into play. > > (Less likely, the generic throttling logic needs some fix. But usually > the right fix is at the root cause, with a single optimization or bytecode > that is going wrong too often.) > > Sometimes it's one bytecode running one corner case optimization > that is trapping too many times, as if the JIT's optimizer were saying > to itself "last time this optimization failed, but this time for sure!", > or as if the JIT's optimizer has no feedback path at all to see that the > optimization has failed in the past. Sometimes it is a whole class of > bytecodes, such as "all check-casts arising from generic erasure." > > HTH > > --
John From john.r.rose at oracle.com Wed Jun 3 20:44:52 2020 From: john.r.rose at oracle.com (John Rose) Date: Wed, 3 Jun 2020 13:44:52 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87y2p4y0fg.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> Message-ID: <5FAEAB93-C8A9-4053-A3D0-CB1F3584172F@oracle.com> On Jun 3, 2020, at 2:16 AM, Roland Westrelin wrote: > >> loopnode.cpp: >> - line 504: "check_stride_overflow" is already in 8244504 > > Right but the one in this webrev is for longs so doesn't have the same > signature. It's one of those very rare cases where copy/paste/edit is the right thing to do. A template or macro would also work, but that would be overkill, I think. The way to stay sane here is to keep the two copies right next to each other, so it's blindingly obvious that they have to be maintained together. From igor.ignatyev at oracle.com Wed Jun 3 21:30:52 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 3 Jun 2020 14:30:52 -0700 Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property Message-ID: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 > 70 lines changed: 66 ins; 0 del; 4 mod Hi all, could you please review the patch which introduces a new @requires property to filter out the tests which ignore externally provided JVM flags?
the idea behind this patch is to have a way to clearly mark tests which ignore flags, so a) it's obvious that they don't execute a flag-guarded code/feature, and extra care should be taken to use them to verify any flag-guarded changed; b) they can be easily excluded from runs w/ flags. @requires and VMProps allows us to achieve both, so it's been decided to add a new property `vm.flagless`. `vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other `-Xmixed`; in other words any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting any value to `TEST_VM_FLAGLESS` env. variable. this patch adds necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1]. please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to use it by 8246387[3]. JBS: https://bugs.openjdk.java.net/browse/JDK-8246494 webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 testing: marked tests w/ different XX and X flags w/ and w/o TEST_VM_FLAGLESS env. 
var, and w/o any flags [1] https://bugs.openjdk.java.net/browse/JDK-8151707 [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336 [3] https://bugs.openjdk.java.net/browse/JDK-8246387 Thanks, -- Igor From dean.long at oracle.com Wed Jun 3 21:56:07 2020 From: dean.long at oracle.com (Dean Long) Date: Wed, 3 Jun 2020 14:56:07 -0700 Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods In-Reply-To: References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> <42d73a82-b70e-1a0d-312d-303409840392@oracle.com> <64694163-fdd5-5ccb-3ffb-2027b05a3719@oracle.com> Message-ID: <6e9233a4-b743-5e66-328f-7f91c6a7b292@oracle.com> Hi Serguei, I like the latest changes so that JVMCI matches C2. Please get another review because this is not a trivial change. dl On 6/3/20 10:06 AM, serguei.spitsyn at oracle.com wrote: > Hi Dean, > > The updated webrev is: > http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.3/ > > Probably, the JVMCI part can be simplified. > Only the compile_state line has to be moved up: > + JVMCICompileState compile_state(task); > // Skip redefined methods > - if (target_handle->is_old()) { > + if (compile_state.target_method_is_old()) { > failure_reason = "redefined method"; > retry_message = "not retryable"; > compilable = ciEnv::MethodCompilable_never; > } else { > - JVMCICompileState compile_state(task); > Fixes in the jvmciEnv.?pp are not really needed > > Please, let me know what do you think. > > This version does not fail at all (in 300 runs for both C2 and JVMCI). 
> It seems, other two issues disappeared as well: > > This was seen with the C2: > https://bugs.openjdk.java.net/browse/JDK-8245128 > > This was seen with the JVMCI: > https://bugs.openjdk.java.net/browse/JDK-8245446 > > Thanks, > Serguei > > > On 6/1/20 23:40, serguei.spitsyn at oracle.com wrote: >> Hi Dean, >> >> Thank you for the reply. >> >> The problem is I do not fully understand your suggestion, especially >> the part >> about caching the method,is_old() value in the cache_jvmti_state(). >> >> This is a preliminary webrev where I tried to implement your suggestion: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.2/ >> >> This variant is failing in half of test runs for both C1/C2 and JVMCI. >> I think, the root cause is a safepoint in a ThreadInVMfromNative >> destructor. >> Here: >> 232 void ciEnv::cache_jvmti_state() { >> 233 VM_ENTRY_MARK; >> >> Then we check for the target_method_is_old() value which is not >> up-to-date any more. >> I feel, it was correct and more simple before introducing this approach. >> Probably, I'm missing something here. >> >> >> I also have a question about the update fragment: >> 1696 { >> 1697 // Must switch to native to allocate ci_env >> 1698 ThreadToNativeFromVM ttn(thread); >> 1699 ciEnv ci_env((CompileTask*)NULL); >> 1700 >> 1701 // Switch back to VM state to do compiler initialization >> 1702 ThreadInVMfromNative tv(thread); >> 1703 ResetNoHandleMark rnhm; >> 1704 >> 1705 // Perform per-thread and global initializations >> 1706 comp->initialize(); >> 1707 } >> Can we remove the ciEnv object initialization above with the state >> transitions? >> Or it has some side effects? >> >> Please, let me know what you think. >> >> Thanks, >> Serguei >> >> >> On 6/1/20 15:10, Dean Long wrote: >>> On 5/31/20 11:16 PM, serguei.spitsyn at oracle.com wrote: >>>> Hi Dean, >>>> >>>> To check the is_old as you suggest the target method has to be passed >>>> to the cache_jvmti_state() as argument.
Is it what you are suggesting? >>> >>> I believe you can use _task->method()->is_old(), as the ciEnv >>> already has the task. >>> >>>> Just want to make sure I understand you correctly. >>>> >>>> The cache_jvmti_state() and cache_dtrace_flags() are called in the >>>> CompileBroker::init_compiler_runtime() for a ciEnv with the NULL >>>> CompileTask >>>> which looks unnecessary (or I don't understand it): >>>> >>>> bool CompileBroker::init_compiler_runtime() { >>>> CompilerThread* thread = CompilerThread::current(); >>>> . . . >>>> ciEnv ci_env((CompileTask*)NULL); >>>> // Cache Jvmti state >>>> ci_env.cache_jvmti_state(); >>>> // Cache DTrace flags >>>> ci_env.cache_dtrace_flags(); >>>> >>> >>> These calls look unnecessary to me, as the ci_env will cache these >>> again before compiling a method. >>> I suggest removing these calls. We should make sure the cache >>> fields are initialized to sane values >>> in the ciEnv ctor. >>> >>>> The JVMCI has a separate implementation for ciEnv which is jvmciEnv and >>>> its own set of cache_jvmti_state() and jvmti_state_changed() functions. >>>> Both are not called in the JVMCI case. >>>> So, these checks look as broken in JVMCI now. >>>> >>> JVMCI is in better shape, because it doesn't transition out of >>> _thread_in_vm state, >>> but yes it needs similar changes. >>> >>>> Not sure, I have enough compiler knowledge to fix this at this >>>> stage of release. >>>> Would it be better to file a separate hotspot/compiler RFE targeted to 16? >>>> It can be assigned to me if it helps. >>>> >>> >>> This is a P3 so I believe we have time to fix it for 15. Please go >>> ahead and let's see if >>> we can get it in. I can help with the JVMCI changes if they are not >>> straightforward. >>> >>> dl >>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>> On 5/28/20 10:54, Dean Long wrote: >>>>> Sure, you could just have cache_jvmti_state() return a boolean to >>>>> bail out immediately for is_old.
>>>>> >>>>> dl >>>>> >>>>> On 5/28/20 7:23 AM, serguei.spitsyn at oracle.com wrote: >>>>>> Hi Dean, >>>>>> >>>>>> Thank you for looking at this! >>>>>> Okay. Let me check what can be done in this direction. >>>>>> There is no point to cache is_old. The compilation has to bail >>>>>> out if it is discovered to be true. >>>>>> >>>>>> Thanks, >>>>>> Serguei >>>>>> >>>>>> >>>>>> On 5/28/20 00:59, Dean Long wrote: >>>>>>> This seems OK as long as the memory barriers in the thread state >>>>>>> transitions prevent the C++ compiler from doing something like >>>>>>> reading is_old before reading redefinition_count. I would feel >>>>>>> better if both JVMCI and C1/C2 cached is_old and >>>>>>> redefinition_count at the same time (making sure to be in the >>>>>>> _thread_in_vm state), then bail out based on the cached value of >>>>>>> is_old. >>>>>>> >>>>>>> dl >>>>>>> >>>>>>> On 5/26/20 12:04 AM, serguei.spitsyn at oracle.com wrote: >>>>>>>> On 5/25/20 23:39, serguei.spitsyn at oracle.com wrote: >>>>>>>>> Please, review a fix for: >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245126 >>>>>>>>> >>>>>>>>> Webrev: >>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.1/ >>>>>>>>> >>>>>>>>> >>>>>>>>> Summary: >>>>>>>>> The Kitchensink stress test with the Instrumentation module >>>>>>>>> enabled does >>>>>>>>> a lot of class retransformations in parallel with all other >>>>>>>>> stressing. >>>>>>>>> It provokes the assert at the compiled code installation time: >>>>>>>>> assert(!method->is_old()) failed: Should not be installing >>>>>>>>> old methods >>>>>>>>> >>>>>>>>> The problem is that the >>>>>>>>> CompileBroker::invoke_compiler_on_method in C2 version >>>>>>>>> (non-JVMCI tiered compilation) is missing the check that >>>>>>>>> exists in the JVMCI >>>>>>>>>
part of implementation: >>>>>>>>> 2148 // Skip redefined methods >>>>>>>>> 2149 if (target_handle->is_old()) { >>>>>>>>> 2150 failure_reason = "redefined method"; >>>>>>>>> 2151 retry_message = "not retryable"; >>>>>>>>> 2152 compilable = ciEnv::MethodCompilable_never; >>>>>>>>> 2153 } else { >>>>>>>>> . . . >>>>>>>>> 2168 } >>>>>>>>> >>>>>>>>> The fix is to add this check. >>>>>>>> >>>>>>>> Sorry, forgot to explain one thing. >>>>>>>> Compiler code has a special mechanism to ensure the JVMTI class >>>>>>>> redefinition did >>>>>>>> not happen while the method was compiled, so all the >>>>>>>> assumptions remain correct. >>>>>>>> 2190 // Cache Jvmti state >>>>>>>> 2191 ci_env.cache_jvmti_state(); >>>>>>>> Part of this is a check that the value of >>>>>>>> JvmtiExport::redefinition_count() is >>>>>>>> cached in ciEnv variable: _jvmti_redefinition_count. >>>>>>>> The JvmtiExport::redefinition_count() value change means a >>>>>>>> class redefinition >>>>>>>> happened which also implies some of methods may become old. >>>>>>>> However, the method being compiled can be already old at the >>>>>>>> point where the >>>>>>>> redefinition counter is cached, so the redefinition counter >>>>>>>> check does not help much. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Serguei >>>>>>>> >>>>>>>>> Testing: >>>>>>>>> Ran Kitchensink test with the Instrumentation module enabled in mach5 >>>>>>>>> multiple times for 100 times. Without the fix the test normally fails >>>>>>>>> a couple of times in 200 runs. It does not fail with the fix anymore. >>>>>>>>> Will also submit hs tiers1-5.
>>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Serguei >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > From shravya.rukmannagari at intel.com Wed Jun 3 22:03:49 2020 From: shravya.rukmannagari at intel.com (Rukmannagari, Shravya) Date: Wed, 3 Jun 2020 22:03:49 +0000 Subject: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions In-Reply-To: <7927d1bd-501e-cc3c-06a5-3e8c7e25f38e@oracle.com> References: <88e70fc5-e83c-1ae0-f4e7-ae80ce517a70@oracle.com> <7927d1bd-501e-cc3c-06a5-3e8c7e25f38e@oracle.com> Message-ID: Hi Vladimir, I have verified that the code does not use SSE4.2, whereas it uses SSE4.1 instruction set. Thanks, Shravya. -----Original Message----- From: Vladimir Kozlov Sent: Tuesday, June 2, 2020 10:09 PM To: Rukmannagari, Shravya ; 'hotspot compiler' Cc: Tucker, Greg B Subject: Re: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions How did it fail? UseSSE setting does not affect AVX settings. It seems you are using instructions from sse4.2 but not checking for that. Vladimir On 6/2/20 6:15 PM, Rukmannagari, Shravya wrote: > Hi Vladimir, > The compiler/cpuflags/TestSSE4Disabled.java jtreg test was failing without the check. > This test is run with SSE=3 as: > run main/othervm -Xcomp -XX:UseSSE=3 > compiler.cpuflags.TestSSE4Disabled > Without the UseSSE > 3 check, the JVM tries to generate the new AVX512 CRC32 stub. > > Thanks, > Shravya. > > -----Original Message----- > From: Vladimir Kozlov > Sent: Tuesday, June 2, 2020 5:00 PM > To: Rukmannagari, Shravya ; 'hotspot > compiler' > Cc: Tucker, Greg B > Subject: Re: [15] RFR(M): 8245512: CRC32 optimization using AVX512 > instructions > > On 6/2/20 3:56 PM, Rukmannagari, Shravya wrote: >> Hi Vladimir, >> Thanks a lot for the review. I have modified the patch as per your comments. The CRC32 code is now in macroAssembler_x86.cpp. >> http://cr.openjdk.java.net/~srukmannagar/CRC32/webrev.02/ > > Why you added UseSSE check in stubGenerator_x86_64.cpp? 
> > + if (UseSSE > 3 && VM_Version::supports_avx512_vpclmulqdq() && > >> >> The stubGenerator_x86_64.cpp would be verified only for 64-bit builds. I have verified the 32-bit builds and also ran the test cases to ensure no issues or failures. > > You are right about this. > > Thanks, > Vladimir > >> Please let me know if you have questions or comments. >> >> Thanks, >> Shravya. >> >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Monday, June 1, 2020 2:36 PM >> To: Rukmannagari, Shravya ; 'hotspot >> compiler' >> Cc: Tucker, Greg B >> Subject: Re: [15] RFR(M): 8245512: CRC32 optimization using AVX512 >> instructions >> >> Hi Shravya, >> >> Why you put new CRC32 avx512 code into macroAssembler_x86_aes.cpp file? >> This file is used only for AES intrinsic code - nothing else should be there. >> >> If you think CRC32 code is too large for macroAssembler_x86.cpp I would suggest to move all CRC32 code, old and new, into new macroAssembler_x86_crc32.cpp file. >> >> I see that you want to implement new code only for 64 bit which is fine and you guarded it correctly wiht #ifrdef _LP64. >> But you forgot guard in stubGenerator_x86_64.cpp which will cause build failure for 32-bit. >> >> It is difficult to judge the implementation code. I hope you ran all tests for it. >> >> Thanks, >> Vladimir >> >> On 5/20/20 4:01 PM, Rukmannagari, Shravya wrote: >>> Hi All, >>> >>> We would like to contribute optimizations for CRC32 algorithm for upcoming Intel x86_64 platforms. >>> >>> >>> >>> Contributors: >>> >>> Shravya Rukmannagari(shravya.rukmannagari at intel.com) >>> >>> Greg B Tucker(greg.b.tucker at intel.com) >>> >>> >>> >>> I have tested the patch to confirm correctness and performance. The patch also passes compiler/jtreg tests. >>> >>> >>> >>> Please take a look and let me know if you have any questions or comments. 
>>> >>> >>> >>> Bug Id: https://bugs.openjdk.java.net/browse/JDK-8245512 >>> >>> https://cr.openjdk.java.net/~srukmannagar/CRC32/webrev.01/ >>> >>> >>> >>> Regards, >>> >>> Shravya Rukmannagari >>> From vladimir.kozlov at oracle.com Wed Jun 3 22:28:00 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 3 Jun 2020 15:28:00 -0700 Subject: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions In-Reply-To: References: <88e70fc5-e83c-1ae0-f4e7-ae80ce517a70@oracle.com> <7927d1bd-501e-cc3c-06a5-3e8c7e25f38e@oracle.com> Message-ID: <50e5295e-e5aa-9dc2-7435-c95cd4dce7fa@oracle.com> Then you have to check for VM_Version::supports_sse4_1() instead of UseSSE flag. Note, setting UseSSE flag lower disable 4.1 and 4.2 [1] the same way setting lower UseAVX disable AVX features [2]. Thanks, Vladimir [1] http://hg.openjdk.java.net/jdk/jdk/file/839d49bd8d8d/src/hotspot/cpu/x86/vm_version_x86.cpp#l644 [2] http://hg.openjdk.java.net/jdk/jdk/file/839d49bd8d8d/src/hotspot/cpu/x86/vm_version_x86.cpp#l695 On 6/3/20 3:03 PM, Rukmannagari, Shravya wrote: > Hi Vladimir, > I have verified that the code does not use SSE4.2, whereas it uses SSE4.1 instruction set. > > Thanks, > Shravya. > > -----Original Message----- > From: Vladimir Kozlov > Sent: Tuesday, June 2, 2020 10:09 PM > To: Rukmannagari, Shravya ; 'hotspot compiler' > Cc: Tucker, Greg B > Subject: Re: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions > > How did it fail? UseSSE setting does not affect AVX settings. It seems you are using instructions from sse4.2 but not checking for that. > > Vladimir > > On 6/2/20 6:15 PM, Rukmannagari, Shravya wrote: >> Hi Vladimir, >> The compiler/cpuflags/TestSSE4Disabled.java jtreg test was failing without the check. >> This test is run with SSE=3 as: >> run main/othervm -Xcomp -XX:UseSSE=3 >> compiler.cpuflags.TestSSE4Disabled >> Without the UseSSE > 3 check, the JVM tries to generate the new AVX512 CRC32 stub. >> >> Thanks, >> Shravya. 
>> >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Tuesday, June 2, 2020 5:00 PM >> To: Rukmannagari, Shravya ; 'hotspot >> compiler' >> Cc: Tucker, Greg B >> Subject: Re: [15] RFR(M): 8245512: CRC32 optimization using AVX512 >> instructions >> >> On 6/2/20 3:56 PM, Rukmannagari, Shravya wrote: >>> Hi Vladimir, >>> Thanks a lot for the review. I have modified the patch as per your comments. The CRC32 code is now in macroAssembler_x86.cpp. >>> http://cr.openjdk.java.net/~srukmannagar/CRC32/webrev.02/ >> >> Why you added UseSSE check in stubGenerator_x86_64.cpp? >> >> + if (UseSSE > 3 && VM_Version::supports_avx512_vpclmulqdq() && >> >>> >>> The stubGenerator_x86_64.cpp would be verified only for 64-bit builds. I have verified the 32-bit builds and also ran the test cases to ensure no issues or failures. >> >> You are right about this. >> >> Thanks, >> Vladimir >> >>> Please let me know if you have questions or comments. >>> >>> Thanks, >>> Shravya. >>> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Monday, June 1, 2020 2:36 PM >>> To: Rukmannagari, Shravya ; 'hotspot >>> compiler' >>> Cc: Tucker, Greg B >>> Subject: Re: [15] RFR(M): 8245512: CRC32 optimization using AVX512 >>> instructions >>> >>> Hi Shravya, >>> >>> Why you put new CRC32 avx512 code into macroAssembler_x86_aes.cpp file? >>> This file is used only for AES intrinsic code - nothing else should be there. >>> >>> If you think CRC32 code is too large for macroAssembler_x86.cpp I would suggest to move all CRC32 code, old and new, into new macroAssembler_x86_crc32.cpp file. >>> >>> I see that you want to implement new code only for 64 bit which is fine and you guarded it correctly wiht #ifrdef _LP64. >>> But you forgot guard in stubGenerator_x86_64.cpp which will cause build failure for 32-bit. >>> >>> It is difficult to judge the implementation code. I hope you ran all tests for it. 
>>> >>> Thanks, >>> Vladimir >>> >>> On 5/20/20 4:01 PM, Rukmannagari, Shravya wrote: >>>> Hi All, >>>> >>>> We would like to contribute optimizations for CRC32 algorithm for upcoming Intel x86_64 platforms. >>>> >>>> >>>> >>>> Contributors: >>>> >>>> Shravya Rukmannagari(shravya.rukmannagari at intel.com) >>>> >>>> Greg B Tucker(greg.b.tucker at intel.com) >>>> >>>> >>>> >>>> I have tested the patch to confirm correctness and performance. The patch also passes compiler/jtreg tests. >>>> >>>> >>>> >>>> Please take a look and let me know if you have any questions or comments. >>>> >>>> >>>> >>>> Bug Id: https://bugs.openjdk.java.net/browse/JDK-8245512 >>>> >>>> https://cr.openjdk.java.net/~srukmannagar/CRC32/webrev.01/ >>>> >>>> >>>> >>>> Regards, >>>> >>>> Shravya Rukmannagari >>>> From dean.long at oracle.com Wed Jun 3 22:47:27 2020 From: dean.long at oracle.com (Dean Long) Date: Wed, 3 Jun 2020 15:47:27 -0700 Subject: RFR: 8246347: [JVMCI] Set is_method_handle_invoke flag accordingly when describing scope in jvmciCodeInstaller In-Reply-To: References: Message-ID: Hi Yudi. I'm seeing an assert in test/jdk/java/lang/invoke/CallSiteTest.java with a debug build. Let me remove my changes and see if it still fails. What testing did you do? dl On 6/2/20 9:38 AM, Yudi Zheng wrote: > Hello, > > Please review this patch that sets is_method_handle_invoke flag accordingly when describing scope at call site in jvmciCodeInstaller.
> > http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8246347 > > Many thanks, > Yudi From david.holmes at oracle.com Wed Jun 3 23:02:28 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jun 2020 09:02:28 +1000 Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property In-Reply-To: References: Message-ID: <4584c046-ed5b-e1b9-f16b-3d4383cf1001@oracle.com> Hi Igor, On 4/06/2020 7:30 am, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >> 70 lines changed: 66 ins; 0 del; 4 mod > > Hi all, > > could you please review the patch which introduces a new @requires property to filter out the tests which ignore externally provided JVM flags? > > the idea behind this patch is to have a way to clearly mark tests which ignore flags, so > a) it's obvious that they don't execute a flag-guarded code/feature, and extra care should be taken to use them to verify any flag-guarded changed; > b) they can be easily excluded from runs w/ flags. So all such tests should be using driver mode, and further the VMs they then exec don't use any of the APIs that include the jtreg test arguments. Okay this seems reasonable in what it does. Thanks, David > @requires and VMProps allows us to achieve both, so it's been decided to add a new property `vm.flagless`. `vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other `-Xmixed`; in other words any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting any value to `TEST_VM_FLAGLESS` env. variable. 
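The rule described in the paragraph above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the VMProps change from the webrev: the class `FlaglessSketch`, the method `isFlagless` and the name-based matching of -XX flags are hypothetical, and the flags are passed in as a plain list rather than read from jtreg.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the vm.flagless rule; names are illustrative only.
final class FlaglessSketch {
    // -XX flags the description says are tolerated (assumed to be matched by name).
    private static final Set<String> TOLERATED_XX =
            Set.of("MaxRAMPercentage", "CreateCoredumpOnCrash");

    // testVmFlaglessEnv models the TEST_VM_FLAGLESS env. variable; null means unset.
    static boolean isFlagless(List<String> vmFlags, String testVmFlaglessEnv) {
        if (testVmFlaglessEnv != null) {
            return true; // any value forces the property to true
        }
        for (String flag : vmFlags) {
            if (flag.startsWith("-XX:")) {
                if (!TOLERATED_XX.contains(xxName(flag))) {
                    return false; // any other -XX flag makes the run "flagged"
                }
            } else if (flag.startsWith("-X") && !flag.equals("-Xmixed")) {
                return false; // any -X flag other than -Xmixed
            }
        }
        return true;
    }

    // Strips the "-XX:" prefix, an optional +/- sign and an optional "=value" suffix.
    private static String xxName(String flag) {
        String name = flag.substring("-XX:".length());
        if (name.startsWith("+") || name.startsWith("-")) {
            name = name.substring(1);
        }
        int eq = name.indexOf('=');
        return eq >= 0 ? name.substring(0, eq) : name;
    }
}
```

With this model, a run with only `-XX:MaxRAMPercentage=25 -Xmixed` counts as flagless, a run with `-XX:+UseZGC` does not, and setting the environment variable overrides the latter.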
> > this patch adds necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1]. > > please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to use it by 8246387[3]. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8246494 > webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 > testing: marked tests w/ different XX and X flags w/ and w/o TEST_VM_FLAGLESS env. var, and w/o any flags > > [1] https://bugs.openjdk.java.net/browse/JDK-8151707 > [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336 > [3] https://bugs.openjdk.java.net/browse/JDK-8246387 > > Thanks, > -- Igor > From dean.long at oracle.com Wed Jun 3 23:22:51 2020 From: dean.long at oracle.com (Dean Long) Date: Wed, 3 Jun 2020 16:22:51 -0700 Subject: RFR: 8246347: [JVMCI] Set is_method_handle_invoke flag accordingly when describing scope in jvmciCodeInstaller In-Reply-To: References: Message-ID: <5aec53c6-5e79-aa07-aa97-ca46fccb3f58@oracle.com> Does this require recent Graal change in order to work correctly? dl On 6/3/20 3:47 PM, Dean Long wrote: > Hi Yudi. I'm seeing an assert in > test/jdk/java/lang/invoke/CallSiteTest.java with a debug build. Let me > remove my changes and see if it still fails. What testing did you do? > > dl > > On 6/2/20 9:38 AM, Yudi Zheng wrote: >> Hello, >> >> Please review this patch that sets is_method_handle_invoke flag >> accordingly when describing scope in jvmciCodeInstaller.
>> >> http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8246347 >> >> Many thanks, >> Yudi > From igor.ignatyev at oracle.com Thu Jun 4 01:05:07 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 3 Jun 2020 18:05:07 -0700 Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property In-Reply-To: <4584c046-ed5b-e1b9-f16b-3d4383cf1001@oracle.com> References: <4584c046-ed5b-e1b9-f16b-3d4383cf1001@oracle.com> Message-ID: <5430D545-BE0C-4022-9468-D6EAFF7BAC78@oracle.com> Hi David, > So all such tests should be using driver mode, and further the VMs they then exec don't use any of the APIs that include the jtreg test arguments. correct, and 8151707's subtasks are going to mark only such tests (and tests which should be using driver-mode, but can't due to external factors, remember these follow-up fixes for my use driver-mode? ;) ). there are two more (a bit controversial) use cases where we can consider usage of vm.flagless: - some of the debugger-debuggee tests have the debugger executed w/ external flags, but don't pass these flags to the debuggee; in most cases, that doesn't seem to be right, so arguably all such tests should be updated to use driver mode to run the debugger and then be marked w/ vm.flagless. I know that the svc team was doing some cleanup in this area recently, and given it requires more investigation w.r.t. the tests' intent, I don't plan to do it as a part of 8151707, and instead will create follow-up RFEs/tasks. - unit-like tests which don't ignore flags, but weren't designed to be run w/ external flags; most of the jfr tests can be used as an example: you can run them w/ any flags, but they might fail as they assert things which happen only in certain configurations and these configurations are written in jtreg test descriptions.
currently, these tests are marked w/ jfr k/w and it's advised not to run them w/ any external flags, yet I know that some people successfully do that to test their configurations. given the set of configurations which satisfies needs of jfr tests is much bigger than the configurations listed in the tests, I kinda feel sympathetic to people doing that, on the other hand, it's unsupported and I'd prefer us to express (and enforce) that more clearly. again, given the possible controversiality and need for a broader discussion, I'm planning to file an issue for jfr tests and follow up later w/ interested parties. to sum up, 8151707's subtasks are going to mark *only* obvious and non-controversial cases. for all other cases, the JBS entries are to be filed and followed up on. Cheers, -- Igor > On Jun 3, 2020, at 4:02 PM, David Holmes wrote: > > Hi Igor, > > On 4/06/2020 7:30 am, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >>> 70 lines changed: 66 ins; 0 del; 4 mod >> Hi all, >> could you please review the patch which introduces a new @requires property to filter out the tests which ignore externally provided JVM flags? >> the idea behind this patch is to have a way to clearly mark tests which ignore flags, so >> a) it's obvious that they don't execute a flag-guarded code/feature, and extra care should be taken to use them to verify any flag-guarded changed; >> b) they can be easily excluded from runs w/ flags. > > So all such tests should be using driver mode, and further the VMs they then exec don't use any of the APIs that include the jtreg test arguments. > > Okay this seems reasonable in what it does. > > Thanks, > David > >> @requires and VMProps allows us to achieve both, so it's been decided to add a new property `vm.flagless`. 
`vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other `-Xmixed`; in other words any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting any value to `TEST_VM_FLAGLESS` env. variable. >> this patch adds necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1]. >> please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to use it by 8246387[3]. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8246494 >> webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >> testing: marked tests w/ different XX and X flags w/ and w/o TEST_VM_FLAGLESS env. var, and w/o any flags >> [1] https://bugs.openjdk.java.net/browse/JDK-8151707 >> [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336 >> [3] https://bugs.openjdk.java.net/browse/JDK-8246387 >> Thanks, >> -- Igor From serguei.spitsyn at oracle.com Thu Jun 4 02:07:20 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 3 Jun 2020 19:07:20 -0700 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: Hi Richard, The mach5 test run is good. Thanks, Serguei On 6/2/20 10:57, Reingruber, Richard wrote: > Hi Serguei, > >> This looks good to me. 
> Thanks! > > From an earlier mail: > >> I'm thinking it would be more safe to run full tier5. > I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would > like to push. > > Thanks, Richard. > > -----Original Message----- > From: serguei.spitsyn at oracle.com > Sent: Tuesday, 2 June 2020 18:55 > To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > This looks good to me. > > Thanks, > Serguei > > > On 5/28/20 09:02, Vladimir Kozlov wrote: >> Vladimir Ivanov is on break currently. >> It looks good to me. >> >> Thanks, >> Vladimir K >> >> On 5/26/20 7:31 AM, Reingruber, Richard wrote: >>> Hi Vladimir, >>> >>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>> Not an expert in JVMTI code base, so can't comment on the actual >>>> changes. >>>> From JIT-compilers perspective it looks good. >>> I put out webrev.1 a while ago [1]: >>> >>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >>> Webrev(delta): >>> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >>> >>> You originally suggested to use a handshake to switch a thread into >>> interpreter mode [2]. I'm using >>> a direct handshake now, because I think it is the best fit. >>> >>> May I ask if webrev.1 still looks good to you from JIT-compilers >>> perspective? >>> >>> Can I list you as (partial) Reviewer? >>> >>> Thanks, Richard.
>>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >>> [2] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >>> >>> -----Original Message----- >>> From: Vladimir Ivanov >>> Sent: Friday, 7 February 2020 09:19 >>> To: Reingruber, Richard ; >>> serviceability-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S) 8238585: Use handshake for >>> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >>> compiled methods on stack not_entrant >>> >>> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>> Not an expert in JVMTI code base, so can't comment on the actual >>> changes. >>> >>> From JIT-compilers perspective it looks good. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>>> >>>> The change avoids making all compiled methods on stack not_entrant >>>> when switching a java thread to >>>> interpreter only execution for jvmti purposes. It is sufficient to >>>> deoptimize the compiled frames on stack. >>>> >>>> Additionally a handshake is used instead of a vm operation to walk >>>> the stack and do the deoptimizations. >>>> >>>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>>> release builds on all platforms. >>>> >>>> Thanks, Richard.
>>>> >>>> See also my question if anyone knows a reason for making the >>>> compiled methods not_entrant: >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>>> >>>> From vladimir.kozlov at oracle.com Thu Jun 4 02:54:04 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 3 Jun 2020 19:54:04 -0700 Subject: [15] RFR(S) 8227647: [Graal] Test8009761.java fails due to "RuntimeException: static java.lang.Object compiler.uncommontrap.Test8009761.m3(boolean,boolean) not compiled" Message-ID: <86b5b4f7-9e19-f14b-e1fc-9fc5b3cc8f9a@oracle.com> http://cr.openjdk.java.net/~kvn/8227647/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8227647 The test failed because JVMCI unblocks [1] a blocking (-Xbatch) compilation after 10 sec even though Graal continues to compile the method. As a result, WB thinks the compilation failed. LogCompilation shows that Graal's first compilation (which is not the tested method) can take > 10 sec on a slow machine because Graal runs in interpreter mode and is gradually compiled by C1 (which also consumes CPU resources). I suggest waiting for the compilation to finish if the request came from a testing environment (WB, CTW, replay, ...). Tested hs-tier1, hs-tier3-graal and many runs of the failed test.
Thanks, Vladimir [1] http://hg.openjdk.java.net/jdk/jdk/file/0a32396f7a69/src/hotspot/share/compiler/compileBroker.cpp#l1583 From tobias.hartmann at oracle.com Thu Jun 4 06:31:53 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 4 Jun 2020 08:31:53 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87y2p4y0fg.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> Message-ID: <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> Hi Roland, On 03.06.20 11:16, Roland Westrelin wrote: >>>> Tobias might wish to run some regression tests on the final changes. >> >> I've submitted some testing. Will report back once it finished. > > Thanks. I'm seeing some failures with -XX:StressLongCountedLoop=429496729. Will follow-up offline. >> loopnode.cpp: >> - line 504: "check_stride_overflow" is already in 8244504 > > Right but the one in this webrev is for longs so doesn't have the same > signature. Right and as John already mentioned it's probably not worth using templates here. >> loopnode.hpp: >> - line 1462: Shouldn't _loop_invokes/_loop_work also be volatile and be incremented atomically? > > I think so too. That they are not incremented atomically implies nobody > uses them AFAICT. If we want all of them to be handled correctly then > someone needs to go over Compile::print_statistics() and all methods it > calls. Right that's out of scope of this change. 
Best regards, Tobias From yudi.zheng at oracle.com Thu Jun 4 07:03:06 2020 From: yudi.zheng at oracle.com (Yudi Zheng) Date: Thu, 4 Jun 2020 09:03:06 +0200 Subject: RFR: 8246347: [JVMCI] Set is_method_handle_invoke flag accordingly when describing scope in jvmciCodeInstaller In-Reply-To: <5aec53c6-5e79-aa07-aa97-ca46fccb3f58@oracle.com> References: <5aec53c6-5e79-aa07-aa97-ca46fccb3f58@oracle.com> Message-ID: I did not push this yet. It might require changes on the Graal side. I am still thinking about how to merge. -Yudi > On 4 Jun 2020, at 01:22, Dean Long wrote: > > Does this require recent Graal change in order to work correctly? > > dl > > On 6/3/20 3:47 PM, Dean Long wrote: >> Hi Yudi. I'm seeing an assert in test/jdk/java/lang/invoke/CallSiteTest.java with a debug build. Let me remove my changes and see if it still fails. What testing did you do? >> >> dl >> >> On 6/2/20 9:38 AM, Yudi Zheng wrote: >>> Hello, >>> >>> Please review this patch that sets is_method_handle_invoke flag accordingly when describing scope at call site in jvmciCodeInstaller. >>> >>> http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8246347 >>> >>> Many thanks, >>> Yudi >> > From dean.long at oracle.com Thu Jun 4 08:03:28 2020 From: dean.long at oracle.com (Dean Long) Date: Thu, 4 Jun 2020 01:03:28 -0700 Subject: RFR(XL) 8243380: Update Graal Message-ID: <948d9a9f-22d4-b95d-d948-a67ead20ad14@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8243380 http://cr.openjdk.java.net/~dlong/8243380/webrev/ This is a Graal update. Changes since the last update (JDK-8241231) are listed in the bug description. dl From Yang.Zhang at arm.com Thu Jun 4 08:27:33 2020 From: Yang.Zhang at arm.com (Yang Zhang) Date: Thu, 4 Jun 2020 08:27:33 +0000 Subject: RFR: 8244926: Add absolute check for int/long to generate Abs nodes Message-ID: Hi, May I have a review of this enhancement of absolute check for int/long?
JBS: https://bugs.openjdk.java.net/browse/JDK-8244926
Webrev: http://cr.openjdk.java.net/~yzhang/8244926/webrev.00/

There is absolute value check for float/double already [1]. In this patch, absolute value check for integer/long is added.

The following patterns can be matched to AbsI/L nodes:
  ((a < 0) ? -a : a)
  ((a <= 0) ? -a : a)
  ((a > 0) ? a : -a)
  ((a >= 0) ? a : -a)

Test case:
  public static int absi(int a) {
    return ((a < 0) ? -a : a);
  }

With c2, AbsI node is generated and matched. The following snippet is generated on x86:
  0x00007f67c8b6155b: mov %ecx,%r11d
  0x00007f67c8b6155e: sar $0x1f,%r11d
  0x00007f67c8b61562: mov %ecx,%r10d
  0x00007f67c8b61565: xor %r11d,%r10d
  0x00007f67c8b61568: sub %r11d,%r10d

On AArch64:
  0x0000ffffa8b878e4: cmp w3, wzr
  0x0000ffffa8b878e8: cneg w17, w3, lt // lt = tstop

Note: AArch64 result is based on this patch which is in review [2].

Test:
Full jtreg on x86 and AArch64, no new failure

Performance:
Jmh test is uploaded.
http://cr.openjdk.java.net/~yzhang/8244926/TestScalar.java

X86
Before:
Benchmark             (size)  Mode  Cnt     Score   Error  Units
TestScalar.testAbsI1    1024  avgt   25  2648.235 ± 0.810  us/op
TestScalar.testAbsI2    1024  avgt   25  2647.702 ± 0.431  us/op
TestScalar.testAbsI3    1024  avgt   25  2647.605 ± 0.346  us/op
TestScalar.testAbsI4    1024  avgt   25  2647.574 ± 0.651  us/op
TestScalar.testAbsL1    1024  avgt   25  3165.787 ± 0.976  us/op
TestScalar.testAbsL2    1024  avgt   25  3166.582 ± 2.217  us/op
TestScalar.testAbsL3    1024  avgt   25  3168.097 ± 4.071  us/op
TestScalar.testAbsL4    1024  avgt   25  3167.222 ± 2.573  us/op
After:
Benchmark             (size)  Mode  Cnt     Score   Error  Units
TestScalar.testAbsI1    1024  avgt   25  2264.637 ± 1.164  us/op
TestScalar.testAbsI2    1024  avgt   25  2264.318 ± 0.427  us/op
TestScalar.testAbsI3    1024  avgt   25  2264.998 ± 0.903  us/op
TestScalar.testAbsI4    1024  avgt   25  2264.602 ± 0.625  us/op
TestScalar.testAbsL1    1024  avgt   25  2376.513 ± 0.345  us/op
TestScalar.testAbsL2    1024  avgt   25  2376.681 ± 0.565  us/op
TestScalar.testAbsL3    1024  avgt   25  2377.012 ± 0.643  us/op
TestScalar.testAbsL4    1024  avgt   25  2376.921 ± 0.699  us/op

AArch64:
Before:
Benchmark             (size)  Mode  Cnt     Score   Error  Units
TestScalar.testAbsI1    1024  avgt   25  1858.831 ± 1.249  us/op
TestScalar.testAbsI2    1024  avgt   25  1860.248 ± 1.365  us/op
TestScalar.testAbsI3    1024  avgt   25  1859.571 ± 1.177  us/op
TestScalar.testAbsI4    1024  avgt   25  1859.970 ± 0.882  us/op
TestScalar.testAbsL1    1024  avgt   25  1871.520 ± 2.592  us/op
TestScalar.testAbsL2    1024  avgt   25  1872.728 ± 2.301  us/op
TestScalar.testAbsL3    1024  avgt   25  1872.852 ± 2.455  us/op
TestScalar.testAbsL4    1024  avgt   25  1872.720 ± 2.652  us/op
After:
Benchmark             (size)  Mode  Cnt     Score   Error  Units
TestScalar.testAbsI1    1024  avgt   25  1422.781 ± 1.788  us/op
TestScalar.testAbsI2    1024  avgt   25  1423.778 ± 2.612  us/op
TestScalar.testAbsI3    1024  avgt   25  1424.327 ± 2.065  us/op
TestScalar.testAbsI4    1024  avgt   25  1423.269 ± 1.437  us/op
TestScalar.testAbsL1    1024  avgt   25  1434.279 ± 2.312  us/op
TestScalar.testAbsL2    1024  avgt   25  1433.900 ± 2.341  us/op
TestScalar.testAbsL3    1024  avgt   25  1435.967 ± 2.270  us/op
TestScalar.testAbsL4    1024  avgt   25  1437.495 ± 0.957  us/op

[1] http://hg.openjdk.java.net/jdk/jdk/file/dd652a1b2a39/src/hotspot/share/opto/cfgnode.cpp#l1519
[2] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-May/008861.html

Regards,
Yang

From tobias.hartmann at oracle.com Thu Jun 4 11:40:12 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 4 Jun 2020 13:40:12 +0200
Subject: [15] RFR(S): 8246453: TestClone crashes with "all collected exceptions must come from the same place"
In-Reply-To: <48d43f24-73d9-ec99-746d-7685f7bf09b9@oracle.com>
References: <361cf8bf-bc1c-39e5-c8a9-855040bfa807@oracle.com> <0209c2ed-0fdd-5501-0482-759db24f7dca@oracle.com> <48d43f24-73d9-ec99-746d-7685f7bf09b9@oracle.com>
Message-ID: <6a404b4e-eacb-320d-6dd5-050c1780bfa1@oracle.com>

Hi,

I've missed to handle the case when 'array_copy_requires_gc_barriers' is true (for example, if ZGC is enabled).
We need to set deoptimize_on_exception=true for the slow call to the arraycopy stub: http://cr.openjdk.java.net/~thartmann/8246453/webrev.01/ Thanks, Tobias On 03.06.20 15:39, Tobias Hartmann wrote: > Hi Nils, > > thanks for the review! > > Best regards, > Tobias > > On 03.06.20 15:37, Nils Eliasson wrote: >> Hi Tobias, >> >> Looks good! >> >> Best regards, >> Nils >> >> On 2020-06-03 15:21, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8246453 >>> http://cr.openjdk.java.net/~thartmann/8246453/webrev.00/ >>> >>> Similar to old JDK-8033626 [1], we assert when trying to merge two JVMStates with different >>> reexecute settings. This only happens with -XX:+StressReflectiveCode because in that case we don't >>> fold the array guard and emit code to allocate the array to be cloned into. The fix is to simply set >>> deoptimize_on_exception for that allocation as well (we already set it for the non-array case [2]). >>> >>> Thanks, >>> Tobias >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8033626 >>> http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/00c8a1255912 >>> [2] http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/00c8a1255912#l3.7 >> From nils.eliasson at oracle.com Thu Jun 4 11:43:24 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 4 Jun 2020 13:43:24 +0200 Subject: [15] RFR(S): 8246453: TestClone crashes with "all collected exceptions must come from the same place" In-Reply-To: <6a404b4e-eacb-320d-6dd5-050c1780bfa1@oracle.com> References: <361cf8bf-bc1c-39e5-c8a9-855040bfa807@oracle.com> <0209c2ed-0fdd-5501-0482-759db24f7dca@oracle.com> <48d43f24-73d9-ec99-746d-7685f7bf09b9@oracle.com> <6a404b4e-eacb-320d-6dd5-050c1780bfa1@oracle.com> Message-ID: Good catch! Looks good. Regards, Nils On 2020-06-04 13:40, Tobias Hartmann wrote: > Hi, > > I've missed to handle the case when 'array_copy_requires_gc_barriers' is true (for example, if ZGC > is enabled). 
We need to set deoptimize_on_exception=true for the slow call to the arraycopy stub: > http://cr.openjdk.java.net/~thartmann/8246453/webrev.01/ > > Thanks, > Tobias > > On 03.06.20 15:39, Tobias Hartmann wrote: >> Hi Nils, >> >> thanks for the review! >> >> Best regards, >> Tobias >> >> On 03.06.20 15:37, Nils Eliasson wrote: >>> Hi Tobias, >>> >>> Looks good! >>> >>> Best regards, >>> Nils >>> >>> On 2020-06-03 15:21, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch: >>>> https://bugs.openjdk.java.net/browse/JDK-8246453 >>>> http://cr.openjdk.java.net/~thartmann/8246453/webrev.00/ >>>> >>>> Similar to old JDK-8033626 [1], we assert when trying to merge two JVMStates with different >>>> reexecute settings. This only happens with -XX:+StressReflectiveCode because in that case we don't >>>> fold the array guard and emit code to allocate the array to be cloned into. The fix is to simply set >>>> deoptimize_on_exception for that allocation as well (we already set it for the non-array case [2]). >>>> >>>> Thanks, >>>> Tobias >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8033626 >>>> http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/00c8a1255912 >>>> [2] http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/00c8a1255912#l3.7 From tobias.hartmann at oracle.com Thu Jun 4 11:46:20 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 4 Jun 2020 13:46:20 +0200 Subject: [15] RFR(S): 8246453: TestClone crashes with "all collected exceptions must come from the same place" In-Reply-To: References: <361cf8bf-bc1c-39e5-c8a9-855040bfa807@oracle.com> <0209c2ed-0fdd-5501-0482-759db24f7dca@oracle.com> <48d43f24-73d9-ec99-746d-7685f7bf09b9@oracle.com> <6a404b4e-eacb-320d-6dd5-050c1780bfa1@oracle.com> Message-ID: <58e55201-6604-4e69-39a2-e35c4e8224bf@oracle.com> Thanks again for the review Nils! Best regards, Tobias On 04.06.20 13:43, Nils Eliasson wrote: > Good catch! > > Looks good. 
> > Regards, > Nils > > On 2020-06-04 13:40, Tobias Hartmann wrote: >> Hi, >> >> I've missed to handle the case when 'array_copy_requires_gc_barriers' is true (for example, if ZGC >> is enabled). We need to set deoptimize_on_exception=true for the slow call to the arraycopy stub: >> http://cr.openjdk.java.net/~thartmann/8246453/webrev.01/ >> >> Thanks, >> Tobias >> >> On 03.06.20 15:39, Tobias Hartmann wrote: >>> Hi Nils, >>> >>> thanks for the review! >>> >>> Best regards, >>> Tobias >>> >>> On 03.06.20 15:37, Nils Eliasson wrote: >>>> Hi Tobias, >>>> >>>> Looks good! >>>> >>>> Best regards, >>>> Nils >>>> >>>> On 2020-06-03 15:21, Tobias Hartmann wrote: >>>>> Hi, >>>>> >>>>> please review the following patch: >>>>> https://bugs.openjdk.java.net/browse/JDK-8246453 >>>>> http://cr.openjdk.java.net/~thartmann/8246453/webrev.00/ >>>>> >>>>> Similar to old JDK-8033626 [1], we assert when trying to merge two JVMStates with different >>>>> reexecute settings. This only happens with -XX:+StressReflectiveCode because in that case we don't >>>>> fold the array guard and emit code to allocate the array to be cloned into. The fix is to >>>>> simply set >>>>> deoptimize_on_exception for that allocation as well (we already set it for the non-array case >>>>> [2]). >>>>> >>>>> Thanks, >>>>> Tobias >>>>> >>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8033626 >>>>> http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/00c8a1255912 >>>>> [2] http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/00c8a1255912#l3.7 > From aph at redhat.com Thu Jun 4 15:34:23 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 4 Jun 2020 16:34:23 +0100 Subject: [aarch64-port-dev ] RFR(S): 8243597: AArch64: Add support for integer vector abs In-Reply-To: References: Message-ID: <759f7e99-91a4-7c5f-36df-89b3ae96f74f@redhat.com> On 06/05/2020 09:46, Yang Zhang wrote: > Could you please help to review this patch? 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8243597 > Webrev: http://cr.openjdk.java.net/~yzhang/8243597/webrev.00/ I'm working on it. Sorry for the delay. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.kozlov at oracle.com Thu Jun 4 16:00:29 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 4 Jun 2020 09:00:29 -0700 Subject: RFR(XL) 8243380: Update Graal In-Reply-To: <948d9a9f-22d4-b95d-d948-a67ead20ad14@oracle.com> References: <948d9a9f-22d4-b95d-d948-a67ead20ad14@oracle.com> Message-ID: Too many failures including builds. Vladimir On 6/4/20 1:03 AM, Dean Long wrote: > https://bugs.openjdk.java.net/browse/JDK-8243380 > http://cr.openjdk.java.net/~dlong/8243380/webrev/ > > This is a Graal update. Changes since the last update (JDK-8241231) are listed in the bug description. > > dl > > From tobias.hartmann at oracle.com Thu Jun 4 16:09:20 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 4 Jun 2020 18:09:20 +0200 Subject: [15] RFR(S) 8227647: [Graal] Test8009761.java fails due to "RuntimeException: static java.lang.Object compiler.uncommontrap.Test8009761.m3(boolean,boolean) not compiled" In-Reply-To: <86b5b4f7-9e19-f14b-e1fc-9fc5b3cc8f9a@oracle.com> References: <86b5b4f7-9e19-f14b-e1fc-9fc5b3cc8f9a@oracle.com> Message-ID: Hi Vladimir, looks good to me but I would call the method "should_wait_for_compilation" (no new webrev required). Best regards, Tobias On 04.06.20 04:54, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8227647/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8227647 > > The test failed because JVMCI unblocks [1] a blocking (-Xbatch) compilation after 10 sec even though Graal > continues to compile the method. As a result, WB thinks the compilation failed.
> LogCompilation shows that first Graal's compilation (which is not the tested method) can take > 10 > sec on slow machine because Graal is run in Interpreter mode and is gradually compiled by C1 (which > is also consumes CPU resources). > > I suggest to wait compilation finished if request came from testing environment (WB, CTW, replay,...) > > Tested hs-tier1,hs-tier3-graal and many runs of failed test. > Thanks, > Vladimir > > [1] > http://hg.openjdk.java.net/jdk/jdk/file/0a32396f7a69/src/hotspot/share/compiler/compileBroker.cpp#l1583 From aph at redhat.com Thu Jun 4 16:09:41 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 4 Jun 2020 17:09:41 +0100 Subject: [aarch64-port-dev ] RFR(S): 8243597: AArch64: Add support for integer vector abs In-Reply-To: References: Message-ID: On 18/05/2020 06:51, Yang Zhang wrote: > Testing: > Full jtreg test > Vector API tests which cover vector abs > > Test case: > public static void absvs(short[] a, short[] b, short[] c) { > for (int i = 0; i < a.length; i++) { > c[i] = (short)Math.abs((a[i] + b[i])); > } > } > > Assembly code generated by C2: > 0x0000ffffaca3f3ac: ldr q17, [x16, #16] > 0x0000ffffaca3f3b0: ldr q16, [x15, #16] > 0x0000ffffaca3f3b4: add v16.8h, v16.8h, v17.8h > 0x0000ffffaca3f3b8: abs v16.8h, v16.8h > 0x0000ffffaca3f3c0: str q16, [x12, #16] > > Similar test cases for byte/int/long are also tested and NEON abs instruction is generated by C2. Unfortunately the test cases you provided do not include the method absvs(short). I'm not seeing this result. 
All I get with your patch applied in the case of your test TestScalar @Benchmark public void testAbsI() { for (int n = 0; n < LOOP_CNT; n++) { for (int i = 0; i < ia.length; i += 4) { ic[i] = Math.abs(ia[i] + ib[i]); } } } is ;; B18: # out( B18 B19 ) <- in( B17 B18 ) Loop( B18-B18 inner main of N82 strip mined) Freq: 9.69583e+08 0x0000ffff78824da0: sbfiz x11, x4, #2, #32 0x0000ffff78824da4: add x7, x0, x11 ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.TestScalar::testAbsI at 36 (line 44) 0x0000ffff78824da8: add xmethod, x18, x11 ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.TestScalar::testAbsI at 30 (line 44) 0x0000ffff78824dac: ldr w2, [x7,#16] 0x0000ffff78824db0: ldr w13, [xmethod,#16] 0x0000ffff78824db4: add w13, w13, w2 0x0000ffff78824db8: cmp w13, wzr 0x0000ffff78824dbc: cneg w1, w13, lt 0x0000ffff78824dc0: add x11, x3, x11 0x0000ffff78824dc4: str w1, [x11,#16] 0x0000ffff78824dc8: ldr w2, [x7,#32] 0x0000ffff78824dcc: ldr w1, [xmethod,#32] 0x0000ffff78824dd0: add w13, w1, w2 0x0000ffff78824dd4: cmp w13, wzr 0x0000ffff78824dd8: cneg w1, w13, lt 0x0000ffff78824ddc: str w1, [x11,#32] 0x0000ffff78824de0: ldr w13, [x7,#48] 0x0000ffff78824de4: ldr w1, [xmethod,#48] 0x0000ffff78824de8: add w1, w1, w13 0x0000ffff78824dec: cmp w1, wzr 0x0000ffff78824df0: cneg w13, w1, lt 0x0000ffff78824df4: str w13, [x11,#48] ;*iastore {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.TestScalar::testAbsI at 41 (line 44) 0x0000ffff78824df8: ldr w1, [x7,#64] 0x0000ffff78824dfc: ldr w12, [xmethod,#64] 0x0000ffff78824e00: add w12, w12, w1 0x0000ffff78824e04: cmp w12, wzr 0x0000ffff78824e08: cneg w13, w12, lt 0x0000ffff78824e0c: add w4, w4, #0x10 ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.TestScalar::testAbsI at 42 (line 43) 0x0000ffff78824e10: str w13, [x11,#64] ;*iastore {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.TestScalar::testAbsI at 41 (line 44) 0x0000ffff78824e14: cmp w4, w6 0x0000ffff78824e18: b.lt 0x0000ffff78824da0 
                                      ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                      ; - org.openjdk.TestScalar::testAbsI at 17 (line 43)

Please provide me with a Java program that reproduces the result
above, thanks.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From shravya.rukmannagari at intel.com Thu Jun 4 16:19:38 2020
From: shravya.rukmannagari at intel.com (Rukmannagari, Shravya)
Date: Thu, 4 Jun 2020 16:19:38 +0000
Subject: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions
In-Reply-To: <50e5295e-e5aa-9dc2-7435-c95cd4dce7fa@oracle.com>
References: <88e70fc5-e83c-1ae0-f4e7-ae80ce517a70@oracle.com> <7927d1bd-501e-cc3c-06a5-3e8c7e25f38e@oracle.com> <50e5295e-e5aa-9dc2-7435-c95cd4dce7fa@oracle.com>
Message-ID:

Hi Vladimir,
Please find the updated webrev below which now checks for sse4_1 support.
http://cr.openjdk.java.net/~srukmannagar/CRC32/webrev.03/

Thanks,
Shravya.

-----Original Message-----
From: Vladimir Kozlov
Sent: Wednesday, June 3, 2020 3:28 PM
To: Rukmannagari, Shravya ; 'hotspot compiler'
Cc: Tucker, Greg B
Subject: Re: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions

Then you have to check for VM_Version::supports_sse4_1() instead of UseSSE flag.

Note, setting UseSSE flag lower disable 4.1 and 4.2 [1] the same way setting lower UseAVX disable AVX features [2].

Thanks,
Vladimir

[1] http://hg.openjdk.java.net/jdk/jdk/file/839d49bd8d8d/src/hotspot/cpu/x86/vm_version_x86.cpp#l644
[2] http://hg.openjdk.java.net/jdk/jdk/file/839d49bd8d8d/src/hotspot/cpu/x86/vm_version_x86.cpp#l695

On 6/3/20 3:03 PM, Rukmannagari, Shravya wrote:
> Hi Vladimir,
> I have verified that the code does not use SSE4.2, whereas it uses SSE4.1 instruction set.
>
> Thanks,
> Shravya.
>
> -----Original Message-----
> From: Vladimir Kozlov
> Sent: Tuesday, June 2, 2020 10:09 PM
> To: Rukmannagari, Shravya ; 'hotspot
> compiler'
> Cc: Tucker, Greg B
> Subject: Re: [15] RFR(M): 8245512: CRC32 optimization using AVX512
> instructions
>
> How did it fail? UseSSE setting does not affect AVX settings. It seems you are using instructions from sse4.2 but not checking for that.
>
> Vladimir
>
> On 6/2/20 6:15 PM, Rukmannagari, Shravya wrote:
>> Hi Vladimir,
>> The compiler/cpuflags/TestSSE4Disabled.java jtreg test was failing without the check.
>> This test is run with SSE=3 as:
>> run main/othervm -Xcomp -XX:UseSSE=3
>> compiler.cpuflags.TestSSE4Disabled
>> Without the UseSSE > 3 check, the JVM tries to generate the new AVX512 CRC32 stub.
>>
>> Thanks,
>> Shravya.
>>
>> -----Original Message-----
>> From: Vladimir Kozlov
>> Sent: Tuesday, June 2, 2020 5:00 PM
>> To: Rukmannagari, Shravya ; 'hotspot
>> compiler'
>> Cc: Tucker, Greg B
>> Subject: Re: [15] RFR(M): 8245512: CRC32 optimization using AVX512
>> instructions
>>
>> On 6/2/20 3:56 PM, Rukmannagari, Shravya wrote:
>>> Hi Vladimir,
>>> Thanks a lot for the review. I have modified the patch as per your comments. The CRC32 code is now in macroAssembler_x86.cpp.
>>> http://cr.openjdk.java.net/~srukmannagar/CRC32/webrev.02/
>>
>> Why you added UseSSE check in stubGenerator_x86_64.cpp?
>>
>> + if (UseSSE > 3 && VM_Version::supports_avx512_vpclmulqdq() &&
>>
>>>
>>> The stubGenerator_x86_64.cpp would be verified only for 64-bit builds. I have verified the 32-bit builds and also ran the test cases to ensure no issues or failures.
>>
>> You are right about this.
>>
>> Thanks,
>> Vladimir
>>
>>> Please let me know if you have questions or comments.
>>>
>>> Thanks,
>>> Shravya.
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov
>>> Sent: Monday, June 1, 2020 2:36 PM
>>> To: Rukmannagari, Shravya ; 'hotspot
>>> compiler'
>>> Cc: Tucker, Greg B
>>> Subject: Re: [15] RFR(M): 8245512: CRC32 optimization using AVX512
>>> instructions
>>>
>>> Hi Shravya,
>>>
>>> Why you put new CRC32 avx512 code into macroAssembler_x86_aes.cpp file?
>>> This file is used only for AES intrinsic code - nothing else should be there.
>>>
>>> If you think CRC32 code is too large for macroAssembler_x86.cpp I would suggest to move all CRC32 code, old and new, into new macroAssembler_x86_crc32.cpp file.
>>>
>>> I see that you want to implement new code only for 64 bit which is fine and you guarded it correctly with #ifdef _LP64.
>>> But you forgot guard in stubGenerator_x86_64.cpp which will cause build failure for 32-bit.
>>>
>>> It is difficult to judge the implementation code. I hope you ran all tests for it.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 5/20/20 4:01 PM, Rukmannagari, Shravya wrote:
>>>> Hi All,
>>>>
>>>> We would like to contribute optimizations for CRC32 algorithm for upcoming Intel x86_64 platforms.
>>>>
>>>> Contributors:
>>>>
>>>> Shravya Rukmannagari(shravya.rukmannagari at intel.com)
>>>>
>>>> Greg B Tucker(greg.b.tucker at intel.com)
>>>>
>>>> I have tested the patch to confirm correctness and performance. The patch also passes compiler/jtreg tests.
>>>>
>>>> Please take a look and let me know if you have any questions or comments.
>>>>
>>>> Bug Id: https://bugs.openjdk.java.net/browse/JDK-8245512
>>>>
>>>> https://cr.openjdk.java.net/~srukmannagar/CRC32/webrev.01/
>>>>
>>>> Regards,
>>>>
>>>> Shravya Rukmannagari
>>>>

From vladimir.kozlov at oracle.com Thu Jun 4 16:45:55 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 4 Jun 2020 09:45:55 -0700
Subject: [15] RFR(S) 8227647: [Graal] Test8009761.java fails due to "RuntimeException: static java.lang.Object compiler.uncommontrap.Test8009761.m3(boolean,boolean) not compiled"
In-Reply-To:
References: <86b5b4f7-9e19-f14b-e1fc-9fc5b3cc8f9a@oracle.com>
Message-ID: <2869fb87-6e06-656a-59d7-3669938fa23d@oracle.com>

Thank you, Tobias

On 6/4/20 9:09 AM, Tobias Hartmann wrote:
> Hi Vladimir,
>
> looks good to me but I would call the method "should_wait_for_compilation" (no new webrev required).

Changed.

Thanks,
Vladimir

>
> Best regards,
> Tobias
>
> On 04.06.20 04:54, Vladimir Kozlov wrote:
>> http://cr.openjdk.java.net/~kvn/8227647/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8227647
>>
>> Test failed because JVMCI after 10 sec unblock [1] blocking (-Xbatch) compilation even so Graal
>> continue compile the method. As result WB think compilation failed.
>> LogCompilation shows that first Graal's compilation (which is not the tested method) can take > 10
>> sec on slow machine because Graal is run in Interpreter mode and is gradually compiled by C1 (which
>> is also consumes CPU resources).
>>
>> I suggest to wait compilation finished if request came from testing environment (WB, CTW, replay,...)
>>
>> Tested hs-tier1,hs-tier3-graal and many runs of failed test.
>> Thanks,
>> Vladimir
>>
>> [1]
>> http://hg.openjdk.java.net/jdk/jdk/file/0a32396f7a69/src/hotspot/share/compiler/compileBroker.cpp#l1583

From volker.simonis at gmail.com Thu Jun 4 17:13:09 2020
From: volker.simonis at gmail.com (Volker Simonis)
Date: Thu, 4 Jun 2020 19:13:09 +0200
Subject: RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph
In-Reply-To:
References:
Message-ID:

Hi Xin,

Thanks for addressing this issue. It looks like a nice cleanup.
Please find my further comments inline:

On Tue, Jun 2, 2020 at 10:57 PM Liu, Xin wrote:

> Hi,
>
> Could you review this webrev? It fixes a minor problem when users only use
> IGVPrintLevel in Compiler Directives.
> Jbs: https://bugs.openjdk.java.net/browse/JDK-8139046
> Webrev: http://cr.openjdk.java.net/~xliu/8139046/00/webrev/
>
> I move "bool should_print(int level)" from idealGraphPrinter to Compile
> because the later has the information.
> In this way, Compile can allocate _printer on demand. If
> Compile::should_print(level) return true, it guarantees that
> Compile::printer() is not NULL.
> If users pass in CompileCommand="option,Hello::add,intx,IGVPrintLevel,3",
> printer() will only turn on for that compiler thread.
>
>
- Why do you need the extra check for "method() != NULL":

 619   if (should_print(1) && method() != NULL) {

4584   if (should_print(level) && method() != NULL) {

in "Compile::{begin,end_method}". This check wasn't there before. Does it fix an issue?

- I don't see why you need the additional "need" parameter in "IdealGraphPrinter::printer()". The function only gets called with "need == true" anyway, so I think you can remove it.

- Why did you make

 453   IdealGraphPrinter* printer() { return _printer; }

a "const" function?

 453   IdealGraphPrinter* printer() const { return _printer; }

I don't think it is required?

- As an additional cleanup, you can change all "should_print(1)" calls to "should_print()" because "1" is the default parameter anyway.
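
The last cleanup point relies on an ordinary C++ default argument. A minimal sketch follows, assuming a threshold comparison against a per-compilation print level. Only the name should_print and its default level of 1 come from the review; the class CompileSketch and the field _igv_print_level are hypothetical stand-ins, not HotSpot code.

```cpp
// Hypothetical stand-in for the part of Compile under review.
// Only should_print and its default of 1 are taken from the thread.
class CompileSketch {
 public:
  explicit CompileSketch(int igv_print_level)
      : _igv_print_level(igv_print_level) {}

  // With the default argument, should_print() means should_print(1).
  bool should_print(int level = 1) const {
    return _igv_print_level >= level;
  }

 private:
  int _igv_print_level;  // assumed: the configured IGV print level
};
```

With a declaration of this shape, a call written as should_print(1) and a call written as should_print() select the same overload with the same argument, so replacing one by the other changes no behavior.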

Besides that, your change looks good.

Thank you and best regards,
Volker

> Ran hotspot:tier1 using fastdebug build. Only gtest/GTestWrapper.java
> failed.
> That's another issue. Currently, Openjdk can't execute any gtest because
> of a linkage error.
> Error occurred during initialization of VM
> Unable to load native library:
> /backup/jdk/build/linux-x86_64-server-fastdebug/images/jdk/lib/libjava.so:
> symbol JVM_GetPermittedSubclasses version SUNWprivate_1.1 not defined in
> file libjvm.so with link time reference
>
> Thanks,
> --lx
>

From vladimir.kozlov at oracle.com Thu Jun 4 17:16:35 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 4 Jun 2020 10:16:35 -0700
Subject: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions
In-Reply-To:
References: <88e70fc5-e83c-1ae0-f4e7-ae80ce517a70@oracle.com> <7927d1bd-501e-cc3c-06a5-3e8c7e25f38e@oracle.com> <50e5295e-e5aa-9dc2-7435-c95cd4dce7fa@oracle.com>
Message-ID: <5e478bc9-2a36-b8f2-eca9-abcfc19a2239@oracle.com>

Looks good. I submitted our testing. I will let you know when it finished.

Regards,
Vladimir

On 6/4/20 9:19 AM, Rukmannagari, Shravya wrote:
> Hi Vladimir,
> Please find the updated webrev below which now checks for sse4_1 support.
> http://cr.openjdk.java.net/~srukmannagar/CRC32/webrev.03/
>
> Thanks,
> Shravya.

From igor.ignatyev at oracle.com Thu Jun 4 17:25:10 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Thu, 4 Jun 2020 10:25:10 -0700
Subject: [15] RFR(S) 8227647: [Graal] Test8009761.java fails due to "RuntimeException: static java.lang.Object compiler.uncommontrap.Test8009761.m3(boolean,boolean) not compiled"
In-Reply-To: <2869fb87-6e06-656a-59d7-3669938fa23d@oracle.com>
References: <86b5b4f7-9e19-f14b-e1fc-9fc5b3cc8f9a@oracle.com> <2869fb87-6e06-656a-59d7-3669938fa23d@oracle.com>
Message-ID: <7225AEE6-84FD-4E14-8D7F-4B15CD6DF132@oracle.com>

Hi Vladimir,

looks good to me,

Thanks,
-- Igor

> On Jun 4, 2020, at 9:45 AM, Vladimir Kozlov wrote:
>
> Thank you, Tobias
>
> On 6/4/20 9:09 AM, Tobias Hartmann wrote:
>> Hi Vladimir,
>> looks good to me but I would call the method "should_wait_for_compilation" (no new webrev required).
>
> Changed.
>
> Thanks,
> Vladimir

From vladimir.kozlov at oracle.com Thu Jun 4 17:31:23 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 4 Jun 2020 10:31:23 -0700
Subject: [15] RFR(S) 8227647: [Graal] Test8009761.java fails due to "RuntimeException: static java.lang.Object compiler.uncommontrap.Test8009761.m3(boolean,boolean) not compiled"
In-Reply-To: <7225AEE6-84FD-4E14-8D7F-4B15CD6DF132@oracle.com>
References: <86b5b4f7-9e19-f14b-e1fc-9fc5b3cc8f9a@oracle.com> <2869fb87-6e06-656a-59d7-3669938fa23d@oracle.com> <7225AEE6-84FD-4E14-8D7F-4B15CD6DF132@oracle.com>
Message-ID: <0c679cbd-39c7-9e72-c988-9bde893998d9@oracle.com>

Thank you, Igor

Vladimir

On 6/4/20 10:25 AM, Igor Ignatyev wrote:
> Hi Vladimir,
>
> looks good to me,
>
> Thanks,
> -- Igor

From vladimir.kozlov at oracle.com Thu Jun 4 21:07:38 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 4 Jun 2020 14:07:38 -0700
Subject: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions
In-Reply-To: <5e478bc9-2a36-b8f2-eca9-abcfc19a2239@oracle.com>
References: <88e70fc5-e83c-1ae0-f4e7-ae80ce517a70@oracle.com> <7927d1bd-501e-cc3c-06a5-3e8c7e25f38e@oracle.com> <50e5295e-e5aa-9dc2-7435-c95cd4dce7fa@oracle.com> <5e478bc9-2a36-b8f2-eca9-abcfc19a2239@oracle.com>
Message-ID:

Testing passed. You can push it.

Regards,
Vladimir

On 6/4/20 10:16 AM, Vladimir Kozlov wrote:
> Looks good. I submitted our testing. I will let you know when it finished.
>
> Regards,
> Vladimir

From dean.long at oracle.com Thu Jun 4 23:46:37 2020
From: dean.long at oracle.com (Dean Long)
Date: Thu, 4 Jun 2020 16:46:37 -0700
Subject: RFR(XL) 8243380: Update Graal
In-Reply-To:
References: <948d9a9f-22d4-b95d-d948-a67ead20ad14@oracle.com>
Message-ID:

I've updated my repo to the latest changes and I'll rerun testing.

dl

On 6/4/20 9:00 AM, Vladimir Kozlov wrote:
> Too many failures including builds.
>
> Vladimir
>
> On 6/4/20 1:03 AM, Dean Long wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8243380
>> http://cr.openjdk.java.net/~dlong/8243380/webrev/
>>
>> This is a Graal update. Changes since the last update (JDK-8241231)
>> are listed in the bug description.
>>
>> dl
>>
>>

From sandhya.viswanathan at intel.com Fri Jun 5 00:45:45 2020
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 5 Jun 2020 00:45:45 +0000
Subject: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions
In-Reply-To:
References: <88e70fc5-e83c-1ae0-f4e7-ae80ce517a70@oracle.com> <7927d1bd-501e-cc3c-06a5-3e8c7e25f38e@oracle.com> <50e5295e-e5aa-9dc2-7435-c95cd4dce7fa@oracle.com> <5e478bc9-2a36-b8f2-eca9-abcfc19a2239@oracle.com>
Message-ID:

Pushed: http://hg.openjdk.java.net/jdk/jdk/rev/d1cdbb790e8b

Thanks a lot Vladimir!

Best Regards,
Sandhya

-----Original Message-----
From: hotspot-compiler-dev On Behalf Of Vladimir Kozlov
Sent: Thursday, June 04, 2020 2:08 PM
To: Rukmannagari, Shravya ; 'hotspot compiler'
Cc: Tucker, Greg B
Subject: Re: [15] RFR(M): 8245512: CRC32 optimization using AVX512 instructions

Testing passed. You can push it.

Regards,
Vladimir

From xxinliu at amazon.com Fri Jun 5 01:02:42 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Fri, 5 Jun 2020 01:02:42 +0000
Subject: RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph
In-Reply-To:
References:
Message-ID:

Hi, Volker,

Thank you to review it.

- Why do you need the extra check for "method() != NULL":
Yes, I try to avoid a corner case. Previously, _printer is set to NULL when c2 compiles runtime stubs (http://hg.openjdk.java.net/jdk/jdk/file/b06f452c8d61/src/hotspot/share/opto/compile.cpp#l825). That will cause a problem for my patch because I initialize _printer to NULL and create it when we do need to print IdealGraph. If users specify PrintIdealGraphLevel>0, C2 will try to dump a NULL method. It is undefined to invoke begin/end_method() if the C->method() is NULL, so it better guard with the NULL check.
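
The combination described in the paragraph above (a printer created only on demand, plus a guard for compilations that have no method, such as runtime stubs) can be sketched as follows. This is a simplified illustration under assumptions, not the patch itself: LazyCompileSketch, PrinterSketch, MethodSketch and the log vector are hypothetical names, and the level comparison is assumed.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical method descriptor; a runtime stub would have none.
struct MethodSketch { std::string name; };

// Hypothetical printer that records what it was asked to print.
class PrinterSketch {
 public:
  void begin_method(const MethodSketch& m) { _log.push_back("begin " + m.name); }
  const std::vector<std::string>& log() const { return _log; }
 private:
  std::vector<std::string> _log;
};

class LazyCompileSketch {
 public:
  LazyCompileSketch(const MethodSketch* method, int print_level)
      : _method(method), _print_level(print_level) {}

  // Returning true guarantees the printer exists (allocated on demand).
  bool should_print(int level = 1) {
    if (_print_level < level) return false;
    if (!_printer) _printer.reset(new PrinterSketch());
    return true;
  }

  void begin_method() {
    // Skip printing when there is no method instead of dereferencing NULL.
    if (should_print() && _method != nullptr) {
      _printer->begin_method(*_method);
    }
  }

  const PrinterSketch* printer() const { return _printer.get(); }

 private:
  const MethodSketch* _method;             // nullptr models a runtime stub
  int _print_level;                        // assumed print-level setting
  std::unique_ptr<PrinterSketch> _printer; // starts out empty
};
```

In this sketch a compilation with print level 0 never allocates a printer, and a stub compilation with a positive level allocates one but prints nothing.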
Here is the context:
"
Command Line: -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=hello.xml
Host: ip-172-31-94-125, Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz, 48 cores, 184G, Ubuntu 18.04.4 LTS
Time: Thu Jun 4 23:25:00 2020 UTC elapsed time: 0.150427 seconds (0d 0h 0m 0s)

--------------- T H R E A D ---------------

Current thread (0x00007f69a83f3150): JavaThread "C2 CompilerThread0" daemon [_thread_in_vm, id=32990, stack(0x00007f692bc8b000,0x00007f692bd8c000)]

Stack: [0x00007f692bc8b000,0x00007f692bd8c000], sp=0x00007f692bd893c0, free space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xc17c94] IdealGraphPrinter::begin_method()+0x434
V [libjvm.so+0x88e5b1] CompileWrapper::CompileWrapper(Compile*)+0x201
V [libjvm.so+0x89f689] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, bool, DirectiveSet*)+0x4c9
V [libjvm.so+0x1425586] OptoRuntime::generate_stub(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, bool)+0x156
"

I took all of your advice except for that. Could you take a look at the new revision:
http://cr.openjdk.java.net/~xliu/8139046/01/webrev/

To get rid of the param 'bool need', I set PrintIdealGraph to true by default. Previously, if PrintIdealGraph was set, hotspot would initialize a printer object for every compiler thread. That is no longer the case, so there is no extra cost. After this patch, the only use of that flag is to shut down IGVPrinter completely with -XX:-PrintIdealGraph. I hope we can retire it in the future and use PrintIdealGraphLevel instead.

Thanks,
--lx

From: Volker Simonis
Date: Thursday, June 4, 2020 at 10:13 AM
To: "Liu, Xin"
Cc: "hotspot-compiler-dev at openjdk.java.net"
Subject: RE: [EXTERNAL] RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph

Hi Xin,

Thanks for addressing this issue. It looks like a nice cleanup. Please find my further comments inline:

On Tue, Jun 2, 2020 at 10:57 PM Liu, Xin wrote:

Hi,

Could you review this webrev? It fixes a minor problem when users only use IGVPrintLevel in Compiler Directives.
Jbs: https://bugs.openjdk.java.net/browse/JDK-8139046
Webrev: http://cr.openjdk.java.net/~xliu/8139046/00/webrev/

I move "bool should_print(int level)" from idealGraphPrinter to Compile because the latter has the information. In this way, Compile can allocate _printer on demand. If Compile::should_print(level) returns true, it guarantees that Compile::printer() is not NULL.

If users pass in CompileCommand="option,Hello::add,intx,IGVPrintLevel,3", printer() will only turn on for that compiler thread.

- Why do you need the extra check for "method() != NULL":

  619 if (should_print(1) && method() != NULL) {
  4584 if (should_print(level) && method() != NULL) {

  in "Compile::{begin,end_method}". This check wasn't there before. Does it fix an issue?

- I don't see why you need the additional "need" parameter in "IdealGraphPrinter::printer()". The function only gets called with "need == true" anyway, so I think you can remove it.

- Why did you make

  453 IdealGraphPrinter* printer() { return _printer; }

  a "const" function?

  453 IdealGraphPrinter* printer() const { return _printer; }

  I don't think it is required?

- As an additional cleanup, you can change all "should_print(1)" calls to "should_print()" because "1" is the default parameter anyway.

Besides that, your change looks good.

Thank you and best regards,
Volker

Ran hotspot:tier1 using fastdebug build. Only gtest/GTestWrapper.java failed. That's another issue. Currently, OpenJDK can't execute any gtest because of a linkage error.
Error occurred during initialization of VM
Unable to load native library: /backup/jdk/build/linux-x86_64-server-fastdebug/images/jdk/lib/libjava.so: symbol JVM_GetPermittedSubclasses version SUNWprivate_1.1 not defined in file libjvm.so with link time reference

Thanks,
--lx

From dean.long at oracle.com Fri Jun 5 04:27:38 2020
From: dean.long at oracle.com (Dean Long)
Date: Thu, 4 Jun 2020 21:27:38 -0700
Subject: RFR(XL) 8243380: Update Graal
In-Reply-To:
References: <948d9a9f-22d4-b95d-d948-a67ead20ad14@oracle.com>
Message-ID: <8ffb4f56-ea10-26de-feee-b4f060edf2df@oracle.com>

tiers 1-4 results are much better now.

dl

On 6/4/20 4:46 PM, Dean Long wrote:
> I've updated my repo to the latest changes and I'll rerun testing.
>
> dl
>
> On 6/4/20 9:00 AM, Vladimir Kozlov wrote:
>> Too many failures including builds.
>>
>> Vladimir
>>
>> On 6/4/20 1:03 AM, Dean Long wrote:
>>> https://bugs.openjdk.java.net/browse/JDK-8243380
>>> http://cr.openjdk.java.net/~dlong/8243380/webrev/
>>>
>>> This is a Graal update. Changes since the last update (JDK-8241231)
>>> are listed in the bug description.
>>>
>>> dl
>>>
>>>
>

From vladimir.kozlov at oracle.com Fri Jun 5 05:35:34 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 4 Jun 2020 22:35:34 -0700
Subject: RFR(XL) 8243380: Update Graal
In-Reply-To: <8ffb4f56-ea10-26de-feee-b4f060edf2df@oracle.com>
References: <8ffb4f56-ea10-26de-feee-b4f060edf2df@oracle.com>
Message-ID: <60F6E5EE-65A4-4A9A-85B4-2FCB7780E110@oracle.com>

Good.

Thanks
Vladimir

> On Jun 4, 2020, at 9:28 PM, Dean Long wrote:
>
> tiers 1-4 results are much better now.
>
> dl
>
>> On 6/4/20 4:46 PM, Dean Long wrote:
>> I've updated my repo to the latest changes and I'll rerun testing.
>>
>> dl
>>
>>> On 6/4/20 9:00 AM, Vladimir Kozlov wrote:
>>> Too many failures including builds.
>>>
>>> Vladimir
>>>
>>> On 6/4/20 1:03 AM, Dean Long wrote:
>>>> https://bugs.openjdk.java.net/browse/JDK-8243380
>>>> http://cr.openjdk.java.net/~dlong/8243380/webrev/
>>>>
>>>> This is a Graal update. Changes since the last update (JDK-8241231) are listed in the bug description.
>>>>
>>>> dl
>>>>
>>>>
>>
>

From Yang.Zhang at arm.com Fri Jun 5 05:52:51 2020
From: Yang.Zhang at arm.com (Yang Zhang)
Date: Fri, 5 Jun 2020 05:52:51 +0000
Subject: [aarch64-port-dev ] RFR(S): 8243597: AArch64: Add support for integer vector abs
In-Reply-To:
References:
Message-ID:

Hi Andrew

Thanks a lot for your review.

The test cases in TestScalar.java are used to benchmark AbsI.

    @Benchmark
    public void testAbsI() {
        for (int n = 0; n < LOOP_CNT; n++) {
            // The stride of 4 makes auto-vectorization fail, so only an AbsI node is generated.
            for (int i = 0; i < ia.length; i += 4) {
                ic[i] = Math.abs(ia[i] + ib[i]);
            }
        }
    }

To reproduce my result, you can try this test case:

    public static void absvs(short[] a, short[] b, short[] c) {
        for (int i = 0; i < a.length; i++) {
            c[i] = (short)Math.abs((a[i] + b[i]));
        }
    }

Or you can also use the JMH vector test cases, which are used to benchmark vector abs.
http://cr.openjdk.java.net/~yzhang/8243597/TestVect.java

Regards
Yang

-----Original Message-----
From: Andrew Haley
Sent: Friday, June 5, 2020 12:10 AM
To: Yang Zhang ; aarch64-port-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net
Cc: nd
Subject: Re: [aarch64-port-dev ] RFR(S): 8243597: AArch64: Add support for integer vector abs

On 18/05/2020 06:51, Yang Zhang wrote:
> Testing:
> Full jtreg test
> Vector API tests which cover vector abs
>
> Test case:
> public static void absvs(short[] a, short[] b, short[] c) {
>     for (int i = 0; i < a.length; i++) {
>         c[i] = (short)Math.abs((a[i] + b[i]));
>     }
> }
>
> Assembly code generated by C2:
> 0x0000ffffaca3f3ac: ldr q17, [x16, #16]
> 0x0000ffffaca3f3b0: ldr q16, [x15, #16]
> 0x0000ffffaca3f3b4: add v16.8h, v16.8h, v17.8h
> 0x0000ffffaca3f3b8: abs v16.8h, v16.8h
> 0x0000ffffaca3f3c0: str q16, [x12, #16]
>
> Similar test cases for byte/int/long are also tested and NEON abs instruction is generated by C2.

Unfortunately the test cases you provided do not include the method absvs(short). I'm not seeing this result.
All I get with your patch applied in the case of your test TestScalar

    @Benchmark
    public void testAbsI() {
        for (int n = 0; n < LOOP_CNT; n++) {
            for (int i = 0; i < ia.length; i += 4) {
                ic[i] = Math.abs(ia[i] + ib[i]);
            }
        }
    }

is

  ;; B18: # out( B18 B19 ) <- in( B17 B18 ) Loop( B18-B18 inner main of N82 strip mined) Freq: 9.69583e+08
  0x0000ffff78824da0: sbfiz x11, x4, #2, #32
  0x0000ffff78824da4: add x7, x0, x11          ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                               ; - org.openjdk.TestScalar::testAbsI at 36 (line 44)
  0x0000ffff78824da8: add xmethod, x18, x11    ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                               ; - org.openjdk.TestScalar::testAbsI at 30 (line 44)
  0x0000ffff78824dac: ldr w2, [x7,#16]
  0x0000ffff78824db0: ldr w13, [xmethod,#16]
  0x0000ffff78824db4: add w13, w13, w2
  0x0000ffff78824db8: cmp w13, wzr
  0x0000ffff78824dbc: cneg w1, w13, lt
  0x0000ffff78824dc0: add x11, x3, x11
  0x0000ffff78824dc4: str w1, [x11,#16]
  0x0000ffff78824dc8: ldr w2, [x7,#32]
  0x0000ffff78824dcc: ldr w1, [xmethod,#32]
  0x0000ffff78824dd0: add w13, w1, w2
  0x0000ffff78824dd4: cmp w13, wzr
  0x0000ffff78824dd8: cneg w1, w13, lt
  0x0000ffff78824ddc: str w1, [x11,#32]
  0x0000ffff78824de0: ldr w13, [x7,#48]
  0x0000ffff78824de4: ldr w1, [xmethod,#48]
  0x0000ffff78824de8: add w1, w1, w13
  0x0000ffff78824dec: cmp w1, wzr
  0x0000ffff78824df0: cneg w13, w1, lt
  0x0000ffff78824df4: str w13, [x11,#48]       ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                                               ; - org.openjdk.TestScalar::testAbsI at 41 (line 44)
  0x0000ffff78824df8: ldr w1, [x7,#64]
  0x0000ffff78824dfc: ldr w12, [xmethod,#64]
  0x0000ffff78824e00: add w12, w12, w1
  0x0000ffff78824e04: cmp w12, wzr
  0x0000ffff78824e08: cneg w13, w12, lt
  0x0000ffff78824e0c: add w4, w4, #0x10        ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                               ; - org.openjdk.TestScalar::testAbsI at 42 (line 43)
  0x0000ffff78824e10: str w13, [x11,#64]       ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                                               ; - org.openjdk.TestScalar::testAbsI at 41 (line 44)
  0x0000ffff78824e14: cmp w4, w6
  0x0000ffff78824e18: b.lt 0x0000ffff78824da0
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.TestScalar::testAbsI at 17 (line 43) Please provide me with a Java program that reproduces the result above, thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From richard.reingruber at sap.com Fri Jun 5 07:18:53 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 5 Jun 2020 07:18:53 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: Hi, > The mach5 test run is good. Thanks Serguei and thanks to everybody providing feedback! I just pushed the change. Just curious: is mach5 an alias for tier5? And is this mach5 the same as in "Job: mach5-one-rrich-JDK-8238585-2-20200604-1334-11519059" which is the (successful) submit repo job? Thanks, Richard. -----Original Message----- From: serguei.spitsyn at oracle.com Sent: Donnerstag, 4. Juni 2020 04:07 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Hi Richard, The mach5 test run is good. Thanks, Serguei On 6/2/20 10:57, Reingruber, Richard wrote: > Hi Serguei, > >> This looks good to me. > Thanks! > > From an earlier mail: > >> I'm thinking it would be more safe to run full tier5. > I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would > like to push. > > Thanks, Richard. > > -----Original Message----- > From: serguei.spitsyn at oracle.com > Sent: Dienstag, 2. 
Juni 2020 18:55 > To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > This looks good to me. > > Thanks, > Serguei > > > On 5/28/20 09:02, Vladimir Kozlov wrote: >> Vladimir Ivanov is on break currently. >> It looks good to me. >> >> Thanks, >> Vladimir K >> >> On 5/26/20 7:31 AM, Reingruber, Richard wrote: >>> Hi Vladimir, >>> >>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>> Not an expert in JVMTI code base, so can't comment on the actual >>>> changes. >>>> ? From JIT-compilers perspective it looks good. >>> I put out webrev.1 a while ago [1]: >>> >>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >>> Webrev(delta): >>> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >>> >>> You originally suggested to use a handshake to switch a thread into >>> interpreter mode [2]. I'm using >>> a direct handshake now, because I think it is the best fit. >>> >>> May I ask if webrev.1 still looks good to you from JIT-compilers >>> perspective? >>> >>> Can I list you as (partial) Reviewer? >>> >>> Thanks, Richard. >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >>> [2] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >>> >>> -----Original Message----- >>> From: Vladimir Ivanov >>> Sent: Freitag, 7. 
Februar 2020 09:19 >>> To: Reingruber, Richard ; >>> serviceability-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S) 8238585: Use handshake for >>> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >>> compiled methods on stack not_entrant >>> >>> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>> Not an expert in JVMTI code base, so can't comment on the actual >>> changes. >>> >>> ? From JIT-compilers perspective it looks good. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>>> >>>> The change avoids making all compiled methods on stack not_entrant >>>> when switching a java thread to >>>> interpreter only execution for jvmti purposes. It is sufficient to >>>> deoptimize the compiled frames on stack. >>>> >>>> Additionally a handshake is used instead of a vm operation to walk >>>> the stack and do the deoptimizations. >>>> >>>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>>> release builds on all platforms. >>>> >>>> Thanks, Richard. >>>> >>>> See also my question if anyone knows a reason for making the >>>> compiled methods not_entrant: >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>>> >>>> From serguei.spitsyn at oracle.com Fri Jun 5 07:31:01 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 5 Jun 2020 00:31:01 -0700 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: <42417262-7a4c-31cf-73af-55e22cd36627@oracle.com> Hi Richard, On 6/5/20 00:18, Reingruber, Richard wrote: > Hi, > >> The mach5 test run is good. > Thanks Serguei and thanks to everybody providing feedback! 
I just pushed the change. Great, thanks! > Just curious: is mach5 an alias for tier5? The mach5 is a build and test system which also provides CI. Tier5 is one of the testing levels. > And is this mach5 the same as in "Job: > mach5-one-rrich-JDK-8238585-2-20200604-1334-11519059" which is the (successful) submit repo job? Yes. I guess all mach5 jobs have this prefix. Thanks, Serguei > > Thanks, > Richard. > > -----Original Message----- > From: serguei.spitsyn at oracle.com > Sent: Donnerstag, 4. Juni 2020 04:07 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > The mach5 test run is good. > > Thanks, > Serguei > > > On 6/2/20 10:57, Reingruber, Richard wrote: >> Hi Serguei, >> >>> This looks good to me. >> Thanks! >> >> From an earlier mail: >> >>> I'm thinking it would be more safe to run full tier5. >> I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would >> like to push. >> >> Thanks, Richard. >> >> -----Original Message----- >> From: serguei.spitsyn at oracle.com >> Sent: Dienstag, 2. Juni 2020 18:55 >> To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net >> Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant >> >> Hi Richard, >> >> This looks good to me. >> >> Thanks, >> Serguei >> >> >> On 5/28/20 09:02, Vladimir Kozlov wrote: >>> Vladimir Ivanov is on break currently. >>> It looks good to me. 
>>> >>> Thanks, >>> Vladimir K >>> >>> On 5/26/20 7:31 AM, Reingruber, Richard wrote: >>>> Hi Vladimir, >>>> >>>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>>> Not an expert in JVMTI code base, so can't comment on the actual >>>>> changes. >>>>> ? From JIT-compilers perspective it looks good. >>>> I put out webrev.1 a while ago [1]: >>>> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >>>> Webrev(delta): >>>> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >>>> >>>> You originally suggested to use a handshake to switch a thread into >>>> interpreter mode [2]. I'm using >>>> a direct handshake now, because I think it is the best fit. >>>> >>>> May I ask if webrev.1 still looks good to you from JIT-compilers >>>> perspective? >>>> >>>> Can I list you as (partial) Reviewer? >>>> >>>> Thanks, Richard. >>>> >>>> [1] >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >>>> [2] >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >>>> >>>> -----Original Message----- >>>> From: Vladimir Ivanov >>>> Sent: Freitag, 7. Februar 2020 09:19 >>>> To: Reingruber, Richard ; >>>> serviceability-dev at openjdk.java.net; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR(S) 8238585: Use handshake for >>>> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >>>> compiled methods on stack not_entrant >>>> >>>> >>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>> Not an expert in JVMTI code base, so can't comment on the actual >>>> changes. >>>> >>>> ? From JIT-compilers perspective it looks good. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>>>> >>>>> The change avoids making all compiled methods on stack not_entrant >>>>> when switching a java thread to >>>>> interpreter only execution for jvmti purposes. 
It is sufficient to >>>>> deoptimize the compiled frames on stack. >>>>> >>>>> Additionally a handshake is used instead of a vm operation to walk >>>>> the stack and do the deoptimizations. >>>>> >>>>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>>>> release builds on all platforms. >>>>> >>>>> Thanks, Richard. >>>>> >>>>> See also my question if anyone knows a reason for making the >>>>> compiled methods not_entrant: >>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>>>> >>>>> From aph at redhat.com Fri Jun 5 07:53:00 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 5 Jun 2020 08:53:00 +0100 Subject: [aarch64-port-dev ] RFR(S): 8243597: AArch64: Add support for integer vector abs In-Reply-To: References: Message-ID: On 05/06/2020 06:52, Yang Zhang wrote: > The test cases in TestScalar.java are used to benchmark AbsI. When I ask for a java program that reproduces your result, it's not unreasonable for me to expect you to send one. You still haven't, and I don't understand why your test case wasn't provided. Can you please add a benchmark to your JMH benchmarks that actually shows the result you claimed? Thank you? > @Benchmark > public void testAbsI() { > for (int n = 0; n < LOOP_CNT; n++) { > for (int i = 0; i < ia.length; i += 4) { ----------> That stride is *4* will make auto-vectorization fail. Only AbsI node is generated. > ic[i] = Math.abs(ia[i] + ib[i]); > } > } > } > Or you can also use jmh vector test cases which are used to benchmark vector abs. > http://cr.openjdk.java.net/~yzhang/8243597/TestVect.java That is what I have been doing. They do not show the result above. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From richard.reingruber at sap.com Fri Jun 5 08:05:46 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 5 Jun 2020 08:05:46 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: <42417262-7a4c-31cf-73af-55e22cd36627@oracle.com> References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> <42417262-7a4c-31cf-73af-55e22cd36627@oracle.com> Message-ID: I see. Thanks for the explanation :) Richard. -----Original Message----- From: serguei.spitsyn at oracle.com Sent: Freitag, 5. Juni 2020 09:31 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Hi Richard, On 6/5/20 00:18, Reingruber, Richard wrote: > Hi, > >> The mach5 test run is good. > Thanks Serguei and thanks to everybody providing feedback! I just pushed the change. Great, thanks! > Just curious: is mach5 an alias for tier5? The mach5 is a build and test system which also provides CI. Tier5 is one of the testing levels. > And is this mach5 the same as in "Job: > mach5-one-rrich-JDK-8238585-2-20200604-1334-11519059" which is the (successful) submit repo job? Yes. I guess all mach5 jobs have this prefix. Thanks, Serguei > > Thanks, > Richard. > > -----Original Message----- > From: serguei.spitsyn at oracle.com > Sent: Donnerstag, 4. 
Juni 2020 04:07 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > The mach5 test run is good. > > Thanks, > Serguei > > > On 6/2/20 10:57, Reingruber, Richard wrote: >> Hi Serguei, >> >>> This looks good to me. >> Thanks! >> >> From an earlier mail: >> >>> I'm thinking it would be more safe to run full tier5. >> I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would >> like to push. >> >> Thanks, Richard. >> >> -----Original Message----- >> From: serguei.spitsyn at oracle.com >> Sent: Dienstag, 2. Juni 2020 18:55 >> To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net >> Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant >> >> Hi Richard, >> >> This looks good to me. >> >> Thanks, >> Serguei >> >> >> On 5/28/20 09:02, Vladimir Kozlov wrote: >>> Vladimir Ivanov is on break currently. >>> It looks good to me. >>> >>> Thanks, >>> Vladimir K >>> >>> On 5/26/20 7:31 AM, Reingruber, Richard wrote: >>>> Hi Vladimir, >>>> >>>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>>> Not an expert in JVMTI code base, so can't comment on the actual >>>>> changes. >>>>> ? From JIT-compilers perspective it looks good. 
>>>> I put out webrev.1 a while ago [1]: >>>> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >>>> Webrev(delta): >>>> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >>>> >>>> You originally suggested to use a handshake to switch a thread into >>>> interpreter mode [2]. I'm using >>>> a direct handshake now, because I think it is the best fit. >>>> >>>> May I ask if webrev.1 still looks good to you from JIT-compilers >>>> perspective? >>>> >>>> Can I list you as (partial) Reviewer? >>>> >>>> Thanks, Richard. >>>> >>>> [1] >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >>>> [2] >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >>>> >>>> -----Original Message----- >>>> From: Vladimir Ivanov >>>> Sent: Freitag, 7. Februar 2020 09:19 >>>> To: Reingruber, Richard ; >>>> serviceability-dev at openjdk.java.net; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR(S) 8238585: Use handshake for >>>> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >>>> compiled methods on stack not_entrant >>>> >>>> >>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>> Not an expert in JVMTI code base, so can't comment on the actual >>>> changes. >>>> >>>> ? From JIT-compilers perspective it looks good. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>>>> >>>>> The change avoids making all compiled methods on stack not_entrant >>>>> when switching a java thread to >>>>> interpreter only execution for jvmti purposes. It is sufficient to >>>>> deoptimize the compiled frames on stack. >>>>> >>>>> Additionally a handshake is used instead of a vm operation to walk >>>>> the stack and do the deoptimizations. >>>>> >>>>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>>>> release builds on all platforms. >>>>> >>>>> Thanks, Richard. 
>>>>> >>>>> See also my question if anyone knows a reason for making the >>>>> compiled methods not_entrant: >>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>>>> >>>>> From per.liden at oracle.com Fri Jun 5 08:20:22 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 5 Jun 2020 10:20:22 +0200 Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property In-Reply-To: References: Message-ID: <5b1cac8c-7e9b-195a-edfa-6ab972e32bf0@oracle.com> Hi Igor, When looking at the follow-up sub-tasks for this, I see for example this: http://cr.openjdk.java.net/~iignatyev/8246499/webrev.00/test/hotspot/jtreg/gc/z/TestSmallHeap.java.udiff.html Maybe I'm misunderstanding how this is supposed to work, but it looks like this test would now _not_ be executed if I do: make TEST=test/hotspot/jtreg/gc/z/TestSmallHeap.java JTREG="VM_OPTIONS=-XX:+UseZGC" Is that so? In that case, that seems incorrect. cheers, Per On 6/3/20 11:30 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >> 70 lines changed: 66 ins; 0 del; 4 mod > > Hi all, > > could you please review the patch which introduces a new @requires property to filter out the tests which ignore externally provided JVM flags? > > the idea behind this patch is to have a way to clearly mark tests which ignore flags, so > a) it's obvious that they don't execute a flag-guarded code/feature, and extra care should be taken to use them to verify any flag-guarded changed; > b) they can be easily excluded from runs w/ flags. > > @requires and VMProps allows us to achieve both, so it's been decided to add a new property `vm.flagless`. 
`vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other `-Xmixed`; in other words any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting any value to `TEST_VM_FLAGLESS` env. variable. > > this patch adds necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1]. > > please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to use it by 8246387[3]. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8246494 > webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 > testing: marked tests w/ different XX and X flags w/ and w/o TEST_VM_FLAGLESS env. var, and w/o any flags > > [1] https://bugs.openjdk.java.net/browse/JDK-8151707 > [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336 > [3] https://bugs.openjdk.java.net/browse/JDK-8246387 > > Thanks, > -- Igor > From dean.long at oracle.com Fri Jun 5 08:57:52 2020 From: dean.long at oracle.com (Dean Long) Date: Fri, 5 Jun 2020 01:57:52 -0700 Subject: RFR(XL) 8243380: Update Graal In-Reply-To: <60F6E5EE-65A4-4A9A-85B4-2FCB7780E110@oracle.com> References: <8ffb4f56-ea10-26de-feee-b4f060edf2df@oracle.com> <60F6E5EE-65A4-4A9A-85B4-2FCB7780E110@oracle.com> Message-ID: <9c499ad1-0fa2-de8b-f2af-b0b7eaadd51c@oracle.com> Thanks Vladimir. dl On 6/4/20 10:35 PM, Vladimir Kozlov wrote: > Good. 
>
> Thanks
> Vladimir
>
>> On Jun 4, 2020, at 9:28 PM, Dean Long wrote:
>>
>> tiers 1-4 results are much better now.
>>
>> dl
>>
>>> On 6/4/20 4:46 PM, Dean Long wrote:
>>> I've updated my repo to the latest changes and I'll rerun testing.
>>>
>>> dl
>>>
>>>> On 6/4/20 9:00 AM, Vladimir Kozlov wrote:
>>>> Too many failures including builds.
>>>>
>>>> Vladimir
>>>>
>>>> On 6/4/20 1:03 AM, Dean Long wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8243380
>>>>> http://cr.openjdk.java.net/~dlong/8243380/webrev/
>>>>>
>>>>> This is a Graal update. Changes since the last update (JDK-8241231) are listed in the bug description.
>>>>>
>>>>> dl
>>>>>
>>>>>

From Yang.Zhang at arm.com Fri Jun 5 10:35:26 2020
From: Yang.Zhang at arm.com (Yang Zhang)
Date: Fri, 5 Jun 2020 10:35:26 +0000
Subject: [aarch64-port-dev ] RFR(S): 8243597: AArch64: Add support for integer vector abs
In-Reply-To:
References:
Message-ID:

Hi Andrew

Please check this java program.
http://cr.openjdk.java.net/~yzhang/8243597/TestAbs.java
absvs is used to generate the AbsVS node.
abss is used to generate the AbsI node.

I have updated the JMH benchmarks to align them with absvs and abss above. The new results are as follows:

New vector jmh:
http://cr.openjdk.java.net/~yzhang/8243597/TestVectNew.java
New scalar jmh:
http://cr.openjdk.java.net/~yzhang/8243597/TestScalarNew.java

Before:
Benchmark                   (size)  Mode  Cnt     Score    Error  Units
TestVectNew.testVectAbsVB     1024  avgt    5  1221.852 ±  3.336  us/op
TestVectNew.testVectAbsVI     1024  avgt    5  1450.422 ±  6.344  us/op
TestVectNew.testVectAbsVL     1024  avgt    5  1429.934 ±  4.901  us/op
TestVectNew.testVectAbsVS     1024  avgt    5  1227.134 ±  2.901  us/op
TestScalarNew.testAbsI        1024  avgt    5  3777.007 ± 10.067  us/op
TestScalarNew.testAbsL        1024  avgt    5  3776.717 ± 13.776  us/op
TestScalarNew.testAbsS        1024  avgt    5  3153.195 ± 10.175  us/op

After:
Benchmark                   (size)  Mode  Cnt     Score    Error  Units
TestVectNew.testVectAbsVB     1024  avgt    5   147.389 ±  0.921  us/op
TestVectNew.testVectAbsVI     1024  avgt    5   444.318 ± 14.107  us/op
TestVectNew.testVectAbsVL     1024  avgt    5   874.074 ±  2.224  us/op
TestVectNew.testVectAbsVS     1024  avgt    5   224.559 ±  0.902  us/op
TestScalarNew.testAbsI        1024  avgt    5  3087.172 ± 62.372  us/op
TestScalarNew.testAbsL        1024  avgt    5  3113.322 ± 10.237  us/op
TestScalarNew.testAbsS        1024  avgt    5  2723.048 ±  8.338  us/op

The improvement for scalar abs is less pronounced than for vector abs because the patch saves only one instruction.

Before:
  0x0000ffff80b763d8: cmp w12, #0x0
  0x0000ffff80b763dc: neg w11, w12
  0x0000ffff80b763e0: csel w11, w11, w12, lt // lt = tstop

After:
  0x0000ffffa0bd7a38: cmp w12, wzr
  0x0000ffffa0bd7a3c: cneg w13, w12, lt // lt = tstop

P.S. The generated assembly files are also attached.
Before this patch:
http://cr.openjdk.java.net/~yzhang/8243597/TestAbs.java.aarch64.ori.asm
After this patch:
http://cr.openjdk.java.net/~yzhang/8243597/TestAbs.java.aarch64.asm

Regards
Yang

-----Original Message-----
From: Andrew Haley
Sent: Friday, June 5, 2020 3:53 PM
To: Yang Zhang ; aarch64-port-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net
Cc: nd
Subject: Re: [aarch64-port-dev ] RFR(S): 8243597: AArch64: Add support for integer vector abs

On 05/06/2020 06:52, Yang Zhang wrote:
> The test cases in TestScalar.java are used to benchmark AbsI.

When I ask for a java program that reproduces your result, it's not unreasonable for me to expect you to send one. You still haven't, and I don't understand why your test case wasn't provided.

Can you please add a benchmark to your JMH benchmarks that actually shows the result you claimed? Thank you.

> @Benchmark
> public void testAbsI() {
>     for (int n = 0; n < LOOP_CNT; n++) {
>         for (int i = 0; i < ia.length; i += 4) { ----------> That stride is *4* will make auto-vectorization fail. Only AbsI node is generated.
>             ic[i] = Math.abs(ia[i] + ib[i]);
>         }
>     }
> }

> Or you can also use jmh vector test cases which are used to benchmark vector abs.
> http://cr.openjdk.java.net/~yzhang/8243597/TestVect.java

That is what I have been doing. They do not show the result above.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From hohensee at amazon.com Fri Jun 5 13:38:18 2020
From: hohensee at amazon.com (Hohensee, Paul)
Date: Fri, 5 Jun 2020 13:38:18 +0000
Subject: RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph
In-Reply-To:
References:
Message-ID:

Re PrintIdealGraph, I'd consider removing its remaining reference in compile.hpp in favor of

1. Leave the default value of PrintIdealGraph at false. It's confusing to see it set true by default.

2. Make -XX:-PrintIdealGraph a synonym for PrintIdealGraphLevel == 0. I.e., set PrintIdealGraphLevel to 0 in ergo_initialize() if PrintIdealGraph was explicitly set false on the command line. Use

   if (FLAG_IS_CMDLINE(PrintIdealGraph) && !PrintIdealGraph) {
     FLAG_SET_ERGO(PrintIdealGraphLevel, 0);
   }

3. Ignore PrintIdealGraph otherwise, which is what happens now (i.e., +PrintIdealGraph has no effect if PrintIdealGraphLevel == 0, which it is by default). Have should_print() in compile.hpp test for PrintIdealGraphLevel == 0 instead of !PrintIdealGraph.

That way, a future deprecation/removal of PrintIdealGraph is isolated. And, you file an RFE to do that.

Thanks,
Paul

On 6/4/20, 6:06 PM, "hotspot-compiler-dev on behalf of Liu, Xin" wrote:

Hi, Volker,

Thank you for reviewing it.

- Why do you need the extra check for "method() != NULL":
Yes, I am trying to avoid a corner case. Previously, _printer was set to NULL when C2 compiles runtime stubs (http://hg.openjdk.java.net/jdk/jdk/file/b06f452c8d61/src/hotspot/share/opto/compile.cpp#l825). That will cause a problem for my patch because I initialize _printer to NULL and create it only when we do need to print the IdealGraph. If users specify PrintIdealGraphLevel>0, C2 will try to dump a NULL method.
It is undefined to invoke begin/end_method() if the C->method() is NULL, so it better guard with the NULL check. Here is the context: " Command Line: -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=hello.xml Host: ip-172-31-94-125, Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz, 48 cores, 184G, Ubuntu 18.04.4 LTS Time: Thu Jun 4 23:25:00 2020 UTC elapsed time: 0.150427 seconds (0d 0h 0m 0s) --------------- T H R E A D --------------- Current thread (0x00007f69a83f3150): JavaThread "C2 CompilerThread0" daemon [_thread_in_vm, id=32990, stack(0x00007f692bc8b000,0x00007f692bd8c000)] Stack: [0x00007f692bc8b000,0x00007f692bd8c000], sp=0x00007f692bd893c0, free space=1016k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xc17c94] IdealGraphPrinter::begin_method()+0x434 V [libjvm.so+0x88e5b1] CompileWrapper::CompileWrapper(Compile*)+0x201 V [libjvm.so+0x89f689] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, bool, DirectiveSet*)+0x4c9 V [libjvm.so+0x1425586] OptoRuntime::generate_stub(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, bool)+0x156 " I took all your advices except for that. Could you take a look at the new revision: http://cr.openjdk.java.net/~xliu/8139046/01/webrev/ To get rid of the param 'bool need', I set PrintIdealGraph true by default. Previously, if PrintIdealGraph is set, hotspot will initialize a printer object for every Compiler thread. It's not true anymore, so no extra cost. After this patch, the only usage of that flag is to shut down IGVPrinter completely by -XX:-PrintIdealGraph. I wish we can vote it out in the future and use PrintIdealGraphLevel. 
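As a rough illustration of the behavior described above, here is a small model written in Java rather than HotSpot C++. It is only a sketch under stated assumptions: the class, field, and method names are hypothetical stand-ins, and the gating expression is a guess at the intent (PrintIdealGraph acting purely as an off switch, the printer created on demand, and nothing dumped for a compilation without a method), not a copy of the webrev.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical names) of on-demand graph-printer creation. */
public class LazyPrinterSketch {

    /** Stand-in for the graph printer; it only records what it was asked to dump. */
    static final class Printer {
        final List<String> log = new ArrayList<>();

        void beginMethod(String method) {
            log.add("begin " + method);
        }
    }

    /** Stand-in for a single compilation with its own print level. */
    static final class Compilation {
        private final boolean printIdealGraph; // models -XX:+/-PrintIdealGraph
        private final int printLevel;          // models the per-method print level
        private final String method;           // null models a runtime stub
        private Printer printer;               // stays null until first needed

        Compilation(boolean printIdealGraph, int printLevel, String method) {
            this.printIdealGraph = printIdealGraph;
            this.printLevel = printLevel;
            this.method = method;
        }

        /** Returns true only if printing is wanted; in that case the printer exists. */
        boolean shouldPrint(int level) {
            boolean need = printIdealGraph && printLevel >= level;
            if (need && printer == null) {
                printer = new Printer(); // allocate on demand
            }
            return need;
        }

        /** Dumps the method header, guarding against compilations without a method. */
        void beginMethod() {
            if (shouldPrint(1) && method != null) {
                printer.beginMethod(method);
            }
        }

        Printer printer() {
            return printer;
        }
    }

    public static void main(String[] args) {
        Compilation normal = new Compilation(true, 3, "Hello::add");
        normal.beginMethod();
        System.out.println(normal.printer().log);

        Compilation stub = new Compilation(true, 3, null);
        stub.beginMethod();
        System.out.println(stub.printer().log); // nothing dumped for a stub

        Compilation off = new Compilation(false, 3, "Hello::add");
        off.beginMethod();
        System.out.println(off.printer() == null); // printer never allocated
    }
}
```

In this model a compilation that never asks to print never pays for a printer object, which is the "no extra cost" point made above.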
Thanks, --lx From: Volker Simonis Date: Thursday, June 4, 2020 at 10:13 AM To: "Liu, Xin" Cc: "hotspot-compiler-dev at openjdk.java.net" Subject: RE: [EXTERNAL] RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi Xin, Thanks for addressing this issue. It looks like a nice cleanup. Please find my further comments inline: On Tue, Jun 2, 2020 at 10:57 PM Liu, Xin wrote: Hi, Could you review this webrev? It fixes a minor problem when users only use IGVPrintLevel in Compiler Directives. Jbs: https://bugs.openjdk.java.net/browse/JDK-8139046 Webrev: http://cr.openjdk.java.net/~xliu/8139046/00/webrev/ I move "bool should_print(int level)" from idealGraphPrinter to Compile because the later has the information. In this way, Compile can allocate _printer on demand. If Compile::should_print(level) return true, it guarantees that Compile::printer() is not NULL. If users pass in CompileCommand="option,Hello::add,intx,IGVPrintLevel,3", printer() will only turn on for that compiler thread. - Why do you need the extra check for "method() != NULL": 619 if (should_print(1) && method() != NULL) { 4584 if (should_print(level) && method() != NULL) { in "Compile::{begin,end_method}". This check wasn't there before. Does it fix an issue? - I don't see why you need the additional "need" parameter in "IdealGraphPrinter::printer()". The function only gets called with "need == true" anyway, so I think you can remove it. - Why did you make 453 IdealGraphPrinter* printer() { return _printer; } a "const" function? 453 IdealGraphPrinter* printer() const { return _printer; } I don't think it is required? - As an additional cleanup, you can change all "should_print(1)" calls to "should_print()" because "1" is the default parameter anyway. Besides that, your change looks good. 
Thank you and best regards,
Volker

Ran hotspot:tier1 using fastdebug build. Only gtest/GTestWrapper.java failed. That's another issue. Currently, OpenJDK can't execute any gtest because of a linkage error.

Error occurred during initialization of VM
Unable to load native library: /backup/jdk/build/linux-x86_64-server-fastdebug/images/jdk/lib/libjava.so: symbol JVM_GetPermittedSubclasses version SUNWprivate_1.1 not defined in file libjvm.so with link time reference

Thanks,
--lx

From aph at redhat.com Fri Jun 5 15:01:48 2020
From: aph at redhat.com (Andrew Haley)
Date: Fri, 5 Jun 2020 16:01:48 +0100
Subject: [aarch64-port-dev ] RFR(S): 8243597: AArch64: Add support for integer vector abs
In-Reply-To:
References:
Message-ID: <90584282-f46e-e45b-72e9-acea5d01b999@redhat.com>

On 05/06/2020 11:35, Yang Zhang wrote:
> Hi Andrew
>
> Please check this java program.
> http://cr.openjdk.java.net/~yzhang/8243597/TestAbs.java
> absvs is used to generate AbsVS node.
> abss is used to generate AbsI node.
>
> I updated the jmh benchmarks to align them with absvs and abss above. The new results are as follows:
> New vector jmh:
> http://cr.openjdk.java.net/~yzhang/8243597/TestVectNew.java
> New scalar jmh:
> http://cr.openjdk.java.net/~yzhang/8243597/TestScalarNew.java
>
> Before:
> Benchmark                  (size)  Mode  Cnt     Score    Error  Units
> TestVectNew.testVectAbsVB    1024  avgt    5  1221.852 ±  3.336  us/op
> TestVectNew.testVectAbsVI    1024  avgt    5  1450.422 ±  6.344  us/op
> TestVectNew.testVectAbsVL    1024  avgt    5  1429.934 ±  4.901  us/op
> TestVectNew.testVectAbsVS    1024  avgt    5  1227.134 ±  2.901  us/op
> TestScalarNew.testAbsI       1024  avgt    5  3777.007 ± 10.067  us/op
> TestScalarNew.testAbsL       1024  avgt    5  3776.717 ± 13.776  us/op
> TestScalarNew.testAbsS       1024  avgt    5  3153.195 ± 10.175  us/op
>
> After:
> Benchmark                  (size)  Mode  Cnt     Score    Error  Units
> TestVectNew.testVectAbsVB    1024  avgt    5   147.389 ±  0.921  us/op
> TestVectNew.testVectAbsVI    1024  avgt    5   444.318 ± 14.107  us/op
> TestVectNew.testVectAbsVL    1024  avgt    5   874.074 ±  2.224  us/op
> TestVectNew.testVectAbsVS    1024  avgt    5   224.559 ±  0.902  us/op
> TestScalarNew.testAbsI       1024  avgt    5  3087.172 ± 62.372  us/op
> TestScalarNew.testAbsL       1024  avgt    5  3113.322 ± 10.237  us/op
> TestScalarNew.testAbsS       1024  avgt    5  2723.048 ±  8.338  us/op

I tried TestAbs with a ThunderX2, and it certainly looks nice: great improvement across the board.

Benchmark      Mode  Cnt     Score    Error  Units
TestAbs.absvb  avgt    8   971.100 ±  1.544  ns/op
TestAbs.absvs  avgt    8   983.061 ±  1.626  ns/op
TestAbs.absvi  avgt    8  1170.826 ± 11.055  ns/op
TestAbs.absvl  avgt    8  1159.936 ±  3.747  ns/op

Benchmark      Mode  Cnt     Score    Error  Units
TestAbs.absvb  avgt    8   117.981 ±  1.048  ns/op
TestAbs.absvs  avgt    8   174.949 ±  4.158  ns/op
TestAbs.absvi  avgt    8   352.012 ±  0.884  ns/op
TestAbs.absvl  avgt    8   702.076 ±  0.116  ns/op

OK, we're good to go. Thanks, approved.

> The improvement in scalar abs is less obvious than in vector abs because only one instruction is saved compared with before.
> Before:
>   0x0000ffff80b763d8: cmp  w12, #0x0
>   0x0000ffff80b763dc: neg  w11, w12
>   0x0000ffff80b763e0: csel w11, w11, w12, lt  // lt = tstop
>
> After:
>   0x0000ffffa0bd7a38: cmp  w12, wzr
>   0x0000ffffa0bd7a3c: cneg w13, w12, lt  // lt = tstop

That's interesting, too: we don't have a cneg pattern, which is I guess an omission.

> P.S. The generated assembly files are also attached.
> Before this patch:
> http://cr.openjdk.java.net/~yzhang/8243597/TestAbs.java.aarch64.ori.asm
> After this patch:
> http://cr.openjdk.java.net/~yzhang/8243597/TestAbs.java.aarch64.asm

Great. Again, sorry for the slow response.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From christian.hagedorn at oracle.com Fri Jun 5 15:58:59 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Fri, 5 Jun 2020 17:58:59 +0200
Subject: [15] RFR(S): 8244719: CTW: C2 compilation fails with "assert(!VerifyHashTableKeys || _hash_lock == 0) failed: remove node from hash table before modifying it"
Message-ID:

Hi

Please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8244719
http://cr.openjdk.java.net/~chagedorn/8244719/webrev.00/

The assertion failure at [5] can be traced back to a wrong assumption made in Parse::Block::init_graph(). It explicitly states in a comment there that we never call next_path_num() along exception paths [1]. But it turns out that this is only true for bytecode generated by Javac, which does not seem to produce bytecode where an exception handler is reached by an explicit jump or "fall through" bytecode. An exception handler is only reached with an athrow.

However, it is possible to break that assumption with some custom bytecode. The jasm testcase generates such a valid bytecode sequence where an exception handler is reached by jumps from another exception handler:

      69: astore_1
      70: aload_1
      71: aload_0
      72: getfield      #5                  // Field loopCounter:I
      75: bipush        10
      77: if_icmpge     93 // Explicit jump to exception handler, non-Javac
      90: goto          93 // Explicit jump to exception handler, non-Javac
      93: astore_1
      94: return
    Exception table:
       from    to  target type
           0    66    69   Class java/lang/RuntimeException
           0    66    93   Class java/lang/Throwable

This means that the first time Parse::merge_exception() is called for the exception handler block at bci 93, pnum is set to 3 since there are 2 predecessors (2 jumps to it). In the very first call to merge_common(), is_merged() is still false and we record a state. All following calls to merge_common() for this exception block will take the else case [2]. Once we are processing the blocks for the exception handler at bci 69, we call merge() (and therefore next_path_num()) in do_one_block() [3] twice with target_bci = 93 (2 jumps to bci 93). The last time, with pnum = 1 for bci 90: goto, we transform the phi with gvn and set the hash_lock for it to 1 at [4].

Now comes a second bytecode modification trick where we first hit a trap while parsing a block in do_all_blocks(). Therefore, all successor blocks on that path are not merged and are skipped in the first iteration of the loop in do_all_blocks() (at this point these blocks seem to be dead). But later we can have a jump back to such a seemingly dead block again. Those are then processed in the second iteration of the loop in do_all_blocks(). If one of these blocks now additionally throws an exception, we can hit this assertion failure. An example could look as follows:

Example:
// First iteration in do_all_blocks()
Parse B1;
Parse B2; // Hit trap. Stop parsing on that path, skip B3 and B4 which immediately follow B2 and have no other predecessors
Skip B3;  // Was not merged. Assumed to be dead at this point
Skip B4;  // Was not merged. Assumed to be dead at this point
Parse B5; // Discover jump to B3 -> merge B3. Will be processed but only in the next iteration since rpo of B2 is smaller than the one of B5
Parse E1; // Parse exception handler 1 at bci 69
Parse E2; // Parse exception handler 2 at bci 93, apply gvn for phi

// Next iteration in do_all_blocks()
Parse B3; // Is now merged and ready to be parsed. Has exception to E2: call merge_exception() -> merge_common() with E2 as target and pnum > 1. We hit the assertion at [5] since we already applied a transformation for a phi in the last iteration and therefore have a non-zero hash_lock.

As a solution to this problem, I suggest fixing the wrong assumption by changing Parse::Block::init_graph() to also count predecessors for exception blocks. This ensures that [4] is really the last merge for a phi.
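The counting argument can be illustrated with a deliberately simplified Java model. This is a toy under stated assumptions, not the logic in parse1.cpp: the class and method names are hypothetical, every incoming edge goes through the same merge() call, and the order of events from the example above is not reproduced. It only shows why a merge point that undercounts its predecessors treats a merge as "the last one" too early, and why counting all edges avoids that.

```java
/** Toy model (hypothetical names) of finalizing a merge point after its last expected path. */
public class MergeCountSketch {

    static final class MergePoint {
        private final int expectedPreds; // how many incoming paths were counted up front
        private int predsParsed = 0;
        private boolean finalized = false;

        MergePoint(int expectedPreds) {
            this.expectedPreds = expectedPreds;
        }

        /** Counts down towards 1; the value 1 marks the last expected path. */
        int nextPathNum() {
            return expectedPreds - predsParsed++;
        }

        /** Merges one incoming path; returns false if it arrives after finalization. */
        boolean merge() {
            if (finalized) {
                return false; // stands in for the failing assert in the real compiler
            }
            if (nextPathNum() == 1) {
                finalized = true; // treated as the last merge for this merge point
            }
            return true;
        }
    }

    /** Sends 'arrivals' paths into a merge point that was sized for 'counted' predecessors. */
    static boolean allMergesAccepted(int counted, int arrivals) {
        MergePoint m = new MergePoint(counted);
        boolean ok = true;
        for (int i = 0; i < arrivals; i++) {
            ok &= m.merge();
        }
        return ok;
    }

    public static void main(String[] args) {
        // A handler reached by two explicit jumps and one exceptional edge.
        System.out.println(allMergesAccepted(2, 3)); // only the jumps were counted
        System.out.println(allMergesAccepted(3, 3)); // the exceptional edge was counted too
    }
}
```

With only the two jumps counted, the third arrival finds the merge point already finalized; with all three edges counted, finalization happens exactly on the last arrival.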
I did some additional performance testing with standard benchmarks and did not find any regressions. Thank you! Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1314 [2] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1678 [3] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1508 [4] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1773 [5] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1764 From igor.ignatyev at oracle.com Fri Jun 5 16:10:37 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 5 Jun 2020 09:10:37 -0700 Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property In-Reply-To: <5b1cac8c-7e9b-195a-edfa-6ab972e32bf0@oracle.com> References: <5b1cac8c-7e9b-195a-edfa-6ab972e32bf0@oracle.com> Message-ID: Hi Per, you are reading this correctly, make TEST=test/hotspot/jtreg/gc/z/TestSmallHeap.java JTREG="VM_OPTIONS=-XX:+UseZGC" won't execute gc/z/TestSmallHeap.java; and I don't see it to be incorrect. Let me try to explain why using gc/z/TestSmallHeap.java as a running example. A hotspot test is expected not to be just runnable in an out-of-box configuration, but also to serve its purpose as much as possible (which is not always 100% given some tests require special build flavor, environment setup, etc); in other words, a test is to at least have all necessary VM flags within it and not to hope that someone will provide them. gc/z/TestSmallHeap.java does that, it explicitly selects zGC, so there is no need for -XX:+UseZGC to achieve that. 
Given this test can be run only when zGC can be selected, it @requires vm.gc.Z, which is set to true if zGC is already explicitly selected or if zGC is available and no other GC is specified, and the latter holds for an out-of-box configuration (assuming that zGC is available in the JVM under test); thus, again, you don't have to specify -XX:+UseZGC to run this test. So there are no "technical" reasons to run gc/z/TestSmallHeap.java (or any other gc/z/ tests) with -XX:+UseZGC. The proposed patches don't change that fact in any way. The patches exclude the tests that ignore external VM flags from execution if any significant VM flags are specified. gc/z/TestSmallHeap.java ignores all externally provided VM flags, including -XX:+UseZGC. And although in the case of -XX:+UseZGC, it's harmless, in almost all other cases it's not. Just to give you a few examples: Let's say you are fixing a bug in zGC which could be reproduced by gc/z/TestSmallHeap.java. You came up with two alternative solutions, one of which is guarded by `if (UseNewCode)`. To test these solutions, you ran gc/z tests twice: with -XX:+UseZGC -XX:+UseNewCode, and all tests passed; with XX:+UseZGC, and many tests (but not gc/z/TestSmallHeap.java) failed. So based on these results, you decided that the guarded solution is perfect, cleaned up the code, sent it out for review, got it pushed, and minutes later found out that gc/z/TestSmallHeap.java and some other tests which ignore VM flags failed. It would take you some time, to realize that you hadn't tested your UseNewCode solution by these tests. Yet were these tests excluded from your testing, it would be much easier for you to spot that and react accordingly. Here is another scenario, you decided to change the default value of ZUncommit, so you ran different tests with `XX:+UseZGC -XX:-ZUncommit`, all green, you pushed a trivial change s/true/false in z_globals.hpp, next thing you knew a bunch of zGC specific tests failed in CI. 
And again, these were the tests that silently ignored `XX:+UseZGC -XX:-ZUncommit`. Or a slight variation, zGC-supported was added to a future JIT, gc/z tests were run with the flag combination which enabled the future JIT, all passed, the victory was declared; N releases later; default JIT got changed to the future JIT; the next CI build is a disaster, with lots of tests failing from the bugs which had not been found N/2 years ago. Although I understand that it might take some getting used to from you and others who used to run gc/x tests with -XX:+Use${X}GC, I am certain that this will improve the overall quality of hotspot, save not only machine time (from running these tests with other flags) but engineers time from analyzing surprising failures, and increase confidence and trust in the hotspot test suite. In a word, I can see how this can be a bit surprising, yet still less surprising than the current behavior, but I don't see it as incorrect, it just surfaces limitations of certain tests. From my (slightly biased) point of view, it's the right thing to do. Thanks. -- Igor > On Jun 5, 2020, at 1:20 AM, Per Liden wrote: > > Hi Igor, > > When looking at the follow-up sub-tasks for this, I see for example this: > > http://cr.openjdk.java.net/~iignatyev/8246499/webrev.00/test/hotspot/jtreg/gc/z/TestSmallHeap.java.udiff.html > > Maybe I'm misunderstanding how this is supposed to work, but it looks like this test would now _not_ be executed if I do: > > make TEST=test/hotspot/jtreg/gc/z/TestSmallHeap.java JTREG="VM_OPTIONS=-XX:+UseZGC" > > Is that so? In that case, that seems incorrect. > > cheers, > Per > > On 6/3/20 11:30 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >>> 70 lines changed: 66 ins; 0 del; 4 mod >> Hi all, >> could you please review the patch which introduces a new @requires property to filter out the tests which ignore externally provided JVM flags? 
>> the idea behind this patch is to have a way to clearly mark tests which ignore flags, so >> a) it's obvious that they don't execute a flag-guarded code/feature, and extra care should be taken to use them to verify any flag-guarded changed; >> b) they can be easily excluded from runs w/ flags. >> @requires and VMProps allows us to achieve both, so it's been decided to add a new property `vm.flagless`. `vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other `-Xmixed`; in other words any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting any value to `TEST_VM_FLAGLESS` env. variable. >> this patch adds necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1]. >> please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to use it by 8246387[3]. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8246494 >> webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >> testing: marked tests w/ different XX and X flags w/ and w/o TEST_VM_FLAGLESS env. 
var, and w/o any flags >> [1] https://bugs.openjdk.java.net/browse/JDK-8151707 >> [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336 >> [3] https://bugs.openjdk.java.net/browse/JDK-8246387 >> Thanks, >> -- Igor From vladimir.kozlov at oracle.com Fri Jun 5 17:54:25 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 5 Jun 2020 10:54:25 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> Message-ID: <68611167-a277-75c9-8e74-c173502ddeae@oracle.com> I assume it is latest webrev: http://cr.openjdk.java.net/~roland/8223051/webrev.02/ First, I think this change should wait until JDK 16 (which will be very soon). And I need more time to look through changes. In original RFR, Roland, you showed that loop will be transformed into: for (long l = long_start; l < long_stop; ) { int int_stride = (int)long_stride; int int_stop = MIN(long_stop - l, max_jint - int_stride); l += int_stop; for (int i = 0; i < int_stop; i += int_stride) { } } My question is about 'l += int_stop'. Do you optimize only loops which does not have any references to 'l' inside loop? Will you process expressions like (l + i) in inner loops? Thanks, Vladimir On 6/3/20 11:31 PM, Tobias Hartmann wrote: > Hi Roland, > > On 03.06.20 11:16, Roland Westrelin wrote: >>>>> Tobias might wish to run some regression tests on the final changes. >>> >>> I've submitted some testing. Will report back once it finished. >> >> Thanks. 
> > I'm seeing some failures with -XX:StressLongCountedLoop=429496729. Will follow-up offline. > >>> loopnode.cpp: >>> - line 504: "check_stride_overflow" is already in 8244504 >> >> Right but the one in this webrev is for longs so doesn't have the same >> signature. > > Right and as John already mentioned it's probably not worth using templates here. > >>> loopnode.hpp: >>> - line 1462: Shouldn't _loop_invokes/_loop_work also be volatile and be incremented atomically? >> >> I think so too. That they are not incremented atomically implies nobody >> uses them AFAICT. If we want all of them to be handled correctly then >> someone needs to go over Compile::print_statistics() and all methods it >> calls. > > Right that's out of scope of this change. We never consider to have accurate statistic counters. The main criteria is to have them as cheap as possible execution wise. And using atomics was expensive back in old days. Not today. Thanks, Vladimir > > Best regards, > Tobias > From vladimir.kozlov at oracle.com Fri Jun 5 17:56:30 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 5 Jun 2020 10:56:30 -0700 Subject: [15] RFR(S): 8244719: CTW: C2 compilation fails with "assert(!VerifyHashTableKeys || _hash_lock == 0) failed: remove node from hash table before modifying it" In-Reply-To: References: Message-ID: <709b0894-4c30-71ed-4025-a58c4d89fd59@oracle.com> Good. Thanks, Vladimir On 6/5/20 8:58 AM, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8244719 > http://cr.openjdk.java.net/~chagedorn/8244719/webrev.00/ > > The assertion failure at [5] can be traced back to a wrong assumption made in Parse::Block::init_graph(). It explicitly > states in a comment there that we never call next_path_num() along exception paths [1]. 
But it turns out that this is
> only true for bytecode generated by Javac which does not seem to produce bytecode where an exception handler is reached
> by an explicit jump or "fall through" bytecode. An exception handler is only reached with an athrow.
>
> However, it is possible to break that assumption with some custom bytecode. The jasm testcase generates such a valid
> bytecode sequence where an exception handler is reached by jumps from another exception handler:
>
>       69: astore_1
>       70: aload_1
>       71: aload_0
>       72: getfield      #5                  // Field loopCounter:I
>       75: bipush        10
>       77: if_icmpge     93 // Explicit jump to exception handler, non-Javac
>       90: goto          93 // Explicit jump to exception handler, non-Javac
>       93: astore_1
>       94: return
>     Exception table:
>        from    to  target type
>            0    66    69   Class java/lang/RuntimeException
>            0    66    93   Class java/lang/Throwable
>
> This means that the first time Parse::merge_exception() is called for the exception handler block at bci 93, pnum is set
> to 3 since there are 2 predecessors (2 jumps to it). In the very first call to merge_common(), is_merged() is still
> false and we record a state. All following calls to merge_common() for this exception block will take the else case [2].
> Once we are processing the blocks for the exception handler at bci 69, we call merge() (and therefore next_path_num())
> in do_one_block() [3] twice with target_bci = 93 (2 jumps to bci 93). The last time with pnum = 1 for bci 90: goto and
> we transform the phi with gvn and set the hash_lock for it to 1 at [4].
>
> Now comes a second bytecode modification trick where we first hit a trap while parsing a block in do_all_blocks().
> Therefore, all successor blocks on that path are not merged and skipped in the first iteration of the loop in
> do_all_blocks() (at this point these blocks seem to be dead).
But later we can have a jump back to such a seemingly dead > block again. Those are then processed in the second iteration of the loop in do_all_blocks(). If one of these blocks now > additionally throw an exception, we can hit this assertion failure. An example could look as follows: > > Example: > // First iteration in do_all_blocks() > Parse B1; > Parse B2; // Hit trap. Stop parsing on that path, skip on B3 and B4 which immediately follow B2 and have no other > predecessors > Skip B3; // Was not merged. Assumed to be dead at this point > Skip B4; // Was not merged. Assumed to be dead at this point > Parse B5; // Discover jump to B3 -> merge B3. Will be processed but only in the next iteration since rpo of B2 is > smaller than the one of B5 > Parse E1; // Parse exception handler 1 at bci 69 > Parse E2; // Parse exception handler 2 at bci 93, apply gvn for phi > > // Next iteration in do_all_blocks() > Parse B3; // Is now merged and ready to be parsed. Has exception to E2: call merge_exception() -> merge_common() with E2 > as target and pnum > 1. We hit the assertion at [5] since we already applied a transformation for a phi in the last > iteration and therefore have a non-zero hash_lock. > > As a solution to this problem, I suggest to fix the wrong assumption by changing Parse::Block::init_graph() to also > count predecessors for exception blocks. This ensures that [4] is really the last merge for a phi. > > I did some additional performance testing with standard benchmarks and did not find any regressions. > > Thank you! 
>
> Best regards,
> Christian
>
>
> [1] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1314
> [2] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1678
> [3] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1508
> [4] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1773
> [5] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1764

From john.r.rose at oracle.com Fri Jun 5 22:56:48 2020
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 5 Jun 2020 15:56:48 -0700
Subject: RFR(M): 8223051: support loops with long (64b) trip counts
In-Reply-To: <68611167-a277-75c9-8e74-c173502ddeae@oracle.com>
References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <68611167-a277-75c9-8e74-c173502ddeae@oracle.com>
Message-ID:

On Jun 5, 2020, at 10:54 AM, Vladimir Kozlov wrote:
>
> In original RFR, Roland, you showed that loop will be transformed into:
>
> for (long l = long_start; l < long_stop; ) {
>   int int_stride = (int)long_stride;
>   int int_stop = MIN(long_stop - l, max_jint - int_stride);
>   l += int_stop;
>   for (int i = 0; i < int_stop; i += int_stride) {
>   }
> }

That formula is out of date; there is a new one in the source code which is more accurate (see loopnode.cpp):

If the loop has the shape of a counted loop but with a long induction variable, transform the loop in a loop nest: an inner loop that iterates for at most max int iterations with an integer induction variable and an outer loop that iterates over the full range of long values from the initial loop in (at most) max int steps. That is:

L: for (long phi = init; phi < limit; phi += stride) {
  // phi := Phi(L, init, phi + stride)
  ... use phi and (phi + stride) ...
}

==transform=>

const long inner_iters_limit = INT_MAX - stride;
assert(stride <= inner_iters_limit);  // else deopt
assert(limit + stride <= LONG_MAX);
L1: for (long phi1 = init; phi1 < limit; phi1 += stride) {
  // phi1 := Phi(L1, init, phi1 + stride)
  long inner_iters_max = MAX(0, limit + stride - phi1);
  long inner_iters_actual = MIN(inner_iters_max, inner_iters_limit);
  L2: for (int phi2 = 0; phi2 < inner_iters_actual; phi2 += stride) {
    ... use (phi1 + phi2) and (phi1 + phi2 + stride) ...
  }
}

>
> My question is about 'l += int_stop'. Do you optimize only loops which does not have any references to 'l' inside loop?
> Will you process expressions like (l + i) in inner loops?

So to answer your questions, occurrences of the long tripcount "phi" are replaced by a sum of a long (inner loop invariant) and an int (inner trip count), as "phi1+phi2". The expression (l + i) should transform also, since the "l" part is just the original "phi".

-- John

From dean.long at oracle.com Sat Jun 6 09:33:08 2020
From: dean.long at oracle.com (Dean Long)
Date: Sat, 6 Jun 2020 02:33:08 -0700
Subject: RFR: 8246347: [JVMCI] Set is_method_handle_invoke flag accordingly when describing scope in jvmciCodeInstaller
In-Reply-To:
References: <5aec53c6-5e79-aa07-aa97-ca46fccb3f58@oracle.com>
Message-ID:

I found a problem. You need to make CompiledMethod::is_deopt_mh_entry() look like is_deopt_entry() by adding the JVMCI logic that looks backwards by the size of the call instruction.

dl

On 6/4/20 12:03 AM, Yudi Zheng wrote:
> I did not push this yet. It might require changes on the Graal side. I am still thinking about how to merge.
>
> -Yudi
>
>> On 4 Jun 2020, at 01:22, Dean Long wrote:
>>
>> Does this require recent Graal change in order to work correctly?
>>
>> dl
>>
>> On 6/3/20 3:47 PM, Dean Long wrote:
>>> Hi Yudi. I'm seeing an assert in test/jdk/java/lang/invoke/CallSiteTest.java with a debug build. Let me remove my changes and see if it still fails. What testing did you do?
>>>
>>> dl
>>>
>>> On 6/2/20 9:38 AM, Yudi Zheng wrote:
>>>> Hello,
>>>>
>>>> Please review this patch that sets is_method_handle_invoke flag accordingly when describing scope at call site in jvmciCodeInstaller.
>>>>
>>>> http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8246347
>>>>
>>>> Many thanks,
>>>> Yudi

From dean.long at oracle.com Sun Jun 7 06:49:27 2020
From: dean.long at oracle.com (Dean Long)
Date: Sat, 6 Jun 2020 23:49:27 -0700
Subject: RFR(trivial) 8246719: remove LambdaStableNameTest from problem list
Message-ID:

https://bugs.openjdk.java.net/browse/JDK-8246719
http://cr.openjdk.java.net/~dlong/8246719/

This trivial change removes LambdaStableNameTest from the problem list now that it has been fixed by the last Graal update.

dl

From yudi.zheng at oracle.com Sun Jun 7 20:06:52 2020
From: yudi.zheng at oracle.com (Yudi Zheng)
Date: Sun, 7 Jun 2020 22:06:52 +0200
Subject: RFR: 8246347: [JVMCI] Set is_method_handle_invoke flag accordingly when describing scope in jvmciCodeInstaller
In-Reply-To:
References: <5aec53c6-5e79-aa07-aa97-ca46fccb3f58@oracle.com>
Message-ID: <96D2E077-C8A9-4DB6-9107-359C151A004B@oracle.com>

Thanks Dean!
Here is a revision including your suggestion: http://cr.openjdk.java.net/~yzheng/8246347/webrev.01/

-Yudi

> On 6 Jun 2020, at 11:33, Dean Long wrote:
>
> I found a problem. You need to make CompiledMethod::is_deopt_mh_entry() look like is_deopt_entry() by adding the JVMCI logic that looks backwards by the size of the call instruction.
>
> dl
>
> On 6/4/20 12:03 AM, Yudi Zheng wrote:
>> I did not push this yet. It might require changes on the Graal side. I am still thinking about how to merge.
>>
>> -Yudi
>>
>>> On 4 Jun 2020, at 01:22, Dean Long wrote:
>>>
>>> Does this require recent Graal change in order to work correctly?
>>>
>>> dl
>>>
>>> On 6/3/20 3:47 PM, Dean Long wrote:
>>>> Hi Yudi. I'm seeing an assert in test/jdk/java/lang/invoke/CallSiteTest.java with a debug build. Let me remove my changes and see if it still fails. What testing did you do?
>>>>
>>>> dl
>>>>
>>>> On 6/2/20 9:38 AM, Yudi Zheng wrote:
>>>>> Hello,
>>>>>
>>>>> Please review this patch that sets is_method_handle_invoke flag accordingly when describing scope at call site in jvmciCodeInstaller.
>>>>>
>>>>> http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/
>>>>> https://bugs.openjdk.java.net/browse/JDK-8246347
>>>>>
>>>>> Many thanks,
>>>>> Yudi
>

From dean.long at oracle.com Sun Jun 7 21:14:27 2020
From: dean.long at oracle.com (Dean Long)
Date: Sun, 7 Jun 2020 14:14:27 -0700
Subject: RFR: 8246347: [JVMCI] Set is_method_handle_invoke flag accordingly when describing scope in jvmciCodeInstaller
In-Reply-To: <96D2E077-C8A9-4DB6-9107-359C151A004B@oracle.com>
References: <5aec53c6-5e79-aa07-aa97-ca46fccb3f58@oracle.com> <96D2E077-C8A9-4DB6-9107-359C151A004B@oracle.com>
Message-ID: <24dd9111-9119-3b00-fb48-733ef6042cae@oracle.com>

Looks good!

dl

On 6/7/20 1:06 PM, Yudi Zheng wrote:
> Thanks Dean!
> Here is a revision including your suggestion:
> http://cr.openjdk.java.net/~yzheng/8246347/webrev.01/
>
> -Yudi
>
>> On 6 Jun 2020, at 11:33, Dean Long wrote:
>>
>> I found a problem. You need to make
>> CompiledMethod::is_deopt_mh_entry() look like is_deopt_entry() by
>> adding the JVMCI logic that looks backwards by the size of the call
>> instruction.
>>
>> dl
>>
>> On 6/4/20 12:03 AM, Yudi Zheng wrote:
>>> I did not push this yet. It might require changes on the Graal side.
>>> I am still thinking about how to merge.
>>>
>>> -Yudi
>>>
>>>> On 4 Jun 2020, at 01:22, Dean Long wrote:
>>>>
>>>> Does this require recent Graal change in order to work correctly?
>>>>
>>>> dl
>>>>
>>>> On 6/3/20 3:47 PM, Dean Long wrote:
>>>>> Hi Yudi. I'm seeing an assert in
>>>>> test/jdk/java/lang/invoke/CallSiteTest.java with a debug build.
>>>>> Let me remove my changes and see if it still fails. What testing
>>>>> did you do?
>>>>>
>>>>> dl
>>>>>
>>>>> On 6/2/20 9:38 AM, Yudi Zheng wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Please review this patch that sets is_method_handle_invoke flag
>>>>>> accordingly when describing scope at call site in jvmciCodeInstaller.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246347
>>>>>>
>>>>>> Many thanks,
>>>>>> Yudi
>>
>

From christian.hagedorn at oracle.com Mon Jun 8 06:48:18 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Mon, 8 Jun 2020 08:48:18 +0200
Subject: [15] RFR(S): 8244719: CTW: C2 compilation fails with "assert(!VerifyHashTableKeys || _hash_lock == 0) failed: remove node from hash table before modifying it"
In-Reply-To: <709b0894-4c30-71ed-4025-a58c4d89fd59@oracle.com>
References: <709b0894-4c30-71ed-4025-a58c4d89fd59@oracle.com>
Message-ID:

Thank you Vladimir for your review!

Best regards,
Christian

On 05.06.20 19:56, Vladimir Kozlov wrote:
> Good.
>
> Thanks,
> Vladimir
>
> On 6/5/20 8:58 AM, Christian Hagedorn wrote:
>> Hi
>>
>> Please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8244719
>> http://cr.openjdk.java.net/~chagedorn/8244719/webrev.00/
>>
>> The assertion failure at [5] can be traced back to a wrong assumption
>> made in Parse::Block::init_graph(). It explicitly states in a comment
>> there that we never call next_path_num() along exception paths [1].
>> But it turns out that this is only true for bytecode generated by
>> Javac which does not seem to produce bytecode where an exception
>> handler is reached by an explicit jump or "fall through" bytecode. An
>> exception handler is only reached with an athrow.
>>
>> However, it is possible to break that assumption with some custom
>> bytecode. The jasm testcase generates such a valid bytecode sequence
>> where an exception handler is reached by jumps from another exception
>> handler:
>>
>>        69: astore_1
>>        70: aload_1
>>        71: aload_0
>>        72: getfield      #5                  // Field loopCounter:I
>>        75: bipush        10
>>        77: if_icmpge     93 // Explicit jump to exception handler, non-Javac
>>        90: goto          93 // Explicit jump to exception handler, non-Javac
>>        93: astore_1
>>        94: return
>>      Exception table:
>>         from    to  target type
>>             0    66    69   Class java/lang/RuntimeException
>>             0    66    93   Class java/lang/Throwable
>>
>> This means that the first time Parse::merge_exception() is called for
>> the exception handler block at bci 93, pnum is set to 3 since there
>> are 2 predecessors (2 jumps to it). In the very first call to
>> merge_common(), is_merged() is still false and we record a state. All
>> following calls to merge_common() for this exception block will take
>> the else case [2]. Once we are processing the blocks for the exception
>> handler at bci 69, we call merge() (and therefore next_path_num()) in
>> do_one_block() [3] twice with target_bci = 93 (2 jumps to bci 93). The
>> last time with pnum = 1 for bci 90: goto and we transform the phi with
>> gvn and set the hash_lock for it to 1 at [4].
>>
>> Now comes a second bytecode modification trick where we first hit a
>> trap while parsing a block in do_all_blocks(). Therefore, all
>> successor blocks on that path are not merged and skipped in the first
>> iteration of the loop in do_all_blocks() (at this point these blocks
>> seem to be dead). But later we can have a jump back to such a
>> seemingly dead block again. Those are then processed in the second
>> iteration of the loop in do_all_blocks(). If one of these blocks now
>> additionally throw an exception, we can hit this assertion failure. An
>> example could look as follows:
>>
>> Example:
>> // First iteration in do_all_blocks()
>> Parse B1;
>> Parse B2; // Hit trap. Stop parsing on that path, skip on B3 and B4 which immediately follow B2 and have no other predecessors
>> Skip B3; // Was not merged. Assumed to be dead at this point
>> Skip B4; // Was not merged. Assumed to be dead at this point
>> Parse B5; // Discover jump to B3 -> merge B3. Will be processed but only in the next iteration since rpo of B2 is smaller than the one of B5
>> Parse E1; // Parse exception handler 1 at bci 69
>> Parse E2; // Parse exception handler 2 at bci 93, apply gvn for phi
>>
>> // Next iteration in do_all_blocks()
>> Parse B3; // Is now merged and ready to be parsed. Has exception to
>> E2: call merge_exception() -> merge_common() with E2 as target and
>> pnum > 1. We hit the assertion at [5] since we already applied a
>> transformation for a phi in the last iteration and therefore have a
>> non-zero hash_lock.
>>
>> As a solution to this problem, I suggest to fix the wrong assumption
>> by changing Parse::Block::init_graph() to also count predecessors for
>> exception blocks. This ensures that [4] is really the last merge for a
>> phi.
>>
>> I did some additional performance testing with standard benchmarks and
>> did not find any regressions.
>>
>> Thank you!
>>
>> Best regards,
>> Christian
>>
>>
>> [1] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1314
>>
>> [2] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1678
>>
>> [3] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1508
>>
>> [4] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1773
>>
>> [5] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1764
>>

From christian.hagedorn at oracle.com Mon Jun 8 07:34:12 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Mon, 8 Jun 2020 09:34:12 +0200
Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods
In-Reply-To:
References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> <42d73a82-b70e-1a0d-312d-303409840392@oracle.com> <64694163-fdd5-5ccb-3ffb-2027b05a3719@oracle.com> <6e9233a4-b743-5e66-328f-7f91c6a7b292@oracle.com>
Message-ID: <28ba83d3-8228-5c5c-9ed4-925336bf11f3@oracle.com>

Hi Serguei

Thanks for fixing this. I don't have official reviewer status but the changes look good to me.

As we've already discussed, this does not fix JDK-8245128, unfortunately.

Best regards,
Christian

On 04.06.20 01:05, serguei.spitsyn at oracle.com wrote:
> Hi Dean,
>
> Thank you a lot for the review!
> I hope, Christian will have a chance to look at it.
>
> Thanks,
> Serguei
>
>
> On 6/3/20 14:56, Dean Long wrote:
>> Hi Serguei, I like the latest changes so that JVMCI matches C2. Please
>> get another review because this is not a trivial change.
>>
>> dl
>>
>> On 6/3/20 10:06 AM, serguei.spitsyn at oracle.com wrote:
>>> Hi Dean,
>>>
>>> The updated webrev is:
>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.3/
>>>
>>> Probably, the JVMCI part can be simplified.
>>> Only the compile_state line has to be moved up:
>>> + JVMCICompileState compile_state(task);
>>>   // Skip redefined methods
>>> - if (target_handle->is_old()) {
>>> + if (compile_state.target_method_is_old()) {
>>>     failure_reason = "redefined method";
>>>     retry_message = "not retryable";
>>>     compilable = ciEnv::MethodCompilable_never;
>>>   } else {
>>> - JVMCICompileState compile_state(task);
>>> Fixes in the jvmciEnv.?pp are not really needed
>>>
>>> Please, let me know what you think.
>>>
>>> This version does not fail at all (in 300 runs for both C2 and JVMCI).
>>> It seems, other two issues disappeared as well:
>>>
>>> This was seen with the C2:
>>> https://bugs.openjdk.java.net/browse/JDK-8245128
>>>
>>> This was seen with the JVMCI:
>>> https://bugs.openjdk.java.net/browse/JDK-8245446
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 6/1/20 23:40, serguei.spitsyn at oracle.com wrote:
>>>> Hi Dean,
>>>>
>>>> Thank you for the reply.
>>>>
>>>> The problem is I do not fully understand your suggestion, especially
>>>> the part
>>>> about caching the method->is_old() value in the cache_jvmti_state().
>>>>
>>>> This is a preliminary webrev where I tried to implement your suggestion:
>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.2/
>>>>
>>>> This variant is failing in half of test runs for both C1/C2 and JVMCI.
>>>> I think, the root cause is a safepoint in a ThreadInVMfromNative
>>>> destructor.
>>>> Here:
>>>>  232 void ciEnv::cache_jvmti_state() {
>>>>  233   VM_ENTRY_MARK;
>>>>
>>>> Then we check for the target_method_is_old() value which is not
>>>> up-to-date any more.
>>>> I feel, it was correct and more simple before introducing this approach.
>>>> Probably, I'm missing something here.
>>>>
>>>>
>>>> I also have a question about the update fragment:
>>>> 1696   {
>>>> 1697     // Must switch to native to allocate ci_env
>>>> 1698     ThreadToNativeFromVM ttn(thread);
>>>> 1699     ciEnv ci_env((CompileTask*)NULL);
>>>> 1700
>>>> 1701     // Switch back to VM state to do compiler initialization
>>>> 1702     ThreadInVMfromNative tv(thread);
>>>> 1703     ResetNoHandleMark rnhm;
>>>> 1704
>>>> 1705     // Perform per-thread and global initializations
>>>> 1706     comp->initialize();
>>>> 1707   }
>>>> Can we remove the ciEnv object initialization above with the state
>>>> transitions?
>>>> Or it has some side effects?
>>>>
>>>> Please, let me know what you think.
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>> On 6/1/20 15:10, Dean Long wrote:
>>>>> On 5/31/20 11:16 PM, serguei.spitsyn at oracle.com wrote:
>>>>>> Hi Dean,
>>>>>>
>>>>>> To check the is_old as you suggest the target method has to be passed
>>>>>> to the cache_jvmti_state() as argument. Is it what you are suggesting?
>>>>>
>>>>> I believe you can use _task->method()->is_old(), as the ciEnv
>>>>> already has the task.
>>>>>
>>>>>> Just want to make sure I understand you correctly.
>>>>>>
>>>>>> The cache_jvmti_state() and cache_dtrace_flags() are called in the
>>>>>> CompileBroker::init_compiler_runtime() for a ciEnv with the NULL
>>>>>> CompileTask
>>>>>> which looks unnecessary (or I don't understand it):
>>>>>>
>>>>>> bool CompileBroker::init_compiler_runtime() {
>>>>>>   CompilerThread* thread = CompilerThread::current();
>>>>>>   . . .
>>>>>>     ciEnv ci_env((CompileTask*)NULL);
>>>>>>     // Cache Jvmti state
>>>>>>     ci_env.cache_jvmti_state();
>>>>>>     // Cache DTrace flags
>>>>>>     ci_env.cache_dtrace_flags();
>>>>>>
>>>>>
>>>>> These calls look unnecessary to me, as the ci_env will cache these
>>>>> again before compiling a method.
>>>>> I suggest removing these calls. We should make sure the cache
>>>>> fields are initialized to sane values
>>>>> in the ciEnv ctor.
>>>>>
>>>>>> The JVMCI has a separate implementation for ciEnv which is
>>>>>> jvmciEnv and
>>>>>> its own set of cache_jvmti_state() and jvmti_state_changed()
>>>>>> functions.
>>>>>> Both are not called in the JVMCI case.
>>>>>> So, these checks look as broken in JVMCI now.
>>>>>>
>>>>> JVMCI is in better shape, because it doesn't transition out of
>>>>> _thread_in_vm state,
>>>>> but yes it needs similar changes.
>>>>>
>>>>>> Not sure, I have enough compiler knowledge to fix this at this
>>>>>> stage of release.
>>>>>> Would it be better to file a separate hotspot/compiler RFE targeted
>>>>>> to 16?
>>>>>> It can be assigned to me if it helps.
>>>>>>
>>>>>
>>>>> This is a P3 so I believe we have time to fix it for 15. Please go
>>>>> ahead and let's see if
>>>>> we can get it in. I can help with the JVMCI changes if they are
>>>>> not straightforward.
>>>>>
>>>>> dl
>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>
>>>>>> On 5/28/20 10:54, Dean Long wrote:
>>>>>>> Sure, you could just have cache_jvmti_state() return a boolean to
>>>>>>> bail out immediately for is_old.
>>>>>>>
>>>>>>> dl
>>>>>>>
>>>>>>> On 5/28/20 7:23 AM, serguei.spitsyn at oracle.com wrote:
>>>>>>>> Hi Dean,
>>>>>>>>
>>>>>>>> Thank you for looking at this!
>>>>>>>> Okay. Let me check what can be done in this direction.
>>>>>>>> There is no point to cache is_old. The compilation has to bail
>>>>>>>> out if it is discovered to be true.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/28/20 00:59, Dean Long wrote:
>>>>>>>>> This seems OK as long as the memory barriers in the thread
>>>>>>>>> state transitions prevent the C++ compiler from doing something
>>>>>>>>> like reading is_old before reading redefinition_count. I would
>>>>>>>>> feel better if both JVMCI and C1/C2 cached is_old and
>>>>>>>>> redefinition_count at the same time (making sure to be in the
>>>>>>>>> _thread_in_vm state), then bail out based on the cached value
>>>>>>>>> of is_old.
>>>>>>>>>
>>>>>>>>> dl
>>>>>>>>>
>>>>>>>>> On 5/26/20 12:04 AM, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>> On 5/25/20 23:39, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>> Please, review a fix for:
>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245126
>>>>>>>>>>>
>>>>>>>>>>> Webrev:
>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.1/
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Summary:
>>>>>>>>>>>   The Kitchensink stress test with the Instrumentation module
>>>>>>>>>>> enabled does
>>>>>>>>>>>   a lot of class retransformations in parallel with all other
>>>>>>>>>>> stressing.
>>>>>>>>>>>   It provokes the assert at the compiled code installation time:
>>>>>>>>>>>     assert(!method->is_old()) failed: Should not be
>>>>>>>>>>> installing old methods
>>>>>>>>>>>
>>>>>>>>>>>   The problem is that the
>>>>>>>>>>> CompileBroker::invoke_compiler_on_method in C2 version
>>>>>>>>>>>   (non-JVMCI tiered compilation) is missing the check that
>>>>>>>>>>> exists in the JVMCI
>>>>>>>>>>>   part of implementation:
>>>>>>>>>>> 2148     // Skip redefined methods
>>>>>>>>>>> 2149     if (target_handle->is_old()) {
>>>>>>>>>>> 2150       failure_reason = "redefined method";
>>>>>>>>>>> 2151       retry_message = "not retryable";
>>>>>>>>>>> 2152       compilable = ciEnv::MethodCompilable_never;
>>>>>>>>>>> 2153     } else {
>>>>>>>>>>> . . .
>>>>>>>>>>> 2168     }
>>>>>>>>>>>
>>>>>>>>>>>   The fix is to add this check.
>>>>>>>>>>
>>>>>>>>>> Sorry, forgot to explain one thing.
>>>>>>>>>> Compiler code has a special mechanism to ensure the JVMTI
>>>>>>>>>> class redefinition did
>>>>>>>>>> not happen while the method was compiled, so all the
>>>>>>>>>> assumptions remain correct.
>>>>>>>>>> 2190     // Cache Jvmti state
>>>>>>>>>> 2191     ci_env.cache_jvmti_state();
>>>>>>>>>> Part of this is a check that the value of
>>>>>>>>>> JvmtiExport::redefinition_count() is
>>>>>>>>>> cached in ciEnv variable: _jvmti_redefinition_count.
>>>>>>>>>> The JvmtiExport::redefinition_count() value change means a
>>>>>>>>>> class redefinition
>>>>>>>>>> happened which also implies some of methods may become old.
>>>>>>>>>> However, the method being compiled can be already old at the
>>>>>>>>>> point where the
>>>>>>>>>> redefinition counter is cached, so the redefinition counter
>>>>>>>>>> check does not help much.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>>> Testing:
>>>>>>>>>>>   Ran Kitchensink test with the Instrumentation module enabled in mach5
>>>>>>>>>>>   multiple times for 100 times. Without the fix the test normally fails
>>>>>>>>>>>   a couple of times in 200 runs. It does not fail with the fix anymore.
>>>>>>>>>>>   Will also submit hs tiers1-5.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

From rwestrel at redhat.com Mon Jun 8 08:09:21 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 08 Jun 2020 10:09:21 +0200
Subject: RFR(M): 8223051: support loops with long (64b) trip counts
In-Reply-To: <68611167-a277-75c9-8e74-c173502ddeae@oracle.com>
References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <68611167-a277-75c9-8e74-c173502ddeae@oracle.com>
Message-ID: <87r1uqx9la.fsf@redhat.com>

Hi Vladimir,

> First, I think this change should wait until JDK 16 (which will be very soon).
> And I need more time to look through changes.

Sure. That's fine with me. Actually, I will have to rework the change quite a bit because of one of the test failures that Tobias found.

Roland.
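
For readers following the 8223051 thread: the loop-nest shape discussed earlier (an outer loop over long values driving an inner loop with an int trip count) can be sketched in plain Java as below. This is only an illustration, not the C2 implementation and not the patch under review. The class name, the method names, the maxInner parameter, and the choice to advance the outer counter by one whole chunk per iteration are all assumptions of this sketch. It also assumes stride > 0, stride <= maxInner, and that neither limit - init nor limit + maxInner overflows a long.

```java
// Illustrative sketch only: hypothetical names, not HotSpot code.
class LongLoopStripMine {

    // Original shape: a single loop with a long induction variable.
    static long reference(long init, long limit, int stride) {
        long sum = 0;
        for (long phi = init; phi < limit; phi += stride) {
            sum += phi; // stands in for "use phi"
        }
        return sum;
    }

    // Transformed shape: outer long loop, inner int loop.
    // Assumes stride > 0, stride <= maxInner, and no long overflow in
    // (limit - init) or (limit + maxInner).
    static long stripMined(long init, long limit, int stride, int maxInner) {
        long sum = 0;
        // Largest multiple of stride that one inner loop may cover.
        int chunk = (maxInner / stride) * stride;
        for (long phi1 = init; phi1 < limit; phi1 += chunk) {
            // Values left in the range, clamped to what the int loop covers.
            int innerLimit = (int) Math.min(limit - phi1, (long) chunk);
            for (int phi2 = 0; phi2 < innerLimit; phi2 += stride) {
                sum += phi1 + phi2; // stands in for "use (phi1 + phi2)"
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long init = (1L << 33) + 5; // start beyond the int range
        long limit = init + 1_000;
        // Compare both shapes over the same range.
        System.out.println(reference(init, limit, 7) == stripMined(init, limit, 7, 50));
    }
}
```

Because chunk is a multiple of stride, every value visited by the inner loop has the form init + k * stride, so the nest visits the same values as the original loop, and phi2 + stride never exceeds chunk, which keeps the inner counter inside the int range.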

From serguei.spitsyn at oracle.com Mon Jun 8 08:59:25 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 8 Jun 2020 01:59:25 -0700
Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods
In-Reply-To: <28ba83d3-8228-5c5c-9ed4-925336bf11f3@oracle.com>
References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> <42d73a82-b70e-1a0d-312d-303409840392@oracle.com> <64694163-fdd5-5ccb-3ffb-2027b05a3719@oracle.com> <6e9233a4-b743-5e66-328f-7f91c6a7b292@oracle.com> <28ba83d3-8228-5c5c-9ed4-925336bf11f3@oracle.com>
Message-ID: <3d648db5-293f-595c-3f1b-b361080207e7@oracle.com>

Thank you a lot for review, Christian!

Serguei

On 6/8/20 00:34, Christian Hagedorn wrote:
> Hi Serguei
>
> Thanks for fixing this. I don't have official reviewer status but the
> changes look good to me.
>
> As we've already discussed, this does not fix JDK-8245128, unfortunately.
>
> Best regards,
> Christian

From vladimir.kozlov at oracle.com Mon Jun 8 16:09:36 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 8 Jun 2020 09:09:36 -0700
Subject: RFR(trivial) 8246719: remove LambdaStableNameTest from problem list
In-Reply-To:
References:
Message-ID: <3f144ed2-b40b-c5d7-50ab-ca8507f5e75f@oracle.com>

Good.

Thanks,
Vladimir

On 6/6/20 11:49 PM, Dean Long wrote:
> https://bugs.openjdk.java.net/browse/JDK-8246719
> http://cr.openjdk.java.net/~dlong/8246719/
>
> This trivial change removes LambdaStableNameTest from the problem list now that it has been fixed by the last Graal update.
>
> dl

From dean.long at oracle.com Mon Jun 8 22:07:14 2020
From: dean.long at oracle.com (Dean Long)
Date: Mon, 8 Jun 2020 15:07:14 -0700
Subject: RFR(trivial) 8246719: remove LambdaStableNameTest from problem list
In-Reply-To: <3f144ed2-b40b-c5d7-50ab-ca8507f5e75f@oracle.com>
References: <3f144ed2-b40b-c5d7-50ab-ca8507f5e75f@oracle.com>
Message-ID: <9e143af7-0125-d37e-7635-43a70187b609@oracle.com>

Thanks Vladimir.

dl

On 6/8/20 9:09 AM, Vladimir Kozlov wrote:
> Good.
>
> Thanks,
> Vladimir
>
> On 6/6/20 11:49 PM, Dean Long wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8246719
>> http://cr.openjdk.java.net/~dlong/8246719/
>>
>> This trivial change removes LambdaStableNameTest from the problem
>> list now that it has been fixed by the last Graal update.
>>
>> dl

From xxinliu at amazon.com Mon Jun 8 22:59:05 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Mon, 8 Jun 2020 22:59:05 +0000
Subject: RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph
In-Reply-To:
References:
Message-ID: <56AE34D1-1BA5-40FF-B7D9-23B772AF5FB1@amazon.com>

Hi, Paul and Volker,

Yes, it's misleading to set PrintIdealGraph true by default. I set it back and here is a new revision: http://cr.openjdk.java.net/~xliu/8139046/02/webrev/

PrintIdealGraphLevel=0 doesn't work because users might use compiler directive IGVPrintLevel > 0. I extended the range of PrintIdealGraphLevel a little bit. If PrintIdealGraphLevel=-1, it means the user doesn't want to have any Ideal Graph dumped. I make -XX:-PrintIdealGraph a synonym of PrintIdealGraphLevel == -1. The feature is useful when developers start using a lot of compiler directives with IGVPrintLevel. Without it, developers have to modify many directives to turn off PrintIdealGraph.

I still think PrintIdealGraph should go away because PrintIdealGraphLevel can do a better job. As Paul suggested, PrintIdealGraph has been wiped out from share/opto directory.

Here is the new comment of PrintIdealGraphLevel.
intx PrintIdealGraphLevel = 0 {C2 notproduct} {default}
Level of detail of the ideal graph printout. System-wide value, -1=absolutely nothing is printed, 0=nothing except IGVPrintLevel directives, 4=all details printed. Level of detail of printouts can be set on a per-method level as well by using CompileCommand=option.

Thanks,
--lx

On 6/5/20, 6:38 AM, "Hohensee, Paul" wrote:

Re PrintIdealGraph, I'd consider removing its remaining reference in compile.hpp in favor of

1. Leave the default value of PrintIdealGraph at false. It's confusing to see it set true by default.
2. Make -XX:-PrintIdealGraph a synonym for PrintIdealGraphLevel == 0. I.e., set PrintIdealGraphLevel to 0 in ergo_initialize() if PrintIdealGraph was explicitly set false on the command line. Use

    if (FLAG_IS_CMDLINE(PrintIdealGraph) && !PrintIdealGraph) {
      FLAG_SET_ERGO(PrintIdealGraphLevel, 0);
    }

3. Ignore PrintIdealGraph otherwise, which is what happens now (i.e., +PrintIdealGraph has no effect if PrintIdealGraphLevel == 0, which it is by default). Have should_print() in compile.hpp test for PrintIdealGraphLevel == 0 instead of !PrintIdealGraph.

That way, a future deprecation/removal of PrintIdealGraph is isolated. And, you file an RFE to do that.

Thanks,
Paul

On 6/4/20, 6:06 PM, "hotspot-compiler-dev on behalf of Liu, Xin" wrote:

Hi, Volker,

Thank you for reviewing it.

- Why do you need the extra check for "method() != NULL":
Yes, I try to avoid a corner case. Previously, _printer is set to NULL when c2 compiles runtime stubs (http://hg.openjdk.java.net/jdk/jdk/file/b06f452c8d61/src/hotspot/share/opto/compile.cpp#l825). That will cause a problem for my patch because I initialize _printer to NULL and create it when we do need to print IdealGraph. If users specify PrintIdealGraphLevel>0, C2 will try to dump a NULL method. It is undefined to invoke begin/end_method() if the C->method() is NULL, so it is better to guard it with the NULL check.
Here is the context: " Command Line: -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=hello.xml Host: ip-172-31-94-125, Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz, 48 cores, 184G, Ubuntu 18.04.4 LTS Time: Thu Jun 4 23:25:00 2020 UTC elapsed time: 0.150427 seconds (0d 0h 0m 0s) --------------- T H R E A D --------------- Current thread (0x00007f69a83f3150): JavaThread "C2 CompilerThread0" daemon [_thread_in_vm, id=32990, stack(0x00007f692bc8b000,0x00007f692bd8c000)] Stack: [0x00007f692bc8b000,0x00007f692bd8c000], sp=0x00007f692bd893c0, free space=1016k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xc17c94] IdealGraphPrinter::begin_method()+0x434 V [libjvm.so+0x88e5b1] CompileWrapper::CompileWrapper(Compile*)+0x201 V [libjvm.so+0x89f689] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, bool, DirectiveSet*)+0x4c9 V [libjvm.so+0x1425586] OptoRuntime::generate_stub(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, bool)+0x156 " I took all your advices except for that. Could you take a look at the new revision: http://cr.openjdk.java.net/~xliu/8139046/01/webrev/ To get rid of the param 'bool need', I set PrintIdealGraph true by default. Previously, if PrintIdealGraph is set, hotspot will initialize a printer object for every Compiler thread. It's not true anymore, so no extra cost. After this patch, the only usage of that flag is to shut down IGVPrinter completely by -XX:-PrintIdealGraph. I wish we can vote it out in the future and use PrintIdealGraphLevel. Thanks, --lx From: Volker Simonis Date: Thursday, June 4, 2020 at 10:13 AM To: "Liu, Xin" Cc: "hotspot-compiler-dev at openjdk.java.net" Subject: RE: [EXTERNAL] RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph CAUTION: This email originated from outside of the organization. 
Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi Xin, Thanks for addressing this issue. It looks like a nice cleanup. Please find my further comments inline: On Tue, Jun 2, 2020 at 10:57 PM Liu, Xin wrote: Hi, Could you review this webrev? It fixes a minor problem when users only use IGVPrintLevel in Compiler Directives. Jbs: https://bugs.openjdk.java.net/browse/JDK-8139046 Webrev: http://cr.openjdk.java.net/~xliu/8139046/00/webrev/ I move "bool should_print(int level)" from idealGraphPrinter to Compile because the later has the information. In this way, Compile can allocate _printer on demand. If Compile::should_print(level) return true, it guarantees that Compile::printer() is not NULL. If users pass in CompileCommand="option,Hello::add,intx,IGVPrintLevel,3", printer() will only turn on for that compiler thread. - Why do you need the extra check for "method() != NULL": 619 if (should_print(1) && method() != NULL) { 4584 if (should_print(level) && method() != NULL) { in "Compile::{begin,end_method}". This check wasn't there before. Does it fix an issue? - I don't see why you need the additional "need" parameter in "IdealGraphPrinter::printer()". The function only gets called with "need == true" anyway, so I think you can remove it. - Why did you make 453 IdealGraphPrinter* printer() { return _printer; } a "const" function? 453 IdealGraphPrinter* printer() const { return _printer; } I don't think it is required? - As an additional cleanup, you can change all "should_print(1)" calls to "should_print()" because "1" is the default parameter anyway. Besides that, your change looks good. Thank you and best regards, Volker Ran hotspot:tier1 using fastdebug build. Only gtest/GTestWrapper.java failed. That's another issue. Currently, Openjdk can't execute any gtest because of a linkage error. 
Error occurred during initialization of VM
Unable to load native library: /backup/jdk/build/linux-x86_64-server-fastdebug/images/jdk/lib/libjava.so: symbol JVM_GetPermittedSubclasses version SUNWprivate_1.1 not defined in file libjvm.so with link time reference

Thanks,
--lx

From hohensee at amazon.com Tue Jun 9 14:36:32 2020
From: hohensee at amazon.com (Hohensee, Paul)
Date: Tue, 9 Jun 2020 14:36:32 +0000
Subject: RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph
In-Reply-To: <56AE34D1-1BA5-40FF-B7D9-23B772AF5FB1@amazon.com>
References: <56AE34D1-1BA5-40FF-B7D9-23B772AF5FB1@amazon.com>
Message-ID:

Looks good for my part.

Thanks,
Paul

On 6/8/20, 3:59 PM, "Liu, Xin" wrote:

Hi, Paul and Volker,

Yes, it's misleading to set PrintIdealGraph true by default. I set it back and here is a new revision: http://cr.openjdk.java.net/~xliu/8139046/02/webrev/

PrintIdealGraphLevel=0 doesn't work because users might use compiler directive IGVPrintLevel > 0. I extended the range of PrintIdealGraphLevel a little bit: if PrintIdealGraphLevel=-1, it means the user doesn't want any Ideal Graph dumped. I make -XX:-PrintIdealGraph a synonym of PrintIdealGraphLevel == -1. The feature is useful when developers start using a lot of compiler directives with IGVPrintLevel. Without it, developers have to modify many directives to turn off PrintIdealGraph.

I still think PrintIdealGraph should go away because PrintIdealGraphLevel can do a better job. As Paul suggested, PrintIdealGraph has been wiped out from share/opto directory.

Here is the new comment of PrintIdealGraphLevel.

intx PrintIdealGraphLevel = 0 {C2 notproduct} {default}
Level of detail of the ideal graph printout. System-wide value, -1=absolutely nothing is printed, 0=nothing except IGVPrintLevel directives, 4=all details printed. Level of detail of printouts can be set on a per-method level as well by using CompileCommand=option.
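The flag comment quoted above can be read as a small decision rule. The following is only a sketch of that reading, not HotSpot's actual Compile::should_print(): the names effective_should_print and kNoDirective are hypothetical, and the rule that a per-method IGVPrintLevel directive takes precedence over the system-wide value (unless that value is -1) is an assumption drawn from the comment text.

```cpp
#include <cassert>

// Hypothetical sentinel: no per-method IGVPrintLevel directive is present.
const int kNoDirective = -2;

// Hypothetical model of the documented flag semantics (not HotSpot code).
//   global_level    : system-wide PrintIdealGraphLevel, expected in -1..4
//   directive_level : per-method IGVPrintLevel, or kNoDirective
//   requested_level : detail level of the dump being considered
bool effective_should_print(int global_level,
                            int directive_level,
                            int requested_level = 1) {
  assert(global_level >= -1 && global_level <= 4);
  if (global_level == -1) {
    return false;  // -1: absolutely nothing is printed, directives included
  }
  // Assumption: a per-method directive overrides the system-wide value.
  int method_level =
      (directive_level == kNoDirective) ? global_level : directive_level;
  return method_level >= requested_level;
}
```

Under this reading, a system-wide value of 0 prints nothing on its own but still lets a directive with IGVPrintLevel 3 produce dumps up to level 3, while -1 silences every directive without editing them one by one.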
From christian.hagedorn at oracle.com Tue Jun 9 15:26:22 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 9 Jun 2020 17:26:22 +0200 Subject: [15] RFR(S): 8246203: Segmentation fault in verification due to stack overflow with -XX:+VerifyIterativeGVN Message-ID: Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8246203 http://cr.openjdk.java.net/~chagedorn/8246203/webrev.00/ The testcase creates a deep graph with a lot of nodes on a chain. When running with -XX:+VerifyIterativeGVN, it recursively calls Node::verify_recur() for each input node discovered which eventually results in a segmentation fault due to a stack overflow (around 10000 recursive calls due to such a long chain of nodes). The fix just converts the recursive algorithm into an iterative one to avoid a segmentation fault. Thank you! Best regards, Christian From vladimir.kozlov at oracle.com Tue Jun 9 16:54:56 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 9 Jun 2020 09:54:56 -0700 Subject: [15] RFR(S): 8246203: Segmentation fault in verification due to stack overflow with -XX:+VerifyIterativeGVN In-Reply-To: References: Message-ID: I think the check should be 'verify_depth > 0' because verify_depth can be negative: + verify_depth--; // Visiting the first node on depth 1 + bool add_to_worklist = verify_depth != 0; Or it is intentional for negative value to visit all nodes? Then it needs comment. In such case you need restore verify_depth == 0 check to return otherwise with 0 the code will work as with negative value: if (verify_depth == 0) { return; } bool add_to_worklist = true; Or may be use assert(verify_depth != 0, "sanity") instead of check.
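The recursion-to-worklist conversion and the depth convention discussed in this thread (negative depth visits everything, depth 1 visits only the start node, depth 0 is rejected by an assert) can be sketched as follows. This is an illustration under stated assumptions, not the code from the webrev: Node here is a hypothetical stand-in with only input edges, and visit_iteratively and visit_chain are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a C2 node: only the input edges matter here.
struct Node {
  std::vector<Node*> inputs;
};

// Visits `root` and its transitive inputs without recursion and returns the
// number of distinct nodes visited. A negative `verify_depth` means no limit;
// a positive one limits the walk to that many levels (root is level 1).
std::size_t visit_iteratively(Node* root, int verify_depth) {
  assert(verify_depth != 0 && "sanity: depth 0 would visit nothing");
  std::unordered_set<Node*> visited;
  std::vector<std::pair<Node*, int> > worklist;  // (node, level of node)
  worklist.push_back(std::make_pair(root, 1));
  visited.insert(root);
  // Index-based iteration processes nodes in breadth-first order, so each
  // node is first reached at its smallest level.
  for (std::size_t i = 0; i < worklist.size(); i++) {
    Node* n = worklist[i].first;
    int level = worklist[i].second;
    // A real verifier would check `n` here.
    bool add_to_worklist = verify_depth < 0 || level < verify_depth;
    if (!add_to_worklist) {
      continue;
    }
    for (std::size_t j = 0; j < n->inputs.size(); j++) {
      Node* in = n->inputs[j];
      if (in != nullptr && visited.insert(in).second) {
        worklist.push_back(std::make_pair(in, level + 1));
      }
    }
  }
  return visited.size();
}

// Usage helper: builds a chain where node i has node i+1 as its only input,
// then walks it from the head with the given depth.
std::size_t visit_chain(std::size_t length, int verify_depth) {
  std::vector<Node> nodes(length);
  for (std::size_t i = 0; i + 1 < length; i++) {
    nodes[i].inputs.push_back(&nodes[i + 1]);
  }
  return visit_iteratively(&nodes[0], verify_depth);
}
```

Because the pending nodes live in a heap-allocated worklist instead of on the call stack, a chain of 100000 nodes is walked without deep recursion, which is the failure mode the bug report describes for roughly 10000 recursive calls.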
Thanks,
Vladimir

On 6/9/20 8:26 AM, Christian Hagedorn wrote:
> Hi
>
> Please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8246203
> http://cr.openjdk.java.net/~chagedorn/8246203/webrev.00/
>
> The testcase creates a deep graph with a lot of nodes on a chain. When running with -XX:+VerifyIterativeGVN, it
> recursively calls Node::verify_recur() for each input node discovered which eventually results in a segmentation fault
> due to a stack overflow (around 10000 recursive calls due to such a long chain of nodes). The fix just converts the
> recursive algorithm into an iterative one to avoid a segmentation fault.
>
> Thank you!
>
> Best regards,
> Christian

From christian.hagedorn at oracle.com Tue Jun 9 17:32:39 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Tue, 9 Jun 2020 19:32:39 +0200
Subject: [15] RFR(S): 8246203: Segmentation fault in verification due to stack overflow with -XX:+VerifyIterativeGVN
In-Reply-To: References: Message-ID:

Hi Vladimir

On 09.06.20 18:54, Vladimir Kozlov wrote:
> I think the check should be 'verify_depth > 0' because verify_depth can
> be negative:
>
> +  verify_depth--; // Visiting the first node on depth 1
> +  bool add_to_worklist = verify_depth != 0;
>
> Or it is intentional for negative value to visit all nodes? Then it
> needs comment.

Yes, with negative values it visit all nodes. There is a comment about it above:

2155 // Verify all nodes if verify_depth is negative
2156 void Node::verify(Node* n, int verify_depth) {

But maybe I should add another comment for add_to_worklist as well to make it more clear.

> In such case you need restore verify_depth == 0 check to return
> otherwise with 0 the code will work as with negative value:
>   if (verify_depth == 0) {
>     return;
>   }
>   bool add_to_worklist = true;
>
> Or may be use assert(verify_depth != 0, "sanity") instead of check.

I like the solution of an assert.
I added it at the start of the method together with the additional comment in a new webrev. It only initializes add_to_worklist with false if verify() is called with verify_depth = 1. http://cr.openjdk.java.net/~chagedorn/8246203/webrev.01/ Best regards, Christian > On 6/9/20 8:26 AM, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8246203 >> http://cr.openjdk.java.net/~chagedorn/8246203/webrev.00/ >> >> The testcase creates a deep graph with a lot of nodes on a chain. When >> running with -XX:+VerifyIterativeGVN, it recursively calls >> Node::verify_recur() for each input node discovered which eventually >> results in a segmentation fault due to a stack overflow (around 10000 >> recursive calls due to such a long chain of nodes). The fix just >> converts the recursive algorithm into an iterative one to avoid a >> segmentation fault. >> >> Thank you! >> >> Best regards, >> Christian From vladimir.kozlov at oracle.com Tue Jun 9 17:54:34 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 9 Jun 2020 10:54:34 -0700 Subject: [15] RFR(S): 8246203: Segmentation fault in verification due to stack overflow with -XX:+VerifyIterativeGVN In-Reply-To: References: Message-ID: <1dd809b6-ad6d-dfbf-9c9c-e83804197a75@oracle.com> Good. Thanks, Vladimir On 6/9/20 10:32 AM, Christian Hagedorn wrote: > Hi Vladimir > > On 09.06.20 18:54, Vladimir Kozlov wrote: >> I think the check should be 'verify_depth > 0' because verify_depth can be negative: >> >> +? verify_depth--; // Visiting the first node on depth 1 >> +? bool add_to_worklist = verify_depth != 0; >> >> Or it is intentional for negative value to visit all nodes? Then it needs comment. > > Yes, with negative values it visit all nodes. 
There is a comment about it above:
>
> 2155 // Verify all nodes if verify_depth is negative
> 2156 void Node::verify(Node* n, int verify_depth) {
>
> But maybe I should add another comment for add_to_worklist as well to make it more clear.
>
>> In such case you need restore verify_depth == 0 check to return otherwise with 0 the code will work as with negative
>> value:
>>   if (verify_depth == 0) {
>>     return;
>>   }
>>   bool add_to_worklist = true;
>>
>> Or may be use assert(verify_depth != 0, "sanity") instead of check.
>
> I like the solution of an assert. I added it at the start of the method together with the additional comment in a new
> webrev. It only initializes add_to_worklist with false if verify() is called with verify_depth = 1.
>
> http://cr.openjdk.java.net/~chagedorn/8246203/webrev.01/
>
> Best regards,
> Christian
>
>
>> On 6/9/20 8:26 AM, Christian Hagedorn wrote:
>>> Hi
>>>
>>> Please review the following patch:
>>> https://bugs.openjdk.java.net/browse/JDK-8246203
>>> http://cr.openjdk.java.net/~chagedorn/8246203/webrev.00/
>>>
>>> The testcase creates a deep graph with a lot of nodes on a chain. When running with -XX:+VerifyIterativeGVN, it
>>> recursively calls Node::verify_recur() for each input node discovered which eventually results in a segmentation
>>> fault due to a stack overflow (around 10000 recursive calls due to such a long chain of nodes). The fix just converts
>>> the recursive algorithm into an iterative one to avoid a segmentation fault.
>>>
>>> Thank you!
>>>
>>> Best regards,
>>> Christian

From maurizio.cimadamore at oracle.com Wed Jun 10 01:12:49 2020
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Wed, 10 Jun 2020 02:12:49 +0100
Subject: testing support loops with long (64b) trip counts
Message-ID:

Hi,
first of all, thanks Roland for working on this; this has been a considerable bottleneck for the Panama Foreign Memory Access API over the last year or so.
In the implementation of the API we are actually applying a few tricks, so that we detect "small" segments at creation (e.g. segments whose size fits into an int). If that's the case, then we use some logic to perform int computations instead of long computations, which speeds things up a bit. But this trick alone is not enough: it only works if the client writes the loop using an `int` loop variable; if a `long` variable is used, performance is significantly degraded. This is a pity, because the VarHandles used by the Foreign Memory Access API use long coordinates, so the client has to be very careful in casting the int variable back into a long (or an inexact var handle call will take place).

I tried several combinations w/ and w/o your patch, to see if any improvements were possible. I also tried, for each combo, to run with and without the small segment hack, to see the difference. The patch I applied on top of the Panama foreign-memaccess branch [1] can be found at [2].

Here are some results:

baseline w/ workaround

Benchmark                                  Mode  Cnt  Score   Error  Units
LoopOverNonConstant.segment_loop           avgt   30  0.330 ± 0.015  ms/op
LoopOverNonConstant.segment_loop_readonly  avgt   30  0.354 ± 0.004  ms/op
LoopOverNonConstant.segment_loop_slice     avgt   30  0.347 ± 0.006  ms/op

Benchmark                                      Mode  Cnt  Score   Error  Units
LoopOverNonConstantLong.segment_loop           avgt   30  1.695 ± 0.061  ms/op
LoopOverNonConstantLong.segment_loop_readonly  avgt   30  1.660 ± 0.089  ms/op
LoopOverNonConstantLong.segment_loop_slice     avgt   30  1.684 ± 0.057  ms/op

baseline w/o workaround

Benchmark                                  Mode  Cnt  Score   Error  Units
LoopOverNonConstant.segment_loop           avgt   30  0.484 ± 0.034  ms/op
LoopOverNonConstant.segment_loop_readonly  avgt   30  0.502 ± 0.012  ms/op
LoopOverNonConstant.segment_loop_slice     avgt   30  0.501 ± 0.012  ms/op

Benchmark                                      Mode  Cnt  Score   Error  Units
LoopOverNonConstantLong.segment_loop           avgt   30  1.377 ± 0.026  ms/op
LoopOverNonConstantLong.segment_loop_readonly  avgt   30  1.173 ± 0.023  ms/op
LoopOverNonConstantLong.segment_loop_slice     avgt   30  1.170 ± 0.029  ms/op

baseline w/o workaround + proposed patch

Benchmark                                  Mode  Cnt  Score   Error  Units
LoopOverNonConstant.segment_loop           avgt   30  0.530 ± 0.042  ms/op
LoopOverNonConstant.segment_loop_readonly  avgt   30  0.508 ± 0.013  ms/op
LoopOverNonConstant.segment_loop_slice     avgt   30  0.520 ± 0.016  ms/op

Benchmark                                      Mode  Cnt  Score   Error  Units
LoopOverNonConstantLong.segment_loop           avgt   30  1.575 ± 0.066  ms/op
LoopOverNonConstantLong.segment_loop_readonly  avgt   30  1.517 ± 0.020  ms/op
LoopOverNonConstantLong.segment_loop_slice     avgt   30  1.496 ± 0.042  ms/op

Overall, unless I did some mistake, it doesn't look like the patch is changing much. The baseline + workaround for small segment remains the fastest version around, which performs on par with unsafe. If we remove the workaround, we get some 1.5-2x slower; but if we start looping on longs (see the LoopOverNonConstantLong suite), then performances get much, much worse.

Am I missing something? Is our implementation doing something that is confusing your optimization?
Cheers
Maurizio

[1] - https://github.com/openjdk/panama-foreign/tree/foreign-memaccess
[2] - http://cr.openjdk.java.net/~mcimadamore/panama/long_loop%2bpanama.patch

From christian.hagedorn at oracle.com Wed Jun 10 06:39:57 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Wed, 10 Jun 2020 08:39:57 +0200
Subject: [15] RFR(S): 8246203: Segmentation fault in verification due to stack overflow with -XX:+VerifyIterativeGVN
In-Reply-To: <1dd809b6-ad6d-dfbf-9c9c-e83804197a75@oracle.com>
References: <1dd809b6-ad6d-dfbf-9c9c-e83804197a75@oracle.com>
Message-ID:

Thank you Vladimir for your review!

Best regards,
Christian

On 09.06.20 19:54, Vladimir Kozlov wrote:
> Good.
>
> Thanks,
> Vladimir
>
> On 6/9/20 10:32 AM, Christian Hagedorn wrote:
>> Hi Vladimir
>>
>> On 09.06.20 18:54, Vladimir Kozlov wrote:
>>> I think the check should be 'verify_depth > 0' because verify_depth
>>> can be negative:
>>>
>>> +  verify_depth--; // Visiting the first node on depth 1
>>> +  bool add_to_worklist = verify_depth != 0;
>>>
>>> Or it is intentional for negative value to visit all nodes? Then it
>>> needs comment.
>>
>> Yes, with negative values it visit all nodes. There is a comment about
>> it above:
>>
>> 2155 // Verify all nodes if verify_depth is negative
>> 2156 void Node::verify(Node* n, int verify_depth) {
>>
>> But maybe I should add another comment for add_to_worklist as well to
>> make it more clear.
>>
>>> In such case you need restore verify_depth == 0 check to return
>>> otherwise with 0 the code will work as with negative value:
>>>   if (verify_depth == 0) {
>>>     return;
>>>   }
>>>   bool add_to_worklist = true;
>>>
>>> Or may be use assert(verify_depth != 0, "sanity") instead of check.
>>
>> I like the solution of an assert. I added it at the start of the
>> method together with the additional comment in a new webrev. It only
>> initializes add_to_worklist with false if verify() is called with
>> verify_depth = 1.
>> >> http://cr.openjdk.java.net/~chagedorn/8246203/webrev.01/ >> >> Best regards, >> Christian >> >> >>> On 6/9/20 8:26 AM, Christian Hagedorn wrote: >>>> Hi >>>> >>>> Please review the following patch: >>>> https://bugs.openjdk.java.net/browse/JDK-8246203 >>>> http://cr.openjdk.java.net/~chagedorn/8246203/webrev.00/ >>>> >>>> The testcase creates a deep graph with a lot of nodes on a chain. >>>> When running with -XX:+VerifyIterativeGVN, it recursively calls >>>> Node::verify_recur() for each input node discovered which eventually >>>> results in a segmentation fault due to a stack overflow (around >>>> 10000 recursive calls due to such a long chain of nodes). The fix >>>> just converts the recursive algorithm into an iterative one to avoid >>>> a segmentation fault. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> Christian From evgeny.nikitin at oracle.com Wed Jun 10 11:25:52 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Wed, 10 Jun 2020 13:25:52 +0200 Subject: RFR(S): 8242923: Trigger interface MethodHandle resolve in test without Nashorn. In-Reply-To: References: Message-ID: <147937e6-abca-82b0-29b4-c8399744fcc8@oracle.com> Hi Igor, Please find fixed version at http://cr.openjdk.java.net/~enikitin/8242923/webrev.02/ > - can you use Path.resolve("/tmp/some_file") instead of new File..:toPath? Well, Path.of(...), I guess. Path.resolve is not static. Fixed. Thanks in advance, //Evgeny. On 2020-06-03 05:54, Igor Ignatyev wrote: > Hi Evgeny, > > looks good to me, a couple editorial nits in CreatesInterfaceDotEqualsCallInfo.java: > - at L#39, you have double space b/w throws and Throwable; > - I don't feel like line breaks at L#41, L#42 and L#44 make it more readable; > - can you use Path.resolve("/tmp/some_file") instead of new File..:toPath? > - "/tmp/some_file" might confuse future readers into believe that's important that file is in /tmp or doesn't exist or smth else; so I'd prefer to just use "." 
> > Thanks, > -- Igor > >> On May 28, 2020, at 12:22 PM, Evgeny Nikitin wrote: >> >> Hi, >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8242923 >> Webrev: http://cr.openjdk.java.net/~enikitin/8242923/webrev.01/ >> >> The test used Nashorn to trigger incorrect MethodHandle resolve in the linkResolver.cpp (which in turn caused crash on the MethodHandle invokation). >> >> Test's functionality have been checked via rolling back the fix made in the https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-October/012155.html, the test fails on 4 common platforms in mach5. >> >> The version with the bugfix reverted can be found here: http://cr.openjdk.java.net/~enikitin/8242923/webrev.00/ >> >> The change has been checked in mach5 for the 4 common platforms (passed). >> >> Please review, >> /Evgeny Nikitin. > From rwestrel at redhat.com Wed Jun 10 11:43:48 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 10 Jun 2020 13:43:48 +0200 Subject: testing support loops with long (64b) trip counts In-Reply-To: References: Message-ID: <87o8prxi17.fsf@redhat.com> Hi Maurizio, Thanks for giving the patch a try. > Overall, unless I did some mistake, it doesn't look like the patch is > changing much. The baseline + workaround for small segment remains the > fastest version around, which performs on par with unsafe. If we remove > the workaround, we get some 1.5-2x slower; but if we start looping on > longs (see the LoopOverNonConstantLong suite), then performances get > much, much worse. > > Am I missing something? Is our implementation doing something that is > confusing your optimization? I'll take a look at the benchmarks. With the current patch, elimination of range checks are unlikely to happen. That's follow up work. Could that be the problem? Roland. 
From maurizio.cimadamore at oracle.com Wed Jun 10 11:46:55 2020 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Wed, 10 Jun 2020 12:46:55 +0100 Subject: testing support loops with long (64b) trip counts In-Reply-To: <87o8prxi17.fsf@redhat.com> References: <87o8prxi17.fsf@redhat.com> Message-ID: <23db4c59-7a42-d1ce-ddb6-b132c9da1b4f@oracle.com> On 10/06/2020 12:43, Roland Westrelin wrote: > Hi Maurizio, > > Thanks for giving the patch a try. > >> Overall, unless I did some mistake, it doesn't look like the patch is >> changing much. The baseline + workaround for small segment remains the >> fastest version around, which performs on par with unsafe. If we remove >> the workaround, we get some 1.5-2x slower; but if we start looping on >> longs (see the LoopOverNonConstantLong suite), then performances get >> much, much worse. >> >> Am I missing something? Is our implementation doing something that is >> confusing your optimization? > I'll take a look at the benchmarks. With the current patch, elimination > of range checks are unlikely to happen. That's follow up work. Could > that be the problem? Definitively :-) Range checks are always the issue with performance potholes in the foreign memory API. Maurizio > > Roland. > From tobias.hartmann at oracle.com Wed Jun 10 12:13:33 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 10 Jun 2020 14:13:33 +0200 Subject: [15] RFR(S): 8244719: CTW: C2 compilation fails with "assert(!VerifyHashTableKeys || _hash_lock == 0) failed: remove node from hash table before modifying it" In-Reply-To: References: Message-ID: Hi Christian, Looks good to me too. In TestExceptionBlockWithPredecessorsMain:40/41, the brackets should be around the "i % 2 == 0" expression and in parse1.cpp:1297 there is an excess whitespace. No new webrev required. 
Best regards,
Tobias

On 05.06.20 17:58, Christian Hagedorn wrote:
> Hi
>
> Please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8244719
> http://cr.openjdk.java.net/~chagedorn/8244719/webrev.00/
>
> The assertion failure at [5] can be traced back to a wrong assumption made in
> Parse::Block::init_graph(). It explicitly states in a comment there that we never call
> next_path_num() along exception paths [1]. But it turns out that this is only true for bytecode
> generated by Javac which does not seem to produce bytecode where an exception handler is reached by
> an explicit jump or "fall through" bytecode. An exception handler is only reached with an athrow.
>
> However, it is possible to break that assumption with some custom bytecode. The jasm testcase
> generates such a valid bytecode sequence where an exception handler is reached by jumps from another
> exception handler:
>
>       69: astore_1
>       70: aload_1
>       71: aload_0
>       72: getfield      #5                  // Field loopCounter:I
>       75: bipush        10
>       77: if_icmpge     93 // Explicit jump to exception handler, non-Javac
>       90: goto          93 // Explicit jump to exception handler, non-Javac
>       93: astore_1
>       94: return
>     Exception table:
>        from    to  target type
>            0    66    69   Class java/lang/RuntimeException
>            0    66    93   Class java/lang/Throwable
>
> This means that the first time Parse::merge_exception() is called for the exception handler block at
> bci 93, pnum is set to 3 since there are 2 predecessors (2 jumps to it). In the very first call to
> merge_common(), is_merged() is still false and we record a state. All following calls to
> merge_common() for this exception block will take the else case [2].
Once we are processing the > blocks for the exception handler at bci 69, we call merge() (and therefore next_path_num()) in > do_one_block() [3] twice with target_bci = 93 (2 jumps to bci 93). The last time with pnum = 1 for > bci 90: goto and we transform the phi with gvn and set the hash_lock for it to 1 at [4]. > > Now comes a second bytecode modification trick where we first hit a trap while parsing a block in > do_all_blocks(). Therefore, all successor blocks on that path are not merged and skipped in the > first iteration of the loop in do_all_blocks() (at this point these blocks seem to be dead). But > later we can have a jump back to such a seemingly dead block again. Those are then processed in the > second iteration of the loop in do_all_blocks(). If one of these blocks now additionally throw an > exception, we can hit this assertion failure. An example could look as follows: > > Example: > // First iteration in do_all_blocks() > Parse B1; > Parse B2; // Hit trap. Stop parsing on that path, skip on B3 and B4 which immediately follow B2 and > have no other predecessors > Skip B3; // Was not merged. Assumed to be dead at this point > Skip B4; // Was not merged. Assumed to be dead at this point > Parse B5; // Discover jump to B3 -> merge B3. Will be processed but only in the next iteration since > rpo of B2 is smaller than the one of B5 > Parse E1; // Parse exception handler 1 at bci 69 > Parse E2; // Parse exception handler 2 at bci 93, apply gvn for phi > > // Next iteration in do_all_blocks() > Parse B3; // Is now merged and ready to be parsed. Has exception to E2: call merge_exception() -> > merge_common() with E2 as target and pnum > 1. We hit the assertion at [5] since we already applied > a transformation for a phi in the last iteration and therefore have a non-zero hash_lock. > > As a solution to this problem, I suggest to fix the wrong assumption by changing > Parse::Block::init_graph() to also count predecessors for exception blocks. 
This ensures that [4] is > really the last merge for a phi. > > I did some additional performance testing with standard benchmarks and did not find any regressions. > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1314 > [2] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1678 > [3] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1508 > [4] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1773 > [5] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1764 From christian.hagedorn at oracle.com Wed Jun 10 12:29:49 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 10 Jun 2020 14:29:49 +0200 Subject: [15] RFR(S): 8244719: CTW: C2 compilation fails with "assert(!VerifyHashTableKeys || _hash_lock == 0) failed: remove node from hash table before modifying it" In-Reply-To: References: Message-ID: Thank you Tobias for your review! I updated my webrev in place with your comments. Best regards Christian On 10.06.20 14:13, Tobias Hartmann wrote: > Hi Christian, > > Looks good to me too. > > In TestExceptionBlockWithPredecessorsMain:40/41, the brackets should be around the "i % 2 == 0" > expression and in parse1.cpp:1297 there is an excess whitespace. No new webrev required. > > Best regards, > Tobias > > On 05.06.20 17:58, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8244719 >> http://cr.openjdk.java.net/~chagedorn/8244719/webrev.00/ >> >> The assertion failure at [5] can be traced back to a wrong assumption made in >> Parse::Block::init_graph(). It explicitly states in a comment there that we never call >> next_path_num() along exception paths [1]. 
But it turns out that this is only true for bytecode >> generated by Javac which does not seem to produce bytecode where an exception handler is reached by >> an explicit jump or "fall through" bytecode. An exception handler is only reached with an athrow. >> >> However, it is possible to break that assumption with some custom bytecode. The jasm testcase >> generates such a valid bytecode sequence where an exception handler is reached by jumps from another >> exception handler: >> >> 69: astore_1 >> 70: aload_1 >> 71: aload_0 >> 72: getfield #5 // Field loopCounter:I >> 75: bipush 10 >> 77: if_icmpge 93 // Explicit jump to exception handler, non-Javac >> 90: goto 93 // Explicit jump to exception handler, non-Javac >> 93: astore_1 >> 94: return >> Exception table: >> from to target type >> 0 66 69 Class java/lang/RuntimeException >> 0 66 93 Class java/lang/Throwable >> >> This means that the first time Parse::merge_exception() is called for the exception handler block at >> bci 93, pnum is set to 3 since there are 2 predecessors (2 jumps to it). In the very first call to >> merge_common(), is_merged() is still false and we record a state. All following calls to >> merge_common() for this exception block will take the else case [2]. Once we are processing the >> blocks for the exception handler at bci 69, we call merge() (and therefore next_path_num()) in >> do_one_block() [3] twice with target_bci = 93 (2 jumps to bci 93). The last time with pnum = 1 for >> bci 90: goto and we transform the phi with gvn and set the hash_lock for it to 1 at [4]. >> >> Now comes a second bytecode modification trick where we first hit a trap while parsing a block in >> do_all_blocks().
Therefore, all successor blocks on that path are not merged and skipped in the >> first iteration of the loop in do_all_blocks() (at this point these blocks seem to be dead). But >> later we can have a jump back to such a seemingly dead block again. Those are then processed in the >> second iteration of the loop in do_all_blocks(). If one of these blocks now additionally throw an >> exception, we can hit this assertion failure. An example could look as follows: >> >> Example: >> // First iteration in do_all_blocks() >> Parse B1; >> Parse B2; // Hit trap. Stop parsing on that path, skip on B3 and B4 which immediately follow B2 and >> have no other predecessors >> Skip B3; // Was not merged. Assumed to be dead at this point >> Skip B4; // Was not merged. Assumed to be dead at this point >> Parse B5; // Discover jump to B3 -> merge B3. Will be processed but only in the next iteration since >> rpo of B2 is smaller than the one of B5 >> Parse E1; // Parse exception handler 1 at bci 69 >> Parse E2; // Parse exception handler 2 at bci 93, apply gvn for phi >> >> // Next iteration in do_all_blocks() >> Parse B3; // Is now merged and ready to be parsed. Has exception to E2: call merge_exception() -> >> merge_common() with E2 as target and pnum > 1. We hit the assertion at [5] since we already applied >> a transformation for a phi in the last iteration and therefore have a non-zero hash_lock. >> >> As a solution to this problem, I suggest to fix the wrong assumption by changing >> Parse::Block::init_graph() to also count predecessors for exception blocks. This ensures that [4] is >> really the last merge for a phi. >> >> I did some additional performance testing with standard benchmarks and did not find any regressions. >> >> Thank you! 
>> >> Best regards, >> Christian >> >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1314 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1678 >> [3] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1508 >> [4] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1773 >> [5] http://hg.openjdk.java.net/jdk/jdk/file/71ec718a0bd0/src/hotspot/share/opto/parse1.cpp#l1764 From tobias.hartmann at oracle.com Wed Jun 10 12:30:17 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 10 Jun 2020 14:30:17 +0200 Subject: [15] RFR(S): 8246203: Segmentation fault in verification due to stack overflow with -XX:+VerifyIterativeGVN In-Reply-To: References: Message-ID: Hi Christian, On 09.06.20 19:32, Christian Hagedorn wrote: > http://cr.openjdk.java.net/~chagedorn/8246203/webrev.01/ Looks good to me. Did you run testing with -XX:+VerifyIterativeGVN? Best regards, Tobias From tobias.hartmann at oracle.com Wed Jun 10 12:31:12 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 10 Jun 2020 14:31:12 +0200 Subject: [15] RFR(S): 8244719: CTW: C2 compilation fails with "assert(!VerifyHashTableKeys || _hash_lock == 0) failed: remove node from hash table before modifying it" In-Reply-To: References: Message-ID: On 10.06.20 14:29, Christian Hagedorn wrote: > I updated my webrev in place with your comments. Looks good. Thanks. Best regards, Tobias From igor.ignatyev at oracle.com Wed Jun 10 20:55:36 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 10 Jun 2020 13:55:36 -0700 Subject: RFR(S): 8242923: Trigger interface MethodHandle resolve in test without Nashorn. 
In-Reply-To: <147937e6-abca-82b0-29b4-c8399744fcc8@oracle.com> References: <147937e6-abca-82b0-29b4-c8399744fcc8@oracle.com> Message-ID: <1C5E54B3-88FD-4247-BBE6-7AA4C1265358@oracle.com> Hi Evgeny, LGTM -- Igor > On Jun 10, 2020, at 4:25 AM, Evgeny Nikitin wrote: > > Hi Igor, > > Please find fixed version at http://cr.openjdk.java.net/~enikitin/8242923/webrev.02/ > > > - can you use Path.resolve("/tmp/some_file") instead of new File..:toPath? > > Well, Path.of(...), I guess. Path.resolve is not static. Fixed. yes, I meant Path::of; don't know why I wrote Path.resolve... > > Thanks in advance, > //Evgeny. > > > On 2020-06-03 05:54, Igor Ignatyev wrote: >> Hi Evgeny, >> looks good to me, a couple editorial nits in CreatesInterfaceDotEqualsCallInfo.java: >> - at L#39, you have double space b/w throws and Throwable; >> - I don't feel like line breaks at L#41, L#42 and L#44 make it more readable; >> - can you use Path.resolve("/tmp/some_file") instead of new File..:toPath? >> - "/tmp/some_file" might confuse future readers into believing that it's important that the file is in /tmp or doesn't exist or smth else; so I'd prefer to just use "." >> Thanks, >> -- Igor >>> On May 28, 2020, at 12:22 PM, Evgeny Nikitin wrote: >>> >>> Hi, >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8242923 >>> Webrev: http://cr.openjdk.java.net/~enikitin/8242923/webrev.01/ >>> >>> The test used Nashorn to trigger an incorrect MethodHandle resolve in linkResolver.cpp (which in turn caused a crash on the MethodHandle invocation). >>> >>> The test's functionality has been checked by rolling back the fix made in https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-October/012155.html; the test then fails on 4 common platforms in mach5. >>> >>> The version with the bugfix reverted can be found here: http://cr.openjdk.java.net/~enikitin/8242923/webrev.00/ >>> >>> The change has been checked in mach5 for the 4 common platforms (passed). >>> >>> Please review, >>> /Evgeny Nikitin.
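For readers following the Path.of versus Path.resolve exchange in the thread above, here is a minimal sketch of the distinction. It is not taken from the webrev; the class name PathOfSketch and the helper names viaFile and viaOf are hypothetical and exist only for illustration. It assumes JDK 11 or later, where the static factory Path.of is available.

```java
import java.io.File;
import java.nio.file.Path;

// Hypothetical illustration, not the code under review.
public class PathOfSketch {
    // Older idiom: build a java.io.File and convert it to a Path.
    static Path viaFile(String name) {
        return new File(name).toPath();
    }

    // Static factory on Path (JDK 11+); this is what the review settled on.
    static Path viaOf(String name) {
        return Path.of(name);
    }

    public static void main(String[] args) {
        Path a = viaFile(".");
        Path b = viaOf(".");
        // Both idioms go through the default file system for the same string.
        System.out.println(a.equals(b));
        // Path.resolve is an instance method, so it needs an existing Path
        // as the receiver; it cannot be called as Path.resolve("...").
        System.out.println(b.resolve("some_file").getFileName());
    }
}
```

Using "." as the argument follows the reviewer's suggestion to avoid a path that looks significant to future readers.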
From vladimir.kozlov at oracle.com Thu Jun 11 00:40:19 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 10 Jun 2020 17:40:19 -0700 Subject: [15] RFR(XS) 8247350: [aarch64] assert(false) failed: wrong size of mach node Message-ID: <612fcc45-0886-eb21-ef25-723d934a358b@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8247350 https://cr.openjdk.java.net/~kvn/8247350/webrev.00/ Code size for decodeHeapOop is different in scratch buffer with -XX:+VerifyOops flag on. The difference comes from MacroAssembler::mov_immediate64() which generates different code if distance to message string 'b' in mov(rscratch1, (address)b) is different. Use movptr() instead of mov() in verify_oop() method to generate consistent code for loading C string address. Added debug dump code in case of mismatching code generation. Tested with failing test case. Thanks, Vladimir From javajiva at amazon.com Thu Jun 11 00:47:50 2020 From: javajiva at amazon.com (Jiva, Azeem) Date: Thu, 11 Jun 2020 00:47:50 +0000 Subject: [15] RFR(XS) 8247350: [aarch64] assert(false) failed: wrong size of mach node In-Reply-To: <612fcc45-0886-eb21-ef25-723d934a358b@oracle.com> References: <612fcc45-0886-eb21-ef25-723d934a358b@oracle.com> Message-ID: <4E8B20A3-F150-45E6-9CF5-95D4CD8DA905@amazon.com> Looks good to me. -- Azeem Jiva On 6/10/20, 5:42 PM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: https://bugs.openjdk.java.net/browse/JDK-8247350 https://cr.openjdk.java.net/~kvn/8247350/webrev.00/ Code size for decodeHeapOop is different in scratch buffer with -XX:+VerifyOops flag on. The difference comes from MacroAssembler::mov_immediate64() which generates different code if distance to message string 'b' in mov(rscratch1, (address)b) is different.
Use movptr() instead of mov() in verify_oop() method to generate consistent code for loading C string address. Added debug dump code in case of mismatching code generation. Tested with failing test case. Thanks, Vladimir From vladimir.kozlov at oracle.com Thu Jun 11 01:06:19 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 10 Jun 2020 18:06:19 -0700 Subject: [15] RFR(XS) 8247350: [aarch64] assert(false) failed: wrong size of mach node In-Reply-To: <4E8B20A3-F150-45E6-9CF5-95D4CD8DA905@amazon.com> References: <612fcc45-0886-eb21-ef25-723d934a358b@oracle.com> <4E8B20A3-F150-45E6-9CF5-95D4CD8DA905@amazon.com> Message-ID: <6765b33e-b22a-4312-b6bf-eac2e6142d5e@oracle.com> Thank you, Azeem Vladimir On 6/10/20 5:47 PM, Jiva, Azeem wrote: > Looks good to me. > From Yang.Zhang at arm.com Thu Jun 11 05:36:45 2020 From: Yang.Zhang at arm.com (Yang Zhang) Date: Thu, 11 Jun 2020 05:36:45 +0000 Subject: 8244926: Add absolute check for int/long to generate Abs nodes In-Reply-To: References: Message-ID: Hi, Ping it again. Could anyone help to review this patch? Regards Yang -----Original Message----- From: hotspot-compiler-dev On Behalf Of Yang Zhang Sent: Thursday, June 4, 2020 4:28 PM To: hotspot-compiler-dev at openjdk.java.net Cc: nd Subject: RFR: 8244926: Add absolute check for int/long to generate Abs nodes Hi, May I have a review of this enhancement of absolute check for int/long? JBS: https://bugs.openjdk.java.net/browse/JDK-8244926 Webrev: http://cr.openjdk.java.net/~yzhang/8244926/webrev.00/ There is absolute value check for float/double already [1]. In this patch, absolute value check for integer/long is added. The following patterns can be matched to AbsI/L nodes: ((a < 0) ? -a : a) ((a <= 0) ? -a : a) ((a > 0) ? a : -a) ((a >= 0) ? a : -a) Test case: public static int absi(int a) { return ((a < 0) ? -a : a); } With c2, AbsI node is generated and matched. 
The following snippet is generated on x86: 0x00007f67c8b6155b: mov %ecx,%r11d 0x00007f67c8b6155e: sar $0x1f,%r11d 0x00007f67c8b61562: mov %ecx,%r10d 0x00007f67c8b61565: xor %r11d,%r10d 0x00007f67c8b61568: sub %r11d,%r10d On AArch64: 0x0000ffffa8b878e4: cmp w3, wzr 0x0000ffffa8b878e8: cneg w17, w3, lt // lt = tstop Note: AArch64 result is based on this patch which is in review [2]. Test: Full jtreg on x86 and AArch64, no new failure Performance: Jmh test is uploaded. http://cr.openjdk.java.net/~yzhang/8244926/TestScalar.java X86 Before: Benchmark (size) Mode Cnt Score Error Units TestScalar.testAbsI1 1024 avgt 25 2648.235 ± 0.810 us/op TestScalar.testAbsI2 1024 avgt 25 2647.702 ± 0.431 us/op TestScalar.testAbsI3 1024 avgt 25 2647.605 ± 0.346 us/op TestScalar.testAbsI4 1024 avgt 25 2647.574 ± 0.651 us/op TestScalar.testAbsL1 1024 avgt 25 3165.787 ± 0.976 us/op TestScalar.testAbsL2 1024 avgt 25 3166.582 ± 2.217 us/op TestScalar.testAbsL3 1024 avgt 25 3168.097 ± 4.071 us/op TestScalar.testAbsL4 1024 avgt 25 3167.222 ± 2.573 us/op After: Benchmark (size) Mode Cnt Score Error Units TestScalar.testAbsI1 1024 avgt 25 2264.637 ± 1.164 us/op TestScalar.testAbsI2 1024 avgt 25 2264.318 ± 0.427 us/op TestScalar.testAbsI3 1024 avgt 25 2264.998 ± 0.903 us/op TestScalar.testAbsI4 1024 avgt 25 2264.602 ± 0.625 us/op TestScalar.testAbsL1 1024 avgt 25 2376.513 ± 0.345 us/op TestScalar.testAbsL2 1024 avgt 25 2376.681 ± 0.565 us/op TestScalar.testAbsL3 1024 avgt 25 2377.012 ± 0.643 us/op TestScalar.testAbsL4 1024 avgt 25 2376.921 ± 0.699 us/op AArch64: Before: Benchmark (size) Mode Cnt Score Error Units TestScalar.testAbsI1 1024 avgt 25 1858.831 ± 1.249 us/op TestScalar.testAbsI2 1024 avgt 25 1860.248 ± 1.365 us/op TestScalar.testAbsI3 1024 avgt 25 1859.571 ± 1.177 us/op TestScalar.testAbsI4 1024 avgt 25 1859.970 ± 0.882 us/op TestScalar.testAbsL1 1024 avgt 25 1871.520 ± 2.592 us/op TestScalar.testAbsL2 1024 avgt 25 1872.728 ± 2.301 us/op TestScalar.testAbsL3 1024 avgt 25 1872.852 ±
2.455 us/op TestScalar.testAbsL4 1024 avgt 25 1872.720 ± 2.652 us/op After: Benchmark (size) Mode Cnt Score Error Units TestScalar.testAbsI1 1024 avgt 25 1422.781 ± 1.788 us/op TestScalar.testAbsI2 1024 avgt 25 1423.778 ± 2.612 us/op TestScalar.testAbsI3 1024 avgt 25 1424.327 ± 2.065 us/op TestScalar.testAbsI4 1024 avgt 25 1423.269 ± 1.437 us/op TestScalar.testAbsL1 1024 avgt 25 1434.279 ± 2.312 us/op TestScalar.testAbsL2 1024 avgt 25 1433.900 ± 2.341 us/op TestScalar.testAbsL3 1024 avgt 25 1435.967 ± 2.270 us/op TestScalar.testAbsL4 1024 avgt 25 1437.495 ± 0.957 us/op [1] http://hg.openjdk.java.net/jdk/jdk/file/dd652a1b2a39/src/hotspot/share/opto/cfgnode.cpp#l1519 [2] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-May/008861.html Regards, Yang From christian.hagedorn at oracle.com Thu Jun 11 07:38:27 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 11 Jun 2020 09:38:27 +0200 Subject: [15] RFR(S): 8246203: Segmentation fault in verification due to stack overflow with -XX:+VerifyIterativeGVN In-Reply-To: References: Message-ID: <55e24d91-e483-3d09-f211-6bf41aa752f5@oracle.com> Hi Tobias Thank you for your review! I ran some more testing with -XX:+VerifyIterativeGVN over night and compared the iterative solution with the old recursive version. We hit some more test timeouts with the new iterative solution because we are now really looking at all nodes for a requested depth. There are cases where the recursive solution would not do that. For example, if verify_depth = 4 and given a node chain 1->2->3->4->5, the recursive DFS solution visits nodes 1-4 and then at node 4 when it wants to visit node 5 it immediately returns because the depth 4 is reached. Node 4 is now marked as visited. If, however, there is an additional path 1->4->5->6->7 to look at later, the recursive DFS solution will just stop at node 4 because that node was already visited.
The iterative solution, on the other hand, processes the nodes in a BFS and will visit all nodes up to depth 4, including node 5 and 6. This results in spending more time (as seen with more timeouts for more complex graphs/tests). We could now try to simulate the same recursive DFS behavior in an iterative approach with a stack. But as we seem to be missing some nodes at a requested depth this is probably not what we really want? Alternatively, we could go with this BFS solution as it is and decrement verify_depth = 4 in the call Node::verify(n, 4) to reduce the time spent. Or just leave webrev.01 as it is and treat the additional timeouts as expected. What do you think? Best regards, Christian On 10.06.20 14:30, Tobias Hartmann wrote: > Hi Christian, > > On 09.06.20 19:32, Christian Hagedorn wrote: >> http://cr.openjdk.java.net/~chagedorn/8246203/webrev.01/ > > Looks good to me. > > Did you run testing with -XX:+VerifyIterativeGVN? > > Best regards, > Tobias > From patric.hedlin at oracle.com Thu Jun 11 08:24:17 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Thu, 11 Jun 2020 10:24:17 +0200 Subject: RFR(S): 8247200: assert((unsigned)fpargs < 32) Message-ID: <40b781a4-5a1f-bd83-0dab-00dc2890e06d@oracle.com> Dear all, I would like to ask for help to review the following change/update: Issue: https://bugs.openjdk.java.net/browse/JDK-8247200 Webrev: http://cr.openjdk.java.net/~phedlin/tr8247200/ Removing assert and some associated dead code.
Testing: tier1-3,6 Best regards, Patric From adinn at redhat.com Thu Jun 11 08:32:53 2020 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 11 Jun 2020 09:32:53 +0100 Subject: [aarch64-port-dev ] [15] RFR(XS) 8247350: [aarch64] assert(false) failed: wrong size of mach node In-Reply-To: <612fcc45-0886-eb21-ef25-723d934a358b@oracle.com> References: <612fcc45-0886-eb21-ef25-723d934a358b@oracle.com> Message-ID: <9d651b83-3a63-d4ae-37e9-0a664bb702cd@redhat.com> On 11/06/2020 01:40, Vladimir Kozlov wrote: > https://bugs.openjdk.java.net/browse/JDK-8247350 > https://cr.openjdk.java.net/~kvn/8247350/webrev.00/ > > Code size for decodeHeapOop is different in scratch buffer with > -XX:+VerifyOops flag on. The difference comes from > MacroAssembler::mov_immediate64() which generates different code if > distance to message string 'b' in mov(rscratch1, (address)b) is different. > > Use movptr() instead of mov() in verify_oop() method to generate > consistent code for loading C string address. > Added debug dump code in case of mismatching code generation. > > Tested with failing test case. Looks ok to me, Vladimir. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From nils.eliasson at oracle.com Thu Jun 11 09:20:37 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 11 Jun 2020 11:20:37 +0200 Subject: 8244926: Add absolute check for int/long to generate Abs nodes In-Reply-To: References: Message-ID: <8b67dcb0-d710-82fd-b99d-875096b8f4db@oracle.com> Hi Yang, The patch looks good. Reviewed. Best regards, Nils Eliasson On 2020-06-11 07:36, Yang Zhang wrote: > Hi, > > Ping it again. Could anyone help to review this patch? 
> > Regards > Yang > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Yang Zhang > Sent: Thursday, June 4, 2020 4:28 PM > To: hotspot-compiler-dev at openjdk.java.net > Cc: nd > Subject: RFR: 8244926: Add absolute check for int/long to generate Abs nodes > > Hi, > > May I have a review of this enhancement of absolute check for int/long? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8244926 > Webrev: http://cr.openjdk.java.net/~yzhang/8244926/webrev.00/ > > There is absolute value check for float/double already [1]. In this patch, absolute value check for integer/long is added. The following patterns can be matched to AbsI/L nodes: > ((a < 0) ? -a : a) > ((a <= 0) ? -a : a) > ((a > 0) ? a : -a) > ((a >= 0) ? a : -a) > > Test case: > public static int absi(int a) { > return ((a < 0) ? -a : a); > } > > With c2, AbsI node is generated and matched. The following snippet is generated on x86: > 0x00007f67c8b6155b: mov %ecx,%r11d > 0x00007f67c8b6155e: sar $0x1f,%r11d > 0x00007f67c8b61562: mov %ecx,%r10d > 0x00007f67c8b61565: xor %r11d,%r10d > 0x00007f67c8b61568: sub %r11d,%r10d > > On AArch64: > 0x0000ffffa8b878e4: cmp w3, wzr > 0x0000ffffa8b878e8: cneg w17, w3, lt // lt = tstop > > Note: AArch64 result is based on this patch which is in review [2]. > > Test: > Full jtreg on x86 and AArch64, no new failure > > Performance: > Jmh test is uploaded. > http://cr.openjdk.java.net/~yzhang/8244926/TestScalar.java > > X86 > Before: > Benchmark (size) Mode Cnt Score Error Units > TestScalar.testAbsI1 1024 avgt 25 2648.235 ± 0.810 us/op > TestScalar.testAbsI2 1024 avgt 25 2647.702 ± 0.431 us/op > TestScalar.testAbsI3 1024 avgt 25 2647.605 ± 0.346 us/op > TestScalar.testAbsI4 1024 avgt 25 2647.574 ± 0.651 us/op > TestScalar.testAbsL1 1024 avgt 25 3165.787 ± 0.976 us/op > TestScalar.testAbsL2 1024 avgt 25 3166.582 ± 2.217 us/op > TestScalar.testAbsL3 1024 avgt 25 3168.097 ± 4.071 us/op > TestScalar.testAbsL4 1024 avgt 25 3167.222 ±
2.573 us/op > > After: > Benchmark (size) Mode Cnt Score Error Units > TestScalar.testAbsI1 1024 avgt 25 2264.637 ± 1.164 us/op > TestScalar.testAbsI2 1024 avgt 25 2264.318 ± 0.427 us/op > TestScalar.testAbsI3 1024 avgt 25 2264.998 ± 0.903 us/op > TestScalar.testAbsI4 1024 avgt 25 2264.602 ± 0.625 us/op > TestScalar.testAbsL1 1024 avgt 25 2376.513 ± 0.345 us/op > TestScalar.testAbsL2 1024 avgt 25 2376.681 ± 0.565 us/op > TestScalar.testAbsL3 1024 avgt 25 2377.012 ± 0.643 us/op > TestScalar.testAbsL4 1024 avgt 25 2376.921 ± 0.699 us/op > > AArch64: > Before: > Benchmark (size) Mode Cnt Score Error Units > TestScalar.testAbsI1 1024 avgt 25 1858.831 ± 1.249 us/op > TestScalar.testAbsI2 1024 avgt 25 1860.248 ± 1.365 us/op > TestScalar.testAbsI3 1024 avgt 25 1859.571 ± 1.177 us/op > TestScalar.testAbsI4 1024 avgt 25 1859.970 ± 0.882 us/op > TestScalar.testAbsL1 1024 avgt 25 1871.520 ± 2.592 us/op > TestScalar.testAbsL2 1024 avgt 25 1872.728 ± 2.301 us/op > TestScalar.testAbsL3 1024 avgt 25 1872.852 ± 2.455 us/op > TestScalar.testAbsL4 1024 avgt 25 1872.720 ± 2.652 us/op > > After: > Benchmark (size) Mode Cnt Score Error Units > TestScalar.testAbsI1 1024 avgt 25 1422.781 ± 1.788 us/op > TestScalar.testAbsI2 1024 avgt 25 1423.778 ± 2.612 us/op > TestScalar.testAbsI3 1024 avgt 25 1424.327 ± 2.065 us/op > TestScalar.testAbsI4 1024 avgt 25 1423.269 ± 1.437 us/op > TestScalar.testAbsL1 1024 avgt 25 1434.279 ± 2.312 us/op > TestScalar.testAbsL2 1024 avgt 25 1433.900 ± 2.341 us/op > TestScalar.testAbsL3 1024 avgt 25 1435.967 ± 2.270 us/op > TestScalar.testAbsL4 1024 avgt 25 1437.495 ±
0.957 us/op > > [1] http://hg.openjdk.java.net/jdk/jdk/file/dd652a1b2a39/src/hotspot/share/opto/cfgnode.cpp#l1519 > [2] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-May/008861.html > > > Regards, > Yang > From nils.eliasson at oracle.com Thu Jun 11 13:46:30 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 11 Jun 2020 15:46:30 +0200 Subject: RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph In-Reply-To: <56AE34D1-1BA5-40FF-B7D9-23B772AF5FB1@amazon.com> References: <56AE34D1-1BA5-40FF-B7D9-23B772AF5FB1@amazon.com> Message-ID: Hi Xin, src/hotspot/share/opto/c2_globals.hpp 367 notproduct(intx, PrintIdealGraphLevel, 0, \ 368 "Level of detail of the ideal graph printout. " \ 369 "System-wide value, -1=absolutely nothing is printed, " \ 370 "0=nothing except IGVPrintLevel directives, 4=all details printed. " \ 371 "Level of detail of printouts can be set on a per-method level " \ 372 "as well by using CompileCommand=option.") \ 373 range(-1, 4) Can you change the text for -1 to "printing is disabled" - because that is the practical difference between 0 and -1. Otherwise - it looks good. For a future fix I would suggest creating the printer on demand. Then the -1 value wouldn't be needed anymore. Regards, Nils Eliasson On 2020-06-09 00:59, Liu, Xin wrote: > Hi, Paul and Volker, > > Yes, it's misleading to set PrintIdealGraph true by default. I set it back and here is a new revision: > http://cr.openjdk.java.net/~xliu/8139046/02/webrev/ > > PrintIdealGraphLevel=0 doesn't work because users might use compiler directive IGVPrintLevel > 0. > I extended the range of PrintIdealGraphLevel a little bit. If PrintIdealGraphLevel=-1, it means the user doesn't want to have any Ideal Graph dumped. I made -XX:-PrintIdealGraph a synonym of PrintIdealGraphLevel == -1. > > The feature is useful when developers start using a lot of compiler directives with IGVPrintLevel.
Without it, developers have to modify many directives to turn off PrintIdealGraph. > > I still think PrintIdealGraph should go away because PrintIdealGraphLevel can do a better job. As Paul suggested, PrintIdealGraph has been wiped out from share/opto directory. > Here is the new comment of PrintIdealGraphLevel. > intx PrintIdealGraphLevel = 0 {C2 notproduct} {default} Level of detail of the ideal graph printout. System-wide value, -1=absolutely nothing is printed, 0=nothing except IGVPrintLevel directives, 4=all details printed. Level of detail of printouts can be set on a per-method level as well by using CompileCommand=option. > > Thanks, > --lx > > > On 6/5/20, 6:38 AM, "Hohensee, Paul" wrote: > > Re PrintIdealGraph, I'd consider removing its remaining reference in compile.hpp in favor of > > 1. Leave the default value of PrintIdealGraph at false. It's confusing to see it set true by default. > > 2. Make -XX:-PrintIdealGraph a synonym for PrintIdealGraphLevel == 0. I.e., set PrintIdealGraphLevel to 0 in ergo_initialize() if PrintIdealGraph was explicitly set false on the command line. Use > > if (FLAG_IS_CMDLINE(PrintIdealGraph) && !PrintIdealGraph) { > FLAG_SET_ERGO(PrintIdealGraphLevel, 0); > } > > 3. Ignore PrintIdealGraph otherwise, which is what happens now (i.e., +PrintIdealGraph has no effect if PrintIdealGraphLevel == 0, which it is by default). Have should_print() in compile.hpp test for PrintIdealGraphLevel == 0 instead of !PrintIdealGraph. > > That way, a future deprecation/removal of PrintIdealGraph is isolated. And, you file an RFE to do that. > > Thanks, > Paul > > On 6/4/20, 6:06 PM, "hotspot-compiler-dev on behalf of Liu, Xin" wrote: > > Hi, Volker, > > Thank you for reviewing it. > > - Why do you need the extra check for "method() != NULL": > Yes, I try to avoid a corner case. Previously, _printer is set to NULL when c2 compiles runtime stubs (http://hg.openjdk.java.net/jdk/jdk/file/b06f452c8d61/src/hotspot/share/opto/compile.cpp#l825).
> That will cause a problem for my patch because I initialize _printer to NULL and create it when we do need to print IdealGraph. > If users specify PrintIdealGraphLevel>0, C2 will try to dump a NULL method. It is undefined to invoke begin/end_method() if the C->method() is NULL, so it better guard with the NULL check. > > Here is the context: > " > Command Line: -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=hello.xml > > Host: ip-172-31-94-125, Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz, 48 cores, 184G, Ubuntu 18.04.4 LTS > Time: Thu Jun 4 23:25:00 2020 UTC elapsed time: 0.150427 seconds (0d 0h 0m 0s) > > --------------- T H R E A D --------------- > > Current thread (0x00007f69a83f3150): JavaThread "C2 CompilerThread0" daemon [_thread_in_vm, id=32990, stack(0x00007f692bc8b000,0x00007f692bd8c000)] > > Stack: [0x00007f692bc8b000,0x00007f692bd8c000], sp=0x00007f692bd893c0, free space=1016k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xc17c94] IdealGraphPrinter::begin_method()+0x434 > V [libjvm.so+0x88e5b1] CompileWrapper::CompileWrapper(Compile*)+0x201 > V [libjvm.so+0x89f689] Compile::Compile(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, bool, DirectiveSet*)+0x4c9 > V [libjvm.so+0x1425586] OptoRuntime::generate_stub(ciEnv*, TypeFunc const* (*)(), unsigned char*, char const*, int, bool, bool, bool)+0x156 > " > > I took all your advices except for that. > Could you take a look at the new revision: > http://cr.openjdk.java.net/~xliu/8139046/01/webrev/ > > To get rid of the param 'bool need', I set PrintIdealGraph true by default. Previously, if PrintIdealGraph is set, hotspot will initialize a printer object for every Compiler thread. It's not true anymore, so no extra cost. > After this patch, the only usage of that flag is to shut down IGVPrinter completely by -XX:-PrintIdealGraph. I wish we can vote it out in the future and use PrintIdealGraphLevel. 
> > Thanks, > --lx > > > > From: Volker Simonis > Date: Thursday, June 4, 2020 at 10:13 AM > To: "Liu, Xin" > Cc: "hotspot-compiler-dev at openjdk.java.net" > Subject: RE: [EXTERNAL] RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph > > Hi Xin, > > Thanks for addressing this issue. It looks like a nice cleanup. Please find my further comments inline: > > On Tue, Jun 2, 2020 at 10:57 PM Liu, Xin wrote: > Hi, > > Could you review this webrev? It fixes a minor problem when users only use IGVPrintLevel in Compiler Directives. > Jbs: https://bugs.openjdk.java.net/browse/JDK-8139046 > Webrev: http://cr.openjdk.java.net/~xliu/8139046/00/webrev/ > > I moved "bool should_print(int level)" from idealGraphPrinter to Compile because the latter has the information. > In this way, Compile can allocate _printer on demand. If Compile::should_print(level) returns true, it guarantees that Compile::printer() is not NULL. > If users pass in CompileCommand="option,Hello::add,intx,IGVPrintLevel,3", printer() will only turn on for that compiler thread. > > - Why do you need the extra check for "method() != NULL": > 619 if (should_print(1) && method() != NULL) { > 4584 if (should_print(level) && method() != NULL) { > in "Compile::{begin,end_method}". This check wasn't there before. Does it fix an issue? > > - I don't see why you need the additional "need" parameter in "IdealGraphPrinter::printer()". The function only gets called with "need == true" anyway, so I think you can remove it. > > - Why did you make > 453 IdealGraphPrinter* printer() { return _printer; } > a "const" function? > 453 IdealGraphPrinter* printer() const { return _printer; } > I don't think it is required?
> > - As an additional cleanup, you can change all "should_print(1)" calls to "should_print()" because "1" is the default parameter anyway. > > Besides that, your change looks good. > > Thank you and best regards, > Volker > > Ran hotspot:tier1 using fastdebug build. Only gtest/GTestWrapper.java failed. > That's another issue. Currently, Openjdk can't execute any gtest because of a linkage error. > Error occurred during initialization of VM > Unable to load native library: /backup/jdk/build/linux-x86_64-server-fastdebug/images/jdk/lib/libjava.so: symbol JVM_GetPermittedSubclasses version SUNWprivate_1.1 not defined in file libjvm.so with link time reference > > Thanks, > --lx > > > > > From vladimir.kozlov at oracle.com Thu Jun 11 16:27:08 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 11 Jun 2020 09:27:08 -0700 Subject: [15] RFR(S): 8246203: Segmentation fault in verification due to stack overflow with -XX:+VerifyIterativeGVN In-Reply-To: <55e24d91-e483-3d09-f211-6bf41aa752f5@oracle.com> References: <55e24d91-e483-3d09-f211-6bf41aa752f5@oracle.com> Message-ID: I would keep changes as they are - they provide correct testing. And treat timeouts as expected. Regards, Vladimir On 6/11/20 12:38 AM, Christian Hagedorn wrote: > Hi Tobias > > Thank you for your review! > > I ran some more testing with -XX:+VerifyIterativeGVN over night and compared the iterative solution with the old > recursive version. We hit some more test timeouts with the new iterative solution because we are now really looking at > all nodes for a requested depth. There are cases where the recursive solution would not do that. For example, if > verify_depth = 4 and given a node chain 1->2->3->4->5, the recursive DFS solution visits nodes 1-4 and then at node 4 > when it wants to visit node 5 it immediately returns because the depth 4 is reached. Node 4 is now marked as visited. 
> If, however, there is an additional path 1->4->5->6->7 to look at later, the recursive DFS solution will just stop at > node 4 because that node was already visited. The iterative solution, on the other hand, processes the nodes in a BFS > and will visit all nodes up to depth 4, including node 5 and 6. This results in spending more time (as seen with more > timeouts for more complex graphs/tests). > > We could now try to simulate the same recursive DFS behavior in an iterative approach with a stack. But as we seem to be > missing some nodes at a requested depth this is probably not what we really want? Alternatively, we could go with this > BFS solution as it is and decrement verify_depth = 4 in the call Node::verify(n, 4) to reduce the time spent. Or just > leave webrev.01 as it is and treat the additional timeouts as expected. > > What do you think? > > Best regards, > Christian > > On 10.06.20 14:30, Tobias Hartmann wrote: >> Hi Christian, >> >> On 09.06.20 19:32, Christian Hagedorn wrote: >>> http://cr.openjdk.java.net/~chagedorn/8246203/webrev.01/ >> >> Looks good to me. >> >> Did you run testing with -XX:+VerifyIterativeGVN? >> >> Best regards, >> Tobias >> From vladimir.kozlov at oracle.com Thu Jun 11 16:35:16 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 11 Jun 2020 09:35:16 -0700 Subject: [aarch64-port-dev ] [15] RFR(XS) 8247350: [aarch64] assert(false) failed: wrong size of mach node In-Reply-To: <9d651b83-3a63-d4ae-37e9-0a664bb702cd@redhat.com> References: <612fcc45-0886-eb21-ef25-723d934a358b@oracle.com> <9d651b83-3a63-d4ae-37e9-0a664bb702cd@redhat.com> Message-ID: Thank you, Andrew Vladimir On 6/11/20 1:32 AM, Andrew Dinn wrote: > On 11/06/2020 01:40, Vladimir Kozlov wrote: >> https://bugs.openjdk.java.net/browse/JDK-8247350 >> https://cr.openjdk.java.net/~kvn/8247350/webrev.00/ >> >> Code size for decodeHeapOop is different in scratch buffer with >> -XX:+VerifyOops flag on. 
The difference comes from
>> MacroAssembler::mov_immediate64() which generates different code if
>> distance to message string 'b' in mov(rscratch1, (address)b) is different.
>>
>> Use movptr() instead of mov() in verify_oop() method to generate
>> consistent code for loading C string address.
>> Added debug dump code in case of mismatching code generation.
>>
>> Tested with failing test case.
> Looks ok to me, Vladimir.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill
>

From vladimir.kozlov at oracle.com Thu Jun 11 16:56:23 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 11 Jun 2020 09:56:23 -0700
Subject: RFR(S): 8247200: [aarch64] assert((unsigned)fpargs < 32)
In-Reply-To: <40b781a4-5a1f-bd83-0dab-00dc2890e06d@oracle.com>
References: <40b781a4-5a1f-bd83-0dab-00dc2890e06d@oracle.com>
Message-ID:

CCing to aarch64 mailing list.

Vladimir

On 6/11/20 1:24 AM, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue: https://bugs.openjdk.java.net/browse/JDK-8247200
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8247200/
>
>
> Removing assert and some associated dead code.
>
>
> Testing: tier1-3,6
>
>
> Best regards,
> Patric

From xxinliu at amazon.com Thu Jun 11 19:30:01 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Thu, 11 Jun 2020 19:30:01 +0000
Subject: RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph
In-Reply-To:
References: <56AE34D1-1BA5-40FF-B7D9-23B772AF5FB1@amazon.com>
Message-ID: <1EFE6B0C-F158-4D37-89E0-C698A89E410A@amazon.com>

Hi, Nils,

Thank you to look into the webrev.

Can you change the text for -1 to "printing is disabled" - because that is the practical difference between 0 and -1.

Got it. here is the revision change the description.
http://cr.openjdk.java.net/~xliu/8139046/03/webrev/src/hotspot/share/opto/c2_globals.hpp.udiff.html For a future fix I would suggest creating the printer on demand. Then the -1 value wouldn't be needed anymore. I don't quite understand here. This patch does create the printer on demand. A compiler thread only creates the instance of printer when should_print(level) returns true. PrintIdealGraphLevel=-1 roles a global switch to disable IdealGraph dumping. I think in this way. JDK-8139046 can also be treated as a feature, right? Previously, we can use PrintIdealGraph as a global switch. Without it, c2 just ignore whatever you write in directives. After c2 picks up IGVPrintLevel directive automatically, hotspot loses that feature. -XX:PrintIdealGraphLevel=-1 and its synonym -XX:-PrintIdealGraph serve as the global switch. Thanks, --lx From nils.eliasson at oracle.com Thu Jun 11 21:24:27 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 11 Jun 2020 23:24:27 +0200 Subject: RFR[M]: 8151779: Some intrinsic flags could be replaced with one general flag In-Reply-To: <2982174F-DBB6-4316-93C3-1B4DFDF34C88@amazon.com> References: <19CD3956-4DC6-4908-8626-27D48A9AB4A4@amazon.com> <0EDAAC88-E5D9-424F-A19E-5E20C689C2F3@amazon.com> <801D878C-CAE5-4EBE-8AFE-4E35346CD5BD@amazon.com> <58ff5b66-1dce-d4ad-8f21-254abd1b887b@oracle.com> <65dcfd1f-5e7e-b9e1-8298-5daafcda8a81@oracle.com> <1EBE66E6-9AA7-4EC5-9B91-45F884071FAC@amazon.com> <2982174F-DBB6-4316-93C3-1B4DFDF34C88@amazon.com> Message-ID: <0365691c-5f80-9a3a-e47f-9852ef66f217@oracle.com> Hi Xin, In general I think the patch looks good. I am missing strict name checking. (I want to see an error on startup if the user has specified unknown intrinsic names.) I see that the lazy initialization of the intrinsic name tables might make it non-trivial to find a good place to do that. I am ok if you follow up on that in a future patch. 
Best regards, Nils Eliasson > Incremental diff: http://cr.openjdk.java.net/~xliu/8151779/r4_to_r5.diff > > I verified it in submit repo a week ago. I also double-check the patch still can patch to TIP and pass both hotspot:tier1 and gtest:all. > > Here is log message I got from mach-5. > Job: mach5-one-phh-JDK-8151779-1-20200513-1821-11015755 > > BuildId: 2020-05-13-1820211.hohensee.source > > No failed tests > > Tasks Summary > > EXECUTED_WITH_FAILURE: 0 > NOTHING_TO_RUN: 0 > KILLED: 0 > HARNESS_ERROR: 0 > FAILED: 0 > PASSED: 101 > UNABLE_TO_RUN: 0 > NA: 0 > > > Thanks, > --lx > > > > ?On 5/13/20, 12:03 AM, "hotspot-compiler-dev on behalf of Liu, Xin" wrote: > > Hi, Vladimir, > > > 2. add +/- UseCRC32Intrinsics to IntrinsicAvailableTest.java > > The purpose of that test is not to generate a CRC32 intrinsic. Its purpose is to check if compilers determine to intrinsify _updateCRC32 or not. > > Mathematically, "UseCRC32Intrinsics" is a set = [_updateCRC32, _updateBytesCRC32, _updateByteBufferCRC32]. > > "-XX:-UseCRC32Intrinsics" disables all 3 of them. If users use -XX:ControlIntrinsic=+_updateCRC32 and -XX:-UseCRC32Intrinsics, _updateCRC32 should be enabled individually. > > No, I think we should preserve current behavior when UseCRC32Intrinsics is off then all corresponding intrinsics are > also should be off. This is the purpose of such flags - to be able control several intrinsics with one flag. > Otherwise you have to check each individual intrinsic if CPU does not support them. Even if code for some of these > intrinsics can be generated on this CPU. We should be consistent, otherwise code can become very complex to support. > ---- > If -XX:ControlIntrinsic=+_updateBytesCRC32 can't win over -XX:-UseCRC32Intrinsics, it will come back the justification of JBS-8151779: > Why do we need to support the usage -XX:ControlIntrinsic=+_updateBytesCRC32? If a user doesn't set +updateBytesCRC32, it's still enabled. 
> > I read the description of "JBS-8235981" and "JBS-8151779" again. I try to understand in this way. The option 'UseCRC32Intrinsics' is the consolidation of 3 real intrinsics [_updateCRC32, _updateBytesCRC32, _updateByteBufferCRC32]. It represents some sorta hardware capabilities to make those intrinsics optimal. If UseCRC32Intrinsics is OFF, it will not make sense to intrinsify them anymore because inliner can deliver the similar result. > > Quote from JBS-8235981 "Right now, there's no way to introduce experimental intrinsics which are turned off by default and let users enable them on their side. " > Currently, once a user declares one new intrinsics in VM_INTRINSICS_DO, it's enabled. It might not be true in the future. > i.e. A develop can declare an intrinsic but mark it turn-off by default. He will try it out by -XX:ControlIntrinsic=+_myNewIntrinsic in his development stage. > > Do I catch up your intention this time? if yes, could you take a look at this new revision? I think I meet the requirement. > Webrev: http://cr.openjdk.java.net/~xliu/8151779/05/webrev/ > Incremental diff: http://cr.openjdk.java.net/~xliu/8151779/r4_to_r5.diff > > Here is the change log from rev04. > 1) An intrinsic is enabled if and only if neither ControlIntrinsic nor the corresponding UseXXXIntrinsics disables it. > The implementation is still in vmIntrinsics::is_disabled_by_flags(vmIntrinsics::ID id). > > 2) I introduce a compact data structure TriBoolArray. It compresses an array of Tribool. Each tribool only takes 2 bits now. > I also took Coleen's suggestion to put TriBool and TriBoolArray in a standalone file "utilities/tribool.hpp". A new gtest is attached. > > 3) Correct some typos. Thank you David pointed them out. > > Thanks, > --lx > > > On 5/12/20, 12:59 AM, "David Holmes" wrote: > > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. 
> > > > Hi, > > Sorry for the delay in getting back to this. > > On 5/05/2020 7:37 pm, Liu, Xin wrote: > > Hello, David and Nils > > > > Thank you to review the patch. I went to brush up my English grammar and then update my patch to rev04. > > https://cr.openjdk.java.net/~xliu/8151779/04/webrev/ > > Here is the incremental diff: https://cr.openjdk.java.net/~xliu/8151779/r3_to_r4.diff It reflect changes based on David's feedbacks. I really appreciate that you review so carefully and found so many invaluable suggestions. TBH, I don't understand Amazon's copyright header neither. I choose the simple way to dodge that problem. > > In vmSymbols.hpp > > + // 1. Disable/Control Intrinsic accept a list of intrinsic IDs. > > s/accept/accepts/ > > + // their final value are subject to hardware inspection > (VM_Version::initialize). > > s/value/values/ > > Otherwise all my nits have been addressed - thanks. > > I don't need to see a further webrev. > > Thanks, > David > ----- > > > Nils points out a very tricky question. Yes, I also notice that each TriBool takes 4 bytes on x86_64. It's a natural machine word and supposed to be the most efficient form. As a result, the vector control_words take about 1.3Kb for all intrinsics. I thought it's not a big deal, but Nils brought up that each DirectiveSet will increase from 128b to 1440b. Theoretically, the user may provide a CompileCommandFile which consists of hundreds of directives. Will hotspot have hundreds of DirectiveSet in that case? > > > > Actually, I do have a compacted container of TriBool. It's like a vector specialization. > > https://cr.openjdk.java.net/~xliu/8151779/TriBool.cpp > > > > The reason I didn't include it because I still feel that a few KiloBytes memories are not a big deal. Nowadays, hotspot allows Java programmers allocate over 100G heap. Is it wise to increase software complexity to save KBs? > > > > If you think it matters, I can integrate it. May I update TriBoolArray in a standalone JBS? 
I have made a lot of changes. I hope I can verify them using KitchenSink? > > > > For the second problem, I think it's because I used 'memset' to initialize an array of objects in rev01. Previously, I had code like this: > > memset(&_intrinsic_control_words[0], 0, sizeof(_intrinsic_control_words)); > > > > This kind of usage will be warned as -Werror=class-memaccess in g++-8. I have fixed it since rev02. I use DirectiveSet::fill_in(). Please check out. > > > > Thanks, > > --lx > > > > From Yang.Zhang at arm.com Fri Jun 12 05:28:12 2020 From: Yang.Zhang at arm.com (Yang Zhang) Date: Fri, 12 Jun 2020 05:28:12 +0000 Subject: 8244926: Add absolute check for int/long to generate Abs nodes In-Reply-To: <8b67dcb0-d710-82fd-b99d-875096b8f4db@oracle.com> References: <8b67dcb0-d710-82fd-b99d-875096b8f4db@oracle.com> Message-ID: Hi Nils Thanks a lot for your review. Hi reviewers Any other comments? Regards, Yang -----Original Message----- From: hotspot-compiler-dev On Behalf Of Nils Eliasson Sent: Thursday, June 11, 2020 5:21 PM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: 8244926: Add absolute check for int/long to generate Abs nodes Hi Yang, The patch looks good. Reviewed. Best regards, Nils Eliasson On 2020-06-11 07:36, Yang Zhang wrote: > Hi, > > Ping it again. Could anyone help to review this patch? > > Regards > Yang > > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Yang > Zhang > Sent: Thursday, June 4, 2020 4:28 PM > To: hotspot-compiler-dev at openjdk.java.net > Cc: nd > Subject: RFR: 8244926: Add absolute check for int/long to generate Abs > nodes > > Hi, > > May I have a review of this enhancement of absolute check for int/long? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8244926 > Webrev: http://cr.openjdk.java.net/~yzhang/8244926/webrev.00/ > > There is absolute value check for float/double already [1]. In this patch, absolute value check for integer/long is added. 
The following patterns can be matched to AbsI/L nodes:
> ((a < 0) ? -a : a)
> ((a <= 0) ? -a : a)
> ((a > 0) ? a : -a)
> ((a >= 0) ? a : -a)
>
> Test case:
> public static int absi(int a) {
>     return ((a < 0) ? -a : a);
> }
>
> With c2, AbsI node is generated and matched. The following snippet is generated on x86:
>   0x00007f67c8b6155b: mov %ecx,%r11d
>   0x00007f67c8b6155e: sar $0x1f,%r11d
>   0x00007f67c8b61562: mov %ecx,%r10d
>   0x00007f67c8b61565: xor %r11d,%r10d
>   0x00007f67c8b61568: sub %r11d,%r10d
>
> On AArch64:
>   0x0000ffffa8b878e4: cmp w3, wzr
>   0x0000ffffa8b878e8: cneg w17, w3, lt // lt = tstop
>
> Note: AArch64 result is based on this patch which is in review [2].
>
> Test:
> Full jtreg on x86 and AArch64, no new failure
>
> Performance:
> Jmh test is uploaded.
> http://cr.openjdk.java.net/~yzhang/8244926/TestScalar.java
>
> X86
> Before:
> Benchmark             (size)  Mode  Cnt     Score   Error  Units
> TestScalar.testAbsI1    1024  avgt   25  2648.235 ± 0.810  us/op
> TestScalar.testAbsI2    1024  avgt   25  2647.702 ± 0.431  us/op
> TestScalar.testAbsI3    1024  avgt   25  2647.605 ± 0.346  us/op
> TestScalar.testAbsI4    1024  avgt   25  2647.574 ± 0.651  us/op
> TestScalar.testAbsL1    1024  avgt   25  3165.787 ± 0.976  us/op
> TestScalar.testAbsL2    1024  avgt   25  3166.582 ± 2.217  us/op
> TestScalar.testAbsL3    1024  avgt   25  3168.097 ± 4.071  us/op
> TestScalar.testAbsL4    1024  avgt   25  3167.222 ± 2.573  us/op
>
> After:
> Benchmark             (size)  Mode  Cnt     Score   Error  Units
> TestScalar.testAbsI1    1024  avgt   25  2264.637 ± 1.164  us/op
> TestScalar.testAbsI2    1024  avgt   25  2264.318 ± 0.427  us/op
> TestScalar.testAbsI3    1024  avgt   25  2264.998 ± 0.903  us/op
> TestScalar.testAbsI4    1024  avgt   25  2264.602 ± 0.625  us/op
> TestScalar.testAbsL1    1024  avgt   25  2376.513 ± 0.345  us/op
> TestScalar.testAbsL2    1024  avgt   25  2376.681 ± 0.565  us/op
> TestScalar.testAbsL3    1024  avgt   25  2377.012 ± 0.643  us/op
> TestScalar.testAbsL4    1024  avgt   25  2376.921 ± 0.699  us/op
>
> AArch64:
> Before:
> Benchmark             (size)  Mode  Cnt     Score   Error  Units
> TestScalar.testAbsI1    1024  avgt   25  1858.831 ± 1.249  us/op
> TestScalar.testAbsI2    1024  avgt   25  1860.248 ± 1.365  us/op
> TestScalar.testAbsI3    1024  avgt   25  1859.571 ± 1.177  us/op
> TestScalar.testAbsI4    1024  avgt   25  1859.970 ± 0.882  us/op
> TestScalar.testAbsL1    1024  avgt   25  1871.520 ± 2.592  us/op
> TestScalar.testAbsL2    1024  avgt   25  1872.728 ± 2.301  us/op
> TestScalar.testAbsL3    1024  avgt   25  1872.852 ± 2.455  us/op
> TestScalar.testAbsL4    1024  avgt   25  1872.720 ± 2.652  us/op
>
> After:
> Benchmark             (size)  Mode  Cnt     Score   Error  Units
> TestScalar.testAbsI1    1024  avgt   25  1422.781 ± 1.788  us/op
> TestScalar.testAbsI2    1024  avgt   25  1423.778 ± 2.612  us/op
> TestScalar.testAbsI3    1024  avgt   25  1424.327 ± 2.065  us/op
> TestScalar.testAbsI4    1024  avgt   25  1423.269 ± 1.437  us/op
> TestScalar.testAbsL1    1024  avgt   25  1434.279 ± 2.312  us/op
> TestScalar.testAbsL2    1024  avgt   25  1433.900 ± 2.341  us/op
> TestScalar.testAbsL3    1024  avgt   25  1435.967 ± 2.270  us/op
> TestScalar.testAbsL4    1024  avgt   25  1437.495 ± 0.957  us/op
>
> [1] http://hg.openjdk.java.net/jdk/jdk/file/dd652a1b2a39/src/hotspot/share/opto/cfgnode.cpp#l1519
> [2] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-May/008861.html
>
>
> Regards,
> Yang
>

From tobias.hartmann at oracle.com Fri Jun 12 12:02:12 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 12 Jun 2020 14:02:12 +0200
Subject: [15] RFR(S): 8246203: Segmentation fault in verification due to stack overflow with -XX:+VerifyIterativeGVN
In-Reply-To:
References: <55e24d91-e483-3d09-f211-6bf41aa752f5@oracle.com>
Message-ID:

+1

Best regards,
Tobias

On 11.06.20 18:27, Vladimir Kozlov wrote:
> I would keep changes as they are - they provide correct testing. And treat timeouts as expected.
>
> Regards,
> Vladimir
>
> On 6/11/20 12:38 AM, Christian Hagedorn wrote:
>> Hi Tobias
>>
>> Thank you for your review!
>> >> I ran some more testing with -XX:+VerifyIterativeGVN over night and compared the iterative >> solution with the old recursive version. We hit some more test timeouts with the new iterative >> solution because we are now really looking at all nodes for a requested depth. There are cases >> where the recursive solution would not do that. For example, if verify_depth = 4 and given a node >> chain 1->2->3->4->5, the recursive DFS solution visits nodes 1-4 and then at node 4 when it wants >> to visit node 5 it immediately returns because the depth 4 is reached. Node 4 is now marked as >> visited. If, however, there is an additional path 1->4->5->6->7 to look at later, the recursive >> DFS solution will just stop at node 4 because that node was already visited. The iterative >> solution, on the other hand, processes the nodes in a BFS and will visit all nodes up to depth 4, >> including node 5 and 6. This results in spending more time (as seen with more timeouts for more >> complex graphs/tests). >> >> We could now try to simulate the same recursive DFS behavior in an iterative approach with a >> stack. But as we seem to be missing some nodes at a requested depth this is probably not what we >> really want? Alternatively, we could go with this BFS solution as it is and decrement verify_depth >> = 4 in the call Node::verify(n, 4) to reduce the time spent. Or just leave webrev.01 as it is and >> treat the additional timeouts as expected. >> >> What do you think? >> >> Best regards, >> Christian >> >> On 10.06.20 14:30, Tobias Hartmann wrote: >>> Hi Christian, >>> >>> On 09.06.20 19:32, Christian Hagedorn wrote: >>>> http://cr.openjdk.java.net/~chagedorn/8246203/webrev.01/ >>> >>> Looks good to me. >>> >>> Did you run testing with -XX:+VerifyIterativeGVN? 
>>> >>> Best regards, >>> Tobias >>> From tobias.hartmann at oracle.com Fri Jun 12 12:11:15 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 12 Jun 2020 14:11:15 +0200 Subject: 8244926: Add absolute check for int/long to generate Abs nodes In-Reply-To: References: <8b67dcb0-d710-82fd-b99d-875096b8f4db@oracle.com> Message-ID: <4fdfa71e-a825-ecdd-711f-9bbfd0b8b203@oracle.com> Hi Yang, looks good to me too. Best regards, Tobias On 12.06.20 07:28, Yang Zhang wrote: > Hi Nils > Thanks a lot for your review. > > Hi reviewers > Any other comments? > > Regards, > Yang > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Nils Eliasson > Sent: Thursday, June 11, 2020 5:21 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: 8244926: Add absolute check for int/long to generate Abs nodes > > Hi Yang, > > The patch looks good. > > Reviewed. > > Best regards, > Nils Eliasson > > > On 2020-06-11 07:36, Yang Zhang wrote: >> Hi, >> >> Ping it again. Could anyone help to review this patch? >> >> Regards >> Yang >> >> -----Original Message----- >> From: hotspot-compiler-dev >> On Behalf Of Yang >> Zhang >> Sent: Thursday, June 4, 2020 4:28 PM >> To: hotspot-compiler-dev at openjdk.java.net >> Cc: nd >> Subject: RFR: 8244926: Add absolute check for int/long to generate Abs >> nodes >> >> Hi, >> >> May I have a review of this enhancement of absolute check for int/long? >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8244926 >> Webrev: http://cr.openjdk.java.net/~yzhang/8244926/webrev.00/ >> >> There is absolute value check for float/double already [1]. In this patch, absolute value check for integer/long is added. The following patterns can be matched to AbsI/L nodes: >> ((a < 0) ? -a : a) >> ((a <= 0) ? -a : a) >> ((a > 0) ? a : -a) >> ((a >= 0) ? a : -a) >> >> Test case: >> public static int absi(int a) { >> return ((a < 0) ? -a : a); >> } >> >> With c2, AbsI node is generated and matched. 
The following snippet is generated on x86:
>>   0x00007f67c8b6155b: mov %ecx,%r11d
>>   0x00007f67c8b6155e: sar $0x1f,%r11d
>>   0x00007f67c8b61562: mov %ecx,%r10d
>>   0x00007f67c8b61565: xor %r11d,%r10d
>>   0x00007f67c8b61568: sub %r11d,%r10d
>>
>> On AArch64:
>>   0x0000ffffa8b878e4: cmp w3, wzr
>>   0x0000ffffa8b878e8: cneg w17, w3, lt // lt = tstop
>>
>> Note: AArch64 result is based on this patch which is in review [2].
>>
>> Test:
>> Full jtreg on x86 and AArch64, no new failure
>>
>> Performance:
>> Jmh test is uploaded.
>> http://cr.openjdk.java.net/~yzhang/8244926/TestScalar.java
>>
>> X86
>> Before:
>> Benchmark             (size)  Mode  Cnt     Score   Error  Units
>> TestScalar.testAbsI1    1024  avgt   25  2648.235 ± 0.810  us/op
>> TestScalar.testAbsI2    1024  avgt   25  2647.702 ± 0.431  us/op
>> TestScalar.testAbsI3    1024  avgt   25  2647.605 ± 0.346  us/op
>> TestScalar.testAbsI4    1024  avgt   25  2647.574 ± 0.651  us/op
>> TestScalar.testAbsL1    1024  avgt   25  3165.787 ± 0.976  us/op
>> TestScalar.testAbsL2    1024  avgt   25  3166.582 ± 2.217  us/op
>> TestScalar.testAbsL3    1024  avgt   25  3168.097 ± 4.071  us/op
>> TestScalar.testAbsL4    1024  avgt   25  3167.222 ± 2.573  us/op
>>
>> After:
>> Benchmark             (size)  Mode  Cnt     Score   Error  Units
>> TestScalar.testAbsI1    1024  avgt   25  2264.637 ± 1.164  us/op
>> TestScalar.testAbsI2    1024  avgt   25  2264.318 ± 0.427  us/op
>> TestScalar.testAbsI3    1024  avgt   25  2264.998 ± 0.903  us/op
>> TestScalar.testAbsI4    1024  avgt   25  2264.602 ± 0.625  us/op
>> TestScalar.testAbsL1    1024  avgt   25  2376.513 ± 0.345  us/op
>> TestScalar.testAbsL2    1024  avgt   25  2376.681 ± 0.565  us/op
>> TestScalar.testAbsL3    1024  avgt   25  2377.012 ± 0.643  us/op
>> TestScalar.testAbsL4    1024  avgt   25  2376.921 ± 0.699  us/op
>>
>> AArch64:
>> Before:
>> Benchmark             (size)  Mode  Cnt     Score   Error  Units
>> TestScalar.testAbsI1    1024  avgt   25  1858.831 ± 1.249  us/op
>> TestScalar.testAbsI2    1024  avgt   25  1860.248 ± 1.365  us/op
>> TestScalar.testAbsI3    1024  avgt   25  1859.571 ± 1.177  us/op
>> TestScalar.testAbsI4    1024  avgt   25  1859.970 ± 0.882  us/op
>> TestScalar.testAbsL1    1024  avgt   25  1871.520 ± 2.592  us/op
>> TestScalar.testAbsL2    1024  avgt   25  1872.728 ± 2.301  us/op
>> TestScalar.testAbsL3    1024  avgt   25  1872.852 ± 2.455  us/op
>> TestScalar.testAbsL4    1024  avgt   25  1872.720 ± 2.652  us/op
>>
>> After:
>> Benchmark             (size)  Mode  Cnt     Score   Error  Units
>> TestScalar.testAbsI1    1024  avgt   25  1422.781 ± 1.788  us/op
>> TestScalar.testAbsI2    1024  avgt   25  1423.778 ± 2.612  us/op
>> TestScalar.testAbsI3    1024  avgt   25  1424.327 ± 2.065  us/op
>> TestScalar.testAbsI4    1024  avgt   25  1423.269 ± 1.437  us/op
>> TestScalar.testAbsL1    1024  avgt   25  1434.279 ± 2.312  us/op
>> TestScalar.testAbsL2    1024  avgt   25  1433.900 ± 2.341  us/op
>> TestScalar.testAbsL3    1024  avgt   25  1435.967 ± 2.270  us/op
>> TestScalar.testAbsL4    1024  avgt   25  1437.495 ± 0.957  us/op
>>
>> [1] http://hg.openjdk.java.net/jdk/jdk/file/dd652a1b2a39/src/hotspot/share/opto/cfgnode.cpp#l1519
>> [2] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-May/008861.html
>>
>>
>> Regards,
>> Yang
>>
>

From nils.eliasson at oracle.com Fri Jun 12 12:35:42 2020
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 12 Jun 2020 14:35:42 +0200
Subject: RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph
In-Reply-To: <1EFE6B0C-F158-4D37-89E0-C698A89E410A@amazon.com>
References: <56AE34D1-1BA5-40FF-B7D9-23B772AF5FB1@amazon.com> <1EFE6B0C-F158-4D37-89E0-C698A89E410A@amazon.com>
Message-ID: <4ef54e90-7a0f-065a-4fd4-2469b35dcfb3@oracle.com>

On 2020-06-11 21:30, Liu, Xin wrote:
> Hi, Nils,
>
> Thank you to look into the webrev.
>
> Can you change the text for -1 to "printing is disabled" - because that
> is the practical difference between 0 and -1.
>
> Got it. here is the revision change the description.
> http://cr.openjdk.java.net/~xliu/8139046/03/webrev/src/hotspot/share/opto/c2_globals.hpp.udiff.html
>
> For a future fix I would suggest creating the printer on demand. Then
> the -1 value wouldn't be needed anymore.
>
> I don't quite understand here. This patch does create the printer on demand.

Yes it does, I misread it.

> A compiler thread only creates the instance of printer when should_print(level) returns true. PrintIdealGraphLevel=-1 roles a global switch to disable IdealGraph dumping.
>
> I think in this way. JDK-8139046 can also be treated as a feature, right? Previously, we can use PrintIdealGraph as a global switch. Without it, c2 just ignore whatever you write in directives.
> After c2 picks up IGVPrintLevel directive automatically, hotspot loses that feature. -XX:PrintIdealGraphLevel=-1 and its synonym -XX:-PrintIdealGraph serve as the global switch.

Yes, it is a good solution with minimal complexity. I don't know why anyone would like to turn PrintIdealGraph off - but who knows.

Best regards,
Nils

> Thanks,
> --lx
>
>

From boris.ulasevich at bell-sw.com Fri Jun 12 18:10:13 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Fri, 12 Jun 2020 21:10:13 +0300
Subject: AARCH64 optimization: using TBZ instruction for bit check
Message-ID:

Hi all,

Please review the new AARCH64 instruction selection rules.
The change applies the TBZ instruction for bit checks: "if ((var&16) == 16)".
This gives a 17% performance improvement on the benchmark and 5% on a real
application.

http://bugs.openjdk.java.net/browse/JDK-8247408
http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00

- From the full change I excluded the far branch test because it takes a
long time to run, and I'm not sure C2 will not change its behaviour:
http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00.plus

The change was tested on jtreg in fastdebug mode: no regressions.

thanks,
Boris

========================================================================================

Benchmark                                               Mode  Cnt        Score    Error  Units        Score     Error
TBZBenchmark.cmpAndBranch2Tbz                          thrpt   25  1329060.879 ± 42.780  ops/s  1504990.708 ± 158.096
TBZBenchmark.cmpAndBranch2Tbz:CPI                      thrpt    5        0.325 ±  0.001   #/op        0.410 ±   0.001
TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-load-misses    thrpt    5        0.019 ±  0.031   #/op        0.018 ±   0.025
TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-loads          thrpt    5       16.811 ±  0.791   #/op       16.809 ±   0.914
TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-store-misses   thrpt    5        0.016 ±  0.017   #/op        0.014 ±   0.022
TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-stores         thrpt    5       16.704 ±  0.634   #/op       16.771 ±   0.539
TBZBenchmark.cmpAndBranch2Tbz:L1-icache-load-misses    thrpt    5        0.017 ±  0.027   #/op        0.016 ±   0.023
TBZBenchmark.cmpAndBranch2Tbz:L1-icache-loads          thrpt    5     1811.848 ±  3.552   #/op     1148.737 ±   2.993
TBZBenchmark.cmpAndBranch2Tbz:branch-misses            thrpt    5        1.013 ±  0.009   #/op        1.011 ±   0.018
TBZBenchmark.cmpAndBranch2Tbz:cycles                   thrpt    5     1882.193 ±  3.799   #/op     1662.994 ±   5.935
TBZBenchmark.cmpAndBranch2Tbz:dTLB-load-misses         thrpt    5        0.004 ±  0.008   #/op        0.005 ±   0.016
TBZBenchmark.cmpAndBranch2Tbz:dTLB-loads               thrpt    5       16.687 ±  0.732   #/op       16.669 ±   0.958
TBZBenchmark.cmpAndBranch2Tbz:iTLB-load-misses         thrpt    5        0.003 ±  0.009   #/op        0.003 ±   0.008
TBZBenchmark.cmpAndBranch2Tbz:iTLB-loads               thrpt    5     1586.390 ±  2.612   #/op     1353.981 ±   3.469
TBZBenchmark.cmpAndBranch2Tbz:instructions             thrpt    5     5791.824 ± 15.362   #/op     4055.443 ±  17.785
TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-backend   thrpt    5        5.279 ±  1.968   #/op       20.459 ±   5.258
TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-frontend  thrpt    5       66.808 ±  0.700   #/op       12.738 ±   1.040

public class TBZBenchmark {
    @Benchmark
    public int cmpAndBranch2Tbz() {
        int count = 0;
        for (int value = 0; value < 1000; value++) {
            if ((value & 32) == 32) {
                count--;
            } else {
                count++;
            }
        }
        return count;
    }
}

From eric.caspole at oracle.com Fri Jun 12 18:24:57 2020
From: eric.caspole at oracle.com (eric.caspole at oracle.com)
Date: Fri, 12 Jun 2020 14:24:57 -0400
Subject: AARCH64 optimization: using TBZ instruction for bit check
In-Reply-To:
References:
Message-ID:

Hi Boris,

Could you add the JMH to your webrev under test/micro/org/openjdk/bench/?

Thanks,
Eric

On 6/12/20 2:10 PM, Boris Ulasevich wrote:
> Hi all,
>
> Please review the new AARCH64 instruction selection rules.
> The change applies the TBZ instruction for bit checks: "if ((var&16) == 16)".
> This gives a 17% performance improvement on the benchmark and 5% on a real
> application.
>
> http://bugs.openjdk.java.net/browse/JDK-8247408
> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00
>
> - From the full change I excluded the far branch test because it takes a
> long time to run, and I'm not sure C2 will not change its behaviour:
> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00.plus
>
> The change was tested on jtreg in fastdebug mode: no regressions.
>
> thanks,
> Boris
>
> ========================================================================================
>
> Benchmark                                               Mode  Cnt        Score    Error  Units        Score     Error
> TBZBenchmark.cmpAndBranch2Tbz                          thrpt   25  1329060.879 ± 42.780  ops/s  1504990.708 ± 158.096
> TBZBenchmark.cmpAndBranch2Tbz:CPI                      thrpt    5        0.325 ±  0.001   #/op        0.410 ±   0.001
> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-load-misses    thrpt    5        0.019 ±  0.031   #/op        0.018 ±   0.025
> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-loads          thrpt    5       16.811 ±  0.791   #/op       16.809 ±   0.914
> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-store-misses   thrpt    5        0.016 ±  0.017   #/op        0.014 ±   0.022
> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-stores         thrpt    5       16.704 ±  0.634   #/op       16.771 ±   0.539
> TBZBenchmark.cmpAndBranch2Tbz:L1-icache-load-misses    thrpt    5        0.017 ±  0.027   #/op        0.016 ±   0.023
> TBZBenchmark.cmpAndBranch2Tbz:L1-icache-loads          thrpt    5     1811.848 ±  3.552   #/op     1148.737 ±   2.993
> TBZBenchmark.cmpAndBranch2Tbz:branch-misses            thrpt    5        1.013 ±  0.009   #/op        1.011 ±   0.018
> TBZBenchmark.cmpAndBranch2Tbz:cycles                   thrpt    5     1882.193 ±  3.799   #/op     1662.994 ±   5.935
> TBZBenchmark.cmpAndBranch2Tbz:dTLB-load-misses         thrpt    5        0.004 ±  0.008   #/op        0.005 ±   0.016
> TBZBenchmark.cmpAndBranch2Tbz:dTLB-loads               thrpt    5       16.687 ±  0.732   #/op       16.669 ±   0.958
> TBZBenchmark.cmpAndBranch2Tbz:iTLB-load-misses         thrpt    5        0.003 ±  0.009   #/op        0.003 ±   0.008
> TBZBenchmark.cmpAndBranch2Tbz:iTLB-loads               thrpt    5     1586.390 ±  2.612   #/op     1353.981 ±   3.469
> TBZBenchmark.cmpAndBranch2Tbz:instructions             thrpt    5     5791.824 ± 15.362   #/op     4055.443 ±  17.785
> TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-backend   thrpt    5        5.279 ±  1.968   #/op       20.459 ±   5.258
> TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-frontend  thrpt    5       66.808 ±  0.700   #/op       12.738 ±   1.040
>
> public class TBZBenchmark {
>     @Benchmark
>     public int cmpAndBranch2Tbz() {
>         int count = 0;
>         for (int value = 0; value < 1000; value++) {
>             if ((value & 32) == 32) {
>                 count--;
>             } else {
>                 count++;
>             }
>         }
>         return count;
>     }
> }
>

From xxinliu at amazon.com Fri Jun 12 18:31:08 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Fri, 12 Jun 2020 18:31:08 +0000
Subject: RFR(S): 8139046: Compiler Control: IVGPrintLevel directive should set PrintIdealGraph
In-Reply-To: <4ef54e90-7a0f-065a-4fd4-2469b35dcfb3@oracle.com>
References: <56AE34D1-1BA5-40FF-B7D9-23B772AF5FB1@amazon.com> <1EFE6B0C-F158-4D37-89E0-C698A89E410A@amazon.com> <4ef54e90-7a0f-065a-4fd4-2469b35dcfb3@oracle.com>
Message-ID: <4D5B4C72-ACB9-4746-B056-71626045743A@amazon.com>

Hi, Nils,

Thanks for reviewing it.
Another reason is compatibility. Previously, developers could use "-XX:-PrintIdealGraph" to disable the printer. I'd like to keep this behavior the same.

I will ask Paul to sponsor it.

thanks!
--lx

On 6/12/20, 5:37 AM, "hotspot-compiler-dev on behalf of Nils Eliasson" wrote:

On 2020-06-11 21:30, Liu, Xin wrote:
> Hi, Nils,
>
> Thank you to look into the webrev.
>
> Can you change the text for -1 to "printing is disabled" - because that
> is the practical difference between 0 and -1.
>
> Got it. here is the revision change the description.
> http://cr.openjdk.java.net/~xliu/8139046/03/webrev/src/hotspot/share/opto/c2_globals.hpp.udiff.html
>
> For a future fix I would suggest creating the printer on demand. Then
> the -1 value wouldn't be needed anymore.
>
> I don't quite understand here. This patch does create the printer on demand.

Yes it does, I misread it.

> A compiler thread only creates the instance of printer when should_print(level) returns true. PrintIdealGraphLevel=-1 roles a global switch to disable IdealGraph dumping.
>
> I think in this way. JDK-8139046 can also be treated as a feature, right? Previously, we can use PrintIdealGraph as a global switch. Without it, c2 just ignore whatever you write in directives.
> After c2 picks up IGVPrintLevel directive automatically, hotspot loses that feature. -XX:PrintIdealGraphLevel=-1 and its synonym -XX:-PrintIdealGraph serve as the global switch. Yes, It is a good solution with minimal complexity. I don't know why anyone would like to turn PrintIdealGraph off - but who knows. Best regards, Nils > Thanks, > --lx > > From nils.eliasson at oracle.com Fri Jun 12 19:37:51 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 12 Jun 2020 21:37:51 +0200 Subject: [15] RFR(S): 8247421: [TESTBUG] ReturnBlobToWrongHeapTest.java failed allocating blob Message-ID: <01e85a84-d8ee-7098-4a7f-6b493354b4de@oracle.com> Hi, This tries to fill up one segment of the code cache with large blobs, and then fill up the rest with small code blobs. In one test run the large code blob happened to fill up the code heap precisely, leaving no room for any small code blob. That made the test fail. "CodeHeap 'non-profiled nmethods': size=11248Kb used=11248Kb max_used=11248Kb free=0Kb ?bounds [0x0000000116701000, 0x00000001171fd000, 0x00000001171fd000]" My fix allocates one small blob first, then continues on with the large blobs, and finally filling up the rest with small blobs. In that way there are always a small blob allocated. Bug: https://bugs.openjdk.java.net/browse/JDK-8247421 Webrev: http://cr.openjdk.java.net/~neliasso/8247421/webrev.01/ Please review, Nils Eliasson From vladimir.kozlov at oracle.com Fri Jun 12 20:39:32 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 12 Jun 2020 13:39:32 -0700 Subject: [15] RFR(S): 8247421: [TESTBUG] ReturnBlobToWrongHeapTest.java failed allocating blob In-Reply-To: <01e85a84-d8ee-7098-4a7f-6b493354b4de@oracle.com> References: <01e85a84-d8ee-7098-4a7f-6b493354b4de@oracle.com> Message-ID: <801b2e93-0ab0-bebf-3135-8d9040c8ef34@oracle.com> Good. 

Thanks,
Vladimir

On 6/12/20 12:37 PM, Nils Eliasson wrote:
> Hi,
>
> This tries to fill up one segment of the code cache with large blobs, and then fill up the rest with small code blobs.
> In one test run the large code blob happened to fill up the code heap precisely, leaving no room for any small code
> blob. That made the test fail.
>
> "CodeHeap 'non-profiled nmethods': size=11248Kb used=11248Kb max_used=11248Kb free=0Kb
>  bounds [0x0000000116701000, 0x00000001171fd000, 0x00000001171fd000]"
>
> My fix allocates one small blob first, then continues on with the large blobs, and finally filling up the rest with
> small blobs. In that way there are always a small blob allocated.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8247421
> Webrev: http://cr.openjdk.java.net/~neliasso/8247421/webrev.01/
>
> Please review,
> Nils Eliasson

From boris.ulasevich at bell-sw.com Sat Jun 13 18:24:50 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Sat, 13 Jun 2020 21:24:50 +0300
Subject: AARCH64 optimization: using TBZ instruction for bit check
In-Reply-To:
References:
Message-ID:

Hi Eric,

Ok. Here is the webrev with JMH:
http://cr.openjdk.java.net/~bulasevich/8247408/webrev.01

Thank you,
Boris

On 12.06.2020 21:24, eric.caspole at oracle.com wrote:
> Hi Boris,
> Could you add the JMH to your webrev under
> test/micro/org/openjdk/bench/?
> Thanks,
> Eric
>
>
> On 6/12/20 2:10 PM, Boris Ulasevich wrote:
>> Hi all,
>>
>> Please review the new AARCH64 instruction selection rules.
>> The change applies TBZ instruction for bit checks: "if ((var&16) ==
>> 16)".
>> This makes 17% performance improvement on the benchmark and 5% on a
>> real application.
>>
>> http://bugs.openjdk.java.net/browse/JDK-8247408
>> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00
>>
>> - from the full change I excluded far branch test is because it works
>> a long time, and I'm not sure C2 will not change its behaviour:
>> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00.plus
>>
>> The change was tested on jtreg in fastdebug mode: no regressions.
>>
>> thanks,
>> Boris
>>
>> ========================================================================================
>>
>> Benchmark                                               Mode Cnt        Score     Error  Units        Score     Error
>> TBZBenchmark.cmpAndBranch2Tbz                          thrpt  25  1329060.879 ±  42.780  ops/s  1504990.708 ± 158.096
>> TBZBenchmark.cmpAndBranch2Tbz:CPI                      thrpt   5        0.325 ±   0.001  #/op         0.410 ±   0.001
>> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-load-misses    thrpt   5        0.019 ±   0.031  #/op         0.018 ±   0.025
>> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-loads          thrpt   5       16.811 ±   0.791  #/op        16.809 ±   0.914
>> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-store-misses   thrpt   5        0.016 ±   0.017  #/op         0.014 ±   0.022
>> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-stores         thrpt   5       16.704 ±   0.634  #/op        16.771 ±   0.539
>> TBZBenchmark.cmpAndBranch2Tbz:L1-icache-load-misses    thrpt   5        0.017 ±   0.027  #/op         0.016 ±   0.023
>> TBZBenchmark.cmpAndBranch2Tbz:L1-icache-loads          thrpt   5     1811.848 ±   3.552  #/op      1148.737 ±   2.993
>> TBZBenchmark.cmpAndBranch2Tbz:branch-misses            thrpt   5        1.013 ±   0.009  #/op         1.011 ±   0.018
>> TBZBenchmark.cmpAndBranch2Tbz:cycles                   thrpt   5     1882.193 ±   3.799  #/op      1662.994 ±   5.935
>> TBZBenchmark.cmpAndBranch2Tbz:dTLB-load-misses         thrpt   5        0.004 ±   0.008  #/op         0.005 ±   0.016
>> TBZBenchmark.cmpAndBranch2Tbz:dTLB-loads               thrpt   5       16.687 ±   0.732  #/op        16.669 ±   0.958
>> TBZBenchmark.cmpAndBranch2Tbz:iTLB-load-misses         thrpt   5        0.003 ±   0.009  #/op         0.003 ±   0.008
>> TBZBenchmark.cmpAndBranch2Tbz:iTLB-loads               thrpt   5     1586.390 ±   2.612  #/op      1353.981 ±   3.469
>> TBZBenchmark.cmpAndBranch2Tbz:instructions             thrpt   5     5791.824 ±  15.362  #/op      4055.443 ±  17.785
>> TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-backend   thrpt   5        5.279 ±   1.968  #/op        20.459 ±   5.258
>> TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-frontend  thrpt   5       66.808 ±   0.700  #/op        12.738 ±   1.040
>>
>> public class TBZBenchmark {
>>     @Benchmark
>>     public int cmpAndBranch2Tbz() {
>>         int count = 0;
>>         for (int value = 0; value < 1000; value++) {
>>             if ((value & 32) == 32) {
>>                 count--;
>>             } else {
>>                 count++;
>>             }
>>         }
>>         return count;
>>     }
>> }
>>

From dean.long at oracle.com Sun Jun 14 04:09:14 2020
From: dean.long at oracle.com (Dean Long)
Date: Sat, 13 Jun 2020 21:09:14 -0700
Subject: [15] RFR(XS) 8236647: java/lang/invoke/CallSiteTest.java failed with InvocationTargetException in Graal mode
Message-ID: <1e25aad9-45cd-a4d4-561c-a54168e2c2e3@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8236647
http://cr.openjdk.java.net/~dlong/8236647/webrev/

This does a similar fix for Graal/JVMCI as JDK-8234923 did for C1/C2.

dl

From headius at headius.com Sun Jun 14 14:04:50 2020
From: headius at headius.com (Charles Oliver Nutter)
Date: Sun, 14 Jun 2020 09:04:50 -0500
Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby
Message-ID:

We received a bug report today showing that JRuby's bytecode-compiled methods do not appear to be inlining properly on Hotspot.

https://github.com/jruby/jruby/issues/6280

A bit of background here...

JRuby is itself a tiered runtime.
Most Ruby methods will be interpreted in our IR form for a while until a call threshold is reached. At that point, we JIT the method into JVM bytecode and use that code from that point forward.

We also optionally use invokedynamic for dynamic call sites (via the -Xcompile.invokedynamic JRuby flag), which makes it possible for most method calls to inline.

Because Ruby methods can be overwritten at runtime, or whole Ruby classes might be transient, most of these jitted methods are contained within their own unique JVM classes, and also within their own unique classloaders. This allows them to unload when no longer in use.

This should not affect inlining one of these methods into another, and historically it has worked fine.

The bug report above shows that a trivial method call fails to inline with "unloaded signature class" (according to PrintInlining), and my experiments seem to indicate this only happens when tiered compilation is enabled. Disabling tiered compilation and using C2 alone inlines fine and we get the native code we expect.

The signatures of these methods are not exotic... the only classes specified are classes critical to the operation of JRuby itself, and they would have been loaded and in use long before these inlining decisions would be made. The jitted bytecode class itself is defined and subsequently passes through various reflection APIs, so it should also be fully loaded and resolved.

So we have a puzzle. Why does running this code with tiered compilation cause it to (erroneously?) claim a signature class has not been loaded?

This appears to affect every OpenJDK release at least back to 8u222, the earliest version we tested.

To reproduce, create the two scripts in the bug, download a JRuby distribution from jruby.org, and execute the main script like this:

bin/jruby -Xcompile.invokedynamic -J-XX:+WhateverHotspotFlag main.rb

PrintInlining and PrintAssembly output will show that the "bar" method fails to inline into "foo" in the inline.rb part of the example.

Help!

- Charlie

From Pengfei.Li at arm.com Mon Jun 15 03:20:53 2020
From: Pengfei.Li at arm.com (Pengfei Li)
Date: Mon, 15 Jun 2020 03:20:53 +0000
Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called
Message-ID:

Hi,

Can I have a review of this C2 loop optimization fix?

JBS: https://bugs.openjdk.java.net/browse/JDK-8247307
Webrev: http://cr.openjdk.java.net/~pli/rfr/8247307/webrev.00/

C2 has a loop optimization phase called intrinsify_fill. It matches the pattern of a single array store of a loop-invariant value in a counted loop, like below, and replaces it with a call to some stub routine.

  for (int i = start; i < limit; i++) {
    a[i] = value;
  }

Unfortunately, this doesn't work in current jdk after loop strip mining. The above loop is eventually unrolled and auto-vectorized by subsequent optimization phases. Root cause is that in strip-mined loops, the inner CountedLoopNode may be used by the address polling node of the safepoint in the outer loop. But as the safepoint polling has nothing related to any real operations in the loop, it should not hinder the pattern match. So in this patch, the polladr's use is ignored in the match check.

We have some performance comparison of the code for array fill, between the auto-vectorized version and the stub routine version. The JMH case for the tests can be found at [1]. Results show that on x86, the stub code is even slower than the auto-vectorized code. To prevent any regression, vm option OptimizeFill is turned off for x86 in this patch. So this patch doesn't affect the generated code on x86. On AArch64, the two versions show almost the same performance in general cases.
But if the value to be filled is zero, the stub code's performance is much better. This makes sense as AArch64 uses cache maintenance instructions (DC ZVA) to zero large blocks in the hand-crafted assembly. Below are JMH scores on AArch64.

Before:
Benchmark                     Mode  Cnt      Score     Error  Units
TestArrayFill.fillByteArray   avgt   25   2078.700 ±   7.719  ns/op
TestArrayFill.fillIntArray    avgt   25  12371.497 ± 566.773  ns/op
TestArrayFill.fillShortArray  avgt   25   4132.439 ±  25.096  ns/op
TestArrayFill.zeroByteArray   avgt   25   2080.313 ±   7.516  ns/op
TestArrayFill.zeroIntArray    avgt   25  10961.331 ± 527.750  ns/op
TestArrayFill.zeroShortArray  avgt   25   4126.386 ±  20.997  ns/op

After:
Benchmark                     Mode  Cnt      Score     Error  Units
TestArrayFill.fillByteArray   avgt   25   2080.382 ±   2.103  ns/op
TestArrayFill.fillIntArray    avgt   25  11997.621 ± 569.058  ns/op
TestArrayFill.fillShortArray  avgt   25   4309.035 ± 285.456  ns/op
TestArrayFill.zeroByteArray   avgt   25    903.434 ±  10.944  ns/op
TestArrayFill.zeroIntArray    avgt   25   8141.533 ± 946.341  ns/op
TestArrayFill.zeroShortArray  avgt   25   1784.124 ±  24.618  ns/op

Another advantage of using the stub routine is that the generated code size is reduced.

Jtreg hotspot::hotspot_all_no_apps, jdk::jdk_core, langtools::tier1 are tested and no new failure is found.

--
Thanks,
Pengfei

From Yang.Zhang at arm.com Mon Jun 15 06:10:13 2020
From: Yang.Zhang at arm.com (Yang Zhang)
Date: Mon, 15 Jun 2020 06:10:13 +0000
Subject: 8244926: Add absolute check for int/long to generate Abs nodes
In-Reply-To: <4fdfa71e-a825-ecdd-711f-9bbfd0b8b203@oracle.com>
References: <8b67dcb0-d710-82fd-b99d-875096b8f4db@oracle.com> <4fdfa71e-a825-ecdd-711f-9bbfd0b8b203@oracle.com>
Message-ID:

Hi Tobias

Thanks a lot for your review. Pushed.

Regards
Yang

-----Original Message-----
From: Tobias Hartmann
Sent: Friday, June 12, 2020 8:11 PM
To: Yang Zhang ; Nils Eliasson ; hotspot-compiler-dev at openjdk.java.net
Subject: Re: 8244926: Add absolute check for int/long to generate Abs nodes

Hi Yang,

looks good to me too.

Best regards,
Tobias

On 12.06.20 07:28, Yang Zhang wrote:
> Hi Nils
> Thanks a lot for your review.
>
> Hi reviewers
> Any other comments?
>
> Regards,
> Yang
>
> -----Original Message-----
> From: hotspot-compiler-dev
> On Behalf Of Nils
> Eliasson
> Sent: Thursday, June 11, 2020 5:21 PM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: 8244926: Add absolute check for int/long to generate Abs
> nodes
>
> Hi Yang,
>
> The patch looks good.
>
> Reviewed.
>
> Best regards,
> Nils Eliasson
>
>
> On 2020-06-11 07:36, Yang Zhang wrote:
>> Hi,
>>
>> Ping it again. Could anyone help to review this patch?
>>
>> Regards
>> Yang
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev
>> On Behalf Of Yang
>> Zhang
>> Sent: Thursday, June 4, 2020 4:28 PM
>> To: hotspot-compiler-dev at openjdk.java.net
>> Cc: nd
>> Subject: RFR: 8244926: Add absolute check for int/long to generate
>> Abs nodes
>>
>> Hi,
>>
>> May I have a review of this enhancement of absolute check for int/long?
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8244926
>> Webrev: http://cr.openjdk.java.net/~yzhang/8244926/webrev.00/
>>
>> There is absolute value check for float/double already [1]. In this patch, absolute value check for integer/long is added. The following patterns can be matched to AbsI/L nodes:
>> ((a < 0) ? -a : a)
>> ((a <= 0) ? -a : a)
>> ((a > 0) ? a : -a)
>> ((a >= 0) ? a : -a)
>>
>> Test case:
>> public static int absi(int a) {
>>     return ((a < 0) ? -a : a);
>> }
>>
>> With c2, AbsI node is generated and matched. The following snippet is generated on x86:
>> 0x00007f67c8b6155b: mov %ecx,%r11d
>> 0x00007f67c8b6155e: sar $0x1f,%r11d
>> 0x00007f67c8b61562: mov %ecx,%r10d
>> 0x00007f67c8b61565: xor %r11d,%r10d
>> 0x00007f67c8b61568: sub %r11d,%r10d
>>
>> On AArch64:
>> 0x0000ffffa8b878e4: cmp w3, wzr
>> 0x0000ffffa8b878e8: cneg w17, w3, lt // lt = tstop
>>
>> Note: AArch64 result is based on this patch which is in review [2].
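
The four source shapes listed in the quoted message above can be checked against each other in plain Java. The sketch below is illustrative only: the class name `AbsPatterns` and the method names `abs1` to `abs4` are made up here and do not come from the webrev. It only confirms that every form agrees with `Math.abs`, including at `Integer.MIN_VALUE`, where negation overflows and each form returns the input unchanged; it says nothing about which nodes C2 actually emits.

```java
// Hypothetical helper class; the names are illustrative, not from the patch.
final class AbsPatterns {
    static int abs1(int a) { return (a < 0) ? -a : a; }
    static int abs2(int a) { return (a <= 0) ? -a : a; }
    static int abs3(int a) { return (a > 0) ? a : -a; }
    static int abs4(int a) { return (a >= 0) ? a : -a; }

    // True when all four source patterns agree with Math.abs for this input.
    static boolean allAgree(int a) {
        int expected = Math.abs(a);
        return abs1(a) == expected && abs2(a) == expected
            && abs3(a) == expected && abs4(a) == expected;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            System.out.println(a + " -> " + abs1(a) + " (all agree: " + allAgree(a) + ")");
        }
    }
}
```

Running `main` prints one line per sample value; every line should report that the four forms agree.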
>>
>> Test:
>> Full jtreg on x86 and AArch64, no new failure
>>
>> Performance:
>> Jmh test is uploaded.
>> http://cr.openjdk.java.net/~yzhang/8244926/TestScalar.java
>>
>> X86
>> Before:
>> Benchmark             (size)  Mode  Cnt     Score   Error  Units
>> TestScalar.testAbsI1    1024  avgt   25  2648.235 ± 0.810  us/op
>> TestScalar.testAbsI2    1024  avgt   25  2647.702 ± 0.431  us/op
>> TestScalar.testAbsI3    1024  avgt   25  2647.605 ± 0.346  us/op
>> TestScalar.testAbsI4    1024  avgt   25  2647.574 ± 0.651  us/op
>> TestScalar.testAbsL1    1024  avgt   25  3165.787 ± 0.976  us/op
>> TestScalar.testAbsL2    1024  avgt   25  3166.582 ± 2.217  us/op
>> TestScalar.testAbsL3    1024  avgt   25  3168.097 ± 4.071  us/op
>> TestScalar.testAbsL4    1024  avgt   25  3167.222 ± 2.573  us/op
>>
>> After:
>> Benchmark             (size)  Mode  Cnt     Score   Error  Units
>> TestScalar.testAbsI1    1024  avgt   25  2264.637 ± 1.164  us/op
>> TestScalar.testAbsI2    1024  avgt   25  2264.318 ± 0.427  us/op
>> TestScalar.testAbsI3    1024  avgt   25  2264.998 ± 0.903  us/op
>> TestScalar.testAbsI4    1024  avgt   25  2264.602 ± 0.625  us/op
>> TestScalar.testAbsL1    1024  avgt   25  2376.513 ± 0.345  us/op
>> TestScalar.testAbsL2    1024  avgt   25  2376.681 ± 0.565  us/op
>> TestScalar.testAbsL3    1024  avgt   25  2377.012 ± 0.643  us/op
>> TestScalar.testAbsL4    1024  avgt   25  2376.921 ± 0.699  us/op
>>
>> AArch64:
>> Before:
>> Benchmark             (size)  Mode  Cnt     Score   Error  Units
>> TestScalar.testAbsI1    1024  avgt   25  1858.831 ± 1.249  us/op
>> TestScalar.testAbsI2    1024  avgt   25  1860.248 ± 1.365  us/op
>> TestScalar.testAbsI3    1024  avgt   25  1859.571 ± 1.177  us/op
>> TestScalar.testAbsI4    1024  avgt   25  1859.970 ± 0.882  us/op
>> TestScalar.testAbsL1    1024  avgt   25  1871.520 ± 2.592  us/op
>> TestScalar.testAbsL2    1024  avgt   25  1872.728 ± 2.301  us/op
>> TestScalar.testAbsL3    1024  avgt   25  1872.852 ± 2.455  us/op
>> TestScalar.testAbsL4    1024  avgt   25  1872.720 ± 2.652  us/op
>>
>> After:
>> Benchmark             (size)  Mode  Cnt     Score   Error  Units
>> TestScalar.testAbsI1    1024  avgt   25  1422.781 ± 1.788  us/op
>> TestScalar.testAbsI2    1024  avgt   25  1423.778 ± 2.612  us/op
>> TestScalar.testAbsI3    1024  avgt   25  1424.327 ± 2.065  us/op
>> TestScalar.testAbsI4    1024  avgt   25  1423.269 ± 1.437  us/op
>> TestScalar.testAbsL1    1024  avgt   25  1434.279 ± 2.312  us/op
>> TestScalar.testAbsL2    1024  avgt   25  1433.900 ± 2.341  us/op
>> TestScalar.testAbsL3    1024  avgt   25  1435.967 ± 2.270  us/op
>> TestScalar.testAbsL4    1024  avgt   25  1437.495 ± 0.957  us/op
>>
>> [1] http://hg.openjdk.java.net/jdk/jdk/file/dd652a1b2a39/src/hotspot/share/opto/cfgnode.cpp#l1519
>> [2] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-May/008861.html
>>
>>
>> Regards,
>> Yang
>>
>

From christian.hagedorn at oracle.com Mon Jun 15 07:44:43 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Mon, 15 Jun 2020 09:44:43 +0200
Subject: [15] RFR(S): 8246203: Segmentation fault in verification due to stack overflow with -XX:+VerifyIterativeGVN
In-Reply-To:
References: <55e24d91-e483-3d09-f211-6bf41aa752f5@oracle.com>
Message-ID: <4913a52f-37f3-7ccd-4641-abb23fc65d68@oracle.com>

Thank you Vladimir and Tobias for having a look at it again and sharing your thoughts! Then I'll push webrev.01 into JDK 16 (as it's a P4 only).

Best regards,
Christian

On 12.06.20 14:02, Tobias Hartmann wrote:
> +1
>
> Best regards,
> Tobias
>
> On 11.06.20 18:27, Vladimir Kozlov wrote:
>> I would keep changes as they are - they provide correct testing. And treat timeouts as expected.
>>
>> Regards,
>> Vladimir
>>
>> On 6/11/20 12:38 AM, Christian Hagedorn wrote:
>>> Hi Tobias
>>>
>>> Thank you for your review!
>>>
>>> I ran some more testing with -XX:+VerifyIterativeGVN over night and compared the iterative
>>> solution with the old recursive version. We hit some more test timeouts with the new iterative
>>> solution because we are now really looking at all nodes for a requested depth. There are cases
>>> where the recursive solution would not do that.
>>> For example, if verify_depth = 4 and given a node
>>> chain 1->2->3->4->5, the recursive DFS solution visits nodes 1-4 and then at node 4 when it wants
>>> to visit node 5 it immediately returns because the depth 4 is reached. Node 4 is now marked as
>>> visited. If, however, there is an additional path 1->4->5->6->7 to look at later, the recursive
>>> DFS solution will just stop at node 4 because that node was already visited. The iterative
>>> solution, on the other hand, processes the nodes in a BFS and will visit all nodes up to depth 4,
>>> including node 5 and 6. This results in spending more time (as seen with more timeouts for more
>>> complex graphs/tests).
>>>
>>> We could now try to simulate the same recursive DFS behavior in an iterative approach with a
>>> stack. But as we seem to be missing some nodes at a requested depth this is probably not what we
>>> really want? Alternatively, we could go with this BFS solution as it is and decrement verify_depth
>>> = 4 in the call Node::verify(n, 4) to reduce the time spent. Or just leave webrev.01 as it is and
>>> treat the additional timeouts as expected.
>>>
>>> What do you think?
>>>
>>> Best regards,
>>> Christian
>>>
>>> On 10.06.20 14:30, Tobias Hartmann wrote:
>>>> Hi Christian,
>>>>
>>>> On 09.06.20 19:32, Christian Hagedorn wrote:
>>>>> http://cr.openjdk.java.net/~chagedorn/8246203/webrev.01/
>>>>
>>>> Looks good to me.
>>>>
>>>> Did you run testing with -XX:+VerifyIterativeGVN?
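
The traversal difference described in the quoted message above can be reproduced on its example graph. This is a minimal sketch in plain Java over a toy adjacency map, not the HotSpot C++ code; the class `DepthLimitedWalk`, its methods, and the edge list (the chain 1->2->3->4->5 plus the extra path 1->4->5->6->7) are assumptions made for illustration, with the root counted as depth 1.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; this is not the HotSpot implementation.
final class DepthLimitedWalk {
    // Example graph from the discussion: 1->2->3->4->5 and 1->4->5->6->7.
    static Map<Integer, List<Integer>> exampleGraph() {
        return Map.of(
            1, List.of(2, 4),
            2, List.of(3),
            3, List.of(4),
            4, List.of(5),
            5, List.of(6),
            6, List.of(7));
    }

    // Recursive depth-first walk: a node is marked visited on first contact,
    // so a later, shorter path through it is not explored any further.
    static Set<Integer> dfs(Map<Integer, List<Integer>> g, int root, int maxDepth) {
        Set<Integer> visited = new TreeSet<>();
        dfsVisit(g, root, 1, maxDepth, visited);
        return visited;
    }

    private static void dfsVisit(Map<Integer, List<Integer>> g, int node, int depth,
                                 int maxDepth, Set<Integer> visited) {
        if (depth > maxDepth || !visited.add(node)) {
            return; // past the depth limit, or already seen
        }
        for (int next : g.getOrDefault(node, List.of())) {
            dfsVisit(g, next, depth + 1, maxDepth, visited);
        }
    }

    // Breadth-first walk: each node is first reached at its minimal depth,
    // so every node within maxDepth of the root is visited.
    static Set<Integer> bfs(Map<Integer, List<Integer>> g, int root, int maxDepth) {
        Set<Integer> visited = new TreeSet<>();
        Deque<int[]> queue = new ArrayDeque<>(); // entries are {node, depth}
        visited.add(root);
        queue.add(new int[] {root, 1});
        while (!queue.isEmpty()) {
            int[] cur = queue.poll();
            if (cur[1] == maxDepth) {
                continue; // do not expand nodes sitting at the depth limit
            }
            for (int next : g.getOrDefault(cur[0], List.of())) {
                if (visited.add(next)) {
                    queue.add(new int[] {next, cur[1] + 1});
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g = exampleGraph();
        System.out.println("DFS, depth 4: " + dfs(g, 1, 4));
        System.out.println("BFS, depth 4: " + bfs(g, 1, 4));
    }
}
```

With a depth limit of 4 the recursive walk stops at nodes 1 to 4, while the breadth-first walk also reaches nodes 5 and 6, which matches the behavior described in the message.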
>>>>
>>>> Best regards,
>>>> Tobias
>>>>

From felix.yang at huawei.com Mon Jun 15 08:49:35 2020
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Mon, 15 Jun 2020 08:49:35 +0000
Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal
In-Reply-To:
References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com>
Message-ID:

Hi Tobias,

> -----Original Message-----
> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com]
> Sent: Tuesday, May 26, 2020 10:24 PM
> To: Yangfei (Felix) ; hotspot-compiler-
> dev at openjdk.java.net
> Subject: Re: RFR(S): 8243670: Unexpected test result caused by C2
> MergeMemNode::Ideal
>
> Hi Felix,
>
> thanks for the details, makes sense to me.
>
> Isn't the root cause that we are loosing type information and wouldn't that
> be solved by selecting the Phi with the more restrictive _adr_type?

Sorry for the late reply.
Not sure if I understand correctly. Do you mean something like this?

diff -r 6ab7805df10d src/hotspot/share/opto/memnode.cpp
--- a/src/hotspot/share/opto/memnode.cpp Sat Jun 13 01:00:00 2020 +0200
+++ b/src/hotspot/share/opto/memnode.cpp Mon Jun 15 16:40:57 2020 +0800
@@ -4618,7 +4618,8 @@
         }
         if (phi_mem != NULL) {
           // equivalent phi nodes; revert to the def
-          new_mem = new_base;
+          if (phi_base->adr_type()->higher_equal(phi_mem->adr_type()))
+            new_mem = new_base;
         }
       }
     }

Thanks,
Felix

From felix.yang at huawei.com Mon Jun 15 09:13:30 2020
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Mon, 15 Jun 2020 09:13:30 +0000
Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal
In-Reply-To:
References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com>
Message-ID:

Hi Nils,

CCing to Tobias.

> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-
> bounces at openjdk.java.net] On Behalf Of Nils Eliasson
> Sent: Wednesday, June 3, 2020 9:55 PM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR(S): 8243670: Unexpected test result caused by C2
> MergeMemNode::Ideal
>
> Hi,
>
> I second Tobias. How often is this code path triggered? Would this be
> removing an optimization that is important is some cases?

I see the logic is also triggered for non-OSR compiles with the reduced test case. But it didn't trigger a bug in that case even though the logic didn't do the right thing. My initial patch removes the logic and no performance impact was witnessed for specjbb2017. Since the code is there from day one, maybe it's hard to find out.

> If this is extremely rare - I would be ok to remove it, otherwise we should
> evaluate Tobias suggestion.

Thanks,
Felix

From aph at redhat.com Mon Jun 15 09:17:07 2020
From: aph at redhat.com (Andrew Haley)
Date: Mon, 15 Jun 2020 10:17:07 +0100
Subject: RFR(S): 8247200: [aarch64] assert((unsigned)fpargs < 32)
In-Reply-To:
References: <40b781a4-5a1f-bd83-0dab-00dc2890e06d@oracle.com>
Message-ID: <61acad9c-98a1-46b2-8e24-d3cb84fa2628@redhat.com>

On 11/06/2020 17:56, Vladimir Kozlov wrote:
> On 6/11/20 1:24 AM, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8247200
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8247200/
>>
>>
>> Removing assert and some associated dead code.

Looks good.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From aph at redhat.com Mon Jun 15 09:28:59 2020
From: aph at redhat.com (Andrew Haley)
Date: Mon, 15 Jun 2020 10:28:59 +0100
Subject: [aarch64-port-dev ] AARCH64 optimization: using TBZ instruction for bit check
In-Reply-To:
References:
Message-ID: <9b399c9f-cfc1-7fc4-70ee-536a83e5afa7@redhat.com>

On 12/06/2020 19:10, Boris Ulasevich wrote:
> Please review the new AARCH64 instruction selection rules.
> The change applies TBZ instruction for bit checks: "if ((var&16) == 16)".
> This makes 17% performance improvement on the benchmark and 5% on a real
> application.

Please forgive me if I am misunderstanding, but...

This is strange Java for anyone to write. The expression "((var&16) == 16)" is, I think, equivalent to "((var&16) != 0)". Do you believe that it is wise to add new patterns to do this to (potentially) every HotSpot back end rather than canonicalize the expression during the machine-independent part of C2? This would have the same improvement on all targets.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From patric.hedlin at oracle.com Mon Jun 15 10:22:08 2020
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Mon, 15 Jun 2020 12:22:08 +0200
Subject: RFR(S): 8247200: [aarch64] assert((unsigned)fpargs < 32)
In-Reply-To: <61acad9c-98a1-46b2-8e24-d3cb84fa2628@redhat.com>
References: <40b781a4-5a1f-bd83-0dab-00dc2890e06d@oracle.com> <61acad9c-98a1-46b2-8e24-d3cb84fa2628@redhat.com>
Message-ID:

Thanks for reviewing Andrew.

/Patric

On 2020-06-15 11:17, Andrew Haley wrote:
> On 11/06/2020 17:56, Vladimir Kozlov wrote:
>> On 6/11/20 1:24 AM, Patric Hedlin wrote:
>>> Dear all,
>>>
>>> I would like to ask for help to review the following change/update:
>>>
>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8247200
>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8247200/
>>>
>>>
>>> Removing assert and some associated dead code.
> Looks good.
>

From tobias.hartmann at oracle.com Mon Jun 15 11:22:45 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 15 Jun 2020 13:22:45 +0200
Subject: [15] RFR(S): 8237950: C2 compilation fails with "Live Node limit exceeded limit" during ConvI2L::Ideal optimization
Message-ID: <1ad6cc37-8aa2-e20d-2a9f-8cbf21780ffd@oracle.com>

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8237950
http://cr.openjdk.java.net/~thartmann/8237950/webrev.00/

A long chain of StringBuffer.append calls is optimized by C2's string concatenation optimization which emits direct stores into the String internal byte array. GraphKit::array_element_address emits ConvI2L nodes for the array index (see Compile::conv_I2X_index) without any range check dependent CastII nodes because the bounds are known. As a result, the ConvI2L ideal optimization jumps in and creates over 34000 new ConvI2L nodes while pushing them through the long chain of AddNodes. We hit the node limit because during GVN, dead nodes are not removed.

I propose to simply postpone that optimization to IGVN. This only affects array accesses emitted for the string concat optimizations because "normal" array accesses have a range check dependent CastII which blocks that ConvI2L optimization during parsing.

Thanks,
Tobias

From nils.eliasson at oracle.com Mon Jun 15 13:31:46 2020
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 15 Jun 2020 15:31:46 +0200
Subject: [15] RFR(S): 8237950: C2 compilation fails with "Live Node limit exceeded limit" during ConvI2L::Ideal optimization
In-Reply-To: <1ad6cc37-8aa2-e20d-2a9f-8cbf21780ffd@oracle.com>
References: <1ad6cc37-8aa2-e20d-2a9f-8cbf21780ffd@oracle.com>
Message-ID: <97427e35-fa63-05d9-d958-a2edb4dfa1ec@oracle.com>

Hi Tobias,

The change looks reasonable.

Reviewed.

Best regards,
Nils

On 2020-06-15 13:22, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8237950
> http://cr.openjdk.java.net/~thartmann/8237950/webrev.00/
>
> A long chain of StringBuffer.append calls is optimized by C2's string concatenation optimization
> which emits direct stores into the String internal byte array. GraphKit::array_element_address emits
> ConvI2L nodes for the array index (see Compile::conv_I2X_index) without any range check dependent
> CastII nodes because the bounds are known. As a result, the ConvI2L ideal optimization jumps in and
> creates over 34000 new ConvI2L nodes while pushing them through the long chain of AddNodes. We hit
> the node limit because during GVN, dead nodes are not removed.
>
> I propose to simply postpone that optimization to IGVN. This only affects array accesses emitted for
> the string concat optimizations because "normal" array accesses have a range check dependent CastII
> which blocks that ConvI2L optimization during parsing.
>
> Thanks,
> Tobias

From tobias.hartmann at oracle.com Mon Jun 15 13:40:07 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 15 Jun 2020 15:40:07 +0200
Subject: [15] RFR(S): 8237950: C2 compilation fails with "Live Node limit exceeded limit" during ConvI2L::Ideal optimization
In-Reply-To: <97427e35-fa63-05d9-d958-a2edb4dfa1ec@oracle.com>
References: <1ad6cc37-8aa2-e20d-2a9f-8cbf21780ffd@oracle.com> <97427e35-fa63-05d9-d958-a2edb4dfa1ec@oracle.com>
Message-ID: <3d6d0440-82fd-7133-bc27-ee10a3f3df03@oracle.com>

Hi Nils,

thanks for the review!

Best regards,
Tobias

On 15.06.20 15:31, Nils Eliasson wrote:
> Hi Tobias,
>
> The change looks reasonable.
>
> Reviewed.
>
> Best regards,
> Nils
>
> On 2020-06-15 13:22, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8237950
>> http://cr.openjdk.java.net/~thartmann/8237950/webrev.00/
>>
>> A long chain of StringBuffer.append calls is optimized by C2's string concatenation optimization
>> which emits direct stores into the String internal byte array. GraphKit::array_element_address emits
>> ConvI2L nodes for the array index (see Compile::conv_I2X_index) without any range check dependent
>> CastII nodes because the bounds are known. As a result, the ConvI2L ideal optimization jumps in and
>> creates over 34000 new ConvI2L nodes while pushing them through the long chain of AddNodes. We hit
>> the node limit because during GVN, dead nodes are not removed.
>>
>> I propose to simply postpone that optimization to IGVN. This only affects array accesses emitted for
>> the string concat optimizations because "normal" array accesses have a range check dependent CastII
>> which blocks that ConvI2L optimization during parsing.
>>
>> Thanks,
>> Tobias
>

From vladimir.kozlov at oracle.com Mon Jun 15 18:02:37 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 15 Jun 2020 11:02:37 -0700
Subject: [15] RFR(S): 8237950: C2 compilation fails with "Live Node limit exceeded limit" during ConvI2L::Ideal optimization
In-Reply-To: <97427e35-fa63-05d9-d958-a2edb4dfa1ec@oracle.com>
References: <1ad6cc37-8aa2-e20d-2a9f-8cbf21780ffd@oracle.com> <97427e35-fa63-05d9-d958-a2edb4dfa1ec@oracle.com>
Message-ID:

+1

I would suggest to do our regular performance testing to make sure there is no regression.

Thanks,
Vladimir

On 6/15/20 6:31 AM, Nils Eliasson wrote:
> Hi Tobias,
>
> The change looks reasonable.
>
> Reviewed.
>
> Best regards,
> Nils
>
> On 2020-06-15 13:22, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8237950
>> http://cr.openjdk.java.net/~thartmann/8237950/webrev.00/
>>
>> A long chain of StringBuffer.append calls is optimized by C2's string concatenation optimization
>> which emits direct stores into the String internal byte array. GraphKit::array_element_address emits
>> ConvI2L nodes for the array index (see Compile::conv_I2X_index) without any range check dependent
>> CastII nodes because the bounds are known. As a result, the ConvI2L ideal optimization jumps in and
>> creates over 34000 new ConvI2L nodes while pushing them through the long chain of AddNodes. We hit
>> the node limit because during GVN, dead nodes are not removed.
>>
>> I propose to simply postpone that optimization to IGVN. This only affects array accesses emitted for
>> the string concat optimizations because "normal" array accesses have a range check dependent CastII
>> which blocks that ConvI2L optimization during parsing.
>>
>> Thanks,
>> Tobias
>

From vladimir.kozlov at oracle.com Mon Jun 15 18:15:35 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 15 Jun 2020 11:15:35 -0700
Subject: [15] RFR(XS) 8236647: java/lang/invoke/CallSiteTest.java failed with InvocationTargetException in Graal mode
In-Reply-To: <1e25aad9-45cd-a4d4-561c-a54168e2c2e3@oracle.com>
References: <1e25aad9-45cd-a4d4-561c-a54168e2c2e3@oracle.com>
Message-ID:

Seems good.

Thanks,
Vladimir

On 6/13/20 9:09 PM, Dean Long wrote:
> https://bugs.openjdk.java.net/browse/JDK-8236647
> http://cr.openjdk.java.net/~dlong/8236647/webrev/
>
> This does a similar fix for Graal/JVMCI as JDK-8234923 did for C1/C2.
>
> dl

From vladimir.kozlov at oracle.com Mon Jun 15 18:41:40 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 15 Jun 2020 11:41:40 -0700
Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called
In-Reply-To:
References:
Message-ID: <4b8516e6-c9a5-5f8c-fba4-d314c9e47f0c@oracle.com>

Hi,

It would be interesting to see difference with strip-mining off.
With strip-mining only part of iterations are replaced with stub.

I don't see referenced link [1] in e-mail.

Are these performance data for Aarch64?

What x86 CPU you tested on? (avx512?)

Please, add both x64 and Aarch64 perf data to RFE.

What size of arrays you tested.

Few years ago OptimizeFill wins over vectorized loops but CPU and vectorization are improved since then.
May be we can deprecate this code if it does not have performance benefits. Or we should revisit stub's code for modern CPUs.

We need more data.

Thanks,
Vladimir

On 6/14/20 8:20 PM, Pengfei Li wrote:
> Hi,
>
> Can I have a review of this C2 loop optimization fix?
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8247307
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8247307/webrev.00/
>
> C2 has a loop optimization phase called intrinsify_fill. It matches the
> pattern of single array store with a loop invariant in a counted loop,
> like below, and replaces it with call to some stub routine.
>
>   for (int i = start; i < limit; i++) {
>     a[i] = value;
>   }
>
> Unfortunately, this doesn't work in current jdk after loop strip mining.
> The above loop is eventually unrolled and auto-vectorized by subsequent
> optimization phases. Root cause is that in strip-mined loops, the inner
> CountedLoopNode may be used by the address polling node of the safepoint
> in the outer loop. But as the safepoint polling has nothing related to
> any real operations in the loop, it should not hinder the pattern match.
> So in this patch, the polladr's use is ignored in the match check.
>
> We have some performance comparison of the code for array fill, between
> the auto-vectorized version and the stub routine version. The JMH case
> for the tests can be found at [1]. Results show that on x86, the stub
> code is even slower than the auto-vectorized code. To prevent any
> regression, vm option OptimizeFill is turned off for x86 in this patch.
> So this patch doesn't impact on the generated code on x86. On AArch64,
> the two versions show almost the same performance in general cases. But
> if the value to be filled is zero, the stub code's performance is much
> better. This makes sense as AArch64 uses cache maintenance instructions
> (DC ZVA) to zero large blocks in the hand-crafted assembly. Below are
> JMH scores on AArch64.
>
> Before:
> Benchmark                     Mode  Cnt      Score     Error  Units
> TestArrayFill.fillByteArray   avgt   25   2078.700 ±   7.719  ns/op
> TestArrayFill.fillIntArray    avgt   25  12371.497 ± 566.773  ns/op
> TestArrayFill.fillShortArray  avgt   25   4132.439 ±  25.096  ns/op
> TestArrayFill.zeroByteArray   avgt   25   2080.313 ±   7.516  ns/op
> TestArrayFill.zeroIntArray    avgt   25  10961.331 ± 527.750  ns/op
> TestArrayFill.zeroShortArray  avgt   25   4126.386 ±  20.997  ns/op
>
> After:
> Benchmark                     Mode  Cnt      Score     Error  Units
> TestArrayFill.fillByteArray   avgt   25   2080.382 ±   2.103  ns/op
> TestArrayFill.fillIntArray    avgt   25  11997.621 ± 569.058  ns/op
> TestArrayFill.fillShortArray  avgt   25   4309.035 ± 285.456  ns/op
> TestArrayFill.zeroByteArray   avgt   25    903.434 ±  10.944  ns/op
> TestArrayFill.zeroIntArray    avgt   25   8141.533 ± 946.341  ns/op
> TestArrayFill.zeroShortArray  avgt   25   1784.124 ±  24.618  ns/op
>
> Another advantage of using the stub routine is that the generated code
> size is reduced.
>
> Jtreg hotspot::hotspot_all_no_apps, jdk::jdk_core, langtools::tier1 are
> tested and no new failure is found.
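
For readers following the thread, here is a minimal Java sketch of the loop shape under discussion: a counted loop whose only array store writes a loop-invariant value. The class and method names (FillLoopSketch, fillLoop) are illustrative and come neither from the patch nor from the JMH test referenced as [1]. The sketch only checks that the loop is equivalent to a ranged java.util.Arrays.fill. Whether C2 actually replaces the loop with a fill stub depends on the platform and on the OptimizeFill setting, which this code does not verify.

```java
import java.util.Arrays;

public class FillLoopSketch {
    // Hypothetical helper: one store of a loop-invariant value per
    // iteration of a counted loop, the pattern described in the mail.
    public static void fillLoop(int[] a, int start, int limit, int value) {
        for (int i = start; i < limit; i++) {
            a[i] = value;
        }
    }

    public static void main(String[] args) {
        int[] viaLoop = new int[16];
        int[] viaLibrary = new int[16];

        // Fill the half-open range [4, 12) with 7 in both arrays.
        fillLoop(viaLoop, 4, 12, 7);
        Arrays.fill(viaLibrary, 4, 12, 7);

        // Both approaches must leave identical contents.
        System.out.println(Arrays.equals(viaLoop, viaLibrary));
    }
}
```

To see how a given JVM compiles the loop, one could run a warmed-up benchmark of fillLoop with and without -XX:-OptimizeFill and compare the generated code. That is a suggestion for experimentation, not a result reported in this thread.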
>
> --
> Thanks,
> Pengfei
>

From eric.caspole at oracle.com Mon Jun 15 20:53:33 2020
From: eric.caspole at oracle.com (eric.caspole at oracle.com)
Date: Mon, 15 Jun 2020 16:53:33 -0400
Subject: AARCH64 optimization: using TBZ instruction for bit check
In-Reply-To:
References:
Message-ID: <1299b226-1157-1d23-47fe-537b80b31e4e@oracle.com>

Thanks, the JMH looks good.
Eric

On 6/13/20 2:24 PM, Boris Ulasevich wrote:
> Hi Eric,
>
> Ok. Here is the webrev with JMH:
> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.01
>
> Thank you,
> Boris
>
> On 12.06.2020 21:24, eric.caspole at oracle.com wrote:
>> Hi Boris,
>> Could you add the JMH to your webrev under
>> test/micro/org/openjdk/bench/?
>> Thanks,
>> Eric
>>
>>
>> On 6/12/20 2:10 PM, Boris Ulasevich wrote:
>>> Hi all,
>>>
>>> Please review the new AARCH64 instruction selection rules.
>>> The change applies TBZ instruction for bit checks: "if ((var&16) ==
>>> 16)".
>>> This makes 17% performance improvement on the benchmark and 5% on a
>>> real application.
>>>
>>> http://bugs.openjdk.java.net/browse/JDK-8247408
>>> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00
>>>
>>> - from the full change I excluded the far branch test because it runs
>>> for a long time, and I'm not sure C2 will not change its behaviour:
>>> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.00.plus
>>>
>>> The change was tested on jtreg in fastdebug mode: no regressions.
>>>
>>> thanks,
>>> Boris
>>>
>>> ========================================================================================
>>>
>>> Benchmark                                               Mode  Cnt        Score    Error  Units        Score     Error
>>> TBZBenchmark.cmpAndBranch2Tbz                          thrpt   25  1329060.879 ± 42.780  ops/s  1504990.708 ± 158.096
>>> TBZBenchmark.cmpAndBranch2Tbz:CPI                      thrpt    5        0.325 ±  0.001  #/op         0.410 ±   0.001
>>> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-load-misses    thrpt    5        0.019 ±  0.031  #/op         0.018 ±   0.025
>>> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-loads          thrpt    5       16.811 ±  0.791  #/op        16.809 ±   0.914
>>> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-store-misses   thrpt    5        0.016 ±  0.017  #/op         0.014 ±   0.022
>>> TBZBenchmark.cmpAndBranch2Tbz:L1-dcache-stores         thrpt    5       16.704 ±  0.634  #/op        16.771 ±   0.539
>>> TBZBenchmark.cmpAndBranch2Tbz:L1-icache-load-misses    thrpt    5        0.017 ±  0.027  #/op         0.016 ±   0.023
>>> TBZBenchmark.cmpAndBranch2Tbz:L1-icache-loads          thrpt    5     1811.848 ±  3.552  #/op      1148.737 ±   2.993
>>> TBZBenchmark.cmpAndBranch2Tbz:branch-misses            thrpt    5        1.013 ±  0.009  #/op         1.011 ±   0.018
>>> TBZBenchmark.cmpAndBranch2Tbz:cycles                   thrpt    5     1882.193 ±  3.799  #/op      1662.994 ±   5.935
>>> TBZBenchmark.cmpAndBranch2Tbz:dTLB-load-misses         thrpt    5        0.004 ±  0.008  #/op         0.005 ±   0.016
>>> TBZBenchmark.cmpAndBranch2Tbz:dTLB-loads               thrpt    5       16.687 ±  0.732  #/op        16.669 ±   0.958
>>> TBZBenchmark.cmpAndBranch2Tbz:iTLB-load-misses         thrpt    5        0.003 ±  0.009  #/op         0.003 ±   0.008
>>> TBZBenchmark.cmpAndBranch2Tbz:iTLB-loads               thrpt    5     1586.390 ±  2.612  #/op      1353.981 ±   3.469
>>> TBZBenchmark.cmpAndBranch2Tbz:instructions             thrpt    5     5791.824 ± 15.362  #/op      4055.443 ±  17.785
>>> TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-backend   thrpt    5        5.279 ±  1.968  #/op        20.459 ±   5.258
>>> TBZBenchmark.cmpAndBranch2Tbz:stalled-cycles-frontend  thrpt    5       66.808 ±  0.700  #/op        12.738 ±   1.040
>>>
>>> public class TBZBenchmark {
>>>     @Benchmark
>>>     public int cmpAndBranch2Tbz() {
>>>         int count = 0;
>>>         for (int value = 0; value < 1000; value++) {
>>>             if ((value & 32) == 32) {
>>>                 count--;
>>>             } else {
>>>                 count++;
>>>             }
>>>         }
>>>         return count;
>>>     }
>>> }
>>>
>

From claes.redestad at oracle.com Mon Jun 15 21:23:24 2020
From: claes.redestad at oracle.com (Claes Redestad)
Date: Mon, 15 Jun 2020 23:23:24 +0200
Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby
In-Reply-To:
References:
Message-ID: <2f8c8f7a-3563-758b-9bb2-4e267ef7d694@oracle.com>

Hi,

I added a little debug logging[1] and ran your reproducer, which indicates
several classes/methods have what the VM considers to be un-initialized
classes, and the method in particular doesn't seem to be finding
java/lang/String:

[2.197s][debug][unload,resolve] has_unloaded_classes_in_signature: could not resolve klass java/lang/String in method path.to.my.script.dir.jruby_minus_9_dot_2_dot_11_dot_1.inline.RUBY$method$bar$0(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/parser/StaticScope;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;Lorg/jruby/RubyModule;Ljava/lang/String;)Lorg/jruby/runtime/builtin/IRubyObject;

This is obviously pretty weird, but looks eerily similar to
https://bugs.openjdk.java.net/browse/JDK-8000263 - root issue there was
use of Unsafe.getObject in a way that hasn't ensured proper
initialization of fields in a way that sets up class loader constraints.

If so, a possible workaround might be to pass the generated class
through Unsafe.ensureClassInitialized (or Lookup.ensureInitialized if on
15+)

/Claes

[1]
diff -r a3905846b9ae src/hotspot/share/oops/method.cpp
--- a/src/hotspot/share/oops/method.cpp Mon Jun 15 15:56:16 2020 +0200
+++ b/src/hotspot/share/oops/method.cpp Mon Jun 15 23:11:14 2020 +0200
@@ -1717,7 +1717,13 @@
       // unloaded array component types.
Klass* klass = ss.as_klass_if_loaded(THREAD); assert(!HAS_PENDING_EXCEPTION, "as_klass_if_loaded contract"); - if (klass == NULL) return true; + if (klass == NULL) { + ss.as_symbol() + log_debug(unload, resolve)("has_unloaded_classes_in_signature: could not resolve klass %s in method %s", + ss.as_symbol()->as_C_string(), + m()->name_and_sig_as_C_string()); + return true; + } } } return false; On 2020-06-14 16:04, Charles Oliver Nutter wrote: > We received a bug report today showing that JRuby's bytecode-compiled > methods do not appear to be inlining properly on Hotspot. > > https://github.com/jruby/jruby/issues/6280 > > A bit of background here... > > JRuby is itself a tiered runtime. Most Ruby methods will be > interpreted in our IR form for a while until a call threshold is > reached. At that point, we JIT the method into JVM bytecode and use > that code from that point forward > > We also optionally use invokedynamic for dynamic call sites (via the > -Xcompile.invokedynamic JRuby flag), which makes it possible for most > method calls to inline. > > Because Ruby methods can be overwritten at runtime, or whole Ruby > classes might be transient, most of these jitted methods are contained > within their own unique JVM classes, and also within their own unique > classloaders. This allows them to unload when no longer in use. > > This should not affect inlining one of these methods into another, and > historically it has worked fine. > > The bug report above shows that a trivial method call fails to inline > with "unloaded signature class" (according to PrintInlining), and my > experiments seem to indicate this only happens when tiered compilation > is enabled. DIsabling tiered compilation and using C2 alone inlines > fine and we get the native code we expect. > > The signatures of these methods are not exotic... 
the only classes
> specified are classes critical to the operation of JRuby itself, and
> they would have been loaded and in use long before these inlining
> decisions would be made. The jitted bytecode class itself is defined
> and subsequently passes through various reflection APIs, so it should
> also be fully loaded and resolved.
>
> So we have a puzzle. Why does running this code with tiered
> compilation cause it to (erroneously?) claim a signature class has not
> been loaded?
>
> This appears to affect every OpenJDK release at least back to 8u222,
> the earliest version we tested.
>
> To reproduce, create the two scripts in the bug, download a JRuby
> distribution from jruby.org, and execute the main script like this:
>
> bin/jruby -Xcompile.invokedynamic -J-XX:+WhateverHotspotFlag main.rb
>
> PrintInlining and PrintAssembly output will show that the "bar" method
> fails to inline into "foo" in the inline.rb part of the example.
>
> Help!
>
> - Charlie
>

From headius at headius.com Mon Jun 15 21:38:29 2020
From: headius at headius.com (Charles Oliver Nutter)
Date: Mon, 15 Jun 2020 16:38:29 -0500
Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby
In-Reply-To: <2f8c8f7a-3563-758b-9bb2-4e267ef7d694@oracle.com>
References: <2f8c8f7a-3563-758b-9bb2-4e267ef7d694@oracle.com>
Message-ID:

Charlie Gracie figured out a nice Hotspot incantation to reproduce 100% and dump just the PrintInlining graph in question. He also managed this with tiered compilation *turned off*, so that may have been a red herring.

jruby \
  -Xcompile.invokedynamic \
  "-J-XX:CompileCommand=option *::*foo*,PrintInlining" \
  "-J-XX:CompileCommand=compileonly,*::*foo*" \
  "-J-XX:-TieredCompilation" \
  main.rb

On Mon, Jun 15, 2020 at 4:23 PM Claes Redestad wrote:
> If so, a possible workaround might be to pass the generated class
> through Unsafe.ensureClassInitialized (or Lookup.ensureInitialized if on
> 15+)

I added Unsafe.ensureClassInitialized right after the JIT class has been defined, and it did not appear to help.

I tried turning off JRuby's background JIT threads, which could cause a method to get jitted and loaded twice (into separate classloaders). The JRuby flag is "-Xjit.background=false" but it also did not help.

- Charlie

From luhenry at microsoft.com Mon Jun 15 23:58:56 2020
From: luhenry at microsoft.com (Ludovic Henry)
Date: Mon, 15 Jun 2020 23:58:56 +0000
Subject: RFR(S): Remove some dead code in C2
Message-ID:

Hi,

As I was exploring code in src/hotspot/share/opto/doCall.cpp, I noticed that the `delayed_forbidden` parameter to `Compile::call_generator` is never used.

From doing some mercurial archeology, the change was introduced with https://hg.openjdk.java.net/jdk/jdk/rev/823590505eb4, and later modified with https://hg.openjdk.java.net/jdk/jdk/rev/e1685e30beca which removed all uses of `delayed_forbidden`.

As I am not an author, please find following the patch removing the dead code.
Thank you, -- Ludovic diff --git a/src/hotspot/share/opto/callGenerator.cpp b/src/hotspot/share/opto/callGenerator.cpp index 1092d582184..606325d4dd6 100644 --- a/src/hotspot/share/opto/callGenerator.cpp +++ b/src/hotspot/share/opto/callGenerator.cpp @@ -821,13 +821,13 @@ JVMState* PredictedCallGenerator::generate(JVMState* jvms) { } -CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms, ciMethod* caller, ciMethod* callee, bool delayed_forbidden) { +CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms, ciMethod* caller, ciMethod* callee) { assert(callee->is_method_handle_intrinsic(), "for_method_handle_call mismatch"); bool input_not_const; CallGenerator* cg = CallGenerator::for_method_handle_inline(jvms, caller, callee, input_not_const); Compile* C = Compile::current(); if (cg != NULL) { - if (!delayed_forbidden && AlwaysIncrementalInline) { + if (AlwaysIncrementalInline) { return CallGenerator::for_late_inline(callee, cg); } else { return cg; diff --git a/src/hotspot/share/opto/callGenerator.hpp b/src/hotspot/share/opto/callGenerator.hpp index 46bf9f5d7f9..3a04fa4d5cd 100644 --- a/src/hotspot/share/opto/callGenerator.hpp +++ b/src/hotspot/share/opto/callGenerator.hpp @@ -124,7 +124,7 @@ class CallGenerator : public ResourceObj { static CallGenerator* for_direct_call(ciMethod* m, bool separate_io_projs = false); // static, special static CallGenerator* for_virtual_call(ciMethod* m, int vtable_index); // virtual, interface - static CallGenerator* for_method_handle_call( JVMState* jvms, ciMethod* caller, ciMethod* callee, bool delayed_forbidden); + static CallGenerator* for_method_handle_call( JVMState* jvms, ciMethod* caller, ciMethod* callee); static CallGenerator* for_method_handle_inline(JVMState* jvms, ciMethod* caller, ciMethod* callee, bool& input_not_const); // How to generate a replace a direct call with an inline version diff --git a/src/hotspot/share/opto/compile.hpp b/src/hotspot/share/opto/compile.hpp index 
a922a905707..bd0def2997c 100644 --- a/src/hotspot/share/opto/compile.hpp +++ b/src/hotspot/share/opto/compile.hpp @@ -854,7 +854,7 @@ class Compile : public Phase { // The profile factor is a discount to apply to this site's interp. profile. CallGenerator* call_generator(ciMethod* call_method, int vtable_index, bool call_does_dispatch, JVMState* jvms, bool allow_inline, float profile_factor, ciKlass* speculative_receiver_type = NULL, - bool allow_intrinsics = true, bool delayed_forbidden = false); + bool allow_intrinsics = true); bool should_delay_inlining(ciMethod* call_method, JVMState* jvms) { return should_delay_string_inlining(call_method, jvms) || should_delay_boxing_inlining(call_method, jvms); diff --git a/src/hotspot/share/opto/doCall.cpp b/src/hotspot/share/opto/doCall.cpp index c26dc4b682d..c4d55d0d4c4 100644 --- a/src/hotspot/share/opto/doCall.cpp +++ b/src/hotspot/share/opto/doCall.cpp @@ -65,7 +65,7 @@ void trace_type_profile(Compile* C, ciMethod *method, int depth, int bci, ciMeth CallGenerator* Compile::call_generator(ciMethod* callee, int vtable_index, bool call_does_dispatch, JVMState* jvms, bool allow_inline, float prof_factor, ciKlass* speculative_receiver_type, - bool allow_intrinsics, bool delayed_forbidden) { + bool allow_intrinsics) { ciMethod* caller = jvms->method(); int bci = jvms->bci(); Bytecodes::Code bytecode = caller->java_code_at_bci(bci); @@ -145,8 +145,8 @@ CallGenerator* Compile::call_generator(ciMethod* callee, int vtable_index, bool // MethodHandle.invoke* are native methods which obviously don't // have bytecodes and so normal inlining fails. 
if (callee->is_method_handle_intrinsic()) { - CallGenerator* cg = CallGenerator::for_method_handle_call(jvms, caller, callee, delayed_forbidden); - assert(cg == NULL || !delayed_forbidden || !cg->is_late_inline() || cg->is_mh_late_inline(), "unexpected CallGenerator"); + CallGenerator* cg = CallGenerator::for_method_handle_call(jvms, caller, callee); + assert(cg == NULL || !cg->is_late_inline() || cg->is_mh_late_inline(), "unexpected CallGenerator"); return cg; } @@ -182,12 +182,10 @@ CallGenerator* Compile::call_generator(ciMethod* callee, int vtable_index, bool // opportunity to perform some high level optimizations // first. if (should_delay_string_inlining(callee, jvms)) { - assert(!delayed_forbidden, "strange"); return CallGenerator::for_string_late_inline(callee, cg); } else if (should_delay_boxing_inlining(callee, jvms)) { - assert(!delayed_forbidden, "strange"); return CallGenerator::for_boxing_late_inline(callee, cg); - } else if ((should_delay || AlwaysIncrementalInline) && !delayed_forbidden) { + } else if ((should_delay || AlwaysIncrementalInline)) { return CallGenerator::for_late_inline(callee, cg); } } From dean.long at oracle.com Tue Jun 16 00:14:32 2020 From: dean.long at oracle.com (Dean Long) Date: Mon, 15 Jun 2020 17:14:32 -0700 Subject: [15] RFR(XS) 8236647: java/lang/invoke/CallSiteTest.java failed with InvocationTargetException in Graal mode In-Reply-To: References: <1e25aad9-45cd-a4d4-561c-a54168e2c2e3@oracle.com> Message-ID: Thanks Vladimir. dl On 6/15/20 11:15 AM, Vladimir Kozlov wrote: > Seems good. > > Thanks, > Vladimir > > On 6/13/20 9:09 PM, Dean Long wrote: >> https://bugs.openjdk.java.net/browse/JDK-8236647 >> http://cr.openjdk.java.net/~dlong/8236647/webrev/ >> >> This does a similar fix for Graal/JVMCI as JDK-8234923 did for C1/C2. 
>>
>> dl

From Pengfei.Li at arm.com Tue Jun 16 06:18:36 2020
From: Pengfei.Li at arm.com (Pengfei Li)
Date: Tue, 16 Jun 2020 06:18:36 +0000
Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called
In-Reply-To: <4b8516e6-c9a5-5f8c-fba4-d314c9e47f0c@oracle.com>
References: <4b8516e6-c9a5-5f8c-fba4-d314c9e47f0c@oracle.com>
Message-ID:

Hi Vladimir,

Thanks for looking at this.

> I don't see referenced link [1] in e-mail.

Sorry I forgot to paste my JMH url.
[1] http://cr.openjdk.java.net/~pli/rfr/8247307/TestArrayFill.java

> Are these performance data for Aarch64?

Yes. I didn't paste the x86 result since there's no difference after my patch. But if I turn OptimizeFill on manually there's a regression on x86. (see below)

Before (x86)
Benchmark                     Mode  Cnt      Score     Error  Units
TestArrayFill.fillByteArray   avgt   25   1793.206 ±  15.337  ns/op
TestArrayFill.fillIntArray    avgt   25   6679.491 ±  14.729  ns/op
TestArrayFill.fillShortArray  avgt   25   3412.708 ±  12.005  ns/op
TestArrayFill.zeroByteArray   avgt   25   1785.940 ±  15.174  ns/op
TestArrayFill.zeroIntArray    avgt   25   6666.709 ±  11.735  ns/op
TestArrayFill.zeroShortArray  avgt   25   3404.146 ±  23.045  ns/op

After (x86)
Benchmark                     Mode  Cnt      Score     Error  Units
TestArrayFill.fillByteArray   avgt   25   2281.374 ± 191.220  ns/op
TestArrayFill.fillIntArray    avgt   25   9009.679 ± 901.541  ns/op
TestArrayFill.fillShortArray  avgt   25   4828.686 ±  49.199  ns/op
TestArrayFill.zeroByteArray   avgt   25   2463.745 ±  47.640  ns/op
TestArrayFill.zeroIntArray    avgt   25   9062.682 ± 939.538  ns/op
TestArrayFill.zeroShortArray  avgt   25   4837.231 ±  50.026  ns/op

> What x86 CPU you tested on? (avx512?)

The results above are produced on Intel® Xeon® Gold 6152 but UseAVX=2 by default in latest JDK master.

> What size of arrays you tested.

It's 65536 in my test, see [1].

> Few years ago OptimizeFill wins over vectorized loops but CPU and
> vectorization are improved since then. May be we can deprecate this code if
> it does not have performance benefits. Or we should revisit stub's code for
> modern CPUs.

I think it's still valuable since it does have performance benefit on AArch64 if the value to be filled is zero. See this part of TestArrayFill.zero* cases.

Before (AArch64)
Benchmark                     Mode  Cnt      Score     Error  Units
TestArrayFill.zeroByteArray   avgt   25   2080.313 ±   7.516  ns/op
TestArrayFill.zeroIntArray    avgt   25  10961.331 ± 527.750  ns/op
TestArrayFill.zeroShortArray  avgt   25   4126.386 ±  20.997  ns/op

After (AArch64)
Benchmark                     Mode  Cnt      Score     Error  Units
TestArrayFill.zeroByteArray   avgt   25    903.434 ±  10.944  ns/op
TestArrayFill.zeroIntArray    avgt   25   8141.533 ± 946.341  ns/op
TestArrayFill.zeroShortArray  avgt   25   1784.124 ±  24.618  ns/op

--
Thanks,
Pengfei

From Pengfei.Li at arm.com Tue Jun 16 06:24:55 2020
From: Pengfei.Li at arm.com (Pengfei Li)
Date: Tue, 16 Jun 2020 06:24:55 +0000
Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called
In-Reply-To:
References:
Message-ID:

Sorry I forgot to paste below JMH link in my last email.
[1] http://cr.openjdk.java.net/~pli/rfr/8247307/TestArrayFill.java

BTW. If I turn on OptimizeFill manually there's below performance regression on x86. So I turned it off on x86 in my patch to make things unchanged.

Before (x86 with -XX:+OptimizeFill)
Benchmark                     Mode  Cnt      Score     Error  Units
TestArrayFill.fillByteArray   avgt   25   1793.206 ±  15.337  ns/op
TestArrayFill.fillIntArray    avgt   25   6679.491 ±  14.729  ns/op
TestArrayFill.fillShortArray  avgt   25   3412.708 ±  12.005  ns/op
TestArrayFill.zeroByteArray   avgt   25   1785.940 ±  15.174  ns/op
TestArrayFill.zeroIntArray    avgt   25   6666.709 ±  11.735  ns/op
TestArrayFill.zeroShortArray  avgt   25   3404.146 ±  23.045  ns/op

After (x86 with -XX:+OptimizeFill)
Benchmark                     Mode  Cnt      Score     Error  Units
TestArrayFill.fillByteArray   avgt   25   2281.374 ± 191.220  ns/op
TestArrayFill.fillIntArray    avgt   25   9009.679 ± 901.541  ns/op
TestArrayFill.fillShortArray  avgt   25   4828.686 ±  49.199  ns/op
TestArrayFill.zeroByteArray   avgt   25   2463.745 ±  47.640  ns/op
TestArrayFill.zeroIntArray    avgt   25   9062.682 ± 939.538  ns/op
TestArrayFill.zeroShortArray  avgt   25   4837.231 ±  50.026  ns/op

> Hi,
>
> Can I have a review of this C2 loop optimization fix?
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8247307
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8247307/webrev.00/
>
> C2 has a loop optimization phase called intrinsify_fill. It matches the pattern
> of single array store with a loop invariant in a counted loop, like below, and
> replaces it with call to some stub routine.
>
>   for (int i = start; i < limit; i++) {
>     a[i] = value;
>   }
>
> Unfortunately, this doesn't work in current jdk after loop strip mining.
> The above loop is eventually unrolled and auto-vectorized by subsequent
> optimization phases. Root cause is that in strip-mined loops, the inner
> CountedLoopNode may be used by the address polling node of the safepoint
> in the outer loop. But as the safepoint polling has nothing related to any real
> operations in the loop, it should not hinder the pattern match.
> So in this patch, the polladr's use is ignored in the match check.
>
> We have some performance comparison of the code for array fill, between
> the auto-vectorized version and the stub routine version. The JMH case for
> the tests can be found at [1]. Results show that on x86, the stub code is even
> slower than the auto-vectorized code. To prevent any regression, vm option
> OptimizeFill is turned off for x86 in this patch.
> So this patch doesn't impact on the generated code on x86. On AArch64, the
> two versions show almost the same performance in general cases. But if the
> value to be filled is zero, the stub code's performance is much better. This
> makes sense as AArch64 uses cache maintenance instructions (DC ZVA) to
> zero large blocks in the hand-crafted assembly. Below are JMH scores on
> AArch64.
>
> Before:
> Benchmark                     Mode  Cnt      Score     Error  Units
> TestArrayFill.fillByteArray   avgt   25   2078.700 ±   7.719  ns/op
> TestArrayFill.fillIntArray    avgt   25  12371.497 ± 566.773  ns/op
> TestArrayFill.fillShortArray  avgt   25   4132.439 ±  25.096  ns/op
> TestArrayFill.zeroByteArray   avgt   25   2080.313 ±   7.516  ns/op
> TestArrayFill.zeroIntArray    avgt   25  10961.331 ± 527.750  ns/op
> TestArrayFill.zeroShortArray  avgt   25   4126.386 ±  20.997  ns/op
>
> After:
> Benchmark                     Mode  Cnt      Score     Error  Units
> TestArrayFill.fillByteArray   avgt   25   2080.382 ±   2.103  ns/op
> TestArrayFill.fillIntArray    avgt   25  11997.621 ± 569.058  ns/op
> TestArrayFill.fillShortArray  avgt   25   4309.035 ± 285.456  ns/op
> TestArrayFill.zeroByteArray   avgt   25    903.434 ±  10.944  ns/op
> TestArrayFill.zeroIntArray    avgt   25   8141.533 ± 946.341  ns/op
> TestArrayFill.zeroShortArray  avgt   25   1784.124 ±  24.618  ns/op
>
> Another advantage of using the stub routine is that the generated code size is
> reduced.
>
> Jtreg hotspot::hotspot_all_no_apps, jdk::jdk_core, langtools::tier1 are tested
> and no new failure is found.

Thanks,
Pengfei

From tobias.hartmann at oracle.com Tue Jun 16 06:34:58 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 16 Jun 2020 08:34:58 +0200
Subject: [15] RFR(S): 8237950: C2 compilation fails with "Live Node limit exceeded limit" during ConvI2L::Ideal optimization
In-Reply-To:
References: <1ad6cc37-8aa2-e20d-2a9f-8cbf21780ffd@oracle.com> <97427e35-fa63-05d9-d958-a2edb4dfa1ec@oracle.com>
Message-ID:

Thanks Vladimir! I'll run our regular performance testing.

Best regards,
Tobias

On 15.06.20 20:02, Vladimir Kozlov wrote:
> +1
>
> I would suggest to do our regular performance testing to make sure there is no regression.
>
> Thanks,
> Vladimir
>
> On 6/15/20 6:31 AM, Nils Eliasson wrote:
>> Hi Tobias,
>>
>> The change looks reasonable.
>>
>> Reviewed.
>> >> Best regards, >> Nils >> >> On 2020-06-15 13:22, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8237950 >>> http://cr.openjdk.java.net/~thartmann/8237950/webrev.00/ >>> >>> A long chain of StringBuffer.append calls is optimized by C2's string concatenation optimization >>> which emits direct stores into the String internal byte array. GraphKit::array_element_address emits >>> ConvI2L nodes for the array index (see Compile::conv_I2X_index) without any range check dependent >>> CastII nodes because the bounds are known. As a result, the ConvI2L ideal optimization jumps in and >>> creates over 34000 new ConvI2L nodes while pushing them through the long chain of AddNodes. We hit >>> the node limit because during GVN, dead nodes are not removed. >>> >>> I propose to simply postpone that optimization to IGVN. This only affects array accesses emitted for >>> the string concat optimizations because "normal" array accesses have a range check dependent CastII >>> which blocks that ConvI2L optimization during parsing. >>> >>> Thanks, >>> Tobias >> From nils.eliasson at oracle.com Tue Jun 16 09:15:30 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 16 Jun 2020 11:15:30 +0200 Subject: [15] RFR(S): 8247421: [TESTBUG] ReturnBlobToWrongHeapTest.java failed allocating blob In-Reply-To: <801b2e93-0ab0-bebf-3135-8d9040c8ef34@oracle.com> References: <01e85a84-d8ee-7098-4a7f-6b493354b4de@oracle.com> <801b2e93-0ab0-bebf-3135-8d9040c8ef34@oracle.com> Message-ID: Thank you Vladimir! // Nils On 2020-06-12 22:39, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 6/12/20 12:37 PM, Nils Eliasson wrote: >> Hi, >> >> This tries to fill up one segment of the code cache with large blobs, >> and then fill up the rest with small code blobs. In one test run the >> large code blob happened to fill up the code heap precisely, leaving >> no room for any small code blob. That made the test fail. 
>>
>> "CodeHeap 'non-profiled nmethods': size=11248Kb used=11248Kb
>> max_used=11248Kb free=0Kb
>>   bounds [0x0000000116701000, 0x00000001171fd000, 0x00000001171fd000]"
>>
>> My fix allocates one small blob first, then continues on with the
>> large blobs, and finally fills up the rest with small blobs. In
>> that way there is always a small blob allocated.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8247421
>> Webrev: http://cr.openjdk.java.net/~neliasso/8247421/webrev.01/
>>
>> Please review,
>> Nils Eliasson

From nils.eliasson at oracle.com Tue Jun 16 11:13:52 2020
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 16 Jun 2020 13:13:52 +0200
Subject: RFR(S): 8247200: [aarch64] assert((unsigned)fpargs < 32)
In-Reply-To: <61acad9c-98a1-46b2-8e24-d3cb84fa2628@redhat.com>
References: <40b781a4-5a1f-bd83-0dab-00dc2890e06d@oracle.com> <61acad9c-98a1-46b2-8e24-d3cb84fa2628@redhat.com>
Message-ID: <272458f4-3451-a555-0a45-119b78fd45aa@oracle.com>

+1

Regards,
Nils

On 2020-06-15 11:17, Andrew Haley wrote:
> On 11/06/2020 17:56, Vladimir Kozlov wrote:
>> On 6/11/20 1:24 AM, Patric Hedlin wrote:
>>> Dear all,
>>>
>>> I would like to ask for help to review the following change/update:
>>>
>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8247200
>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8247200/
>>>
>>>
>>> Removing assert and some associated dead code.
> Looks good.
>

From patric.hedlin at oracle.com Tue Jun 16 11:16:59 2020
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 16 Jun 2020 13:16:59 +0200
Subject: RFR(S): 8247200: [aarch64] assert((unsigned)fpargs < 32)
In-Reply-To: <272458f4-3451-a555-0a45-119b78fd45aa@oracle.com>
References: <40b781a4-5a1f-bd83-0dab-00dc2890e06d@oracle.com> <61acad9c-98a1-46b2-8e24-d3cb84fa2628@redhat.com> <272458f4-3451-a555-0a45-119b78fd45aa@oracle.com>
Message-ID: <5d8a9e13-5fd2-c301-3d06-5c7d34b1af41@oracle.com>

Thanks for reviewing Nils.
/Patric

On 2020-06-16 13:13, Nils Eliasson wrote:
> +1
>
> Regards,
> Nils
>
> On 2020-06-15 11:17, Andrew Haley wrote:
>> On 11/06/2020 17:56, Vladimir Kozlov wrote:
>>> On 6/11/20 1:24 AM, Patric Hedlin wrote:
>>>> Dear all,
>>>>
>>>> I would like to ask for help to review the following change/update:
>>>>
>>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8247200
>>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8247200/
>>>>
>>>>
>>>> Removing assert and some associated dead code.
>> Looks good.

From nils.eliasson at oracle.com Tue Jun 16 12:18:56 2020
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 16 Jun 2020 14:18:56 +0200
Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called
In-Reply-To: <4b8516e6-c9a5-5f8c-fba4-d314c9e47f0c@oracle.com>
References: <4b8516e6-c9a5-5f8c-fba4-d314c9e47f0c@oracle.com>
Message-ID: <60e4501d-4d78-3523-c1bd-8388d18ed05d@oracle.com>

Hi,

Also we need to consider code size. The auto-vectorized version is inlined - and the unrolling may fail or be limited. To fully take advantage of this we would need to outline the fill-loop (like what's done for the intrinsic, where the loop is replaced by a call). But instead of having a handcrafted intrinsic - the call goes to some java code. To do this we need somewhere to put the java-version of the fill-loop.

Regards,
Nils

Leveraging the auto-vectorization very nice - but

On 2020-06-15 20:41, Vladimir Kozlov wrote:
> Hi,
>
> It would be interesting to see the difference with strip-mining off.
> With strip-mining only part of the iterations are replaced with the stub.
>
> I don't see referenced link [1] in e-mail.
>
> Are these performance data for Aarch64?
>
> What x86 CPU did you test on? (avx512?)
>
> Please, add both x64 and Aarch64 perf data to RFE.
>
> What size of arrays did you test?
>
> A few years ago OptimizeFill won over vectorized loops but CPUs and
> vectorization have improved since then.
Maybe we can deprecate this
> code if it does not have performance benefits. Or we should revisit
> stub's code for modern CPUs.
>
> We need more data.
>
> Thanks,
> Vladimir
>
> On 6/14/20 8:20 PM, Pengfei Li wrote:
>> Hi,
>>
>> Can I have a review of this C2 loop optimization fix?
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8247307
>> Webrev: http://cr.openjdk.java.net/~pli/rfr/8247307/webrev.00/
>>
>> C2 has a loop optimization phase called intrinsify_fill. It matches the
>> pattern of a single array store with a loop invariant in a counted loop,
>> like below, and replaces it with a call to some stub routine.
>>
>>    for (int i = start; i < limit; i++) {
>>      a[i] = value;
>>    }
>>
>> Unfortunately, this doesn't work in current jdk after loop strip mining.
>> The above loop is eventually unrolled and auto-vectorized by subsequent
>> optimization phases. Root cause is that in strip-mined loops, the inner
>> CountedLoopNode may be used by the address polling node of the safepoint
>> in the outer loop. But as the safepoint polling is unrelated to
>> any real operations in the loop, it should not hinder the pattern match.
>> So in this patch, the polladr's use is ignored in the match check.
>>
>> We have some performance comparison of the code for array fill, between
>> the auto-vectorized version and the stub routine version. The JMH case
>> for the tests can be found at [1]. Results show that on x86, the stub
>> code is even slower than the auto-vectorized code. To prevent any
>> regression, vm option OptimizeFill is turned off for x86 in this patch.
>> So this patch doesn't affect the generated code on x86. On AArch64,
>> the two versions show almost the same performance in general cases. But
>> if the value to be filled is zero, the stub code's performance is much
>> better. This makes sense as AArch64 uses cache maintenance instructions
>> (DC ZVA) to zero large blocks in the hand-crafted assembly. Below are
>> JMH scores on AArch64.
>>
>> Before:
>>    Benchmark                     Mode  Cnt      Score     Error  Units
>>    TestArrayFill.fillByteArray   avgt   25   2078.700 ±   7.719  ns/op
>>    TestArrayFill.fillIntArray    avgt   25  12371.497 ± 566.773  ns/op
>>    TestArrayFill.fillShortArray  avgt   25   4132.439 ±  25.096  ns/op
>>    TestArrayFill.zeroByteArray   avgt   25   2080.313 ±   7.516  ns/op
>>    TestArrayFill.zeroIntArray    avgt   25  10961.331 ± 527.750  ns/op
>>    TestArrayFill.zeroShortArray  avgt   25   4126.386 ±  20.997  ns/op
>>
>> After:
>>    Benchmark                     Mode  Cnt      Score     Error  Units
>>    TestArrayFill.fillByteArray   avgt   25   2080.382 ±   2.103  ns/op
>>    TestArrayFill.fillIntArray    avgt   25  11997.621 ± 569.058  ns/op
>>    TestArrayFill.fillShortArray  avgt   25   4309.035 ± 285.456  ns/op
>>    TestArrayFill.zeroByteArray   avgt   25    903.434 ±  10.944  ns/op
>>    TestArrayFill.zeroIntArray    avgt   25   8141.533 ± 946.341  ns/op
>>    TestArrayFill.zeroShortArray  avgt   25   1784.124 ±  24.618  ns/op
>>
>> Another advantage of using the stub routine is that the generated code
>> size is reduced.
>>
>> Jtreg hotspot::hotspot_all_no_apps, jdk::jdk_core, langtools::tier1 are
>> tested and no new failure is found.
>>
>> --
>> Thanks,
>> Pengfei
>>

From vladimir.kozlov at oracle.com Tue Jun 16 16:38:41 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 16 Jun 2020 09:38:41 -0700
Subject: RFR(S): Remove some dead code in C2
In-Reply-To:
References:
Message-ID: <562032ab-f07e-4561-d0f9-b356d03182b5@oracle.com>

Hi Ludovic,

Looks good.

Yes, in 8148994 Vladimir Ivanov enable late inlining [1]:

"But after JDK-8072008 there's no problem with delaying inlining. C2 can decide whether to keep the direct call or inline through it. So, I enabled late inlining for all linkers. (Surprisingly, no significant performance difference on nashorn.)"

I created RFE https://bugs.openjdk.java.net/browse/JDK-8247697 for this.
Thanks,
Vladimir

[1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021137.html

PS: Does someone in your group have OpenJDK Author status so he can file issues in JBS?

On 6/15/20 4:58 PM, Ludovic Henry wrote:
> Hi,
>
> As I was exploring code in src/hotspot/share/opto/doCall.cpp, I noticed the `delayed_forbidden` parameter to `Compile::call_generator` never to be used.
>
> From doing some mercurial archeology, the change was introduced with https://hg.openjdk.java.net/jdk/jdk/rev/823590505eb4, and later modified with https://hg.openjdk.java.net/jdk/jdk/rev/e1685e30beca which removed all uses of `delayed_forbidden`.
>
> As I am not an author, please find following the patch removing the dead code.
>
> Thank you,
>
> --
> Ludovic
>
> diff --git a/src/hotspot/share/opto/callGenerator.cpp b/src/hotspot/share/opto/callGenerator.cpp
> index 1092d582184..606325d4dd6 100644
> --- a/src/hotspot/share/opto/callGenerator.cpp
> +++ b/src/hotspot/share/opto/callGenerator.cpp
> @@ -821,13 +821,13 @@ JVMState* PredictedCallGenerator::generate(JVMState* jvms) {
>  }
>
>
> -CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms, ciMethod* caller, ciMethod* callee, bool delayed_forbidden) {
> +CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms, ciMethod* caller, ciMethod* callee) {
>    assert(callee->is_method_handle_intrinsic(), "for_method_handle_call mismatch");
>    bool input_not_const;
>    CallGenerator* cg = CallGenerator::for_method_handle_inline(jvms, caller, callee, input_not_const);
>    Compile* C = Compile::current();
>    if (cg != NULL) {
> -    if (!delayed_forbidden && AlwaysIncrementalInline) {
> +    if (AlwaysIncrementalInline) {
>        return CallGenerator::for_late_inline(callee, cg);
>      } else {
>        return cg;
> diff --git a/src/hotspot/share/opto/callGenerator.hpp b/src/hotspot/share/opto/callGenerator.hpp
> index 46bf9f5d7f9..3a04fa4d5cd 100644
> --- a/src/hotspot/share/opto/callGenerator.hpp
> +++ b/src/hotspot/share/opto/callGenerator.hpp
> @@ -124,7 +124,7 @@ class CallGenerator : public ResourceObj {
>    static CallGenerator* for_direct_call(ciMethod* m, bool separate_io_projs = false);   // static, special
>    static CallGenerator* for_virtual_call(ciMethod* m, int vtable_index);  // virtual, interface
>
> -  static CallGenerator* for_method_handle_call(  JVMState* jvms, ciMethod* caller, ciMethod* callee, bool delayed_forbidden);
> +  static CallGenerator* for_method_handle_call(  JVMState* jvms, ciMethod* caller, ciMethod* callee);
>    static CallGenerator* for_method_handle_inline(JVMState* jvms, ciMethod* caller, ciMethod* callee, bool& input_not_const);
>
>    // How to generate a replace a direct call with an inline version
> diff --git a/src/hotspot/share/opto/compile.hpp b/src/hotspot/share/opto/compile.hpp
> index a922a905707..bd0def2997c 100644
> --- a/src/hotspot/share/opto/compile.hpp
> +++ b/src/hotspot/share/opto/compile.hpp
> @@ -854,7 +854,7 @@ class Compile : public Phase {
>    // The profile factor is a discount to apply to this site's interp. profile.
>    CallGenerator*    call_generator(ciMethod* call_method, int vtable_index, bool call_does_dispatch,
>                                     JVMState* jvms, bool allow_inline, float profile_factor, ciKlass* speculative_receiver_type = NULL,
> -                                   bool allow_intrinsics = true, bool delayed_forbidden = false);
> +                                   bool allow_intrinsics = true);
>    bool should_delay_inlining(ciMethod* call_method, JVMState* jvms) {
>      return should_delay_string_inlining(call_method, jvms) ||
>             should_delay_boxing_inlining(call_method, jvms);
> diff --git a/src/hotspot/share/opto/doCall.cpp b/src/hotspot/share/opto/doCall.cpp
> index c26dc4b682d..c4d55d0d4c4 100644
> --- a/src/hotspot/share/opto/doCall.cpp
> +++ b/src/hotspot/share/opto/doCall.cpp
> @@ -65,7 +65,7 @@ void trace_type_profile(Compile* C, ciMethod *method, int depth, int bci, ciMeth
>  CallGenerator* Compile::call_generator(ciMethod* callee, int vtable_index, bool call_does_dispatch,
>                                         JVMState* jvms, bool allow_inline,
>                                         float prof_factor, ciKlass* speculative_receiver_type,
> -                                       bool allow_intrinsics, bool delayed_forbidden) {
> +                                       bool allow_intrinsics) {
>    ciMethod*       caller   = jvms->method();
>    int             bci      = jvms->bci();
>    Bytecodes::Code bytecode = caller->java_code_at_bci(bci);
> @@ -145,8 +145,8 @@ CallGenerator* Compile::call_generator(ciMethod* callee, int vtable_index, bool
>    // MethodHandle.invoke* are native methods which obviously don't
>    // have bytecodes and so normal inlining fails.
>    if (callee->is_method_handle_intrinsic()) {
> -    CallGenerator* cg = CallGenerator::for_method_handle_call(jvms, caller, callee, delayed_forbidden);
> -    assert(cg == NULL || !delayed_forbidden || !cg->is_late_inline() || cg->is_mh_late_inline(), "unexpected CallGenerator");
> +    CallGenerator* cg = CallGenerator::for_method_handle_call(jvms, caller, callee);
> +    assert(cg == NULL || !cg->is_late_inline() || cg->is_mh_late_inline(), "unexpected CallGenerator");
>      return cg;
>    }
>
> @@ -182,12 +182,10 @@ CallGenerator* Compile::call_generator(ciMethod* callee, int vtable_index, bool
>            // opportunity to perform some high level optimizations
>            // first.
>            if (should_delay_string_inlining(callee, jvms)) {
> -            assert(!delayed_forbidden, "strange");
>              return CallGenerator::for_string_late_inline(callee, cg);
>            } else if (should_delay_boxing_inlining(callee, jvms)) {
> -            assert(!delayed_forbidden, "strange");
>              return CallGenerator::for_boxing_late_inline(callee, cg);
> -          } else if ((should_delay || AlwaysIncrementalInline) && !delayed_forbidden) {
> +          } else if ((should_delay || AlwaysIncrementalInline)) {
>              return CallGenerator::for_late_inline(callee, cg);
>            }
>          }
>

From vladimir.x.ivanov at oracle.com Tue Jun 16 18:15:21 2020
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 16 Jun 2020 21:15:21 +0300
Subject: RFR(S): Remove some dead code in C2
In-Reply-To: <562032ab-f07e-4561-d0f9-b356d03182b5@oracle.com>
References: <562032ab-f07e-4561-d0f9-b356d03182b5@oracle.com>
Message-ID: <3ecd27a8-ac9f-31ec-3b69-3445d0592958@oracle.com>

Good catch, Ludovic!

Looks good to me as well.

Best regards,
Vladimir Ivanov

On 16.06.2020 19:38, Vladimir Kozlov wrote:
> Hi Ludovic,
>
> Looks good.
>
> Yes, in 8148994 Vladimir Ivanov enable late inlining [1]:
>
> "But after JDK-8072008 there's no problem with delaying inlining. C2 can
> decide whether to keep the direct call or inline through it. So, I
> enabled late inlining for all linkers.
(Surprisingly, no significant
> performance difference on nashorn.)"
>
> I created RFE https://bugs.openjdk.java.net/browse/JDK-8247697 for this.
>
> Thanks,
> Vladimir
>
> [1]
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021137.html
>
>
> PS: Does someone in your group have OpenJDK Author status so he can file
> issues in JBS?
>
> On 6/15/20 4:58 PM, Ludovic Henry wrote:
>> Hi,
>>
>> As I was exploring code in src/hotspot/share/opto/doCall.cpp, I
>> noticed the `delayed_forbidden` parameter to `Compile::call_generator`
>> never to be used.
>>
>> From doing some mercurial archeology, the change was introduced with
>> https://hg.openjdk.java.net/jdk/jdk/rev/823590505eb4, and later
>> modified with https://hg.openjdk.java.net/jdk/jdk/rev/e1685e30beca
>> which removed all uses of `delayed_forbidden`.
>>
>> As I am not an author, please find following the patch removing the
>> dead code.
>>
>> Thank you,
>>
>> --
>> Ludovic
>>
>> diff --git a/src/hotspot/share/opto/callGenerator.cpp
>> b/src/hotspot/share/opto/callGenerator.cpp
>> index 1092d582184..606325d4dd6 100644
>> --- a/src/hotspot/share/opto/callGenerator.cpp
>> +++ b/src/hotspot/share/opto/callGenerator.cpp
>> @@ -821,13 +821,13 @@ JVMState*
>> PredictedCallGenerator::generate(JVMState* jvms) {
>>  }
>>
>>
>> -CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms,
>> ciMethod* caller, ciMethod* callee, bool delayed_forbidden) {
>> +CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms,
>> ciMethod* caller, ciMethod* callee) {
>>    assert(callee->is_method_handle_intrinsic(),
>> "for_method_handle_call mismatch");
>>    bool input_not_const;
>>    CallGenerator* cg = CallGenerator::for_method_handle_inline(jvms,
>> caller, callee, input_not_const);
>>    Compile* C = Compile::current();
>>    if (cg != NULL) {
>> -    if (!delayed_forbidden && AlwaysIncrementalInline) {
>> +    if (AlwaysIncrementalInline) {
>>        return CallGenerator::for_late_inline(callee, cg);
>>      } else {
>>        return cg;
>> diff --git a/src/hotspot/share/opto/callGenerator.hpp
>> b/src/hotspot/share/opto/callGenerator.hpp
>> index 46bf9f5d7f9..3a04fa4d5cd 100644
>> --- a/src/hotspot/share/opto/callGenerator.hpp
>> +++ b/src/hotspot/share/opto/callGenerator.hpp
>> @@ -124,7 +124,7 @@ class CallGenerator : public ResourceObj {
>>    static CallGenerator* for_direct_call(ciMethod* m, bool
>> separate_io_projs = false);   // static, special
>>    static CallGenerator* for_virtual_call(ciMethod* m, int
>> vtable_index);  // virtual, interface
>>
>> -  static CallGenerator* for_method_handle_call(  JVMState* jvms,
>> ciMethod* caller, ciMethod* callee, bool delayed_forbidden);
>> +  static CallGenerator* for_method_handle_call(  JVMState* jvms,
>> ciMethod* caller, ciMethod* callee);
>>    static CallGenerator* for_method_handle_inline(JVMState* jvms,
>> ciMethod* caller, ciMethod* callee, bool& input_not_const);
>>
>>    // How to generate a replace a direct call with an inline version
>> diff --git a/src/hotspot/share/opto/compile.hpp
>> b/src/hotspot/share/opto/compile.hpp
>> index a922a905707..bd0def2997c 100644
>> --- a/src/hotspot/share/opto/compile.hpp
>> +++ b/src/hotspot/share/opto/compile.hpp
>> @@ -854,7 +854,7 @@ class Compile : public Phase {
>>    // The profile factor is a discount to apply to this site's
>> interp. profile.
>>    CallGenerator*    call_generator(ciMethod* call_method, int
>> vtable_index, bool call_does_dispatch,
>>                                     JVMState* jvms, bool
>> allow_inline, float profile_factor, ciKlass* speculative_receiver_type
>> = NULL,
>> -                                   bool allow_intrinsics = true, bool
>> delayed_forbidden = false);
>> +                                   bool allow_intrinsics = true);
>>    bool should_delay_inlining(ciMethod* call_method, JVMState* jvms) {
>>      return should_delay_string_inlining(call_method, jvms) ||
>>             should_delay_boxing_inlining(call_method, jvms);
>> diff --git a/src/hotspot/share/opto/doCall.cpp
>> b/src/hotspot/share/opto/doCall.cpp
>> index c26dc4b682d..c4d55d0d4c4 100644
>> --- a/src/hotspot/share/opto/doCall.cpp
>> +++ b/src/hotspot/share/opto/doCall.cpp
>> @@ -65,7 +65,7 @@ void trace_type_profile(Compile* C, ciMethod
>> *method, int depth, int bci, ciMeth
>>  CallGenerator* Compile::call_generator(ciMethod* callee, int
>> vtable_index, bool call_does_dispatch,
>>                                         JVMState* jvms, bool
>> allow_inline,
>>                                         float prof_factor, ciKlass*
>> speculative_receiver_type,
>> -                                       bool allow_intrinsics, bool
>> delayed_forbidden) {
>> +                                       bool allow_intrinsics) {
>>    ciMethod*       caller   = jvms->method();
>>    int             bci      = jvms->bci();
>>    Bytecodes::Code bytecode = caller->java_code_at_bci(bci);
>> @@ -145,8 +145,8 @@ CallGenerator* Compile::call_generator(ciMethod*
>> callee, int vtable_index, bool
>>    // MethodHandle.invoke* are native methods which obviously don't
>>    // have bytecodes and so normal inlining fails.
>>    if (callee->is_method_handle_intrinsic()) {
>> -    CallGenerator* cg = CallGenerator::for_method_handle_call(jvms,
>> caller, callee, delayed_forbidden);
>> -    assert(cg == NULL || !delayed_forbidden || !cg->is_late_inline()
>> || cg->is_mh_late_inline(), "unexpected CallGenerator");
>> +    CallGenerator* cg = CallGenerator::for_method_handle_call(jvms,
>> caller, callee);
>> +    assert(cg == NULL || !cg->is_late_inline() ||
>> cg->is_mh_late_inline(), "unexpected CallGenerator");
>>      return cg;
>>    }
>>
>> @@ -182,12 +182,10 @@ CallGenerator* Compile::call_generator(ciMethod*
>> callee, int vtable_index, bool
>>            // opportunity to perform some high level optimizations
>>            // first.
>>            if (should_delay_string_inlining(callee, jvms)) {
>> -            assert(!delayed_forbidden, "strange");
>>              return CallGenerator::for_string_late_inline(callee, cg);
>>            } else if (should_delay_boxing_inlining(callee, jvms)) {
>> -            assert(!delayed_forbidden, "strange");
>>              return CallGenerator::for_boxing_late_inline(callee, cg);
>> -          } else if ((should_delay || AlwaysIncrementalInline) &&
>> !delayed_forbidden) {
>> +          } else if ((should_delay || AlwaysIncrementalInline)) {
>>              return CallGenerator::for_late_inline(callee, cg);
>>            }
>>          }
>>
>>

From vladimir.kozlov at oracle.com Tue Jun 16 22:34:06 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 16 Jun 2020 15:34:06 -0700
Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called
In-Reply-To:
References:
Message-ID:

Hi Pengfei,

I ran your benchmark on my machine (only 3 iterations).

First, I was confused by numbers because I assumed that bigger is better. But it is opposite.

Second, I thought it may be interaction with strip mining. By default LoopStripMiningIter is set to 1000 [1]. So inner loop in your tests will execute only 65 iterations - that is why vectorization (inlined instructions) wins.

Then I reduced outer iterations to 100 to have inner 655 iterations. It did not help.

And finally I tried switch off strip mining by -XX:-UseCountedLoopSafepoints or using Parallel GC. And again had the same results:

-XX:+UseParallelGC
TestArrayFill.fillByteArray   avgt    3  2164.076 ± 168.821  ns/op
TestArrayFill.fillIntArray    avgt    3  8600.738 ± 954.559  ns/op
TestArrayFill.fillShortArray  avgt    3  4488.062 ± 217.501  ns/op
TestArrayFill.zeroByteArray   avgt    3  2167.487 ± 373.366  ns/op
TestArrayFill.zeroIntArray    avgt    3  8595.717 ± 579.696  ns/op
TestArrayFill.zeroShortArray  avgt    3  4482.645 ±
44.031 ns/op

-XX:+UseParallelGC -XX:-OptimizeFill
TestArrayFill.fillByteArray   avgt    3  1586.719 ±  87.300  ns/op
TestArrayFill.fillIntArray    avgt    3  5879.356 ±  34.836  ns/op
TestArrayFill.fillShortArray  avgt    3  3045.436 ±  41.981  ns/op
TestArrayFill.zeroByteArray   avgt    3  1513.536 ± 738.573  ns/op
TestArrayFill.zeroIntArray    avgt    3  5911.524 ± 172.335  ns/op
TestArrayFill.zeroShortArray  avgt    3  3053.304 ±  50.365  ns/op

Looking at the generated code, I see that the vectorized loop may unroll 16 times (16 vector instructions by 256 bytes), whereas the generate_fill() stub on x86 has 2 (256 bytes wide) instructions per iteration and 1 instruction for avx512 [2]. Also, the stub has a lot of pre- and post-loop instructions and checks.

I thought maybe we could improve the stub. But it seems the vectorized loop with predicates is more compact and efficient. And it is auto generated!

Based on the results, I agree with you on switching off the fill optimization on x86.

There could be side effects because the loop code will be larger (vs. a stub call), but we already have that right now, before your changes, so I don't think we will see a regression for GCs which use strip mining.

Thanks,
Vladimir

[1] http://hg.openjdk.java.net/jdk/jdk/file/3585f92edcaa/src/hotspot/share/gc/g1/g1Arguments.cpp#l183
[2] http://hg.openjdk.java.net/jdk/jdk/file/3585f92edcaa/src/hotspot/cpu/x86/macroAssembler_x86.cpp#l5023

On 6/15/20 11:24 PM, Pengfei Li wrote:
> Sorry I forgot to paste the JMH link below in my last email.
>
> [1] http://cr.openjdk.java.net/~pli/rfr/8247307/TestArrayFill.java
>
> BTW. If I turn on OptimizeFill manually there's the performance regression below on x86. So I turned it off on x86 in my patch to make things unchanged.
>
> Before (x86 with -XX:+OptimizeFill)
> Benchmark                     Mode  Cnt     Score     Error  Units
> TestArrayFill.fillByteArray   avgt   25  1793.206 ±  15.337  ns/op
> TestArrayFill.fillIntArray    avgt   25  6679.491 ±  14.729  ns/op
> TestArrayFill.fillShortArray  avgt   25  3412.708 ±  12.005  ns/op
> TestArrayFill.zeroByteArray   avgt   25  1785.940 ±
15.174 ns/op
> TestArrayFill.zeroIntArray    avgt   25  6666.709 ±  11.735  ns/op
> TestArrayFill.zeroShortArray  avgt   25  3404.146 ±  23.045  ns/op
>
> After (x86 with -XX:+OptimizeFill)
> Benchmark                     Mode  Cnt     Score     Error  Units
> TestArrayFill.fillByteArray   avgt   25  2281.374 ± 191.220  ns/op
> TestArrayFill.fillIntArray    avgt   25  9009.679 ± 901.541  ns/op
> TestArrayFill.fillShortArray  avgt   25  4828.686 ±  49.199  ns/op
> TestArrayFill.zeroByteArray   avgt   25  2463.745 ±  47.640  ns/op
> TestArrayFill.zeroIntArray    avgt   25  9062.682 ± 939.538  ns/op
> TestArrayFill.zeroShortArray  avgt   25  4837.231 ±  50.026  ns/op
>
>> Hi,
>>
>> Can I have a review of this C2 loop optimization fix?
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8247307
>> Webrev: http://cr.openjdk.java.net/~pli/rfr/8247307/webrev.00/
>>
>> C2 has a loop optimization phase called intrinsify_fill. It matches the pattern
>> of a single array store with a loop invariant in a counted loop, like below, and
>> replaces it with a call to some stub routine.
>>
>>    for (int i = start; i < limit; i++) {
>>      a[i] = value;
>>    }
>>
>> Unfortunately, this doesn't work in current jdk after loop strip mining.
>> The above loop is eventually unrolled and auto-vectorized by subsequent
>> optimization phases. Root cause is that in strip-mined loops, the inner
>> CountedLoopNode may be used by the address polling node of the safepoint
>> in the outer loop. But as the safepoint polling is unrelated to any real
>> operations in the loop, it should not hinder the pattern match.
>> So in this patch, the polladr's use is ignored in the match check.
>>
>> We have some performance comparison of the code for array fill, between
>> the auto-vectorized version and the stub routine version. The JMH case for
>> the tests can be found at [1]. Results show that on x86, the stub code is even
>> slower than the auto-vectorized code. To prevent any regression, vm option
>> OptimizeFill is turned off for x86 in this patch.
>> So this patch doesn't affect the generated code on x86. On AArch64, the
>> two versions show almost the same performance in general cases. But if the
>> value to be filled is zero, the stub code's performance is much better. This
>> makes sense as AArch64 uses cache maintenance instructions (DC ZVA) to
>> zero large blocks in the hand-crafted assembly. Below are JMH scores on
>> AArch64.
>>
>> Before:
>> Benchmark                     Mode  Cnt      Score     Error  Units
>> TestArrayFill.fillByteArray   avgt   25   2078.700 ±   7.719  ns/op
>> TestArrayFill.fillIntArray    avgt   25  12371.497 ± 566.773  ns/op
>> TestArrayFill.fillShortArray  avgt   25   4132.439 ±  25.096  ns/op
>> TestArrayFill.zeroByteArray   avgt   25   2080.313 ±   7.516  ns/op
>> TestArrayFill.zeroIntArray    avgt   25  10961.331 ± 527.750  ns/op
>> TestArrayFill.zeroShortArray  avgt   25   4126.386 ±  20.997  ns/op
>>
>> After:
>> Benchmark                     Mode  Cnt      Score     Error  Units
>> TestArrayFill.fillByteArray   avgt   25   2080.382 ±   2.103  ns/op
>> TestArrayFill.fillIntArray    avgt   25  11997.621 ± 569.058  ns/op
>> TestArrayFill.fillShortArray  avgt   25   4309.035 ± 285.456  ns/op
>> TestArrayFill.zeroByteArray   avgt   25    903.434 ±  10.944  ns/op
>> TestArrayFill.zeroIntArray    avgt   25   8141.533 ± 946.341  ns/op
>> TestArrayFill.zeroShortArray  avgt   25   1784.124 ±  24.618  ns/op
>>
>> Another advantage of using the stub routine is that the generated code size is
>> reduced.
>>
>> Jtreg hotspot::hotspot_all_no_apps, jdk::jdk_core, langtools::tier1 are tested
>> and no new failure is found.
>
> Thanks,
> Pengfei
>

From Pengfei.Li at arm.com Wed Jun 17 03:30:37 2020
From: Pengfei.Li at arm.com (Pengfei Li)
Date: Wed, 17 Jun 2020 03:30:37 +0000
Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called
In-Reply-To:
References:
Message-ID:

Hi Vladimir,

> Looking on generated code I see that vectorized loop may unroll 16 times (16
> vector instructions by 256 bytes) where
> generate_fill() stub on x86 has 2 (256 bytes wide) instructions per iteration
> and 1 instruction for avx512 [2].
> Also stub has a lot of pre- and post-loop instructions and checks.

Right, I also took a look at the x86 generated stub code and think its performance could potentially be improved if the loop were unrolled more times. The AArch64 stub code is manually unrolled 8 times and it has almost no performance difference from the auto-vectorized version in general cases.

> I thought maybe we can improve stub. But it seems vectorized loop with
> predicates is more compact and efficient. And it is auto generated!
>
> Based on results I agree with you switching off fill optimization on x86.
>
> There could be side effects due to loops code will be larger (vs stub call) but
> we have it already right now before your changes so I don't think we will see
> regression for GCs which use strip mining.

Trying to improve the stub is my next plan. I believe both x86 and AArch64 stubs have room for improvement. So I prefer to keep the stub code for now and check whether it can beat the auto-vectorized version after being improved in the near future. But I hope someone from Intel could help with the x86 backend part since I'm not quite familiar with new x86 instructions.

I'm also studying the experimental feature of vectorized loop with predicates optimization in recent days (the PostLoopMultiversioning). But I found it's more complex and not working well now. This could be another long term goal.

Please let me know if you have further comments.

--
Thanks,
Pengfei

From Pengfei.Li at arm.com Wed Jun 17 03:43:20 2020
From: Pengfei.Li at arm.com (Pengfei Li)
Date: Wed, 17 Jun 2020 03:43:20 +0000
Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called
In-Reply-To: <60e4501d-4d78-3523-c1bd-8388d18ed05d@oracle.com>
References: <4b8516e6-c9a5-5f8c-fba4-d314c9e47f0c@oracle.com> <60e4501d-4d78-3523-c1bd-8388d18ed05d@oracle.com>
Message-ID:

Hi Nils,

> Also we need to consider code size. The auto-vectorized version is inlined -
> and the unrolling may fail or be limited.
To fully take advantage of this we
> would need to outline the fill-loop (like what's done for the intrinsic, where
> the loop is replaced by a call). But instead of having a handcrafted intrinsic
> - the call goes to some java code. To do this we need somewhere to put the
> java-version of the fill-loop.

Thanks for the comments. I also prefer calling the stub code, but unfortunately it loses on x86, probably because of the lack of unrolling (see my latest reply to Vladimir for details). So disabling it on x86 is my temporary solution to keep everything on x86 unchanged for now. I hope this disabling can be removed soon, once the stubs are improved.

--
Thanks,
Pengfei

From luhenry at microsoft.com Wed Jun 17 05:16:10 2020
From: luhenry at microsoft.com (Ludovic Henry)
Date: Wed, 17 Jun 2020 05:16:10 +0000
Subject: RFR(S): Remove faulty assertion in Compile::call_generator triggered with -XX:+AlwaysIncrementalInline
Message-ID:

Hi,

While playing with -XX:+AlwaysIncrementalInline, I ran into an assertion triggering on pretty much any test in test/hotspot/jtreg in debug builds (slowdebug or fastdebug, assertions just need to be enabled). I tracked it down to the following issue and explanation.

In CallGenerator, when calling `for_method_handle_call`, it will redirect to one of the following:
- `for_method_handle_inline` itself redirecting to `Compile::call_generator`
- `for_late_inline`
- `for_mh_late_inline`
- `for_direct_call`

The removed assert would fail if we end up calling `for_late_inline`. That would be the case when `for_method_handle_inline` returns successfully and AlwaysIncrementalInline is set. This would guarantee failure whenever this global setting is set with -XX:+AlwaysIncrementalInline.

In this case, we can safely remove the assert because the case treated specifically in `for_mh_late_inline` is treated by the combination of `for_method_handle_inline` and `for_late_inline`.
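
As a rough illustration of why the assert fires, here is a toy model of the four outcomes listed above. This is only a sketch and not HotSpot code: the `Toy*` types, `toy_for_method_handle_call`, and the `use_mh_late_inline` flag are hypothetical stand-ins, and the conditions under which HotSpot picks `for_mh_late_inline` over `for_direct_call` are collapsed into a single boolean.

```cpp
#include <cassert>
#include <memory>

// Toy stand-ins for the call generator kinds named in this thread.
// These are NOT the HotSpot classes; only the two predicates matter here.
struct ToyCallGenerator {
  virtual ~ToyCallGenerator() = default;
  virtual bool is_late_inline() const { return false; }
  virtual bool is_mh_late_inline() const { return false; }
};
struct ToyInlineCG : ToyCallGenerator {};        // models the for_method_handle_inline result
struct ToyDirectCG : ToyCallGenerator {};        // models the for_direct_call result
struct ToyLateInlineCG : ToyCallGenerator {      // models the for_late_inline result
  bool is_late_inline() const override { return true; }
};
struct ToyLateInlineMHCG : ToyLateInlineCG {     // models the for_mh_late_inline result
  bool is_mh_late_inline() const override { return true; }
};

// Simplified dispatch. `inline_succeeds` models for_method_handle_inline
// returning non-NULL; `use_mh_late_inline` is a hypothetical flag lumping
// together the remaining conditions for choosing for_mh_late_inline.
std::unique_ptr<ToyCallGenerator> toy_for_method_handle_call(
    bool inline_succeeds, bool always_incremental_inline, bool use_mh_late_inline) {
  if (inline_succeeds) {
    if (always_incremental_inline) {
      return std::make_unique<ToyLateInlineCG>();  // wrapped via for_late_inline
    }
    return std::make_unique<ToyInlineCG>();        // plain inline generator
  }
  if (use_mh_late_inline) {
    return std::make_unique<ToyLateInlineMHCG>();
  }
  return std::make_unique<ToyDirectCG>();          // out-of-line call
}

// The condition checked by the assert that the patch removes.
bool removed_assert_holds(const ToyCallGenerator* cg) {
  return cg == nullptr || !cg->is_late_inline() || cg->is_mh_late_inline();
}
```

In this model the condition is violated only for the combination "inlining succeeded and AlwaysIncrementalInline is set", which matches the failure described above; the other three outcomes satisfy it.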

Looking at `LateInlineMHCallGenerator::do_late_inline_check`, which is called from `LateInlineCallGenerator::do_late_inline`, we notice that it itself calls `for_method_handle_inline`. This means that `for_method_handle_inline` is called through the following two paths with late inlining:
1. `for_late_inline` -> `do_late_inline` <- `for_method_handle_inline` (which was previously called and its result passed through `for_late_inline(inline_cg)`)
2. `for_mh_late_inline` -> `do_late_inline` -> `do_late_inline_check` -> `for_method_handle_inline`

So whether we return a LateInlineCallGenerator or a LateInlineMHCallGenerator from `for_method_handle_call`, the logic in `for_method_handle_inline` will be called.

I do not have authorship status, so I can neither create a JBS nor a webrev. I am working with a colleague to do it, and I'll link it here, but feel free to create any of it before that. The diff is available at [1]

Thank you,

--
Ludovic

[1]
diff --git a/src/hotspot/share/opto/doCall.cpp b/src/hotspot/share/opto/doCall.cpp
index c4d55d0d4c4..ba5f737592e 100644
--- a/src/hotspot/share/opto/doCall.cpp
+++ b/src/hotspot/share/opto/doCall.cpp
@@ -146,7 +146,6 @@ CallGenerator* Compile::call_generator(ciMethod* callee, int vtable_index, bool
   // have bytecodes and so normal inlining fails.
   if (callee->is_method_handle_intrinsic()) {
     CallGenerator* cg = CallGenerator::for_method_handle_call(jvms, caller, callee);
-    assert(cg == NULL || !cg->is_late_inline() || cg->is_mh_late_inline(), "unexpected CallGenerator");
     return cg;
   }
 

From luhenry at microsoft.com Wed Jun 17 05:17:47 2020
From: luhenry at microsoft.com (Ludovic Henry)
Date: Wed, 17 Jun 2020 05:17:47 +0000
Subject: RFR(S): Remove some dead code in C2
In-Reply-To: <3ecd27a8-ac9f-31ec-3b69-3445d0592958@oracle.com>
References: <562032ab-f07e-4561-d0f9-b356d03182b5@oracle.com> <3ecd27a8-ac9f-31ec-3b69-3445d0592958@oracle.com>
Message-ID:

Hi,

Thank you for the review!
I do not have authorship, so feel free to take my changes and commit them directly (if that's the appropriate thing to do of course!). I'll work with a colleague who has authorship to get a webrev going, but feel free to take it from there if you want to see it happen sooner rather than later. -- Ludovic -----Original Message----- From: hotspot-compiler-dev On Behalf Of Vladimir Ivanov Sent: Tuesday, June 16, 2020 11:15 AM To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): Remove some dead code in C2 Good catch, Ludovic! Looks good to me as well. Best regards, Vladimir Ivanov On 16.06.2020 19:38, Vladimir Kozlov wrote: > Hi Ludovic, > > Looks good. > > Yes, in 8148994 Vladimir Ivanov enabled late inlining [1]: > > "But after JDK-8072008 there's no problem with delaying inlining. C2 can > decide whether to keep the direct call or inline through it. So, I > enabled late inlining for all linkers. (Surprisingly, no significant > performance difference on nashorn.)" > > I created RFE https://bugs.openjdk.java.net/browse/JDK-8247697 for this. > > Thanks, > Vladimir > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021137.html > > > PS: Does someone in your group have OpenJDK Author status so he can file > issues in JBS?
> > On 6/15/20 4:58 PM, Ludovic Henry wrote: >> Hi, >> >> As I was exploring code in src/hotspot/share/opto/doCall.cpp, I >> noticed that the `delayed_forbidden` parameter to `Compile::call_generator` >> is never used. >> >> From doing some mercurial archeology, the change was introduced with >> https://hg.openjdk.java.net/jdk/jdk/rev/823590505eb4, and later >> modified with https://hg.openjdk.java.net/jdk/jdk/rev/e1685e30beca >> which removed all uses of `delayed_forbidden`. >> >> As I am not an author, please find below the patch removing the >> dead code. >> >> Thank you, >> >> -- >> Ludovic >> >> diff --git a/src/hotspot/share/opto/callGenerator.cpp >> b/src/hotspot/share/opto/callGenerator.cpp >> index 1092d582184..606325d4dd6 100644 >> --- a/src/hotspot/share/opto/callGenerator.cpp >> +++ b/src/hotspot/share/opto/callGenerator.cpp >> @@ -821,13 +821,13 @@ JVMState* >> PredictedCallGenerator::generate(JVMState* jvms) { >>   } >> >> >> -CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms, >> ciMethod* caller, ciMethod* callee, bool delayed_forbidden) { >> +CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms, >> ciMethod* caller, ciMethod* callee) { >>     assert(callee->is_method_handle_intrinsic(), >> "for_method_handle_call mismatch"); >>     bool input_not_const; >>     CallGenerator* cg = CallGenerator::for_method_handle_inline(jvms, >> caller, callee, input_not_const); >>     
Compile* C = Compile::current(); >>     if (cg != NULL) { >> -    if (!delayed_forbidden && AlwaysIncrementalInline) { >> +    if (AlwaysIncrementalInline) { >>         return CallGenerator::for_late_inline(callee, cg); >>       } else { >>         return cg; >> diff --git a/src/hotspot/share/opto/callGenerator.hpp >> b/src/hotspot/share/opto/callGenerator.hpp >> index 46bf9f5d7f9..3a04fa4d5cd 100644 >> --- a/src/hotspot/share/opto/callGenerator.hpp >> +++ b/src/hotspot/share/opto/callGenerator.hpp >> @@ -124,7 +124,7 @@ class CallGenerator : public ResourceObj { >>     static CallGenerator* for_direct_call(ciMethod* m, bool >> separate_io_projs = false);   // static, special >>     static CallGenerator* for_virtual_call(ciMethod* m, int >> vtable_index);  // virtual, interface >> >> -  static CallGenerator* for_method_handle_call(  JVMState* jvms, >> ciMethod* caller, ciMethod* callee, bool delayed_forbidden); >> +  static CallGenerator* for_method_handle_call(  JVMState* jvms, >> ciMethod* caller, ciMethod* callee); >>     static CallGenerator* for_method_handle_inline(JVMState* jvms, >> ciMethod* caller, ciMethod* callee, bool& input_not_const); >> >>     // How to generate a replace a direct call with an inline version >> diff --git a/src/hotspot/share/opto/compile.hpp >> b/src/hotspot/share/opto/compile.hpp >> index a922a905707..bd0def2997c 100644 >> --- a/src/hotspot/share/opto/compile.hpp >> +++ b/src/hotspot/share/opto/compile.hpp >> @@ -854,7 +854,7 @@ class Compile : public Phase { >>     // The profile factor is a discount to apply to this site's >> interp. profile. >>     CallGenerator*    call_generator(ciMethod* call_method, int >> vtable_index, bool call_does_dispatch, >>                                      JVMState* jvms, bool >> allow_inline, float profile_factor, ciKlass* speculative_receiver_type >> = NULL, >> -                                   
bool allow_intrinsics = true, bool >> delayed_forbidden = false); >> +                                   bool allow_intrinsics = true); >>     bool should_delay_inlining(ciMethod* call_method, JVMState* jvms) { >>       return should_delay_string_inlining(call_method, jvms) || >>              should_delay_boxing_inlining(call_method, jvms); >> diff --git a/src/hotspot/share/opto/doCall.cpp >> b/src/hotspot/share/opto/doCall.cpp >> index c26dc4b682d..c4d55d0d4c4 100644 >> --- a/src/hotspot/share/opto/doCall.cpp >> +++ b/src/hotspot/share/opto/doCall.cpp >> @@ -65,7 +65,7 @@ void trace_type_profile(Compile* C, ciMethod >> *method, int depth, int bci, ciMeth >>   CallGenerator* Compile::call_generator(ciMethod* callee, int >> vtable_index, bool call_does_dispatch, >>                                          JVMState* jvms, bool >> allow_inline, >>                                          float prof_factor, ciKlass* >> speculative_receiver_type, >> -                                       bool allow_intrinsics, bool >> delayed_forbidden) { >> +                                       bool allow_intrinsics) { >>     ciMethod*       caller   = jvms->method(); >>     int             bci      = jvms->bci(); >>     Bytecodes::Code bytecode = caller->java_code_at_bci(bci); >> @@ -145,8 +145,8 @@ CallGenerator* Compile::call_generator(ciMethod* >> callee, int vtable_index, bool >>     // MethodHandle.invoke* are native methods which obviously don't >>     // have bytecodes and so normal inlining fails. >>     if (callee->is_method_handle_intrinsic()) { >> -    CallGenerator* cg = CallGenerator::for_method_handle_call(jvms, >> caller, callee, delayed_forbidden); >> -    assert(cg == NULL || !delayed_forbidden || !cg->is_late_inline() >> || cg->is_mh_late_inline(), "unexpected CallGenerator"); >> +    CallGenerator* cg = CallGenerator::for_method_handle_call(jvms, >> caller, callee); >> +    
assert(cg == NULL || !cg->is_late_inline() || >> cg->is_mh_late_inline(), "unexpected CallGenerator"); >>       return cg; >>     } >> >> @@ -182,12 +182,10 @@ CallGenerator* Compile::call_generator(ciMethod* >> callee, int vtable_index, bool >>             // opportunity to perform some high level optimizations >>             // first. >>             if (should_delay_string_inlining(callee, jvms)) { >> -            assert(!delayed_forbidden, "strange"); >>               return CallGenerator::for_string_late_inline(callee, cg); >>             } else if (should_delay_boxing_inlining(callee, jvms)) { >> -            assert(!delayed_forbidden, "strange"); >>               return CallGenerator::for_boxing_late_inline(callee, cg); >> -          } else if ((should_delay || AlwaysIncrementalInline) && >> !delayed_forbidden) { >> +          } else if ((should_delay || AlwaysIncrementalInline)) { >>               return CallGenerator::for_late_inline(callee, cg); >>             } >>           } >> >> From xxinliu at amazon.com Wed Jun 17 08:01:22 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Wed, 17 Jun 2020 08:01:22 +0000 Subject: RFR[M]: 8151779: Some intrinsic flags could be replaced with one general flag In-Reply-To: <0365691c-5f80-9a3a-e47f-9852ef66f217@oracle.com> References: <19CD3956-4DC6-4908-8626-27D48A9AB4A4@amazon.com> <0EDAAC88-E5D9-424F-A19E-5E20C689C2F3@amazon.com> <801D878C-CAE5-4EBE-8AFE-4E35346CD5BD@amazon.com> <58ff5b66-1dce-d4ad-8f21-254abd1b887b@oracle.com> <65dcfd1f-5e7e-b9e1-8298-5daafcda8a81@oracle.com> <1EBE66E6-9AA7-4EC5-9B91-45F884071FAC@amazon.com> <2982174F-DBB6-4316-93C3-1B4DFDF34C88@amazon.com> <0365691c-5f80-9a3a-e47f-9852ef66f217@oracle.com> Message-ID: <1bd5bfea-d19c-bff5-66c7-c8dfdfb953a0@amazon.com> Hi, Nils, Thank you for reviewing it. I finally got gtest fixed on my side and verified the webrev again. It still applies cleanly to TIP and passes both gtest:all and hotspot:tier1. I would like to ask another reviewer to approve it.
http://cr.openjdk.java.net/~xliu/8151779/05/webrev/ I thought about your request and filed a follow-up issue: JDK-8247732. I plan to use a constraint in globals.hpp and add the constraint function to jvmFlagConstraintsCompiler.hpp. For directives, I will add a validator in DirectiveSet::init_control_intrinsic. thanks, --lx On 6/11/20 2:24 PM, Nils Eliasson wrote: > Hi Xin, > > In general I think the patch looks good. > > I am missing strict name checking. (I want to see an error on startup if > the user has specified unknown intrinsic names.) I see that the lazy > initialization of the intrinsic name tables might make it non-trivial to > find a good place to do that. I am ok if you follow up on that in a > future patch. > > Best regards, > Nils Eliasson > > >> Incremental diff: http://cr.openjdk.java.net/~xliu/8151779/r4_to_r5.diff >> >> I verified it in submit repo a week ago. I also double-check the >> patch still can patch to TIP and pass both hotspot:tier1 and gtest:all. >> >> Here is log message I got from mach-5. >> Job: mach5-one-phh-JDK-8151779-1-20200513-1821-11015755 >> >> BuildId: 2020-05-13-1820211.hohensee.source >> >> No failed tests >> >> Tasks Summary >> >> EXECUTED_WITH_FAILURE: 0 >> NOTHING_TO_RUN: 0 >> KILLED: 0 >> HARNESS_ERROR: 0 >> FAILED: 0 >> PASSED: 101 >> UNABLE_TO_RUN: 0 >> NA: 0 >> >> >> Thanks, >> --lx >> >> >> >> On 5/13/20, 12:03 AM, "hotspot-compiler-dev on behalf of Liu, Xin" >> > xxinliu at amazon.com> wrote: >> >>      Hi, Vladimir, >> >>      > 2.  add +/- UseCRC32Intrinsics to IntrinsicAvailableTest.java >>      > The purpose of that test is not to generate a CRC32 intrinsic. >> Its purpose is to check if compilers determine to intrinsify >> _updateCRC32 or not. >>      
> Mathematically, "UseCRC32Intrinsics" is a set = [_updateCRC32, >> _updateBytesCRC32, _updateByteBufferCRC32]. >>      > "-XX:-UseCRC32Intrinsics" disables all 3 of them. If users use >> -XX:ControlIntrinsic=+_updateCRC32 and -XX:-UseCRC32Intrinsics, >> _updateCRC32 should be enabled individually. >> >>      No, I think we should preserve current behavior when >> UseCRC32Intrinsics is off then all corresponding intrinsics are >>      also should be off. This is the purpose of such flags - to be >> able control several intrinsics with one flag. >>      Otherwise you have to check each individual intrinsic if CPU >> does not support them. Even if code for some of these >>      intrinsics can be generated on this CPU. We should be >> consistent, otherwise code can become very complex to support. >>      ---- >>      If -XX:ControlIntrinsic=+_updateBytesCRC32 can't win over >> -XX:-UseCRC32Intrinsics, it will come back the justification of >> JBS-8151779: >>      Why do we need to support the usage >> -XX:ControlIntrinsic=+_updateBytesCRC32? If a user doesn't set >> +updateBytesCRC32, it's still enabled. >> >>      I read the description of "JBS-8235981" and "JBS-8151779" again. >> I try to understand in this way. The option 'UseCRC32Intrinsics' is >> the consolidation of 3 real intrinsics [_updateCRC32, >> _updateBytesCRC32, _updateByteBufferCRC32]. It represents some sorta >> hardware capabilities to make those intrinsics optimal. If >> UseCRC32Intrinsics is OFF, it will not make sense to intrinsify them >> anymore because inliner can deliver the similar result. >> >>      Quote from JBS-8235981 "Right now, there's no way to introduce >> experimental intrinsics which are turned off by default and let users >> enable them on their side. " >>      Currently, once a user declares one new intrinsics in >> VM_INTRINSICS_DO, it's enabled. It might not be true in the future. >>      i.e. A develop can declare an intrinsic but mark it turn-off by >> default.
He will try it out by -XX:ControlIntrinsic=+_myNewIntrinsic >> in his development stage. >> >>      Do I catch up your intention this time? if yes, could you take a >> look at this new revision? I think I meet the requirement. >>      Webrev: http://cr.openjdk.java.net/~xliu/8151779/05/webrev/ >>      Incremental diff: >> http://cr.openjdk.java.net/~xliu/8151779/r4_to_r5.diff >> >>      Here is the change log from rev04. >>      1) An intrinsic is enabled if and only if neither >> ControlIntrinsic nor the corresponding UseXXXIntrinsics disables it. >>      The implementation is still in >> vmIntrinsics::is_disabled_by_flags(vmIntrinsics::ID id). >> >>      2) I introduce a compact data structure TriBoolArray. It >> compresses an array of Tribool. Each tribool only takes 2 bits now. >>      I also took Coleen's suggestion to put TriBool and TriBoolArray >> in a standalone file "utilities/tribool.hpp". A new gtest is attached. >> >>      3) Correct some typos. Thank you David pointed them out. >> >>      Thanks, >>      --lx >> >> >>      On 5/12/20, 12:59 AM, "David Holmes" >> wrote: >> >>          Hi, >> >>          Sorry for the delay in getting back to this. >> >>          On 5/05/2020 7:37 pm, Liu, Xin wrote: >>          > Hello, David and Nils >>          > >>          > Thank you to review the patch. I went to brush up my >> English grammar and then update my patch to rev04. >>          > https://cr.openjdk.java.net/~xliu/8151779/04/webrev/ >>          > Here is the incremental diff: >> https://cr.openjdk.java.net/~xliu/8151779/r3_to_r4.diff It reflect >> changes based on David's feedbacks. I really appreciate that you >> review so carefully and found so many invaluable suggestions. TBH, I >> don't understand Amazon's copyright header neither.
I choose the >> simple way to dodge that problem. >> >>          In vmSymbols.hpp >> >>          +  // 1. Disable/Control Intrinsic accept a list of >> intrinsic IDs. >> >>          s/accept/accepts/ >> >>          +  //    their final value are subject to hardware inspection >>          (VM_Version::initialize). >> >>          s/value/values/ >> >>          Otherwise all my nits have been addressed - thanks. >> >>          I don't need to see a further webrev. >> >>          Thanks, >>          David >>          ----- >> >>          > Nils points out a very tricky question. Yes, I also notice >> that each TriBool takes 4 bytes on x86_64. It's a natural machine >> word and supposed to be the most efficient form. As a result, the >> vector control_words take about 1.3Kb for all intrinsics. I thought >> it's not a big deal, but Nils brought up that each DirectiveSet will >> increase from 128b to 1440b. Theoretically, the user may provide a >> CompileCommandFile which consists of hundreds of directives. Will >> hotspot have hundreds of DirectiveSet in that case? >>          > >>          > Actually, I do have a compacted container of TriBool. It's >> like a vector specialization. >>          > https://cr.openjdk.java.net/~xliu/8151779/TriBool.cpp >>          > >>          > The reason I didn't include it because I still feel that a >> few KiloBytes memories are not a big deal. Nowadays, hotspot allows >> Java programmers allocate over 100G heap. Is it wise to increase >> software complexity to save KBs? >>          > >>          > If you think it matters, I can integrate it. May I update >> TriBoolArray in a standalone JBS? I have made a lot of changes. I >> hope I can verify them using KitchenSink? >>          > >>          > For the second problem, I think it's because I used >> 'memset' to initialize an array of objects in rev01. Previously, I >> had code like this: >>          
> memset(&_intrinsic_control_words[0], 0, >> sizeof(_intrinsic_control_words)); >>          > >>          > This kind of usage will be warned as >> -Werror=class-memaccess in g++-8. I have fixed it since rev02. I use >> DirectiveSet::fill_in(). Please check out. >>          > >>          > Thanks, >>          > --lx >>          > >> >> > From tobias.hartmann at oracle.com Wed Jun 17 08:03:12 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 17 Jun 2020 10:03:12 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> Message-ID: <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> Hi Felix, On 15.06.20 10:49, Yangfei (Felix) wrote: > Sorry for the late reply. > Not sure if I understand correctly. Do you mean something like this? > > diff -r 6ab7805df10d src/hotspot/share/opto/memnode.cpp > --- a/src/hotspot/share/opto/memnode.cpp Sat Jun 13 01:00:00 2020 +0200 > +++ b/src/hotspot/share/opto/memnode.cpp Mon Jun 15 16:40:57 2020 +0800 > @@ -4618,7 +4618,8 @@ > } > if (phi_mem != NULL) { > // equivalent phi nodes; revert to the def > - new_mem = new_base; > + if (phi_base->adr_type()->higher_equal(phi_mem->adr_type())) > + new_mem = new_base; > } > } > } Yes, that would fix the issue you are seeing, right? Best regards, Tobias From tobias.hartmann at oracle.com Wed Jun 17 08:08:38 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 17 Jun 2020 10:08:38 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> Message-ID: Hi Felix, On 15.06.20 11:13, Yangfei (Felix) wrote: > I see the logic is also triggered for non-OSR compiles with the reduced test case. > But it didn't trigger a bug in that case even though the logic didn't do the right thing.
> > My initial patch removes the logic and no performance impact witnessed for specjbb2017. > Since the code is there from day one, maybe it's hard to find out. Even if the change introduces a regression (for example, blocking some loop optimization due to useless memory merges not being removed), I think it's unlikely to show in SPECjbb. To get a sense of how often this code is triggered, you could simply add an assert or log code and run the jtreg compiler tests. Best regards, Tobias From tobias.hartmann at oracle.com Wed Jun 17 08:16:30 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 17 Jun 2020 10:16:30 +0200 Subject: RFR(S): Remove some dead code in C2 In-Reply-To: References: <562032ab-f07e-4561-d0f9-b356d03182b5@oracle.com> <3ecd27a8-ac9f-31ec-3b69-3445d0592958@oracle.com> Message-ID: Hi Ludovic, On 17.06.20 07:17, Ludovic Henry wrote: > I do not have authorship, so feel free to take my change and commit them directly (if that's the appropriate thing to do of course!). I'll work with a colleague who has authorship to get a webrev going, but feel free to take it from there if you want to see it happen sooner rather than later. Looks good to me too. I'll sponsor. Best regards, Tobias From tobias.hartmann at oracle.com Wed Jun 17 10:19:13 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 17 Jun 2020 12:19:13 +0200 Subject: RFR(S): Remove some dead code in C2 In-Reply-To: References: <562032ab-f07e-4561-d0f9-b356d03182b5@oracle.com> <3ecd27a8-ac9f-31ec-3b69-3445d0592958@oracle.com> Message-ID: Hi Ludovic, testing this, I'm hitting the "unexpected CallGenerator" assert with all kinds of tests. I see that you've sent a patch for that as well but here it happens without -XX:+AlwaysIncrementalInline. The reason for that is that before your patch !delayed_forbidden was always true and therefore the assert condition did always hold. I would suggest to fix both issues in one. 
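As a rough illustration of the point about `!delayed_forbidden`, the two assertion conditions from the dead-code diff can be compared over all inputs. This is only a sketch: `CgProps`, `old_condition`, `new_condition` and `count_failures` are hypothetical names that flatten the generator's properties into booleans, and the default `delayed_forbidden = false` is taken from the declaration shown in the diff.

```cpp
#include <cassert>

// Hypothetical flattened view of the properties the assertion inspects.
struct CgProps {
  bool is_null;            // models cg == NULL
  bool is_late_inline;     // models cg->is_late_inline()
  bool is_mh_late_inline;  // models cg->is_mh_late_inline()
};

// Condition before the dead-code patch (the removed assert line in the diff).
bool old_condition(const CgProps& cg, bool delayed_forbidden) {
  return cg.is_null || !delayed_forbidden || !cg.is_late_inline || cg.is_mh_late_inline;
}

// Condition after the patch dropped delayed_forbidden.
bool new_condition(const CgProps& cg) {
  return cg.is_null || !cg.is_late_inline || cg.is_mh_late_inline;
}

// Counts the property combinations for which the chosen condition is false.
// The old condition is evaluated with delayed_forbidden == false, its default value.
int count_failures(bool use_new) {
  int failures = 0;
  for (int bits = 0; bits < 8; ++bits) {
    CgProps cg = { (bits & 1) != 0, (bits & 2) != 0, (bits & 4) != 0 };
    bool ok = use_new ? new_condition(cg) : old_condition(cg, false);
    if (!ok) {
      ++failures;
    }
  }
  return failures;
}

// Self-check: the old condition can never fail with the default argument,
// the new one fails for exactly one combination.
void check_conditions() {
  assert(count_failures(false) == 0);
  assert(count_failures(true) == 1);
}
```

With the default argument the old condition is vacuously true, while the new one fails for a non-NULL generator that is late-inline but not the method-handle variant, so the dead-code patch is what makes the assert reachable.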
I've also noticed that the extra brackets around "((should_delay || AlwaysIncrementalInline))" should be removed. Instead of sending patches inlined into emails, you could temporarily upload a webrev to a webspace other than cr.openjdk.java.net to make it easier to review. Best regards, Tobias On 17.06.20 10:16, Tobias Hartmann wrote: > Hi Ludovic, > > On 17.06.20 07:17, Ludovic Henry wrote: >> I do not have authorship, so feel free to take my change and commit them directly (if that's the appropriate thing to do of course!). I'll work with a colleague who has authorship to get a webrev going, but feel free to take it from there if you want to see it happen sooner rather than later. > > Looks good to me too. I'll sponsor. > > Best regards, > Tobias > From tobias.hartmann at oracle.com Wed Jun 17 10:21:13 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 17 Jun 2020 12:21:13 +0200 Subject: RFR(S): Remove faulty assertion in Compile::call_generator triggered with -XX:+AlwaysIncrementalInline In-Reply-To: References: Message-ID: <7022ad41-8a6d-7d86-5842-b438dc0cbc52@oracle.com> Hi Ludovic, this is only an issue with your patch that removes 'delayed_forbidden' (see my reply in that thread). I would therefore suggest removing the assert with that patch. Best regards, Tobias On 17.06.20 07:16, Ludovic Henry wrote: > Hi, > > While playing with -XX:+AlwaysIncrementalInline, I ran into an assertion triggering on pretty much any tests in test/hotspot/jtreg in debug (slow or fast, assertion just need to be enabled). I tracked it down to the following issue and explanation. > > In CallGenerator, when calling `for_method_handle_call`, it will redirect to one of the following: > - `for_method_handle_inline` itself redirecting to `Compile::call_generator` > - `for_late_inline` > - `for_mh_late_inline` > - `for_direct_call` > > The removed assert would fail if we end up calling `for_late_inline`.
That would be the case when `for_method_handle_inline` returns successfully and AlwaysIncrementalInline is set. This would guarantee failure whenever this global setting is set with -XX:+AlwaysIncrementalInline > > In this case, we can safely remove the assert because the case treated specificically in `for_mh_late_inline` is treated by the combination of `for_method_handle_inline` and `for_late_inline`. > > Looking at `LateInlineMHCallGenerator::do_late_inline_check`, which is called from `LateInlineCallGenerator::do_late_inline`, we notice that it itself calls `for_method_handle_inline`. This means that `for_method_handle_inline` is called through the two following path with late inlining: > 1. `for_late_inline` -> `do_late_inline` <- `for_method_handle_inline` (which was previously called and its result passed through `for_late_inline(inline_cg)`) > 2. `for_late_mh_inline` -> `do_late_inline` -> `do_late_inline_check` -> `for_method_handle_inline` > > So whether we return a LateInlineCallGenerator or a LateInlineMHCallGenerator from `for_method_handle_call`, the logic in `for_method_handle_inline` will be called. > > I do not have authorship status, so I can neither create a JBS nor a webrev. I am working with a colleague to do it, and I'll link it here, but feel free to create any of it before that. The diff is available at [1] > > Thank you, > > -- > Ludovic > > [1] > diff --git a/src/hotspot/share/opto/doCall.cpp b/src/hotspot/share/opto/doCall.cpp > index c4d55d0d4c4..ba5f737592e 100644 > --- a/src/hotspot/share/opto/doCall.cpp > +++ b/src/hotspot/share/opto/doCall.cpp > @@ -146,7 +146,6 @@ CallGenerator* Compile::call_generator(ciMethod* callee, int vtable_index, bool > // have bytecodes and so normal inlining fails. 
> if (callee->is_method_handle_intrinsic()) { > CallGenerator* cg = CallGenerator::for_method_handle_call(jvms, caller, callee); > - assert(cg == NULL || !cg->is_late_inline() || cg->is_mh_late_inline(), "unexpected CallGenerator"); > return cg; > } > > From felix.yang at huawei.com Wed Jun 17 12:30:08 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 17 Jun 2020 12:30:08 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> Message-ID: Hi Tobias, > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Wednesday, June 17, 2020 4:03 PM > To: Yangfei (Felix) ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > Hi Felix, > > On 15.06.20 10:49, Yangfei (Felix) wrote: > > Sorry for the late reply. > > Not sure if I understand correctly. Do you mean something like this? > > > > diff -r 6ab7805df10d src/hotspot/share/opto/memnode.cpp > > --- a/src/hotspot/share/opto/memnode.cpp Sat Jun 13 01:00:00 2020 > +0200 > > +++ b/src/hotspot/share/opto/memnode.cpp Mon Jun 15 16:40:57 > 2020 +0800 > > @@ -4618,7 +4618,8 @@ > > } > > if (phi_mem != NULL) { > > // equivalent phi nodes; revert to the def > > - new_mem = new_base; > > + if (phi_base->adr_type()->higher_equal(phi_mem->adr_type())) > > + new_mem = new_base; > > } > > } > > } > > Yes, that would fix the issue you are seeing, right? Thanks for confirming that. Yes, this works for the reduced test case. I see phi_base was calculated from base_memory(). That is in(AliasIdxBot)) which is a "wide" memory state containing all alias categories. So I was thinking that maybe the condition " phi_base->adr_type()->higher_equal(phi_mem->adr_type())" will always equals false? 
If that is true, then this is the same in functionality with my initial patch. Could you please clarify? Felix From felix.yang at huawei.com Wed Jun 17 12:32:57 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 17 Jun 2020 12:32:57 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> Message-ID: Hi Tobias, > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Wednesday, June 17, 2020 4:09 PM > To: Yangfei (Felix) ; Nils Eliasson > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > Hi Felix, > > On 15.06.20 11:13, Yangfei (Felix) wrote: > > I see the logic is also triggered for non-OSR compiles with the reduced test > case. > > But it didn't trigger a bug in that case even through the logic didn't do the > right thing. > > > > My initial patch removes the logic and no performance impact witnessed > for specjbb2017. > > Since the code is there from day one, maybe it's hard to find out. > > Even if the change introduces a regression (for example, blocking some loop > optimization due to useless memory merges not being removed), I think it's > unlikely to show in SPECjbb. OK, that make sense to me. > To get a sense of how often this code is triggered, you could simply add an > assert or log code and run the jtreg compiler tests. When I try something like: diff -r 10d1e833ba25 src/hotspot/share/opto/memnode.cpp --- a/src/hotspot/share/opto/memnode.cpp Wed Jun 17 05:28:05 2020 +0200 +++ b/src/hotspot/share/opto/memnode.cpp Wed Jun 17 19:53:44 2020 +0800 @@ -4619,6 +4619,7 @@ if (phi_mem != NULL) { // equivalent phi nodes; revert to the def new_mem = new_base; + assert(false, "just checking"); } } } Then I even fail to do a slowdebug build with this. So I suppose this is not something rarely executed. Build log: ...... 
Creating interim java.base.jmod Creating interim jimage # To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=/memnode.cpp:4622 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/yangfei/openjdk-jdk/src/hotspot/share/opto/memnode.cpp:4622), pid=2010, tid=2031 # assert(false) failed: just checking # # JRE version: OpenJDK Runtime Environment (16.0) (slowdebug build 16-internal+0-adhoc.yangfei.openjdk-jdk) # Java VM: OpenJDK 64-Bit Server VM (slowdebug 16-internal+0-adhoc.yangfei.openjdk-jdk, mixed mode, tiered, compressed oops, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0xdf9071] MergeMemNode::Ideal(PhaseGVN*, bool)+0x493 # # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/yangfei/openjdk-jdk/make/core.2010) # # An error report file with more information is saved as: # /home/yangfei/openjdk-jdk/make/hs_err_pid2010.log # # Compiler replay data is saved as: # /home/yangfei/openjdk-jdk/make/replay_pid2010.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # InterimImage.gmk:47: recipe for target '/home/yangfei/openjdk-jdk/build/linux-x86_64-server-slowdebug/support/interim-image/bin/java' failed make[3]: *** [/home/yangfei/openjdk-jdk/build/linux-x86_64-server-slowdebug/support/interim-image/bin/java] Aborted (core dumped) make/Main.gmk:576: recipe for target 'interim-image' failed make[2]: *** [interim-image] Error 2 ERROR: Build failed for target 'install' in configuration 'linux-x86_64-server-slowdebug' (exit code 2) From tobias.hartmann at oracle.com Wed Jun 17 13:26:24 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 17 Jun 2020 15:26:24 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: 
<4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> Message-ID: Hi Felix, On 17.06.20 14:30, Yangfei (Felix) wrote: > Thanks for confirming that. Yes, this works for the reduced test case. > I see phi_base was calculated from base_memory(). That is in(AliasIdxBot)) which is a "wide" memory state containing all alias categories. > So I was thinking that maybe the condition " phi_base->adr_type()->higher_equal(phi_mem->adr_type())" will always equals false? > If that is true, then this is the same in functionality with my initial patch. Could you please clarify? Right, that wouldn't work. What I was trying to suggest is to replace the MergeMem by the Phi with the most restrictive type if all inputs are Phis. I think that would be an Identity optimization and only work with MergeMems that have two inputs. For example, in your case it should be safe to replace the 4: MergeMem by 1: Phi1, right? Best regards, Tobias From luhenry at microsoft.com Wed Jun 17 14:27:32 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Wed, 17 Jun 2020 14:27:32 +0000 Subject: RFR(S): Remove faulty assertion in Compile::call_generator triggered with -XX:+AlwaysIncrementalInline In-Reply-To: <7022ad41-8a6d-7d86-5842-b438dc0cbc52@oracle.com> References: <7022ad41-8a6d-7d86-5842-b438dc0cbc52@oracle.com> Message-ID: Ok, let me squash the changes together and upload it. Thank you for looking into it. -- Ludovic -----Original Message----- From: Tobias Hartmann Sent: Wednesday, June 17, 2020 3:21 AM To: Ludovic Henry ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): Remove faulty assertion in Compile::call_generator triggered with -XX:+AlwaysIncrementalInline Hi Ludovic, this is only an issue with your patch that removes 'delayed_forbidden' (see my reply in that thread). I would therefore suggest to remove the assert with that patch. 
Best regards,
Tobias

On 17.06.20 07:16, Ludovic Henry wrote:
> Hi,
>
> While playing with -XX:+AlwaysIncrementalInline, I ran into an assertion triggering on pretty much any tests in test/hotspot/jtreg in debug (slow or fast, assertion just need to be enabled). I tracked it down to the following issue and explanation.
>
> In CallGenerator, when calling `for_method_handle_call`, it will redirect to one of the following:
> - `for_method_handle_inline` itself redirecting to `Compile::call_generator`
> - `for_late_inline`
> - `for_mh_late_inline`
> - `for_direct_call`
>
> The removed assert would fail if we end up calling `for_late_inline`. That would be the case when `for_method_handle_inline` returns successfully and AlwaysIncrementalInline is set. This would guarantee failure whenever this global setting is set with -XX:+AlwaysIncrementalInline.
>
> In this case, we can safely remove the assert because the case treated specifically in `for_mh_late_inline` is treated by the combination of `for_method_handle_inline` and `for_late_inline`.
>
> Looking at `LateInlineMHCallGenerator::do_late_inline_check`, which is called from `LateInlineCallGenerator::do_late_inline`, we notice that it itself calls `for_method_handle_inline`. This means that `for_method_handle_inline` is called through the two following path with late inlining:
> 1. `for_late_inline` -> `do_late_inline` <- `for_method_handle_inline` (which was previously called and its result passed through `for_late_inline(inline_cg)`)
> 2. `for_mh_late_inline` -> `do_late_inline` -> `do_late_inline_check` -> `for_method_handle_inline`
>
> So whether we return a LateInlineCallGenerator or a LateInlineMHCallGenerator from `for_method_handle_call`, the logic in `for_method_handle_inline` will be called.
>
> I do not have authorship status, so I can neither create a JBS nor a webrev. I am working with a colleague to do it, and I'll link it here, but feel free to create any of it before that. The diff is available at [1]
>
> Thank you,
>
> --
> Ludovic
>
> [1]
> diff --git a/src/hotspot/share/opto/doCall.cpp b/src/hotspot/share/opto/doCall.cpp
> index c4d55d0d4c4..ba5f737592e 100644
> --- a/src/hotspot/share/opto/doCall.cpp
> +++ b/src/hotspot/share/opto/doCall.cpp
> @@ -146,7 +146,6 @@ CallGenerator* Compile::call_generator(ciMethod* callee, int vtable_index, bool
>    // have bytecodes and so normal inlining fails.
>    if (callee->is_method_handle_intrinsic()) {
>      CallGenerator* cg = CallGenerator::for_method_handle_call(jvms, caller, callee);
> -    assert(cg == NULL || !cg->is_late_inline() || cg->is_mh_late_inline(), "unexpected CallGenerator");
>      return cg;
>    }
>
>

From vladimir.kozlov at oracle.com Wed Jun 17 16:10:19 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 17 Jun 2020 09:10:19 -0700
Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called
In-Reply-To:
References:
Message-ID: <3685d03c-e752-638f-ab75-434968311f08@oracle.com>

No further comments from me.

Yes, we can work on stubs later.

Thanks,
Vladimir

On 6/16/20 8:30 PM, Pengfei Li wrote:
> Hi Vladimir,
>
>> Looking on generated code I see that vectorized loop may unroll 16 times (16
>> vector instructions by 256 bytes) where
>> generate_fill() stub on x86 has 2 (256 bytes wide) instructions per iteration
>> and 1 instruction for avx512 [2].
>> Also stub has alot of pre- and post-loop instructions and checks.
>
> Right, I also take a look at x86 generated stub code and think the performance is potentially to be improved if the loop is unrolled more times. The AArch64 stub code is manually unrolled 8 times and it has almost no performance difference with the auto-vectorized version in general cases.
>
>> I thought may be we can improve stub. But it seems vectorized loop with
>> predicates is more compact and efficient. And it is auto generated!
>>
>> Base on results I agree with you switching off fill optimization on x86.
>> >> There could be side effects due to loops code will be larger (vs stub call) but >> we have it already right now before your changes so I don't think we will see >> regression for GCs which use strip mining. > > Trying to improve the stub is my next plan. I believe both x86 and AArch64 stubs have room for improvement. So I prefer the keep the stub code for now and check if it can win the auto-vectorized version after been improved in the near future. But I hope some Intel guy could help with the x86 backend part since I'm not quite familiar with new x86 instructions. > > I'm also studying the experimental feature of vectorized loop with predicates optimization in recent days (the PostLoopMultiversioning). But I found it's more complex and not working well now. This could be another long term goal. > > Please let me know if you have further comments. > > -- > Thanks, > Pengfei > From luhenry at microsoft.com Wed Jun 17 19:39:01 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Wed, 17 Jun 2020 19:39:01 +0000 Subject: RFR(S): Remove some dead code in C2 In-Reply-To: References: <562032ab-f07e-4561-d0f9-b356d03182b5@oracle.com> <3ecd27a8-ac9f-31ec-3b69-3445d0592958@oracle.com> Message-ID: The webrev is available at http://cr.openjdk.java.net/~adityam/ludovic/rem_delayed_forbidden/jdk.patch. It also contains the fix for https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-June/038636.html. -- Ludovic -----Original Message----- From: hotspot-compiler-dev On Behalf Of Ludovic Henry Sent: Tuesday, June 16, 2020 10:18 PM To: Vladimir Ivanov ; Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(S): Remove some dead code in C2 Hi, Thank you for the review! I do not have authorship, so feel free to take my change and commit them directly (if that's the appropriate thing to do of course!). 
I'll work with a colleague who has authorship to get a webrev going, but feel free to take it from there if you want to see it happen sooner rather than later.

--
Ludovic

-----Original Message-----
From: hotspot-compiler-dev On Behalf Of Vladimir Ivanov
Sent: Tuesday, June 16, 2020 11:15 AM
To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): Remove some dead code in C2

Good catch, Ludovic!

Looks good to me as well.

Best regards,
Vladimir Ivanov

On 16.06.2020 19:38, Vladimir Kozlov wrote:
> Hi Ludovic,
>
> Looks good.
>
> Yes, in 8148994 Vladimir Ivanov enable late inlining [1]:
>
> "But after JDK-8072008 there's no problem with delaying inlining. C2 can
> decide whether to keep the direct call or inline through it. So, I
> enabled late inlining for all linkers. (Surprisingly, no significant
> performance difference on nashorn.)"
>
> I created RFE https://bugs.openjdk.java.net/browse/JDK-8247697 for this.
>
> Thanks,
> Vladimir
>
> [1]
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021137.html
>
>
> PS: Does someone in your group have OpenJDK Author status so he can file
> issues in JBS?
>
> On 6/15/20 4:58 PM, Ludovic Henry wrote:
>> Hi,
>>
>> As I was exploring code in src/hotspot/share/opto/doCall.cpp, I
>> noticed the `delayed_forbidden` parameter to `Compile::call_generator`
>> never to be used.
>>
>> From doing some mercurial archeology, the change was introduced with
>> https://hg.openjdk.java.net/jdk/jdk/rev/823590505eb4, and later
>> modified with https://hg.openjdk.java.net/jdk/jdk/rev/e1685e30beca
>> which removed all uses of `delayed_forbidden`.
>>
>> As I am not an author, please find following the patch removing the
>> dead code.
>>
>> Thank you,
>>
>> --
>> Ludovic
>>
>> diff --git a/src/hotspot/share/opto/callGenerator.cpp b/src/hotspot/share/opto/callGenerator.cpp
>> index 1092d582184..606325d4dd6 100644
>> --- a/src/hotspot/share/opto/callGenerator.cpp
>> +++ b/src/hotspot/share/opto/callGenerator.cpp
>> @@ -821,13 +821,13 @@ JVMState* PredictedCallGenerator::generate(JVMState* jvms) {
>>  }
>>
>>
>> -CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms, ciMethod* caller, ciMethod* callee, bool delayed_forbidden) {
>> +CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms, ciMethod* caller, ciMethod* callee) {
>>    assert(callee->is_method_handle_intrinsic(), "for_method_handle_call mismatch");
>>    bool input_not_const;
>>    CallGenerator* cg = CallGenerator::for_method_handle_inline(jvms, caller, callee, input_not_const);
>>    Compile* C = Compile::current();
>>    if (cg != NULL) {
>> -    if (!delayed_forbidden && AlwaysIncrementalInline) {
>> +    if (AlwaysIncrementalInline) {
>>        return CallGenerator::for_late_inline(callee, cg);
>>      } else {
>>        return cg;
>> diff --git a/src/hotspot/share/opto/callGenerator.hpp b/src/hotspot/share/opto/callGenerator.hpp
>> index 46bf9f5d7f9..3a04fa4d5cd 100644
>> --- a/src/hotspot/share/opto/callGenerator.hpp
>> +++ b/src/hotspot/share/opto/callGenerator.hpp
>> @@ -124,7 +124,7 @@ class CallGenerator : public ResourceObj {
>>    static CallGenerator* for_direct_call(ciMethod* m, bool separate_io_projs = false);   // static, special
>>    static CallGenerator* for_virtual_call(ciMethod* m, int vtable_index);  // virtual, interface
>>
>> -  static CallGenerator* for_method_handle_call(  JVMState* jvms, ciMethod* caller, ciMethod* callee, bool delayed_forbidden);
>> +  static CallGenerator* for_method_handle_call(  JVMState* jvms, ciMethod* caller, ciMethod* callee);
>>    static CallGenerator* for_method_handle_inline(JVMState* jvms, ciMethod* caller, ciMethod* callee, bool& input_not_const);
>>
>>    // How to generate a replace a direct call with an inline version
>> diff --git a/src/hotspot/share/opto/compile.hpp b/src/hotspot/share/opto/compile.hpp
>> index a922a905707..bd0def2997c 100644
>> --- a/src/hotspot/share/opto/compile.hpp
>> +++ b/src/hotspot/share/opto/compile.hpp
>> @@ -854,7 +854,7 @@ class Compile : public Phase {
>>    // The profile factor is a discount to apply to this site's interp. profile.
>>    CallGenerator*    call_generator(ciMethod* call_method, int vtable_index, bool call_does_dispatch,
>>                                     JVMState* jvms, bool allow_inline, float profile_factor, ciKlass* speculative_receiver_type = NULL,
>> -                                   bool allow_intrinsics = true, bool delayed_forbidden = false);
>> +                                   bool allow_intrinsics = true);
>>    bool should_delay_inlining(ciMethod* call_method, JVMState* jvms) {
>>      return should_delay_string_inlining(call_method, jvms) ||
>>             should_delay_boxing_inlining(call_method, jvms);
>> diff --git a/src/hotspot/share/opto/doCall.cpp b/src/hotspot/share/opto/doCall.cpp
>> index c26dc4b682d..c4d55d0d4c4 100644
>> --- a/src/hotspot/share/opto/doCall.cpp
>> +++ b/src/hotspot/share/opto/doCall.cpp
>> @@ -65,7 +65,7 @@ void trace_type_profile(Compile* C, ciMethod *method, int depth, int bci, ciMeth
>>  CallGenerator* Compile::call_generator(ciMethod* callee, int vtable_index, bool call_does_dispatch,
>>                                         JVMState* jvms, bool allow_inline,
>>                                         float prof_factor, ciKlass* speculative_receiver_type,
>> -                                       bool allow_intrinsics, bool delayed_forbidden) {
>> +                                       bool allow_intrinsics) {
>>    ciMethod*       caller   = jvms->method();
>>    int             bci      = jvms->bci();
>>    Bytecodes::Code bytecode = caller->java_code_at_bci(bci);
>> @@ -145,8 +145,8 @@ CallGenerator* Compile::call_generator(ciMethod* callee, int vtable_index, bool
>>    // MethodHandle.invoke* are native methods which obviously don't
>>    // have bytecodes and so normal inlining fails.
>>    if (callee->is_method_handle_intrinsic()) {
>> -    CallGenerator* cg = CallGenerator::for_method_handle_call(jvms, caller, callee, delayed_forbidden);
>> -    assert(cg == NULL || !delayed_forbidden || !cg->is_late_inline() || cg->is_mh_late_inline(), "unexpected CallGenerator");
>> +    CallGenerator* cg = CallGenerator::for_method_handle_call(jvms, caller, callee);
>> +    assert(cg == NULL || !cg->is_late_inline() || cg->is_mh_late_inline(), "unexpected CallGenerator");
>>      return cg;
>>    }
>>
>> @@ -182,12 +182,10 @@ CallGenerator* Compile::call_generator(ciMethod* callee, int vtable_index, bool
>>            // opportunity to perform some high level optimizations
>>            // first.
>>            if (should_delay_string_inlining(callee, jvms)) {
>> -            assert(!delayed_forbidden, "strange");
>>              return CallGenerator::for_string_late_inline(callee, cg);
>>            } else if (should_delay_boxing_inlining(callee, jvms)) {
>> -            assert(!delayed_forbidden, "strange");
>>              return CallGenerator::for_boxing_late_inline(callee, cg);
>> -          } else if ((should_delay || AlwaysIncrementalInline) && !delayed_forbidden) {
>> +          } else if ((should_delay || AlwaysIncrementalInline)) {
>>              return CallGenerator::for_late_inline(callee, cg);
>>            }
>>          }
>>
>>

From doug.simon at oracle.com Wed Jun 17 21:34:37 2020
From: doug.simon at oracle.com (Doug Simon)
Date: Wed, 17 Jun 2020 23:34:37 +0200
Subject: [15] RFR: 8241802: [Graal] compiler/loopopts/TestLogSum.java timed out
Message-ID:

Please review this change cherry-picked from upstream Graal to fix a problem where compiling a method with many nested loops took too long.
The heart of the fix (authored by Gilles Duboscq) is to limit the amount of work done by the LoopFullUnrollPhase.

https://bugs.openjdk.java.net/browse/JDK-8241802
https://dougxc.github.io/webrevs/8241802/index.html

Testing: hs-tier1,hs-tier2,hs-tier3-graal

-Doug

From vladimir.kozlov at oracle.com Wed Jun 17 22:21:57 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 17 Jun 2020 15:21:57 -0700
Subject: [15] RFR: 8241802: [Graal] compiler/loopopts/TestLogSum.java timed out
In-Reply-To:
References:
Message-ID: <84236421-ccc7-ba10-8c42-ae21b31ad9a6@oracle.com>

Looks good.

Thanks,
Vladimir

On 6/17/20 2:34 PM, Doug Simon wrote:
> Please review this change cherry-picked from upstream Graal to fix a problem where compiling a method with many nested loops took too long.
> The heart of the fix (authored by Gilles Duboscq) is to limit the amount of work done by the LoopFullUnrollPhase.
> > https://bugs.openjdk.java.net/browse/JDK-8241802 > https://dougxc.github.io/webrevs/8241802/index.html > > Testing: hs-tier1,hs-tier2,hs-tier3-graal > > -Doug > From tobias.hartmann at oracle.com Thu Jun 18 06:17:00 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 18 Jun 2020 08:17:00 +0200 Subject: [15] RFR(S): 8237950: C2 compilation fails with "Live Node limit exceeded limit" during ConvI2L::Ideal optimization In-Reply-To: References: <1ad6cc37-8aa2-e20d-2a9f-8cbf21780ffd@oracle.com> <97427e35-fa63-05d9-d958-a2edb4dfa1ec@oracle.com> Message-ID: <91418444-7012-b54f-82b7-b6cb7683ff1b@oracle.com> Hi, performance testing revealed a slight regression with microbenchmarks due to ConvI2L nodes not always being processed by IGVN and therefore not being optimized. Here's a new version that makes sure that such ConvI2L are always recorded for IGVN. Performance numbers look good now. http://cr.openjdk.java.net/~thartmann/8237950/webrev.01/ Thanks, Tobias On 16.06.20 08:34, Tobias Hartmann wrote: > Thanks Vladimir! I'll run our regular performance testing. > > Best regards, > Tobias > > On 15.06.20 20:02, Vladimir Kozlov wrote: >> +1 >> >> I would suggest to do our regular performance testing to make sure there is no regression. >> >> Thanks, >> Vladimir >> >> On 6/15/20 6:31 AM, Nils Eliasson wrote: >>> Hi Tobias, >>> >>> The change looks reasonable. >>> >>> Reviewed. >>> >>> Best regards, >>> Nils >>> >>> On 2020-06-15 13:22, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch: >>>> https://bugs.openjdk.java.net/browse/JDK-8237950 >>>> http://cr.openjdk.java.net/~thartmann/8237950/webrev.00/ >>>> >>>> A long chain of StringBuffer.append calls is optimized by C2's string concatenation optimization >>>> which emits direct stores into the String internal byte array. 
GraphKit::array_element_address emits >>>> ConvI2L nodes for the array index (see Compile::conv_I2X_index) without any range check dependent >>>> CastII nodes because the bounds are known. As a result, the ConvI2L ideal optimization jumps in and >>>> creates over 34000 new ConvI2L nodes while pushing them through the long chain of AddNodes. We hit >>>> the node limit because during GVN, dead nodes are not removed. >>>> >>>> I propose to simply postpone that optimization to IGVN. This only affects array accesses emitted for >>>> the string concat optimizations because "normal" array accesses have a range check dependent CastII >>>> which blocks that ConvI2L optimization during parsing. >>>> >>>> Thanks, >>>> Tobias >>> From Pengfei.Li at arm.com Thu Jun 18 07:53:07 2020 From: Pengfei.Li at arm.com (Pengfei Li) Date: Thu, 18 Jun 2020 07:53:07 +0000 Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called In-Reply-To: <3685d03c-e752-638f-ab75-434968311f08@oracle.com> References: <3685d03c-e752-638f-ab75-434968311f08@oracle.com> Message-ID: Thanks Vladimir. Do I still need another reviewer to look at this fix? BTW: Yesterday I mentioned vectorized loop with predicates isn't working well now. I've just created a bug (https://bugs.openjdk.java.net/browse/JDK-8247838) for it. You could take a look if you're interested. -- Thanks, Pengfei > -----Original Message----- > From: Vladimir Kozlov > Sent: Thursday, June 18, 2020 00:10 > To: Pengfei Li ; hotspot-compiler- > dev at openjdk.java.net > Cc: nd > Subject: Re: RFR(S): 8247307: C2: Loop array fill stub routines are not called > > No further comments from me. > > Yes, we can work on stubs later. 
> > Thanks, > Vladimir > > On 6/16/20 8:30 PM, Pengfei Li wrote: > > Hi Vladimir, > > > >> Looking on generated code I see that vectorized loop may unroll 16 > >> times (16 vector instructions by 256 bytes) where > >> generate_fill() stub on x86 has 2 (256 bytes wide) instructions per > >> iteration and 1 instruction for avx512 [2]. > >> Also stub has alot of pre- and post-loop instructions and checks. > > > > Right, I also take a look at x86 generated stub code and think the > performance is potentially to be improved if the loop is unrolled more times. > The AArch64 stub code is manually unrolled 8 times and it has almost no > performance difference with the auto-vectorized version in general cases. > > > >> I thought may be we can improve stub. But it seems vectorized loop > >> with predicates is more compact and efficient. And it is auto generated! > >> > >> Base on results I agree with you switching off fill optimization on x86. > >> > >> There could be side effects due to loops code will be larger (vs stub > >> call) but we have it already right now before your changes so I don't > >> think we will see regression for GCs which use strip mining. > > > > Trying to improve the stub is my next plan. I believe both x86 and AArch64 > stubs have room for improvement. So I prefer the keep the stub code for > now and check if it can win the auto-vectorized version after been improved > in the near future. But I hope some Intel guy could help with the x86 backend > part since I'm not quite familiar with new x86 instructions. > > > > I'm also studying the experimental feature of vectorized loop with > predicates optimization in recent days (the PostLoopMultiversioning). But I > found it's more complex and not working well now. This could be another > long term goal. > > > > Please let me know if you have further comments. 
> > > > -- > > Thanks, > > Pengfei > > From doug.simon at oracle.com Thu Jun 18 08:38:18 2020 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 18 Jun 2020 10:38:18 +0200 Subject: [15] RFR: 8241802: [Graal] compiler/loopopts/TestLogSum.java timed out In-Reply-To: <84236421-ccc7-ba10-8c42-ae21b31ad9a6@oracle.com> References: <84236421-ccc7-ba10-8c42-ae21b31ad9a6@oracle.com> Message-ID: <27F60825-F4A3-443A-B473-36A60E5688F2@oracle.com> Thanks Vladimir. All testing passed so I?ll push it soon. -Doug > On 18 Jun 2020, at 00:21, Vladimir Kozlov wrote: > > Looks good. > > Thanks, > Vladimir > > On 6/17/20 2:34 PM, Doug Simon wrote: >> Please review this change cherry-picked from upstream Graal to fix a problem where compiling a method with many nested loops took too long. >> The heart of the fix (authored by Gilles Duboscq) is to limit the amount of work done by the LoopFullUnrollPhase. >> https://bugs.openjdk.java.net/browse/JDK-8241802 >> https://dougxc.github.io/webrevs/8241802/index.html >> Testing: hs-tier1,hs-tier2,hs-tier3-graal >> -Doug From felix.yang at huawei.com Thu Jun 18 09:02:50 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Thu, 18 Jun 2020 09:02:50 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> Message-ID: Hi Tobias, > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Wednesday, June 17, 2020 9:26 PM > To: Yangfei (Felix) ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > Hi Felix, > > On 17.06.20 14:30, Yangfei (Felix) wrote: > > Thanks for confirming that. Yes, this works for the reduced test case. > > I see phi_base was calculated from base_memory(). 
That is in(AliasIdxBot)) which is a "wide" memory state containing all alias categories.
> > So I was thinking that maybe the condition "phi_base->adr_type()->higher_equal(phi_mem->adr_type())" will always equals false?
> > If that is true, then this is the same in functionality with my initial patch. Could you please clarify?
>
> Right, that wouldn't work. What I was trying to suggest is to replace the MergeMem by the Phi with the most restrictive type if all inputs are Phis. I think that would be an Identity optimization and only work with MergeMems that have two inputs. For example, in your case it should be safe to replace the 4: MergeMem by 1: Phi1, right?

For the first iteration on the loop, the MergeMem node actually has three phis as input. It looks like:

(gdb) p this->dump()
556 MergeMem === _ 1 660 655 1 657 [[ 559 ]] { N655:rawptr:BotPTR - N657:TestReplaceEquivPhis+12 * } Memory: @BotPTR *+bot, idx=Bot; !orig=151 !jvms: TestReplaceEquivPhis::test @ bci:25
$2 = void
(gdb) p in(2)->dump()
660 Phi === 679 752 583 [[ 556 533 ]] #memory Memory: @BotPTR *+bot, idx=Bot;
$3 = void
(gdb) p in(3)->dump()
655 Phi === 679 752 583 [[ 556 ]] #memory Memory: @rawptr:BotPTR, idx=Raw;
$4 = void
(gdb) p in(5)->dump()
657 Phi === 679 752 583 [[ 556 516 ]] #memory Memory: @TestReplaceEquivPhis+12 *, name=iFld, idx=5;
$5 = void

After the first iteration, in(3) (i.e., in(AliasIdxRaw)) was removed from the inputs. I am not sure if this transformation is correct even though it does not make a difference on app behavior.
After that, the MergeMem has two phi nodes as input: in(5) and in(2) (i.e., in(AliasIdxBot)). This is the same structure as described in my first email.
Here, I see in(2) is also an input for node 533:

(gdb) p in(2)->find(533)->dump()
533 LoadI === _ 660 180 [[ 517 532 559 ]] @java/lang/Class:exact+120 *, name=instanceCount, idx=6; Volatile!
#int !orig=181 !jvms: TestReplaceEquivPhis::test @ bci:39 If we have a store to the same memory slice as in(5) after the MergeMem node, I think we might trigger one similar bug if we decide to keep in(5) here. So I guess it might be safer to go with the initially proposed patch. Thanks, Felix From tobias.hartmann at oracle.com Thu Jun 18 11:43:35 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 18 Jun 2020 13:43:35 +0200 Subject: RFR(S): Remove some dead code in C2 In-Reply-To: References: <562032ab-f07e-4561-d0f9-b356d03182b5@oracle.com> <3ecd27a8-ac9f-31ec-3b69-3445d0592958@oracle.com> Message-ID: <6367197d-050c-2cdf-4575-3b1fc67d7be4@oracle.com> Hi Ludovic, thanks for updating. Looks good to me. Best regards, Tobias On 17.06.20 21:39, Ludovic Henry wrote: > The webrev is available at http://cr.openjdk.java.net/~adityam/ludovic/rem_delayed_forbidden/jdk.patch. It also contains the fix for https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-June/038636.html. > > -- > Ludovic > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Ludovic Henry > Sent: Tuesday, June 16, 2020 10:18 PM > To: Vladimir Ivanov ; Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR(S): Remove some dead code in C2 > > Hi, > > Thank you for the review! > > I do not have authorship, so feel free to take my change and commit them directly (if that's the appropriate thing to do of course!). I'll work with a colleague who has authorship to get a webrev going, but feel free to take it from there if you want to see it happen sooner rather than later. > > -- > Ludovic > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Vladimir Ivanov > Sent: Tuesday, June 16, 2020 11:15 AM > To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): Remove some dead code in C2 > > Good catch, Ludovic! > > Looks good to me as well. 
> > Best regards, > Vladimir Ivanov > > On 16.06.2020 19:38, Vladimir Kozlov wrote: >> Hi Ludovic, >> >> Looks good. >> >> Yes, in 8148994 Vladimir Ivanov enable late inlining [1]: >> >> "But after JDK-8072008 there's no problem with delaying inlining. C2 can >> decide whether to keep the direct call or inline through it. So, I >> enabled late inlining for all linkers. (Surprisingly, no significant >> performance difference on nashorn.)" >> >> I created RFE https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.openjdk.java.net%2Fbrowse%2FJDK-8247697&data=02%7C01%7Cluhenry%40microsoft.com%7Cd39a7504cd9c40d117c508d8127e0f37%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637279679919387791&sdata=gwXvpyGBz7IE4Vq7Kr95v5KRysoGKEEpg4tXil3y%2FSg%3D&reserved=0 for this. >> >> Thanks, >> Vladimir >> >> [1] >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openjdk.java.net%2Fpipermail%2Fhotspot-compiler-dev%2F2016-February%2F021137.html&data=02%7C01%7Cluhenry%40microsoft.com%7Cd39a7504cd9c40d117c508d8127e0f37%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637279679919387791&sdata=mgOIg%2F9zsu5f22BDoeKRnZsUX2VE7WmQtpHKChbSRMU%3D&reserved=0 >> >> >> PS: Does someone in your group have OpenJDK Author status so he can file >> issues in JBS? >> >> On 6/15/20 4:58 PM, Ludovic Henry wrote: >>> Hi, >>> >>> As I was exploring code in src/hotspot/share/opto/doCall.cpp, I >>> noticed the `delayed_forbidden` parameter to `Compile::call_generator` >>> never to be used. 
>>>
>>> From doing some mercurial archeology, the change was introduced with
>>> https://hg.openjdk.java.net/jdk/jdk/rev/823590505eb4, and later
>>> modified with https://hg.openjdk.java.net/jdk/jdk/rev/e1685e30beca
>>> which removed all uses of `delayed_forbidden`.
>>>
>>> As I am not an author, please find following the patch removing the
>>> dead code.
>>>
>>> Thank you,
>>>
>>> --
>>> Ludovic
>>>
>>> diff --git a/src/hotspot/share/opto/callGenerator.cpp
>>> b/src/hotspot/share/opto/callGenerator.cpp
>>> index 1092d582184..606325d4dd6 100644
>>> --- a/src/hotspot/share/opto/callGenerator.cpp
>>> +++ b/src/hotspot/share/opto/callGenerator.cpp
>>> @@ -821,13 +821,13 @@ JVMState*
>>> PredictedCallGenerator::generate(JVMState* jvms) {
>>>   }
>>>
>>>
>>> -CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms,
>>> ciMethod* caller, ciMethod* callee, bool delayed_forbidden) {
>>> +CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms,
>>> ciMethod* caller, ciMethod* callee) {
>>>     assert(callee->is_method_handle_intrinsic(),
>>> "for_method_handle_call mismatch");
>>>     bool input_not_const;
>>>     CallGenerator* cg = CallGenerator::for_method_handle_inline(jvms,
>>> caller, callee, input_not_const);
>>>     Compile* C = Compile::current();
>>>     if (cg != NULL) {
>>> -    if (!delayed_forbidden && AlwaysIncrementalInline) {
>>> +    if (AlwaysIncrementalInline) {
>>>         return CallGenerator::for_late_inline(callee, cg);
>>>       } else {
>>>         return cg;
>>> diff --git a/src/hotspot/share/opto/callGenerator.hpp
>>> b/src/hotspot/share/opto/callGenerator.hpp
>>> index 46bf9f5d7f9..3a04fa4d5cd 100644
>>> --- a/src/hotspot/share/opto/callGenerator.hpp
>>> +++ b/src/hotspot/share/opto/callGenerator.hpp
>>> @@ -124,7 +124,7 @@ class CallGenerator : public ResourceObj {
>>>     static CallGenerator* for_direct_call(ciMethod* m, bool
>>> separate_io_projs = false);   // static, special
>>>     static CallGenerator* for_virtual_call(ciMethod* m, int
>>> vtable_index);  // virtual, interface
>>>
>>> -  static CallGenerator* for_method_handle_call(  JVMState* jvms,
>>> ciMethod* caller, ciMethod* callee, bool delayed_forbidden);
>>> +  static CallGenerator* for_method_handle_call(  JVMState* jvms,
>>> ciMethod* caller, ciMethod* callee);
>>>     static CallGenerator* for_method_handle_inline(JVMState* jvms,
>>> ciMethod* caller, ciMethod* callee, bool& input_not_const);
>>>
>>>     // How to generate a replace a direct call with an inline version
>>> diff --git a/src/hotspot/share/opto/compile.hpp
>>> b/src/hotspot/share/opto/compile.hpp
>>> index a922a905707..bd0def2997c 100644
>>> --- a/src/hotspot/share/opto/compile.hpp
>>> +++ b/src/hotspot/share/opto/compile.hpp
>>> @@ -854,7 +854,7 @@ class Compile : public Phase {
>>>     // The profile factor is a discount to apply to this site's
>>> interp. profile.
>>>     CallGenerator*    call_generator(ciMethod* call_method, int
>>> vtable_index, bool call_does_dispatch,
>>>                                      JVMState* jvms, bool
>>> allow_inline, float profile_factor, ciKlass* speculative_receiver_type
>>> = NULL,
>>> -                                   bool allow_intrinsics = true, bool
>>> delayed_forbidden = false);
>>> +                                   bool allow_intrinsics = true);
>>>     bool should_delay_inlining(ciMethod* call_method, JVMState* jvms) {
>>>       return should_delay_string_inlining(call_method, jvms) ||
>>>              should_delay_boxing_inlining(call_method, jvms);
>>> diff --git a/src/hotspot/share/opto/doCall.cpp
>>> b/src/hotspot/share/opto/doCall.cpp
>>> index c26dc4b682d..c4d55d0d4c4 100644
>>> --- a/src/hotspot/share/opto/doCall.cpp
>>> +++ b/src/hotspot/share/opto/doCall.cpp
>>> @@ -65,7 +65,7 @@ void trace_type_profile(Compile* C, ciMethod
>>> *method, int depth, int bci, ciMeth
>>>   CallGenerator* Compile::call_generator(ciMethod* callee, int
>>> vtable_index, bool call_does_dispatch,
>>>                                          JVMState* jvms, bool
>>> allow_inline,
>>>                                          float prof_factor, ciKlass*
>>> speculative_receiver_type,
>>> -                                       bool allow_intrinsics, bool
>>> delayed_forbidden) {
>>> +                                       bool allow_intrinsics) {
>>>     ciMethod*       caller   = jvms->method();
>>>     int             bci      = jvms->bci();
>>>     Bytecodes::Code bytecode = caller->java_code_at_bci(bci);
>>> @@ -145,8 +145,8 @@ CallGenerator* Compile::call_generator(ciMethod*
>>> callee, int vtable_index, bool
>>>     // MethodHandle.invoke* are native methods which obviously don't
>>>     // have bytecodes and so normal inlining fails.
>>>     if (callee->is_method_handle_intrinsic()) {
>>> -    CallGenerator* cg = CallGenerator::for_method_handle_call(jvms,
>>> caller, callee, delayed_forbidden);
>>> -    assert(cg == NULL || !delayed_forbidden || !cg->is_late_inline()
>>> || cg->is_mh_late_inline(), "unexpected CallGenerator");
>>> +    CallGenerator* cg = CallGenerator::for_method_handle_call(jvms,
>>> caller, callee);
>>> +    assert(cg == NULL || !cg->is_late_inline() ||
>>> cg->is_mh_late_inline(), "unexpected CallGenerator");
>>>       return cg;
>>>     }
>>>
>>> @@ -182,12 +182,10 @@ CallGenerator* Compile::call_generator(ciMethod*
>>> callee, int vtable_index, bool
>>>             // opportunity to perform some high level optimizations
>>>             // first.
>>>             if (should_delay_string_inlining(callee, jvms)) {
>>> -            assert(!delayed_forbidden, "strange");
>>>               return CallGenerator::for_string_late_inline(callee, cg);
>>>             } else if (should_delay_boxing_inlining(callee, jvms)) {
>>> -            assert(!delayed_forbidden, "strange");
>>>               return CallGenerator::for_boxing_late_inline(callee, cg);
>>> -          } else if ((should_delay || AlwaysIncrementalInline) &&
>>> !delayed_forbidden) {
>>> +          } else if ((should_delay || AlwaysIncrementalInline)) {
>>>               return CallGenerator::for_late_inline(callee, cg);
>>>             }
>>>           }
>>>
>>>

From rwestrel at redhat.com Thu Jun 18 12:55:56 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 18 Jun 2020 14:55:56 +0200
Subject: RFR(S): 8247824: CTW: C2 (Shenandoah) compilation fails with SEGV in SBC2Support::pin_and_expand
Message-ID: <87sgeswn1f.fsf@redhat.com>

https://bugs.openjdk.java.net/browse/JDK-8247824
http://cr.openjdk.java.net/~roland/8247824/webrev.00/

If a barrier is expanded in the outer loop of a strip mined loop nest, the outer loop head is changed to a new LoopNode so loop strip mining verification code doesn't trigger and fail. The crash occurs when there's 2 barriers in the outer loop and C2 attempts to transform the loop head twice. The second time, loop->_head points to a dead node.

Roland.
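
The failure mode described in the message above (a loop head transformed twice, where the second attempt goes through a pointer to a node that is already dead) can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyNode, ToyGraph, ToyLoop and demote_outer_head are hypothetical names made up for illustration, not HotSpot classes, and the guard shown is not claimed to be what webrev.00 actually does.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not HotSpot types.
enum class Kind { OuterStripMined, PlainLoop };

struct ToyNode {
    Kind kind = Kind::PlainLoop;
    bool dead = false;
};

struct ToyGraph {
    // Owns every node, live or dead, so raw pointers to nodes stay valid.
    std::vector<std::unique_ptr<ToyNode>> nodes;

    ToyNode* make(Kind kind) {
        nodes.push_back(std::make_unique<ToyNode>());
        nodes.back()->kind = kind;
        return nodes.back().get();
    }
};

struct ToyLoop {
    ToyNode* head = nullptr;
};

// Replace an outer strip-mined head with a plain loop head, at most once.
// Returns true if a replacement happened, false if an earlier barrier in
// the same loop already did the work.
bool demote_outer_head(ToyGraph& graph, ToyLoop& loop) {
    if (loop.head->kind != Kind::OuterStripMined) {
        return false;  // already transformed, nothing left to do
    }
    ToyNode* old_head = loop.head;
    // Update the loop's head before retiring the old node, so a later
    // caller never sees a pointer to a dead node.
    loop.head = graph.make(Kind::PlainLoop);
    old_head->dead = true;
    return true;
}
```

In this model, calling demote_outer_head once per barrier is harmless: the first call swaps the head and the second call returns false because the loop already points at a live plain loop node. The hazard in the report corresponds to skipping the head update, which would leave loop.head referring to the retired node on the second call.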
From rwestrel at redhat.com Thu Jun 18 14:13:01 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 18 Jun 2020 16:13:01 +0200 Subject: RFR(XS): 8247763: assert(outer->outcnt() == 2) failed: 'only phis' failure in LoopNode::verify_strip_mined() Message-ID: <87pn9wwjgy.fsf@redhat.com> https://bugs.openjdk.java.net/browse/JDK-8247763 http://cr.openjdk.java.net/~roland/8247763/webrev.00/ A store is sunk from a pre loop and control is set to the outer strip mined loop of the main loop (the only use of the store is the phi of the main loop). Fix simply sets control to entry control in that case. Roland. From vladimir.kozlov at oracle.com Thu Jun 18 15:43:47 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 18 Jun 2020 08:43:47 -0700 Subject: [15] RFR(S): 8237950: C2 compilation fails with "Live Node limit exceeded limit" during ConvI2L::Ideal optimization In-Reply-To: <91418444-7012-b54f-82b7-b6cb7683ff1b@oracle.com> References: <1ad6cc37-8aa2-e20d-2a9f-8cbf21780ffd@oracle.com> <97427e35-fa63-05d9-d958-a2edb4dfa1ec@oracle.com> <91418444-7012-b54f-82b7-b6cb7683ff1b@oracle.com> Message-ID: Good. Thanks, Vladimir On 6/17/20 11:17 PM, Tobias Hartmann wrote: > Hi, > > performance testing revealed a slight regression with microbenchmarks due to ConvI2L nodes not > always being processed by IGVN and therefore not being optimized. > > Here's a new version that makes sure that such ConvI2L are always recorded for IGVN. Performance > numbers look good now. > http://cr.openjdk.java.net/~thartmann/8237950/webrev.01/ > > Thanks, > Tobias > > > On 16.06.20 08:34, Tobias Hartmann wrote: >> Thanks Vladimir! I'll run our regular performance testing. >> >> Best regards, >> Tobias >> >> On 15.06.20 20:02, Vladimir Kozlov wrote: >>> +1 >>> >>> I would suggest to do our regular performance testing to make sure there is no regression. 
>>> >>> Thanks, >>> Vladimir >>> >>> On 6/15/20 6:31 AM, Nils Eliasson wrote: >>>> Hi Tobias, >>>> >>>> The change looks reasonable. >>>> >>>> Reviewed. >>>> >>>> Best regards, >>>> Nils >>>> >>>> On 2020-06-15 13:22, Tobias Hartmann wrote: >>>>> Hi, >>>>> >>>>> please review the following patch: >>>>> https://bugs.openjdk.java.net/browse/JDK-8237950 >>>>> http://cr.openjdk.java.net/~thartmann/8237950/webrev.00/ >>>>> >>>>> A long chain of StringBuffer.append calls is optimized by C2's string concatenation optimization >>>>> which emits direct stores into the String internal byte array. GraphKit::array_element_address emits >>>>> ConvI2L nodes for the array index (see Compile::conv_I2X_index) without any range check dependent >>>>> CastII nodes because the bounds are known. As a result, the ConvI2L ideal optimization jumps in and >>>>> creates over 34000 new ConvI2L nodes while pushing them through the long chain of AddNodes. We hit >>>>> the node limit because during GVN, dead nodes are not removed. >>>>> >>>>> I propose to simply postpone that optimization to IGVN. This only affects array accesses emitted for >>>>> the string concat optimizations because "normal" array accesses have a range check dependent CastII >>>>> which blocks that ConvI2L optimization during parsing. >>>>> >>>>> Thanks, >>>>> Tobias >>>> From tobias.hartmann at oracle.com Thu Jun 18 16:03:21 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 18 Jun 2020 18:03:21 +0200 Subject: [15] RFR(S): 8237950: C2 compilation fails with "Live Node limit exceeded limit" during ConvI2L::Ideal optimization In-Reply-To: References: <1ad6cc37-8aa2-e20d-2a9f-8cbf21780ffd@oracle.com> <97427e35-fa63-05d9-d958-a2edb4dfa1ec@oracle.com> <91418444-7012-b54f-82b7-b6cb7683ff1b@oracle.com> Message-ID: <2505ee2d-abe2-8686-e1f1-926d42a52682@oracle.com> Thanks Vladimir! Best regards, Tobias On 18.06.20 17:43, Vladimir Kozlov wrote: > Good. 
> > Thanks, > Vladimir > > On 6/17/20 11:17 PM, Tobias Hartmann wrote: >> Hi, >> >> performance testing revealed a slight regression with microbenchmarks due to ConvI2L nodes not >> always being processed by IGVN and therefore not being optimized. >> >> Here's a new version that makes sure that such ConvI2L are always recorded for IGVN. Performance >> numbers look good now. >> http://cr.openjdk.java.net/~thartmann/8237950/webrev.01/ >> >> Thanks, >> Tobias >> >> >> On 16.06.20 08:34, Tobias Hartmann wrote: >>> Thanks Vladimir! I'll run our regular performance testing. >>> >>> Best regards, >>> Tobias >>> >>> On 15.06.20 20:02, Vladimir Kozlov wrote: >>>> +1 >>>> >>>> I would suggest to do our regular performance testing to make sure there is no regression. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 6/15/20 6:31 AM, Nils Eliasson wrote: >>>>> Hi Tobias, >>>>> >>>>> The change looks reasonable. >>>>> >>>>> Reviewed. >>>>> >>>>> Best regards, >>>>> Nils >>>>> >>>>> On 2020-06-15 13:22, Tobias Hartmann wrote: >>>>>> Hi, >>>>>> >>>>>> please review the following patch: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8237950 >>>>>> http://cr.openjdk.java.net/~thartmann/8237950/webrev.00/ >>>>>> >>>>>> A long chain of StringBuffer.append calls is optimized by C2's string concatenation optimization >>>>>> which emits direct stores into the String internal byte array. GraphKit::array_element_address >>>>>> emits >>>>>> ConvI2L nodes for the array index (see Compile::conv_I2X_index) without any range check dependent >>>>>> CastII nodes because the bounds are known. As a result, the ConvI2L ideal optimization jumps >>>>>> in and >>>>>> creates over 34000 new ConvI2L nodes while pushing them through the long chain of AddNodes. We >>>>>> hit >>>>>> the node limit because during GVN, dead nodes are not removed. >>>>>> >>>>>> I propose to simply postpone that optimization to IGVN. 
This only affects array accesses >>>>>> emitted for >>>>>> the string concat optimizations because "normal" array accesses have a range check dependent >>>>>> CastII >>>>>> which blocks that ConvI2L optimization during parsing. >>>>>> >>>>>> Thanks, >>>>>> Tobias >>>>> From vladimir.kozlov at oracle.com Thu Jun 18 20:27:51 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 18 Jun 2020 13:27:51 -0700 Subject: RFR(S): Remove some dead code in C2 In-Reply-To: <6367197d-050c-2cdf-4575-3b1fc67d7be4@oracle.com> References: <562032ab-f07e-4561-d0f9-b356d03182b5@oracle.com> <3ecd27a8-ac9f-31ec-3b69-3445d0592958@oracle.com> <6367197d-050c-2cdf-4575-3b1fc67d7be4@oracle.com> Message-ID: +1 Thanks, Vladimir K On 6/18/20 4:43 AM, Tobias Hartmann wrote: > Hi Ludovic, > > thanks for updating. Looks good to me. > > Best regards, > Tobias > > On 17.06.20 21:39, Ludovic Henry wrote: >> The webrev is available at http://cr.openjdk.java.net/~adityam/ludovic/rem_delayed_forbidden/jdk.patch. It also contains the fix for https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-June/038636.html. >> >> -- >> Ludovic >> >> -----Original Message----- >> From: hotspot-compiler-dev On Behalf Of Ludovic Henry >> Sent: Tuesday, June 16, 2020 10:18 PM >> To: Vladimir Ivanov ; Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net >> Subject: RE: RFR(S): Remove some dead code in C2 >> >> Hi, >> >> Thank you for the review! >> >> I do not have authorship, so feel free to take my change and commit them directly (if that's the appropriate thing to do of course!). I'll work with a colleague who has authorship to get a webrev going, but feel free to take it from there if you want to see it happen sooner rather than later. 
>>
>> --
>> Ludovic
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev On Behalf Of Vladimir Ivanov
>> Sent: Tuesday, June 16, 2020 11:15 AM
>> To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR(S): Remove some dead code in C2
>>
>> Good catch, Ludovic!
>>
>> Looks good to me as well.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 16.06.2020 19:38, Vladimir Kozlov wrote:
>>> Hi Ludovic,
>>>
>>> Looks good.
>>>
>>> Yes, in 8148994 Vladimir Ivanov enable late inlining [1]:
>>>
>>> "But after JDK-8072008 there's no problem with delaying inlining. C2 can
>>> decide whether to keep the direct call or inline through it. So, I
>>> enabled late inlining for all linkers. (Surprisingly, no significant
>>> performance difference on nashorn.)"
>>>
>>> I created RFE https://bugs.openjdk.java.net/browse/JDK-8247697 for this.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> [1]
>>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021137.html
>>>
>>>
>>> PS: Does someone in your group have OpenJDK Author status so he can file
>>> issues in JBS?
>>>
>>> On 6/15/20 4:58 PM, Ludovic Henry wrote:
>>>> Hi,
>>>>
>>>> As I was exploring code in src/hotspot/share/opto/doCall.cpp, I
>>>> noticed the `delayed_forbidden` parameter to `Compile::call_generator`
>>>> never to be used.
>>>>
>>>> From doing some mercurial archeology, the change was introduced with
>>>> https://hg.openjdk.java.net/jdk/jdk/rev/823590505eb4, and later
>>>> modified with https://hg.openjdk.java.net/jdk/jdk/rev/e1685e30beca
>>>> which removed all uses of `delayed_forbidden`.
>>>>
>>>> As I am not an author, please find following the patch removing the
>>>> dead code.
>>>>
>>>> Thank you,
>>>>
>>>> --
>>>> Ludovic
>>>>

From vladimir.kozlov at oracle.com Thu Jun 18 20:30:10 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 18 Jun 2020 13:30:10 -0700
Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called
In-Reply-To:
References: <3685d03c-e752-638f-ab75-434968311f08@oracle.com>
Message-ID:

Yes, you need second review.

Regards,
Vladimir

On 6/18/20 12:53 AM, Pengfei Li wrote:
> Thanks Vladimir.
>
> Do I still need another reviewer to look at this fix?
>
> BTW: Yesterday I mentioned vectorized loop with predicates isn't working well now. I've just created a bug (https://bugs.openjdk.java.net/browse/JDK-8247838) for it. You could take a look if you're interested.
>
> --
> Thanks,
> Pengfei
>
>> -----Original Message-----
>> From: Vladimir Kozlov
>> Sent: Thursday, June 18, 2020 00:10
>> To: Pengfei Li ; hotspot-compiler-dev at openjdk.java.net
>> Cc: nd
>> Subject: Re: RFR(S): 8247307: C2: Loop array fill stub routines are not called
>>
>> No further comments from me.
>>
>> Yes, we can work on stubs later.
>>
>> Thanks,
>> Vladimir
>>
>> On 6/16/20 8:30 PM, Pengfei Li wrote:
>>> Hi Vladimir,
>>>
>>>> Looking on generated code I see that vectorized loop may unroll 16
>>>> times (16 vector instructions by 256 bytes) where
>>>> generate_fill() stub on x86 has 2 (256 bytes wide) instructions per
>>>> iteration and 1 instruction for avx512 [2].
>>>> Also stub has a lot of pre- and post-loop instructions and checks.
>>>
>>> Right, I also take a look at x86 generated stub code and think the performance is potentially to be improved if the loop is unrolled more times. The AArch64 stub code is manually unrolled 8 times and it has almost no performance difference with the auto-vectorized version in general cases.
>>>
>>>> I thought maybe we can improve stub. But it seems vectorized loop
>>>> with predicates is more compact and efficient. And it is auto generated!
>>>>
>>>> Based on results I agree with you switching off fill optimization on x86.
>>>>
>>>> There could be side effects due to loops code will be larger (vs stub
>>>> call) but we have it already right now before your changes so I don't
>>>> think we will see regression for GCs which use strip mining.
>>>
>>> Trying to improve the stub is my next plan. I believe both x86 and AArch64 stubs have room for improvement. So I prefer to keep the stub code for now and check if it can win the auto-vectorized version after being improved in the near future. But I hope some Intel guy could help with the x86 backend part since I'm not quite familiar with new x86 instructions.
>>>
>>> I'm also studying the experimental feature of vectorized loop with predicates optimization in recent days (the PostLoopMultiversioning). But I found it's more complex and not working well now. This could be another long term goal.
>>>
>>> Please let me know if you have further comments.
>>> >>> -- >>> Thanks, >>> Pengfei >>> From vladimir.kozlov at oracle.com Thu Jun 18 22:09:29 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 18 Jun 2020 15:09:29 -0700 Subject: RFR(XS): 8247763: assert(outer->outcnt() == 2) failed: 'only phis' failure in LoopNode::verify_strip_mined() In-Reply-To: <87pn9wwjgy.fsf@redhat.com> References: <87pn9wwjgy.fsf@redhat.com> Message-ID: Good. Thanks, Vladimir On 6/18/20 7:13 AM, Roland Westrelin wrote: > > https://bugs.openjdk.java.net/browse/JDK-8247763 > http://cr.openjdk.java.net/~roland/8247763/webrev.00/ > > A store is sunk from a pre loop and control is set to the outer strip > mined loop of the main loop (the only use of the store is the phi of the > main loop). Fix simply sets control to entry control in that case. > > Roland. > From tobias.hartmann at oracle.com Fri Jun 19 06:06:45 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 19 Jun 2020 08:06:45 +0200 Subject: RFR(S): Remove some dead code in C2 In-Reply-To: References: <562032ab-f07e-4561-d0f9-b356d03182b5@oracle.com> <3ecd27a8-ac9f-31ec-3b69-3445d0592958@oracle.com> <6367197d-050c-2cdf-4575-3b1fc67d7be4@oracle.com> Message-ID: Pushed: http://hg.openjdk.java.net/jdk/jdk/rev/f7587f7c859d Best regards, Tobias On 18.06.20 22:27, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir K > > On 6/18/20 4:43 AM, Tobias Hartmann wrote: >> Hi Ludovic, >> >> thanks for updating. Looks good to me. >> >> Best regards, >> Tobias >> >> On 17.06.20 21:39, Ludovic Henry wrote: >>> The webrev is available at >>> http://cr.openjdk.java.net/~adityam/ludovic/rem_delayed_forbidden/jdk.patch. It also contains the >>> fix for https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-June/038636.html. 
>>>
>>> --
>>> Ludovic
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev On Behalf Of Ludovic Henry
>>> Sent: Tuesday, June 16, 2020 10:18 PM
>>> To: Vladimir Ivanov ; Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net
>>> Subject: RE: RFR(S): Remove some dead code in C2
>>>
>>> Hi,
>>>
>>> Thank you for the review!
>>>
>>> I do not have authorship, so feel free to take my change and commit them directly (if that's the
>>> appropriate thing to do of course!). I'll work with a colleague who has authorship to get a
>>> webrev going, but feel free to take it from there if you want to see it happen sooner rather than
>>> later.
>>>
>>> --
>>> Ludovic
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev On Behalf Of Vladimir Ivanov
>>> Sent: Tuesday, June 16, 2020 11:15 AM
>>> To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR(S): Remove some dead code in C2
>>>
>>> Good catch, Ludovic!
>>>
>>> Looks good to me as well.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> On 16.06.2020 19:38, Vladimir Kozlov wrote:
>>>> Hi Ludovic,
>>>>
>>>> Looks good.
>>>>
>>>> Yes, in 8148994 Vladimir Ivanov enable late inlining [1]:
>>>>
>>>> "But after JDK-8072008 there's no problem with delaying inlining. C2 can
>>>> decide whether to keep the direct call or inline through it. So, I
>>>> enabled late inlining for all linkers. (Surprisingly, no significant
>>>> performance difference on nashorn.)"
>>>>
>>>> I created RFE
>>>> https://bugs.openjdk.java.net/browse/JDK-8247697
>>>> for this.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> [1]
>>>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021137.html
>>>>
>>>>
>>>>
>>>> PS: Does someone in your group have OpenJDK Author status so he can file
>>>> issues in JBS?
>>>>
>>>> On 6/15/20 4:58 PM, Ludovic Henry wrote:
>>>>> Hi,
>>>>>
>>>>> As I was exploring code in src/hotspot/share/opto/doCall.cpp, I
>>>>> noticed the `delayed_forbidden` parameter to `Compile::call_generator`
>>>>> never to be used.
>>>>>
>>>>> From doing some mercurial archeology, the change was introduced with
>>>>> https://hg.openjdk.java.net/jdk/jdk/rev/823590505eb4,
>>>>> and later
>>>>> modified with
>>>>> https://hg.openjdk.java.net/jdk/jdk/rev/e1685e30beca
>>>>>
>>>>> which removed all uses of `delayed_forbidden`.
>>>>>
>>>>> As I am not an author, please find following the patch removing the
>>>>> dead code.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> --
>>>>> Ludovic
>>>>>

From tobias.hartmann at oracle.com Fri Jun 19 07:49:33 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 19 Jun 2020 09:49:33 +0200
Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal
In-Reply-To:
References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com>
Message-ID: <82c1f599-255a-cee0-0818-a10d79315b4e@oracle.com>

Hi Felix,

On 18.06.20 11:02, Yangfei (Felix) wrote:
> For the first iteration on the loop, the MergeMem node actually have three phis as input.
> It looks like:
> (gdb) p this->dump()
> 556 MergeMem === _ 1 660 655 1 657 [[ 559 ]] { N655:rawptr:BotPTR - N657:TestReplaceEquivPhis+12 * } Memory: @BotPTR *+bot, idx=Bot; !orig=151 !jvms: TestReplaceEquivPhis::test @ bci:25
> $2 = void
> (gdb) p in(2)->dump()
> 660 Phi === 679 752 583 [[ 556 533 ]] #memory Memory: @BotPTR *+bot, idx=Bot;
> $3 = void
> (gdb) p in(3)->dump()
> 655 Phi === 679 752 583 [[ 556 ]] #memory Memory: @rawptr:BotPTR, idx=Raw;
> $4 = void
> (gdb) p in(5)->dump()
> 657 Phi === 679 752 583 [[ 556 516 ]] #memory Memory: @TestReplaceEquivPhis+12 *, name=iFld, idx=5;
> $5 = void
>
> After the first iteration, in(3) (i.e., in(AliasIdxRaw)) was removed from the inputs.
> I am not sure if this transformation is correct even through it does not make a difference on app behavior.
> After that, the MergeMem have two phi nodes as input: in(5) and in(2) ((i.e., in(AliasIdxBot))). This is the same structure as described in my first email.
>
> Here, I see in(2) is also an input for node 533:
> (gdb) p in(2)->find(533)->dump()
> 533 LoadI === _ 660 180 [[ 517 532 559 ]] @java/lang/Class:exact+120 *, name=instanceCount, idx=6; Volatile!
#int !orig=181 !jvms: TestReplaceEquivPhis::test @ bci:39 > > If we have a store to the same memory slice as in(5) after the MergeMem node, I think we might trigger one similar bug if we decide to keep in(5) here. (Should be "same memory slice as 533 after" as you corrected in a private email) Right, that could be a problem. > So I guess it might be safer to go with the initially proposed patch. Let me run this through some performance testing, to see if it makes a difference. Best regards, Tobias From tobias.hartmann at oracle.com Fri Jun 19 07:55:14 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 19 Jun 2020 09:55:14 +0200 Subject: RFR(XS): 8247763: assert(outer->outcnt() == 2) failed: 'only phis' failure in LoopNode::verify_strip_mined() In-Reply-To: References: <87pn9wwjgy.fsf@redhat.com> Message-ID: <96d8e88c-c839-6a6a-6d60-c7261adb91f5@oracle.com> +1 Best regards, Tobias On 19.06.20 00:09, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 6/18/20 7:13 AM, Roland Westrelin wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8247763 >> http://cr.openjdk.java.net/~roland/8247763/webrev.00/ >> >> A store is sunk from a pre loop and control is set to the outer strip >> mined loop of the main loop (the only use of the store is the phi of the >> main loop). Fix simply sets control to entry control in that case. >> >> Roland. >> From nils.eliasson at oracle.com Fri Jun 19 09:32:50 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 19 Jun 2020 11:32:50 +0200 Subject: [15] RFR(S): 8237950: C2 compilation fails with "Live Node limit exceeded limit" during ConvI2L::Ideal optimization In-Reply-To: References: <1ad6cc37-8aa2-e20d-2a9f-8cbf21780ffd@oracle.com> <97427e35-fa63-05d9-d958-a2edb4dfa1ec@oracle.com> <91418444-7012-b54f-82b7-b6cb7683ff1b@oracle.com> Message-ID: <78e3bb57-d0ea-c606-784e-0133891f611f@oracle.com> +1 Best regards, Nils On 2020-06-18 17:43, Vladimir Kozlov wrote: > Good. 
> > Thanks, > Vladimir > > On 6/17/20 11:17 PM, Tobias Hartmann wrote: >> Hi, >> >> performance testing revealed a slight regression with microbenchmarks >> due to ConvI2L nodes not >> always being processed by IGVN and therefore not being optimized. >> >> Here's a new version that makes sure that such ConvI2L are always >> recorded for IGVN. Performance >> numbers look good now. >> http://cr.openjdk.java.net/~thartmann/8237950/webrev.01/ >> >> Thanks, >> Tobias >> >> >> On 16.06.20 08:34, Tobias Hartmann wrote: >>> Thanks Vladimir! I'll run our regular performance testing. >>> >>> Best regards, >>> Tobias >>> >>> On 15.06.20 20:02, Vladimir Kozlov wrote: >>>> +1 >>>> >>>> I would suggest to do our regular performance testing to make sure >>>> there is no regression. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 6/15/20 6:31 AM, Nils Eliasson wrote: >>>>> Hi Tobias, >>>>> >>>>> The change looks reasonable. >>>>> >>>>> Reviewed. >>>>> >>>>> Best regards, >>>>> Nils >>>>> >>>>> On 2020-06-15 13:22, Tobias Hartmann wrote: >>>>>> Hi, >>>>>> >>>>>> please review the following patch: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8237950 >>>>>> http://cr.openjdk.java.net/~thartmann/8237950/webrev.00/ >>>>>> >>>>>> A long chain of StringBuffer.append calls is optimized by C2's >>>>>> string concatenation optimization >>>>>> which emits direct stores into the String internal byte array. >>>>>> GraphKit::array_element_address emits >>>>>> ConvI2L nodes for the array index (see Compile::conv_I2X_index) >>>>>> without any range check dependent >>>>>> CastII nodes because the bounds are known. As a result, the >>>>>> ConvI2L ideal optimization jumps in and >>>>>> creates over 34000 new ConvI2L nodes while pushing them through >>>>>> the long chain of AddNodes. We hit >>>>>> the node limit because during GVN, dead nodes are not removed. >>>>>> >>>>>> I propose to simply postpone that optimization to IGVN. 
This only >>>>>> affects array accesses emitted for >>>>>> the string concat optimizations because "normal" array accesses >>>>>> have a range check dependent CastII >>>>>> which blocks that ConvI2L optimization during parsing. >>>>>> >>>>>> Thanks, >>>>>> Tobias >>>>> From tobias.hartmann at oracle.com Fri Jun 19 08:36:39 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 19 Jun 2020 10:36:39 +0200 Subject: [15] RFR(S): 8237950: C2 compilation fails with "Live Node limit exceeded limit" during ConvI2L::Ideal optimization In-Reply-To: <78e3bb57-d0ea-c606-784e-0133891f611f@oracle.com> References: <1ad6cc37-8aa2-e20d-2a9f-8cbf21780ffd@oracle.com> <97427e35-fa63-05d9-d958-a2edb4dfa1ec@oracle.com> <91418444-7012-b54f-82b7-b6cb7683ff1b@oracle.com> <78e3bb57-d0ea-c606-784e-0133891f611f@oracle.com> Message-ID: <0f15a6ef-b019-97a1-1559-6a6d291e7e26@oracle.com> Thanks Nils! Best regards, Tobias On 19.06.20 11:32, Nils Eliasson wrote: > +1 > > Best regards, > Nils > > On 2020-06-18 17:43, Vladimir Kozlov wrote: >> Good. >> >> Thanks, >> Vladimir >> >> On 6/17/20 11:17 PM, Tobias Hartmann wrote: >>> Hi, >>> >>> performance testing revealed a slight regression with microbenchmarks due to ConvI2L nodes not >>> always being processed by IGVN and therefore not being optimized. >>> >>> Here's a new version that makes sure that such ConvI2L are always recorded for IGVN. Performance >>> numbers look good now. >>> http://cr.openjdk.java.net/~thartmann/8237950/webrev.01/ >>> >>> Thanks, >>> Tobias >>> >>> >>> On 16.06.20 08:34, Tobias Hartmann wrote: >>>> Thanks Vladimir! I'll run our regular performance testing. >>>> >>>> Best regards, >>>> Tobias >>>> >>>> On 15.06.20 20:02, Vladimir Kozlov wrote: >>>>> +1 >>>>> >>>>> I would suggest to do our regular performance testing to make sure there is no regression. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 6/15/20 6:31 AM, Nils Eliasson wrote: >>>>>> Hi Tobias, >>>>>> >>>>>> The change looks reasonable. 
>>>>>> >>>>>> Reviewed. >>>>>> >>>>>> Best regards, >>>>>> Nils >>>>>> >>>>>> On 2020-06-15 13:22, Tobias Hartmann wrote: >>>>>>> Hi, >>>>>>> >>>>>>> please review the following patch: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8237950 >>>>>>> http://cr.openjdk.java.net/~thartmann/8237950/webrev.00/ >>>>>>> >>>>>>> A long chain of StringBuffer.append calls is optimized by C2's string concatenation optimization >>>>>>> which emits direct stores into the String internal byte array. >>>>>>> GraphKit::array_element_address emits >>>>>>> ConvI2L nodes for the array index (see Compile::conv_I2X_index) without any range check >>>>>>> dependent >>>>>>> CastII nodes because the bounds are known. As a result, the ConvI2L ideal optimization jumps >>>>>>> in and >>>>>>> creates over 34000 new ConvI2L nodes while pushing them through the long chain of AddNodes. >>>>>>> We hit >>>>>>> the node limit because during GVN, dead nodes are not removed. >>>>>>> >>>>>>> I propose to simply postpone that optimization to IGVN. This only affects array accesses >>>>>>> emitted for >>>>>>> the string concat optimizations because "normal" array accesses have a range check dependent >>>>>>> CastII >>>>>>> which blocks that ConvI2L optimization during parsing. >>>>>>> >>>>>>> Thanks, >>>>>>> Tobias >>>>>> > From rwestrel at redhat.com Fri Jun 19 14:28:08 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 19 Jun 2020 16:28:08 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> Message-ID: <87k103w2o7.fsf@redhat.com> Hi Felix, > So I guess it might be safer to go with the initially proposed patch. What about a PhiNode::Identity transformation that, for a memory phi, looks for phis with identical inputs and if it finds one that has type TypePtr::BOTTOM, replaces the current phi with the bottom phi? 
Wouldn't that result in both the LoadI and the MergeMem with the same memory input which would fix the problem?

Roland.

From boris.ulasevich at bell-sw.com Fri Jun 19 16:49:43 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Fri, 19 Jun 2020 19:49:43 +0300
Subject: [aarch64-port-dev ] AARCH64 optimization: using TBZ instruction for bit check
In-Reply-To: <9b399c9f-cfc1-7fc4-70ee-536a83e5afa7@redhat.com>
References: <9b399c9f-cfc1-7fc4-70ee-536a83e5afa7@redhat.com>
Message-ID: <7bea8a5c-312b-b4f3-d545-f83652b73150@bell-sw.com>

Hi Andrew,

I added the expression canonicalization in the BoolNode::Ideal method:
http://cr.openjdk.java.net/~bulasevich/8247408/webrev.02b

The change reduces the number of generated machine instructions on all ARM/x86/PPC architectures. Benchmark shows positive results on ARM64 and ARM32 with the given change.

On x86 benchmark performance improves from +1% to +13% depending on the CPU generation, except for machines affected by the Intel Erratum (JDK-8234160) issue. Maximum decrease observed is -11%. It does not look like a problem with the proposed benchmark though, but rather like an issue with Erratum mitigation.

On PowerPC result of the micro-benchmark is also positive. I changed the micro-benchmark to make it a little bulkier so that we don't hit the limitations of architectures with a less elaborate branch prediction mechanism. The original application performance does not change on PowerPC.

thanks,
Boris

Cascade Lake    817.639 ±  15.806 ->  810.058 ±   3.128 ns/op
Whiskey Lake    751.560 ±  29.690 ->  751.390 ±  24.406 ns/op
Whiskey Lake*   803.742 ±  14.280 ->  746.670 ±   5.626 ns/op
Ivy Bridge     1021.523 ± 166.719 ->  903.092 ±  81.799 ns/op
Skylake         690.554 ±   4.839 ->  769.115 ±  18.775 ns/op --- this is the only case where we see a regression
Skylake*        734.354 ±   8.136 ->  712.512 ±  10.301 ns/op
ARM32         11760.804 ± 335.050 -> 7133.137 ±  17.058 ns/op
ARM64           896.789 ±   3.524 ->  758.096 ±   3.367 ns/op
PowerPC8       5313.218 ± 248.753 -> 1919.234 ± 605.326 ns/op
PowerPC9       6174.107 ±  26.885 -> 1435.108 ±  48.447 ns/op

* = -XX:-IntelJccErratumMitigation

On 15.06.2020 12:28, Andrew Haley wrote:
> On 12/06/2020 19:10, Boris Ulasevich wrote:
>> Please review the new AARCH64 instruction selection rules.
>> The change applies TBZ instruction for bit checks: "if ((var&16) == 16)".
>> This makes 17% performance improvement on the benchmark and 5% on a real
>> application.
> Please forgive me if I am misunderstanding, but...
>
> This is strange Java for anyone to write. The expression "((var&16) == 16)"
> is, I think, equivalent to "((var&16) != 0)". Do you believe that it
> is wise to add new patterns to do this to (potentially) every HotSpot
> back end rather than canonicalize the expression during the
> machine-independent part of C2? This would have the same improvement
> on all targets.
>

From aph at redhat.com Fri Jun 19 17:07:37 2020
From: aph at redhat.com (Andrew Haley)
Date: Fri, 19 Jun 2020 18:07:37 +0100
Subject: RFR: C2: Canonicalize (x & 16 == 16) [Was: [aarch64-port-dev ] AARCH64 optimization: using TBZ instruction for bit check]
In-Reply-To: <7bea8a5c-312b-b4f3-d545-f83652b73150@bell-sw.com>
References: <9b399c9f-cfc1-7fc4-70ee-536a83e5afa7@redhat.com> <7bea8a5c-312b-b4f3-d545-f83652b73150@bell-sw.com>
Message-ID: <14d3f034-b637-f74a-7567-d4e589260887@redhat.com>

Hi,

On 19/06/2020 17:49, Boris Ulasevich wrote:
> I added the expression canonicalization in the BoolNode::Ideal method:
> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.02b
>
> The change reduces the number of generated machine instructions on all
> ARM/x86/PPC architectures. Benchmark shows positive results on ARM64 and
> ARM32 with the given change.
>
> On x86 benchmark performance improves from +1% to +13% depending on the
> CPU generation, except for machines affected by the Intel Erratum (JDK-8234160)
> issue. Maximum decrease observed is -11%.
It does not look like a problem > with the proposed benchmark though, but rather like an issue with > Erratum mitigation. > > On PowerPC result of the micro-benchmark is also positive. I changed the > micro-benchmark to make it a little bulkier so that we don't hit the > limitations of architectures with a less elaborate branch prediction > mechanism. The original application performance does not change on PowerPC. Fantastic work, thanks! You've done a remarkably thorough job. It's slightly unfortunate that one of the targets regresses. If there had been no regressions, I'd approve this straight away. Forwarding to hotspot-compiler-dev for more comments. VladimirK, what do you think? I guess we could turn this off on the machines affected by JDK-8234160. Should we? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.kozlov at oracle.com Fri Jun 19 18:36:31 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 19 Jun 2020 11:36:31 -0700 Subject: RFR: C2: Canonicalize (x & 16 == 16) [Was: [aarch64-port-dev ] AARCH64 optimization: using TBZ instruction for bit check] In-Reply-To: <14d3f034-b637-f74a-7567-d4e589260887@redhat.com> References: <9b399c9f-cfc1-7fc4-70ee-536a83e5afa7@redhat.com> <7bea8a5c-312b-b4f3-d545-f83652b73150@bell-sw.com> <14d3f034-b637-f74a-7567-d4e589260887@redhat.com> Message-ID: Nice optimization. I don't think we should turn it off on any machine. In real application you will not see such tight loops only with such branch. On other hand reducing code size should help in all cases. Would be nice to know if any Java benchmark is affected. I will try to run our set of benchmarks with these changes. 
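
The equivalence raised earlier in this thread, that "((var&16) == 16)" matches "((var&16) != 0)", depends on the mask having exactly one bit set, as 16 does. The following is a minimal Java sketch of that condition; the class and method names are hypothetical and are not taken from the webrev.

```java
// Illustrative only: class and method names are hypothetical, not part of the webrev.
class BitCheckSketch {
    // Form used in the benchmark: compare the masked value with the mask itself.
    static boolean eqMask(int x, int mask) {
        return (x & mask) == mask;
    }

    // Canonical form: compare the masked value with zero.
    static boolean neZero(int x, int mask) {
        return (x & mask) != 0;
    }

    // True when exactly one bit is set in mask.
    static boolean isSingleBit(int mask) {
        return mask != 0 && (mask & (mask - 1)) == 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 15, 16, 17, 31, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        // For every single-bit mask the two forms agree on every sample.
        for (int bit = 0; bit < 32; bit++) {
            int mask = 1 << bit;
            for (int x : samples) {
                if (eqMask(x, mask) != neZero(x, mask)) {
                    throw new AssertionError("forms differ for x=" + x + ", mask=" + mask);
                }
            }
        }
        // With a two-bit mask they disagree: (2 & 6) is 2, which is neither 6 nor 0.
        System.out.println(eqMask(2, 6) + " " + neZero(2, 6)); // prints: false true
    }
}
```

For a mask with more than one bit set, the two expressions are not interchangeable, so any rewrite along these lines would presumably need a check similar to isSingleBit.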
Regards, Vladimir K On 6/19/20 10:07 AM, Andrew Haley wrote: > Hi, > > On 19/06/2020 17:49, Boris Ulasevich wrote: >> I added the expression canonicalization in the BoolNode::Ideal method: >> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.02b >> >> The change reduces a number of generated machine instructions on all >> ARM/x86/PPC architectures. Benchmark shows positive results on ARM64 and >> ARM32 with the given change. >> >> On x86 benchmark performance improves from +1% to +13% depending on the >> CPU generation, except of machines affected by Intel Erratum (JDK-8234160) >> issue. Maximum decrease observed is -%11. It does not look like a problem >> with the proposed benchmark though, but rather like an issue with >> Erratum mitigation. >> >> On PowerPC result of the micro-benchmark is also positive. I changed the >> micro-benchmark to make it a little bulkier so that we don't hit the >> limitations of architectures with a less elaborate branch prediction >> mechanism. The original application performance does not change on PowerPC. > > Fantastic work, thanks! You've done a remarkably thorough job. It's > slightly unfortunate that one of the targets regresses. If there had > been no regressions, I'd approve this straight away. > > Forwarding to hotspot-compiler-dev for more comments. > > VladimirK, what do you think? I guess we could turn this off on the > machines affected by JDK-8234160. Should we? > From Pengfei.Li at arm.com Mon Jun 22 05:02:40 2020 From: Pengfei.Li at arm.com (Pengfei Li) Date: Mon, 22 Jun 2020 05:02:40 +0000 Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called In-Reply-To: References: Message-ID: PING. May I have comments from another reviewer? I need a second review. -- Thanks, Pengfei > Sorry I forgot to paste below JMH link in my last email. > > [1] http://cr.openjdk.java.net/~pli/rfr/8247307/TestArrayFill.java > > BTW. If I turn on OptimizeFill manually there's below performance regression > on x86. 
So I turned it off on x86 in my patch to make things unchanged.
>
> Before (x86 with -XX:+OptimizeFill)
> Benchmark                     Mode  Cnt     Score     Error  Units
> TestArrayFill.fillByteArray   avgt   25  1793.206 ±  15.337  ns/op
> TestArrayFill.fillIntArray    avgt   25  6679.491 ±  14.729  ns/op
> TestArrayFill.fillShortArray  avgt   25  3412.708 ±  12.005  ns/op
> TestArrayFill.zeroByteArray   avgt   25  1785.940 ±  15.174  ns/op
> TestArrayFill.zeroIntArray    avgt   25  6666.709 ±  11.735  ns/op
> TestArrayFill.zeroShortArray  avgt   25  3404.146 ±  23.045  ns/op
>
> After (x86 with -XX:+OptimizeFill)
> Benchmark                     Mode  Cnt     Score     Error  Units
> TestArrayFill.fillByteArray   avgt   25  2281.374 ± 191.220  ns/op
> TestArrayFill.fillIntArray    avgt   25  9009.679 ± 901.541  ns/op
> TestArrayFill.fillShortArray  avgt   25  4828.686 ±  49.199  ns/op
> TestArrayFill.zeroByteArray   avgt   25  2463.745 ±  47.640  ns/op
> TestArrayFill.zeroIntArray    avgt   25  9062.682 ± 939.538  ns/op
> TestArrayFill.zeroShortArray  avgt   25  4837.231 ±  50.026  ns/op
>
> > Hi,
> >
> > Can I have a review of this C2 loop optimization fix?
> >
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8247307
> > Webrev: http://cr.openjdk.java.net/~pli/rfr/8247307/webrev.00/
> >
> > C2 has a loop optimization phase called intrinsify_fill. It matches
> > the pattern of a single array store with a loop invariant in a counted
> > loop, like below, and replaces it with a call to some stub routine.
> >
> > for (int i = start; i < limit; i++) {
> >   a[i] = value;
> > }
> >
> > Unfortunately, this doesn't work in current jdk after loop strip mining.
> > The above loop is eventually unrolled and auto-vectorized by
> > subsequent optimization phases. Root cause is that in strip-mined
> > loops, the inner CountedLoopNode may be used by the address polling
> > node of the safepoint in the outer loop. But as the safepoint polling
> > has nothing related to any real operations in the loop, it should not hinder
> > the pattern match.
> > So in this patch, the polladr's use is ignored in the match check.
> >
> > We have some performance comparison of the code for array fill,
> > between the auto-vectorized version and the stub routine version. The
> > JMH case for the tests can be found at [1]. Results show that on x86,
> > the stub code is even slower than the auto-vectorized code. To prevent
> > any regression, vm option OptimizeFill is turned off for x86 in this patch.
> > So this patch doesn't impact the generated code on x86. On AArch64,
> > the two versions show almost the same performance in general cases.
> > But if the value to be filled is zero, the stub code's performance is
> > much better. This makes sense as AArch64 uses cache maintenance
> > instructions (DC ZVA) to zero large blocks in the hand-crafted
> > assembly. Below are JMH scores on AArch64.
> >
> > Before:
> > Benchmark                     Mode  Cnt      Score     Error  Units
> > TestArrayFill.fillByteArray   avgt   25   2078.700 ±   7.719  ns/op
> > TestArrayFill.fillIntArray    avgt   25  12371.497 ± 566.773  ns/op
> > TestArrayFill.fillShortArray  avgt   25   4132.439 ±  25.096  ns/op
> > TestArrayFill.zeroByteArray   avgt   25   2080.313 ±   7.516  ns/op
> > TestArrayFill.zeroIntArray    avgt   25  10961.331 ± 527.750  ns/op
> > TestArrayFill.zeroShortArray  avgt   25   4126.386 ±  20.997  ns/op
> >
> > After:
> > Benchmark                     Mode  Cnt      Score     Error  Units
> > TestArrayFill.fillByteArray   avgt   25   2080.382 ±   2.103  ns/op
> > TestArrayFill.fillIntArray    avgt   25  11997.621 ± 569.058  ns/op
> > TestArrayFill.fillShortArray  avgt   25   4309.035 ± 285.456  ns/op
> > TestArrayFill.zeroByteArray   avgt   25    903.434 ±  10.944  ns/op
> > TestArrayFill.zeroIntArray    avgt   25   8141.533 ± 946.341  ns/op
> > TestArrayFill.zeroShortArray  avgt   25   1784.124 ±  24.618  ns/op
> >
> > Another advantage of using the stub routine is that the generated code
> > size is reduced.
> >
> > Jtreg hotspot::hotspot_all_no_apps, jdk::jdk_core, langtools::tier1
> > are tested and no new failure is found.
> > Thanks, > Pengfei From felix.yang at huawei.com Mon Jun 22 07:37:48 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 22 Jun 2020 07:37:48 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: <87k103w2o7.fsf@redhat.com> References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> Message-ID: Hi Roland, > -----Original Message----- > From: Roland Westrelin [mailto:rwestrel at redhat.com] > Sent: Friday, June 19, 2020 10:28 PM > To: Yangfei (Felix) ; Tobias Hartmann > ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > > Hi Felix, > > > So I guess it might be safer to go with the initially proposed patch. > > What about a PhiNode::Identity transformation that, for a memory phi, looks > for phis with identical inputs and if it finds one that has type > TypePtr::BOTTOM, replaces the current phi with the bottom phi? Wouldn't > that result in both the LoadI and the MergeMem with the same memory > input which would fix the problem? That sounds promising. I tried the following patch and looks like it works as expected. diff -r 2342d5af52b7 src/hotspot/share/opto/cfgnode.cpp --- a/src/hotspot/share/opto/cfgnode.cpp Mon Jun 22 08:09:23 2020 +0200 +++ b/src/hotspot/share/opto/cfgnode.cpp Mon Jun 22 15:33:06 2020 +0800 @@ -1335,6 +1335,43 @@ if (id != NULL) return id; } + // replace equivalent phis + if (type()->has_memory()) { + for (uint i = 0; i < outcnt(); i++) { + Node* mmem = raw_out(i); + if (!mmem->is_MergeMem()) { + continue; + } + + uint phi_len = req(); + Node* phi_reg = region(); + // Look at each slice looking for phis with identical inputs. + // If we find one that has type TypePtr::BOTTOM, replace the + // current phi with the bottom phi. 
+ for (uint j = Compile::AliasIdxBot; j < mmem->req(); j++) { + Node* node = mmem->in(j); + if (node == this || !node->is_Phi()) { + continue; + } + if (node->req() == phi_len && node->in(0) == phi_reg) { + PhiNode* phi_mem = node->as_Phi(); + if (phi_mem->adr_type() != TypePtr::BOTTOM) { + continue; + } + for (uint k = 1; k < phi_len; k++) { + if (in(k) != phi_mem->in(k)) { + phi_mem = NULL; + break; + } + } + if (phi_mem != NULL) { + return phi_mem; + } + } + } + } + } + return this; // No identity } With this patch, in MergeMemNode::Ideal we have: (gdb) p this->dump() 556 MergeMem === _ 1 660 660 1 660 [[ 559 ]] { - - - } Memory: @BotPTR *+bot, idx=Bot; !orig=151 !jvms: TestReplaceEquivPhis::test @ bci:25 Here, node 655 & 657 are both replaced by node 660 and this is reflected the inputs of the MergeMem node. After MergeMemNode::Ideal, we have: (gdb) p this->dump() 556 MergeMem === _ 1 660 1 1 1 [[ 559 ]] { - - - } Memory: @BotPTR *+bot, idx=Bot; !orig=151 !jvms: TestReplaceEquivPhis::test @ bci:25 (gdb) p this->find(660)->dump() 660 Phi === 679 752 583 [[ 556 533 516 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !orig=[655] (gdb) p this->find(516)->dump() 516 LoadI === _ 660 423 [[ 680 799 798 797 796 ]] @TestReplaceEquivPhis+12 *, name=iFld, idx=5; #int !orig=177 !jvms: TestReplaceEquivPhis::test @ bci:34 Tiered 1-3 tested on x86_64-linux-gnu. Does it look good? Thanks, Felix From rwestrel at redhat.com Mon Jun 22 07:53:37 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 22 Jun 2020 09:53:37 +0200 Subject: RFR(XS): 8247763: assert(outer->outcnt() == 2) failed: 'only phis' failure in LoopNode::verify_strip_mined() In-Reply-To: <96d8e88c-c839-6a6a-6d60-c7261adb91f5@oracle.com> References: <87pn9wwjgy.fsf@redhat.com> <96d8e88c-c839-6a6a-6d60-c7261adb91f5@oracle.com> Message-ID: <87h7v3wn7i.fsf@redhat.com> Thanks for the reviews. Roland. 
From rwestrel at redhat.com Mon Jun 22 08:11:53 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 22 Jun 2020 10:11:53 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> Message-ID: <87eeq7wmd2.fsf@redhat.com> Hi Felix, See comments below > That sounds promising. I tried the following patch and looks like it works as expected. > > diff -r 2342d5af52b7 src/hotspot/share/opto/cfgnode.cpp > --- a/src/hotspot/share/opto/cfgnode.cpp Mon Jun 22 08:09:23 2020 +0200 > +++ b/src/hotspot/share/opto/cfgnode.cpp Mon Jun 22 15:33:06 2020 +0800 > @@ -1335,6 +1335,43 @@ > if (id != NULL) return id; > } > > + // replace equivalent phis > + if (type()->has_memory()) { It's likely unsafe to perform that transformation at GVN time so I think it needs to be performed only if phase->is_IterGVN() != NULL. > + for (uint i = 0; i < outcnt(); i++) { > + Node* mmem = raw_out(i); The usual pattern is to use one of the iterators (DUIterator_Fast in this case). > + if (!mmem->is_MergeMem()) { > + continue; > + } > + > + uint phi_len = req(); > + Node* phi_reg = region(); > + // Look at each slice looking for phis with identical inputs. > + // If we find one that has type TypePtr::BOTTOM, replace the > + // current phi with the bottom phi. 
> +      for (uint j = Compile::AliasIdxBot; j < mmem->req(); j++) {
> +        Node* node = mmem->in(j);
> +        if (node == this || !node->is_Phi()) {
> +          continue;
> +        }
> +        if (node->req() == phi_len && node->in(0) == phi_reg) {
> +          PhiNode* phi_mem = node->as_Phi();
> +          if (phi_mem->adr_type() != TypePtr::BOTTOM) {
> +            continue;
> +          }
> +          for (uint k = 1; k < phi_len; k++) {
> +            if (in(k) != phi_mem->in(k)) {
> +              phi_mem = NULL;
> +              break;
> +            }
> +          }
> +          if (phi_mem != NULL) {
> +            return phi_mem;
> +          }
> +        }
> +      }
> +    }
> +  }
> +
>    return this; // No identity
>  }

I would make this fully generic:

if (phase->is_IterGVN() && type() == Type::MEMORY && adr_type() != TypePtr::BOTTOM) {
  Node* phi_reg = region();
  for (DUIterator_Fast imax, i = phi_reg->fast_outs(imax); i < imax; i++) {
    Node* u = phi_reg->fast_out(i);
    if (u->is_Phi() && u->in(0) == phi_reg && type() == Type::MEMORY && u->adr_type() == TypePtr::BOTTOM) {
      return u;
    }
  }
}

Roland.

From doug.simon at oracle.com Mon Jun 22 11:12:17 2020
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 22 Jun 2020 13:12:17 +0200
Subject: RFR: 8247992: [JVMCI] HotSpotNmethod.executeVarargs can try execute a zombie nmethod
Message-ID:

Hi,

can I please get some reviews for this patch that fixes a race described in JDK-8247992.

The fix in the patch is to pass the HotSpotNmethod mirror object (in a handle) through the problematic VM-to-Java transition and only after the transition will the verified entry point be extracted from the mirror. This will prevent a zombie nmethod from being executed.

It's still possible for the call to not reach the "alternative target" due to sweeping heuristics. This is an acceptable limitation of having an alternative call target in JavaCallWrapper. It could be fixed by altering CPU specific i2c trampoline code but the added complexity is not worth it for this test-only mechanism. Tests using this API can detect whether the alternative target was actually called if they really care about it.
Thanks to Erik Osterlund for the in-depth analysis and suggested solutions. https://dougxc.github.io/webrevs/8247992_16/index.html https://bugs.openjdk.java.net/browse/JDK-8247992 Testing: hs-tier1,hs-tier2,hs-tier3-graal -Doug From felix.yang at huawei.com Mon Jun 22 11:52:22 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 22 Jun 2020 11:52:22 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: <87eeq7wmd2.fsf@redhat.com> References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> Message-ID: Hi Roland, Thanks for the suggestions. > -----Original Message----- > From: Roland Westrelin [mailto:rwestrel at redhat.com] > Sent: Monday, June 22, 2020 4:12 PM > To: Yangfei (Felix) ; Tobias Hartmann > ; hotspot-compiler-dev at openjdk.java.net > Cc: guoge (A) ; zhouyong (V) > > Subject: RE: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > > Hi Felix, > > See comments below > > > That sounds promising. I tried the following patch and looks like it works as > expected. > > > > diff -r 2342d5af52b7 src/hotspot/share/opto/cfgnode.cpp > > --- a/src/hotspot/share/opto/cfgnode.cpp Mon Jun 22 08:09:23 2020 > +0200 > > +++ b/src/hotspot/share/opto/cfgnode.cpp Mon Jun 22 15:33:06 2020 > +0800 > > @@ -1335,6 +1335,43 @@ > > if (id != NULL) return id; > > } > > > > + // replace equivalent phis > > + if (type()->has_memory()) { > > It's likely unsafe to perform that transformation at GVN time so I think it > needs to be performed only if phase->is_IterGVN() != NULL. OK. > > + for (uint i = 0; i < outcnt(); i++) { > > + Node* mmem = raw_out(i); > > The usual pattern is to use one of the iterators (DUIterator_Fast in this case). OK. 
> > + if (!mmem->is_MergeMem()) { > > + continue; > > + } > > + > > + uint phi_len = req(); > > + Node* phi_reg = region(); > > + // Look at each slice looking for phis with identical inputs. > > + // If we find one that has type TypePtr::BOTTOM, replace the > > + // current phi with the bottom phi. > > + for (uint j = Compile::AliasIdxBot; j < mmem->req(); j++) { > > + Node* node = mmem->in(j); > > + if (node == this || !node->is_Phi()) { > > + continue; > > + } > > + if (node->req() == phi_len && node->in(0) == phi_reg) { > > + PhiNode* phi_mem = node->as_Phi(); > > + if (phi_mem->adr_type() != TypePtr::BOTTOM) { > > + continue; > > + } > > + for (uint k = 1; k < phi_len; k++) { > > + if (in(k) != phi_mem->in(k)) { > > + phi_mem = NULL; > > + break; > > + } > > + } > > + if (phi_mem != NULL) { > > + return phi_mem; > > + } > > + } > > + } > > + } > > + } > > + > > return this; // No identity > > } > > I would make this fully generic: > > if (phase->is_IterGVN() && type() == Type::MEMORY && adr_type() != > TypePtr::BOTTOM) { > Node* phi_reg = region(); > for (DUIterator_Fast imax, i = phi_reg->fast_outs(imax); i < imax; i++) { > Node* u = phi_reg->fast_out(i); > if (u->is_Phi() && u->in(0) == phi_reg && type() == Type::MEMORY && u- > >adr_type() == TypePtr::BOTTOM) { > return u; > } > } > } That looks much better. I modified accordingly. Does the updated patch look better? Tier1-3 tested on x86_64-linux-gnu. Will propose a webrev for that. diff -r 2342d5af52b7 src/hotspot/share/opto/cfgnode.cpp --- a/src/hotspot/share/opto/cfgnode.cpp Mon Jun 22 08:09:23 2020 +0200 +++ b/src/hotspot/share/opto/cfgnode.cpp Mon Jun 22 18:45:43 2020 +0800 @@ -1335,6 +1335,28 @@ if (id != NULL) return id; } + // Looking for phis with identical inputs. If we find one that has + // type TypePtr::BOTTOM, replace the current phi with the bottom phi. 
+  if (phase->is_IterGVN() && type() == Type::MEMORY && adr_type() != TypePtr::BOTTOM) {
+    uint phi_len = req();
+    Node* phi_reg = region();
+    for (DUIterator_Fast imax, i = phi_reg->fast_outs(imax); i < imax; i++) {
+      Node* u = phi_reg->fast_out(i);
+      if (u->is_Phi() && u->as_Phi()->type() == Type::MEMORY &&
+          u->adr_type() == TypePtr::BOTTOM && u->in(0) == phi_reg && u->req() == phi_len) {
+        for (uint j = 1; j < phi_len; j++) {
+          if (in(j) != u->in(j)) {
+            u = NULL;
+            break;
+          }
+        }
+        if (u != NULL) {
+          return u;
+        }
+      }
+    }
+  }
+
   return this; // No identity
 }

Felix

From rwestrel at redhat.com Mon Jun 22 11:56:43 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 22 Jun 2020 13:56:43 +0200
Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal
In-Reply-To:
References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com>
Message-ID: <878sgfwbyc.fsf@redhat.com>

> Does the updated patch look better?

It looks good to me.

Roland.

From rkennke at redhat.com Mon Jun 22 12:34:22 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 22 Jun 2020 14:34:22 +0200
Subject: RFR(S): 8247824: CTW: C2 (Shenandoah) compilation fails with SEGV in SBC2Support::pin_and_expand
In-Reply-To: <87sgeswn1f.fsf@redhat.com>
References: <87sgeswn1f.fsf@redhat.com>
Message-ID: <8ba7522c0fc0cd8ebe69d3175d7f8bd3b08b88cd.camel@redhat.com>

The change looks good. We should consider if it's relevant for jdk14 and add the corresponding affects-version in the bug.

Thank you!
Roman

On Thu, 2020-06-18 at 14:55 +0200, Roland Westrelin wrote:
> https://bugs.openjdk.java.net/browse/JDK-8247824
> http://cr.openjdk.java.net/~roland/8247824/webrev.00/
>
> If a barrier is expanded in the outer loop of a strip mined loop nest,
> the outer loop head is changed to a new LoopNode so loop strip mining
> verification code doesn't trigger and fail. The crash occurs when
> there's 2 barriers in the outer loop and C2 attempts to transform the
> loop head twice. The second time, loop->_head points to a dead node.
>
> Roland.

From rwestrel at redhat.com Mon Jun 22 12:47:10 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 22 Jun 2020 14:47:10 +0200
Subject: RFR(M): 8223051: support loops with long (64b) trip counts
In-Reply-To: <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com>
References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com>
Message-ID: <875zbjw9m9.fsf@redhat.com>

> I'm seeing some failures with -XX:StressLongCountedLoop=429496729. Will follow-up offline.

The last patch is flawed: predicates in the inner loop use the jvm state from the predicates of the initial loop, that is the state before the loop. If deoptimization happens for an inner loop predicate on an iteration of the outer loop that's not the first one then execution resumes as if the initial loop was never executed when it's already part way through.

To fix this, I changed the code so one iteration of the loop is peeled when the loop is transformed to a long counted loop. State for predicates is obtained from the safepoint at the end of the peeled iteration of the loop.

I removed some logic that causes safepoint to be eliminated if redundant with a call so PhaseIdealLoop::is_long_counted_loop() has a better chance of finding a safepoint where it needs one.
Still, it's possible that the safepoint is not right above the loop exit test, in which case the transformation looks for a safepoint that dominates the backedge and verifies that there are no side effects between the backedge and the safepoint. In case of side effects, the transformation proceeds but the inner loop will have no predicates (from some stress testing with CTW this seems to happen in only a handful of cases).

This also requires the loop to be transformed in 2 passes of loop optimizations: first the loop is transformed to a loop nest with 2 loops and on the next loop pass, the inner loop becomes a counted loop. The previous patch could backtrack in case the transformation to counted loop fails. The previous patch would also not duplicate all tests that PhaseIdealLoop::is_counted_loop() performs in PhaseIdealLoop::is_long_counted_loop() to avoid some complexity. PhaseIdealLoop::is_long_counted_loop() now has to follow PhaseIdealLoop::is_counted_loop() strictly. That stricter requirement also caught some bugs where PhaseIdealLoop::is_counted_loop() would fail when it shouldn't have.

http://cr.openjdk.java.net/~roland/8223051/webrev.03/

diff from previous patch:
http://cr.openjdk.java.net/~roland/8223051/webrev.02-03/

Roland.

From rkennke at gmail.com Mon Jun 22 12:05:06 2020
From: rkennke at gmail.com (rkennke at gmail.com)
Date: Mon, 22 Jun 2020 14:05:06 +0200
Subject: RFR(S): 8247824: CTW: C2 (Shenandoah) compilation fails with SEGV in SBC2Support::pin_and_expand
In-Reply-To: <87sgeswn1f.fsf@redhat.com>
References: <87sgeswn1f.fsf@redhat.com>
Message-ID:

The change looks good. We should consider if it's relevant for jdk14 and add the corresponding affects-version in the bug.

Thank you!
Roman

On Thu, 2020-06-18 at 14:55 +0200, Roland Westrelin wrote:
> https://bugs.openjdk.java.net/browse/JDK-8247824
> http://cr.openjdk.java.net/~roland/8247824/webrev.00/
>
> If a barrier is expanded in the outer loop of a strip mined loop nest,
> the outer loop head is changed to a new LoopNode so loop strip mining
> verification code doesn't trigger and fail. The crash occurs when
> there's 2 barriers in the outer loop and C2 attempts to transform the
> loop head twice. The second time, loop->_head points to a dead node.
>
> Roland.

From erik.osterlund at oracle.com Mon Jun 22 13:49:47 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 22 Jun 2020 15:49:47 +0200
Subject: RFR: 8247992: [JVMCI] HotSpotNmethod.executeVarargs can try execute a zombie nmethod
In-Reply-To:
References:
Message-ID:

Hi Doug,

It would be nice if we could get rid of some cluttering of nested

#if JVMCI
  if (blah) {
#endif
...
#if JVMCI
  }
#endif

It makes it hard to read for (to me) no obvious reason.

Here is a patch on top of yours to remove some of that:
http://cr.openjdk.java.net/~eosterlund/8247992/webrev.doug..00/

Full webrev:
http://cr.openjdk.java.net/~eosterlund/8247992/webrev.00/

The reasoning in my changes is that since the SharedRuntime wrong method handler has no idea about these alternative call targets, checking if the nmethod is still entrant right before the call, will not guarantee that we actually end up calling the target nmethod anyway. It could get made not entrant just after the call, but before executing the first instruction, which when becoming not entrant jumps to the wrong method handler, which has no clue about any of this and just re-routes the call to the currently best method code of the method used in the Java call wrapper, without throwing any exception. So either we can live with that, or we can not.
If we can live with that, then all exception handling can (and should) be removed from the Java call wrapper, because it is not needed. Instead, we check if the nmethod is still entrant in the JVMCI caller function, and accept that it could subsequently become not entrant at any point in time. A few more instructions for a slippery race to happen, but not worth cluttering the call wrapper with JVMCI code over the difference.

Now I don't know how this API is used, but would it make sense to instead check *after* the nmethod has been executed, if it is still in_use()? Then we know that the nmethod has been executed to completion without getting killed off, if we do not get an exception. And if we get an exception then it got deoptimized either right before calling, or during its execution. Like this:

http://cr.openjdk.java.net/~eosterlund/8247992/webrev.00..01/

That would completely close the window for that race, but assumes the caller is okay with getting an exception if the nmethod gets deoptimized while running. You know this API better than I do, so that's your call if it is a good idea or not.

Thanks,
/Erik

On 2020-06-22 13:12, Doug Simon wrote:
> Hi, can I please get some reviews for this patch that fixes a race
> described in JDK-8247992.
>
> The fix in the patch is to pass the HotSpotNmethod mirror object (in a
> handle) through the problematic VM-to-Java transition and only after
> the transition will the verified entry point be extracted from the
> mirror. This will prevent a zombie nmethod from being executed.
>
> It's still possible for the call to not reach the "alternative target"
> due to sweeping heuristics. This is an acceptable limitation of having
> an alternative call target in JavaCallWrapper. It could be fixed by
> altering CPU specific i2c trampoline code but the added complexity is
> not worth it for this test-only mechanism. Tests using this API can
> detect whether the alternative target was actually called if they
> really care about it.
>
> Thanks to Erik Osterlund for the in-depth analysis and suggested
> solutions.
>
> https://dougxc.github.io/webrevs/8247992_16/index.html
> https://bugs.openjdk.java.net/browse/JDK-8247992
>
> Testing: hs-tier1,hs-tier2,hs-tier3-graal
>
> -Doug

From boris.ulasevich at bell-sw.com Mon Jun 22 14:45:13 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Mon, 22 Jun 2020 17:45:13 +0300
Subject: RFR: C2: Canonicalize (x & 16 == 16) [Was: [aarch64-port-dev ] AARCH64 optimization: using TBZ instruction for bit check]
In-Reply-To:
References: <9b399c9f-cfc1-7fc4-70ee-536a83e5afa7@redhat.com> <7bea8a5c-312b-b4f3-d545-f83652b73150@bell-sw.com> <14d3f034-b637-f74a-7567-d4e589260887@redhat.com>
Message-ID:

Hi Vladimir,

> Would be nice to know if any Java benchmark is affected.

With the change we have got 5% performance boost on lucene tokenizer method on ARM64. Same time on x86 there is no visible improvement on lucene tokenizer.

thanks,
Boris

import org.apache.lucene.analysis.standard.StandardTokenizerImpl;
import java.nio.file.Files;
import java.io.*;

class Test {
  public static void main(String args[]) {
    long count = 0;
    try {
      byte[] content = Files.readAllBytes(new File("aarch64.ad").toPath());
      for (int i=0; i < 1000; i++) {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(content));
        StandardTokenizerImpl sti = new StandardTokenizerImpl(reader);
        while (sti.getNextToken() != -1) {
          count ++;
        }
      }
    } catch (Exception ex) { System.out.println(ex); }
    System.out.println(count);
  }
}

On 19.06.2020 21:36, Vladimir Kozlov wrote:
> Nice optimization.
>
> I don't think we should turn it off on any machine. In real
> application you will not see such tight loops only with such branch.
> On other hand reducing code size should help in all cases.
>
> Would be nice to know if any Java benchmark is affected.
>
> I will try to run our set of benchmarks with these changes.
>
> Regards,
> Vladimir K
>
> On 6/19/20 10:07 AM, Andrew Haley wrote:
>> Hi,
>>
>> On 19/06/2020 17:49, Boris Ulasevich wrote:
>>> I added the expression canonicalization in the BoolNode::Ideal method:
>>> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.02b
>>>
>>> The change reduces a number of generated machine instructions on all
>>> ARM/x86/PPC architectures. Benchmark shows positive results on ARM64 and
>>> ARM32 with the given change.
>>>
>>> On x86 benchmark performance improves from +1% to +13% depending on the
>>> CPU generation, except of machines affected by Intel Erratum (JDK-8234160)
>>> issue. Maximum decrease observed is -11%. It does not look like a problem
>>> with the proposed benchmark though, but rather like an issue with
>>> Erratum mitigation.
>>>
>>> On PowerPC result of the micro-benchmark is also positive. I changed the
>>> micro-benchmark to make it a little bulkier so that we don't hit the
>>> limitations of architectures with a less elaborate branch prediction
>>> mechanism. The original application performance does not change on
>>> PowerPC.
>>
>> Fantastic work, thanks! You've done a remarkably thorough job. It's
>> slightly unfortunate that one of the targets regresses. If there had
>> been no regressions, I'd approve this straight away.
>>
>> Forwarding to hotspot-compiler-dev for more comments.
>>
>> VladimirK, what do you think? I guess we could turn this off on the
>> machines affected by JDK-8234160. Should we?
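
[Editorial note on the canonicalization discussed in this thread.] As the subject line suggests, the change rewrites a bit check of the form (x & 16) == 16 into a comparison with zero, which a back end can then match to a test-bit branch such as TBZ on AArch64. That rewrite is only valid because the mask has a single bit set. The following is a minimal Java sketch of that equivalence, not code from the webrev; the class and method names are made up for illustration.

```java
// Illustrative sketch only: the names below are hypothetical and do not come
// from the patch under review.
class BitCheckEquivalence {
    // Form as usually written in source: compare the masked value with the mask.
    static boolean maskedEqualsMask(int x, int mask) {
        return (x & mask) == mask;
    }

    // Canonical form: compare the masked value with zero.
    static boolean maskedNonZero(int x, int mask) {
        return (x & mask) != 0;
    }

    // Returns true if both forms agree for every single-bit mask over a range of x.
    static boolean formsAgreeForSingleBitMasks() {
        for (int bit = 0; bit < 32; bit++) {
            int mask = 1 << bit;
            for (int x = -100_000; x <= 100_000; x++) {
                if (maskedEqualsMask(x, mask) != maskedNonZero(x, mask)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Single-bit masks: x & mask is either 0 or mask, so the two forms agree.
        System.out.println(formsAgreeForSingleBitMasks()); // prints "true"
        // Multi-bit mask 3 with x = 1: the two forms disagree.
        System.out.println(maskedEqualsMask(1, 3) + " " + maskedNonZero(1, 3)); // prints "false true"
    }
}
```

The second line of output shows why the rewrite has to be restricted to power-of-two masks: with more than one bit in the mask, "some bit set" and "all bits set" are different conditions.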

From vladimir.kozlov at oracle.com Mon Jun 22 15:48:07 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 22 Jun 2020 08:48:07 -0700
Subject: RFR: C2: Canonicalize (x & 16 == 16) [Was: [aarch64-port-dev ] AARCH64 optimization: using TBZ instruction for bit check]
In-Reply-To:
References: <9b399c9f-cfc1-7fc4-70ee-536a83e5afa7@redhat.com> <7bea8a5c-312b-b4f3-d545-f83652b73150@bell-sw.com> <14d3f034-b637-f74a-7567-d4e589260887@redhat.com>
Message-ID: <709f87e9-9c4f-b9ad-5246-90c9c92c5e6b@oracle.com>

On 6/22/20 7:45 AM, Boris Ulasevich wrote:
> Hi Vladimir,
>
> > Would be nice to know if any Java benchmark is affected.
>
> With the change we have got 5% performance boost on lucene tokenizer method on ARM64. Same time on x86 there is no
> visible improvement on lucene tokenizer.

Good. I ran our benchmarks (mostly jvm2008) on x86 and don't see any effect either.

Thanks,
Vladimir

>
> thanks,
> Boris
>
> import org.apache.lucene.analysis.standard.StandardTokenizerImpl;
> import java.nio.file.Files;
> import java.io.*;
>
> class Test {
>   public static void main(String args[]) {
>     long count = 0;
>     try {
>       byte[] content = Files.readAllBytes(new File("aarch64.ad").toPath());
>       for (int i=0; i < 1000; i++) {
>         Reader reader = new InputStreamReader(new ByteArrayInputStream(content));
>         StandardTokenizerImpl sti = new StandardTokenizerImpl(reader);
>         while (sti.getNextToken() != -1) {
>           count ++;
>         }
>       }
>     } catch (Exception ex) { System.out.println(ex); }
>     System.out.println(count);
>   }
> }
>
>
> On 19.06.2020 21:36, Vladimir Kozlov wrote:
>> Nice optimization.
>>
>> I don't think we should turn it off on any machine. In real application you will not see such tight loops only with
>> such branch. On other hand reducing code size should help in all cases.
>>
>> Would be nice to know if any Java benchmark is affected.
>>
>> I will try to run our set of benchmarks with these changes.
>>
>> Regards,
>> Vladimir K
>>
>> On 6/19/20 10:07 AM, Andrew Haley wrote:
>>> Hi,
>>>
>>> On 19/06/2020 17:49, Boris Ulasevich wrote:
>>>> I added the expression canonicalization in the BoolNode::Ideal method:
>>>> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.02b
>>>>
>>>> The change reduces a number of generated machine instructions on all
>>>> ARM/x86/PPC architectures. Benchmark shows positive results on ARM64 and
>>>> ARM32 with the given change.
>>>>
>>>> On x86 benchmark performance improves from +1% to +13% depending on the
>>>> CPU generation, except of machines affected by Intel Erratum (JDK-8234160)
>>>> issue. Maximum decrease observed is -11%. It does not look like a problem
>>>> with the proposed benchmark though, but rather like an issue with
>>>> Erratum mitigation.
>>>>
>>>> On PowerPC result of the micro-benchmark is also positive. I changed the
>>>> micro-benchmark to make it a little bulkier so that we don't hit the
>>>> limitations of architectures with a less elaborate branch prediction
>>>> mechanism. The original application performance does not change on PowerPC.
>>>
>>> Fantastic work, thanks! You've done a remarkably thorough job. It's
>>> slightly unfortunate that one of the targets regresses. If there had
>>> been no regressions, I'd approve this straight away.
>>>
>>> Forwarding to hotspot-compiler-dev for more comments.
>>>
>>> VladimirK, what do you think? I guess we could turn this off on the
>>> machines affected by JDK-8234160. Should we?
>>>
>

From doug.simon at oracle.com Mon Jun 22 16:25:10 2020
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 22 Jun 2020 18:25:10 +0200
Subject: RFR: 8247992: [JVMCI] HotSpotNmethod.executeVarargs can try execute a zombie nmethod
In-Reply-To:
References:
Message-ID:

Thanks Erik,

Almost every one of your suggestions makes sense to me - thanks a lot!
I've incorporated them into https://dougxc.github.io/webrevs/8247992_16.02/

The only thing I'm not sure about is the check after the nmethod has been executed. It means we can execute the alternative nmethod but still throw an InvalidInstalledCodeException. This is problematic for an alternative nmethod with a side effect (e.g. it changes a static field). As such, I think the better option is to update the javadoc for HotSpotNmethod.executeVarargs as follows:

    /**
     * {@inheritDoc}
     *
     * It's possible for the HotSpot runtime to sweep nmethods at any point in time. As a result,
     * there is no guarantee that calling this method will execute the wrapped nmethod. Instead, it
     * may end up executing the bytecode of the associated {@link #getMethod() Java method}. Only if
     * {@link #isValid()} is {@code true} after returning can the caller be sure that the nmethod
     * was executed. If {@link #isValid()} is {@code false}, then the only way to determine if the
     * nmethod was executed is to test for some side-effect specific to the nmethod (e.g., update to
     * a field) that is not performed by the bytecode of the associated {@link #getMethod() Java
     * method}.
     */

The javadoc update is also part of 8247992_16.02.

-Doug

> On 22 Jun 2020, at 15:49, Erik Österlund wrote:
>
> Hi Doug,
>
> It would be nice if we could get rid of some cluttering of nested
>
> #if JVMCI
>   if (blah) {
> #endif
> ...
> #if JVMCI
>   }
> #endif
>
> It makes it hard to read for (to me) no obvious reason.
>
> Here is a patch on top of yours to remove some of that:
> http://cr.openjdk.java.net/~eosterlund/8247992/webrev.doug..00/
>
> Full webrev:
> http://cr.openjdk.java.net/~eosterlund/8247992/webrev.00/
>
> The reasoning in my changes is that since the SharedRuntime wrong method handler has no idea about these alternative call targets, checking if the nmethod is still entrant right before the call, will not guarantee that we actually end up calling the target nmethod anyway. It could get made not entrant just after the call, but before executing the first instruction, which when becoming not entrant jumps to the wrong method handler, which has no clue about any of this and just re-routes the call to the currently best method code of the method used in the Java call wrapper, without throwing any exception. So either we can live with that, or we can not. If we can live with that, then all exception handling can (and should) be removed from the Java call wrapper, because it is not needed. Instead, we check if the nmethod is still entrant in the JVMCI caller function, and accept that it could subsequently become not entrant at any point in time. A few more instructions for a slippery race to happen, but not worth cluttering the call wrapper with JVMCI code over the difference.
>
> Now I don't know how this API is used, but would it make sense to instead check *after* the nmethod has been executed, if it is still in_use()? Then we know that the nmethod has been executed to completion without getting killed off, if we do not get an exception. And if we get an exception then it got deoptimized either right before calling, or during its execution. Like this:
>
> http://cr.openjdk.java.net/~eosterlund/8247992/webrev.00..01/
>
> That would completely close the window for that race, but assumes the caller is okay with getting an exception if the nmethod gets deoptimized while running. You know this API better than I do, so that's your call if it is a good idea or not.
>
> Thanks,
> /Erik
>
> On 2020-06-22 13:12, Doug Simon wrote:
>> Hi, can I please get some reviews for this patch that fixes a race described in JDK-8247992.
>>
>> The fix in the patch is to pass the HotSpotNmethod mirror object (in a handle) through the problematic VM-to-Java transition and only after the transition will the verified entry point be extracted from the mirror. This will prevent a zombie nmethod from being executed.
>>
>> It's still possible for the call to not reach the "alternative target" due to sweeping heuristics. This is an acceptable limitation of having an alternative call target in JavaCallWrapper. It could be fixed by altering CPU specific i2c trampoline code but the added complexity is not worth it for this test-only mechanism. Tests using this API can detect whether the alternative target was actually called if they really care about it.
>>
>> Thanks to Erik Osterlund for the in-depth analysis and suggested solutions.
>>
>> https://dougxc.github.io/webrevs/8247992_16/index.html
>> https://bugs.openjdk.java.net/browse/JDK-8247992
>>
>> Testing: hs-tier1,hs-tier2,hs-tier3-graal
>>
>> -Doug
>

From erik.osterlund at oracle.com Mon Jun 22 16:38:59 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 22 Jun 2020 18:38:59 +0200
Subject: RFR: 8247992: [JVMCI] HotSpotNmethod.executeVarargs can try execute a zombie nmethod
In-Reply-To:
References:
Message-ID: <63214412-AC93-407A-91A5-E6D32B968039@oracle.com>

Hi Doug,

Looks good.

Thanks,
/Erik

>> On 22 Jun 2020, at 18:25, Doug Simon wrote:
> Thanks Erik,
>
> Almost every one of your suggestions makes sense to me - thanks a lot! I've incorporated them into https://dougxc.github.io/webrevs/8247992_16.02/
>
> The only thing I'm not sure about is the check after the nmethod has been executed. It means we can execute the alternative nmethod but still throw an InvalidInstalledCodeException. This is problematic for an alternative nmethod with a side effect (e.g. it changes a static field). As such, I think the better option is to update the javadoc for HotSpotNmethod.executeVarargs as follows:
>
>     /**
>      * {@inheritDoc}
>      *
>      * It's possible for the HotSpot runtime to sweep nmethods at any point in time. As a result,
>      * there is no guarantee that calling this method will execute the wrapped nmethod. Instead, it
>      * may end up executing the bytecode of the associated {@link #getMethod() Java method}. Only if
>      * {@link #isValid()} is {@code true} after returning can the caller be sure that the nmethod
>      * was executed. If {@link #isValid()} is {@code false}, then the only way to determine if the
>      * nmethod was executed is to test for some side-effect specific to the nmethod (e.g., update to
>      * a field) that is not performed by the bytecode of the associated {@link #getMethod() Java
>      * method}.
>      */
>
> The javadoc update is also part of 8247992_16.02.
>
> -Doug
>
>> On 22 Jun 2020, at 15:49, Erik Österlund wrote:
>>
>> Hi Doug,
>>
>> It would be nice if we could get rid of some cluttering of nested
>>
>> #if JVMCI
>>   if (blah) {
>> #endif
>> ...
>> #if JVMCI
>>   }
>> #endif
>>
>> It makes it hard to read for (to me) no obvious reason.
>>
>> Here is a patch on top of yours to remove some of that:
>> http://cr.openjdk.java.net/~eosterlund/8247992/webrev.doug..00/
>>
>> Full webrev:
>> http://cr.openjdk.java.net/~eosterlund/8247992/webrev.00/
>>
>> The reasoning in my changes is that since the SharedRuntime wrong method handler has no idea about these alternative call targets, checking if the nmethod is still entrant right before the call, will not guarantee that we actually end up calling the target nmethod anyway. It could get made not entrant just after the call, but before executing the first instruction, which when becoming not entrant jumps to the wrong method handler, which has no clue about any of this and just re-routes the call to the currently best method code of the method used in the Java call wrapper, without throwing any exception. So either we can live with that, or we can not. If we can live with that, then all exception handling can (and should) be removed from the Java call wrapper, because it is not needed. Instead, we check if the nmethod is still entrant in the JVMCI caller function, and accept that it could subsequently become not entrant at any point in time.
A few more instructions for a slippery race to happen, but not worth cluttering the call wrapper with JVMCI code over the difference.
>>
>> Now I don't know how this API is used, but would it make sense to instead check *after* the nmethod has been executed, if it is still in_use()? Then we know that the nmethod has been executed to completion without getting killed off, if we do not get an exception. And if we get an exception then it got deoptimized either right before calling, or during its execution. Like this:
>>
>> http://cr.openjdk.java.net/~eosterlund/8247992/webrev.00..01/
>>
>> That would completely close the window for that race, but assumes the caller is okay with getting an exception if the nmethod gets deoptimized while running. You know this API better than I do, so that's your call if it is a good idea or not.
>>
>> Thanks,
>> /Erik
>>
>>> On 2020-06-22 13:12, Doug Simon wrote:
>>> Hi, can I please get some reviews for this patch that fixes a race described in JDK-8247992.
>>>
>>> The fix in the patch is to pass the HotSpotNmethod mirror object (in a handle) through the problematic VM-to-Java transition and only after the transition will the verified entry point be extracted from the mirror. This will prevent a zombie nmethod from being executed.
>>>
>>> It's still possible for the call to not reach the "alternative target" due to sweeping heuristics. This is an acceptable limitation of having an alternative call target in JavaCallWrapper. It could be fixed by altering CPU specific i2c trampoline code but the added complexity is not worth it for this test-only mechanism. Tests using this API can detect whether the alternative target was actually called if they really care about it.
>>>
>>> Thanks to Erik Osterlund for the in-depth analysis and suggested solutions.
>>>
>>> https://dougxc.github.io/webrevs/8247992_16/index.html
>>> https://bugs.openjdk.java.net/browse/JDK-8247992
>>>
>>> Testing: hs-tier1,hs-tier2,hs-tier3-graal
>>>
>>> -Doug

From vladimir.kozlov at oracle.com Mon Jun 22 17:09:20 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 22 Jun 2020 10:09:20 -0700
Subject: RFR: 8247992: [JVMCI] HotSpotNmethod.executeVarargs can try execute a zombie nmethod
In-Reply-To: <63214412-AC93-407A-91A5-E6D32B968039@oracle.com>
References: <63214412-AC93-407A-91A5-E6D32B968039@oracle.com>
Message-ID: <7b7842eb-4345-d0ab-4ef2-c415f8bcefb4@oracle.com>

+1

Thanks,
Vladimir

On 6/22/20 9:38 AM, Erik Österlund wrote:
> Hi Doug,
>
> Looks good.
>
> Thanks,
> /Erik
>
>>> On 22 Jun 2020, at 18:25, Doug Simon wrote:
>> Thanks Erik,
>>
>> Almost every one of your suggestions makes sense to me - thanks a lot! I've incorporated them into https://dougxc.github.io/webrevs/8247992_16.02/
>>
>> The only thing I'm not sure about is the check after the nmethod has been executed. It means we can execute the alternative nmethod but still throw an InvalidInstalledCodeException. This is problematic for an alternative nmethod with a side effect (e.g. it changes a static field). As such, I think the better option is to update the javadoc for HotSpotNmethod.executeVarargs as follows:
>>
>>     /**
>>      * {@inheritDoc}
>>      *
>>      * It's possible for the HotSpot runtime to sweep nmethods at any point in time. As a result,
>>      * there is no guarantee that calling this method will execute the wrapped nmethod. Instead, it
>>      * may end up executing the bytecode of the associated {@link #getMethod() Java method}. Only if
>>      * {@link #isValid()} is {@code true} after returning can the caller be sure that the nmethod
>>      * was executed. If {@link #isValid()} is {@code false}, then the only way to determine if the
>>      * nmethod was executed is to test for some side-effect specific to the nmethod (e.g., update to
>>      * a field) that is not performed by the bytecode of the associated {@link #getMethod() Java
>>      * method}.
>>      */
>>
>> The javadoc update is also part of 8247992_16.02.
>>
>> -Doug
>>
>>> On 22 Jun 2020, at 15:49, Erik Österlund wrote:
>>>
>>> Hi Doug,
>>>
>>> It would be nice if we could get rid of some cluttering of nested
>>>
>>> #if JVMCI
>>>   if (blah) {
>>> #endif
>>> ...
>>> #if JVMCI
>>>   }
>>> #endif
>>>
>>> It makes it hard to read for (to me) no obvious reason.
>>>
>>> Here is a patch on top of yours to remove some of that:
>>> http://cr.openjdk.java.net/~eosterlund/8247992/webrev.doug..00/
>>>
>>> Full webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8247992/webrev.00/
>>>
>>> The reasoning in my changes is that since the SharedRuntime wrong method handler has no idea about these alternative call targets, checking if the nmethod is still entrant right before the call, will not guarantee that we actually end up calling the target nmethod anyway. It could get made not entrant just after the call, but before executing the first instruction, which when becoming not entrant jumps to the wrong method handler, which has no clue about any of this and just re-routes the call to the currently best method code of the method used in the Java call wrapper, without throwing any exception. So either we can live with that, or we can not. If we can live with that, then all exception handling can (and should) be removed from the Java call wrapper, because it is not needed. Instead, we check if the nmethod is still entrant in the JVMCI caller function, and accept that it could subsequently become not entrant at any point in time. A few more instructions for a slippery race to happen, but not worth cluttering the call wrapper with JVMCI code over the difference.
>>>
>>> Now I don't know how this API is used, but would it make sense to instead check *after* the nmethod has been executed, if it is still in_use()? Then we know that the nmethod has been executed to completion without getting killed off, if we do not get an exception. And if we get an exception then it got deoptimized either right before calling, or during its execution. Like this:
>>>
>>> http://cr.openjdk.java.net/~eosterlund/8247992/webrev.00..01/
>>>
>>> That would completely close the window for that race, but assumes the caller is okay with getting an exception if the nmethod gets deoptimized while running. You know this API better than I do, so that's your call if it is a good idea or not.
>>>
>>> Thanks,
>>> /Erik
>>>
>>>> On 2020-06-22 13:12, Doug Simon wrote:
>>>> Hi, can I please get some reviews for this patch that fixes a race described in JDK-8247992.
>>>>
>>>> The fix in the patch is to pass the HotSpotNmethod mirror object (in a handle) through the problematic VM-to-Java transition and only after the transition will the verified entry point be extracted from the mirror. This will prevent a zombie nmethod from being executed.
>>>>
>>>> It's still possible for the call to not reach the "alternative target" due to sweeping heuristics. This is an acceptable limitation of having an alternative call target in JavaCallWrapper. It could be fixed by altering CPU specific i2c trampoline code but the added complexity is not worth it for this test-only mechanism. Tests using this API can detect whether the alternative target was actually called if they really care about it.
>>>>
>>>> Thanks to Erik Osterlund for the in-depth analysis and suggested solutions.
>>>>
>>>> https://dougxc.github.io/webrevs/8247992_16/index.html
>>>> https://bugs.openjdk.java.net/browse/JDK-8247992
>>>>
>>>> Testing: hs-tier1,hs-tier2,hs-tier3-graal
>>>>
>>>> -Doug

From tom.rodriguez at oracle.com Mon Jun 22 17:28:05 2020
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 22 Jun 2020 10:28:05 -0700
Subject: RFR: 8247992: [JVMCI] HotSpotNmethod.executeVarargs can try execute a zombie nmethod
In-Reply-To:
References:
Message-ID: <0683c93c-0b66-a10d-8665-994f8d430c3c@oracle.com>

I agree it shouldn't throw if the method was invalidated when the caller returns. That's something the caller can check if they care. Looks good.

tom

Doug Simon wrote on 6/22/20 9:25 AM:
> Thanks Erik,
>
> Almost every one of your suggestions makes sense to me - thanks a lot!
> I've incorporated them into https://dougxc.github.io/webrevs/8247992_16.02/
>
> The only thing I'm not sure about is the check after the nmethod has
> been executed. It means we can execute the alternative nmethod but still
> throw an InvalidInstalledCodeException. This is problematic for an
> alternative nmethod with a side effect (e.g. it changes a static field).
> As such, I think the better option is to update the javadoc
> for HotSpotNmethod.executeVarargs as follows:
>
>     /**
>      * {@inheritDoc}
>      *
>      * It's possible for the HotSpot runtime to sweep nmethods at any point in time. As a result,
>      * there is no guarantee that calling this method will execute the wrapped nmethod. Instead, it
>      * may end up executing the bytecode of the associated {@link #getMethod() Java method}. Only if
>      * {@link #isValid()} is {@code true} after returning can the caller be sure that the nmethod
>      * was executed. If {@link #isValid()} is {@code false}, then the only way to determine if the
>      * nmethod was executed is to test for some side-effect specific to the nmethod (e.g., update to
>      * a field) that is not performed by the bytecode of the associated {@link #getMethod() Java
>      * method}.
>      */
>
> The javadoc update is also part of 8247992_16.02.
>
> -Doug
>
>> On 22 Jun 2020, at 15:49, Erik Österlund wrote:
>>
>> Hi Doug,
>>
>> It would be nice if we could get rid of some cluttering of nested
>>
>> #if JVMCI
>>   if (blah) {
>> #endif
>> ...
>> #if JVMCI
>>   }
>> #endif
>>
>> It makes it hard to read for (to me) no obvious reason.
>>
>> Here is a patch on top of yours to remove some of that:
>> http://cr.openjdk.java.net/~eosterlund/8247992/webrev.doug..00/
>>
>> Full webrev:
>> http://cr.openjdk.java.net/~eosterlund/8247992/webrev.00/
>>
>> The reasoning in my changes is that since the SharedRuntime wrong
>> method handler has no idea about these alternative call targets,
>> checking if the nmethod is still entrant right before the call, will
>> not guarantee that we actually end up calling the target nmethod
>> anyway. It could get made not entrant just after the call, but before
>> executing the first instruction, which when becoming not entrant jumps
>> to the wrong method handler, which has no clue about any of this and
>> just re-routes the call to the currently best method code of the
>> method used in the Java call wrapper, without throwing any exception.
>> So either we can live with that, or we can not. If we can live with
>> that, then all exception handling can (and should) be removed from the
>> Java call wrapper, because it is not needed. Instead, we check if the
>> nmethod is still entrant in the JVMCI caller function, and accept that
>> it could subsequently become not entrant at any point in time. A few
>> more instructions for a slippery race to happen, but not worth
>> cluttering the call wrapper with JVMCI code over the difference.
>>
>> Now I don't know how this API is used, but would it make sense to
>> instead check *after* the nmethod has been executed, if it is still
>> in_use()?
>> Then we know that the nmethod has been executed to
>> completion without getting killed off, if we do not get an exception.
>> And if we get an exception then it got deoptimized either right before
>> calling, or during its execution. Like this:
>>
>> http://cr.openjdk.java.net/~eosterlund/8247992/webrev.00..01/
>>
>> That would completely close the window for that race, but assumes the
>> caller is okay with getting an exception if the nmethod gets
>> deoptimized while running. You know this API better than I do, so
>> that's your call if it is a good idea or not.
>>
>> Thanks,
>> /Erik
>>
>> On 2020-06-22 13:12, Doug Simon wrote:
>>> Hi, can I please get some reviews for this patch that fixes a race
>>> described in JDK-8247992.
>>>
>>> The fix in the patch is to pass the HotSpotNmethod mirror object (in
>>> a handle) through the problematic VM-to-Java transition and only
>>> after the transition will the verified entry point be extracted from
>>> the mirror. This will prevent a zombie nmethod from being executed.
>>>
>>> It's still possible for the call to not reach the "alternative
>>> target" due to sweeping heuristics. This is an acceptable limitation
>>> of having an alternative call target in JavaCallWrapper. It could be
>>> fixed by altering CPU specific i2c trampoline code but the added
>>> complexity is not worth it for this test-only mechanism. Tests using
>>> this API can detect whether the alternative target was actually
>>> called if they really care about it.
>>>
>>> Thanks to Erik Osterlund for the in-depth analysis and suggested
>>> solutions.
>>>
>>> https://dougxc.github.io/webrevs/8247992_16/index.html
>>> https://bugs.openjdk.java.net/browse/JDK-8247992
>>>
>>> Testing: hs-tier1,hs-tier2,hs-tier3-graal
>>>
>>> -Doug
>>
>

From felix.yang at huawei.com Tue Jun 23 00:42:54 2020
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Tue, 23 Jun 2020 00:42:54 +0000
Subject: RFR(XS): 8247979: aarch64: missing side effect of killing flags for clearArray_reg_reg
Message-ID:

Hi,

Bug: https://bugs.openjdk.java.net/browse/JDK-8247979
Webrev: http://cr.openjdk.java.net/~fyang/8247979/webrev.00

For clearArray_reg_reg in aarch64.ad, we call function: MacroAssembler::zero_words(Register ptr, Register cnt).
This function modifies the flags register by doing a cmp instruction at entry. But this is not reflected in the side effects of clearArray_reg_reg.
We haven't seen this trigger any problems. But it may pose a similar risk as bug: 8224828: aarch64: rflags is not correct after safepoint poll.
Tier1-3 tested on aarch64-linux-gnu. OK?

diff -r 2342d5af52b7 src/hotspot/cpu/aarch64/aarch64.ad
--- a/src/hotspot/cpu/aarch64/aarch64.ad Mon Jun 22 08:09:23 2020 +0200
+++ b/src/hotspot/cpu/aarch64/aarch64.ad Mon Jun 22 15:58:05 2020 +0800
@@ -13845,7 +13845,7 @@
 instruct clearArray_reg_reg(iRegL_R11 cnt, iRegP_R10 base, Universe dummy, rFlagsReg cr)
 %{
   match(Set dummy (ClearArray cnt base));
-  effect(USE_KILL cnt, USE_KILL base);
+  effect(USE_KILL cnt, USE_KILL base, KILL cr);
 
   ins_cost(4 * INSN_COST);
   format %{ "ClearArray $cnt, $base" %}

BTW: clearArray_imm_reg does not have the issue since it calls a different function: MacroAssembler::zero_words(Register base, u_int64_t cnt)

// clearing of an array

instruct clearArray_reg_reg(iRegL_R11 cnt, iRegP_R10 base, Universe dummy, rFlagsReg cr)
%{
  match(Set dummy (ClearArray cnt base));
  effect(USE_KILL cnt, USE_KILL base);

  ins_cost(4 * INSN_COST);
  format %{ "ClearArray $cnt, $base" %}

  ins_encode %{
    __ zero_words($base$$Register, $cnt$$Register);
  %}

  ins_pipe(pipe_class_memory);
%}

void MacroAssembler::zero_words(Register ptr, Register cnt)
{
  assert(is_power_of_2(zero_words_block_size), "adjust this");
  assert(ptr == r10 && cnt == r11, "mismatch in register usage");

  BLOCK_COMMENT("zero_words {");
  cmp(cnt, (u1)zero_words_block_size);    <=================

From felix.yang at huawei.com Tue Jun 23 02:52:16 2020
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Tue, 23 Jun 2020 02:52:16 +0000
Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal
In-Reply-To: <878sgfwbyc.fsf@redhat.com>
References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com>
Message-ID:

Hi,

> -----Original Message-----
> From: Roland Westrelin [mailto:rwestrel at redhat.com]
> Sent: Monday, June 22, 2020 7:57 PM
> To: Yangfei (Felix); Tobias Hartmann; hotspot-compiler-dev at openjdk.java.net
> Cc: guoge (A); zhouyong (V)
> Subject: RE: RFR(S): 8243670: Unexpected test result caused by C2
> MergeMemNode::Ideal
>
>
> > Does the updated patch look better?
>
> It looks good to me.
Webrev: http://cr.openjdk.java.net/~fyang/8243670/webrev.01/ Submitted to jdk-submit repo and test result received show 4 new failures with debug build: Test Tier Platform Description compiler/escapeAnalysis/TestArrayCopy.java tier1 linux-aarch64-debug ExitCode: 134 compiler/escapeAnalysis/TestArrayCopy.java tier1 linux-x64-debug ExitCode: 134 compiler/escapeAnalysis/TestArrayCopy.java tier1 macosx-x64-debug ExitCode: 134 compiler/escapeAnalysis/TestArrayCopy.java tier1 windows-x64-debug ExitCode: 1 This can be reproduced locally with a slowdebug build by doing: make run-test TEST="test/hotspot/jtreg/compiler/escapeAnalysis/TestArrayCopy.java" # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/yangfei/openjdk-jdk/src/hotspot/share/opto/phaseX.cpp:1135), pid=10598, tid=10613 # assert(false) failed: infinite loop in PhaseIterGVN::optimize Still needs efforts to see why that happens. Thanks, Felix From zhuoren.wz at alibaba-inc.com Tue Jun 23 03:39:59 2020 From: zhuoren.wz at alibaba-inc.com (=?UTF-8?B?V2FuZyBaaHVvKFpodW9yZW4p?=) Date: Tue, 23 Jun 2020 11:39:59 +0800 Subject: =?UTF-8?B?UmU6IFJGUjo4MjQzNjE1IENvbnRpbnVvdXMgZGVvcHRpbWl6YXRpb25zIHdpdGggUmVhc29u?= =?UTF-8?B?PXVuc3RhYmxlX2lmIGFuZCBBY3Rpb249bm9uZQ==?= In-Reply-To: References: <272f8207-0b1e-4b34-b1d4-0f562b4da9d1.zhuoren.wz@alibaba-inc.com> <4dc2e0ef-315b-a72b-bb8c-6b5f418765ed@oracle.com> , Message-ID: <885b7215-0586-47ad-8124-e76822bb809f.zhuoren.wz@alibaba-inc.com> Hi John, I beta the patch for about one month and it does fix the Continuous deoptimization issue. Meanwhile, a large number of deopt were observed in checkcast, and I found an optimistic assumption in checkcast code generation, please pay attention to the comment "Edge case: no mature data. Be optimistic here". 
bool GraphKit::seems_never_null(Node* obj, ciProfileData* data, bool& speculating) {
  speculating = !_gvn.type(obj)->speculative_maybe_null();
  Deoptimization::DeoptReason reason = Deoptimization::reason_null_check(speculating);
  if (UncommonNullCast               // Cutout for this technique
      && obj != null()               // And not the -Xcomp stupid case?
      && !too_many_traps(reason)
      ) {
    if (speculating) {
      return true;
    }
    if (data == NULL)
      // Edge case: no mature data. Be optimistic here.
      return true;
    // If the profile has not seen a null, assume it won't happen.
    assert(java_bc() == Bytecodes::_checkcast ||
           java_bc() == Bytecodes::_instanceof ||
           java_bc() == Bytecodes::_aastore, "MDO must collect null_seen bit here");
    return !data->as_BitData()->null_seen();
  }
  speculating = false;
  return false;
}

Can this corner case cause deopt storms? Please give advice.

Regards,
Zhuoren

------------------------------------------------------------------
From:John Rose
Sent At:2020 May 12 (Tue.) 04:15
To:Sandler
Cc:Tobias Hartmann; hotspot-compiler-dev at openjdk.java.net
Subject:Re: RFR:8243615 Continuous deoptimizations with Reason=unstable_if and Action=none

On May 11, 2020, at 5:58 AM, Wang Zhuo(Zhuoren) wrote:

Theoretically speaking other optimizations, with Action_maybe_recompile or Action_reinterpret, can be affected, because in uncommon_trap, Action_maybe_recompile and Action_reinterpret will be changed to Action_none if too many recompiles happened. While I have only met this issue with Reason_unstable_if so far.

Here's some background: The too_many_traps logic is like those barrels full of sand or water at the edges of freeway intersections, or a backstop on a baseball field. It's better to have a backstop than to have none at all, but something is wrong if you are hitting the backstop.

In short, the too_many_traps logic is present to prevent trap storms from lasting forever. But even short trap storms are a problem, if they happen often enough.
Also, the too_many_traps logic has in the past failed to terminate trap storms.

I think the bug here is probably whatever specific factor is causing small trap storms, which in turn are triggering too_many_traps. Maybe there's a bytecode that is trapping too often, and that bytecode individually is not throttling its own traps, and so the generic backstop logic is being called into play. (Less likely, the generic throttling logic needs some fix. But usually the right fix is at the root cause, with a single optimization or bytecode that is going wrong too often.)

Sometimes it's one bytecode running one corner case optimization that is trapping too many times, as if the JIT's optimizer were saying to itself "last time this optimization failed, but this time for sure!", or as if the JIT's optimizer has no feedback path at all to see that the optimization has failed in the past. Sometimes it is a whole class of bytecodes, such as "all check-casts arising from generic erasure."

HTH
-- John

From tobias.hartmann at oracle.com Tue Jun 23 07:07:15 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 23 Jun 2020 09:07:15 +0200
Subject: RFR(S): 8247824: CTW: C2 (Shenandoah) compilation fails with SEGV in SBC2Support::pin_and_expand
In-Reply-To: <87sgeswn1f.fsf@redhat.com>
References: <87sgeswn1f.fsf@redhat.com>
Message-ID: <92f8761b-2a54-3dca-fba0-a526b005480f@oracle.com>

Hi Roland,

looks good to me.

Best regards,
Tobias

On 18.06.20 14:55, Roland Westrelin wrote:
>
> https://bugs.openjdk.java.net/browse/JDK-8247824
> http://cr.openjdk.java.net/~roland/8247824/webrev.00/
>
> If a barrier is expanded in the outer loop of a strip mined loop nest,
> the outer loop head is changed to a new LoopNode so loop strip mining
> verification code doesn't trigger and fail. The crash occurs when
> there's 2 barriers in the outer loop and C2 attempts to transform the
> loop head twice. The second time, loop->_head points to a dead node.
>
> Roland.
> From tobias.hartmann at oracle.com Tue Jun 23 07:09:46 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 23 Jun 2020 09:09:46 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: <82c1f599-255a-cee0-0818-a10d79315b4e@oracle.com> References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <82c1f599-255a-cee0-0818-a10d79315b4e@oracle.com> Message-ID: <1017e092-691e-9b76-ff0b-71d5523d4b29@oracle.com> On 19.06.20 09:49, Tobias Hartmann wrote: > Let me run this through some performance testing, to see if it makes a difference. Performance testing did not show any significant difference but I would still prefer the solution suggested by Roland. Best regards, Tobias From tobias.hartmann at oracle.com Tue Jun 23 07:28:15 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 23 Jun 2020 09:28:15 +0200 Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called In-Reply-To: References: Message-ID: <589465dd-88e7-0de3-a2c4-6a2a0f56b8bb@oracle.com> Hi Pengfei, > http://cr.openjdk.java.net/~pli/rfr/8247307/webrev.00/ I don't understand how the change in loopTransform.cpp is supposed to work. Shouldn't we bail out if use != polladr? Also please add a comment to vm_version_x86.cpp explaining why this optimization is currently disabled. Could you add the JMH benchmark to test/micro/org/openjdk/bench/? Thanks, Tobias On 22.06.20 07:02, Pengfei Li wrote: > PING. May I have comments from another reviewer? I need a second review. > > -- > Thanks, > Pengfei > >> Sorry I forgot to paste below JMH link in my last email. >> >> [1] http://cr.openjdk.java.net/~pli/rfr/8247307/TestArrayFill.java >> >> BTW. If I turn on OptimizeFill manually there's below performance regression >> on x86. So I turned it off on x86 in my patch to make things unchanged. 
>>
>> Before (x86 with -XX:+OptimizeFill)
>> Benchmark                     Mode  Cnt     Score     Error  Units
>> TestArrayFill.fillByteArray   avgt   25  1793.206 ±  15.337  ns/op
>> TestArrayFill.fillIntArray    avgt   25  6679.491 ±  14.729  ns/op
>> TestArrayFill.fillShortArray  avgt   25  3412.708 ±  12.005  ns/op
>> TestArrayFill.zeroByteArray   avgt   25  1785.940 ±  15.174  ns/op
>> TestArrayFill.zeroIntArray    avgt   25  6666.709 ±  11.735  ns/op
>> TestArrayFill.zeroShortArray  avgt   25  3404.146 ±  23.045  ns/op
>>
>> After (x86 with -XX:+OptimizeFill)
>> Benchmark                     Mode  Cnt     Score     Error  Units
>> TestArrayFill.fillByteArray   avgt   25  2281.374 ± 191.220  ns/op
>> TestArrayFill.fillIntArray    avgt   25  9009.679 ± 901.541  ns/op
>> TestArrayFill.fillShortArray  avgt   25  4828.686 ±  49.199  ns/op
>> TestArrayFill.zeroByteArray   avgt   25  2463.745 ±  47.640  ns/op
>> TestArrayFill.zeroIntArray    avgt   25  9062.682 ± 939.538  ns/op
>> TestArrayFill.zeroShortArray  avgt   25  4837.231 ±  50.026  ns/op
>>
>>> Hi,
>>>
>>> Can I have a review of this C2 loop optimization fix?
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8247307
>>> Webrev: http://cr.openjdk.java.net/~pli/rfr/8247307/webrev.00/
>>>
>>> C2 has a loop optimization phase called intrinsify_fill. It matches
>>> the pattern of single array store with a loop invariant in a counted
>>> loop, like below, and replaces it with call to some stub routine.
>>>
>>> for (int i = start; i < limit; i++) {
>>>   a[i] = value;
>>> }
>>>
>>> Unfortunately, this doesn't work in current jdk after loop strip mining.
>>> The above loop is eventually unrolled and auto-vectorized by
>>> subsequent optimization phases. Root cause is that in strip-mined
>>> loops, the inner CountedLoopNode may be used by the address polling
>>> node of the safepoint in the outer loop. But as the safepoint polling
>>> has nothing related to any real operations in the loop, it should not hinder
>>> the pattern match.
>>> So in this patch, the polladr's use is ignored in the match check.
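
As a rough illustration of the loop shape the quoted description refers to (a single store of a loop-invariant value inside a counted loop), here is a small Java sketch. The names `FillSketch` and `fillLoop` are made up for this example and are not the `TestArrayFill` benchmark from the webrev. Whether C2 actually replaces such a loop with a fill stub depends on the platform and on flags such as `-XX:+OptimizeFill`, which this sketch does not check; it only shows the source pattern and that its result matches `Arrays.fill`.

```java
import java.util.Arrays;

// Hypothetical example class, not the TestArrayFill benchmark from the webrev.
class FillSketch {
    // A counted loop whose body is a single store of a loop-invariant value.
    static void fillLoop(int[] a, int start, int limit, int value) {
        for (int i = start; i < limit; i++) {
            a[i] = value;
        }
    }

    public static void main(String[] args) {
        int[] viaLoop = new int[1024];
        int[] viaLibrary = new int[1024];
        fillLoop(viaLoop, 16, 1000, 42);
        Arrays.fill(viaLibrary, 16, 1000, 42);
        // The contents are the same no matter how the JIT compiles the loop.
        System.out.println("same contents: " + Arrays.equals(viaLoop, viaLibrary));
    }
}
```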
>>>
>>> We have some performance comparison of the code for array fill,
>>> between the auto-vectorized version and the stub routine version. The
>>> JMH case for the tests can be found at [1]. Results show that on x86,
>>> the stub code is even slower than the auto-vectorized code. To prevent
>>> any regression, vm option OptimizeFill is turned off for x86 in this patch.
>>> So this patch has no impact on the generated code on x86. On AArch64,
>>> the two versions show almost the same performance in general cases.
>>> But if the value to be filled is zero, the stub code's performance is
>>> much better. This makes sense as AArch64 uses cache maintenance
>>> instructions (DC ZVA) to zero large blocks in the hand-crafted
>>> assembly. Below are JMH scores on AArch64.
>>>
>>> Before:
>>> Benchmark                     Mode  Cnt      Score     Error  Units
>>> TestArrayFill.fillByteArray   avgt   25   2078.700 ±   7.719  ns/op
>>> TestArrayFill.fillIntArray    avgt   25  12371.497 ± 566.773  ns/op
>>> TestArrayFill.fillShortArray  avgt   25   4132.439 ±  25.096  ns/op
>>> TestArrayFill.zeroByteArray   avgt   25   2080.313 ±   7.516  ns/op
>>> TestArrayFill.zeroIntArray    avgt   25  10961.331 ± 527.750  ns/op
>>> TestArrayFill.zeroShortArray  avgt   25   4126.386 ±  20.997  ns/op
>>>
>>> After:
>>> Benchmark                     Mode  Cnt      Score     Error  Units
>>> TestArrayFill.fillByteArray   avgt   25   2080.382 ±   2.103  ns/op
>>> TestArrayFill.fillIntArray    avgt   25  11997.621 ± 569.058  ns/op
>>> TestArrayFill.fillShortArray  avgt   25   4309.035 ± 285.456  ns/op
>>> TestArrayFill.zeroByteArray   avgt   25    903.434 ±  10.944  ns/op
>>> TestArrayFill.zeroIntArray    avgt   25   8141.533 ± 946.341  ns/op
>>> TestArrayFill.zeroShortArray  avgt   25   1784.124 ±  24.618  ns/op
>>>
>>> Another advantage of using the stub routine is that the generated code
>>> size is reduced.
>>>
>>> Jtreg hotspot::hotspot_all_no_apps, jdk::jdk_core, langtools::tier1
>>> are tested and no new failure is found.
>> >> Thanks, >> Pengfei > From rwestrel at redhat.com Tue Jun 23 08:57:39 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 23 Jun 2020 10:57:39 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> Message-ID: <87wo3yupks.fsf@redhat.com> > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/yangfei/openjdk-jdk/src/hotspot/share/opto/phaseX.cpp:1135), pid=10598, tid=10613 > # assert(false) failed: infinite loop in PhaseIterGVN::optimize For known instances, MemNode::optimize_memory_chain() clones the bottom phi into a new phi that your patch then clones to create a bottom phi. diff --git a/src/hotspot/share/opto/cfgnode.cpp b/src/hotspot/share/opto/cfgnode.cpp --- a/src/hotspot/share/opto/cfgnode.cpp +++ b/src/hotspot/share/opto/cfgnode.cpp @@ -1337,7 +1337,7 @@ // Looking for phis with identical inputs. If we find one that has // type TypePtr::BOTTOM, replace the current phi with the bottom phi. - if (phase->is_IterGVN() && type() == Type::MEMORY && adr_type() != TypePtr::BOTTOM) { + if (phase->is_IterGVN() && type() == Type::MEMORY && adr_type() != TypePtr::BOTTOM && !adr_type()->is_known_instance()) { uint phi_len = req(); Node* phi_reg = region(); for (DUIterator_Fast imax, i = phi_reg->fast_outs(imax); i < imax; i++) { fixes it. Roland. 
From adinn at redhat.com Tue Jun 23 09:37:13 2020 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 23 Jun 2020 10:37:13 +0100 Subject: RFR(XS): 8247979: aarch64: missing side effect of killing flags for clearArray_reg_reg In-Reply-To: References: Message-ID: On 23/06/2020 01:42, Yangfei (Felix) wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8247979 > Webrev: http://cr.openjdk.java.net/~fyang/8247979/webrev.00 > > For clearArray_reg_reg in aarch64.ad, we call function: MacroAssembler::zero words(Register ptr, Register cnt). > This function modifies the flags register by doing a cmp instruction at entry. But this is not reflected on the side effect of clearArray_reg_reg. > We didn't see this is triggering problems. But this may pose similar risk as bug: 8224828: aarch64: rflags is not correct after safepoint poll. > Tier1-3 tested on aarch64-linux-gnu. OK? Nice catch, Felix. The patch looks good to me. regards, Andrew Dinn ----------- From rwestrel at redhat.com Tue Jun 23 15:41:36 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 23 Jun 2020 17:41:36 +0200 Subject: [11u] RFR: 8240676: Meet not symmetric failure when running lucene on jdk8 Message-ID: <87tuz1vlfz.fsf@redhat.com> Original bug: https://bugs.openjdk.java.net/browse/JDK-8240676 https://hg.openjdk.java.net/jdk/jdk/rev/6ccf082f50d4 The 11u patch is unchanged but context in compile.hpp changed so the original patch requires a small adjustment. 11u webrev: http://cr.openjdk.java.net/~roland/8240676-11u/webrev.00/ Testing: x86_64, verified new test fails with the fix commented out, works otherwise, tier1 Roland. 
From vladimir.kozlov at oracle.com Tue Jun 23 16:08:57 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 23 Jun 2020 09:08:57 -0700 Subject: [11u] RFR: 8240676: Meet not symmetric failure when running lucene on jdk8 In-Reply-To: <87tuz1vlfz.fsf@redhat.com> References: <87tuz1vlfz.fsf@redhat.com> Message-ID: <68dd5de1-1bc3-b04c-6101-e022c21d3581@oracle.com> Hi Roland, You miss new test you had in original changes. Otherwise it is good. Thanks, Vladimir On 6/23/20 8:41 AM, Roland Westrelin wrote: > > Original bug: > https://bugs.openjdk.java.net/browse/JDK-8240676 > https://hg.openjdk.java.net/jdk/jdk/rev/6ccf082f50d4 > > The 11u patch is unchanged but context in compile.hpp changed so the > original patch requires a small adjustment. > > 11u webrev: > http://cr.openjdk.java.net/~roland/8240676-11u/webrev.00/ > > Testing: x86_64, verified new test fails with the fix commented out, > works otherwise, tier1 > > Roland. > From hohensee at amazon.com Tue Jun 23 17:35:12 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Tue, 23 Jun 2020 17:35:12 +0000 Subject: RFR[M]: 8151779: Some intrinsic flags could be replaced with one general flag In-Reply-To: <1bd5bfea-d19c-bff5-66c7-c8dfdfb953a0@amazon.com> References: <19CD3956-4DC6-4908-8626-27D48A9AB4A4@amazon.com> <0EDAAC88-E5D9-424F-A19E-5E20C689C2F3@amazon.com> <801D878C-CAE5-4EBE-8AFE-4E35346CD5BD@amazon.com> <58ff5b66-1dce-d4ad-8f21-254abd1b887b@oracle.com> <65dcfd1f-5e7e-b9e1-8298-5daafcda8a81@oracle.com> <1EBE66E6-9AA7-4EC5-9B91-45F884071FAC@amazon.com> <2982174F-DBB6-4316-93C3-1B4DFDF34C88@amazon.com> <0365691c-5f80-9a3a-e47f-9852ef66f217@oracle.com> <1bd5bfea-d19c-bff5-66c7-c8dfdfb953a0@amazon.com> Message-ID: Looks good. Thanks, Paul ?On 6/17/20, 1:04 AM, "hotspot-compiler-dev on behalf of Liu, Xin" wrote: Hi, Nils, Thank you to review it. I finally get gtest fixed on my side. I verify that webrev again. It still can apply to TIP cleanly and pass both gtest:all and hotspot:tier1. 
Call for another reviewer to approve it. http://cr.openjdk.java.net/~xliu/8151779/05/webrev/ I think of your request and I've filed a follow-up issue: JDK-8247732. I plan to use a constraint of globals.hpp and install the constraint function to jvmFlagContraintsCompiler.hpp. for directives, I will add validator in DirectiveSet::init_control_intrinsic. thanks, --lx On 6/11/20 2:24 PM, Nils Eliasson wrote: > CAUTION: This email originated from outside of the organization. Do > not click links or open attachments unless you can confirm the sender > and know the content is safe. > > > > Hi Xin, > > In general I think the patch looks good. > > I am missing strict name checking. (I want to see an error on startup if > the user has specified unknown intrinsic names.) I see that the lazy > initialization of the intrinsic name tables might make it non-trivial to > find a good place to do that. I am ok if you follow up on that in a > future patch. > > Best regards, > Nils Eliasson > > >> Incremental diff: http://cr.openjdk.java.net/~xliu/8151779/r4_to_r5.diff >> >> I verified it in submit repo a week ago. I also double-check the >> patch still can patch to TIP and pass both hotspot:tier1 and gtest:all. >> >> Here is log message I got from mach-5. >> Job: mach5-one-phh-JDK-8151779-1-20200513-1821-11015755 >> >> BuildId: 2020-05-13-1820211.hohensee.source >> >> No failed tests >> >> Tasks Summary >> >> EXECUTED_WITH_FAILURE: 0 >> NOTHING_TO_RUN: 0 >> KILLED: 0 >> HARNESS_ERROR: 0 >> FAILED: 0 >> PASSED: 101 >> UNABLE_TO_RUN: 0 >> NA: 0 >> >> >> Thanks, >> --lx >> >> >> >> On 5/13/20, 12:03 AM, "hotspot-compiler-dev on behalf of Liu, Xin" >> > xxinliu at amazon.com> wrote: >> >> Hi, Vladimir, >> >> > 2. add +/- UseCRC32Intrinsics to IntrinsicAvailableTest.java >> > The purpose of that test is not to generate a CRC32 intrinsic. >> Its purpose is to check if compilers determine to intrinsify >> _updateCRC32 or not. 
>> > Mathematically, "UseCRC32Intrinsics" is a set = [_updateCRC32, >> _updateBytesCRC32, _updateByteBufferCRC32]. >> > "-XX:-UseCRC32Intrinsics" disables all 3 of them. If users use >> -XX:ControlIntrinsic=+_updateCRC32 and -XX:-UseCRC32Intrinsics, >> _updateCRC32 should be enabled individually. >> >> No, I think we should preserve current behavior when >> UseCRC32Intrinsics is off then all corresponding intrinsics are >> also should be off. This is the purpose of such flags - to be >> able control several intrinsics with one flag. >> Otherwise you have to check each individual intrinsic if CPU >> does not support them. Even if code for some of these >> intrinsics can be generated on this CPU. We should be >> consistent, otherwise code can become very complex to support. >> ---- >> If -XX:ControlIntrinsic=+_updateBytesCRC32 can't win over >> -XX:-UseCRC32Intrinsics, it will come back the justification of >> JBS-8151779: >> Why do we need to support the usage >> -XX:ControlIntrinsic=+_updateBytesCRC32? If a user doesn't set >> +updateBytesCRC32, it's still enabled. >> >> I read the description of "JBS-8235981" and "JBS-8151779" again. >> I try to understand in this way. The option 'UseCRC32Intrinsics' is >> the consolidation of 3 real intrinsics [_updateCRC32, >> _updateBytesCRC32, _updateByteBufferCRC32]. It represents some sorta >> hardware capabilities to make those intrinsics optimal. If >> UseCRC32Intrinsics is OFF, it will not make sense to intrinsify them >> anymore because inliner can deliver the similar result. >> >> Quote from JBS-8235981 "Right now, there's no way to introduce >> experimental intrinsics which are turned off by default and let users >> enable them on their side. " >> Currently, once a user declares one new intrinsics in >> VM_INTRINSICS_DO, it's enabled. It might not be true in the future. >> i.e. A develop can declare an intrinsic but mark it turn-off by >> default. 
He will try it out by -XX:ControlIntrinsic=+_myNewIntrinsic >> in his development stage. >> >> Do I catch up your intention this time? if yes, could you take a >> look at this new revision? I think I meet the requirement. >> Webrev: http://cr.openjdk.java.net/~xliu/8151779/05/webrev/ >> Incremental diff: >> http://cr.openjdk.java.net/~xliu/8151779/r4_to_r5.diff >> >> Here is the change log from rev04. >> 1) An intrinsic is enabled if and only if neither >> ControlIntrinsic nor the corresponding UseXXXIntrinsics disables it. >> The implementation is still in >> vmIntrinsics::is_disabled_by_flags(vmIntrinsics::ID id). >> >> 2) I introduce a compact data structure TriBoolArray. It >> compresses an array of Tribool. Each tribool only takes 2 bits now. >> I also took Coleen's suggestion to put TriBool and TriBoolArray >> in a standalone file "utilities/tribool.hpp". A new gtest is attached. >> >> 3) Correct some typos. Thank you David pointed them out. >> >> Thanks, >> --lx >> >> >> On 5/12/20, 12:59 AM, "David Holmes" >> wrote: >> >> CAUTION: This email originated from outside of the >> organization. Do not click links or open attachments unless you can >> confirm the sender and know the content is safe. >> >> >> >> Hi, >> >> Sorry for the delay in getting back to this. >> >> On 5/05/2020 7:37 pm, Liu, Xin wrote: >> > Hello, David and Nils >> > >> > Thank you to review the patch. I went to brush up my >> English grammar and then update my patch to rev04. >> > https://cr.openjdk.java.net/~xliu/8151779/04/webrev/ >> > Here is the incremental diff: >> https://cr.openjdk.java.net/~xliu/8151779/r3_to_r4.diff It reflect >> changes based on David's feedbacks. I really appreciate that you >> review so carefully and found so many invaluable suggestions. TBH, I >> don't understand Amazon's copyright header neither. I choose the >> simple way to dodge that problem. >> >> In vmSymbols.hpp >> >> + // 1. Disable/Control Intrinsic accept a list of >> intrinsic IDs. 
>> >> s/accept/accepts/ >> >> + // their final value are subject to hardware inspection >> (VM_Version::initialize). >> >> s/value/values/ >> >> Otherwise all my nits have been addressed - thanks. >> >> I don't need to see a further webrev. >> >> Thanks, >> David >> ----- >> >> > Nils points out a very tricky question. Yes, I also notice >> that each TriBool takes 4 bytes on x86_64. It's a natural machine >> word and supposed to be the most efficient form. As a result, the >> vector control_words take about 1.3Kb for all intrinsics. I thought >> it's not a big deal, but Nils brought up that each DirectiveSet will >> increase from 128b to 1440b. Theoretically, the user may provide a >> CompileCommandFile which consists of hundreds of directives. Will >> hotspot have hundreds of DirectiveSet in that case? >> > >> > Actually, I do have a compacted container of TriBool. It's >> like a vector specialization. >> > https://cr.openjdk.java.net/~xliu/8151779/TriBool.cpp >> > >> > The reason I didn't include it because I still feel that a >> few KiloBytes memories are not a big deal. Nowadays, hotspot allows >> Java programmers allocate over 100G heap. Is it wise to increase >> software complexity to save KBs? >> > >> > If you think it matters, I can integrate it. May I update >> TriBoolArray in a standalone JBS? I have made a lot of changes. I >> hope I can verify them using KitchenSink? >> > >> > For the second problem, I think it's because I used >> 'memset' to initialize an array of objects in rev01. Previously, I >> had code like this: >> > memset(&_intrinsic_control_words[0], 0, >> sizeof(_intrinsic_control_words)); >> > >> > This kind of usage will be warned as >> -Werror=class-memaccess in g++-8. I have fixed it since rev02. I use >> DirectiveSet::fill_in(). Please check out. 
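
The space argument in this thread (one machine word per TriBool versus 2 bits per entry in a packed TriBoolArray) can be pictured with a small sketch. It is written in Java purely for illustration; the real change is C++ in HotSpot (`utilities/tribool.hpp` per the thread), and the class name `PackedTriBoolArray`, the bit encoding, and the methods below are assumptions made for this example, not the code from the webrev.

```java
// Illustrative only: a packed array of three-state values, 2 bits per entry.
class PackedTriBoolArray {
    // Assumed encoding: high bit = "explicitly set", low bit = the value.
    static final int DEFAULT = 0; // not set by the user
    static final int FALSE = 2;   // explicitly disabled
    static final int TRUE = 3;    // explicitly enabled

    private static final int ENTRIES_PER_WORD = 16; // 32 bits / 2 bits per entry

    private final int[] words;
    private final int size;

    PackedTriBoolArray(int size) {
        this.size = size;
        this.words = new int[(size + ENTRIES_PER_WORD - 1) / ENTRIES_PER_WORD];
    }

    int get(int index) {
        checkIndex(index);
        int shift = (index % ENTRIES_PER_WORD) * 2;
        return (words[index / ENTRIES_PER_WORD] >>> shift) & 0x3;
    }

    void set(int index, int state) {
        checkIndex(index);
        if (state != DEFAULT && state != FALSE && state != TRUE) {
            throw new IllegalArgumentException("unknown state: " + state);
        }
        int shift = (index % ENTRIES_PER_WORD) * 2;
        int word = index / ENTRIES_PER_WORD;
        // Clear the two bits of this entry, then store the new state.
        words[word] = (words[word] & ~(0x3 << shift)) | (state << shift);
    }

    int sizeInBytes() {
        return words.length * Integer.BYTES;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }

    public static void main(String[] args) {
        int entries = 320; // arbitrary count chosen for the example
        PackedTriBoolArray flags = new PackedTriBoolArray(entries);
        flags.set(7, TRUE);
        flags.set(8, FALSE);
        System.out.println("entry 7 = " + flags.get(7) + ", entry 8 = " + flags.get(8)
                + ", entry 9 = " + flags.get(9));
        System.out.println("packed: " + flags.sizeInBytes() + " bytes, one int per entry: "
                + entries * Integer.BYTES + " bytes");
    }
}
```

The trade-off raised in the thread is visible here: the packed form needs shifts and masks on every access in exchange for a 16x smaller footprint than one 32-bit word per entry.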
>> > >> > Thanks, >> > --lx >> > >> >> > From tom.rodriguez at oracle.com Tue Jun 23 20:46:23 2020 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 23 Jun 2020 13:46:23 -0700 Subject: [15] RFR 8247246: Add explicit ResolvedJavaType.link and expose presence of default methods Message-ID: http://cr.openjdk.java.net/~never/8247246/webrev https://bugs.openjdk.java.net/browse/JDK-8247246 This adds a couple operations the JVMCI that are necessary for interacting with classes that aren't linked. ResolvedJavaType.link explicitly attempts linking if the current class isn't linked. ResolvedJavaType.getDeclaredMethods and getDeclaredConstructors both force linking as part of their implementation though this is more historical than necessary though there is code which relies on this behavioiur to force linking. Additionally there's currently no way of identifying class and interface which default other than by visiting all the methods of the clases, so hasDefaultMethods and declaresDefaultMethods have been added to expose this without forcing linking. Mach5 testing is clean and includes unit tests which exercise the new API. From vladimir.kozlov at oracle.com Tue Jun 23 22:02:53 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 23 Jun 2020 15:02:53 -0700 Subject: [15] RFR 8247246: Add explicit ResolvedJavaType.link and expose presence of default methods In-Reply-To: References: Message-ID: Looks good. Thanks, Vladimir On 6/23/20 1:46 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/8247246/webrev > https://bugs.openjdk.java.net/browse/JDK-8247246 > > This adds a couple operations the JVMCI that are necessary for interacting with classes that aren't linked. > ResolvedJavaType.link explicitly attempts linking if the current class isn't linked. 
> ResolvedJavaType.getDeclaredMethods and getDeclaredConstructors both force linking as part of their
> implementation; this is more historical than necessary, though there is code which relies on this
> behaviour to force linking. Additionally, there's currently no way of identifying classes and
> interfaces which have default methods other than by visiting all the methods of the classes, so
> hasDefaultMethods and declaresDefaultMethods have been added to expose this without forcing linking.
> Mach5 testing is clean and includes unit tests which exercise the new API.

From felix.yang at huawei.com Wed Jun 24 00:18:37 2020
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Wed, 24 Jun 2020 00:18:37 +0000
Subject: RFR(XS): 8247979: aarch64: missing side effect of killing flags for clearArray_reg_reg
In-Reply-To:
References:
Message-ID:

Hi Andrew,

> -----Original Message-----
> From: Andrew Dinn [mailto:adinn at redhat.com]
> Sent: Tuesday, June 23, 2020 5:37 PM
> To: Yangfei (Felix) ; hotspot-compiler-dev at openjdk.java.net
> Cc: aarch64-port-dev at openjdk.java.net
> Subject: Re: RFR(XS): 8247979: aarch64: missing side effect of killing flags for clearArray_reg_reg
>
> On 23/06/2020 01:42, Yangfei (Felix) wrote:
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8247979
> > Webrev: http://cr.openjdk.java.net/~fyang/8247979/webrev.00
> >
> > For clearArray_reg_reg in aarch64.ad, we call function: MacroAssembler::zero_words(Register ptr, Register cnt).
> > This function modifies the flags register by doing a cmp instruction at entry. But this is not reflected in the side effects of clearArray_reg_reg.
> > We didn't see this triggering problems. But this may pose a similar risk as bug 8224828: aarch64: rflags is not correct after safepoint poll.
> > Tier1-3 tested on aarch64-linux-gnu. OK?
> Nice catch, Felix. The patch looks good to me.

Thanks for the fast review.
Pushed as: http://hg.openjdk.java.net/jdk/jdk/rev/9fce19fdda7e Felix From cjashfor at linux.ibm.com Wed Jun 24 00:40:03 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Tue, 23 Jun 2020 17:40:03 -0700 Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding Message-ID: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for encodeBlock, but none for decoding. This means that only encoding gets acceleration from the underlying CPU's vector hardware. I'd like to propose adding a new intrinsic for decodeBlock. The considerations I have for this new intrinsic's API: * Don't make any assumptions about the underlying capability of the hardware. For example, do not impose any specific block size granularity. * Don't assume the underlying intrinsic can handle isMIME or isURL modes, but also let them decide if they will process the data regardless of the settings of the two booleans. * Any remaining data that is not processed by the intrinsic will be processed by the pure Java implementation. This allows the intrinsic to process whatever block sizes it's good at without the complexity of handling the end fragments. * If any illegal character is discovered in the decoding process, the intrinsic will simply return -1, instead of requiring it to throw a proper exception from the context of the intrinsic. In the event of getting a -1 returned from the intrinsic, the Java Base64 library code simply calls the pure Java implementation to have it find the error and properly throw an exception. This is a performance trade-off in the case of an error (which I expect to be very rare). 
* One thought I have for a further optimization (not implemented in the current patch), is that when the intrinsic decides not to process a block because of some combination of isURL and isMIME settings it doesn't handle, it could return extra bits in the return code, encoded as a negative number. For example: Illegal_Base64_char = 0b001; isMIME_unsupported = 0b010; isURL_unsupported = 0b100; These can be OR'd together as needed and then negated (flip the sign). The Base64 library code could then cache these flags, so it will know not to call the intrinsic again when another decodeBlock is requested but with an unsupported mode. This will save the performance hit of calling the intrinsic when it is guaranteed to fail. I've tested the attached patch with an actual intrinsic coded up for Power9/Power10, but those runtime intrinsics and arch-specific patches aren't attached today. I want to get some consensus on the library-level intrinsic API first. Also attached is a simple test case to test that the new intrinsic API doesn't break anything. I'm open to any comments about this. Thanks for your consideration, - Corey Corey Ashford IBM Systems, Linux Technology Center, OpenJDK team cjashfor at us dot ibm dot com -------------- next part -------------- A non-text attachment was scrubbed... Name: decodeBlock_api-20200623.patch Type: text/x-patch Size: 3953 bytes Desc: not available URL: From HORIE at jp.ibm.com Wed Jun 24 01:23:43 2020 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Wed, 24 Jun 2020 10:23:43 +0900 Subject: RFR(S): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: Hi Corey, Following is the issue I created. https://bugs.openjdk.java.net/browse/JDK-8248188 I will upload a webrev when you're ready as we talked in private. 
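The return-code contract proposed in Corey's message above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: `fakeIntrinsic` is a hypothetical stand-in (the patch's actual decodeBlock signature is not shown in this thread), the flag constants are the example values from the proposal and are not part of the current patch, and `java.util.Base64` plays the role of the pure Java implementation.

```java
import java.util.Arrays;
import java.util.Base64;

final class DecodeBlockSketch {
    // Example flag values taken from the proposal; hypothetical, not in the patch.
    static final int ILLEGAL_BASE64_CHAR = 0b001;
    static final int IS_MIME_UNSUPPORTED = 0b010;
    static final int IS_URL_UNSUPPORTED  = 0b100;

    /** OR the flags together, then negate, so any failure is a negative code. */
    static int failureCode(int flags) {
        return -flags;
    }

    /** Recovers the flags from a return code; 0 means the call did not fail. */
    static int failureFlags(int rc) {
        return rc < 0 ? -rc : 0;
    }

    /**
     * Hypothetical stand-in for an intrinsic. It handles only the basic
     * alphabet and only whole 4-character groups, and it leaves the final
     * group (which may carry padding) to the caller. Returns the number of
     * bytes written to dst, or a negative failure code.
     */
    static int fakeIntrinsic(byte[] src, int sp, int sl, byte[] dst, int dp,
                             boolean isURL, boolean isMIME) {
        int unsupported = (isURL ? IS_URL_UNSUPPORTED : 0)
                        | (isMIME ? IS_MIME_UNSUPPORTED : 0);
        if (unsupported != 0) {
            return failureCode(unsupported);
        }
        int groups = Math.max(0, (sl - sp) / 4 - 1);
        if (groups == 0) {
            return 0; // nothing processed; the Java code does all the work
        }
        try {
            byte[] out = Base64.getDecoder()
                               .decode(Arrays.copyOfRange(src, sp, sp + groups * 4));
            System.arraycopy(out, 0, dst, dp, out.length);
            return out.length;
        } catch (IllegalArgumentException e) {
            return failureCode(ILLEGAL_BASE64_CHAR);
        }
    }

    /** Library-side control flow: try the intrinsic, then let Java finish or redo. */
    static byte[] decode(byte[] src) {
        byte[] dst = new byte[(src.length / 4 + 1) * 3];
        int written = fakeIntrinsic(src, 0, src.length, dst, 0, false, false);
        if (written < 0) {
            // Re-run in pure Java so that it finds the error and throws.
            return Base64.getDecoder().decode(src);
        }
        int consumed = written / 3 * 4; // 4 input characters per 3 output bytes
        byte[] tail = Base64.getDecoder()
                            .decode(Arrays.copyOfRange(src, consumed, src.length));
        System.arraycopy(tail, 0, dst, written, tail.length);
        return Arrays.copyOf(dst, written + tail.length);
    }
}
```

A caller that wanted the caching idea from the last bullet could keep the value of `failureFlags(rc)` from a failed call and skip the intrinsic the next time the same unsupported mode is requested.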
Best regards, Michihiro From: "Corey Ashford" To: "hotspot-compiler-dev at openjdk.java.net" , "ppc-aix-port-dev at openjdk.java.net" Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori Ogata/Japan/IBM at IBMJP, joserz at br.ibm.com Date: 2020/06/24 09:40 Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for encodeBlock, but none for decoding. This means that only encoding gets acceleration from the underlying CPU's vector hardware. I'd like to propose adding a new intrinsic for decodeBlock. The considerations I have for this new intrinsic's API: * Don't make any assumptions about the underlying capability of the hardware. For example, do not impose any specific block size granularity. * Don't assume the underlying intrinsic can handle isMIME or isURL modes, but also let them decide if they will process the data regardless of the settings of the two booleans. * Any remaining data that is not processed by the intrinsic will be processed by the pure Java implementation. This allows the intrinsic to process whatever block sizes it's good at without the complexity of handling the end fragments. * If any illegal character is discovered in the decoding process, the intrinsic will simply return -1, instead of requiring it to throw a proper exception from the context of the intrinsic. In the event of getting a -1 returned from the intrinsic, the Java Base64 library code simply calls the pure Java implementation to have it find the error and properly throw an exception. This is a performance trade-off in the case of an error (which I expect to be very rare). * One thought I have for a further optimization (not implemented in the current patch), is that when the intrinsic decides not to process a block because of some combination of isURL and isMIME settings it doesn't handle, it could return extra bits in the return code, encoded as a negative number. 
For example: Illegal_Base64_char = 0b001; isMIME_unsupported = 0b010; isURL_unsupported = 0b100; These can be OR'd together as needed and then negated (flip the sign). The Base64 library code could then cache these flags, so it will know not to call the intrinsic again when another decodeBlock is requested but with an unsupported mode. This will save the performance hit of calling the intrinsic when it is guaranteed to fail. I've tested the attached patch with an actual intrinsic coded up for Power9/Power10, but those runtime intrinsics and arch-specific patches aren't attached today. I want to get some consensus on the library-level intrinsic API first. Also attached is a simple test case to test that the new intrinsic API doesn't break anything. I'm open to any comments about this. Thanks for your consideration, - Corey Corey Ashford IBM Systems, Linux Technology Center, OpenJDK team cjashfor at us dot ibm dot com [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro Horie/Japan/IBM] From felix.yang at huawei.com Wed Jun 24 03:24:22 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 24 Jun 2020 03:24:22 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: <87wo3yupks.fsf@redhat.com> References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> Message-ID: Hi, > -----Original Message----- > From: Roland Westrelin [mailto:rwestrel at redhat.com] > Sent: Tuesday, June 23, 2020 4:58 PM > To: Yangfei (Felix) ; Tobias Hartmann > ; hotspot-compiler-dev at openjdk.java.net > Cc: guoge (A) ; zhouyong (V) > > Subject: RE: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > > > # A fatal error has been detected by the Java Runtime Environment: > > # 
> > # Internal Error
> > (/home/yangfei/openjdk-jdk/src/hotspot/share/opto/phaseX.cpp:1135),
> > pid=10598, tid=10613 # assert(false) failed: infinite loop in
> > PhaseIterGVN::optimize
>
> For known instances, MemNode::optimize_memory_chain() clones the
> bottom phi into a new phi that your patch then clones to create a bottom phi.
>
> diff --git a/src/hotspot/share/opto/cfgnode.cpp b/src/hotspot/share/opto/cfgnode.cpp
> --- a/src/hotspot/share/opto/cfgnode.cpp
> +++ b/src/hotspot/share/opto/cfgnode.cpp
> @@ -1337,7 +1337,7 @@
>
>    // Looking for phis with identical inputs. If we find one that has
>    // type TypePtr::BOTTOM, replace the current phi with the bottom phi.
> -  if (phase->is_IterGVN() && type() == Type::MEMORY && adr_type() != TypePtr::BOTTOM) {
> +  if (phase->is_IterGVN() && type() == Type::MEMORY && adr_type() != TypePtr::BOTTOM && !adr_type()->is_known_instance()) {
>      uint phi_len = req();
>      Node* phi_reg = region();
>      for (DUIterator_Fast imax, i = phi_reg->fast_outs(imax); i < imax; i++) {
>
> fixes it.

Thanks, Roland. I updated accordingly, new webrev: http://cr.openjdk.java.net/~fyang/8243670/webrev.02

Tier1-3 tested with fastdebug builds on both x86_64-linux-gnu and aarch64-linux-gnu.
The newly added test fails without the fix and passes with it.
Also committed to the submit repo, and the test results look good:

Job: mach5-one-fyang-JDK-8243670-2-20200623-1609-12020284
BuildId: 2020-06-23-1608238.felix.yang.source
No failed tests

Tasks Summary
* EXECUTED_WITH_FAILURE: 0
* KILLED: 0
* HARNESS_ERROR: 0
* PASSED: 103
* UNABLE_TO_RUN: 0
* NOTHING_TO_RUN: 0
* FAILED: 0
* NA: 0

Shall I do the push?
Felix From felix.yang at huawei.com Wed Jun 24 03:49:38 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 24 Jun 2020 03:49:38 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: <1017e092-691e-9b76-ff0b-71d5523d4b29@oracle.com> References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <82c1f599-255a-cee0-0818-a10d79315b4e@oracle.com> <1017e092-691e-9b76-ff0b-71d5523d4b29@oracle.com> Message-ID: Hi Tobias, > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Tuesday, June 23, 2020 3:10 PM > To: Yangfei (Felix) ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > > On 19.06.20 09:49, Tobias Hartmann wrote: > > Let me run this through some performance testing, to see if it makes a > difference. > > Performance testing did not show any significant difference but I would still > prefer the solution suggested by Roland. Thanks for the effort :-) The latest patch is: http://cr.openjdk.java.net/~fyang/8243670/webrev.02 Please also take a look. I guess we might need another round of performance testing for this patch? Felix From Pengfei.Li at arm.com Wed Jun 24 04:21:46 2020 From: Pengfei.Li at arm.com (Pengfei Li) Date: Wed, 24 Jun 2020 04:21:46 +0000 Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called In-Reply-To: <589465dd-88e7-0de3-a2c4-6a2a0f56b8bb@oracle.com> References: <589465dd-88e7-0de3-a2c4-6a2a0f56b8bb@oracle.com> Message-ID: Hi Tobias, Thanks for review comments. > > http://cr.openjdk.java.net/~pli/rfr/8247307/webrev.00/ > > I don't understand how the change in loopTransform.cpp is supposed to > work. Shouldn't we bail out if use != polladr? In the original code, it's supposed that no node in the loop can be used by nodes outside the loop in the array fill pattern. 
But that's not true for CountedLoopNode in strip-mined counted loops. So my change just skips this use. It's a good idea to bail out if use != polladr. I changed the if condition to an assertion and ran the jtreg cases I tested before. There's no new failure so I think there's no other use cases. > Also please add a comment to vm_version_x86.cpp explaining why this > optimization is currently disabled. Done. > Could you add the JMH benchmark to test/micro/org/openjdk/bench/? Done and uploaded a new webrev: http://cr.openjdk.java.net/~pli/rfr/8247307/webrev.01/ Please let me know if I didn't explain that clearly. -- Thanks, Pengfei From rwestrel at redhat.com Wed Jun 24 07:07:07 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 24 Jun 2020 09:07:07 +0200 Subject: [11u] RFR: 8240676: Meet not symmetric failure when running lucene on jdk8 In-Reply-To: <68dd5de1-1bc3-b04c-6101-e022c21d3581@oracle.com> References: <87tuz1vlfz.fsf@redhat.com> <68dd5de1-1bc3-b04c-6101-e022c21d3581@oracle.com> Message-ID: <87r1u5uelg.fsf@redhat.com> Hi Vladimir, > You miss new test you had in original changes. Otherwise it is good. Thanks for reviewing this! Right. Good catch. Here with the test: http://cr.openjdk.java.net/~roland/8240676-11u/webrev.01/ Roland. From vladimir.kozlov at oracle.com Wed Jun 24 15:50:13 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 24 Jun 2020 08:50:13 -0700 Subject: [11u] RFR: 8240676: Meet not symmetric failure when running lucene on jdk8 In-Reply-To: <87r1u5uelg.fsf@redhat.com> References: <87tuz1vlfz.fsf@redhat.com> <68dd5de1-1bc3-b04c-6101-e022c21d3581@oracle.com> <87r1u5uelg.fsf@redhat.com> Message-ID: Good. Thanks, Vladimir On 6/24/20 12:07 AM, Roland Westrelin wrote: > > Hi Vladimir, > >> You miss new test you had in original changes. Otherwise it is good. > > Thanks for reviewing this! Right. Good catch. Here with the test: > > http://cr.openjdk.java.net/~roland/8240676-11u/webrev.01/ > > Roland. 
>

From tom.rodriguez at oracle.com Wed Jun 24 17:43:12 2020
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 24 Jun 2020 10:43:12 -0700
Subject: [15] RFR 8247246: Add explicit ResolvedJavaType.link and expose presence of default methods
In-Reply-To:
References:
Message-ID: <834111d5-e4fc-e2ab-630f-837e731a56f2@oracle.com>

Thanks!

tom

Vladimir Kozlov wrote on 6/23/20 3:02 PM:
> Looks good.
>
> Thanks,
> Vladimir
>
> On 6/23/20 1:46 PM, Tom Rodriguez wrote:
>> http://cr.openjdk.java.net/~never/8247246/webrev
>> https://bugs.openjdk.java.net/browse/JDK-8247246
>>
>> This adds a couple of operations to the JVMCI that are necessary for
>> interacting with classes that aren't linked. ResolvedJavaType.link
>> explicitly attempts linking if the current class isn't linked.
>> ResolvedJavaType.getDeclaredMethods and getDeclaredConstructors both
>> force linking as part of their implementation; this is more
>> historical than necessary, though there is code which relies on this
>> behaviour to force linking. Additionally, there's currently no way of
>> identifying classes and interfaces which have default methods other
>> than by visiting all the methods of the classes, so hasDefaultMethods
>> and declaresDefaultMethods have been added to expose this without
>> forcing linking. Mach5 testing is clean and includes unit tests which
>> exercise the new API.
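
The approach of visiting all the methods, which the RFR above wants to avoid, can be illustrated with plain reflection. The sketch below is not JVMCI code: it uses `java.lang.reflect` as a stand-in, the class and interface names are hypothetical, and the meaning given to the two queries (declared by the type itself versus declared anywhere in its supertypes) is an assumption based only on their names.

```java
import java.lang.reflect.Method;

final class DefaultMethodScanSketch {
    // Hypothetical sample types used only for the demonstration.
    interface Greeter { default String greet() { return "hi"; } }
    interface Plain { void run(); }
    static final class Impl implements Greeter { }

    /** True if the type is an interface that itself declares a default method. */
    static boolean declaresDefaultMethods(Class<?> type) {
        if (!type.isInterface()) {
            return false;
        }
        for (Method m : type.getDeclaredMethods()) { // the "visit every method" scan
            if (m.isDefault()) {
                return true;
            }
        }
        return false;
    }

    /** True if the type or any of its supertypes declares a default method. */
    static boolean hasDefaultMethods(Class<?> type) {
        if (type == null) {
            return false;
        }
        if (declaresDefaultMethods(type)) {
            return true;
        }
        for (Class<?> iface : type.getInterfaces()) {
            if (hasDefaultMethods(iface)) {
                return true;
            }
        }
        return hasDefaultMethods(type.getSuperclass());
    }
}
```

Here `Impl` declares no default method itself but inherits one from `Greeter`, which is the distinction the two method names suggest.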
From sandhya.viswanathan at intel.com Wed Jun 24 19:03:52 2020 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 24 Jun 2020 19:03:52 +0000 Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes In-Reply-To: References: <275eb57c-51c0-675e-c32a-91b198023559@redhat.com> <719F9169-ABC4-408E-B732-F1BD9A84337F@oracle.com> <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com> <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com> <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com> <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com> Message-ID: Hi Andrew/Yang, We couldn?t propose Vector API to target in time for JDK 15 and hoping to do so early in JDK 16 timeframe. The implementation reviews on other components have made good progress. We have so far ok to PPT from (runtime, shared compiler changes, x86 backend). Java API implementation review is in progress. I wanted to check with you both if we have a go ahead from aarch64 backed point of view. Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of Yang Zhang Sent: Tuesday, May 26, 2020 7:59 PM To: Andrew Haley ; Paul Sandoz Cc: nd ; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: RE: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes > But to my earlier question. please: can the new instructions be moved into jdk head first, and then merged into the Panama branch, or not? The new instructions can be classified as: 1. Instructions that can be matched with NEON instructions directly. MulVB and SqrtVF have been merged into jdk master already. The patch of AbsV is in review [1]. 2. Instructions that Jdk master has middle end support for, but they cannot be matched with NEON instructions directly. 
Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These new instructions can be moved into jdk master first, but for auto-vectorization, the performance might not get improved. May I have a new patch for these? 3. Panama/Vector API specific instructions Such as Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend etc. These instructions cannot be moved into jdk master first because there isn't middle-end support. Regards Yang [1] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-May/008861.html -----Original Message----- From: Andrew Haley Sent: Tuesday, May 26, 2020 4:25 PM To: Yang Zhang ; Paul Sandoz Cc: hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; nd Subject: Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes On 25/05/2020 09:26, Yang Zhang wrote: > In jdk master, what we need to do is that writing m4 file for existing > vector instructions and placed them to a new file aarch64_neon.ad. > If no question, I will do it right away. I'm not entirely sure that such a change is necessary now. In particular, reorganizing the existing vector instructions is IMO excessive, but I admit that it might be an improvement. But to my earlier question. please: can the new instructions be moved into jdk head first, and then merged into the Panama branch, or not? It'd help if this was possible. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Thu Jun 25 13:08:04 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 25 Jun 2020 15:08:04 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> Message-ID: Hi Felix, On 24.06.20 05:24, Yangfei (Felix) wrote: > I updated accordingly, new webrev: http://cr.openjdk.java.net/~fyang/8243670/webrev.02 Looks good to me but isn't the MergeMemNode::Ideal transformation useless now? Thanks, Tobias From tobias.hartmann at oracle.com Thu Jun 25 13:11:11 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 25 Jun 2020 15:11:11 +0200 Subject: RFR(S): 8247307: C2: Loop array fill stub routines are not called In-Reply-To: References: <589465dd-88e7-0de3-a2c4-6a2a0f56b8bb@oracle.com> Message-ID: Hi Pengfei, On 24.06.20 06:21, Pengfei Li wrote: > In the original code, it's supposed that no node in the loop can be used by nodes outside the loop in the array fill pattern. But that's not true for CountedLoopNode in strip-mined counted loops. So my change just skips this use. It's a good idea to bail out if use != polladr. I changed the if condition to an assertion and ran the jtreg cases I tested before. There's no new failure so I think there's no other use cases. Right, an assert is better. > Done and uploaded a new webrev: http://cr.openjdk.java.net/~pli/rfr/8247307/webrev.01/ Looks good. Best regards, Tobias From daniel.daugherty at oracle.com Thu Jun 25 16:18:55 2020 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty)
Date: Thu, 25 Jun 2020 12:18:55 -0400
Subject: RFR(T): 8248335: ProblemList compiler/ciReplay/TestServerVM.java and TestVMNoCompLevel.java with AOT
Message-ID:

Greetings,

I'm doing another round of reduce-the-noise in the CI in preparation for the upcoming weekend... So I have another trivial review...

Here's the bug for the failures:

    JDK-8248265 compiler/ciReplay fail in tier6-comp-aot
    https://bugs.openjdk.java.net/browse/JDK-8248265

and here's the bug for the ProblemListing:

    JDK-8248335 ProblemList compiler/ciReplay/TestServerVM.java and TestVMNoCompLevel.java with AOT
    https://bugs.openjdk.java.net/browse/JDK-8248335

Here's the context diff:

$ hg diff
diff -r c2e9eadd464c test/hotspot/jtreg/ProblemList-aot.txt
--- a/test/hotspot/jtreg/ProblemList-aot.txt    Thu Jun 25 08:36:59 2020 -0700
+++ b/test/hotspot/jtreg/ProblemList-aot.txt    Thu Jun 25 12:11:40 2020 -0400
@@ -85,5 +85,8 @@
 compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java 8167430 generic-all
 compiler/intrinsics/sha/sanity/TestSHA512MultiBlockIntrinsics.java 8167430 generic-all
 
+compiler/ciReplay/TestServerVM.java      8248265 generic-all
+compiler/ciReplay/TestVMNoCompLevel.java 8248265 generic-all
+
 vmTestbase/vm/mlvm/indy/stress/java/relinkMutableCallSiteFreq/Test.java 8226689 generic-all
 vmTestbase/vm/mlvm/indy/stress/java/relinkVolatileCallSiteFreq/Test.java 8226689 generic-all

Thanks, in advance, for any comments, questions or suggestions.

Dan

From aph at redhat.com Thu Jun 25 16:31:40 2020
From: aph at redhat.com (Andrew Haley)
Date: Thu, 25 Jun 2020 17:31:40 +0100
Subject: 8248336: AArch64: C2: offset overflow in BoxLockNode::emit
Message-ID: <3fa560fa-c1fd-0131-10d2-040bac25b7f7@redhat.com>

BoxLockNode::emit only allows a 12-bit offset from register SP to the stack slot that contains the inflated lock. Rather amazingly we've never seen this fail in production, but in theory a BoxLockNode can be anywhere in the stack frame.
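
A 12-bit unsigned offset covers only 0..4095 bytes, which is why a lock slot far from SP cannot be reached in one step. The Java sketch below merely illustrates that range check and one conceivable way of splitting a larger offset; it is not taken from any patch in this thread, and the class and method names are hypothetical. It assumes the usual AArch64 ADD (immediate) form, whose 12-bit immediate may optionally be shifted left by 12 bits.

```java
final class Imm12Sketch {
    static final int IMM12_MAX = (1 << 12) - 1; // 4095

    /** True if the offset fits an unshifted 12-bit unsigned immediate. */
    static boolean fitsImm12(int offset) {
        return offset >= 0 && offset <= IMM12_MAX;
    }

    /**
     * Splits an offset in [0, 2^24) into {high, low}: high is a multiple of
     * 4096 (usable as an immediate shifted left by 12) and low fits 12 bits.
     */
    static int[] split(int offset) {
        if (offset < 0 || offset >= (1 << 24)) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return new int[] { offset & ~IMM12_MAX, offset & IMM12_MAX };
    }
}
```

For example, an offset of 5000 bytes does not fit directly, but it splits into 4096 plus 904, each of which is encodable on its own.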
I have once seen this fail in test code, but it is very hard to reproduce. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Jun 25 16:48:59 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 25 Jun 2020 17:48:59 +0100 Subject: 8248336: AArch64: C2: offset overflow in BoxLockNode::emit In-Reply-To: <3fa560fa-c1fd-0131-10d2-040bac25b7f7@redhat.com> References: <3fa560fa-c1fd-0131-10d2-040bac25b7f7@redhat.com> Message-ID: <2db7b669-63b6-1dbd-6d7a-7bac55144167@redhat.com> On 25/06/2020 17:31, Andrew Haley wrote: > BoxLockNode::emit only allows a 12-bit offset from register SP to the > stack slot that contains the inflated lock. Rather amazingly we've > never seen this fail in production, but in theory a BoxLockNode can be > anywhere in the stack frame. > > I have once seen this fail in test code, but it is very hard to > reproduce. http://cr.openjdk.java.net/~aph/8248336/ -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.kozlov at oracle.com Thu Jun 25 16:58:22 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 25 Jun 2020 09:58:22 -0700 Subject: RFR(T): 8248335: ProblemList compiler/ciReplay/TestServerVM.java and TestVMNoCompLevel.java with AOT In-Reply-To: References: Message-ID: <3ca7069c-3c26-edac-569c-555f5130ff34@oracle.com> Good. Trivial. Thanks, Vladimir On 6/25/20 9:18 AM, Daniel D. Daugherty wrote: > Greetings, > > I'm doing another round of reduce-the-noise in the CI in preparation > for the upcoming weekend... So I have another trivial review... > > Here's the bug for the failures: > > ??? JDK-8248265 compiler/ciReplay fail in tier6-comp-aot > ??? https://bugs.openjdk.java.net/browse/JDK-8248265 > > and here's the bug for the ProblemListing: > > ??? 
JDK-8248335 ProblemList compiler/ciReplay/TestServerVM.java and TestVMNoCompLevel.java with AOT > ??? https://bugs.openjdk.java.net/browse/JDK-8248335 > > Here's the context diff: > > $ hg diff > diff -r c2e9eadd464c test/hotspot/jtreg/ProblemList-aot.txt > --- a/test/hotspot/jtreg/ProblemList-aot.txt??? Thu Jun 25 08:36:59 2020 -0700 > +++ b/test/hotspot/jtreg/ProblemList-aot.txt??? Thu Jun 25 12:11:40 2020 -0400 > @@ -85,5 +85,8 @@ > ?compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java 8167430 generic-all > ?compiler/intrinsics/sha/sanity/TestSHA512MultiBlockIntrinsics.java 8167430 generic-all > > +compiler/ciReplay/TestServerVM.java????? 8248265 generic-all > +compiler/ciReplay/TestVMNoCompLevel.java 8248265 generic-all > + > ?vmTestbase/vm/mlvm/indy/stress/java/relinkMutableCallSiteFreq/Test.java 8226689 generic-all > ?vmTestbase/vm/mlvm/indy/stress/java/relinkVolatileCallSiteFreq/Test.java 8226689 generic-all > > > Thanks, in advance, for any comments, questions or suggestions. > > Dan > From daniel.daugherty at oracle.com Thu Jun 25 16:59:10 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 25 Jun 2020 12:59:10 -0400 Subject: RFR(T): 8248335: ProblemList compiler/ciReplay/TestServerVM.java and TestVMNoCompLevel.java with AOT In-Reply-To: <3ca7069c-3c26-edac-569c-555f5130ff34@oracle.com> References: <3ca7069c-3c26-edac-569c-555f5130ff34@oracle.com> Message-ID: Thanks for the fast review! Dan On 6/25/20 12:58 PM, Vladimir Kozlov wrote: > Good. Trivial. > > Thanks, > Vladimir > > On 6/25/20 9:18 AM, Daniel D. Daugherty wrote: >> Greetings, >> >> I'm doing another round of reduce-the-noise in the CI in preparation >> for the upcoming weekend... So I have another trivial review... >> >> Here's the bug for the failures: >> >> ???? JDK-8248265 compiler/ciReplay fail in tier6-comp-aot >> ???? https://bugs.openjdk.java.net/browse/JDK-8248265 >> >> and here's the bug for the ProblemListing: >> >> ???? 
JDK-8248335 ProblemList compiler/ciReplay/TestServerVM.java and >> TestVMNoCompLevel.java with AOT >> ???? https://bugs.openjdk.java.net/browse/JDK-8248335 >> >> Here's the context diff: >> >> $ hg diff >> diff -r c2e9eadd464c test/hotspot/jtreg/ProblemList-aot.txt >> --- a/test/hotspot/jtreg/ProblemList-aot.txt??? Thu Jun 25 08:36:59 >> 2020 -0700 >> +++ b/test/hotspot/jtreg/ProblemList-aot.txt??? Thu Jun 25 12:11:40 >> 2020 -0400 >> @@ -85,5 +85,8 @@ >> ??compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java >> 8167430 generic-all >> ??compiler/intrinsics/sha/sanity/TestSHA512MultiBlockIntrinsics.java >> 8167430 generic-all >> >> +compiler/ciReplay/TestServerVM.java????? 8248265 generic-all >> +compiler/ciReplay/TestVMNoCompLevel.java 8248265 generic-all >> + >> ??vmTestbase/vm/mlvm/indy/stress/java/relinkMutableCallSiteFreq/Test.java >> 8226689 generic-all >> ??vmTestbase/vm/mlvm/indy/stress/java/relinkVolatileCallSiteFreq/Test.java >> 8226689 generic-all >> >> >> Thanks, in advance, for any comments, questions or suggestions. >> >> Dan >> From boris.ulasevich at bell-sw.com Thu Jun 25 17:34:33 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Thu, 25 Jun 2020 20:34:33 +0300 Subject: RFR: C2: Canonicalize (x & 16 == 16) [Was: [aarch64-port-dev ] AARCH64 optimization: using TBZ instruction for bit check] In-Reply-To: <709f87e9-9c4f-b9ad-5246-90c9c92c5e6b@oracle.com> References: <9b399c9f-cfc1-7fc4-70ee-536a83e5afa7@redhat.com> <7bea8a5c-312b-b4f3-d545-f83652b73150@bell-sw.com> <14d3f034-b637-f74a-7567-d4e589260887@redhat.com> <709f87e9-9c4f-b9ad-5246-90c9c92c5e6b@oracle.com> Message-ID: <76f3f585-feaf-e606-a35a-8b7d85d2ec9f@bell-sw.com> Vladimir, thank you! I think I need one more review. Can I ask someone else to have a look? Thanks, Boris On 22.06.2020 18:48, Vladimir Kozlov wrote: > On 6/22/20 7:45 AM, Boris Ulasevich wrote: >> Hi Vladimir, >> >> ?> Would be nice to know if any Java benchmark is affected. 
>>
>> With the change we have got a 5% performance boost on the lucene tokenizer
>> method on ARM64. At the same time, on x86 there is no visible improvement on
>> the lucene tokenizer.
>
> Good.
>
> I ran our benchmarks (mostly jvm2008) on x86 and don't see any effects
> either.
>
> Thanks,
> Vladimir
>
>>
>> thanks,
>> Boris
>>
>> import org.apache.lucene.analysis.standard.StandardTokenizerImpl;
>> import java.nio.file.Files;
>> import java.io.*;
>>
>> class Test {
>>   public static void main(String args[]) {
>>     long count = 0;
>>     try {
>>       byte[] content = Files.readAllBytes(new File("aarch64.ad").toPath());
>>       for (int i=0; i < 1000; i++) {
>>         Reader reader = new InputStreamReader(new ByteArrayInputStream(content));
>>         StandardTokenizerImpl sti = new StandardTokenizerImpl(reader);
>>         while (sti.getNextToken() != -1) {
>>           count ++;
>>         }
>>       }
>>     } catch (Exception ex) { System.out.println(ex); }
>>     System.out.println(count);
>>   }
>> }
>>
>>
>> On 19.06.2020 21:36, Vladimir Kozlov wrote:
>>> Nice optimization.
>>>
>>> I don't think we should turn it off on any machine. In a real
>>> application you will not see such tight loops with only such a branch.
>>> On the other hand, reducing code size should help in all cases.
>>>
>>> Would be nice to know if any Java benchmark is affected.
>>>
>>> I will try to run our set of benchmarks with these changes.
>>>
>>> Regards,
>>> Vladimir K
>>>
>>> On 6/19/20 10:07 AM, Andrew Haley wrote:
>>>> Hi,
>>>>
>>>> On 19/06/2020 17:49, Boris Ulasevich wrote:
>>>>> I added the expression canonicalization in the BoolNode::Ideal
>>>>> method:
>>>>> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.02b
>>>>>
>>>>> The change reduces the number of generated machine instructions on all
>>>>> ARM/x86/PPC architectures. Benchmark shows positive results on
>>>>> ARM64 and
>>>>> ARM32 with the given change.
>>>>> >>>>> On x86 benchmark performance improves from +1% to +13% depending >>>>> on the >>>>> CPU generation, except of machines affected by Intel Erratum >>>>> (JDK-8234160) >>>>> issue. Maximum decrease observed is -%11. It does not look like a >>>>> problem >>>>> with the proposed benchmark though, but rather like an issue with >>>>> Erratum mitigation. >>>>> >>>>> On PowerPC result of the micro-benchmark is also positive. I >>>>> changed the >>>>> micro-benchmark to make it a little bulkier so that we don't hit the >>>>> limitations of architectures with a less elaborate branch prediction >>>>> mechanism. The original application performance does not change on >>>>> PowerPC. >>>> >>>> Fantastic work, thanks! You've done a remarkably thorough job. It's >>>> slightly unfortunate that one of the targets regresses. If there had >>>> been no regressions, I'd approve this straight away. >>>> >>>> Forwarding to hotspot-compiler-dev for more comments. >>>> >>>> VladimirK, what do you think? I guess we could turn this off on the >>>> machines affected by JDK-8234160. Should we? >>>> >> From vladimir.kozlov at oracle.com Thu Jun 25 22:47:07 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 25 Jun 2020 15:47:07 -0700 Subject: [16] RFR(XS) 8248347: windows build broken by JDK-8243114 Message-ID: <4b005ed7-1878-0319-0e8a-091d9eafc68c@oracle.com> Resending to groups mailing lists since hotspot-dev is behaving strangely today. https://bugs.openjdk.java.net/browse/JDK-8248347 I used ULLONG_MAX instead of -1ULL [1]. I ran tier1. Tests still run on MacOS. Linux and Win are finished. Builds passed on all our supported x64 OSs: Linux, Win, MacOS. Tier1 runs MontgomeryMultiplyTest.java test - it passed on Win. 
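
The assertion touched by this fix states that `inv` is the negated inverse of the lowest word of the modulus, modulo 2^64: the product must have all 64 bits set, which is the value that both `-1ULL` and `ULLONG_MAX` denote when `unsigned long long` is 64 bits wide. The thread does not say why the original spelling broke the Windows build. The Java sketch below only illustrates the arithmetic; `negInverse` is a hypothetical helper, not HotSpot code, and Java's `long` stands in for `julong`.

```java
final class MontgomeryInverseSketch {
    /**
     * Returns inv such that inv * n0 == -1 (mod 2^64). n0 must be odd,
     * otherwise no inverse exists.
     */
    static long negInverse(long n0) {
        if ((n0 & 1) == 0) {
            throw new IllegalArgumentException("n0 must be odd");
        }
        long x = n0;            // n0 * n0 == 1 (mod 8): correct to 3 bits
        for (int i = 0; i < 5; i++) {
            x *= 2 - n0 * x;    // Newton step: doubles the number of correct low bits
        }
        return -x;              // x is the inverse, so -x is the negated inverse
    }

    /** The check from the assertion: all 64 bits of inv * n0 are set. */
    static boolean isValidInverse(long inv, long n0) {
        return inv * n0 == -1L; // -1L has the same bit pattern as ULLONG_MAX
    }
}
```

In Java the all-ones pattern prints as 18446744073709551615 through `Long.toUnsignedString(-1L)`, which matches the unsigned 64-bit maximum.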

Thanks,
Vladimir

[1] fix

diff -r 315169f1f73a src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
--- a/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
+++ b/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
@@ -3770,7 +3770,7 @@
   julong t0 = 0, t1 = 0, t2 = 0; // Triple-precision accumulator
   int i;

-  assert(inv * n[0] == -1ULL, "broken inverse in Montgomery multiply");
+  assert(inv * n[0] == ULLONG_MAX, "broken inverse in Montgomery multiply");

   for (i = 0; i < len; i++) {
     int j;
@@ -3812,7 +3812,7 @@
   julong t0 = 0, t1 = 0, t2 = 0; // Triple-precision accumulator
   int i;

-  assert(inv * n[0] == -1ULL, "broken inverse in Montgomery square");
+  assert(inv * n[0] == ULLONG_MAX, "broken inverse in Montgomery square");

   for (i = 0; i < len; i++) {
     int j;

From daniel.daugherty at oracle.com Thu Jun 25 22:54:15 2020
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 25 Jun 2020 18:54:15 -0400
Subject: [16] RFR(XS) 8248347: windows build broken by JDK-8243114
In-Reply-To: <4b005ed7-1878-0319-0e8a-091d9eafc68c@oracle.com>
References: <4b005ed7-1878-0319-0e8a-091d9eafc68c@oracle.com>
Message-ID: <3252dbaa-145f-0acc-5c23-46212332da14@oracle.com>

Thumbs up! This is a trivial fix and does not need to wait
24 hours.

Dan


On 6/25/20 6:47 PM, Vladimir Kozlov wrote:
> Resending to groups mailing lists since hotspot-dev is behaving
> strangely today.
>
> https://bugs.openjdk.java.net/browse/JDK-8248347
>
> I used ULLONG_MAX instead of -1ULL [1].
>
> I ran tier1. Tests still run on MacOS. Linux and Win are finished.
> Builds passed on all our supported x64 OSs: Linux, Win, MacOS.
> Tier1 runs MontgomeryMultiplyTest.java test - it passed on Win.
>
> Thanks,
> Vladimir
>
> [1] fix
>
> diff -r 315169f1f73a src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
> --- a/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
> +++ b/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
> @@ -3770,7 +3770,7 @@
>    julong t0 = 0, t1 = 0, t2 = 0; // Triple-precision accumulator
>    int i;
>
> -  assert(inv * n[0] == -1ULL, "broken inverse in Montgomery multiply");
> +  assert(inv * n[0] == ULLONG_MAX, "broken inverse in Montgomery multiply");
>
>    for (i = 0; i < len; i++) {
>      int j;
> @@ -3812,7 +3812,7 @@
>    julong t0 = 0, t1 = 0, t2 = 0; // Triple-precision accumulator
>    int i;
>
> -  assert(inv * n[0] == -1ULL, "broken inverse in Montgomery square");
> +  assert(inv * n[0] == ULLONG_MAX, "broken inverse in Montgomery square");
>
>    for (i = 0; i < len; i++) {
>      int j;

From david.holmes at oracle.com Thu Jun 25 22:56:40 2020
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 26 Jun 2020 08:56:40 +1000
Subject: [16] RFR(XS) 8248347: windows build broken by JDK-8243114
In-Reply-To: <4b005ed7-1878-0319-0e8a-091d9eafc68c@oracle.com>
References: <4b005ed7-1878-0319-0e8a-091d9eafc68c@oracle.com>
Message-ID:

Hi Vladimir,

Looks good and trivial.

Thanks,
David

On 26/06/2020 8:47 am, Vladimir Kozlov wrote:
> Resending to groups mailing lists since hotspot-dev is behaving
> strangely today.
>
> https://bugs.openjdk.java.net/browse/JDK-8248347
>
> I used ULLONG_MAX instead of -1ULL [1].
>
> I ran tier1. Tests still run on MacOS. Linux and Win are finished.
> Builds passed on all our supported x64 OSs: Linux, Win, MacOS.
> Tier1 runs MontgomeryMultiplyTest.java test - it passed on Win.
>
> Thanks,
> Vladimir
>
> [1] fix
>
> diff -r 315169f1f73a src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
> --- a/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
> +++ b/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
> @@ -3770,7 +3770,7 @@
>    julong t0 = 0, t1 = 0, t2 = 0; // Triple-precision accumulator
>    int i;
>
> -  assert(inv * n[0] == -1ULL, "broken inverse in Montgomery multiply");
> +  assert(inv * n[0] == ULLONG_MAX, "broken inverse in Montgomery multiply");
>
>    for (i = 0; i < len; i++) {
>      int j;
> @@ -3812,7 +3812,7 @@
>    julong t0 = 0, t1 = 0, t2 = 0; // Triple-precision accumulator
>    int i;
>
> -  assert(inv * n[0] == -1ULL, "broken inverse in Montgomery square");
> +  assert(inv * n[0] == ULLONG_MAX, "broken inverse in Montgomery square");
>
>    for (i = 0; i < len; i++) {
>      int j;

From vladimir.kozlov at oracle.com Thu Jun 25 23:02:12 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 25 Jun 2020 16:02:12 -0700
Subject: [16] RFR(XS) 8248347: windows build broken by JDK-8243114
In-Reply-To:
References: <4b005ed7-1878-0319-0e8a-091d9eafc68c@oracle.com>
Message-ID:

Thank you, Dan and David.

Pushed.

Vladimir K

On 6/25/20 3:56 PM, David Holmes wrote:
> Hi Vladimir,
>
> Looks good and trivial.
>
> Thanks,
> David
>
> On 26/06/2020 8:47 am, Vladimir Kozlov wrote:
>> Resending to groups mailing lists since hotspot-dev is behaving strangely today.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8248347
>>
>> I used ULLONG_MAX instead of -1ULL [1].
>>
>> I ran tier1. Tests still run on MacOS. Linux and Win are finished.
>> Builds passed on all our supported x64 OSs: Linux, Win, MacOS.
>> Tier1 runs MontgomeryMultiplyTest.java test - it passed on Win.
>>
>> Thanks,
>> Vladimir
>>
>> [1] fix
>>
>> diff -r 315169f1f73a src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
>> --- a/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
>> +++ b/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
>> @@ -3770,7 +3770,7 @@
>>    julong t0 = 0, t1 = 0, t2 = 0; // Triple-precision accumulator
>>    int i;
>>
>> -  assert(inv * n[0] == -1ULL, "broken inverse in Montgomery multiply");
>> +  assert(inv * n[0] == ULLONG_MAX, "broken inverse in Montgomery multiply");
>>
>>    for (i = 0; i < len; i++) {
>>      int j;
>> @@ -3812,7 +3812,7 @@
>>    julong t0 = 0, t1 = 0, t2 = 0; // Triple-precision accumulator
>>    int i;
>>
>> -  assert(inv * n[0] == -1ULL, "broken inverse in Montgomery square");
>> +  assert(inv * n[0] == ULLONG_MAX, "broken inverse in Montgomery square");
>>
>>    for (i = 0; i < len; i++) {
>>      int j;

From xxinliu at amazon.com Fri Jun 26 02:15:15 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Fri, 26 Jun 2020 02:15:15 +0000
Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic
Message-ID: <59806033-5c7c-e7b6-b755-8b49ca99b1dc@amazon.com>

hi, Reviewers,

Could you review this patch?
bug: https://bugs.openjdk.java.net/browse/JDK-8247732
webrev: http://cr.openjdk.java.net/~xliu/8247732/00/webrev/

The core logic is class ControlIntrinsicValidator in compilerDirectives.hpp.
It iterates the ccstrlist option and makes sure user-input intrinsic ids are all valid.
It stops and takes a record when it meets the first unrecognized intrinsic.

I used constraints to validate the global options ControlIntrinsic and DisableIntrinsic.
ControlIntrinsic/DisableIntrinsic in compiler directives are more complex. The matched
directive is only parsed when hotspot attempts to compile the corresponding method.
I validate at that time, and the JVM will crash if it does not meet the guarantee() statement.

I added Method::external_name_short(), which returns a shorter method name in the form of
"class::method". HotSpot probably already has similar code, but I failed to find it.
Please let me know and I will remove mine.

Test: passed hotspot:tier1 and gtest:all

Wrong cases and error messages can be found here:
https://bugs.openjdk.java.net/browse/JDK-8247732?focusedCommentId=14349960&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14349960

thanks,
--lx

From adinn at redhat.com Fri Jun 26 08:25:29 2020
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 26 Jun 2020 09:25:29 +0100
Subject: RFR: C2: Canonicalize (x & 16 == 16) [Was: [aarch64-port-dev ] AARCH64 optimization: using TBZ instruction for bit check]
In-Reply-To: <76f3f585-feaf-e606-a35a-8b7d85d2ec9f@bell-sw.com>
References: <9b399c9f-cfc1-7fc4-70ee-536a83e5afa7@redhat.com>
 <7bea8a5c-312b-b4f3-d545-f83652b73150@bell-sw.com>
 <14d3f034-b637-f74a-7567-d4e589260887@redhat.com>
 <709f87e9-9c4f-b9ad-5246-90c9c92c5e6b@oracle.com>
 <76f3f585-feaf-e606-a35a-8b7d85d2ec9f@bell-sw.com>
Message-ID: <94b49d5e-ee73-64e6-289f-ccb90ef2a47b@redhat.com>

Hi Boris,

On 25/06/2020 18:34, Boris Ulasevich wrote:
> Vladimir, thank you!
>
> I think I need one more review. Can I ask someone else to have a look?
Yes, that looks good.

regards,


Andrew Dinn
-----------
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

From boris.ulasevich at bell-sw.com Fri Jun 26 08:34:33 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Fri, 26 Jun 2020 11:34:33 +0300
Subject: RFR: C2: Canonicalize (x & 16 == 16) [Was: [aarch64-port-dev ] AARCH64 optimization: using TBZ instruction for bit check]
In-Reply-To: <94b49d5e-ee73-64e6-289f-ccb90ef2a47b@redhat.com>
References: <9b399c9f-cfc1-7fc4-70ee-536a83e5afa7@redhat.com>
 <7bea8a5c-312b-b4f3-d545-f83652b73150@bell-sw.com>
 <14d3f034-b637-f74a-7567-d4e589260887@redhat.com>
 <709f87e9-9c4f-b9ad-5246-90c9c92c5e6b@oracle.com>
 <76f3f585-feaf-e606-a35a-8b7d85d2ec9f@bell-sw.com>
 <94b49d5e-ee73-64e6-289f-ccb90ef2a47b@redhat.com>
Message-ID:

Hi Andrew,

Thanks for the review.

Boris

On 26.06.2020 11:25, Andrew Dinn wrote:
> Hi Boris,
>
> On 25/06/2020 18:34, Boris Ulasevich wrote:
>> Vladimir, thank you!
>>
>> I think I need one more review. Can I ask someone else to have a look?
> Yes, that looks good.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Red Hat Distinguished Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill
>

From tobias.hartmann at oracle.com Fri Jun 26 10:14:22 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 26 Jun 2020 12:14:22 +0200
Subject: [15] RFR(T): 8248265: compiler/ciReplay tests fail with AOT compiled java.base
Message-ID:

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8248265
http://cr.openjdk.java.net/~thartmann/8248265/webrev.00/

With AOT compiled java.base, we call the test method EmptyMain::main through
JavaCalls::call_helper(). The fix for JDK-8247832 restored the old behavior that empty methods are
not called: https://hg.openjdk.java.net/jdk/jdk15/rev/94025f9e6a0d#l2.36

As a result, EmptyMain::main is not called and therefore also not compiled.
The test fails because -XX:CICrashAt=1 triggers no VM crash. Empty methods not being called
is expected behavior. The test should use a non-empty method to trigger compilation.

Thanks,
Tobias

From tobias.hartmann at oracle.com Fri Jun 26 11:40:45 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 26 Jun 2020 13:40:45 +0200
Subject: [16] RFR(S): 8248234: Disabling UseExactTypes crashes C2
Message-ID: <52c6dc64-14ab-1553-975a-a170aa16d577@oracle.com>

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8248234
http://cr.openjdk.java.net/~thartmann/8248234/webrev.00/

Turning off UseExactTypes triggers all kinds of asserts in C2. Since the flag is also completely
untested and obviously hasn't been used in many years, I propose to remove it.

Thanks,
Tobias

From tobias.hartmann at oracle.com Fri Jun 26 12:10:52 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 26 Jun 2020 14:10:52 +0200
Subject: [15] RFR(S): 8247832: [Graal] Many Javafuzzer tests failures with Graal, due to unexpected results, after last update JDK-8243380
Message-ID: <81203e4a-7ec0-0608-88db-1b44a1fcb69b@oracle.com>

Hi,

please review the following patch that cherry-picks the Graal fix [1] for JDK 15 because the issue
causes massive failure in Fuzzer testing:
https://bugs.openjdk.java.net/browse/JDK-8247832
http://cr.openjdk.java.net/~thartmann/8247832/webrev.00/

Thanks,
Tobias

[1] https://github.com/oracle/graal/commit/3eec42fde4bbdc2c4c36f1dcd44098e968c4feac

From rwestrel at redhat.com Fri Jun 26 12:21:04 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 26 Jun 2020 14:21:04 +0200
Subject: [15] RFR(S): 8247832: [Graal] Many Javafuzzer tests failures with Graal, due to unexpected results, after last update JDK-8243380
In-Reply-To: <81203e4a-7ec0-0608-88db-1b44a1fcb69b@oracle.com>
References: <81203e4a-7ec0-0608-88db-1b44a1fcb69b@oracle.com>
Message-ID: <87ftaiuifj.fsf@redhat.com>

> http://cr.openjdk.java.net/~thartmann/8247832/webrev.00/

That looks good to me.

Roland.

From nils.eliasson at oracle.com Fri Jun 26 13:14:32 2020
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 26 Jun 2020 15:14:32 +0200
Subject: [16] RFR(S): 8248398: Add diagnostic RepeatCompilation utility
Message-ID: <7f8b9237-f31e-d14f-1b9a-e37fd50c22b1@oracle.com>

Hi,

This is a diagnostic utility that was requested by Claes to enable
better profiling of the compilers.

This patch introduces the diagnostic flag RepeatCompilation.

RepeatCompilation holds the number of times the compilation gets repeated
without having the code installed. RepeatCompilation = 0 is the default
and means that only the regular compilation is done. RepeatCompilation =
100 means that an extra 100 compilations are done but without installing
the code.

I have tried keeping the change small and non-intrusive, contained to
the CompileBroker (except the boolean for disabling code install that
is passed to the compilers).

RepeatCompilation works as a flag: "-XX:RepeatCompilation=100", a
compile command:
"-XX:CompileCommand=option,*::toString,intx,RepeatCompilation,100"
and a compiler directive: "RepeatCompilation : 100".

Bug: https://bugs.openjdk.java.net/browse/JDK-8248398
Webrev: http://cr.openjdk.java.net/~neliasso/8248398/webrev.04/

Please review!

Best regards,
Nils Eliasson

From claes.redestad at oracle.com Fri Jun 26 13:18:03 2020
From: claes.redestad at oracle.com (Claes Redestad)
Date: Fri, 26 Jun 2020 15:18:03 +0200
Subject: [16] RFR(S): 8248234: Disabling UseExactTypes crashes C2
In-Reply-To: <52c6dc64-14ab-1553-975a-a170aa16d577@oracle.com>
References: <52c6dc64-14ab-1553-975a-a170aa16d577@oracle.com>
Message-ID: <4e9caefb-10f7-3a74-64cc-4fef8f773200@oracle.com>

Looks good to me!

/Claes

On 2020-06-26 13:40, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8248234
> http://cr.openjdk.java.net/~thartmann/8248234/webrev.00/
>
> Turning off UseExactTypes triggers all kinds of asserts in C2. Since the flag is also completely
> untested and obviously hasn't been used in many years, I propose to remove it.
>
> Thanks,
> Tobias
>

From bob.vandette at oracle.com Fri Jun 26 13:52:54 2020
From: bob.vandette at oracle.com (Bob Vandette)
Date: Fri, 26 Jun 2020 09:52:54 -0400
Subject: RFR: 8248410 - Correct Fix for 8236647: java/lang/invoke/CallSiteTest.java failed with InvocationTargetException in Graal mode
Message-ID:

The fix for "8236647: java/lang/invoke/CallSiteTest.java failed with InvocationTargetException
in Graal mode" added an inner class which causes problems when generating GraalVM's
libjvmcicompiler.so library.

This fix removes the inner class addition and matches the implementation that is in the
GraalVM's labsjdk sources.

Here's the proposed fix:

diff --git a/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotObjectConstantImpl.java b/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotObjectConstantImpl.java
--- a/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotObjectConstantImpl.java
+++ b/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotObjectConstantImpl.java
@@ -65,25 +65,21 @@
     @Override
     public abstract int getIdentityHashCode();

-    static class Fields {
-        // Initializing these too early causes a hang, so do it here in a subclass
-        static final HotSpotResolvedJavaField callSiteTargetField = HotSpotMethodHandleAccessProvider.Internals.instance().callSiteTargetField;
-        static final HotSpotResolvedJavaField constantCallSiteFrozenField = HotSpotMethodHandleAccessProvider.Internals.instance().constantCallSiteFrozenField;
-    }
-
     private boolean isFullyInitializedConstantCallSite() {
         if (!runtime().getConstantCallSite().isInstance(this)) {
             return false;
         }
         // read ConstantCallSite.isFrozen as a volatile field
-        boolean isFrozen = readFieldValue(Fields.constantCallSiteFrozenField, true /* volatile */).asBoolean();
+        HotSpotResolvedJavaField field = HotSpotMethodHandleAccessProvider.Internals.instance().constantCallSiteFrozenField;
+        boolean isFrozen = readFieldValue(field, true /* volatile */).asBoolean();
         // isFrozen true implies fully-initialized
         return isFrozen;
     }

     private HotSpotObjectConstantImpl readTarget() {
         // read CallSite.target as a volatile field
-        return (HotSpotObjectConstantImpl) readFieldValue(Fields.callSiteTargetField, true /* volatile */);
+        HotSpotResolvedJavaField field = HotSpotMethodHandleAccessProvider.Internals.instance().callSiteTargetField;
+        return (HotSpotObjectConstantImpl) readFieldValue(field, true /* volatile */);
     }

Bob.
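
For readers less familiar with the pattern being removed in the diff above, here is a minimal, hedged sketch of the two shapes in plain Java. It does not use any JVMCI types: HolderVsDirect, lookup(), Holder and the counter are hypothetical stand-ins invented for illustration, with lookup() playing the role of something like Internals.instance().someField. The first variant defers the lookup into a nested class that is initialized on first use; the second has no nested class and repeats the lookup at each call site, which is the shape the proposed fix moves to.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HolderVsDirect {
    // Counts how often the (hypothetical) lookup runs.
    static final AtomicInteger lookups = new AtomicInteger();

    // Hypothetical stand-in for resolving a field descriptor.
    static String lookup() {
        lookups.incrementAndGet();
        return "field";
    }

    // Variant 1: nested holder class. Holder is initialized on the first
    // read of Holder.VALUE, so lookup() runs at most once.
    static class Holder {
        static final String VALUE = lookup();
    }

    static String viaHolder() {
        return Holder.VALUE;
    }

    // Variant 2: no nested class; the lookup is repeated at every use.
    static String viaDirect() {
        return lookup();
    }

    public static void main(String[] args) {
        viaHolder();
        viaHolder();
        System.out.println("lookups after two holder reads: " + lookups.get());
        viaDirect();
        viaDirect();
        System.out.println("lookups after two direct reads: " + lookups.get());
    }
}
```

The trade-off in this sketch is one extra class versus a repeated lookup per call; whether that matters for the real JVMCI code is not something this example can show.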

From patric.hedlin at oracle.com Fri Jun 26 14:15:16 2020
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Fri, 26 Jun 2020 16:15:16 +0200
Subject: RFR(S): 8234605: C2 failed "assert(C->live_nodes() - live_at_begin <= 2 * _nodes_required) failed: Bad node estimate: actual = 208 >> request = 101"
Message-ID: <4a0ca6a4-7143-7c70-5e84-22244d3b56c1@oracle.com>

Dear all,

I would like to ask for help to review the following change/update:

Issue:  https://bugs.openjdk.java.net/browse/JDK-8234605
Webrev: http://cr.openjdk.java.net/~phedlin/tr8234605/


Turning assert into (universal) logging. This issue (now) manifests
itself very seldom (typically on small estimates) and has thus served
its purpose to help trim the node budget estimates to some reasonable
level. Logging should be sufficient going forward.


Testing: tier1-2


Best regards,
Patric

From nils.eliasson at oracle.com Fri Jun 26 14:25:14 2020
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 26 Jun 2020 16:25:14 +0200
Subject: [15] RFR(T): 8248265: compiler/ciReplay tests fail with AOT compiled java.base
In-Reply-To:
References:
Message-ID: <713c2fb1-623e-68f5-c204-25d68bf749bb@oracle.com>

Hi Tobias,

Looks good.

Best regards,
Nils Eliasson

On 2020-06-26 12:14, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8248265
> http://cr.openjdk.java.net/~thartmann/8248265/webrev.00/
>
> With AOT compiled java.base, we call the test method EmptyMain::main through
> JavaCalls::call_helper(). The fix for JDK-8247832 restored the old behavior that empty methods are
> not called: https://hg.openjdk.java.net/jdk/jdk15/rev/94025f9e6a0d#l2.36
>
> As a result, EmptyMain::main is not called and therefore also not compiled. The test fails because
> -XX:CICrashAt=1 triggers no VM crash. Empty methods not being called is expected behavior. The test
> should use a non-empty method to trigger compilation.
>
> Thanks,
> Tobias

From nils.eliasson at oracle.com Fri Jun 26 14:27:01 2020
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 26 Jun 2020 16:27:01 +0200
Subject: [16] RFR(S): 8248234: Disabling UseExactTypes crashes C2
In-Reply-To: <4e9caefb-10f7-3a74-64cc-4fef8f773200@oracle.com>
References: <52c6dc64-14ab-1553-975a-a170aa16d577@oracle.com>
 <4e9caefb-10f7-3a74-64cc-4fef8f773200@oracle.com>
Message-ID: <31152d54-7266-24d7-629d-dab4915e570e@oracle.com>

+1

// Nils

On 2020-06-26 15:18, Claes Redestad wrote:
> Looks good to me!
>
> /Claes
>
> On 2020-06-26 13:40, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8248234
>> http://cr.openjdk.java.net/~thartmann/8248234/webrev.00/
>>
>> Turning off UseExactTypes triggers all kinds of asserts in C2. Since
>> the flag is also completely
>> untested and obviously hasn't been used in many years, I propose to
>> remove it.
>>
>> Thanks,
>> Tobias
>>

From tobias.hartmann at oracle.com Fri Jun 26 14:28:21 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 26 Jun 2020 16:28:21 +0200
Subject: [16] RFR(S): 8248234: Disabling UseExactTypes crashes C2
In-Reply-To: <4e9caefb-10f7-3a74-64cc-4fef8f773200@oracle.com>
References: <52c6dc64-14ab-1553-975a-a170aa16d577@oracle.com>
 <4e9caefb-10f7-3a74-64cc-4fef8f773200@oracle.com>
Message-ID:

Thanks Claes!

Best regards,
Tobias

On 26.06.20 15:18, Claes Redestad wrote:
> Looks good to me!
>
> /Claes
>
> On 2020-06-26 13:40, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8248234
>> http://cr.openjdk.java.net/~thartmann/8248234/webrev.00/
>>
>> Turning off UseExactTypes triggers all kinds of asserts in C2. Since the flag is also completely
>> untested and obviously hasn't been used in many years, I propose to remove it.
>>
>> Thanks,
>> Tobias
>>

From tobias.hartmann at oracle.com Fri Jun 26 14:28:38 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 26 Jun 2020 16:28:38 +0200
Subject: [15] RFR(S): 8247832: [Graal] Many Javafuzzer tests failures with Graal, due to unexpected results, after last update JDK-8243380
In-Reply-To: <87ftaiuifj.fsf@redhat.com>
References: <81203e4a-7ec0-0608-88db-1b44a1fcb69b@oracle.com>
 <87ftaiuifj.fsf@redhat.com>
Message-ID:

Thanks Roland!

Best regards,
Tobias

On 26.06.20 14:21, Roland Westrelin wrote:
>
>> http://cr.openjdk.java.net/~thartmann/8247832/webrev.00/
>
> That looks good to me.
>
> Roland.
>

From christian.hagedorn at oracle.com Fri Jun 26 14:38:23 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Fri, 26 Jun 2020 16:38:23 +0200
Subject: [15] RFR(S): 8244724: CTW: C2 compilation fails with "Live Node limit exceeded limit"
Message-ID:

Hi

Please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8244724
http://cr.openjdk.java.net/~chagedorn/8244724/webrev.01/

The testcase contains many string concatenations. These are compiled by
javac with -XDstringConcat=inline which creates a lot of StringBuilder
objects and calls. As a result, we get a huge graph and eventually hit
the live node limit assert during code generation when trying to create
new nodes - either during PhaseCFG::build_cfg() or later in
PhaseCFG::global_code_motion().

We could try to introduce estimates for them to bailout but that appears
to be difficult to get right without being too pessimistic about it. But
we need to be in order to avoid hitting the assert again by just
modifying the testcase.

Therefore, my suggestion is to completely skip the assert once the
optimization phase is finished as we should not strictly care about the
node limit anymore at this point in time and it does not really provide
much help for finding bugs.

A question remains, though, if we should also get rid of the remaining
live node limit bailout checks in Compile::Code_Gen() like [1] as it
appears to be a waste to go through all the optimization in the
optimization phase to then bailout while generating code based only on
the live node limit itself. What do you think about that?

I also updated "<" into "<=" in the live node limit assert because we
should be allowed to reach the limit but not go beyond.

Thank you!

Best regards,
Christian


[1]
http://hg.openjdk.java.net/jdk/jdk/file/8fd3e34e8379/src/hotspot/share/opto/coalesce.cpp#l236

From tobias.hartmann at oracle.com Fri Jun 26 14:40:14 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 26 Jun 2020 16:40:14 +0200
Subject: [16] RFR(S): 8248234: Disabling UseExactTypes crashes C2
In-Reply-To: <31152d54-7266-24d7-629d-dab4915e570e@oracle.com>
References: <52c6dc64-14ab-1553-975a-a170aa16d577@oracle.com>
 <4e9caefb-10f7-3a74-64cc-4fef8f773200@oracle.com>
 <31152d54-7266-24d7-629d-dab4915e570e@oracle.com>
Message-ID:

Thanks Nils!

Best regards,
Tobias

On 26.06.20 16:27, Nils Eliasson wrote:
> +1
>
> // Nils
>
> On 2020-06-26 15:18, Claes Redestad wrote:
>> Looks good to me!
>>
>> /Claes
>>
>> On 2020-06-26 13:40, Tobias Hartmann wrote:
>>> Hi,
>>>
>>> please review the following patch:
>>> https://bugs.openjdk.java.net/browse/JDK-8248234
>>> http://cr.openjdk.java.net/~thartmann/8248234/webrev.00/
>>>
>>> Turning off UseExactTypes triggers all kinds of asserts in C2. Since the flag is also completely
>>> untested and obviously hasn't been used in many years, I propose to remove it.
>>>
>>> Thanks,
>>> Tobias
>>>
>

From nils.eliasson at oracle.com Fri Jun 26 14:48:56 2020
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 26 Jun 2020 16:48:56 +0200
Subject: [16] RFR(S): 8248398: Add diagnostic RepeatCompilation utility
Message-ID: <3708c0f5-1c43-5adf-c817-2fb1ff6518c8@oracle.com>

Hi,

This is a diagnostic utility that was requested by Claes to enable
better profiling of the compilers.

This patch introduces the diagnostic flag RepeatCompilation.

RepeatCompilation holds the number of times the compilation gets repeated
without having the code installed. RepeatCompilation = 0 is the default
and means that only the regular compilation is done. RepeatCompilation =
100 means that an extra 100 compilations are done but without installing
the code.

I have tried keeping the change small and non-intrusive, contained to
the CompileBroker (except the boolean for disabling code install that
is passed to the compilers).

RepeatCompilation works as a flag: "-XX:RepeatCompilation=100", a compile
command: "-XX:CompileCommand=option,*::toString,intx,RepeatCompilation,100"
and a compiler directive: "RepeatCompilation : 100".

Bug: https://bugs.openjdk.java.net/browse/JDK-8248398
Webrev: http://cr.openjdk.java.net/~neliasso/8248398/webrev.04/

Please review!

Best regards,
Nils Eliasson

From nils.eliasson at oracle.com Fri Jun 26 14:53:51 2020
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 26 Jun 2020 16:53:51 +0200
Subject: [15] RFR(S): 8244724: CTW: C2 compilation fails with "Live Node limit exceeded limit"
In-Reply-To:
References:
Message-ID: <12d1d1ef-ed48-4faa-ac40-97a370603dd7@oracle.com>

Hi Christian,

Looks good.

And to answer your question - yes - remove the assert in coalesce.cpp.

Best regards,
Nils Eliasson

On 2020-06-26 16:38, Christian Hagedorn wrote:
> Hi
>
> Please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8244724
> http://cr.openjdk.java.net/~chagedorn/8244724/webrev.01/
>
> The testcase contains many string concatenations. These are compiled
> by javac with -XDstringConcat=inline which creates a lot of
> StringBuilder objects and calls. As a result, we get a huge graph and
> eventually hit the live node limit assert during code generation when
> trying to create new nodes - either during PhaseCFG::build_cfg() or
> later in PhaseCFG::global_code_motion().
>
> We could try to introduce estimates for them to bailout but that
> appears to be difficult to get right without being too pessimistic
> about it. But we need to be in order to avoid hitting the assert again
> by just modifying the testcase.
>
> Therefore, my suggestion is to completely skip the assert once the
> optimization phase is finished as we should not strictly care about
> the node limit anymore at this point in time and it does not really
> provide much help for finding bugs.
>
> A question remains, though, if we should also get rid of the remaining
> live node limit bailout checks in Compile::Code_Gen() like [1] as it
> appears to be a waste to go through all the optimization in the
> optimization phase to then bailout while generating code based only on
> the live node limit itself. What do you think about that?
>
> I also updated "<" into "<=" in the live node limit assert because we
> should be allowed to reach the limit but not go beyond.
>
> Thank you!
>
> Best regards,
> Christian
>
>
> [1]
> http://hg.openjdk.java.net/jdk/jdk/file/8fd3e34e8379/src/hotspot/share/opto/coalesce.cpp#l236

From nils.eliasson at oracle.com Fri Jun 26 14:55:58 2020
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 26 Jun 2020 16:55:58 +0200
Subject: RFR(S): 8234605: C2 failed "assert(C->live_nodes() - live_at_begin <= 2 * _nodes_required) failed: Bad node estimate: actual = 208 >> request = 101"
In-Reply-To: <4a0ca6a4-7143-7c70-5e84-22244d3b56c1@oracle.com>
References: <4a0ca6a4-7143-7c70-5e84-22244d3b56c1@oracle.com>
Message-ID:

Hi Patric,

Looks good.

Best regards,
Nils

On 2020-06-26 16:15, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-8234605
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8234605/
>
>
> Turning assert into (universal) logging. This issue (now) manifest
> itself very seldom (typically on small estimates) and has thus served
> its purpose to help trim the node budget estimates to some reasonable
> level. Logging should be sufficient going forward.
>
>
> Testing: tier1-2
>
>
> Best regards,
> Patric

From boris.ulasevich at bell-sw.com Fri Jun 26 15:05:37 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Fri, 26 Jun 2020 18:05:37 +0300
Subject: RFR 8248043: Need to eliminate excessive i2l conversions
Message-ID: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com>

Hi all,

Please review the change to eliminate the unnecessary i2l conversion
for expressions like this: "if (intValue == 1L)".

http://bugs.openjdk.java.net/browse/JDK-8248043
http://cr.openjdk.java.net/~bulasevich/8248043/webrev.01

The provided benchmark shows performance boost on all platforms:
- Intel Xeon: 32.705 --> 14.234 ns/op
- arm64: 42.060 --> 25.456 ns/op
- arm32: 618.763 --> 314.040 ns/op
- ppc8:  81.218 --> 63.026 ns/op

Testing done: jtreg, jck.

thanks,
Boris

From claes.redestad at oracle.com Fri Jun 26 15:07:24 2020
From: claes.redestad at oracle.com (Claes Redestad)
Date: Fri, 26 Jun 2020 17:07:24 +0200
Subject: [16] RFR(S): 8248398: Add diagnostic RepeatCompilation utility
In-Reply-To: <3708c0f5-1c43-5adf-c817-2fb1ff6518c8@oracle.com>
References: <3708c0f5-1c43-5adf-c817-2fb1ff6518c8@oracle.com>
Message-ID: <14a6a1cd-bef1-918b-47c9-7b1132bd6e67@oracle.com>

Hi Nils,

thanks for doing this!

This works perfectly - especially the compiler control directive that
allows us to isolate specific JIT compilations to profile and
benchmark them in a controlled manner.

I've run a number of startup tests to ensure there's no detectable
impact on normal runs.

/Claes

On 2020-06-26 16:48, Nils Eliasson wrote:
> Hi,
>
> This is a diagnostic utility that was requested by Claes to enable
> better profiling of the compilers.
>
> This patch introduces the diagnostic flag RepeatCompilation.
>
> RepeatCompilation holds the number of times the compilation gets repeated
> without having the code installed. RepeatCompilation = 0 is the default
> and means that only the regular compilation is done. RepeatCompilation =
> 100 means that an extra 100 compilations are done but without installing
> the code.
>
> I have tried keeping the change small and non-intrusive, contained to
> the CompileBroker (except the boolean for disabling code install that
> is passed to the compilers).
>
> RepeatCompilation works as a flag: "-XX:RepeatCompilation=100", a compile
> command: "-XX:CompileCommand=option,*::toString,intx,RepeatCompilation,100"
> and a compiler directive: "RepeatCompilation : 100".
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8248398
> Webrev: http://cr.openjdk.java.net/~neliasso/8248398/webrev.04/
>
> Please review!
> > Best regards, > Nils Eliasson > From igor.ignatyev at oracle.com Fri Jun 26 17:38:44 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 26 Jun 2020 10:38:44 -0700 Subject: [15] RFR(T): 8248265: compiler/ciReplay tests fail with AOT compiled java.base In-Reply-To: <713c2fb1-623e-68f5-c204-25d68bf749bb@oracle.com> References: <713c2fb1-623e-68f5-c204-25d68bf749bb@oracle.com> Message-ID: <1CE705BB-F2AD-4A98-8128-304215623524@oracle.com> +1 -- Igor > On Jun 26, 2020, at 7:25 AM, Nils Eliasson wrote: > > Hi Tobias, > > Looks good. > > Best regards, > Nils Eliasson > > On 2020-06-26 12:14, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8248265 >> http://cr.openjdk.java.net/~thartmann/8248265/webrev.00/ >> >> With AOT compiled java.base, we call the test method EmptyMain::main through >> JavaCalls::call_helper(). The fix for JDK-8247832 restored the old behavior that empty methods are >> not called: https://hg.openjdk.java.net/jdk/jdk15/rev/94025f9e6a0d#l2.36 >> >> As a result, EmptyMain::main is not called and therefore also not compiled. The test fails because >> -XX:CICrashAt=1 triggers no VM crash. Empty methods not being called is expected behavior. The test >> should use a non-empty method to trigger compilation. >> >> Thanks, >> Tobias > From joserz at linux.ibm.com Fri Jun 26 18:26:44 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Fri, 26 Jun 2020 15:26:44 -0300 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Message-ID: <20200626182644.GA262544@pacoca> Hello team! This patch introduces Power10 to OpenJDK and implements three new instructions: - brh - byte-reverse halfword - brw - byte-reverse word - brd - byte-reverse doubleword Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 Thanks for your review! Jose R. 
Ziviani

From claes.redestad at oracle.com Fri Jun 26 18:31:05 2020
From: claes.redestad at oracle.com (Claes Redestad)
Date: Fri, 26 Jun 2020 20:31:05 +0200
Subject: RFR 8248043: Need to eliminate excessive i2l conversions
In-Reply-To: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com>
References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com>
Message-ID: <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com>

Hi Boris,

this looks like a nice improvement!

I just have some comments about the micro. I was curious whether the
optimization works when the constant is on the LHS and added a variant
of the micro to try that[1].

Results are interesting (Intel Xeon):

Benchmark                           Mode  Cnt   Score   Error  Units
SkipIntToLongCast.skipCastTest      avgt    5  30.937 ± 0.056  ns/op
SkipIntToLongCast.skipCastTestLeft  avgt    5  30.937 ± 0.140  ns/op

With your patch:

Benchmark                           Mode  Cnt   Score   Error  Units
SkipIntToLongCast.skipCastTest      avgt    5  14.123 ± 0.035  ns/op
SkipIntToLongCast.skipCastTestLeft  avgt    5  17.420 ± 0.044  ns/op

Seems like the optimization is mostly effective, but not getting all the
way. I wouldn't worry about it for this RFE, but perhaps something to
investigate in a follow-up. Feel free to include such a variant in your
patch though (no attribution necessary).

The micro also stabilizes very quickly, so you might want to provide
some default tuning to keep runtime in check, e.g., something like:

    @Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
    @Measurement(iterations = 5, time = 1000, timeUnit = TimeUnit.MILLISECONDS)
    @Fork(3)

Thanks!

/Claes

[1]
    @Benchmark
    public int skipCastTestLeft() {
        for (int i = 0; i < ARRAYSIZE_L; i++) {
            if (ARRAYSIZE_L == intValues[i]) {
                return i;
            }
        }
        return 0;
    }

On 2020-06-26 17:05, Boris Ulasevich wrote:
> Hi all,
>
> Please review the change to eliminate the unnecessary i2l conversion
> for expressions like this: "if (intValue == 1L)".
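
For readers following the thread, the source-level pattern can be sketched in plain Java. This is only an illustration, not code from the webrev: the class and method names (I2lComparisonSketch, equalsViaLong, equalsViaInt) are hypothetical. It shows that comparing an int with a long constant widens the int first (the i2l conversion), and that when the constant fits in the int range an int comparison gives the same answer. How C2 actually performs the rewrite is defined by the patch, not by this sketch.

```java
class I2lComparisonSketch {
    // Source-level form: binary numeric promotion widens the int operand
    // to long (an i2l conversion) before the comparison.
    static boolean equalsViaLong(int value, long constant) {
        return value == constant;
    }

    // Equivalent form without widening the int operand.
    // Only valid because constants outside the int range are handled first.
    static boolean equalsViaInt(int value, long constant) {
        if (constant < Integer.MIN_VALUE || constant > Integer.MAX_VALUE) {
            return false; // no int value can equal such a constant
        }
        return value == (int) constant;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        long[] constants = {1L, -1L, 0L, 1L << 40, Integer.MAX_VALUE};
        for (int v : samples) {
            for (long c : constants) {
                if (equalsViaLong(v, c) != equalsViaInt(v, c)) {
                    throw new AssertionError(v + " vs " + c);
                }
            }
        }
        System.out.println("both forms agree");
    }
}
```

The range check is what makes the narrowing safe: a plain (int) cast of a large long constant would wrap around and could produce a false match.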
>
> http://bugs.openjdk.java.net/browse/JDK-8248043
> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.01
>
> The provided benchmark shows performance boost on all platforms:
> - Intel Xeon: 32.705 --> 14.234 ns/op
> - arm64: 42.060 --> 25.456 ns/op
> - arm32: 618.763 --> 314.040 ns/op
> - ppc8: 81.218 --> 63.026 ns/op
>
> Testing done: jtreg, jck.
>
> thanks,
> Boris

From vladimir.kozlov at oracle.com Fri Jun 26 18:37:34 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 26 Jun 2020 11:37:34 -0700
Subject: [15] RFR(T): 8248265: compiler/ciReplay tests fail with AOT compiled java.base
In-Reply-To:
References:
Message-ID:

On 6/26/20 3:14 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8248265
> http://cr.openjdk.java.net/~thartmann/8248265/webrev.00/

I prefer an assign to a static variable. Printing is very complex code.

>
> With AOT compiled java.base, we call the test method EmptyMain::main through
> JavaCalls::call_helper(). The fix for JDK-8247832 restored the old behavior that empty methods are
> not called: https://hg.openjdk.java.net/jdk/jdk15/rev/94025f9e6a0d#l2.36

These changes are done for JDK-8247992. JDK-8247832 is not related and not fixed yet.

>
> As a result, EmptyMain::main is not called and therefore also not compiled. The test fails because
> -XX:CICrashAt=1 triggers no VM crash. Empty methods not being called is expected behavior. The test
> should use a non-empty method to trigger compilation.

Yes.
Thanks, Vladimir > > Thanks, > Tobias > From vladimir.kozlov at oracle.com Fri Jun 26 18:56:22 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 26 Jun 2020 11:56:22 -0700 Subject: [15] RFR(S): 8247832: [Graal] Many Javafuzzer tests failures with Graal, due to unexpected results, after last update JDK-8243380 In-Reply-To: <87ftaiuifj.fsf@redhat.com> References: <81203e4a-7ec0-0608-88db-1b44a1fcb69b@oracle.com> <87ftaiuifj.fsf@redhat.com> Message-ID: Hi Tobias, You need to remove '"Classpath" exception' statement from header in new file and add 2 empty lines after header. We do that when updating Graal in JDK. Example: http://hg.openjdk.java.net/jdk/jdk/file/e92a076bc6a5/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.core.test/src/org/graalvm/compiler/core/test/ZeroSignExtendTest.java Thanks, Vladimir On 6/26/20 5:21 AM, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~thartmann/8247832/webrev.00/ > > That looks good to me. > > Roland. > From vladimir.kozlov at oracle.com Fri Jun 26 19:11:04 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 26 Jun 2020 12:11:04 -0700 Subject: [16] RFR(S): 8248234: Disabling UseExactTypes crashes C2 In-Reply-To: <31152d54-7266-24d7-629d-dab4915e570e@oracle.com> References: <52c6dc64-14ab-1553-975a-a170aa16d577@oracle.com> <4e9caefb-10f7-3a74-64cc-4fef8f773200@oracle.com> <31152d54-7266-24d7-629d-dab4915e570e@oracle.com> Message-ID: <578f499d-d71b-3446-983f-e05a435a558e@oracle.com> +2 ;) Vladimir K On 6/26/20 7:27 AM, Nils Eliasson wrote: > +1 > > // Nils > > On 2020-06-26 15:18, Claes Redestad wrote: >> Looks good to me! >> >> /Claes >> >> On 2020-06-26 13:40, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8248234 >>> http://cr.openjdk.java.net/~thartmann/8248234/webrev.00/ >>> >>> Turning off UseExactTypes triggers all kinds of asserts in C2. 
Since the flag is also completely >>> untested and obviously hasn't been used in many years, I propose to remove it. >>> >>> Thanks, >>> Tobias >>> > From igor.veresov at oracle.com Fri Jun 26 20:31:43 2020 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 26 Jun 2020 13:31:43 -0700 Subject: [15] RFR(S) 8248168: [Graal] jck tests timeout in Graal with -Xcomp mode Message-ID: <0B80B9A3-1B3C-43D2-9A04-87B6DFFCB7D1@oracle.com> This re-enables deadlock avoidance logic for JVMCI and Xcomp. JBS: https://bugs.openjdk.java.net/browse/JDK-8248168 Webrev: http://cr.openjdk.java.net/~iveresov/8248168/webrev.00/ Thanks, igor From vladimir.kozlov at oracle.com Fri Jun 26 21:52:34 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 26 Jun 2020 14:52:34 -0700 Subject: [15] RFR(S) 8248168: [Graal] jck tests timeout in Graal with -Xcomp mode In-Reply-To: <0B80B9A3-1B3C-43D2-9A04-87B6DFFCB7D1@oracle.com> References: <0B80B9A3-1B3C-43D2-9A04-87B6DFFCB7D1@oracle.com> Message-ID: <66e93b99-6617-c4d2-b23b-d5c159ed8c0e@oracle.com> Good. Thanks, Vladimir On 6/26/20 1:31 PM, Igor Veresov wrote: > This re-enables deadlock avoidance logic for JVMCI and Xcomp. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248168 > Webrev: http://cr.openjdk.java.net/~iveresov/8248168/webrev.00/ > > Thanks, > igor > > > From igor.veresov at oracle.com Fri Jun 26 22:06:36 2020 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 26 Jun 2020 15:06:36 -0700 Subject: [15] RFR(S) 8248168: [Graal] jck tests timeout in Graal with -Xcomp mode In-Reply-To: <66e93b99-6617-c4d2-b23b-d5c159ed8c0e@oracle.com> References: <0B80B9A3-1B3C-43D2-9A04-87B6DFFCB7D1@oracle.com> <66e93b99-6617-c4d2-b23b-d5c159ed8c0e@oracle.com> Message-ID: <776A1C88-BC54-41F3-9C3D-652744ED9BB2@oracle.com> Thanks Vladimir! igor > On Jun 26, 2020, at 2:52 PM, Vladimir Kozlov wrote: > > Good. 
> > Thanks, > Vladimir > > On 6/26/20 1:31 PM, Igor Veresov wrote: >> This re-enables deadlock avoidance logic for JVMCI and Xcomp. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8248168 >> Webrev: http://cr.openjdk.java.net/~iveresov/8248168/webrev.00/ >> Thanks, >> igor From vladimir.kozlov at oracle.com Fri Jun 26 22:35:15 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 26 Jun 2020 15:35:15 -0700 Subject: [15] RFR(S): 8244724: CTW: C2 compilation fails with "Live Node limit exceeded limit" In-Reply-To: References: Message-ID: You don't need to use 'C->' in Compile::Optimize() method: DEBUG_ONLY(C->set_phase_optimize_finished();) Yes, remove all nodes limit checks after optimization phase. Thanks, Vladimir On 6/26/20 7:38 AM, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8244724 > http://cr.openjdk.java.net/~chagedorn/8244724/webrev.01/ > > The testcase contains many string concatinations. These are compiled by javac with -XDstringConcat=inline which creates > a lot of StringBuilder objects and calls. As a result, we get a huge graph and eventually hit the live node limit assert > during code generation when trying to create new nodes - either during PhaseCFG::build_cfg() or later in > PhaseCFG::global_code_motion(). > > We could try to introduce estimates for them to bailout but that appears to be difficult to get right without being too > pessimistic about it. But we need to be in order to avoid hitting the assert again by just modifying the testcase. > > Therefore, my suggestion is to completely skip the assert once the optimization phase is finished as we should not > strictly care about the node limit anymore at this point in time and it does not really provide much help for finding bugs. 
> > A question remains, though, if we should also get rid of the remaining live node limit bailout checks in > Compile::Code_Gen() like [1] as it appears to be a waste to go through all the optimization in the optimization phase to > then bailout while generating code based only on the live node limit itself. What do you think about that? > > I also updated "<" into "<=" in the live node limit assert because we should be allowed to reach the limit but not go > beyond. > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/8fd3e34e8379/src/hotspot/share/opto/coalesce.cpp#l236 From vladimir.kozlov at oracle.com Fri Jun 26 22:50:47 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 26 Jun 2020 15:50:47 -0700 Subject: [16] RFR(S): 8248398: Add diagnostic RepeatCompilation utility In-Reply-To: <3708c0f5-1c43-5adf-c817-2fb1ff6518c8@oracle.com> References: <3708c0f5-1c43-5adf-c817-2fb1ff6518c8@oracle.com> Message-ID: <8cb83e68-42ef-c66f-4725-d4ac37ca2503@oracle.com> Very good. Thanks, Vladimir On 6/26/20 7:48 AM, Nils Eliasson wrote: > Hi, > > This is a diagnostic utility that was requested by Claes to enable better profiling of the compilers. > > This patch introduces the diagnostic flag RepeatCompilation. > > RepeatCompilation hold he number of times the compilation gets repeated without having the code installed. > RepeatCompilation = 0 is the default and means that only the regular compilation is done. RepeatCompilation = 100 means > that an extra 100 compilations are done but without installing the code. > > I have tried keeping the change small and non-intrusive, contained to the CompilerBroker (except the boolean for > disabling code install that is passed to the compilers). > > RepatCompilation works as a flag: "-XX:RepeatCompilation=100", a compile command: > "-XX:CompileCommand=option,*::toString,intx,RepeatCompilation,100" > and a compiler directive: "RepeatCompilation : 100". 
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8248398
> Webrev: http://cr.openjdk.java.net/~neliasso/8248398/webrev.04/
>
> Please review!
>
> Best regards,
> Nils Eliasson
>

From vladimir.kozlov at oracle.com Fri Jun 26 22:59:41 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 26 Jun 2020 15:59:41 -0700
Subject: RFR(S): 8234605: C2 failed "assert(C->live_nodes() - live_at_begin <= 2 * _nodes_required) failed: Bad node estimate: actual = 208 >> request = 101"
In-Reply-To: <4a0ca6a4-7143-7c70-5e84-22244d3b56c1@oracle.com>
References: <4a0ca6a4-7143-7c70-5e84-22244d3b56c1@oracle.com>
Message-ID:

Hi Patric,

You can move #ifdef ASSERT to put if(check_estimate) under it.

Thanks,
Vladimir

On 6/26/20 7:15 AM, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue: https://bugs.openjdk.java.net/browse/JDK-8234605
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8234605/
>
>
> Turning assert into (universal) logging. This issue (now) manifests itself very seldom (typically on small estimates) and
> has thus served its purpose to help trim the node budget estimates to some reasonable level. Logging should be
> sufficient going forward.
>
>
> Testing: tier1-2
>
>
> Best regards,
> Patric

From martin.doerr at sap.com Sat Jun 27 09:29:32 2020
From: martin.doerr at sap.com (Doerr, Martin)
Date: Sat, 27 Jun 2020 09:29:32 +0000
Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions
In-Reply-To: <20200626182644.GA262544@pacoca>
References: <20200626182644.GA262544@pacoca>
Message-ID: <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com>

Hi Jose,

Can you replace the outdated description of PowerArchitecturePPC64 in globals_ppc.hpp by something generic, please?

Please update the Copyright year in vm_version_ppc.hpp.

I can't test the change, but it looks good to me.

Best regards,
Martin

> On 26.06.2020 at 20:29, "joserz at linux.ibm.com" wrote:
>
> Hello team!
>
> This patch introduces Power10 to OpenJDK and implements three new instructions:
> - brh - byte-reverse halfword
> - brw - byte-reverse word
> - brd - byte-reverse doubleword
>
> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190
>
> Thanks for your review!
>
> Jose R. Ziviani

From sergei.tsypanov at yandex.ru Sat Jun 27 19:17:59 2020
From: sergei.tsypanov at yandex.ru (=?utf-8?B?0KHQtdGA0LPQtdC5INCm0YvQv9Cw0L3QvtCy?=)
Date: Sat, 27 Jun 2020 21:17:59 +0200
Subject: Fwd: Scalar replacement issue in JDK 14.0.1
In-Reply-To: <426551593172556@mail.yandex.ru>
Message-ID: <620021593285416@mail.yandex.ru>

Hello,

while looking into an issue I've found out that scalar replacement is not working in a trivial case on JDK 14.0.1.

This benchmark illustrates the issue:

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
public class StringCompositeKeyBenchmark {
  @Benchmark
  public Object compositeKey(Data data) {
    return data.keyObjectMap.get(new Key(data.code, data.locale));
  }

  @State(Scope.Thread)
  public static class Data {
    private final String code = "code1";
    private final Locale locale = Locale.getDefault();

    private final HashMap<Key, Object> keyObjectMap = new HashMap<>();

    @Setup
    public void setUp() {
      keyObjectMap.put(new Key(code, locale), new Object());
    }
  }

  private static final class Key {
    private final String code;
    private final Locale locale;

    private Key(String code, Locale locale) {
      this.code = code;
      this.locale = locale;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (o == null || getClass() != o.getClass()) return false;

      Key key = (Key) o;

      if (!code.equals(key.code)) return false;
      return locale.equals(key.locale);
    }

    @Override
    public int hashCode() {
      return 31 * code.hashCode() + locale.hashCode();
    }
  }
}

When I run this on JDK 11 (JDK 11.0.7, OpenJDK 64-Bit Server VM, 11.0.7+10-post-Ubuntu-2ubuntu218.04) I get this output:

Benchmark                                                     Mode  Cnt     Score     Error  Units
StringCompositeKeyBenchmark.compositeKey                      avgt   10     5.510 ±   0.121  ns/op
StringCompositeKeyBenchmark.compositeKey:·gc.alloc.rate       avgt   10    ≈ 10??            MB/sec
StringCompositeKeyBenchmark.compositeKey:·gc.alloc.rate.norm  avgt   10    ≈ 10??            B/op
StringCompositeKeyBenchmark.compositeKey:·gc.count            avgt   10       ≈ 0            counts

As I understand it, the Java runtime eliminates the object allocation here and we don't use additional memory.

The same run on JDK 14 (JDK 14.0.1, Java HotSpot(TM) 64-Bit Server VM, 14.0.1+7) shows an object allocation on each method call:

Benchmark                                                                  Mode  Cnt     Score     Error  Units
StringCompositeKeyBenchmark.compositeKey                                   avgt   10     7.958 ±   1.360  ns/op
StringCompositeKeyBenchmark.compositeKey:·gc.alloc.rate                    avgt   10  1937.551 ± 320.718  MB/sec
StringCompositeKeyBenchmark.compositeKey:·gc.alloc.rate.norm               avgt   10    24.001 ±   0.001  B/op
StringCompositeKeyBenchmark.compositeKey:·gc.churn.G1_Eden_Space           avgt   10  1879.111 ± 596.770  MB/sec
StringCompositeKeyBenchmark.compositeKey:·gc.churn.G1_Eden_Space.norm      avgt   10    23.244 ±   5.509  B/op
StringCompositeKeyBenchmark.compositeKey:·gc.churn.G1_Survivor_Space       avgt   10     0.267 ±   0.750  MB/sec
StringCompositeKeyBenchmark.compositeKey:·gc.churn.G1_Survivor_Space.norm  avgt   10     0.003 ±   0.009  B/op
StringCompositeKeyBenchmark.compositeKey:·gc.count                         avgt   10    23.000            counts
StringCompositeKeyBenchmark.compositeKey:·gc.time                          avgt   10    44.000            ms

At the same time, in a more trivial scenario like

@Benchmark
public int compositeKey(Data data) {
  return new Key(data.code, data.locale).hashCode();
}

scalar replacement again eliminates the allocation of the object.

So I'm curious whether this is normal behaviour or a bug?
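
For anyone who wants to poke at the lookup pattern outside JMH, a minimal standalone sketch follows. It is an illustration only: the class name CompositeKeyLookupSketch and its lookup/stored helpers are hypothetical, and the sketch merely checks that a freshly allocated key finds the stored value. It does not show by itself whether C2 scalar-replaces the temporary key; observing that still needs an allocation profile such as the gc.alloc.rate.norm figures from JMH's GC profiler.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CompositeKeyLookupSketch {
    // Same shape as the Key class in the benchmark: two final fields,
    // value-based equals() and hashCode().
    static final class Key {
        private final String code;
        private final Locale locale;

        Key(String code, Locale locale) {
            this.code = code;
            this.locale = locale;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Key key = (Key) o;
            return code.equals(key.code) && locale.equals(key.locale);
        }

        @Override
        public int hashCode() {
            return 31 * code.hashCode() + locale.hashCode();
        }
    }

    private final Map<Key, Object> map = new HashMap<>();
    private final Object stored = new Object();

    CompositeKeyLookupSketch(String code, Locale locale) {
        map.put(new Key(code, locale), stored);
    }

    // The temporary Key is used only for this lookup and is not stored anywhere,
    // which is what makes it a candidate for escape analysis.
    Object lookup(String code, Locale locale) {
        return map.get(new Key(code, locale));
    }

    Object stored() {
        return stored;
    }

    public static void main(String[] args) {
        CompositeKeyLookupSketch sketch = new CompositeKeyLookupSketch("code1", Locale.ROOT);
        System.out.println(sketch.lookup("code1", Locale.ROOT) == sketch.stored());
    }
}
```

The difference from the hashCode-only variant is that here the temporary key is passed into HashMap.get, so eliminating the allocation depends on how much of that call chain the JIT inlines.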
Regards, Sergey Tsypanov From tobias.hartmann at oracle.com Mon Jun 29 05:54:50 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 29 Jun 2020 07:54:50 +0200 Subject: [15] RFR(T): 8248265: compiler/ciReplay tests fail with AOT compiled java.base In-Reply-To: <1CE705BB-F2AD-4A98-8128-304215623524@oracle.com> References: <713c2fb1-623e-68f5-c204-25d68bf749bb@oracle.com> <1CE705BB-F2AD-4A98-8128-304215623524@oracle.com> Message-ID: Nils, Igor, thanks for the reviews! Best regards, Tobias On 26.06.20 19:38, Igor Ignatyev wrote: > +1 > > -- Igor > >> On Jun 26, 2020, at 7:25 AM, Nils Eliasson wrote: >> >> Hi Tobias, >> >> Looks good. >> >> Best regards, >> Nils Eliasson >> >> On 2020-06-26 12:14, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8248265 >>> http://cr.openjdk.java.net/~thartmann/8248265/webrev.00/ >>> >>> With AOT compiled java.base, we call the test method EmptyMain::main through >>> JavaCalls::call_helper(). The fix for JDK-8247832 restored the old behavior that empty methods are >>> not called: https://hg.openjdk.java.net/jdk/jdk15/rev/94025f9e6a0d#l2.36 >>> >>> As a result, EmptyMain::main is not called and therefore also not compiled. The test fails because >>> -XX:CICrashAt=1 triggers no VM crash. Empty methods not being called is expected behavior. The test >>> should use a non-empty method to trigger compilation. >>> >>> Thanks, >>> Tobias >> > From tobias.hartmann at oracle.com Mon Jun 29 06:06:37 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 29 Jun 2020 08:06:37 +0200 Subject: [15] RFR(T): 8248265: compiler/ciReplay tests fail with AOT compiled java.base In-Reply-To: References: Message-ID: <0e54fbd0-d2d5-c831-f4e8-950e608e0379@oracle.com> Hi Vladimir, Thanks for the review! On 26.06.20 20:37, Vladimir Kozlov wrote: > I prefer an assign to a static variable. Printing is very complex code. 
Okay, my intention for the printing was that it would help to diagnose similar issues (method not called) but that should never happen with a non-empty method anyway. Here's a new version: http://cr.openjdk.java.net/~thartmann/8248265/webrev.01/ >> With AOT compiled java.base, we call the test method EmptyMain::main through >> JavaCalls::call_helper(). The fix for JDK-8247832 restored the old behavior that empty methods are >> not called: https://hg.openjdk.java.net/jdk/jdk15/rev/94025f9e6a0d#l2.36 > > These changes are done for JDK-8247992. JDK-8247832 is not related and not fixed yet. Right, I've linked the right changeset but mentioned the wrong bug (copy-paste error from the clipboard). Thanks, Tobias From tobias.hartmann at oracle.com Mon Jun 29 06:08:39 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 29 Jun 2020 08:08:39 +0200 Subject: [16] RFR(S): 8248234: Disabling UseExactTypes crashes C2 In-Reply-To: <578f499d-d71b-3446-983f-e05a435a558e@oracle.com> References: <52c6dc64-14ab-1553-975a-a170aa16d577@oracle.com> <4e9caefb-10f7-3a74-64cc-4fef8f773200@oracle.com> <31152d54-7266-24d7-629d-dab4915e570e@oracle.com> <578f499d-d71b-3446-983f-e05a435a558e@oracle.com> Message-ID: <2eace117-0170-6621-777d-ef03944f06e2@oracle.com> Thanks Vladimir! :) Best regards, Tobias On 26.06.20 21:11, Vladimir Kozlov wrote: > +2 ;) > > Vladimir K > > On 6/26/20 7:27 AM, Nils Eliasson wrote: >> +1 >> >> // Nils >> >> On 2020-06-26 15:18, Claes Redestad wrote: >>> Looks good to me! >>> >>> /Claes >>> >>> On 2020-06-26 13:40, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch: >>>> https://bugs.openjdk.java.net/browse/JDK-8248234 >>>> http://cr.openjdk.java.net/~thartmann/8248234/webrev.00/ >>>> >>>> Turning off UseExactTypes triggers all kinds of asserts in C2. Since the flag is also completely >>>> untested and obviously hasn't been used in many years, I propose to remove it. 
>>>> >>>> Thanks, >>>> Tobias >>>> >> From tobias.hartmann at oracle.com Mon Jun 29 06:16:30 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 29 Jun 2020 08:16:30 +0200 Subject: [15] RFR(S): 8247832: [Graal] Many Javafuzzer tests failures with Graal, due to unexpected results, after last update JDK-8243380 In-Reply-To: References: <81203e4a-7ec0-0608-88db-1b44a1fcb69b@oracle.com> <87ftaiuifj.fsf@redhat.com> Message-ID: <59ee26e2-1d5f-9063-ed1e-79c69bddcc02@oracle.com> Hi Vladimir, Thanks for the review! On 26.06.20 20:56, Vladimir Kozlov wrote: > You need to remove '"Classpath" exception' statement from header in new file and add 2 empty lines > after header. Okay, will push the following version: http://cr.openjdk.java.net/~thartmann/8247832/webrev.01/ Best regards, Tobias From tobias.hartmann at oracle.com Mon Jun 29 06:33:30 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 29 Jun 2020 08:33:30 +0200 Subject: [16] RFR(S): 8248398: Add diagnostic RepeatCompilation utility In-Reply-To: <3708c0f5-1c43-5adf-c817-2fb1ff6518c8@oracle.com> References: <3708c0f5-1c43-5adf-c817-2fb1ff6518c8@oracle.com> Message-ID: <3b5be72a-8c6e-8d93-f48b-d37e6e7ef049@oracle.com> Hi Nils, Looks good to me! In globals.hpp:543 there is an excess whitespace before "\". Best regards, Tobias On 26.06.20 16:48, Nils Eliasson wrote: > Hi, > > This is a diagnostic utility that was requested by Claes to enable better profiling of the compilers. > > This patch introduces the diagnostic flag RepeatCompilation. > > RepeatCompilation hold he number of times the compilation gets repeated without having the code > installed. RepeatCompilation = 0 is the default and means that only the regular compilation is done. > RepeatCompilation = 100 means that an extra 100 compilations are done but without installing the code. 
> > I have tried keeping the change small and non-intrusive, contained to the CompilerBroker (except the > boolean for disabling code install that is passed to the compilers). > > RepatCompilation works as a flag: "-XX:RepeatCompilation=100", a compile command: > "-XX:CompileCommand=option,*::toString,intx,RepeatCompilation,100" > and a compiler directive: "RepeatCompilation : 100". > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248398 > Webrev: http://cr.openjdk.java.net/~neliasso/8248398/webrev.04/ > > Please review! > > Best regards, > Nils Eliasson > From Yang.Zhang at arm.com Mon Jun 29 07:48:30 2020 From: Yang.Zhang at arm.com (Yang Zhang) Date: Mon, 29 Jun 2020 07:48:30 +0000 Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes In-Reply-To: References: <275eb57c-51c0-675e-c32a-91b198023559@redhat.com> <719F9169-ABC4-408E-B732-F1BD9A84337F@oracle.com> <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com> <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com> <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com> <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com> Message-ID: Hi Andrew, 1. Instructions that can be matched with NEON instructions directly. MulVB, SqrtVF and AbsV have been merged into jdk master already. 2. Instructions that jdk master has middle end support for, but they cannot be matched with NEON instructions directly. Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These new instructions can be moved into jdk master first, but for auto-vectorization, the performance might not get improved. 3. Panama/Vector API specific instructions such as Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend etc. These instructions cannot be moved into jdk master first because there isn't middle-end support. I will put 2 and 3 in a new ad file aarch64_neon.ad. I will also update aarch64_asmtest.py and macroassemler.cpp. When the patch is ready, I will send it again. 
Hi Sandhya,

Could you please help to manually merge panama vectorIntrinsics/vector-unstable to jdk master?
So that I can update this patch based on latest jdk master.

Regards
Yang

-----Original Message-----
From: Viswanathan, Sandhya
Sent: Thursday, June 25, 2020 3:04 AM
To: Yang Zhang ; Andrew Haley ; Paul Sandoz
Cc: nd ; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
Subject: RE: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes

Hi Andrew/Yang,

We couldn't propose Vector API to target in time for JDK 15 and are hoping to do so early in JDK 16 timeframe.
The implementation reviews on other components have made good progress.
We have so far ok to PPT from (runtime, shared compiler changes, x86 backend).
Java API implementation review is in progress.
I wanted to check with you both if we have a go ahead from aarch64 backend point of view.

Best Regards,
Sandhya

-----Original Message-----
From: hotspot-compiler-dev On Behalf Of Yang Zhang
Sent: Tuesday, May 26, 2020 7:59 PM
To: Andrew Haley ; Paul Sandoz
Cc: nd ; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
Subject: RE: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes

> But to my earlier question. please: can the new instructions be moved into jdk head first, and then merged into the Panama branch, or not?

The new instructions can be classified as:
1. Instructions that can be matched with NEON instructions directly.
MulVB and SqrtVF have been merged into jdk master already. The patch of AbsV is in review [1].

2. Instructions that Jdk master has middle end support for, but they cannot be matched with NEON instructions directly.
Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These new instructions can be moved into jdk master first, but for auto-vectorization, the performance might not get improved. May I have a new patch for these? 3. Panama/Vector API specific instructions Such as Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend etc. These instructions cannot be moved into jdk master first because there isn't middle-end support. Regards Yang [1] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-May/008861.html -----Original Message----- From: Andrew Haley Sent: Tuesday, May 26, 2020 4:25 PM To: Yang Zhang ; Paul Sandoz Cc: hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; nd Subject: Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes On 25/05/2020 09:26, Yang Zhang wrote: > In jdk master, what we need to do is that writing m4 file for existing > vector instructions and placed them to a new file aarch64_neon.ad. > If no question, I will do it right away. I'm not entirely sure that such a change is necessary now. In particular, reorganizing the existing vector instructions is IMO excessive, but I admit that it might be an improvement. But to my earlier question. please: can the new instructions be moved into jdk head first, and then merged into the Panama branch, or not? It'd help if this was possible. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From christian.hagedorn at oracle.com Mon Jun 29 08:49:46 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 29 Jun 2020 10:49:46 +0200 Subject: [15] RFR(S): 8244724: CTW: C2 compilation fails with "Live Node limit exceeded limit" In-Reply-To: References: Message-ID: Thank you Nils and Vladimir for your reviews! On 27.06.20 00:35, Vladimir Kozlov wrote: > You don't need to use 'C->' in Compile::Optimize() method: > > DEBUG_ONLY(C->set_phase_optimize_finished();) Oh, right! > Yes, remove all nodes limit checks after optimization phase. I created a new webrev with all those Compile::check_node_count() calls removed and the change above: http://cr.openjdk.java.net/~chagedorn/8244724/webrev.02/ However, since these calls also affect bailout decisions in the product version and this bug is targeted for 15, I suggest to remove these calls/bailouts for 16 only in a separate RFE to minimize the risk. Best regards, Christian > Thanks, > Vladimir > > On 6/26/20 7:38 AM, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8244724 >> http://cr.openjdk.java.net/~chagedorn/8244724/webrev.01/ >> >> The testcase contains many string concatinations. These are compiled >> by javac with -XDstringConcat=inline which creates a lot of >> StringBuilder objects and calls. As a result, we get a huge graph and >> eventually hit the live node limit assert during code generation when >> trying to create new nodes - either during PhaseCFG::build_cfg() or >> later in PhaseCFG::global_code_motion(). >> >> We could try to introduce estimates for them to bailout but that >> appears to be difficult to get right without being too pessimistic >> about it. But we need to be in order to avoid hitting the assert again >> by just modifying the testcase. 
>> >> Therefore, my suggestion is to completely skip the assert once the >> optimization phase is finished as we should not strictly care about >> the node limit anymore at this point in time and it does not really >> provide much help for finding bugs. >> >> A question remains, though, if we should also get rid of the remaining >> live node limit bailout checks in Compile::Code_Gen() like [1] as it >> appears to be a waste to go through all the optimization in the >> optimization phase to then bailout while generating code based only on >> the live node limit itself. What do you think about that? >> >> I also updated "<" into "<=" in the live node limit assert because we >> should be allowed to reach the limit but not go beyond. >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] >> http://hg.openjdk.java.net/jdk/jdk/file/8fd3e34e8379/src/hotspot/share/opto/coalesce.cpp#l236 >> From doug.simon at oracle.com Mon Jun 29 09:36:51 2020 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 29 Jun 2020 11:36:51 +0200 Subject: RFR(M): 8248359: Update JVMCI Message-ID: <322BCF85-A5DA-4463-9250-A6744FABC9AF@oracle.com> Please review this webrev that ports a number of improvements to JVMCI in JDK 16 from jvmci-8 : * Move C++ state and functionality associated with a HotSpotJVMCIRuntime Java object into its peer JVMCIRuntime C++ object (e.g., the _shared_library_javavm moves from being a static field in JVMCIEnv to an instance field in JVMCIRuntime). * The management of JNI globals handles and Metadata handles passed to JVMCI Java code should also be moved to JVMCIRuntime. * Introduce tracing of low frequency JVMCIRuntime lifetime events (e.g. JVMCIRuntime lifetime phase events) at less verbose trace level. * Trace high frequency JVMCI events (e.g. CompilerToVM calls) at more verbose trace level. * Detect unsupported jvmci.* system properties and use fuzzy matching for an error message suggesting closely matching supported properties. 
* Improve javadoc for HotSpotJVMCIRuntime.attachCurrentThread. * Reduce calls to JavaThread::current() in conjunction with JNIAccessMark. https://bugs.openjdk.java.net/browse/JDK-8248359 https://dougxc.github.io/webrevs/8248359_16.01 Testing: hs-tier1,hs-tier2,hs-tier3-graal -Doug From felix.yang at huawei.com Mon Jun 29 13:41:22 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 29 Jun 2020 13:41:22 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> Message-ID: Hi, > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Thursday, June 25, 2020 9:08 PM > To: Yangfei (Felix) ; Roland Westrelin > ; hotspot-compiler-dev at openjdk.java.net > Cc: guoge (A) ; zhouyong (V) > > Subject: Re: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > Hi Felix, > > On 24.06.20 05:24, Yangfei (Felix) wrote: > > I updated accordingly, new webrev: > http://cr.openjdk.java.net/~fyang/8243670/webrev.02 > > Looks good to me but isn't the MergeMemNode::Ideal transformation > useless now? I added some extra code to check if we still have a chance to replace equivalent phis in MergeMemNode::Ideal. diff -r 6ad5fd9a52df src/hotspot/share/opto/memnode.cpp --- a/src/hotspot/share/opto/memnode.cpp Mon Jun 29 12:28:55 2020 +0000 +++ b/src/hotspot/share/opto/memnode.cpp Mon Jun 29 21:32:27 2020 +0800 @@ -4615,6 +4615,12 @@ } if (phi_mem != NULL) { // equivalent phi nodes; revert to the def + + tty->print("====>\n"); + this->dump(); + new_mem->dump(); + new_base->dump(); + new_mem = new_base; } } Looks like we missed doing PhiNode::Identity for some phis from the dump of doing a slowdebug build. I am still looking into it. Thoughts? 
Part of the dump looks like: ====> 511 MergeMem === _ 1 1602 1599 1 1 1 1 1 1 1 1 95 [[ 1258 ]] { N1599:rawptr:BotPTR - - - - - - - - N95:java/util/HashMap$Node+24 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: HashMap::putVal @ bci:91 1599 Phi === 1589 95 95 [[ 1287 1277 1301 1269 1309 1299 1312 511 ]] #memory Memory: @rawptr:BotPTR, idx=Raw; !orig=[1223],[943] !jvms: HashMap::putVal @ bci:209 1602 Phi === 1589 95 95 [[ 1315 1278 1350 1389 1395 1247 1401 511 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !orig=[943] !jvms: HashMap::putVal @ bci:209 ====> 502 MergeMem === _ 1 1733 1730 1 1 1 1 1 1 1 1 95 [[ 1322 ]] { N1730:rawptr:BotPTR - - - - - - - - N95:java/util/HashMap$Node+24 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: HashMap::putVal @ bci:81 1730 Phi === 1720 95 95 [[ 1351 1341 1365 1333 1373 1363 1376 502 ]] #memory Memory: @rawptr:BotPTR, idx=Raw; !orig=[1287],[974] !jvms: HashMap::putVal @ bci:209 1733 Phi === 1720 95 95 [[ 1379 1342 1414 1453 1459 1311 1465 502 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !orig=[974] !jvms: HashMap::putVal @ bci:209 ====> 7843 MergeMem === _ 1 7778 7779 7794 1 7781 1 7828 1 1 1 1 1 1 1 1 1 1 1 7783 7784 [[ 7848 ]] { N7779:rawptr:BotPTR N7794:jdk/internal/org/objectweb/asm/Frame+18 * - N7781:jdk/internal/org/objectweb/asm/Frame+36 * [narrow] - N7828:int[int:>=0]:exact+any * - - - - - - - - - - - N7783:java/lang/Object * N7784:java/lang/Object+8 * [narrowklass] } Memory: @BotPTR *+bot, idx=Bot; !jvms: Frame::push @ bci:94 Frame::execute @ bci:1409 7783 Phi === 7509 7773 7 [[ 7843 7797 8094 7814 ]] #memory Memory: @java/lang/Object *, idx=20; !jvms: Frame::push @ bci:60 Frame::execute @ bci:1409 7778 Phi === 7509 7773 7 [[ 8089 7797 7855 7814 7871 7830 7839 7843 7872 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: Frame::push @ bci:60 Frame::execute @ bci:1409 ====> 7843 MergeMem === _ 1 7778 7779 7794 1 7781 1 7828 1 1 1 1 1 1 1 1 1 1 1 1 7784 [[ 7848 ]] { N7779:rawptr:BotPTR 
N7794:jdk/internal/org/objectweb/asm/Frame+18 * - N7781:jdk/internal/org/objectweb/asm/Frame+36 * [narrow] - N7828:int[int:>=0]:exact+any * - - - - - - - - - - - - N7784:java/lang/Object+8 * [narrowklass] } Memory: @BotPTR *+bot, idx=Bot; !jvms: Frame::push @ bci:94 Frame::execute @ bci:1409 7784 Phi === 7509 7773 7 [[ 7843 7797 8095 7814 ]] #memory Memory: @java/lang/Object+8 * [narrowklass], idx=21; !jvms: Frame::push @ bci:60 Frame::execute @ bci:1409 7778 Phi === 7509 7773 7 [[ 8089 7797 7855 7814 7871 7830 7839 7843 7872 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: Frame::push @ bci:60 Frame::execute @ bci:1409 ====> 3318 MergeMem === _ 1 1614 1615 1616 1 1617 1 1618 1619 1620 1621 1622 1 1623 1624 3112 1 1 1 1 7 7 3489 7 3484 3499 [[ 3321 ]] { N1615:rawptr:BotPTR N1616:java/lang/String:exact+20 * [narrow] - N1617:java/lang/String:exact+16 * - N1618:java/lang/Object * N1619:java/lang/Object+8 * [narrowklass] N1620:java/util/AbstractList+12 * N1621:java/util/ArrayList+16 * N1622:java/util/ArrayList+20 * [narrow] - N1623:java/lang/String:exact+12 * N1624:java/lang/String:exact+17 * N3112:narrowoop: java/lang/Object *[int:>=0]+any * [narrow] - - - - N7:java/util/ArrayList$SubList:NotNull:exact *,iid=2970 N7:java/util/ArrayList$SubList:NotNull:exact+8 *,iid=2970 [narrowklass] N3489:java/util/ArrayList$SubList:NotNull:exact+24 *,iid=2970 [narrow] N7:java/util/ArrayList$SubList:NotNull:exact+16 *,iid=2970 N3484:java/util/AbstractList:NotNull:exact+12 *,iid=2970 N3499:java/util/ArrayList$SubList:NotNull:exact+20 *,iid=2970 } Memory: @BotPTR *+bot, idx=Bot; 3112 Phi === 1612 7 955 1382 1629 1886 2363 2586 2586 3283 [[ 3318 ]] #memory Memory: @narrowoop: java/lang/Object *[int:>=0]+any * [narrow], idx=16; !orig=1614 !jvms: String::split @ bci:-1 1614 Phi === 1612 7 955 1382 1629 1886 2363 2586 2586 3283 [[ 3318 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: String::split @ bci:-1 From patric.hedlin at oracle.com Mon Jun 29 14:38:35 2020 From: patric.hedlin at 
oracle.com (Patric Hedlin) Date: Mon, 29 Jun 2020 16:38:35 +0200 Subject: RFR(S): 8234605: C2 failed "assert(C->live_nodes() - live_at_begin <= 2 * _nodes_required) failed: Bad node estimate: actual = 208 >> request = 101" In-Reply-To: References: <4a0ca6a4-7143-7c70-5e84-22244d3b56c1@oracle.com> Message-ID: <39b74ee0-0cc0-e42b-9d46-6050fb653f18@oracle.com> Thanks for reviewing Nils. /Patric On 2020-06-26 16:55, Nils Eliasson wrote: > Hi Patric, > > Looks good. > > Best regards, > Nils > > On 2020-06-26 16:15, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8234605 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8234605/ >> >> >> Turning assert into (universal) logging. This issue (now) manifest >> itself very seldom (typically on small estimates) and has thus served >> its purpose to help trim the node budget estimates to some reasonable >> level. Logging should be sufficient going forward. >> >> >> Testing: tier1-2 >> >> >> Best regards, >> Patric From patric.hedlin at oracle.com Mon Jun 29 14:38:39 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Mon, 29 Jun 2020 16:38:39 +0200 Subject: RFR(S): 8234605: C2 failed "assert(C->live_nodes() - live_at_begin <= 2 * _nodes_required) failed: Bad node estimate: actual = 208 >> request = 101" In-Reply-To: References: <4a0ca6a4-7143-7c70-5e84-22244d3b56c1@oracle.com> Message-ID: Hi Vladimir, On 2020-06-27 00:59, Vladimir Kozlov wrote: > Hi Patric, > > You can move #ifdef ASSERT to put if(check_estimate) under it. Personally I prefer an empty statement list over an unused parameter/variable, but I can certainly move the *if (check...* inside the conditional ASSERT section. Assuming no refresh required. Thanks for reviewing. 
/Patric > > Thanks, > Vladimir > > On 6/26/20 7:15 AM, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8234605 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8234605/ >> >> >> Turning assert into (universal) logging. This issue (now) manifest >> itself very seldom (typically on small estimates) and has thus served >> its purpose to help trim the node budget estimates to some reasonable >> level. Logging should be sufficient going forward. >> >> >> Testing: tier1-2 >> >> >> Best regards, >> Patric From vladimir.kozlov at oracle.com Mon Jun 29 14:59:17 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 29 Jun 2020 07:59:17 -0700 Subject: RFR: 8248410 - Correct Fix for 8236647: java/lang/invoke/CallSiteTest.java failed with InvocationTargetException in Graal mode In-Reply-To: References: Message-ID: <97EC9124-0C04-4A12-9FE5-3F13757CC88B@oracle.com> Looks good. Thanks Vladimir > On Jun 26, 2020, at 8:20 AM, Bob Vandette wrote: > > The fix for "8236647: java/lang/invoke/CallSiteTest.java failed with InvocationTargetException in Graal mode" added > an inner class which causes problems when generating GraalVM's libjvmcicompiler.so library. This fix removes the > inner class addition and matches the implementation that is in the GraalVMs labsjdk sources.
>
>
> Here's the proposed fix:
>
> diff --git a/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotObjectConstantImpl.java b/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotObjectConstantImpl.java
> --- a/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotObjectConstantImpl.java
> +++ b/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotObjectConstantImpl.java
> @@ -65,25 +65,21 @@
>      @Override
>      public abstract int getIdentityHashCode();
>
> -    static class Fields {
> -        // Initializing these too early causes a hang, so do it here in a subclass
> -        static final HotSpotResolvedJavaField callSiteTargetField = HotSpotMethodHandleAccessProvider.Internals.instance().callSiteTargetField;
> -        static final HotSpotResolvedJavaField constantCallSiteFrozenField = HotSpotMethodHandleAccessProvider.Internals.instance().constantCallSiteFrozenField;
> -    }
> -
>      private boolean isFullyInitializedConstantCallSite() {
>          if (!runtime().getConstantCallSite().isInstance(this)) {
>              return false;
>          }
>          // read ConstantCallSite.isFrozen as a volatile field
> -        boolean isFrozen = readFieldValue(Fields.constantCallSiteFrozenField, true /* volatile */).asBoolean();
> +        HotSpotResolvedJavaField field = HotSpotMethodHandleAccessProvider.Internals.instance().constantCallSiteFrozenField;
> +        boolean isFrozen = readFieldValue(field, true /* volatile */).asBoolean();
>          // isFrozen true implies fully-initialized
>          return isFrozen;
>      }
>
>      private HotSpotObjectConstantImpl readTarget() {
>          // read CallSite.target as a volatile field
> -        return (HotSpotObjectConstantImpl) readFieldValue(Fields.callSiteTargetField, true /* volatile */);
> +        HotSpotResolvedJavaField field = HotSpotMethodHandleAccessProvider.Internals.instance().callSiteTargetField;
> +        return (HotSpotObjectConstantImpl) readFieldValue(field, true /* volatile */);
>      }
>
> Bob.
> > From aph at redhat.com Mon Jun 29 16:10:06 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 29 Jun 2020 17:10:06 +0100 Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes In-Reply-To: References: <275eb57c-51c0-675e-c32a-91b198023559@redhat.com> <719F9169-ABC4-408E-B732-F1BD9A84337F@oracle.com> <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com> <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com> <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com> <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com> Message-ID: <2acbcc99-8dd4-b8f1-5982-1d439953c416@redhat.com> On 29/06/2020 08:48, Yang Zhang wrote: > 1. Instructions that can be matched with NEON instructions directly. > MulVB, SqrtVF and AbsV have been merged into jdk master already. > > 2. Instructions that jdk master has middle end support for, but they cannot be matched with NEON instructions directly. > Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These new instructions can be moved into jdk master first, but for auto-vectorization, the performance might not get improved. > > 3. Panama/Vector API specific instructions such as Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend etc. > These instructions cannot be moved into jdk master first because there isn't middle-end support. > > I will put 2 and 3 in a new ad file aarch64_neon.ad. I will also update aarch64_asmtest.py and macroassemler.cpp. When the patch is ready, I will send it again. Thank you *very* much for your hard work. Appreciated! -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From tom.rodriguez at oracle.com Mon Jun 29 16:59:33 2020
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 29 Jun 2020 09:59:33 -0700
Subject: RFR: 8248410 - Correct Fix for 8236647: java/lang/invoke/CallSiteTest.java failed with InvocationTargetException in Graal mode
In-Reply-To:
References:
Message-ID: <116f02ad-1928-172c-7292-44b77fd039a2@oracle.com>

Looks good to me.

tom

Bob Vandette wrote on 6/26/20 6:52 AM:
> The fix for "8236647: java/lang/invoke/CallSiteTest.java failed with InvocationTargetException in Graal mode" added
> an inner class which causes problems when generating GraalVM's libjvmcicompiler.so library. This fix removes the
> inner class addition and matches the implementation that is in the GraalVMs labsjdk sources.
>
>
> Here's the proposed fix:
>
> diff --git a/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotObjectConstantImpl.java b/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotObjectConstantImpl.java
> --- a/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotObjectConstantImpl.java
> +++ b/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotObjectConstantImpl.java
> @@ -65,25 +65,21 @@
>      @Override
>      public abstract int getIdentityHashCode();
>
> -    static class Fields {
> -        // Initializing these too early causes a hang, so do it here in a subclass
> -        static final HotSpotResolvedJavaField callSiteTargetField = HotSpotMethodHandleAccessProvider.Internals.instance().callSiteTargetField;
> -        static final HotSpotResolvedJavaField constantCallSiteFrozenField = HotSpotMethodHandleAccessProvider.Internals.instance().constantCallSiteFrozenField;
> -    }
> -
>      private boolean isFullyInitializedConstantCallSite() {
>          if (!runtime().getConstantCallSite().isInstance(this)) {
>              return false;
>          }
>          // read
ConstantCallSite.isFrozen as a volatile field > - boolean isFrozen = readFieldValue(Fields.constantCallSiteFrozenField, true /* volatile */).asBoolean(); > + HotSpotResolvedJavaField field = HotSpotMethodHandleAccessProvider.Internals.instance().constantCallSiteFrozenField; > + boolean isFrozen = readFieldValue(field, true /* volatile */).asBoolean(); > // isFrozen true implies fully-initialized > return isFrozen; > } > > private HotSpotObjectConstantImpl readTarget() { > // read CallSite.target as a volatile field > - return (HotSpotObjectConstantImpl) readFieldValue(Fields.callSiteTargetField, true /* volatile */); > + HotSpotResolvedJavaField field = HotSpotMethodHandleAccessProvider.Internals.instance().callSiteTargetField; > + return (HotSpotObjectConstantImpl) readFieldValue(field, true /* volatile */); > } > > Bob. > > From aph at redhat.com Mon Jun 29 17:51:16 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 29 Jun 2020 18:51:16 +0100 Subject: Running IGV Message-ID: <09f19846-cd66-85ed-c491-c5348d8fe532@redhat.com> It's been a while since I've run the ideal graph visualizer. I've built the version in jdk-jdk/src/utils/IdealGraphVisualizer, but using igv.sh just results in a program that seems to run for a while, then exits. It doesn't seem to leave anything running in the background. Is there any special trick to make it run? Thanks, -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.kozlov at oracle.com Mon Jun 29 18:06:03 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 29 Jun 2020 11:06:03 -0700 Subject: [15] RFR(T): 8248265: compiler/ciReplay tests fail with AOT compiled java.base In-Reply-To: <0e54fbd0-d2d5-c831-f4e8-950e608e0379@oracle.com> References: <0e54fbd0-d2d5-c831-f4e8-950e608e0379@oracle.com> Message-ID: Good. 
Thanks, Vladimir On 6/28/20 11:06 PM, Tobias Hartmann wrote: > Hi Vladimir, > > Thanks for the review! > > On 26.06.20 20:37, Vladimir Kozlov wrote: >> I prefer an assign to a static variable. Printing is very complex code. > > Okay, my intention for the printing was that it would help to diagnose similar issues (method not > called) but that should never happen with a non-empty method anyway. Here's a new version: > http://cr.openjdk.java.net/~thartmann/8248265/webrev.01/ > >>> With AOT compiled java.base, we call the test method EmptyMain::main through >>> JavaCalls::call_helper(). The fix for JDK-8247832 restored the old behavior that empty methods are >>> not called: https://hg.openjdk.java.net/jdk/jdk15/rev/94025f9e6a0d#l2.36 >> >> These changes are done for JDK-8247992. JDK-8247832 is not related and not fixed yet. > > Right, I've linked the right changeset but mentioned the wrong bug (copy-paste error from the > clipboard). > > Thanks, > Tobias > From vladimir.kozlov at oracle.com Mon Jun 29 18:22:04 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 29 Jun 2020 11:22:04 -0700 Subject: [15] RFR(S): 8244724: CTW: C2 compilation fails with "Live Node limit exceeded limit" In-Reply-To: References: Message-ID: <47f22088-ae24-3c5d-2959-1d2def6fb229@oracle.com> On 6/29/20 1:49 AM, Christian Hagedorn wrote: > Thank you Nils and Vladimir for your reviews! > > On 27.06.20 00:35, Vladimir Kozlov wrote: >> You don't need to use 'C->' in Compile::Optimize() method: >> >> DEBUG_ONLY(C->set_phase_optimize_finished();) > > Oh, right! > >> Yes, remove all nodes limit checks after optimization phase. 
> > I created a new webrev with all those Compile::check_node_count() calls removed and the change above: > http://cr.openjdk.java.net/~chagedorn/8244724/webrev.02/ > > However, since these calls also affect bailout decisions in the product version and this bug is targeted for 15, I > suggest to remove these calls/bailouts for 16 only in a separate RFE to minimize the risk. Agree. And please keep third failure check in chaitin.cpp because it also follows Split(): 538 if (C->failing()) { 539 return; 540 } It seems okay to remove 1st and 4th checks as you did. Thanks, Vladimir > > Best regards, > Christian > >> Thanks, >> Vladimir >> >> On 6/26/20 7:38 AM, Christian Hagedorn wrote: >>> Hi >>> >>> Please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8244724 >>> http://cr.openjdk.java.net/~chagedorn/8244724/webrev.01/ >>> >>> The testcase contains many string concatenations. These are compiled by javac with -XDstringConcat=inline which >>> creates a lot of StringBuilder objects and calls. As a result, we get a huge graph and eventually hit the live node >>> limit assert during code generation when trying to create new nodes - either during PhaseCFG::build_cfg() or later in >>> PhaseCFG::global_code_motion(). >>> >>> We could try to introduce estimates for them to bailout but that appears to be difficult to get right without being >>> too pessimistic about it. But we need to be in order to avoid hitting the assert again by just modifying the testcase. >>> >>> Therefore, my suggestion is to completely skip the assert once the optimization phase is finished as we should not >>> strictly care about the node limit anymore at this point in time and it does not really provide much help for finding >>> bugs.
>>> >>> A question remains, though, if we should also get rid of the remaining live node limit bailout checks in >>> Compile::Code_Gen() like [1] as it appears to be a waste to go through all the optimization in the optimization phase >>> to then bailout while generating code based only on the live node limit itself. What do you think about that? >>> >>> I also updated "<" into "<=" in the live node limit assert because we should be allowed to reach the limit but not go >>> beyond. >>> >>> Thank you! >>> >>> Best regards, >>> Christian >>> >>> >>> [1] http://hg.openjdk.java.net/jdk/jdk/file/8fd3e34e8379/src/hotspot/share/opto/coalesce.cpp#l236 From vladimir.kozlov at oracle.com Mon Jun 29 18:48:56 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 29 Jun 2020 11:48:56 -0700 Subject: RFR(M): 8248359: Update JVMCI In-Reply-To: <322BCF85-A5DA-4463-9250-A6744FABC9AF@oracle.com> References: <322BCF85-A5DA-4463-9250-A6744FABC9AF@oracle.com> Message-ID: Looks good. Thanks, Vladimir On 6/29/20 2:36 AM, Doug Simon wrote: > Please review this webrev that ports a number of improvements to JVMCI in JDK 16 from jvmci-8 : > > * Move C++ state and functionality associated with a HotSpotJVMCIRuntime Java object into its peer JVMCIRuntime C++ object (e.g., the _shared_library_javavm moves from being a static field in JVMCIEnv to an instance field in JVMCIRuntime). > * The management of JNI globals handles and Metadata handles passed to JVMCI Java code should also be moved to JVMCIRuntime. > * Introduce tracing of low frequency JVMCIRuntime lifetime events (e.g. JVMCIRuntime lifetime phase events) at less verbose trace level. > * Trace high frequency JVMCI events (e.g. CompilerToVM calls) at more verbose trace level. > * Detect unsupported jvmci.* system properties and use fuzzy matching for an error message suggesting closely matching supported properties. > * Improve javadoc for HotSpotJVMCIRuntime.attachCurrentThread. 
> * Reduce calls to JavaThread::current() in conjunction with JNIAccessMark. > > https://bugs.openjdk.java.net/browse/JDK-8248359 > https://dougxc.github.io/webrevs/8248359_16.01 > > Testing: hs-tier1,hs-tier2,hs-tier3-graal > > -Doug > From Charlie.Gracie at microsoft.com Mon Jun 29 21:05:29 2020 From: Charlie.Gracie at microsoft.com (Charlie Gracie) Date: Mon, 29 Jun 2020 21:05:29 +0000 Subject: Stack allocation prototype for C2 Message-ID: Hi hotspot-compiler-dev community, Here is the prototype code for our work on adding stack allocation to the HotSpot C2 compiler. We are looking for any and all feedback as we hope to move from a prototype to something that could be contributed. A change of this size is difficult to review so we understand the process will be thorough and will take time to complete. Any suggestions on how to allow for collaboration with others, if they wanted to, would also be appreciated (i.e., a repo somewhere). For a quick refresher here is a link to Nikola's talk at FOSDEM: https://fosdem.org/2020/schedule/event/reducing_gc_times/ Here is a link to our initial webrev: http://cr.openjdk.java.net/~adityam/charlie/stack_alloc/ Expecting that a change like this will require a JEP, we have prepared a document describing our work based off of the JEP submission form. Our document has a few extra sections at the end discussing areas that we are looking for guidance on and some initial performance results. This document can be found here: https://github.com/microsoft/openjdk-proposals/blob/master/stack_allocation/Stack_Allocation_JEP.md Thanks in advance for reviews, suggestions, concerns, comments and issues.
Charlie and Nikola From joserz at linux.ibm.com Mon Jun 29 21:09:59 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Mon, 29 Jun 2020 18:09:59 -0300 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> References: <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> Message-ID: <20200629210959.GA13167@pacoca> Hello Martin, Sure, I'll send the v2 with these changes. Thank you for reviewing it!! On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: > Hi Jose, > > Can you replace the outdated description of PowerArchitecturePPC64 in globals_ppc.hpp by something generic, please? > > Please update the Copyright year in vm_version_ppc.hpp. > > I can't test the change, but it looks good to me. > > Best regards, > Martin > > > On 26.06.2020 at 20:29, "joserz at linux.ibm.com" wrote: > > > > Hello team! > > > > This patch introduces Power10 to OpenJDK and implements three new instructions: > > - brh - byte-reverse halfword > > - brw - byte-reverse word > > - brd - byte-reverse doubleword > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > Thanks for your review! > > > > Jose R. Ziviani From joserz at linux.ibm.com Tue Jun 30 00:15:28 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Mon, 29 Jun 2020 21:15:28 -0300 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> References: <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> Message-ID: <20200630001528.GA26652@pacoca> Hello team, Here's the 2nd version, implementing the changes Martin suggested. Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 Thank you!!
Jose On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: > Hi Jose, > > Can you replace the outdated description of PowerArchitecturePPC64 in globals_ppc.hpp by something generic, please? > > Please update the Copyright year in vm_version_ppc.hpp. > > I can't test the change, but it looks good to me. > > Best regards, > Martin > > > On 26.06.2020 at 20:29, "joserz at linux.ibm.com" wrote: > > > > Hello team! > > > > This patch introduces Power10 to OpenJDK and implements three new instructions: > > - brh - byte-reverse halfword > > - brw - byte-reverse word > > - brd - byte-reverse doubleword > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > Thanks for your review! > > > > Jose R. Ziviani From sergey.kuksenko at oracle.com Tue Jun 30 03:30:40 2020 From: sergey.kuksenko at oracle.com (Sergey Kuksenko) Date: Mon, 29 Jun 2020 20:30:40 -0700 Subject: Stack allocation prototype for C2 In-Reply-To: References: Message-ID: I am just curious. For each benchmark you show the overall reduction in allocation size. Do you have statistics on which stack-allocated objects have the major impact? And which code patterns fail scalar replacement, other than the well-known Integer cache flow merge? On 6/29/20 2:05 PM, Charlie Gracie wrote: > Hi hotspot-compiler-dev community, > > Here is the prototype code for our work on adding stack allocation to the HotSpot C2 compiler. We are looking for any and all feedback > as we hope to move from a prototype to something that could be contributed. A change of this size is difficult to review so we > understand the process will be thorough and will take time to complete. Any suggestions on how to allow for collaboration with others, > if they wanted to, would also be appreciated (i.e., a repo somewhere).
> > For a quick refresher here is a link to Nikola's talk at FOSDEM: > https://fosdem.org/2020/schedule/event/reducing_gc_times/ > > Here is a link to our initial webrev: > http://cr.openjdk.java.net/~adityam/charlie/stack_alloc/ > > Expecting that a change like this will require a JEP, we have prepared a document describing our work based off of the JEP submission > form. Our document has a few extra sections at the end discussing areas that we are looking for guidance on and some initial > performance results. This document can be found here: > https://github.com/microsoft/openjdk-proposals/blob/master/stack_allocation/Stack_Allocation_JEP.md > > Thanks in advance for reviews, suggestions, concerns, comments and issues. > Charlie and Nikola > From tobias.hartmann at oracle.com Tue Jun 30 07:08:08 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 30 Jun 2020 09:08:08 +0200 Subject: [15] RFR(T): 8248265: compiler/ciReplay tests fail with AOT compiled java.base In-Reply-To: References: <0e54fbd0-d2d5-c831-f4e8-950e608e0379@oracle.com> Message-ID: Thanks Vladimir! Best regards, Tobias On 29.06.20 20:06, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 6/28/20 11:06 PM, Tobias Hartmann wrote: >> Hi Vladimir, >> >> Thanks for the review! >> >> On 26.06.20 20:37, Vladimir Kozlov wrote: >>> I prefer an assign to a static variable. Printing is very complex code. >> >> Okay, my intention for the printing was that it would help to diagnose similar issues (method not >> called) but that should never happen with a non-empty method anyway. Here's a new version: >> http://cr.openjdk.java.net/~thartmann/8248265/webrev.01/ >> >>>> With AOT compiled java.base, we call the test method EmptyMain::main through >>>> JavaCalls::call_helper(). The fix for JDK-8247832 restored the old behavior that empty methods are >>>> not called: https://hg.openjdk.java.net/jdk/jdk15/rev/94025f9e6a0d#l2.36 >>> >>> These changes are done for JDK-8247992.
JDK-8247832 is not related and not fixed yet. >> >> Right, I've linked the right changeset but mentioned the wrong bug (copy-paste error from the >> clipboard). >> >> Thanks, >> Tobias >> From tobias.hartmann at oracle.com Tue Jun 30 07:11:08 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 30 Jun 2020 09:11:08 +0200 Subject: Running IGV In-Reply-To: <09f19846-cd66-85ed-c491-c5348d8fe532@redhat.com> References: <09f19846-cd66-85ed-c491-c5348d8fe532@redhat.com> Message-ID: Hi Andrew, igv.sh writes into a log file (.igv.log). The problem might be that you need to run with JDK 8. Best regards, Tobias On 29.06.20 19:51, Andrew Haley wrote: > It's been a while since I've run the ideal graph visualizer. > > I've built the version in jdk-jdk/src/utils/IdealGraphVisualizer, but > using igv.sh just results in a program that seems to run for a while, > then exits. It doesn't seem to leave anything running in the background. > > Is there any special trick to make it run? > > Thanks, > From rwestrel at redhat.com Tue Jun 30 07:52:23 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 30 Jun 2020 09:52:23 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> Message-ID: <87o8p1ours.fsf@redhat.com> > I added some extra code to check if we still have a chance to replace > equivalent phis in MergeMemNode::Ideal. Is this a reliable way of finding a missed transformation opportunity? Could it be that MergeMemNode::Ideal() runs before PhiNode::Identity() has had a chance to run? 
That is, the merge mem is first in the IGVN queue and the Phi is next, so
the merge mem would first fail to optimize, then the Phi would optimize,
causing the merge mem to be enqueued again and this time be properly
optimized.

Roland.

From christian.hagedorn at oracle.com Tue Jun 30 07:55:26 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Tue, 30 Jun 2020 09:55:26 +0200
Subject: [15] RFR(S): 8244724: CTW: C2 compilation fails with "Live Node limit exceeded limit"
In-Reply-To: <47f22088-ae24-3c5d-2959-1d2def6fb229@oracle.com>
References: <47f22088-ae24-3c5d-2959-1d2def6fb229@oracle.com>
Message-ID: <30f99c84-0b2d-bc3e-267e-e3e569db5589@oracle.com>

Hi Vladimir

On 29.06.20 20:22, Vladimir Kozlov wrote:
> On 6/29/20 1:49 AM, Christian Hagedorn wrote:
>> Thank you Nils and Vladimir for your reviews!
>>
>> On 27.06.20 00:35, Vladimir Kozlov wrote:
>>> You don't need to use 'C->' in Compile::Optimize() method:
>>>
>>> DEBUG_ONLY(C->set_phase_optimize_finished();)
>>
>> Oh, right!
>>
>>> Yes, remove all nodes limit checks after optimization phase.
>>
>> I created a new webrev with all those Compile::check_node_count()
>> calls removed and the change above:
>> http://cr.openjdk.java.net/~chagedorn/8244724/webrev.02/
>>
>> However, since these calls also affect bailout decisions in the
>> product version and this bug is targeted for 15, I suggest to remove
>> these calls/bailouts for 16 only in a separate RFE to minimize the risk.
>
> Agree. And please keep third failure check in chaitin.cpp because it
> also follows Split():
>
>  538     if (C->failing()) {
>  539       return;
>  540     }
>
> It seems okay to remove 1st and 4th checks as you did.

Right, that 3rd check should be kept.
Sounds good, then I'll push webrev.01 (without the bailouts being removed) together with this change: >>> You don't need to use 'C->' in Compile::Optimize() method: >>> >>> DEBUG_ONLY(C->set_phase_optimize_finished();) I created a new RFE for 16 [1] to remove the bailouts as done additionally in webrev.02. I will run some testing and then send out another review for [1]. Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8248529 > > Thanks, > Vladimir > >> >> Best regards, >> Christian >> >>> Thanks, >>> Vladimir >>> >>> On 6/26/20 7:38 AM, Christian Hagedorn wrote: >>>> Hi >>>> >>>> Please review the following patch: >>>> https://bugs.openjdk.java.net/browse/JDK-8244724 >>>> http://cr.openjdk.java.net/~chagedorn/8244724/webrev.01/ >>>> >>>> The testcase contains many string concatenations. These are compiled >>>> by javac with -XDstringConcat=inline which creates a lot of >>>> StringBuilder objects and calls. As a result, we get a huge graph >>>> and eventually hit the live node limit assert during code generation >>>> when trying to create new nodes - either during >>>> PhaseCFG::build_cfg() or later in PhaseCFG::global_code_motion(). >>>> >>>> We could try to introduce estimates for them to bailout but that >>>> appears to be difficult to get right without being too pessimistic >>>> about it. But we need to be in order to avoid hitting the assert >>>> again by just modifying the testcase. >>>> >>>> Therefore, my suggestion is to completely skip the assert once the >>>> optimization phase is finished as we should not strictly care about >>>> the node limit anymore at this point in time and it does not really >>>> provide much help for finding bugs.
>>>> >>>> A question remains, though, if we should also get rid of the >>>> remaining live node limit bailout checks in Compile::Code_Gen() like >>>> [1] as it appears to be a waste to go through all the optimization >>>> in the optimization phase to then bailout while generating code >>>> based only on the live node limit itself. What do you think about that? >>>> >>>> I also updated "<" into "<=" in the live node limit assert because >>>> we should be allowed to reach the limit but not go beyond. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> Christian >>>> >>>> >>>> [1] >>>> http://hg.openjdk.java.net/jdk/jdk/file/8fd3e34e8379/src/hotspot/share/opto/coalesce.cpp#l236 >>>> From felix.yang at huawei.com Tue Jun 30 07:59:59 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Tue, 30 Jun 2020 07:59:59 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: <87o8p1ours.fsf@redhat.com> References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> <87o8p1ours.fsf@redhat.com> Message-ID: Hi, > -----Original Message----- > From: Roland Westrelin [mailto:rwestrel at redhat.com] > Sent: Tuesday, June 30, 2020 3:52 PM > To: Yangfei (Felix) ; Tobias Hartmann > ; hotspot-compiler-dev at openjdk.java.net > Cc: guoge (A) ; zhouyong (V) > > Subject: RE: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > > > I added some extra code to check if we still have a chance to replace > > equivalent phis in MergeMemNode::Ideal. > > Is this a reliable way of finding a missed transformation opportunity? > Could it be that MergeMemNode::Ideal() runs before PhiNode::Identity() > has had a chance to run? that is the merge mem is first in the IGVN queue Yes, I have just confirmed that. 
The following logic in MergeMemNode::Ideal() could catch phis with the same inputs before PhiNode::Identity() had a chance to do so. 4608 // replace equivalent phis (unfortunately, they do not GVN together) 4609 if (new_mem != NULL && new_mem != new_base && 4610 new_mem->req() == phi_len && new_mem->in(0) == phi_reg) { 4611 if (new_mem->is_Phi()) { 4612 PhiNode* phi_mem = new_mem->as_Phi(); 4613 for (uint i = 1; i < phi_len; i++) { 4614 if (phi_base->in(i) != phi_mem->in(i)) { 4615 phi_mem = NULL; 4616 break; 4617 } 4618 } 4619 if (phi_mem != NULL) { 4620 // equivalent phi nodes; revert to the def 4621 new_mem = new_base; 4622 } 4623 } 4624 } So I think I will propose a new patch with this logic removed. It should be redundant with our fix in PhiNode::Identity(). > and the Phi is next so the merge mem would first fail to optimize, then the > Phi would optimize, cause the merge mem to be enqueued again and this > time be properly optimized. From felix.yang at huawei.com Tue Jun 30 12:35:07 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Tue, 30 Jun 2020 12:35:07 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> <87o8p1ours.fsf@redhat.com> Message-ID: Hi again, Updated webrev: http://cr.openjdk.java.net/~fyang/8243670/webrev.03/ Tier1-3 tested with fastdebug builds on both aarch64-linux-gnu and x86_64-linux-gnu. The newly added test case fails without the fix and passes with it. Please take another look.
Thanks, Felix > -----Original Message----- > From: Yangfei (Felix) > Sent: Tuesday, June 30, 2020 4:00 PM > To: 'Roland Westrelin' ; Tobias Hartmann > ; hotspot-compiler-dev at openjdk.java.net > Cc: guoge (A) ; zhouyong (V) > > Subject: RE: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > Hi, > > > -----Original Message----- > > From: Roland Westrelin [mailto:rwestrel at redhat.com] > > Sent: Tuesday, June 30, 2020 3:52 PM > > To: Yangfei (Felix) ; Tobias Hartmann > > ; hotspot-compiler-dev at openjdk.java.net > > Cc: guoge (A) ; zhouyong (V) > > > > Subject: RE: RFR(S): 8243670: Unexpected test result caused by C2 > > MergeMemNode::Ideal > > > > > > > I added some extra code to check if we still have a chance to > > > replace equivalent phis in MergeMemNode::Ideal. > > > > Is this a reliable way of finding a missed transformation opportunity? > > Could it be that MergeMemNode::Ideal() runs before PhiNode::Identity() > > has had a chance to run? that is the merge mem is first in the IGVN > > queue > > Yes, I have just confirmed that. > The follow logic in MergeMemNode::Ideal() could catch phis with the same > input before PhiNode::Identity() had a change to do. > > 4608 // replace equivalent phis (unfortunately, they do not GVN together) > 4609 if (new_mem != NULL && new_mem != new_base && > 4610 new_mem->req() == phi_len && new_mem->in(0) == phi_reg) { > 4611 if (new_mem->is_Phi()) { > 4612 PhiNode* phi_mem = new_mem->as_Phi(); > 4613 for (uint i = 1; i < phi_len; i++) { > 4614 if (phi_base->in(i) != phi_mem->in(i)) { > 4615 phi_mem = NULL; > 4616 break; > 4617 } > 4618 } > 4619 if (phi_mem != NULL) { > 4620 // equivalent phi nodes; revert to the def > 4621 new_mem = new_base; > 4622 } > 4623 } > 4624 } > > So I think I will propose a new patch with this logic removed. It should be > useless with our fix in PhiNode::Identity(). 
> > > and the Phi is next so the merge mem would first fail to optimize, > > then the Phi would optimize, cause the merge mem to be enqueued again > > and this time be properly optimized. > From vladimir.x.ivanov at oracle.com Tue Jun 30 13:48:28 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 30 Jun 2020 16:48:28 +0300 Subject: Scalar replacement issue in JDK 14.0.1 In-Reply-To: <426551593172556@mail.yandex.ru> References: <426551593172556@mail.yandex.ru> Message-ID: Hi Sergey, I took a look at the benchmark and I think there's more than 11 vs 14 in play here. When I compiled the benchmark with jdk8, I saw the following in the compilation log (irrespective of the JDK version used): 1221 67 b org.openjdk.ea.StringCompositeKeyBenchmark::compositeKey (22 bytes) ... @ 15 org.openjdk.ea.StringCompositeKeyBenchmark$Key::<init> (7 bytes) unloaded signature classes ... The constructor is not inlined, so even if the Key instance doesn't escape globally, it escapes into a call and C2 can't scalar replace it. Inlining fails because the private constructor can't be accessed directly and requires a bridge method. The bridge method has an additional argument of a non-existent type (with a unique name), and the inlining heuristics don't inline methods that have unresolved classes in their signatures. But if you recompile the benchmark with jdk11 (or later), inlining happens and the allocation is eliminated [1]. If you look at the bytecodes, there's no bridge method anymore. Javac generates a NestMembers attribute instead, which is enough to make the private constructor accessible from the enclosing class: $ javap -verbose -private target/classes//org/openjdk/ea/StringCompositeKeyBenchmark.class ... NestMembers: org/openjdk/ea/StringCompositeKeyBenchmark$Key org/openjdk/ea/StringCompositeKeyBenchmark$Data ... So, it boils down to the target language level being used.
Starting with 11, javac doesn't emit bridge methods anymore, which helps EA in C2 eliminate the allocation. Best regards, Vladimir Ivanov [1] $ javap -verbose -private target/classes//org/openjdk/ea/StringCompositeKeyBenchmark.class public java.lang.Object compositeKey(org.openjdk.ea.StringCompositeKeyBenchmark$Data); descriptor: (Lorg/openjdk/ea/StringCompositeKeyBenchmark$Data;)Ljava/lang/Object; flags: (0x0001) ACC_PUBLIC Code: stack=6, locals=2, args_size=2 0: aload_1 1: invokestatic #9 // Method org/openjdk/ea/StringCompositeKeyBenchmark$Data.access$200:(Lorg/openjdk/ea/StringCompositeKeyBenchmark$Data;)Ljava/util/HashMap; 4: new #13 // class org/openjdk/ea/StringCompositeKeyBenchmark$Key 7: dup 8: ldc #15 // String code1 10: aload_1 11: invokestatic #17 // Method org/openjdk/ea/StringCompositeKeyBenchmark$Data.access$000:(Lorg/openjdk/ea/StringCompositeKeyBenchmark$Data;)Ljava/util/Locale; 14: aconst_null 15: invokespecial #21 // Method org/openjdk/ea/StringCompositeKeyBenchmark$Key."<init>":(Ljava/lang/String;Ljava/util/Locale;Lorg/openjdk/ea/StringCompositeKeyBenchmark$1;)V 18: invokevirtual #24 // Method java/util/HashMap.get:(Ljava/lang/Object;)Ljava/lang/Object; 21: areturn $ javap -verbose -private target/classes//org/openjdk/ea/StringCompositeKeyBenchmark\$Key.class ... org.openjdk.ea.StringCompositeKeyBenchmark$Key(java.lang.String, java.util.Locale, org.openjdk.ea.StringCompositeKeyBenchmark$1); descriptor: (Ljava/lang/String;Ljava/util/Locale;Lorg/openjdk/ea/StringCompositeKeyBenchmark$1;)V ...
[2] 1426 67 b org.openjdk.ea.StringCompositeKeyBenchmark::compositeKey (26 bytes) ======== Connection graph for org.openjdk.ea.StringCompositeKeyBenchmark::compositeKey JavaObject NoEscape(NoEscape) [ 148F 142F 144F 137F [ 58 63 ]] 46 Allocate === 29 6 7 8 1 ( 44 43 39 1 1 37 42 ) [[ 47 48 49 56 57 58 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) StringCompositeKeyBenchmark::compositeKey @ bci:4 !jvms: StringCompositeKeyBenchmark::compositeKey @ bci:4 LocalVar [ 46P [ 63 148b 142b ]] 58 Proj === 46 [[ 59 63 142 148 ]] #5 !jvms: StringCompositeKeyBenchmark::compositeKey @ bci:4 LocalVar [ 58 46P [ 144b 137b ]] 63 CheckCastPP === 60 58 [[ 906 865 814 814 865 717 689 689 144 678 717 437 137 137 144 155 166 185 678 678 649 450 223 241 649 634 618 598 576 522 522 990 990 478 478 450 371 371 424 424 437 ]] #org/openjdk/ea/StringCompositeKeyBenchmark$Key:NotNull:exact * Oop:org/openjdk/ea/StringCompositeKeyBenchmark$Key:NotNull:exact * !jvms: StringCompositeKeyBenchmark::compositeKey @ bci:4 Scalar 63 CheckCastPP === 60 58 [[ 906 865 814 814 865 717 689 689 424 990 717 437 522 522 424 437 166 185 990 478 478 450 223 241 450 634 618 598 576 ]] #org/openjdk/ea/StringCompositeKeyBenchmark$Key:NotNull:exact *,iid=46 Oop:org/openjdk/ea/StringCompositeKeyBenchmark$Key:NotNull:exact *,iid=46 !jvms: StringCompositeKeyBenchmark::compositeKey @ bci:4 ++++ Eliminated: 46 Allocate @ 9 java.util.Objects::requireNonNull (14 bytes) inline (hot) @ 19 org.openjdk.ea.StringCompositeKeyBenchmark$Key::<init> (15 bytes) inline (hot) @ 1 java.lang.Object::<init> (1 bytes) inline (hot) On 26.06.2020 15:06, Sergey Tsypanov wrote: > Hello, > > while looking into an issue I've found out that scalar replacement is not working in trivial case on JDK 14.0.1.
> > This benchmark illustrates the issue: > > @State(Scope.Thread) > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"}) > public class StringCompositeKeyBenchmark { > @Benchmark > public Object compositeKey(Data data) { > return data.keyObjectMap.get(new Key(data.code, data.locale)); > } > > > @State(Scope.Thread) > public static class Data { > private final String code = "code1"; > private final Locale locale = Locale.getDefault(); > > private final HashMap<Key, Object> keyObjectMap = new HashMap<>(); > > @Setup > public void setUp() { > keyObjectMap.put(new Key(code, locale), new Object()); > } > } > > private static final class Key { > private final String code; > private final Locale locale; > > private Key(String code, Locale locale) { > this.code = code; > this.locale = locale; > } > > @Override > public boolean equals(Object o) { > if (this == o) return true; > if (o == null || getClass() != o.getClass()) return false; > > Key key = (Key) o; > > if (!code.equals(key.code)) return false; > return locale.equals(key.locale); > } > > @Override > public int hashCode() { > return 31 * code.hashCode() + locale.hashCode(); > } > } > } > > When I run this on JDK 11 (JDK 11.0.7, OpenJDK 64-Bit Server VM, 11.0.7+10-post-Ubuntu-2ubuntu218.04) I get this output: > > Benchmark Mode Cnt Score Error Units > StringCompositeKeyBenchmark.compositeKey avgt 10 5.510 ± 0.121 ns/op > StringCompositeKeyBenchmark.compositeKey:·gc.alloc.rate avgt 10 ≈ 10⁻? MB/sec > StringCompositeKeyBenchmark.compositeKey:·gc.alloc.rate.norm avgt 10 ≈ 10⁻? B/op > StringCompositeKeyBenchmark.compositeKey:·gc.count avgt 10 ≈ 0 counts > > As I understand it, the Java runtime eliminates the object allocation here and no additional memory is used.
> > The same run on JDK 14 (JDK 14.0.1, Java HotSpot(TM) 64-Bit Server VM, 14.0.1+7) shows an object allocation on each method call: > > Benchmark Mode Cnt Score Error Units > StringCompositeKeyBenchmark.compositeKey avgt 10 7.958 ± 1.360 ns/op > StringCompositeKeyBenchmark.compositeKey:·gc.alloc.rate avgt 10 1937.551 ± 320.718 MB/sec > StringCompositeKeyBenchmark.compositeKey:·gc.alloc.rate.norm avgt 10 24.001 ± 0.001 B/op > StringCompositeKeyBenchmark.compositeKey:·gc.churn.G1_Eden_Space avgt 10 1879.111 ± 596.770 MB/sec > StringCompositeKeyBenchmark.compositeKey:·gc.churn.G1_Eden_Space.norm avgt 10 23.244 ± 5.509 B/op > StringCompositeKeyBenchmark.compositeKey:·gc.churn.G1_Survivor_Space avgt 10 0.267 ± 0.750 MB/sec > StringCompositeKeyBenchmark.compositeKey:·gc.churn.G1_Survivor_Space.norm avgt 10 0.003 ± 0.009 B/op > StringCompositeKeyBenchmark.compositeKey:·gc.count avgt 10 23.000 counts > StringCompositeKeyBenchmark.compositeKey:·gc.time avgt 10 44.000 ms > > At the same time, in a more trivial scenario like > > @Benchmark > public int compositeKey(Data data) { > return new Key(data.code, data.locale).hashCode(); > } > > scalar replacement again eliminates the allocation. > > So I'm curious whether this is normal behaviour or a bug? > > Regards, > Sergey Tsypanov > From rwestrel at redhat.com Tue Jun 30 15:33:56 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 30 Jun 2020 17:33:56 +0200 Subject: RFR(S): 8247824: CTW: C2 (Shenandoah) compilation fails with SEGV in SBC2Support::pin_and_expand In-Reply-To: <92f8761b-2a54-3dca-fba0-a526b005480f@oracle.com> References: <87sgeswn1f.fsf@redhat.com> <92f8761b-2a54-3dca-fba0-a526b005480f@oracle.com> Message-ID: <87lfk4pnyz.fsf@redhat.com> Thanks for the reviews Roman & Tobias. Roland.
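Editorial illustration for the "Scalar replacement issue in JDK 14.0.1" thread above: a minimal Java sketch, not taken from the benchmark, of how one might check whether a private nested-class constructor is reached through a synthetic accessor or through nest-based access. The class NestmateCheck, its nested Key and the helper methods are hypothetical names; Class.getNestHost() needs JDK 11 or later, and the assumption that a pre-11 target adds a synthetic accessor constructor follows the javap output quoted in the thread.

```java
import java.lang.reflect.Constructor;

public class NestmateCheck {
    // Hypothetical stand-in for the benchmark's Key: a private nested class
    // whose private constructor is called from the enclosing class.
    private static final class Key {
        private final String code;

        private Key(String code) {
            this.code = code;
        }
    }

    // Forces javac to emit a call to the private constructor from the outer class.
    static Object newKey() {
        return new Key("code1");
    }

    // Counts synthetic constructors declared on Key. With a pre-11 target,
    // javac is assumed to add one synthetic accessor constructor (the extra
    // NestmateCheck$1 parameter); with nestmates there should be none.
    static long syntheticConstructors() {
        long count = 0;
        for (Constructor<?> c : Key.class.getDeclaredConstructors()) {
            if (c.isSynthetic()) {
                count++;
            }
        }
        return count;
    }

    // True when Key records the enclosing class as its nest host, i.e. the
    // class files carry NestHost/NestMembers attributes.
    static boolean usesNestmates() {
        return Key.class.getNestHost() == NestmateCheck.class;
    }

    public static void main(String[] args) {
        newKey();
        System.out.println("synthetic constructors: " + syntheticConstructors());
        System.out.println("nest host is enclosing class: " + usesNestmates());
    }
}
```

Compiling the same source once for an 8 target and once for an 11 target and comparing the two printed lines is one way to see which class files a benchmark run is actually using; whether C2 then inlines the constructor still has to be confirmed from the compilation log.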
From vladimir.kozlov at oracle.com Tue Jun 30 16:42:20 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 30 Jun 2020 09:42:20 -0700 Subject: [15] RFR(S): 8244724: CTW: C2 compilation fails with "Live Node limit exceeded limit" In-Reply-To: <30f99c84-0b2d-bc3e-267e-e3e569db5589@oracle.com> References: <47f22088-ae24-3c5d-2959-1d2def6fb229@oracle.com> <30f99c84-0b2d-bc3e-267e-e3e569db5589@oracle.com> Message-ID: <94759a01-c1a9-3f89-4a74-4b1860fd69b1@oracle.com> Good. Thanks, Vladimir On 6/30/20 12:55 AM, Christian Hagedorn wrote: > Hi Vladimir > > On 29.06.20 20:22, Vladimir Kozlov wrote: >> On 6/29/20 1:49 AM, Christian Hagedorn wrote: >>> Thank you Nils and Vladimir for your reviews! >>> >>> On 27.06.20 00:35, Vladimir Kozlov wrote: >>>> You don't need to use 'C->' in Compile::Optimize() method: >>>> >>>> DEBUG_ONLY(C->set_phase_optimize_finished();) >>> >>> Oh, right! >>> >>>> Yes, remove all nodes limit checks after optimization phase. >>> >>> I created a new webrev with all those Compile::check_node_count() calls removed and the change above: >>> http://cr.openjdk.java.net/~chagedorn/8244724/webrev.02/ >>> >>> However, since these calls also affect bailout decisions in the product version and this bug is targeted for 15, I >>> suggest to remove these calls/bailouts for 16 only in a separate RFE to minimize the risk. >> >> Agree. And please keep third failure check in chaitin.cpp because it also follows Split(): >> >> 538 if (C->failing()) { >> 539 return; >> 540 } >> > It seems okay to remove 1st and 4th checks as you did. > > Right, that 3rd check should be kept. > > Sounds good, then I'll push webrev.01 (without the bailouts being removed) together with this change: >>>> You don't need to use 'C->' in Compile::Optimize() method: >>>> >>>> DEBUG_ONLY(C->set_phase_optimize_finished();) > > I created a new RFE for 16 [1] to remove the bailouts as done additionally in webrev.02.
I will run some testing and > then send out another review for [1]. > > Best regards, > Christian > > > [1] https://bugs.openjdk.java.net/browse/JDK-8248529 > >> >> Thanks, >> Vladimir >> >>> >>> Best regards, >>> Christian >>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 6/26/20 7:38 AM, Christian Hagedorn wrote: >>>>> Hi >>>>> >>>>> Please review the following patch: >>>>> https://bugs.openjdk.java.net/browse/JDK-8244724 >>>>> http://cr.openjdk.java.net/~chagedorn/8244724/webrev.01/ >>>>> >>>>> The testcase contains many string concatinations. These are compiled by javac with -XDstringConcat=inline which >>>>> creates a lot of StringBuilder objects and calls. As a result, we get a huge graph and eventually hit the live node >>>>> limit assert during code generation when trying to create new nodes - either during PhaseCFG::build_cfg() or later >>>>> in PhaseCFG::global_code_motion(). >>>>> >>>>> We could try to introduce estimates for them to bailout but that appears to be difficult to get right without being >>>>> too pessimistic about it. But we need to be in order to avoid hitting the assert again by just modifying the testcase. >>>>> >>>>> Therefore, my suggestion is to completely skip the assert once the optimization phase is finished as we should not >>>>> strictly care about the node limit anymore at this point in time and it does not really provide much help for >>>>> finding bugs. >>>>> >>>>> A question remains, though, if we should also get rid of the remaining live node limit bailout checks in >>>>> Compile::Code_Gen() like [1] as it appears to be a waste to go through all the optimization in the optimization >>>>> phase to then bailout while generating code based only on the live node limit itself. What do you think about that? >>>>> >>>>> I also updated "<" into "<=" in the live node limit assert because we should be allowed to reach the limit but not >>>>> go beyond. >>>>> >>>>> Thank you! 
>>>>> >>>>> Best regards, >>>>> Christian >>>>> >>>>> >>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/8fd3e34e8379/src/hotspot/share/opto/coalesce.cpp#l236 From christian.hagedorn at oracle.com Tue Jun 30 16:56:33 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 30 Jun 2020 18:56:33 +0200 Subject: [15] RFR(S): 8244724: CTW: C2 compilation fails with "Live Node limit exceeded limit" In-Reply-To: <94759a01-c1a9-3f89-4a74-4b1860fd69b1@oracle.com> References: <47f22088-ae24-3c5d-2959-1d2def6fb229@oracle.com> <30f99c84-0b2d-bc3e-267e-e3e569db5589@oracle.com> <94759a01-c1a9-3f89-4a74-4b1860fd69b1@oracle.com> Message-ID: <9836d7ef-b20e-cb6f-33cf-9212d322ff2f@oracle.com> Thank you Vladimir for your review! Best regards, Christian On 30.06.20 18:42, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 6/30/20 12:55 AM, Christian Hagedorn wrote: >> Hi Vladimir >> >> On 29.06.20 20:22, Vladimir Kozlov wrote: >>> On 6/29/20 1:49 AM, Christian Hagedorn wrote: >>>> Thank you Nils and Vladimir for your reviews! >>>> >>>> On 27.06.20 00:35, Vladimir Kozlov wrote: >>>>> You don't need to use 'C->' in Compile::Optimize() method: >>>>> >>>>> DEBUG_ONLY(C->set_phase_optimize_finished();) >>>> >>>> Oh, right! >>>> >>>>> Yes, remove all nodes limit checks after optimization phase. >>>> >>>> I created a new webrev with all those Compile::check_node_count() >>>> calls removed and the change above: >>>> http://cr.openjdk.java.net/~chagedorn/8244724/webrev.02/ >>>> >>>> However, since these calls also affect bailout decisions in the >>>> product version and this bug is targeted for 15, I suggest to remove >>>> these calls/bailouts for 16 only in a separate RFE to minimize the >>>> risk. >>> >>> Agree. And please keep third failure check in chaitin.cpp because it >>> also follows Split(): >>> >>> 538 if (C->failing()) { >>> 539 return; >>> 540 } >>> > It seems okay to remove 1st and 4th checks as you did.
>> >> Right, that 3rd check should be kept. >> >> Sounds good, then I'll push webrev.01 (without the bailouts being >> removed) together with this change: >>>>> You don't need to use 'C->' in Compile::Optimize() method: >>>>> >>>>> DEBUG_ONLY(C->set_phase_optimize_finished();) >> >> I created a new RFE for 16 [1] to remove the bailouts as done >> additionally in webrev.02. I will run some testing and then send out >> another review for [1]. >> >> Best regards, >> Christian >> >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8248529 >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Best regards, >>>> Christian >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 6/26/20 7:38 AM, Christian Hagedorn wrote: >>>>>> Hi >>>>>> >>>>>> Please review the following patch: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8244724 >>>>>> http://cr.openjdk.java.net/~chagedorn/8244724/webrev.01/ >>>>>> >>>>>> The testcase contains many string concatinations. These are >>>>>> compiled by javac with -XDstringConcat=inline which creates a lot >>>>>> of StringBuilder objects and calls. As a result, we get a huge >>>>>> graph and eventually hit the live node limit assert during code >>>>>> generation when trying to create new nodes - either during >>>>>> PhaseCFG::build_cfg() or later in PhaseCFG::global_code_motion(). >>>>>> >>>>>> We could try to introduce estimates for them to bailout but that >>>>>> appears to be difficult to get right without being too pessimistic >>>>>> about it. But we need to be in order to avoid hitting the assert >>>>>> again by just modifying the testcase. >>>>>> >>>>>> Therefore, my suggestion is to completely skip the assert once the >>>>>> optimization phase is finished as we should not strictly care >>>>>> about the node limit anymore at this point in time and it does not >>>>>> really provide much help for finding bugs. 
>>>>>> >>>>>> A question remains, though, if we should also get rid of the >>>>>> remaining live node limit bailout checks in Compile::Code_Gen() >>>>>> like [1] as it appears to be a waste to go through all the >>>>>> optimization in the optimization phase to then bailout while >>>>>> generating code based only on the live node limit itself. What do >>>>>> you think about that? >>>>>> >>>>>> I also updated "<" into "<=" in the live node limit assert because >>>>>> we should be allowed to reach the limit but not go beyond. >>>>>> >>>>>> Thank you! >>>>>> >>>>>> Best regards, >>>>>> Christian >>>>>> >>>>>> >>>>>> [1] >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/8fd3e34e8379/src/hotspot/share/opto/coalesce.cpp#l236 >>>>>> From tobias.hartmann at oracle.com Tue Jun 30 17:06:12 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 30 Jun 2020 19:06:12 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> <87o8p1ours.fsf@redhat.com> Message-ID: <134e1fc1-8e5c-a1f2-d0ed-50784b807578@oracle.com> Hi Felix, that looks good to me. I'll run some perf and correctness testing and report back once it finished. Best regards, Tobias On 30.06.20 14:35, Yangfei (Felix) wrote: > Hi again, > > Updated webrev: http://cr.openjdk.java.net/~fyang/8243670/webrev.03/ > Tier1-3 tested with fastdebug build both on aarch64-linux-gnu & x86_64-linux-gnu. > Newly added test case fail without the fix and pass otherwise. > Please take another look. 
> > Thanks, > Felix > >> -----Original Message----- >> From: Yangfei (Felix) >> Sent: Tuesday, June 30, 2020 4:00 PM >> To: 'Roland Westrelin' ; Tobias Hartmann >> ; hotspot-compiler-dev at openjdk.java.net >> Cc: guoge (A) ; zhouyong (V) >> >> Subject: RE: RFR(S): 8243670: Unexpected test result caused by C2 >> MergeMemNode::Ideal >> >> Hi, >> >>> -----Original Message----- >>> From: Roland Westrelin [mailto:rwestrel at redhat.com] >>> Sent: Tuesday, June 30, 2020 3:52 PM >>> To: Yangfei (Felix) ; Tobias Hartmann >>> ; hotspot-compiler-dev at openjdk.java.net >>> Cc: guoge (A) ; zhouyong (V) >>> >>> Subject: RE: RFR(S): 8243670: Unexpected test result caused by C2 >>> MergeMemNode::Ideal >>> >>> >>>> I added some extra code to check if we still have a chance to >>>> replace equivalent phis in MergeMemNode::Ideal. >>> >>> Is this a reliable way of finding a missed transformation opportunity? >>> Could it be that MergeMemNode::Ideal() runs before PhiNode::Identity() >>> has had a chance to run? that is the merge mem is first in the IGVN >>> queue >> >> Yes, I have just confirmed that. >> The follow logic in MergeMemNode::Ideal() could catch phis with the same >> input before PhiNode::Identity() had a change to do. >> >> 4608 // replace equivalent phis (unfortunately, they do not GVN together) >> 4609 if (new_mem != NULL && new_mem != new_base && >> 4610 new_mem->req() == phi_len && new_mem->in(0) == phi_reg) { >> 4611 if (new_mem->is_Phi()) { >> 4612 PhiNode* phi_mem = new_mem->as_Phi(); >> 4613 for (uint i = 1; i < phi_len; i++) { >> 4614 if (phi_base->in(i) != phi_mem->in(i)) { >> 4615 phi_mem = NULL; >> 4616 break; >> 4617 } >> 4618 } >> 4619 if (phi_mem != NULL) { >> 4620 // equivalent phi nodes; revert to the def >> 4621 new_mem = new_base; >> 4622 } >> 4623 } >> 4624 } >> >> So I think I will propose a new patch with this logic removed. It should be >> useless with our fix in PhiNode::Identity(). 
>> >>> and the Phi is next so the merge mem would first fail to optimize, >>> then the Phi would optimize, cause the merge mem to be enqueued again >>> and this time be properly optimized. >> > From boris.ulasevich at bell-sw.com Tue Jun 30 18:04:36 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 30 Jun 2020 21:04:36 +0300 Subject: RFR 8248043: Need to eliminate excessive i2l conversions In-Reply-To: <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> Message-ID: <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com> Hi Claes, > Seems like the optimization is mostly effective, but not getting all the way. Good point about LHS, thanks! It turned out that CmpL is not canonicalized at the moment. I moved the optimization to CmpLNode::Ideal and the transformations now work as follows: 1. CmpINode::Ideal: CmpI(CmpL3)->CmpL 2. BoolNode::Ideal: Bool(CmpL(const,val),test)->Bool(CmpL(val,const),test_invert) 3. CmpLNode::Ideal: CmpL(ConvI2L(val),ConL)->CmpI(val,ConI) I applied your test to the benchmark. The result is: Benchmark Mode Cnt Score Error Units SkipIntToLongCast.skipCastTestLeft avgt 5 14.288 ± 0.052 ns/op SkipIntToLongCast.skipCastTestRight avgt 5 14.338 ± 0.088 ns/op Updated webrev: http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b thanks, Boris On 26.06.2020 21:31, Claes Redestad wrote: > Hi Boris, > > this looks like a nice improvement! I just have some comments about the > micro. > > I was curious whether the optimization works when the constant is on > the LHS and added a variant of the micro to try that[1]. Results are > interesting (Intel Xeon): > > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipCastTest avgt 5 30.937 ± 0.056 ns/op > SkipIntToLongCast.skipCastTestLeft avgt 5 30.937 ±
0.140 ns/op > > With your patch: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipCastTest avgt 5 14.123 ± 0.035 ns/op > SkipIntToLongCast.skipCastTestLeft avgt 5 17.420 ± 0.044 ns/op > > Seems like the optimization is mostly effective, but not getting all > the way. I wouldn't worry about it for this RFE, but perhaps something > to investigate in a follow-up. Feel free to include such a variant in > your patch though (no attribution necessary). > > The micro also stabilizes very quickly, so you might want to provide > some default tuning to keep runtime in check, e.g., something like: > > @Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS) > @Measurement(iterations = 5, time = 1000, timeUnit = > TimeUnit.MILLISECONDS) > @Fork(3) > > Thanks! > > /Claes > > [1] > @Benchmark > public int skipCastTestLeft() { > for (int i = 0; i < ARRAYSIZE_L; i++) { > if (ARRAYSIZE_L == intValues[i]) { > return i; > } > } > return 0; > } > > On 2020-06-26 17:05, Boris Ulasevich wrote: >> Hi all, >> >> Please review the change to eliminate the unnecessary i2l conversion >> for expressions like this: "if (intValue == 1L)". >> >> http://bugs.openjdk.java.net/browse/JDK-8248043 >> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.01 >> >> The provided benchmark shows performance boost on all platforms: >> - Intel Xeon: 32.705 --> 14.234 ns/op >> - arm64: 42.060 --> 25.456 ns/op >> - arm32: 618.763 --> 314.040 ns/op >> - ppc8: 81.218 --> 63.026 ns/op >> >> Testing done: jtreg, jck.
>> >> thanks, >> Boris From tom.rodriguez at oracle.com Tue Jun 30 18:05:30 2020 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 30 Jun 2020 11:05:30 -0700 Subject: RFR(M): 8248359: Update JVMCI In-Reply-To: <322BCF85-A5DA-4463-9250-A6744FABC9AF@oracle.com> References: <322BCF85-A5DA-4463-9250-A6744FABC9AF@oracle.com> Message-ID: <17c6acbd-772d-4470-e228-9ba6fec14d83@oracle.com> Looks good. tom Doug Simon wrote on 6/29/20 2:36 AM: > Please review this webrev that ports a number of improvements to JVMCI in JDK 16 from jvmci-8 : > > * Move C++ state and functionality associated with a HotSpotJVMCIRuntime Java object into its peer JVMCIRuntime C++ object (e.g., the _shared_library_javavm moves from being a static field in JVMCIEnv to an instance field in JVMCIRuntime). > * The management of JNI globals handles and Metadata handles passed to JVMCI Java code should also be moved to JVMCIRuntime. > * Introduce tracing of low frequency JVMCIRuntime lifetime events (e.g. JVMCIRuntime lifetime phase events) at less verbose trace level. > * Trace high frequency JVMCI events (e.g. CompilerToVM calls) at more verbose trace level. > * Detect unsupported jvmci.* system properties and use fuzzy matching for an error message suggesting closely matching supported properties. > * Improve javadoc for HotSpotJVMCIRuntime.attachCurrentThread. > * Reduce calls to JavaThread::current() in conjunction with JNIAccessMark. 
> > https://bugs.openjdk.java.net/browse/JDK-8248359 > https://dougxc.github.io/webrevs/8248359_16.01 > > Testing: hs-tier1,hs-tier2,hs-tier3-graal > > -Doug > From sandhya.viswanathan at intel.com Tue Jun 30 18:56:34 2020 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 30 Jun 2020 18:56:34 +0000 Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes In-Reply-To: References: <275eb57c-51c0-675e-c32a-91b198023559@redhat.com> <719F9169-ABC4-408E-B732-F1BD9A84337F@oracle.com> <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com> <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com> <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com> <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com> Message-ID: Hi Yang, I have merged vectorIntrinsics with changes from panama/default. Hope this helps. Best Regards, Sandhya -----Original Message----- From: Yang Zhang Sent: Monday, June 29, 2020 12:49 AM To: Viswanathan, Sandhya ; Andrew Haley ; Paul Sandoz Cc: nd ; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: RE: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes Hi Andrew, 1. Instructions that can be matched with NEON instructions directly. MulVB, SqrtVF and AbsV have been merged into jdk master already. 2. Instructions that jdk master has middle end support for, but they cannot be matched with NEON instructions directly. Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These new instructions can be moved into jdk master first, but for auto-vectorization, the performance might not get improved. 3. Panama/Vector API specific instructions such as Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend etc. These instructions cannot be moved into jdk master first because there isn't middle-end support. 
I will put 2 and 3 in a new ad file aarch64_neon.ad. I will also update aarch64_asmtest.py and macroassembler.cpp. When the patch is ready, I will send it again. Hi Sandhya, Could you please help to manually merge panama vectorIntrinsics/vector-unstable to jdk master, so that I can update this patch based on the latest jdk master? Regards Yang -----Original Message----- From: Viswanathan, Sandhya Sent: Thursday, June 25, 2020 3:04 AM To: Yang Zhang ; Andrew Haley ; Paul Sandoz Cc: nd ; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: RE: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes Hi Andrew/Yang, We couldn't propose Vector API to target in time for JDK 15 and are hoping to do so early in the JDK 16 timeframe. The implementation reviews on other components have made good progress. We have so far ok to PPT from (runtime, shared compiler changes, x86 backend). Java API implementation review is in progress. I wanted to check with you both if we have a go-ahead from the aarch64 backend point of view. Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of Yang Zhang Sent: Tuesday, May 26, 2020 7:59 PM To: Andrew Haley ; Paul Sandoz Cc: nd ; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: RE: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes > But to my earlier question. please: can the new instructions be moved into jdk head first, and then merged into the Panama branch, or not? The new instructions can be classified as: 1. Instructions that can be matched with NEON instructions directly. MulVB and SqrtVF have been merged into jdk master already. The patch of AbsV is in review [1]. 2.
Instructions that Jdk master has middle end support for, but they cannot be matched with NEON instructions directly. Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These new instructions can be moved into jdk master first, but for auto-vectorization, the performance might not get improved. May I have a new patch for these? 3. Panama/Vector API specific instructions Such as Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend etc. These instructions cannot be moved into jdk master first because there isn't middle-end support. Regards Yang [1] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-May/008861.html -----Original Message----- From: Andrew Haley Sent: Tuesday, May 26, 2020 4:25 PM To: Yang Zhang ; Paul Sandoz Cc: hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; nd Subject: Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes On 25/05/2020 09:26, Yang Zhang wrote: > In jdk master, what we need to do is that writing m4 file for existing > vector instructions and placed them to a new file aarch64_neon.ad. > If no question, I will do it right away. I'm not entirely sure that such a change is necessary now. In particular, reorganizing the existing vector instructions is IMO excessive, but I admit that it might be an improvement. But to my earlier question. please: can the new instructions be moved into jdk head first, and then merged into the Panama branch, or not? It'd help if this was possible. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Tue Jun 30 19:01:32 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 30 Jun 2020 19:01:32 +0000 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <20200630001528.GA26652@pacoca> References: <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com>, <20200630001528.GA26652@pacoca> Message-ID: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> Thanks for the much better flag description. Looks good. Best regards, Martin > On 30.06.2020 at 02:15, "joserz at linux.ibm.com" wrote: > > Hello team, > > Here's the 2nd version, implementing the changes Martin asked for. > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > Thank you!! > > Jose > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: >> Hi Jose, >> >> Can you replace the outdated description of PowerArchitecturePPC64 in globals_ppc.hpp by something generic, please? >> >> Please update the Copyright year in vm_version_ppc.hpp. >> >> I can't test the change, but it looks good to me. >> >> Best regards, >> Martin >> >>>> On 26.06.2020 at 20:29, "joserz at linux.ibm.com" wrote: >>> >>> Hello team! >>> >>> This patch introduces Power10 to OpenJDK and implements three new instructions: >>> - brh - byte-reverse halfword >>> - brw - byte-reverse word >>> - brd - byte-reverse doubleword >>> >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 >>> >>> Thanks for your review! >>> >>> Jose R.
Ziviani From vladimir.kozlov at oracle.com Tue Jun 30 21:13:05 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 30 Jun 2020 14:13:05 -0700 Subject: RFR 8248043: Need to eliminate excessive i2l conversions In-Reply-To: <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com> References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com> Message-ID: <13148df7-502b-bd1e-5aa0-fb7a9244cddc@oracle.com> Good optimization. Reviewed. Thanks, Vladimir On 6/30/20 11:04 AM, Boris Ulasevich wrote: > Hi Claes, > > > Seems like the optimization is mostly effective, but not getting all the way. > > Good point about LHS, thanks! CmpL turned out not to be canonicalized at the moment. > I moved the optimization to CmpLNode::Ideal and the transformations now work as follows: > 1. CmpINode::Ideal: CmpI(CmpL3)->CmpL > 2. BoolNode::Ideal: Bool(CmpL(const,val),test)->Bool(CmpL(val,const),test_invert) > 3. CmpLNode::Ideal: CmpL(ConvI2L(val),ConL)->CmpI(val,ConI) > > I applied your test to the benchmark. The result is: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipCastTestLeft avgt 5 14.288 ± 0.052 ns/op > SkipIntToLongCast.skipCastTestRight avgt 5 14.338 ± 0.088 ns/op > > Updated webrev: > http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b > > thanks, > Boris > > On 26.06.2020 21:31, Claes Redestad wrote: >> Hi Boris, >> >> this looks like a nice improvement! I just have some comments about the >> micro. >> >> I was curious whether the optimization works when the constant is on >> the LHS and added a variant of the micro to try that[1]. Results are >> interesting (Intel Xeon): >> >> Benchmark Mode Cnt Score Error Units >> SkipIntToLongCast.skipCastTest avgt 5 30.937 ± 0.056 ns/op >> SkipIntToLongCast.skipCastTestLeft avgt 5 30.937 ±
0.140 ns/op >> >> With your patch: >> Benchmark Mode Cnt Score Error Units >> SkipIntToLongCast.skipCastTest avgt 5 14.123 ± 0.035 ns/op >> SkipIntToLongCast.skipCastTestLeft avgt 5 17.420 ± 0.044 ns/op >> >> Seems like the optimization is mostly effective, but not getting all >> the way. I wouldn't worry about it for this RFE, but perhaps something >> to investigate in a follow-up. Feel free to include such a variant in >> your patch though (no attribution necessary). >> >> The micro also stabilizes very quickly, so you might want to provide >> some default tuning to keep runtime in check, e.g., something like: >> >> @Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS) >> @Measurement(iterations = 5, time = 1000, timeUnit = TimeUnit.MILLISECONDS) >> @Fork(3) >> >> Thanks! >> >> /Claes >> >> [1] >> @Benchmark >> public int skipCastTestLeft() { >> for (int i = 0; i < ARRAYSIZE_L; i++) { >> if (ARRAYSIZE_L == intValues[i]) { >> return i; >> } >> } >> return 0; >> } >> >> On 2020-06-26 17:05, Boris Ulasevich wrote: >>> Hi all, >>> >>> Please review the change to eliminate the unnecessary i2l conversion >>> for expressions like this: "if (intValue == 1L)". >>> >>> http://bugs.openjdk.java.net/browse/JDK-8248043 >>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.01 >>> >>> The provided benchmark shows performance boost on all platforms: >>> - Intel Xeon: 32.705 --> 14.234 ns/op >>> - arm64: 42.060 --> 25.456 ns/op >>> - arm32: 618.763 --> 314.040 ns/op >>> - ppc8: 81.218 --> 63.026 ns/op >>> >>> Testing done: jtreg, jck.
>>> >>> thanks, >>> Boris > From claes.redestad at oracle.com Tue Jun 30 21:28:13 2020 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 30 Jun 2020 23:28:13 +0200 Subject: RFR 8248043: Need to eliminate excessive i2l conversions In-Reply-To: <13148df7-502b-bd1e-5aa0-fb7a9244cddc@oracle.com> References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com> <13148df7-502b-bd1e-5aa0-fb7a9244cddc@oracle.com> Message-ID: <15778367-9a55-8fd2-353b-21927650125d@oracle.com> +1 Maybe add tests for reversed variants to TestSkipLongToIntCast too? No need for a new webrev if you do. /Claes On 2020-06-30 23:13, Vladimir Kozlov wrote: > Good optimization. Reviewed. > > Thanks, > Vladimir > > On 6/30/20 11:04 AM, Boris Ulasevich wrote: >> Hi Claes, >> >> > Seems like the optimization is mostly effective, but not getting >> all the way. >> >> Good point about LHS, thanks! CmpL turned out not to be canonicalized at the >> moment. >> I moved the optimization to CmpLNode::Ideal and the transformations now >> work as follows: >> 1. CmpINode::Ideal: CmpI(CmpL3)->CmpL >> 2. BoolNode::Ideal: >> Bool(CmpL(const,val),test)->Bool(CmpL(val,const),test_invert) >> 3. CmpLNode::Ideal: CmpL(ConvI2L(val),ConL)->CmpI(val,ConI) >> >> I applied your test to the benchmark. The result is: >> Benchmark Mode Cnt Score Error Units >> SkipIntToLongCast.skipCastTestLeft avgt 5 14.288 ± 0.052 ns/op >> SkipIntToLongCast.skipCastTestRight avgt 5 14.338 ± 0.088 ns/op >> >> Updated webrev: >> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b >> >> thanks, >> Boris >> >> On 26.06.2020 21:31, Claes Redestad wrote: >>> Hi Boris, >>> >>> this looks like a nice improvement! I just have some comments about the >>> micro. >>> >>> I was curious whether the optimization works when the constant is on >>> the LHS and added a variant of the micro to try that[1].
Results are >>> interesting (Intel Xeon): >>> >>> Benchmark Mode Cnt Score Error Units >>> SkipIntToLongCast.skipCastTest avgt 5 30.937 ± 0.056 ns/op >>> SkipIntToLongCast.skipCastTestLeft avgt 5 30.937 ± 0.140 ns/op >>> >>> With your patch: >>> Benchmark Mode Cnt Score Error Units >>> SkipIntToLongCast.skipCastTest avgt 5 14.123 ± 0.035 ns/op >>> SkipIntToLongCast.skipCastTestLeft avgt 5 17.420 ± 0.044 ns/op >>> >>> Seems like the optimization is mostly effective, but not getting all >>> the way. I wouldn't worry about it for this RFE, but perhaps something >>> to investigate in a follow-up. Feel free to include such a variant in >>> your patch though (no attribution necessary). >>> >>> The micro also stabilizes very quickly, so you might want to provide >>> some default tuning to keep runtime in check, e.g., something like: >>> >>> @Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS) >>> @Measurement(iterations = 5, time = 1000, timeUnit = >>> TimeUnit.MILLISECONDS) >>> @Fork(3) >>> >>> Thanks! >>> >>> /Claes >>> >>> [1] >>> @Benchmark >>> public int skipCastTestLeft() { >>> for (int i = 0; i < ARRAYSIZE_L; i++) { >>> if (ARRAYSIZE_L == intValues[i]) { >>> return i; >>> } >>> } >>> return 0; >>> } >>> >>> On 2020-06-26 17:05, Boris Ulasevich wrote: >>>> Hi all, >>>> >>>> Please review the change to eliminate the unnecessary i2l conversion >>>> for expressions like this: "if (intValue == 1L)". >>>> >>>> http://bugs.openjdk.java.net/browse/JDK-8248043 >>>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.01 >>>> >>>> The provided benchmark shows performance boost on all platforms: >>>> - Intel Xeon: 32.705 --> 14.234 ns/op >>>> - arm64: 42.060 --> 25.456 ns/op >>>> - arm32: 618.763 --> 314.040 ns/op >>>> - ppc8: 81.218 --> 63.026 ns/op >>>> >>>> Testing done: jtreg, jck.
>>>> >>>> thanks, >>>> Boris >>
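
Editorial note on the 8248043 thread: the rewrite CmpL(ConvI2L(val),ConL)->CmpI(val,ConI) rests on the fact that widening an int to a long preserves its value, so an equality test against a long constant can be decided with int arithmetic alone. The Java sketch below only illustrates that equivalence at the source level; it is not the C2 code from the webrev. The class and method names are hypothetical, and the handling of constants outside the int range is an assumption made for the illustration, not a statement about what the patch does.

```java
public class I2lCompareSketch {
    // The form written in source: the int operand is widened (i2l)
    // and the comparison is done on longs.
    public static boolean compareViaLong(int value, long constant) {
        return (long) value == constant;
    }

    // An int-only equivalent. If the constant does not fit in an int,
    // no int value can be equal to it; otherwise compare as ints.
    public static boolean compareViaInt(int value, long constant) {
        if (constant < Integer.MIN_VALUE || constant > Integer.MAX_VALUE) {
            return false;
        }
        return value == (int) constant;
    }

    public static void main(String[] args) {
        int[] values = {Integer.MIN_VALUE, -1, 0, 1, 42, Integer.MAX_VALUE};
        long[] constants = {
            Long.MIN_VALUE, Integer.MIN_VALUE, -1L, 0L, 1L, 42L,
            Integer.MAX_VALUE, Integer.MAX_VALUE + 1L, Long.MAX_VALUE
        };
        // Check that both forms agree on every pair, including boundary values.
        for (int v : values) {
            for (long c : constants) {
                if (compareViaLong(v, c) != compareViaInt(v, c)) {
                    throw new AssertionError("mismatch for " + v + " and " + c);
                }
            }
        }
        System.out.println("all comparisons agree");
    }
}
```

The sketch covers equality only, matching the "if (intValue == 1L)" case named in the review request; ordered comparisons against out-of-range constants would need separate treatment.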