From cjashfor at linux.ibm.com Tue Sep 1 00:17:08 2020
From: cjashfor at linux.ibm.com (Corey Ashford)
Date: Mon, 31 Aug 2020 17:17:08 -0700
Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding
In-Reply-To:
References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com>
Message-ID: <5ac1ac19-af1c-4ee6-b478-873031710081@linux.ibm.com>

On 8/27/20 8:07 AM, Doerr, Martin wrote:
>>> I will use __attribute__ ((align(16))) instead of __vector, and make
>> them arrays of 16 unsigned char.
> Maybe __vectors works as expected, too, now. Whatever we use, I'd appreciate to double-check the alignment e.g. by using gdb.
> I don't remember what we had tried and why it didn't work as desired.

I just now tried on gcc-7.5.0, declaring a __vector at 1, 2, 3, 8, 9, and 15 byte offsets in a struct, trying to force a misalignment, but the compiler realigned all of them on 16-byte boundaries. If someone decides to make the intrinsic work on AIX (big endian), and compiles with 7.3.1, I don't know what will happen w.r.t. alignment, so to be on the safe side, I will make the declarations 16-byte unsigned char arrays with an align attribute.

Looking a bit deeper, I see that the __vector type comes out of the C preprocessor as: __attribute__((altivec(vector__))). It's part of the compiler's basic set of predefined macros, and can be seen using this command:

% gcc -dM -E - < /dev/null | grep __vector
#define __vector __attribute__((altivec(vector__)))

Some information here: https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Type-Attributes.html

I don't know if this is helpful or not, but it might answer part of your question about the meaning of __vector.
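The alignment experiment described above can be sketched in a few lines of C++. This is only an illustration and not code from the patch: the struct `Tables`, its members, and the helper `is_aligned_16` are hypothetical names. Note that GCC and Clang spell the attribute `aligned`, not `align`. The sketch places a one-byte member in front of a 16-byte `unsigned char` array and lets the compiler confirm at build time that the array still starts on a 16-byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct: a 1-byte member followed by a 16-byte table that
// must start on a 16-byte boundary.
struct Tables {
  unsigned char pad;                                    // would put the next member at offset 1
  unsigned char lut[16] __attribute__((aligned(16)));  // GCC/Clang attribute spelling
};

// Compile-time checks: the member is realigned and the struct inherits
// the stricter alignment.
static_assert(alignof(Tables) >= 16, "struct must be at least 16-byte aligned");
static_assert(offsetof(Tables, lut) % 16 == 0, "lut must start on a 16-byte boundary");

// Run-time check, comparable to inspecting the address in gdb.
bool is_aligned_16(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Asserts that a concrete instance really has an aligned table.
void check_tables(const Tables& t) {
  assert(is_aligned_16(t.lut));
}
```

Calling `check_tables` on a static or stack instance of `Tables` exercises the run-time check. Whether a particular older compiler on AIX behaves the same way is exactly the open question in the message, so the static assertions are the part that would catch a difference at build time.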
Regards,

- Corey

From david.holmes at oracle.com Tue Sep 1 01:18:36 2020
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 1 Sep 2020 11:18:36 +1000
Subject: 8248337: sparc related code clean up after solaris removal
In-Reply-To:
References:
Message-ID:

Hi Yumin,

On 1/09/2020 7:32 am, Yumin Qi wrote:
> HI,
>
>   Please review for
>
>   bug: https://bugs.openjdk.java.net/browse/JDK-8248337
>
>   webrev: http://cr.openjdk.java.net/~minqi/2020/8248337/webrev-01/
>
>
>   Summary: After Solaris supported files removed from repo, there are
> some remnants which needs cleaning up. Some comments are not correct,
> and some refer to wrong files.

Those changes are mostly okay but I have a few minor issues/suggestions below.

> There is a flag seems only useful for
> Sparc: UseRDPCForConstantTableBase, which got removed in this patch .

Despite the description of the flag it is far from clear that the use of the flag affects sparc only. It affects the pinned() function so seems somewhat platform agnostic in that sense - which is why this was not dealt with in the SPARC removal process. I think this needs closer examination by the compiler folk, with a recommendation on whether it can/should be changed or not. Regardless as this is a product flag then I think this change should be factored out and we go through the appropriate deprecate/obsolete/expire process.

> Also in postaloc.cpp, the delay slot seems is only for sparc too, but I
> am not sure about that. Most of the patch are in comment section.

It refers to spill slot not delay slot. I don't see anything obviously sparc specific about that block of code.

Specific comments:

src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp

-// 64 bits items (sparc abi) even though java would only store
+// 64 bits items even though java would only store

Should "(sparc abi)" be replaced with "(Aarch64 abi)" as you did for other platforms?
---

src/hotspot/cpu/arm/frame_arm.hpp (and other files)

   // The interpreter and adapters will extend the frame of the caller.
   // Since oopMaps are based on the sp of the caller before extension
-  // we need to know that value. However in order to compute the address
-  // of the return address we need the real "raw" sp. Since sparc already
-  // uses sp() to mean "raw" sp and unextended_sp() to mean the caller's
-  // original sp we use that convention.
+  // we need to know that value. However in order to compute the return
+  // address we need the real "raw" sp.

I think this is losing too much information as it no longer describes the convention. I would suggest:

   // The interpreter and adapters will extend the frame of the caller.
   // Since oopMaps are based on the sp of the caller before extension
   // we need to know that value. However in order to compute the address
-  // of the return address we need the real "raw" sp. Since sparc already
-  // uses sp() to mean "raw" sp and unextended_sp() to mean the caller's
-  // original sp we use that convention.
+  // of the return address we need the real "raw" sp. By convention we
+  // use sp() to mean "raw" sp and unextended_sp() to mean the caller's
+  // original sp.

---

src/hotspot/cpu/ppc/jniTypes_ppc.hpp

-  // stubGenerator_sparc.cpp) reverse the argument list constructed by
+  // stubGenerator_${CPU}.cpp) reverse the argument list constructed by

Just replace sparc with ppc as done for other platforms.

---

src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp

-  // This greatly simplifies the cases here compared to sparc.
+  // This greatly simplifies the cases here.

Just delete the comment as there is nothing to compare simplicity or complexity against.
---

src/hotspot/share/c1/c1_LIRGenerator.cpp

-  // In 64bit the type can be long, sparc doesn't have this assert
+  // In 64bit the type can be long
   // assert(offset.type()->tag() == intTag, "invalid type");

compiler folk should decide what to do here but I think the comment and commented out assert can just be deleted.

---

src/hotspot/share/c1/c1_Runtime1.cpp

-  case handle_exception_nofpu_id:  // Unused on sparc
+  case handle_exception_nofpu_id:  // unused.

the new comment is incorrect as this case is not unused. I suggest just deleting the comment.

Thanks,
David
-----

>
>
>   Tests passed tier1-4
>
>
>   Thanks
>
>   Yumin
>

From yumin.qi at oracle.com Tue Sep 1 04:56:32 2020
From: yumin.qi at oracle.com (Yumin Qi)
Date: Mon, 31 Aug 2020 21:56:32 -0700
Subject: 8248337: sparc related code clean up after solaris removal
In-Reply-To:
References:
Message-ID:

Hi, David

  Thanks for review. I will wait for compiler folks' comments.

Thanks
Yumin

From eric.c.liu at arm.com Tue Sep 1 07:31:07 2020
From: eric.c.liu at arm.com (Eric Liu)
Date: Tue, 1 Sep 2020 07:31:07 +0000
Subject: RFR(S): 8252407: Build failure with gcc-8+ and asan
Message-ID:

Hi all,

Please review this simple change to fix some compile warnings.

The newer gcc (gcc-8 or higher) would warn for calls to bounded string manipulation functions such as 'strncpy' that may either truncate the copied string or leave the destination unchanged.

This patch fixed stringop-truncation warnings reported by gcc, some of them only appear when compiled with "--enable-asan".

[TESTS]
Jtreg: hotspot::hotspot_all_no_apps, jdk::jdk_core and langtools::tier1. No new failure found.
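For readers unfamiliar with the warning, a typical trigger is a call such as `strncpy(dst, src, sizeof(dst))`, where the bound equals the destination size and the result may be left without a terminating NUL. The sketch below shows one common way to avoid the pattern. It is an assumption about the general technique, not the code in the webrev: the helper name `copy_cstr` and its signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not taken from the webrev): copy src into dst and
// always leave dst NUL-terminated. Returns true if the whole string fit,
// false if it had to be truncated.
bool copy_cstr(char* dst, std::size_t dst_size, const char* src) {
  assert(dst != nullptr && src != nullptr && dst_size > 0);
  const std::size_t len = std::strlen(src);
  // Copy at most dst_size - 1 bytes so one byte remains for the terminator.
  const std::size_t n = (len < dst_size) ? len : dst_size - 1;
  std::memcpy(dst, src, n);
  dst[n] = '\0';  // terminate explicitly instead of relying on strncpy
  return len < dst_size;
}
```

Because the length is computed up front and the terminator is written explicitly, the truncation case is visible to the caller through the return value rather than silently producing an unterminated buffer.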
http://cr.openjdk.java.net/~qfeng/ericliu/jdk/stringop_trunc/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8252407

Thanks,
Eric

From aph at redhat.com Tue Sep 1 07:52:03 2020
From: aph at redhat.com (Andrew Haley)
Date: Tue, 1 Sep 2020 08:52:03 +0100
Subject: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic
In-Reply-To:
References: <1729f1b1-056d-76c9-c820-d38bd6c1235d@redhat.com>
Message-ID:

On 31/08/2020 10:46, Yangfei (Felix) wrote:
>
>> -----Original Message-----
>> From: Andrew Haley [mailto:aph at redhat.com]
>> Sent: Monday, August 31, 2020 4:41 PM
>> On 31/08/2020 07:50, Yangfei (Felix) wrote:
>>>
>>
>> This looks like a direct copy of the sha3-cecore.S file. You'll need Linaro to
>> contribute it. I don't imagine they'll have any problem with that: they are
>> OCA signatories
>
>> Also, given that we've got the assembly source file, why not just copy that
>> into OpenJDK? I can't see the point rewriting it into the HotSpot assembler.
>
> Actually, we referenced the existing intrinsics implementation and
> took a similar way. It looks strange to have one intrinsic that goes
> differently. And we won't be able to emit this code on demand if we
> go that different way. Some cpu does not support these special sha3
> instructions and thus does not need this code at all. I think that's
> one advantage of using a stub.

OK. But you'll still need Linaro to contribute it to OpenJDK. We could ask Stuart to help with that.

-- 
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From felix.yang at huawei.com Tue Sep 1 10:53:12 2020
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Tue, 1 Sep 2020 10:53:12 +0000
Subject: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic
In-Reply-To:
References: <1729f1b1-056d-76c9-c820-d38bd6c1235d@redhat.com>
Message-ID:

Hi,

> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Tuesday, September 1, 2020 3:52 PM
> To: Yangfei (Felix) ; hotspot-compiler-dev at openjdk.java.net;
> core-libs-dev at openjdk.java.net
> Cc: aarch64-port-dev at openjdk.java.net; Stuart Monteith
> Subject: Re: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3
> accelerator/intrinsic
>
> On 31/08/2020 10:46, Yangfei (Felix) wrote:
> >
> >> -----Original Message-----
> >> From: Andrew Haley [mailto:aph at redhat.com]
> >> Sent: Monday, August 31, 2020 4:41 PM On 31/08/2020 07:50, Yangfei
> >> (Felix) wrote:
> >>>
> >>
> >> This looks like a direct copy of the sha3-cecore.S file. You'll need
> >> Linaro to contribute it. I don't imagine they'll have any problem
> >> with that: they are OCA signatories
> >
> >> Also, given that we've got the assembly source file, why not just
> >> copy that into OpenJDK? I can't see the point rewriting it into the HotSpot
> assembler.
> >
> > Actually, we referenced the existing intrinsics implementation and
> > took a similar way. It looks strange to have one intrinsic that goes
> > differently. And we won't be able to emit this code on demand if we
> > go that different way. Some cpu does not support these special sha3
> > instructions and thus does not need this code at all. I think that's one
> > advantage of using a stub.
>
> OK. But you'll still need Linaro to contribute it to OpenJDK. We could ask
> Stuart to help with that.

Sure, I am happy if the original author of the assembly code or someone else from Linaro could help here.
I wasn't aware there was such a requirement here given that assembly code is licensed under GPL.
Should we separate the patch into two parts: changes for the shared code part and the aarch64 port-specific changes?

Thanks,
Felix

From adinn at redhat.com Tue Sep 1 10:58:47 2020
From: adinn at redhat.com (Andrew Dinn)
Date: Tue, 1 Sep 2020 11:58:47 +0100
Subject: RFR(S) 8252311: AArch64: save two words in itable lookup stub
In-Reply-To:
References:
Message-ID: <6f10e948-2c8c-5694-1d4c-d6e3cafc6de7@redhat.com>

On 30/08/2020 18:18, Boris Ulasevich wrote:
> The interface method lookup stub becomes hot when interface calls
> are performed frequently. The stub assembly code can be made
> shorter (132->124 bytes) by using a pre-increment instruction variant.
>
> http://cr.openjdk.java.net/~bulasevich/8252311/webrev.00
> http://bugs.openjdk.java.net/browse/JDK-8252311
>
> The benchmark [1] shows [2] performance and icache loads improvement:
> performance: 6165206 -> 6307798 ops/s
> L1-icache-loads: 307.271 -> 274.604

You really need to be more careful when making claims about improvements. The performance figures are not convincing when you look at the associated error ranges. This could just be noise. The icache load figure is a tad more respectable but not greatly so.

That said, this change certainly looks harmless and may well do good because it clearly cuts down code size and executed instruction count. There is no great need for performance figures (real or spurious) to justify a change that is this straightforward.

I'm happy to approve it but perhaps Andrew Haley would like to comment.

regards,

Andrew Dinn
-----------
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No.
03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

From jamsheed.c.m at oracle.com Tue Sep 1 12:36:17 2020
From: jamsheed.c.m at oracle.com (Jamsheed C M)
Date: Tue, 1 Sep 2020 18:06:17 +0530
Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions
In-Reply-To: <03df9364-817d-04d6-6434-80be93a66526@oracle.com>
References: <442caa21-ca0a-f6eb-60a5-1e74bf994894@oracle.com> <03df9364-817d-04d6-6434-80be93a66526@oracle.com>
Message-ID:

Hi David,

I reworked the patch, revised webrev here: http://cr.openjdk.java.net/~jcm/8249451/webrev.01/

In addition I moved UnlockFlagSaver fs(this) to more local scope.

also removed changes done for JDK-8246727, as it will be separately handled by the bug.

Testing: injected and tested async exceptions randomly at compilation request path and deopt path.

Best regards,

Jamsheed

On 24/08/2020 11:06, Jamsheed C M wrote:
> Hi David,
>
> Thank you for the review and feedback. Agree on all of them. I will
> rework and get back.
>
> On 10/08/2020 07:33, David Holmes wrote:
>> Hi Jamsheed,
>>
>> On 6/08/2020 10:07 pm, Jamsheed C M wrote:
>>> Hi all,
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249451
>>>
>>> webrev: http://cr.openjdk.java.net/~jcm/8249451/webrev.00/
>>
>> Thanks for tackling this messy issue. Overall I like the use of TRAPS
>> to more clearly document which methods can return with an exception
>> pending. I think there are some problems with the proposed changes.
>> I'll start with those comments and then move on to more general
>> comments.
>>
>> src/hotspot/share/utilities/exceptions.cpp
>> src/hotspot/share/utilities/exceptions.hpp
>>
>> I don't think the changes here are correct or safe in general.
>>
>> First, adding the new macro and function to only clear non-async
>> exceptions is fine itself.
>> But naming wise the fact only non-async
>> exceptions are cleared should be evident, and there is no "check"
>> involved (in the sense of the existing CHECK_ macros) so I suggest:
>>
>> s/CHECK_CLEAR_PENDING_EXCEPTION/CLEAR_PENDING_NONASYNC_EXCEPTIONS/
>> s/check_clear_pending_exception/clear_pending_nonasync_exceptions/
>>
> Ok
>> But changing the existing CHECK_AND_CLEAR macros to now leave async
>> exceptions pending seems potentially dangerous as calling code may
>> not be prepared for there to now be a pending exception. For example
>> the use in thread.cpp:
>>
>>  JDK_Version::set_runtime_name(get_java_runtime_name(THREAD));
>>  JDK_Version::set_runtime_version(get_java_runtime_version(THREAD));
>>
>> get_java_runtime_name() is currently guaranteed to clear all
>> exceptions, so all the other code is known to be safe to call. But
>> that would no longer be true. That said, this is VM initialization
>> code and an async exception is impossible at this stage.
>>
>> I think I would rather see CHECK_AND_CLEAR left as-is, and an actual
>> CHECK_AND_CLEAR_NONASYNC introduced for those users of
>> CHECK_AND_CLEAR that can encounter async exceptions and which should
>> not clear them.
>>
>> +   if (!_pending_exception->is_a(SystemDictionary::ThreadDeath_klass()) &&
>> +       _pending_exception->klass() != SystemDictionary::InternalError_klass()) {
>>
> Ok
>> Flagging all InternalErrors as async exceptions is probably also not
>> correct. I don't see a good solution to this at the moment. I think
>> we would need to introduce a new subclass of InternalError for the
>> unsafe access error case**. Now it may be that all the other
>> InternalError usages are "impossible" in the context of where the new
>> macros are to be used, but that is very difficult to establish or
>> assert.
>>
>> ** Or perhaps we could inject a field that allows the VM to identify
>> instances related to unsafe access errors ...
>> Ideally of course these
>> unsafe access errors would be distinct from the async exception
>> mechanism - something I would still like to pursue.
>>
> Ok
>> ---
>>
>> General comments ...
>>
>> There is a general change from "JavaThread* thread" to "Thread*
>> THREAD" (or TRAPS) to allow the use of the CHECK macros. This is
>> unfortunate because the fact the thread is restricted to being a
>> JavaThread is no longer evident in the method signatures. That is a
>> flaw with the TRAPS/CHECK mechanism unfortunately :( . But as the
>> methods no longer take a JavaThread* arg, they should assert that
>> THREAD->is_Java_thread(). I will also look at an RFE to have
>> as_JavaThread() to avoid the need for separate assertion checks
>> before casting from "Thread*" to "JavaThread*".
>>
> Ok
>> Note there's no need to use CHECK when the enclosing method is going
>> to return immediately after the call that contains the CHECK. It just
>> adds unnecessary checking of the exception state. The use of TRAPS
>> shows that the methods may return with an exception pending. I've
>> flagged all such occurrences I spotted below.
>>
> Ok
>> ---
>>
>> +   // Only metaspace OOM is expected. no Java code executed.
>>
>> Nit: s/no/No
>>
>>
>> src/hotspot/share/compiler/compilationPolicy.cpp
>>
>>
>>  410       method_invocation_event(method, CHECK_NULL);
>>  489       CompileBroker::compile_method(m, InvocationEntryBci, comp_level, m, hot_count, CompileTask::Reason_InvocationCount, CHECK);
>>
>> Nit: there's no need to use CHECK here.
>>
>> ---
>>
>> src/hotspot/share/compiler/tieredThresholdPolicy.cpp
>>
>>  504     method_invocation_event(method, inlinee, comp_level, nm, CHECK_NULL);
>>  570         compile(mh, bci, CompLevel_simple, CHECK);
>>  581         compile(mh, bci, CompLevel_simple, CHECK);
>>  595     CompileBroker::compile_method(mh, bci, level, mh, hot_count, CompileTask::Reason_Tiered, CHECK);
>> 1062       compile(mh, InvocationEntryBci, next_level, CHECK);
>>
>> Nit: there's no need to use CHECK here.
>>
>> 814 void TieredThresholdPolicy::create_mdo(const methodHandle& mh, Thread* THREAD) {
>>
>> Thank you for correcting this misuse of the THREAD name on a
>> JavaThread* type.
>>
>> ---
>>
>> src/hotspot/share/interpreter/linkResolver.cpp
>>
>>  128   CompilationPolicy::compile_if_required(selected_method, CHECK);
>>
>> Nit: there's no need to use CHECK here.
>>
>> ---
>>
>> src/hotspot/share/jvmci/compilerRuntime.cpp
>>
>>  260     CompilationPolicy::policy()->event(emh, mh, InvocationEntryBci, InvocationEntryBci, CompLevel_aot, cm, CHECK);
>>  280     nmethod* osr_nm = CompilationPolicy::policy()->event(emh, mh, branch_bci, target_bci, CompLevel_aot, cm, CHECK);
>>
>> Nit: there's no need to use CHECK here.
>>
>> ---
>>
>> src/hotspot/share/jvmci/jvmciRuntime.cpp
>>
>>  102         // Donot clear probable async exceptions.
>>
>> typo: s/Donot/Do not/
>>
>> ---
>>
>> src/hotspot/share/runtime/deoptimization.cpp
>>
>> 1686 void Deoptimization::load_class_by_index(const constantPoolHandle& constant_pool, int index) {
>>
>> This method should be declared with TRAPS now.
>>
>> 1693     // Donot clear probable Async Exceptions.
>>
>> typo: s/Donot/Do not/
>>
>>
> Ok
>>> testing : mach1-5(links in jbs)
>>
>> There is very little existing testing that will actually test the key
>> changes you have made here. You will need to do direct
>> fault-injection testing anywhere you now allow async exceptions to
>> remain, to see if the calling code can tolerate that. It will be
>> difficult to test thoroughly.
>>
> Ok
>> Thanks again for tackling this difficult problem!
>
> Best regards,
>
> Jamsheed
>
>>
>> David
>> -----
>>
>>>
>>> While working on JDK-8246381 it was noticed that compilation request
>>> path clears all exceptions(including async) and doesn't propagate[1].
>>>
>>> Fix: patch restores the propagation behavior for the probable async
>>> exceptions.
>>>
>>> Compilation request path propagate exception as in [2]. MDO and
>>> MethodCounter doesn't expect any exception other than metaspace
>>> OOM(added comments).
>>>
>>> Deoptimization path doesn't clear probable async exceptions and take
>>> unpack_exception path for non uncommontraps.
>>>
>>> Added java_lang_InternalError to well known classes.
>>>
>>> Request for review.
>>>
>>> Best Regards,
>>>
>>> Jamsheed
>>>
>>> [1] w.r.t changes done for JDK-7131259
>>>
>>> [2]
>>>
>>>      (a)
>>>      -----> c1_Runtime1.cpp/interpreterRuntime.cpp/compilerRuntime.cpp
>>>        |
>>>         ----- compilationPolicy.cpp/tieredThresholdPolicy.cpp
>>>           |
>>>            ------ compileBroker.cpp
>>>
>>>      (b)
>>>      Xcomp versions
>>>      ------> compilationPolicy.cpp
>>>         |
>>>          ------> compileBroker.cpp
>>>
>>>      (c)
>>>
>>>      Direct call to  compile_method in compileBroker.cpp
>>>
>>>      JVMCI bootstrap, whitebox, replayCompile.
>>>
>>>

From aph at redhat.com Tue Sep 1 14:44:07 2020
From: aph at redhat.com (Andrew Haley)
Date: Tue, 1 Sep 2020 15:44:07 +0100
Subject: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic
In-Reply-To:
References: <1729f1b1-056d-76c9-c820-d38bd6c1235d@redhat.com>
Message-ID: <95965aeb-9d97-3b27-e684-967b6155eb34@redhat.com>

On 01/09/2020 11:53, Yangfei (Felix) wrote:
> Sure, I am happy if the original author of the assembly code or someone else from Linaro could help here.
> I wasn't aware there was such a requirement here given that assembly code is licensed under GPL.

There sure is. All code must be contributed by its owner and put on the cr.openjdk site. Especially GPL code.

> Should we separate the patch into two parts: changes for the shared code part and the aarch64 port-specific changes?

I think not.

-- 
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From filipp.zhinkin at gmail.com Tue Sep 1 15:29:00 2020
From: filipp.zhinkin at gmail.com (Filipp Zhinkin)
Date: Tue, 1 Sep 2020 18:29:00 +0300
Subject: RFR: 8251152: ARM32: jtreg c2 Test8202414 test crash
Message-ID:

Hi,

Test8202414 crashes on ARM32 while writing to memory using an unaligned address.
ARM32 supports unaligned memory accesses for some load/store instructions under certain conditions, but LDRD (which is used when we're calling Unsafe::putLong) always causes an alignment fault when called with an unaligned address [1].

The fix is simply skipping the test execution if a platform does not support unaligned memory accesses.

Bug: https://bugs.openjdk.java.net/browse/JDK-8251152
Webrev: http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.0/

[1] ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition, §A3.2.1 Unaligned data access https://developer.arm.com/documentation/ddi0406/cd

Thanks,
Filipp.

From aph at redhat.com Tue Sep 1 15:48:02 2020
From: aph at redhat.com (Andrew Haley)
Date: Tue, 1 Sep 2020 16:48:02 +0100
Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp)
In-Reply-To: <0cca5c0c-9240-3a9f-98f0-519384ea69cb@bell-sw.com>
References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> <0cca5c0c-9240-3a9f-98f0-519384ea69cb@bell-sw.com>
Message-ID: <11530b87-8124-19ca-936b-16dec5994411@redhat.com>

On 31/08/2020 15:28, Dmitry Chuyko wrote:
> Here is another version of intrinsics. It is an extension of webrev.03.
> Additional thing is that constants 0 and 1 that are used internally by
> intrinsics are constructed as nodes. This is somehow similar to what is
> done for passing pointers to tables.
>
> webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.04/
> results:
> http://cr.openjdk.java.net/~dchuyko/8251525/webrev.04/benchmarks/signum-facgt_ir-copysign.ods

Hi,

Thank you. That certainly looks better. It's unfortunate that signum doesn't help in all cases, but I'm happy that we have something positive in general. Certainly the code looks nice.

I'm still rather baffled that an intrinsification of copySign actually makes things much worse on blackhole on Neoverse N1, but it doesn't really matter because the copySign intrinsic isn't enabled by default.

So please go ahead with this version.

Having said all of that, it's a fairly minor improvement for some considerable complexity. And it depends terribly on the microarchitecture of a particular part, albeit an important one.

-- 
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From igor.ignatyev at oracle.com Tue Sep 1 16:46:36 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 1 Sep 2020 09:46:36 -0700
Subject: RFR: 8251152: ARM32: jtreg c2 Test8202414 test crash
In-Reply-To:
References:
Message-ID:

Hi Filipp,

1st of all, welcome back!

it would be better to throw jtreg.SkippedException at L#46 so jtreg will report the test as skipped (as opposed to just passed). Alternatively, you could use '@requires vm.simpleArch != "arm"' to exclude the test from arm32 execution.

Thanks,
-- Igor

From vladimir.kozlov at oracle.com Tue Sep 1 17:35:13 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 1 Sep 2020 10:35:13 -0700
Subject: 8248337: sparc related code clean up after solaris removal
In-Reply-To:
References:
Message-ID: <8476b337-f034-19ce-e502-313da7d048c7@oracle.com>

On 8/31/20 6:18 PM, David Holmes wrote:
> Hi Yumin,
>
> On 1/09/2020 7:32 am, Yumin Qi wrote:
>> HI,
>>
>>    Please review for
>>
>>    bug: https://bugs.openjdk.java.net/browse/JDK-8248337
>>
>>    webrev: http://cr.openjdk.java.net/~minqi/2020/8248337/webrev-01/
>>
>>
>>    Summary: After Solaris supported files removed from repo, there are some remnants which needs cleaning up. Some
>> comments are not correct, and some refer to wrong files.
>
> Those changes are mostly okay but I have a few minor issues/suggestions below.
>
>> There is a flag seems only useful for Sparc: UseRDPCForConstantTableBase, which got removed in this patch .
>
> Despite the description of the flag it is far from clear that the use of the flag affects sparc only. It affects the
> pinned() function so seems somewhat platform agnostic in that sense - which is why this was not dealt with in the SPARC
> removal process. I think this needs closer examination by the compiler folk, with a recommendation on whether it
> can/should be changed or not. Regardless as this is a product flag then I think this change should be factored out and
> we go through the appropriate deprecate/obsolete/expire process.

The flag was used to use special SPARC instruction for CPUs supporting it to load base of Constant table.
It is useless for other platforms.
MachConstantBaseNode::pinned() method can be removed because it inherits the method from Node::pinned() which returns 'false' too.

And I agree with David that it should be done separately because it is product flag.

>
>> Also in postaloc.cpp, the delay slot seems is only for sparc too, but I am not sure about that. Most of the patch are
>> in comment section.
>
> It refers to spill slot not delay slot. I don't see anything obviously sparc specific about that block of code.

Please, leave the code as it is. As David said it is about normal spill slots for all platforms.
I am not sure it is SPARC specific currently with all platforms OpenJDK supports.
If you want you can file RFE to replace code with assert and ask community to run a lot of testing to see if we hit the assert.

>
> Specific comments:
>
> src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp
>
> -// 64 bits items (sparc abi) even though java would only store
> +// 64 bits items even though java would only store
>
> Should "(sparc abi)" be replaced with "(Aarch64 abi)" as you did for other platforms?
>
> ---
>
> src/hotspot/cpu/arm/frame_arm.hpp (and other files)
>
>    // The interpreter and adapters will extend the frame of the caller.
>    // Since oopMaps are based on the sp of the caller before extension
> -  // we need to know that value. However in order to compute the address
> -  // of the return address we need the real "raw" sp. Since sparc already
> -  // uses sp() to mean "raw" sp and unextended_sp() to mean the caller's
> -  // original sp we use that convention.
> +  // we need to know that value. However in order to compute the return
> +  // address we need the real "raw" sp.
>
> I think this is losing too much information as it no longer describes the convention. I would suggest:
>
>    // The interpreter and adapters will extend the frame of the caller.
>    // Since oopMaps are based on the sp of the caller before extension
>    // we need to know that value. However in order to compute the address
> -  // of the return address we need the real "raw" sp. Since sparc already
> -  // uses sp() to mean "raw" sp and unextended_sp() to mean the caller's
> -  // original sp we use that convention.
> +  // of the return address we need the real "raw" sp. By convention we
> +  // use sp() to mean "raw" sp and unextended_sp() to mean the caller's
> +  // original sp.
>
> ---
>
> src/hotspot/cpu/ppc/jniTypes_ppc.hpp
>
> -  // stubGenerator_sparc.cpp) reverse the argument list constructed by
> +  // stubGenerator_${CPU}.cpp) reverse the argument list constructed by
>
> Just replace sparc with ppc as done for other platforms.
>
> ---
>
> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
>
> -  // This greatly simplifies the cases here compared to sparc.
> +  // This greatly simplifies the cases here.
>
> Just delete the comment as there is nothing to compare simplicity or complexity against.
>
> ---
>
> src/hotspot/share/c1/c1_LIRGenerator.cpp
>
> -  // In 64bit the type can be long, sparc doesn't have this assert
> +  // In 64bit the type can be long
>    // assert(offset.type()->tag() == intTag, "invalid type");
>
> compiler folk should decide what to do here but I think the comment and commented out assert can just be deleted.

Yes, remove commented assert too. Originally it was platform specific code - the assert was there for 32-bit.

Thanks,
Vladimir K

>
> ---
>
> src/hotspot/share/c1/c1_Runtime1.cpp
>
> -  case handle_exception_nofpu_id:  // Unused on sparc
> +  case handle_exception_nofpu_id:  // unused.
>
> the new comment is incorrect as this case is not unused. I suggest just deleting the comment.
>
> Thanks,
> David
> -----
>
>>
>>
>>    Tests passed tier1-4
>>
>>
>>    Thanks
>>
Yumin >> From yumin.qi at oracle.com Tue Sep 1 17:59:07 2020 From: yumin.qi at oracle.com (Yumin Qi) Date: Tue, 1 Sep 2020 10:59:07 -0700 Subject: 8248337: sparc related code clean up after solaris removal In-Reply-To: <8476b337-f034-19ce-e502-313da7d048c7@oracle.com> References: <8476b337-f034-19ce-e502-313da7d048c7@oracle.com> Message-ID: <7ee63f44-18e3-9567-a098-857c98638bdc@oracle.com> HI, Vladimir ? Thanks for review! On 9/1/20 10:35 AM, Vladimir Kozlov wrote: > On 8/31/20 6:18 PM, David Holmes wrote: >> Hi Yumin, >> >> On 1/09/2020 7:32 am, Yumin Qi wrote: >>> HI, >>> >>> ?? Please review for >>> >>> ?? bug: https://bugs.openjdk.java.net/browse/JDK-8248337 >>> >>> webrev:http://cr.openjdk.java.net/~minqi/2020/8248337/webrev-01/ >>> >>> >>> ?? Summary: After Solaris supported files removed from repo, there are some remnants which needs cleaning up. Some comments are not correct, and some refer to wrong files. >> >> Those changes are mostly okay but I have a few minor issues/suggestions below. >> >>> There is a flag seems only useful for Sparc: UseRDPCForConstantTableBase, which got removed in this patch . >> >> Despite the description of the flag it is far from clear that the use of the flag affects sparc only. It affects the pinned() function so seems somewhat platform agnostic in that sense - which is why this was not dealt with in the SPARC removal process. I think this needs closer examination by the compiler folk, with a recommendation on whether it can/should be changed or not. Regardless as this is a product flag then I think this change should be factored out and we go through the appropriate deprecate/obsolete/expire process. > > The flag was used to use special SPARC instruction for CPUs supporting it to load base of Constant table. > It is useless for other platforms. MachConstantBaseNode::pinned() method can be removed because it inherits the method from Node::pinned() which returns 'false' too. 
>
> And I agree with David that it should be done separately because it is product flag.
>
I will file a bug for this be handled separately.

>>
>>> Also in postaloc.cpp, the delay slot seems is only for sparc too, but I am not sure about that. Most of the patch are in comment section.
>>
>> It refers to spill slot not delay slot. I don't see anything obviously sparc specific about that block of code.
>
> Please, leave the code as it is. As David said it is about normal spill slots for all platforms.
> I am not sure it is SPARC specific currently with all platforms OpenJDK supports.
> If you want you can file RFE to replace code with assert and ask community to run a lot of testing to see if we hit the assert.
>
OK, I will keep it as it was. I will file a RFE and assign it to the right group (compiler) for further investigation.

>>
>> Specific comments:
>>
>> src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp
>>
>> -// 64 bits items (sparc abi) even though java would only store
>> +// 64 bits items even though java would only store
>>
>> Should "(sparc abi)" be replaced with "(Aarch64 abi)" as you did for other platforms?
>>
>> ---
>>
>> src/hotspot/cpu/arm/frame_arm.hpp (and other files)
>>
>>    // The interpreter and adapters will extend the frame of the caller.
>>    // Since oopMaps are based on the sp of the caller before extension
>> -  // we need to know that value. However in order to compute the address
>> -  // of the return address we need the real "raw" sp. Since sparc already
>> -  // uses sp() to mean "raw" sp and unextended_sp() to mean the caller's
>> -  // original sp we use that convention.
>> +  // we need to know that value. However in order to compute the return
>> +  // address we need the real "raw" sp.
>>
>> I think this is losing too much information as it no longer describes the convention. I would suggest:
>>
>>    // The interpreter and adapters will extend the frame of the caller.
>>    // Since oopMaps are based on the sp of the caller before extension
>>    // we need to know that value. However in order to compute the address
>> -  // of the return address we need the real "raw" sp. Since sparc already
>> -  // uses sp() to mean "raw" sp and unextended_sp() to mean the caller's
>> -  // original sp we use that convention.
>> +  // of the return address we need the real "raw" sp. By convention we
>> +  // use sp() to mean "raw" sp and unextended_sp() to mean the caller's
>> +  // original sp.
>>
>> ---
>>
>> src/hotspot/cpu/ppc/jniTypes_ppc.hpp
>>
>> -  // stubGenerator_sparc.cpp) reverse the argument list constructed by
>> +  // stubGenerator_${CPU}.cpp) reverse the argument list constructed by
>>
>> Just replace sparc with ppc as done for other platforms.
>>
>> ---
>>
>> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
>>
>> -  // This greatly simplifies the cases here compared to sparc.
>> +  // This greatly simplifies the cases here.
>>
>> Just delete the comment as there is nothing to compare simplicity or complexity against.
>>
>> ---
>>
>> src/hotspot/share/c1/c1_LIRGenerator.cpp
>>
>> -  // In 64bit the type can be long, sparc doesn't have this assert
>> +  // In 64bit the type can be long
>>    // assert(offset.type()->tag() == intTag, "invalid type");
>>
>> compiler folk should decide what to do here but I think the comment and commented out assert can just be deleted.
>
> Yes, remove commented assert too. Originally it was platform specific code - the assert was there for 32-bit.
>
OK, will remove them all.

Thanks
Yumin

> Thanks,
> Vladimir K
>
>>
>> ---
>>
>> src/hotspot/share/c1/c1_Runtime1.cpp
>>
>> -  case handle_exception_nofpu_id:  // Unused on sparc
>> +  case handle_exception_nofpu_id:  // unused.
>>
>> the new comment is incorrect as this case is not unused. I suggest just deleting the comment.
>>
>> Thanks,
>> David
>> -----
>>
>>>
>>>
>>>    Tests passed tier1-4
>>>
>>>
>>>    Thanks
>>>
>>>    Yumin
>>>

From filipp.zhinkin at gmail.com Tue Sep 1 19:48:06 2020
From: filipp.zhinkin at gmail.com (Filipp Zhinkin)
Date: Tue, 1 Sep 2020 22:48:06 +0300
Subject: RFR: 8251152: ARM32: jtreg c2 Test8202414 test crash
In-Reply-To:
References:
Message-ID:

Hi Igor,

On Tue, 1 Sep 2020 at 19:46, Igor Ignatyev wrote:

> Hi Filipp,
>
> 1st of all, welcome back!
>
thanks!

>
> it would be better to throw jtreg.SkippedException at L#46 so jtreg will
> reported the test as skipped (as opposed to just passed).

Thanks, I'll update Test8202414 as well as compiler/unsafe/JdkInternalMiscUnsafeUnalignedAccess (which also skips execution the same way).

> alternative, you could use '@requires vm.simpleArch != "arm"' to exclude
> the test from arm32 execution.
>
I was thinking about adding something like vm.unalignedAccess.enabled, but it seems to be too complicated solution for two tests (Test8202414 and compiler/unsafe/JdkInternalMiscUnsafeUnalignedAccess).
I don't want to use 'simpleArch', because if some new platform missing unaligned access support will be added in the future then someone will have to spend time to find out why a test crashes.

Thanks,
Filipp.

>
> Thanks,
> -- Igor
>
>
> > On Sep 1, 2020, at 8:29 AM, Filipp Zhinkin wrote:
> >
> > Hi,
> >
> > Test8202414 crashes on ARM32 while writing to memory using an unaligned
> > address.
> > ARM32 supports unaligned memory accesses for some load/store instructions
> > under certain conditions, but LDRD (which is used when we're calling
> > Unsafe::putLong) is always causing alignment fault when called with an
> > unaligned address [1].
> >
> > The fix is simply skipping the test execution if a platform does not
> > support unaligned memory accesses.
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8251152
> > Webrev: http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.0/
> >
> > [1] ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition, §A3.2.1
> > Unaligned data access https://developer.arm.com/documentation/ddi0406/cd
> >
> > Thanks,
> > Filipp.
> >

From yumin.qi at oracle.com Tue Sep 1 22:40:13 2020
From: yumin.qi at oracle.com (Yumin Qi)
Date: Tue, 1 Sep 2020 15:40:13 -0700
Subject: 8248337: sparc related code clean up after solaris removal
In-Reply-To: <8476b337-f034-19ce-e502-313da7d048c7@oracle.com>
References: <8476b337-f034-19ce-e502-313da7d048c7@oracle.com>
Message-ID: <0e96ff53-2853-960f-3140-87d774345c05@oracle.com>

HI, Vladimir and David

   I have updated new webrev at: http://cr.openjdk.java.net/~minqi/2020/8248337/webrev-02/

   Filed two issues to address your concern separately:

   1) 8252681: Retire flag UseRDPCForConstantTableBase after solaris removal
      https://bugs.openjdk.java.net/browse/JDK-8252681

   2) 8252682: investigate PhaseChaitin::post_allocate_copy_removal after solaris removal
      https://bugs.openjdk.java.net/browse/JDK-8252682

   So following three files leave no change:

   share/opto/c2_globals.hpp
   share/opto/machnode.hpp
   share/opto/postaloc.cpp

   Also update some copyright year for several files.

Thanks
Yumin

From vladimir.kozlov at oracle.com Tue Sep 1 23:48:58 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 1 Sep 2020 16:48:58 -0700
Subject: 8248337: sparc related code clean up after solaris removal
In-Reply-To: <0e96ff53-2853-960f-3140-87d774345c05@oracle.com>
References: <8476b337-f034-19ce-e502-313da7d048c7@oracle.com> <0e96ff53-2853-960f-3140-87d774345c05@oracle.com>
Message-ID:

Looks good.

Thanks,
Vladimir K

On 9/1/20 3:40 PM, Yumin Qi wrote:
> HI, Vladimir and David
>
>    I have updated new webrev at: http://cr.openjdk.java.net/~minqi/2020/8248337/webrev-02/
>
>    Filed two issues to address your concern separately:
>
>    1) 8252681: Retire flag UseRDPCForConstantTableBase after solaris removal
>
> https://bugs.openjdk.java.net/browse/JDK-8252681
>
>    2) 8252682: investigate PhaseChaitin::post_allocate_copy_removal after solaris removal
>
> https://bugs.openjdk.java.net/browse/JDK-8252682
>
>
>    So following three files leave no change:
>
> share/opto/c2_globals.hpp
>
> share/opto/machnode.hpp
>
> share/opto/postaloc.cpp
>
>    Also update some copyright year for several files.
>
>
> Thanks
>
> Yumin

From yumin.qi at oracle.com Wed Sep 2 03:12:45 2020
From: yumin.qi at oracle.com (Yumin Qi)
Date: Tue, 1 Sep 2020 20:12:45 -0700
Subject: 8248337: sparc related code clean up after solaris removal
In-Reply-To:
References: <8476b337-f034-19ce-e502-313da7d048c7@oracle.com> <0e96ff53-2853-960f-3140-87d774345c05@oracle.com>
Message-ID:

Hi, Vladimir

  Thanks for re-review!

Yumin

On 9/1/20 4:48 PM, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir K

From david.holmes at oracle.com Wed Sep 2 04:07:17 2020
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 2 Sep 2020 14:07:17 +1000
Subject: 8248337: sparc related code clean up after solaris removal
In-Reply-To: <0e96ff53-2853-960f-3140-87d774345c05@oracle.com>
References: <8476b337-f034-19ce-e502-313da7d048c7@oracle.com> <0e96ff53-2853-960f-3140-87d774345c05@oracle.com>
Message-ID: <7858cdf4-25cd-4e14-ec60-b3c256f59c9d@oracle.com>

Hi Yumin,

Update looks good. Thanks for filing the other RFEs.

David

On 2/09/2020 8:40 am, Yumin Qi wrote:
> HI, Vladimir and David
>
>    I have updated new webrev at:
> http://cr.openjdk.java.net/~minqi/2020/8248337/webrev-02/
>
>    Filed two issues to address your concern separately:
>
>    1) 8252681: Retire flag UseRDPCForConstantTableBase after solaris removal
>
> https://bugs.openjdk.java.net/browse/JDK-8252681
>
>    2) 8252682: investigate PhaseChaitin::post_allocate_copy_removal after solaris removal
>
> https://bugs.openjdk.java.net/browse/JDK-8252682
>
>
>    So following three files leave no change:
>
> share/opto/c2_globals.hpp
>
> share/opto/machnode.hpp
>
> share/opto/postaloc.cpp
>
>    Also update some copyright year for several files.
>
>
> Thanks
>
> Yumin

From headius at headius.com Wed Sep 2 04:31:47 2020
From: headius at headius.com (Charles Oliver Nutter)
Date: Tue, 1 Sep 2020 23:31:47 -0500
Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby
In-Reply-To:
References: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com>
Message-ID:

On Mon, Aug 31, 2020 at 4:58 PM Vladimir Ivanov wrote:
> As I can see with the test case, target method is loaded in a separate
> instance of OneShotClassLoader (and, moreover, I see j.l.String loaded
> there!). So, it doesn't matter whether a class is loaded in a "parent"
> (?)
script at all since they are loaded by separate class loaders. Ok this might be the clue I needed! Some years ago, in order to avoid conflicting libraries when running JRuby inside a web container, one of our contributors (with a focus on JRuby's embedding use cases) modified JRuby's classloading to use a "self-first" classloader, which always tries to load the classes from itself *before* trying the parent classloaders. As it so happens, this also ends up being an ancestor classloader of the OneShotClassLoader we use to load compiled code. So, I patched our logic to not use the self-first loader... and the original reported example now appears to inline properly! I will need to do more exploration but I think I now have some idea why this isn't working. Our self-first classloader is likely "re-homing" some of the core JDK classes, making them appear like they're the wrong ones, or not properly resolved, or something. That in turn prevents inlining because the lifecycle and lineage of those classes doesn't look right. We will investigate how to do a safer job of self-first classloading *only* for resources shipped with JRuby, and avoid using it for JDK libraries that must always come from the system classloaders. > It's hard to draw a line here. My feeling is JVM can do a better job > here (but I haven't worked out all the consequences yet). But if you > want to get rid of this quirk running on 8u, you definitely better fix > your app (JRuby). I am also interested to see if there's a way to improve this at the JVM level... but on my end I will continue to explore how our unusual classloading can be adjusted to avoid the issue (and perhaps, to get a more complete understanding of why the current logic breaks inlining so badly.) 
- Charlie From ningsheng.jian at arm.com Wed Sep 2 06:40:59 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Wed, 2 Sep 2020 14:40:59 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: <39965f4d-af53-524c-36db-917509b2198f@arm.com> Hi, Thanks a lot for the reviews! I think I have addressed the review comments from Andrew, Vladimir and Erik. This is the new webrev: Full: http://cr.openjdk.java.net/~njian/8231441/webrev.05/ Incremental: http://cr.openjdk.java.net/~njian/8231441/webrev.05-vs-04/ Tests: Tested with jtreg hotspot_all_no_apps, jdk_core and langtools:tier1 on AArch64 systems with and without SVE as well as some x86_64 systems. Mach5 submit test also reported passed. Could you please help to take a look again? OK for jdk/jdk? Thanks, Ningsheng On 8/19/20 5:53 PM, Ningsheng Jian wrote: > Hi Andrew, > > I have updated the patch based on the review comments. Would you mind > taking another look? Thanks! > > Full: > http://cr.openjdk.java.net/~njian/8231441/webrev.04/ > > Incremental: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ > > Also add build-dev, as there's a makefile change. 
> > And the split parts: > > 1) SVE feature detection: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature > > 2) c2 register allocation: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra > > 3) SVE c2 backend: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2 > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 > CSR: https://bugs.openjdk.java.net/browse/JDK-8248742 > From aph at redhat.com Wed Sep 2 07:51:46 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 2 Sep 2020 08:51:46 +0100 Subject: RFR(S) 8252311: AArch64: save two words in itable lookup stub In-Reply-To: References: Message-ID: On 30/08/2020 18:18, Boris Ulasevich wrote: > > The interface method lookup stub becomes hot when interface calls > are performed frequently. The stub assembly code can be made > shorter (132->124 bytes) by using a pre-increment instruction variant. These things change over time. There was a time when loads with writeback were deprecated because they took extra clock cycles. On some machines, particularly big OOO machines, that might not matter. > http://cr.openjdk.java.net/~bulasevich/8252311/webrev.00 > http://bugs.openjdk.java.net/browse/JDK-8252311 > > The benchmark [1] shows [2] performance and icache loads improvement: > performance: 6165206 -> 6307798 ops/s > L1-icache-loads: 307.271 -> 274.604 > > The change was tested with JTREG. This is a change that may or may not make things better. Either way the benefit is so small that it is in the noise, as Andrew Dinn said. We have to think carefully about such things. AArch64 is an architecture, not a processor, and small benefits on one implementation might be small (or even large) losses on others. We need a clean AArch64 back end that generates simple and generally efficient code. What we don't need is special cases for different microarchitectures, because the back end will become unmaintainable, and very hard to test in all combinations. Go ahead, commit this. 
I think it's a small improvement, and the code is easier to understand. But every change carries some risk, and I'm going to start pushing back on changes that don't seem to offer much benefit. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Wed Sep 2 07:53:05 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 02 Sep 2020 09:53:05 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <13aba46a-3200-30fa-7f37-b08a42dc9f8e@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> <13aba46a-3200-30fa-7f37-b08a42dc9f8e@oracle.com> Message-ID: <87imcw62e6.fsf@redhat.com> Hi Tobias, > Apart from expected test failures (TestIntVect due to failed vectorization and > UseCountedLoopSafepointsTest due to a missing safepoint) and unrelated/known issues, I'm seeing the > following failure: Thanks for taking care of running tests and analyzing results. I've been investigating the failure you reported and it doesn't appear specific to the long counted loop patch. I filed JDK-8252696 and will propose a fix soon. Roland. 
From filipp.zhinkin at gmail.com Wed Sep 2 08:06:59 2020 From: filipp.zhinkin at gmail.com (Filipp Zhinkin) Date: Wed, 2 Sep 2020 11:06:59 +0300 Subject: RFR: 8251152: ARM32: jtreg c2 Test8202414 test crash In-Reply-To: References: Message-ID: Hi, updated webrev: http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.1/ Tests are throwing SkippedException now, as Igor suggested. Seems like I should also update a year in JdkInternalMiscUnsafeUnalignedAccess' copyright, but I'm not sure if there should be a line designating Oracle as intellectual property owner along with SAP. Should I add it too? Thanks, Filipp. On Tue, 1 Sep 2020 at 22:48, Filipp Zhinkin wrote: > Hi Igor, > > On Tue, 1 Sep 2020 at 19:46, Igor Ignatyev > wrote: > >> Hi Filipp, >> >> 1st of all, welcome back! >> > > thanks! > > >> >> it would be better to throw jtreg.SkippedException at L#46 so jtreg will >> reported the test as skipped (as opposed to just passed). > > Thanks, I'll update Test8202414 as well > as compiler/unsafe/JdkInternalMiscUnsafeUnalignedAccess (which also skips > execution the same way). > > >> alternative, you could use '@requires vm.simpleArch != "arm"' to exclude >> the test from arm32 execution. >> > > I was thinking about adding something like vm.unalignedAccess.enabled, but > it seems to be too complicated solution for two tests (Test8202414 > and compiler/unsafe/JdkInternalMiscUnsafeUnalignedAccess). > I don't want to use 'simpleArch', because if some new platform missing > unaligned access support will be added in the future then someone will have > to spend time to find out why a test crashes. > > Thanks, > Filipp. > > >> >> Thanks, >> -- Igor >> >> >> > On Sep 1, 2020, at 8:29 AM, Filipp Zhinkin >> wrote: >> > >> > Hi, >> > >> > Test8202414 crashes on ARM32 while writing to memory using an unaligned >> > address. 
>> > ARM32 supports unaligned memory accesses for some load/store >> instructions >> > under certain conditions, but LDRD (which is used when we're calling >> > Unsafe::putLong) always causes an alignment fault when called with an >> > unaligned address [1]. >> > >> > The fix is simply skipping the test execution if a platform does not >> > support unaligned memory accesses. >> > >> > Bug: https://bugs.openjdk.java.net/browse/JDK-8251152 >> > Webrev: >> http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.0/ >> > >> > [1] ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition, >> §A3.2.1 >> > Unaligned data access >> https://developer.arm.com/documentation/ddi0406/cd >> > >> > Thanks, >> > Filipp. >> >> From vladimir.x.ivanov at oracle.com Wed Sep 2 08:07:59 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 2 Sep 2020 11:07:59 +0300 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> Message-ID: <9a5ab727-7d9e-a179-e46c-0916aa10ff12@oracle.com> >> As I can see with the test case, target method is loaded in a separate >> instance of OneShotClassLoader (and, moreover, I see j.l.String loaded >> there!). So, it doesn't matter whether a class is loaded in a "parent" >> (?) script at all since they are loaded by separate class loaders. > > Ok this might be the clue I needed! > > Some years ago, in order to avoid conflicting libraries when running > JRuby inside a web container, one of our contributors (with a focus on > JRuby's embedding use cases) modified JRuby's classloading to use a > "self-first" classloader, which always tries to load the classes from > itself *before* trying the parent classloaders. > > As it so happens, this also ends up being an ancestor classloader of > the OneShotClassLoader we use to load compiled code. > > So, I patched our logic to not use the self-first loader... 
and the > original reported example now appears to inline properly! > > I will need to do more exploration but I think I now have some idea > why this isn't working. Our self-first classloader is likely > "re-homing" some of the core JDK classes, making them appear like > they're the wrong ones, or not properly resolved, or something. That > in turn prevents inlining because the lifecycle and lineage of those > classes doesn't look right. Only the bootstrap class loader is allowed to define classes under java.lang. (That was the case long before modules were introduced.) There is simply no way to successfully load the java.lang.String class unless the request is delegated to the bootstrap class loader. There may be other reasons why the problem doesn't show up after the change: e.g., when scheduling a method for compilation by C2, all classes mentioned in its signature are eagerly loaded, or more code is loaded inside the class loader and java.lang.String is loaded there before compilation kicks in. But the root cause stays the same: java.lang.String is absent (hasn't been resolved yet) in the context class loader used for the inline.rb script. (If it were resolved, it would point to the java.lang.String from the bootstrap CL.) So, the only solution I see (w/o touching the JVM) is to force java.lang.String class resolution for OneShotClassLoader instances eagerly. (Or, alternatively, drop the String argument.) > We will investigate how to do a safer job of self-first classloading > *only* for resources shipped with JRuby, and avoid using it for JDK > libraries that must always come from the system classloaders. >> It's hard to draw a line here. My feeling is JVM can do a better job >> here (but I haven't worked out all the consequences yet). But if you >> want to get rid of this quirk running on 8u, you definitely better fix >> your app (JRuby). > > I am also interested to see if there's a way to improve this at the > JVM level... 
but on my end I will continue to explore how our unusual > classloading can be adjusted to avoid the issue (and perhaps, to get a > more complete understanding of why the current logic breaks inlining > so badly.) After thinking more about it, I still don't see any compelling reason to forbid inlining of a method solely by the presence of not-yet-loaded classes in its signature. I'll experiment with disabling the problematic logic in C2. Best regards, Vladimir Ivanov From dmitry.chuyko at bell-sw.com Wed Sep 2 10:01:13 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Wed, 2 Sep 2020 13:01:13 +0300 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <11530b87-8124-19ca-936b-16dec5994411@redhat.com> References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> <0cca5c0c-9240-3a9f-98f0-519384ea69cb@bell-sw.com> <11530b87-8124-19ca-936b-16dec5994411@redhat.com> Message-ID: Andrew, thank you very much for the review. Vladimir Ivanov pointed that constructed nested nodes must be introduced to gvn, otherwise there is potentially a violated assert. Here is an updated change with correct creation of constant nodes and a few minor cleanups: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.05/ -Dmitry On 9/1/20 6:48 PM, Andrew Haley wrote: > On 31/08/2020 15:28, Dmitry Chuyko wrote: >> Here is another version of intrinsics. It is an extension of webrev.03. >> Additional thing is that constants 0 and 1 that are used internally by >> intrinics are constructed as nodes. This is somehow similar to what is >> done for passing pointers to tables. >> >> webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.04/ >> results: >> http://cr.openjdk.java.net/~dchuyko/8251525/webrev.04/benchmarks/signum-facgt_ir-copysign.ods > Hi, > > Thank you. That certainly looks better. 
> > It's unfortunate that signum doesn't help in all cases, but I'm happy > that we have something positive in general. Certainly the code looks > nice. I'm still rather baffled that an intrinsification of copySign > actually makes things much worse on blackhole on Neoverse N1, but it > doesn't really matter because the copySign intrinsic isn't enabled by > default. So please go ahead with this version. > > Having said all of that, it's a fairly minor improvement for some > considerable complexity. And it depends terribly on the > microarchitecture of a particular part, albeit an important one. > From richard.reingruber at sap.com Wed Sep 2 13:48:12 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Wed, 2 Sep 2020 13:48:12 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <682ee88d-097a-df57-7374-b3413b7964fd@oracle.com> References: <682ee88d-097a-df57-7374-b3413b7964fd@oracle.com> Message-ID: Hi Robbin, // taking the discussion back to the mailing lists > I still don't understand why you don't deoptimize the objects inside the > handshake/safepoint instead? This is unfortunately not possible. Deoptimizing objects includes reallocating scalar replaced objects, i.e. calling Deoptimization::realloc_objects(). This cannot be done at a safepoint or handshake. 1. The vm thread is not allowed to allocate on the Java heap. See for instance the assertions in ParallelScavengeHeap::mem_allocate() https://github.com/openjdk/jdk/blob/4c73e045ce815d52abcdc99499266ccf2e6e9b4c/src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp#L258 This is not easy to change, I suppose, because it will be difficult to gc if necessary. 2. Using a direct handshake would not work either. The problem there is again gc. Let J be the JavaThread that is executing the direct handshake. 
The vm would deadlock if the vm thread waits for J to execute the closure of a handshake-all and J waits for the vm thread to execute a gc vm operation. Patricio Chilano made me aware of this: https://bugs.openjdk.java.net/browse/JDK-8230594

Cheers, Richard.

-----Original Message-----
From: Robbin Ehn
Sent: Wednesday, 2 September 2020 13:56
To: Reingruber, Richard
Cc: Lindenmaier, Goetz ; Vladimir Kozlov ; David Holmes
Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents

Hi,

I still don't understand why you don't deoptimize the objects inside the handshake/safepoint instead?

E.g. in JvmtiEnv::GetOwnedMonitorInfo you should only need to execute the code from eb.deoptimize_objects(MaxJavaStackTraceDepth) before looping over the stack, so:

void
GetOwnedMonitorInfoClosure::do_thread(Thread *target) {
  assert(target->is_Java_thread(), "just checking");
  JavaThread *jt = (JavaThread *)target;

  if (!jt->is_exiting() && (jt->threadObj() != NULL)) {
+   if (EscapeBarrier::deoptimize_objects(jt, MaxJavaStackTraceDepth)) {
      _result = ((JvmtiEnvBase*)_env)->get_owned_monitors(_calling_thread, jt, _owned_monitors_list);
    } else {
      _result = JVMTI_ERROR_OUT_OF_MEMORY;
    }
  }
}

Why try 'suspend' the thread first?

When we de-optimize all threads why not just in the following safepoint? E.g.

VM_HeapWalkOperation::doit() {
+ EscapeBarrier::deoptimize_objects_all_threads();
  ...
} Thanks, Robbin From adinn at redhat.com Wed Sep 2 13:58:38 2020 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 2 Sep 2020 14:58:38 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <39965f4d-af53-524c-36db-917509b2198f@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <39965f4d-af53-524c-36db-917509b2198f@arm.com> Message-ID: Hi Ningsheng, On 02/09/2020 07:40, Ningsheng Jian wrote: > Thanks a lot for the reviews! I think I have addressed the review > comments from Andrew, Vladimir and Erik. This is the new webrev: > > Full: > http://cr.openjdk.java.net/~njian/8231441/webrev.05/ > > Incremental: > http://cr.openjdk.java.net/~njian/8231441/webrev.05-vs-04/ > > Tests: > Tested with jtreg hotspot_all_no_apps, jdk_core and langtools:tier1 on > AArch64 systems with and without SVE as well as some x86_64 systems. > Mach5 submit test also reported passed. > > Could you please help to take a look again? OK for jdk/jdk? That looks good to me. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From robbin.ehn at oracle.com Wed Sep 2 14:54:13 2020 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 2 Sep 2020 16:54:13 +0200 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <682ee88d-097a-df57-7374-b3413b7964fd@oracle.com> Message-ID: <3ae58a8e-405a-d98c-79c5-c6a0bdf5cc27@oracle.com> Hi Richard, On 2020-09-02 15:48, Reingruber, Richard wrote: > Hi Robbin, > > // taking the discussion back to the mailing lists > > > I still don't understand why you don't deoptimize the objects inside the > > handshake/safepoint instead? So for handshakes, using an asynchronous handshake and allowing blocking inside it would fix that (a future fix; I'm working on that now). For safepoints, since we have suspended all threads, ~'safepointed them' with a JavaThread, you _could_ just execute the action directly (e.g. skipping the VM_HeapWalkOperation safepoint) since they are supposed to be safely suspended until the destructor of EB, no? So I suggest, as future work, just executing the safepoint with the requesting JT instead of having this special safepointing mechanism. Since you are missing the above functionality I see why you went this way. If you need to push it, it's fine by me. Thanks for explaining once again :) /Robbin > > This is unfortunately not possible. Deoptimizing objects includes reallocating > scalar replaced objects, i.e. calling Deoptimization::realloc_objects(). This > cannot be done at a safepoint or handshake. > > 1. 
The vm thread is not allowed to allocate on the java heap > See for instance assertions in ParallelScavengeHeap::mem_allocate() > https://github.com/openjdk/jdk/blob/4c73e045ce815d52abcdc99499266ccf2e6e9b4c/src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp#L258 > > This is not easy to change, I suppose, because it will be difficult to gc if > necessary. > > 2. Using a direct handshake would not work either. The problem there is again > gc. Let J be the JavaThread that is executing the direct handshake. The vm > would deadlock if the vm thread waits for J to execute the closure of a > handshake-all and J waits for the vm thread to execute a gc vm operation. > Patricio Chilano made me aware of this: https://bugs.openjdk.java.net/browse/JDK-8230594 > > Cheers, Richard. > > -----Original Message----- > From: Robbin Ehn > Sent: Wednesday, 2 September 2020 13:56 > To: Reingruber, Richard > Cc: Lindenmaier, Goetz ; Vladimir Kozlov ; David Holmes > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi, > > I still don't understand why you don't deoptimize the objects inside the > handshake/safepoint instead? > > E.g. > > JvmtiEnv::GetOwnedMonitorInfo you only should need the execute the code > from: > eb.deoptimize_objects(MaxJavaStackTraceDepth)) before looping over the > stack, so: > > void > GetOwnedMonitorInfoClosure::do_thread(Thread *target) { > assert(target->is_Java_thread(), "just checking"); > JavaThread *jt = (JavaThread *)target; > > if (!jt->is_exiting() && (jt->threadObj() != NULL)) { > + if (EscapeBarrier::deoptimize_objects(jt, MaxJavaStackTraceDepth)) { > _result = > ((JvmtiEnvBase*)_env)->get_owned_monitors(_calling_thread, jt, > _owned_monitors_list); > } else { > _result = JVMTI_ERROR_OUT_OF_MEMORY; > } > } > } > > Why try 'suspend' the thread first? 
> > > When we de-optimize all threads why not just in the following safepoint? > E.g. > VM_HeapWalkOperation::doit() { > + EscapeBarrier::deoptimize_objects_all_threads(); > ... > } > > Thanks, Robbin > > From aph at redhat.com Wed Sep 2 16:07:58 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 2 Sep 2020 17:07:58 +0100 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> <0cca5c0c-9240-3a9f-98f0-519384ea69cb@bell-sw.com> <11530b87-8124-19ca-936b-16dec5994411@redhat.com> Message-ID: <9b14a8bc-b4a1-ec74-d86b-1204a9831dfa@redhat.com> On 02/09/2020 11:01, Dmitry Chuyko wrote: > Vladimir Ivanov pointed that constructed nested nodes must be introduced > to gvn, otherwise there is potentially a violated assert. Here is an > updated change with correct creation of constant nodes and a few minor > cleanups: > > http://cr.openjdk.java.net/~dchuyko/8251525/webrev.05/ Oh, good catch. Thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From yumin.qi at oracle.com Wed Sep 2 16:17:43 2020 From: yumin.qi at oracle.com (Yumin Qi) Date: Wed, 2 Sep 2020 09:17:43 -0700 Subject: 8248337: sparc related code clean up after solaris removal In-Reply-To: <7858cdf4-25cd-4e14-ec60-b3c256f59c9d@oracle.com> References: <8476b337-f034-19ce-e502-313da7d048c7@oracle.com> <0e96ff53-2853-960f-3140-87d774345c05@oracle.com> <7858cdf4-25cd-4e14-ec60-b3c256f59c9d@oracle.com> Message-ID: <9a5dc120-b7cc-6ce9-3346-77403e5e534d@oracle.com> HI, David ? Thanks for re-review! Yumin On 9/1/20 9:07 PM, David Holmes wrote: > Hi Yumin, > > Update looks good. Thanks for filing the other RFEs. > > David > > On 2/09/2020 8:40 am, Yumin Qi wrote: >> HI, Vladimir and David >> >> ??? 
I have updated new webrev at: http://cr.openjdk.java.net/~minqi/2020/8248337/webrev-02/ >> >> ??? Filed two issues to address your concern separately: >> >> ??? 1) 8252681: Retire flag UseRDPCForConstantTableBase after solaris removal >> >> https://bugs.openjdk.java.net/browse/JDK-8252681 >> >> ?? 2) 8252682: investigate PhaseChaitin::post_allocate_copy_removal after solaris removal >> >> https://bugs.openjdk.java.net/browse/JDK-8252682 >> >> >> ?? So following three files leave no change: >> >> share/opto/c2_globals.hpp >> >> share/opto/machnode.hpp >> >> share/opto/postaloc.cpp >> >> ?? Also update some copyright year for several files. >> >> >> Thanks >> >> Yumin >> >> >> On 9/1/20 10:35 AM, Vladimir Kozlov wrote: >>> On 8/31/20 6:18 PM, David Holmes wrote: >>>> Hi Yumin, >>>> >>>> On 1/09/2020 7:32 am, Yumin Qi wrote: >>>>> HI, >>>>> >>>>> ?? Please review for >>>>> >>>>> ?? bug: https://bugs.openjdk.java.net/browse/JDK-8248337 >>>>> >>>>> webrev:http://cr.openjdk.java.net/~minqi/2020/8248337/webrev-01/ >>>>> >>>>> >>>>> ?? Summary: After Solaris supported files removed from repo, there are some remnants which needs cleaning up. Some comments are not correct, and some refer to wrong files. >>>> >>>> Those changes are mostly okay but I have a few minor issues/suggestions below. >>>> >>>>> There is a flag seems only useful for Sparc: UseRDPCForConstantTableBase, which got removed in this patch . >>>> >>>> Despite the description of the flag it is far from clear that the use of the flag affects sparc only. It affects the pinned() function so seems somewhat platform agnostic in that sense - which is why this was not dealt with in the SPARC removal process. I think this needs closer examination by the compiler folk, with a recommendation on whether it can/should be changed or not. Regardless as this is a product flag then I think this change should be factored out and we go through the appropriate deprecate/obsolete/expire process. 
>>> >>> The flag was used to use special SPARC instruction for CPUs supporting it to load base of Constant table. >>> It is useless for other platforms. MachConstantBaseNode::pinned() method can be removed because it inherits the method from Node::pinned() which returns 'false' too. >>> >>> And I agree with David that it should be done separately because it is product flag. >>> >>>> >>>>> Also in postaloc.cpp, the delay slot seems is only for sparc too, but I am not sure about that. Most of the patch are in comment section. >>>> >>>> It refers to spill slot not delay slot. I don't see anything obviously sparc specific about that block of code. >>> >>> Please, leave the code as it is. As David said it is about normal spill slots for all platforms. >>> I am not sure it is SPARC specific currently with all platforms OpenJDK supports. >>> If you want you can file RFE to replace code with assert and ask community to run a lot of testing to see if we hit the assert. >>> >>>> >>>> Specific comments: >>>> >>>> src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp >>>> >>>> -// 64 bits items (sparc abi) even though java would only store >>>> +// 64 bits items even though java would only store >>>> >>>> Should "(sparc abi)" be replaced with "(Aarch64 abi)" as you did for other platforms? >>>> >>>> --- >>>> >>>> src/hotspot/cpu/arm/frame_arm.hpp (and other files) >>>> >>>> ??? // The interpreter and adapters will extend the frame of the caller. >>>> ??? // Since oopMaps are based on the sp of the caller before extension >>>> -? // we need to know that value. However in order to compute the address >>>> -? // of the return address we need the real "raw" sp. Since sparc already >>>> -? // uses sp() to mean "raw" sp and unextended_sp() to mean the caller's >>>> -? // original sp we use that convention. >>>> +? // we need to know that value. However in order to compute the return >>>> +? // address we need the real "raw" sp. 
>>>> >>>> I think this is losing too much information as it no longer describes the convention. I would suggest: >>>> >>>> ??? // The interpreter and adapters will extend the frame of the caller. >>>> ??? // Since oopMaps are based on the sp of the caller before extension >>>> ??? // we need to know that value. However in order to compute the address >>>> -? // of the return address we need the real "raw" sp. Since sparc already >>>> -? // uses sp() to mean "raw" sp and unextended_sp() to mean the caller's >>>> -? // original sp we use that convention. >>>> +? // of the return address we need the real "raw" sp. By convention we >>>> +? // use sp() to mean "raw" sp and unextended_sp() to mean the caller's >>>> +? // original sp. >>>> >>>> --- >>>> >>>> src/hotspot/cpu/ppc/jniTypes_ppc.hpp >>>> >>>> -? // stubGenerator_sparc.cpp) reverse the argument list constructed by >>>> +? // stubGenerator_${CPU}.cpp) reverse the argument list constructed by >>>> >>>> Just replace sparc with ppc as done for other platforms. >>>> >>>> --- >>>> >>>> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp >>>> >>>> -? // This greatly simplifies the cases here compared to sparc. >>>> +? // This greatly simplifies the cases here. >>>> >>>> Just delete the comment as there is nothing to compare simplicity or complexity against. >>>> >>>> --- >>>> >>>> src/hotspot/share/c1/c1_LIRGenerator.cpp >>>> >>>> -? // In 64bit the type can be long, sparc doesn't have this assert >>>> +? // In 64bit the type can be long >>>> ??? // assert(offset.type()->tag() == intTag, "invalid type"); >>>> >>>> compiler folk should decide what to do here but I think the comment and commented out assert can just be deleted. >>> >>> Yes, remove commented assert too. Originally it was platform specific code - the assert was there for 32-bit. >>> >>> Thanks, >>> Vladimir K >>> >>>> >>>> --- >>>> >>>> src/hotspot/share/c1/c1_Runtime1.cpp >>>> >>>> -? case handle_exception_nofpu_id:? // Unused on sparc >>>> +? 
case handle_exception_nofpu_id:? // unused. >>>> >>>> the new comment is incorrect as this case is not unused. I suggest just deleting the comment. >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> >>>>> >>>>> ?? Tests passed tier1-4 >>>>> >>>>> >>>>> ?? Thanks >>>>> >>>>> ?? Yumin >>>>> From vladimir.x.ivanov at oracle.com Wed Sep 2 17:40:54 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 2 Sep 2020 20:40:54 +0300 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> <0cca5c0c-9240-3a9f-98f0-519384ea69cb@bell-sw.com> <11530b87-8124-19ca-936b-16dec5994411@redhat.com> Message-ID: > http://cr.openjdk.java.net/~dchuyko/8251525/webrev.05/ Overall, shared code changes look fine. src/hotspot/share/opto/intrinsicnode.cpp: +//------------------------------CopySign----------------------------------------- +CopySignDNode* CopySignDNode::make(PhaseGVN& gvn, Node* in1, Node* in2) { + return new CopySignDNode(in1, in2, gvn.makecon(TypeD::ZERO)); +} + +//------------------------------Signum------------------------------------------- +SignumDNode* SignumDNode::make(PhaseGVN& gvn, Node* in) { + return new SignumDNode(in, gvn.makecon(TypeD::ZERO), gvn.makecon(TypeD::ONE)); +} + +SignumFNode* SignumFNode::make(PhaseGVN& gvn, Node* in) { + return new SignumFNode(in, gvn.makecon(TypeF::ZERO), gvn.makecon(TypeF::ONE)); +} Putting auxiliary constants on ideal nodes doesn't look right (platform-specific implementation details leaking into shared code), but possible alternatives are much uglier. C2 doesn't rematerialize FP constants on AArch64, so code quality benefits from sharing ConF/ConD nodes. Introducing "constants as TEMPs" support in Matcher (insert constants as additional inputs during matching) seems the best option, but it is not implemented yet. 
(Something to consider as a separate enhancement.) Best regards, Vladimir Ivanov > On 9/1/20 6:48 PM, Andrew Haley wrote: >> On 31/08/2020 15:28, Dmitry Chuyko wrote: >>> Here is another version of intrinsics. It is an extension of webrev.03. >>> Additional thing is that constants 0 and 1 that are used internally by >>> intrinsics are constructed as nodes. This is somehow similar to what is >>> done for passing pointers to tables. >>> >>> webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.04/ >>> results: >>> http://cr.openjdk.java.net/~dchuyko/8251525/webrev.04/benchmarks/signum-facgt_ir-copysign.ods >>> >> Hi, >> >> Thank you. That certainly looks better. >> >> It's unfortunate that signum doesn't help in all cases, but I'm happy >> that we have something positive in general. Certainly the code looks >> nice. I'm still rather baffled that an intrinsification of copySign >> actually makes things much worse on blackhole on Neoverse N1, but it >> doesn't really matter because the copySign intrinsic isn't enabled by >> default. So please go ahead with this version. >> >> Having said all of that, it's a fairly minor improvement for some >> considerable complexity. And it depends terribly on the >> microarchitecture of a particular part, albeit an important one. >> From igor.ignatyev at oracle.com Wed Sep 2 18:53:22 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 2 Sep 2020 11:53:22 -0700 Subject: RFR: 8251152: ARM32: jtreg c2 Test8202414 test crash In-Reply-To: References: Message-ID: <88E929A3-E9C5-4BC2-B4E4-9AC8F046623D@oracle.com> > On Sep 2, 2020, at 1:06 AM, Filipp Zhinkin wrote: > > Hi, > > updated webrev: http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.1/ > > Tests are throwing SkippedException now, as Igor suggested. thanks, LGTM.
> Seems like I should also update a year in JdkInternalMiscUnsafeUnalignedAccess' copyright, but I'm not sure if there should be a line designating Oracle as intellectual property owner along with SAP. Should I add it too? IANAL, but unless the file was modified by Oracle, it doesn't have to have Oracle copyright notice. Thanks, -- Igor > > Thanks, > Filipp. > > On Tue, 1 Sep 2020 at 22:48, Filipp Zhinkin > wrote: > Hi Igor, > > On Tue, 1 Sep 2020 at 19:46, Igor Ignatyev > wrote: > Hi Filipp, > > 1st of all, welcome back! > > thanks! > > > it would be better to throw jtreg.SkippedException at L#46 so jtreg will report the test as skipped (as opposed to just passed). > Thanks, I'll update Test8202414 as well as compiler/unsafe/JdkInternalMiscUnsafeUnalignedAccess (which also skips execution the same way). > > alternatively, you could use '@requires vm.simpleArch != "arm"' to exclude the test from arm32 execution. > > I was thinking about adding something like vm.unalignedAccess.enabled, but it seems to be too complicated a solution for two tests (Test8202414 and compiler/unsafe/JdkInternalMiscUnsafeUnalignedAccess). > I don't want to use 'simpleArch', because if some new platform missing unaligned access support is added in the future then someone will have to spend time to find out why a test crashes. > > Thanks, > Filipp. > > > Thanks, > -- Igor > > > > On Sep 1, 2020, at 8:29 AM, Filipp Zhinkin > wrote: > > > > Hi, > > > > Test8202414 crashes on ARM32 while writing to memory using an unaligned > > address. > > ARM32 supports unaligned memory accesses for some load/store instructions > > under certain conditions, but LDRD (which is used when we're calling > > Unsafe::putLong) always causes an alignment fault when called with an > > unaligned address [1]. > > > > The fix is simply skipping the test execution if a platform does not > > support unaligned memory accesses.
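The skip-instead-of-crash approach described above can be sketched roughly as follows. This is only an illustration, not the code from the webrev: SkippedException here is a local stand-in for jtreg.SkippedException from the JDK test library, and the capability check is passed in as a hypothetical BooleanSupplier rather than queried from the VM.

```java
import java.util.function.BooleanSupplier;

// Sketch only: models "skip the test when unaligned access is unsupported".
final class SkipOnUnalignedSketch {
    // Local stand-in (assumption) for jtreg.SkippedException from the JDK test library.
    static final class SkippedException extends RuntimeException {
        SkippedException(String message) {
            super(message);
        }
    }

    // Runs 'body' only if the (hypothetical) capability check passes;
    // otherwise reports the test as skipped instead of letting it crash.
    static void runOrSkip(BooleanSupplier unalignedAccessSupported, Runnable body) {
        if (!unalignedAccessSupported.getAsBoolean()) {
            throw new SkippedException("Platform does not support unaligned memory access");
        }
        body.run();
    }

    public static void main(String[] args) {
        runOrSkip(() -> true, () -> System.out.println("test body executed"));
        try {
            runOrSkip(() -> false, () -> System.out.println("never printed"));
        } catch (SkippedException e) {
            System.out.println("skipped: " + e.getMessage());
        }
    }
}
```

Throwing a dedicated exception type, rather than returning early, is what lets a harness tell "skipped" apart from "passed", which is the distinction raised in the review.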
> > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251152 > > Webrev: http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.0/ > > > > [1] ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition, §A3.2.1 > > Unaligned data access https://developer.arm.com/documentation/ddi0406/cd > > > > Thanks, > > Filipp. > From igor.ignatyev at oracle.com Wed Sep 2 20:18:38 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 2 Sep 2020 13:18:38 -0700 Subject: RFR(T) : 8252720 : clean up FileInstaller $test.src $cwd in vmTestbase/vm/compiler/optimizations tests Message-ID: <1E77E397-571E-48FF-BC5C-B88663C02A55@oracle.com> http://cr.openjdk.java.net/~iignatyev//8252720/webrev.00 > 8 lines changed: 0 ins; 4 del; 4 mod; Hi all, could you please review this trivial patch? from JBS: > somehow I missed vmTestbase/vm/compiler/optimizations when I was working on JDK-8251127. this sub-task is to remove FileInstaller actions from vm/compiler/optimizations tests. testing: vmTestbase/vm/compiler/optimizations tests JBS: https://bugs.openjdk.java.net/browse/JDK-8252720 webrev: http://cr.openjdk.java.net/~iignatyev//8252720/webrev.00 Thanks, -- Igor From Divino.Cesar at microsoft.com Wed Sep 2 21:26:08 2020 From: Divino.Cesar at microsoft.com (Cesar Soares Lucas) Date: Wed, 2 Sep 2020 21:26:08 +0000 Subject: [16] RFR(S): 8250668: Clean up method_oop names in adlc In-Reply-To: References: Message-ID: Gentle ping. Can anyone PTAL?
________________________________ From: hotspot-compiler-dev on behalf of Cesar Soares Lucas Sent: August 27, 2020 12:36 PM To: hotspot-compiler-dev at openjdk.java.net Cc: Brian Stafford ; Aditya Mandaleeka ; Christian Hagedorn Subject: [16] RFR(S): 8250668: Clean up method_oop names in adlc Hi there, RFE: https://bugs.openjdk.java.net/browse/JDK-8250668 Webrev: https://cr.openjdk.java.net/~adityam/cesar/8250668/0/ Need sponsor: Yes Tested on: Windows/Linux/MacOS tiers 1-3 can I please get some reviews for the Webrev linked above? The work consists of renaming "method_oop" occurrences all around the code base to just "method". I've tested this on x86_64 only. Can someone please help testing on other architectures as well: x86_32, PPC, ARM32/64, S390?
Thank you, Cesar From richard.reingruber at sap.com Wed Sep 2 21:26:52 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Wed, 2 Sep 2020 21:26:52 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: <3ae58a8e-405a-d98c-79c5-c6a0bdf5cc27@oracle.com> References: <682ee88d-097a-df57-7374-b3413b7964fd@oracle.com> <3ae58a8e-405a-d98c-79c5-c6a0bdf5cc27@oracle.com> Message-ID: Hi Robin, > On 2020-09-02 15:48, Reingruber, Richard wrote: > > Hi Robbin, > > > > // taking the discussion back to the mailing lists > > > > > I still don't understand why you don't deoptimize the objects inside the > > > handshake/safepoint instead? > So for handshakes using asynch handshake and allowing blocking inside > would fix that. (future fix, I'm working on that now) Just to make it clear: I'm not fond of the extra suspension mechanism currently used for JDK-8227745 either. I want to get rid of it and I will work on it. Asynch handshakes (JDK-8238761) could be a replacement for it. At least I think they can be used to suspend the target thread. > For safepoint, since we have suspended all threads, ~'safepointed them' > with a JavaThread, you _could_ just execute the action directly (e.g. > skipping VM_HeapWalkOperation safepoint) since they are suppose to be > safely suspended until the destructor of EB, no? Yes, this should be possible. This would be an advanced change though. I would like EscapeBarriers to be a no-op and fall back to current implementation, if C2-EscapeAnalysis/Graal are disabled. > So I suggest future work to instead just execute the safepoint with the > requesting JT instead of having a this special safepoiting mechanism. > Since you are missing above functionality I see why you went this way. > If you need to push it, it's fine by me. We will work on further improvements. Top of the list would be eliminating the extra suspend mechanism. 
The implementation has matured for more than 12 months now [1]. It's been tested extensively at SAP over that time and passed also extended testing at Oracle kindly conducted by Vladimir Kozlov. We've got two full Reviews and incorporated extensive feedback from a number of OpenJDK Reviewers (including you, thanks!). Based on that I reckon we're good to push the change as enhancement (JDK-8227745) and bug fix (JDK-8233915). > Thanks for explaining once again :) Pleasure :) Thanks, Richard. [1] http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028729.html -----Original Message----- From: Robbin Ehn Sent: Mittwoch, 2. September 2020 16:54 To: Reingruber, Richard ; serviceability-dev ; hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, On 2020-09-02 15:48, Reingruber, Richard wrote: > Hi Robbin, > > // taking the discussion back to the mailing lists > > > I still don't understand why you don't deoptimize the objects inside the > > handshake/safepoint instead? So for handshakes using asynch handshake and allowing blocking inside would fix that. (future fix, I'm working on that now) For safepoint, since we have suspended all threads, ~'safepointed them' with a JavaThread, you _could_ just execute the action directly (e.g. skipping VM_HeapWalkOperation safepoint) since they are suppose to be safely suspended until the destructor of EB, no? So I suggest future work to instead just execute the safepoint with the requesting JT instead of having a this special safepoiting mechanism. Since you are missing above functionality I see why you went this way. If you need to push it, it's fine by me. Thanks for explaining once again :) /Robbin > > This is unfortunately not possible. Deoptimizing objects includes reallocating > scalar replaced objects, i.e. calling Deoptimization::realloc_objects(). 
This > cannot be done at a safepoint or handshake. > > 1. The vm thread is not allowed to allocate on the java heap > See for instance assertions in ParallelScavengeHeap::mem_allocate() > https://github.com/openjdk/jdk/blob/4c73e045ce815d52abcdc99499266ccf2e6e9b4c/src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp#L258 > > This is not easy to change, I suppose, because it will be difficult to gc if > necessary. > > 2. Using a direct handshake would not work either. The problem there is again > gc. Let J be the JavaThread that is executing the direct handshake. The vm > would deadlock if the vm thread waits for J to execute the closure of a > handshake-all and J waits for the vm thread to execute a gc vm operation. > Patricio Chilano made me aware of this: https://bugs.openjdk.java.net/browse/JDK-8230594 > > Cheers, Richard. > > -----Original Message----- > From: Robbin Ehn > Sent: Wednesday, 2 September 2020 13:56 > To: Reingruber, Richard > Cc: Lindenmaier, Goetz ; Vladimir Kozlov ; David Holmes > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi, > > I still don't understand why you don't deoptimize the objects inside the > handshake/safepoint instead? > > E.g.
> > JvmtiEnv::GetOwnedMonitorInfo you only should need the execute the code > from: > eb.deoptimize_objects(MaxJavaStackTraceDepth)) before looping over the > stack, so: > > void > GetOwnedMonitorInfoClosure::do_thread(Thread *target) { > assert(target->is_Java_thread(), "just checking"); > JavaThread *jt = (JavaThread *)target; > > if (!jt->is_exiting() && (jt->threadObj() != NULL)) { > + if (EscapeBarrier::deoptimize_objects(jt, MaxJavaStackTraceDepth)) { > _result = > ((JvmtiEnvBase*)_env)->get_owned_monitors(_calling_thread, jt, > _owned_monitors_list); > } else { > _result = JVMTI_ERROR_OUT_OF_MEMORY; > } > } > } > > Why try 'suspend' the thread first? > > > When we de-optimize all threads why not just in the following safepoint? > E.g. > VM_HeapWalkOperation::doit() { > + EscapeBarrier::deoptimize_objects_all_threads(); > ... > } > > Thanks, Robbin > > From headius at headius.com Wed Sep 2 21:55:22 2020 From: headius at headius.com (Charles Oliver Nutter) Date: Wed, 2 Sep 2020 16:55:22 -0500 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: <9a5ab727-7d9e-a179-e46c-0916aa10ff12@oracle.com> References: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> <9a5ab727-7d9e-a179-e46c-0916aa10ff12@oracle.com> Message-ID: On Wed, Sep 2, 2020 at 3:10 AM Vladimir Ivanov wrote: > Only bootstrap class loader is allowed to define classes under > java.lang. (And it was the case long before modules were introduced.) > There's simply no way for successully load java.lang.String class unless > the request is delegated to bootstrap class loader. I realized that after I sent, and also realized I was not running the example correctly. Sorry for the misdirection... I still have no fix and no workaround. I have also tried forcing String and other classes referenced in the generated code to load into each OneShotClassLoader, but the problem remains. 
Disabling tiered compilation again fixes it, but I suspect that's just a lucky side effect for this case. So I am still without any leads on fixes or workarounds. - Charlie From igor.ignatyev at oracle.com Wed Sep 2 22:18:47 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 2 Sep 2020 15:18:47 -0700 Subject: RFR(T) : 8251997 : remove usage of PropertyResolvingWrapper in vmTestbase/vm/mlvm/ Message-ID: <1D2DD286-0830-468A-8B7D-6B4928F43DB9@oracle.com> http://cr.openjdk.java.net/~iignatyev//8251997/webrev.00/ > 27 lines changed: 5 ins; 12 del; 10 mod; Hi all, could you please review this small and trivial patch which removes usage of PropertyResolvingWrapper from vm/mlvm tests and reenables "smart action arguments"? a bit of background from main bug (8219140): > CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. testing: vmTestbase/vm/mlvm/ JBS: https://bugs.openjdk.java.net/browse/JDK-8251997 webrev: http://cr.openjdk.java.net/~iignatyev//8251997/webrev.00/ Thanks, -- Igor From david.holmes at oracle.com Thu Sep 3 00:20:19 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 3 Sep 2020 10:20:19 +1000 Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions In-Reply-To: References: <442caa21-ca0a-f6eb-60a5-1e74bf994894@oracle.com> <03df9364-817d-04d6-6434-80be93a66526@oracle.com> Message-ID: <1cc00e89-467e-f976-56a7-630f8782d61a@oracle.com> Hi Jamsheed, On 1/09/2020 10:36 pm, Jamsheed C M wrote: > Hi David, > > I reworked the patch, revised webrev here: > http://cr.openjdk.java.net/~jcm/8249451/webrev.01/ Thanks. The new macros and injected field for InternalError look good. A couple of minor comments below but overall this looks good to me. > In addition I moved UnlockFlagSaver fs(this) to more local scope. 
> > also removed changes done for JDK-8246727, as it will be separately > handled by the bug. > > Testing: injected and tested async exceptions randomly at compilation > request path and deopt path. I noticed in deoptimization.cpp that here: 1965 load_class_by_index(constants, unloaded_class_index, THREAD); we can now return with a pending async exception and it is unclear whether the code following this will be able to handle that, or indeed whether the caller will be able to handle it. Did you specifically test this site? --- src/hotspot/share/jvmci/jvmciRuntime.cpp The comment at: 80 // 1. The pending exception is cleared should be updated now that asyncs are not cleared. --- src/hotspot/share/compiler/tieredThresholdPolicy.* The changes from JavaThread* to Thread* look unnecessary for 90% of the cases, but the overall change seems to be dictated by the few methods that do use CHECK*. :( No point agonising over this now as I'm trying to deal with this general problem as a separate RFE - JDK-8252685. Thanks, David ----- > Best regards, > > Jamsheed > > On 24/08/2020 11:06, Jamsheed C M wrote: >> Hi David, >> >> Thank you for the review and feedback. Agree on all of them. I will >> rework and get back. >> >> On 10/08/2020 07:33, David Holmes wrote: >>> Hi Jamsheed, >>> >>> On 6/08/2020 10:07 pm, Jamsheed C M wrote: >>>> Hi all, >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249451 >>>> >>>> webrev: http://cr.openjdk.java.net/~jcm/8249451/webrev.00/ >>> >>> Thanks for tackling this messy issue. Overall I like the use of TRAPS >>> to more clearly document which methods can return with an exception >>> pending. I think there are some problems with the proposed changes. >>> I'll start with those comments and then move on to more general >>> comments. >>> >>> src/hotspot/share/utilities/exceptions.cpp >>> src/hotspot/share/utilities/exceptions.hpp >>> >>> I don't think the changes here are correct or safe in general. 
>>> >>> First, adding the new macro and function to only clear non-async >>> exceptions is fine itself. But naming wise the fact only non-async >>> exceptions are cleared should be evident, and there is no "check" >>> involved (in the sense of the existing CHECK_ macros) so I suggest: >>> >>> s/CHECK_CLEAR_PENDING_EXCEPTION/CLEAR_PENDING_NONASYNC_EXCEPTIONS/ >>> s/check_clear_pending_exception/clear_pending_nonasync_exceptions/ >>> >> Ok >>> But changing the existing CHECK_AND_CLEAR macros to now leave async >>> exceptions pending seems potentially dangerous as calling code may >>> not be prepared for there to now be a pending exception. For example >>> the use in thread.cpp: >>> >>> ?JDK_Version::set_runtime_name(get_java_runtime_name(THREAD)); >>> ?JDK_Version::set_runtime_version(get_java_runtime_version(THREAD)); >>> >>> get_java_runtime_name() is currently guaranteed to clear all >>> exceptions, so all the other code is known to be safe to call. But >>> that would no longer be true. That said, this is VM initialization >>> code and an async exception is impossible at this stage. >>> >>> I think I would rather see CHECK_AND_CLEAR left as-is, and an actual >>> CHECK_AND_CLEAR_NONASYNC introduced for those users of >>> CHECK_AND_CLEAR that can encounter async exceptions and which should >>> not clear them. >>> >>> +?? if >>> (!_pending_exception->is_a(SystemDictionary::ThreadDeath_klass()) && >>> +?????? _pending_exception->klass() != >>> SystemDictionary::InternalError_klass()) { >>> >> Ok >>> Flagging all InternalErrors as async exceptions is probably also not >>> correct. I don't see a good solution to this at the moment. I think >>> we would need to introduce a new subclass of InternalError for the >>> unsafe access error case**. Now it may be that all the other >>> InternalError usages are "impossible" in the context of where the new >>> macros are to be used, but that is very difficult to establish or >>> assert. 
>>> >>> ** Or perhaps we could inject a field that allows the VM to identify >>> instances related to unsafe access errors ... Ideally of course these >>> unsafe access errors would be distinct from the async exception >>> mechanism - something I would still like to pursue. >>> >> Ok >>> --- >>> >>> General comments ... >>> >>> There is a general change from "JavaThread* thread" to "Thread* >>> THREAD" (or TRAPS) to allow the use of the CHECK macros. This is >>> unfortunate because the fact the thread is restricted to being a >>> JavaThread is no longer evident in the method signatures. That is a >>> flaw with the TRAPS/CHECK mechanism unfortunately :( . But as the >>> methods no longer take a JavaThread* arg, they should assert that >>> THREAD->is_Java_thread(). I will also look at an RFE to have >>> as_JavaThread() to avoid the need for separate assertion checks >>> before casting from "Thread*" to "JavaThread*". >>> >> Ok >>> Note there's no need to use CHECK when the enclosing method is going >>> to return immediately after the call that contains the CHECK. It just >>> adds unnecessary checking of the exception state. The use of TRAPS >>> shows that the methods may return with an exception pending. I've >>> flagged all such occurrences I spotted below. >>> >> Ok >>> --- >>> >>> +?? // Only metaspace OOM is expected. no Java code executed. >>> >>> Nit: s/no/No >>> >>> >>> src/hotspot/share/compiler/compilationPolicy.cpp >>> >>> >>> ?410?????? method_invocation_event(method, CHECK_NULL); >>> ?489?????? CompileBroker::compile_method(m, InvocationEntryBci, >>> comp_level, m, hot_count, CompileTask::Reason_InvocationCount, CHECK); >>> >>> Nit: there's no need to use CHECK here. >>> >>> --- >>> >>> src/hotspot/share/compiler/tieredThresholdPolicy.cpp >>> >>> ?504???? method_invocation_event(method, inlinee, comp_level, nm, >>> CHECK_NULL); >>> ?570???????? compile(mh, bci, CompLevel_simple, CHECK); >>> ?581???????? 
compile(mh, bci, CompLevel_simple, CHECK); >>> ?595???? CompileBroker::compile_method(mh, bci, level, mh, hot_count, >>> CompileTask::Reason_Tiered, CHECK); >>> 1062?????? compile(mh, InvocationEntryBci, next_level, CHECK); >>> >>> Nit: there's no need to use CHECK here. >>> >>> 814 void TieredThresholdPolicy::create_mdo(const methodHandle& mh, >>> Thread* THREAD) { >>> >>> Thank you for correcting this misuse of the THREAD name on a >>> JavaThread* type. >>> >>> --- >>> >>> src/hotspot/share/interpreter/linkResolver.cpp >>> >>> ?128?? CompilationPolicy::compile_if_required(selected_method, CHECK); >>> >>> Nit: there's no need to use CHECK here. >>> >>> --- >>> >>> src/hotspot/share/jvmci/compilerRuntime.cpp >>> >>> ?260???? CompilationPolicy::policy()->event(emh, mh, >>> InvocationEntryBci, InvocationEntryBci, CompLevel_aot, cm, CHECK); >>> ?280???? nmethod* osr_nm = CompilationPolicy::policy()->event(emh, >>> mh, branch_bci, target_bci, CompLevel_aot, cm, CHECK); >>> >>> Nit: there's no need to use CHECK here. >>> >>> --- >>> >>> src/hotspot/share/jvmci/jvmciRuntime.cpp >>> >>> ?102???????? // Donot clear probable async exceptions. >>> >>> typo: s/Donot/Do not/ >>> >>> --- >>> >>> src/hotspot/share/runtime/deoptimization.cpp >>> >>> 1686 void Deoptimization::load_class_by_index(const >>> constantPoolHandle& constant_pool, int index) { >>> >>> This method should be declared with TRAPS now. >>> >>> 1693???? // Donot clear probable Async Exceptions. >>> >>> typo: s/Donot/Do not/ >>> >>> >> Ok >>>> testing : mach1-5(links in jbs) >>> >>> There is very little existing testing that will actually test the key >>> changes you have made here. You will need to do direct >>> fault-injection testing anywhere you now allow async exceptions to >>> remain, to see if the calling code can tolerate that. It will be >>> difficult to test thoroughly. >>> >> Ok >>> Thanks again for tackling this difficult problem! 
>> >> Best regards, >> >> Jamsheed >> >>> >>> David >>> ----- >>> >>>> >>>> While working on JDK-8246381 it was noticed that compilation request >>>> path clears all exceptions(including async) and doesn't propagate[1]. >>>> >>>> Fix: patch restores the propagation behavior for the probable async >>>> exceptions. >>>> >>>> Compilation request path propagate exception as in [2]. MDO and >>>> MethodCounter doesn't expect any exception other than metaspace >>>> OOM(added comments). >>>> >>>> Deoptimization path doesn't clear probable async exceptions and take >>>> unpack_exception path for non uncommontraps. >>>> >>>> Added java_lang_InternalError to well known classes. >>>> >>>> Request for review. >>>> >>>> Best Regards, >>>> >>>> Jamsheed >>>> >>>> [1] w.r.t changes done for JDK-7131259 >>>> >>>> [2] >>>> >>>> ???? (a) >>>> ???? -----> c1_Runtime1.cpp/interpreterRuntime.cpp/compilerRuntime.cpp >>>> ?????? | >>>> ??????? ----- compilationPolicy.cpp/tieredThresholdPolicy.cpp >>>> ????????? | >>>> ?????????? ------ compileBroker.cpp >>>> >>>> ???? (b) >>>> ???? Xcomp versions >>>> ???? ------> compilationPolicy.cpp >>>> ??????? | >>>> ???????? ------> compileBroker.cpp >>>> >>>> ???? (c) >>>> >>>> ???? Direct call to? compile_method in compileBroker.cpp >>>> >>>> ???? JVMCI bootstrap, whitebox, replayCompile. 
>>>> >>>> From ningsheng.jian at arm.com Thu Sep 3 02:58:23 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Thu, 3 Sep 2020 10:58:23 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <39965f4d-af53-524c-36db-917509b2198f@arm.com> Message-ID: On 9/2/20 9:58 PM, Andrew Dinn wrote: > Hi Ningsheng, > > On 02/09/2020 07:40, Ningsheng Jian wrote: >> Thanks a lot for the reviews! I think I have addressed the review >> comments from Andrew, Vladimir and Erik. This is the new webrev: >> >> Full: >> http://cr.openjdk.java.net/~njian/8231441/webrev.05/ >> >> Incremental: >> http://cr.openjdk.java.net/~njian/8231441/webrev.05-vs-04/ >> >> Tests: >> Tested with jtreg hotspot_all_no_apps, jdk_core and langtools:tier1 on >> AArch64 systems with and without SVE as well as some x86_64 systems. >> Mach5 submit test also reported passed. >> >> Could you please help to take a look again? OK for jdk/jdk? > That looks good to me. > Thank you Andrew! Regards, Ningsheng From christian.hagedorn at oracle.com Thu Sep 3 07:41:53 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 3 Sep 2020 09:41:53 +0200 Subject: [16] RFR(S): 8250668: Clean up method_oop names in adlc In-Reply-To: References: Message-ID: <4a2f9e44-50ee-eaa4-55cd-4f4db1c3e014@oracle.com> Hi Cesar Looks good to me but I have not tested the additional architectures. Would be good if someone else could verify that. 
A small comment about existing code you touched (can be updated inline, no new webrev required): - s390.ad: A space is missing before the comment on L2466 and 2469 Best regards, Christian On 02.09.20 23:26, Cesar Soares Lucas wrote: > Gentle ping. Can anyone PTAL? > ------------------------------------------------------------------------ > *From:* hotspot-compiler-dev > on behalf of Cesar Soares > Lucas > *Sent:* August 27, 2020 12:36 PM > *To:* hotspot-compiler-dev at openjdk.java.net > > *Cc:* Brian Stafford ; Aditya Mandaleeka > ; Christian Hagedorn > *Subject:* [16] RFR(S): 8250668: Clean up method_oop names in adlc > Hi there, > > RFE: > https://bugs.openjdk.java.net/browse/JDK-8250668 > > Webrev: > https://cr.openjdk.java.net/~adityam/cesar/8250668/0/ > > Need sponsor: Yes > Tested on: Windows/Linux/MacOS tiers 1-3 > > can I please get some reviews for the Webrev linked above? The work > consists of renaming "method_oop" occurrences all around the code > base to just "method". I've tested this on x86_64 only. Can someone > please help testing on other architectures as well: x86_32, PPC, > ARM32/64, S390?
>
>
> Thank you,
> Cesar

From rwestrel at redhat.com Thu Sep 3 08:10:15 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 03 Sep 2020 10:10:15 +0200
Subject: RFR(S): 8252696: Loop unswitching may cause out of bound array load to be executed
Message-ID: <877dtb5li0.fsf@redhat.com>

http://cr.openjdk.java.net/~roland/8252696/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8252696

This came up with testing of 8223051 (long counted loops):

https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039801.html

Here are the steps that lead to the crash:

The graph includes a counted loop. The loop body contains a LoadP for an array access. Loop predication optimizes the array access range check and the LoadP becomes control dependent on a predicate. The loop is unswitched. 8240227 (Loop predicates should be copied to unswitched loops) added logic so predicates are cloned for both loops on unswitching and data nodes are made control dependent on the per branch predicates. That logic only applies to RangeCheck predicates but in this case the predicate is an If node because PhaseIdealLoop::rc_predicate() built it with a CmpUL (to protect from overflow). So after unswitching, the LoadP is still control dependent on the predicate that was added at predication time and dominates both loops and the test that triggered unswitching. A pre/post loop is then created and the main loop is unrolled a few times. The exit condition of the main loop is proven to be always false so the loop nodes of the main loop are optimized out but the main loop body remains. The LoadP nodes (several of them now because of unrolling) can float above the loop unswitching test, which causes an out of bound array load to be performed: the main loop is not executed but a LoadP that's only valid when in the loop body "escapes" the loop body.

I couldn't write a test case that reproduces it with current jdk code but this doesn't seem specific to the long counted loop patch. The root cause is that 8240227 only considers RangeCheck predicates when it should also consider If predicates. The current logic copies all RangeCheck predicates to both unswitched loops and then eliminates the dominating ones that it finds useless. Removing dominated predicates that have no control dependent data nodes the way it's done in the current logic is suspicious: data nodes are sometimes logically dependent on multiple range checks but control dependent on only one. I also believe only skeleton predicates need to be copied (so the skeletons can be expanded when the pre/main/post loops are created and on unrolling) and that's what I propose in the patch. I also removed the assert because AFAIU:

   new_node != NULL && old_node->_idx >= idx_before_clone

is simply an impossible condition: first part is true if old_node was cloned to new_node and second part is true if old_node is a clone.

Roland.

From aph at redhat.com Thu Sep 3 08:34:58 2020
From: aph at redhat.com (Andrew Haley)
Date: Thu, 3 Sep 2020 09:34:58 +0100
Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp)
In-Reply-To:
References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> <0cca5c0c-9240-3a9f-98f0-519384ea69cb@bell-sw.com> <11530b87-8124-19ca-936b-16dec5994411@redhat.com>
Message-ID:

On 02/09/2020 18:40, Vladimir Ivanov wrote:
> C2 doesn't rematerialize FP constants on AArch64

I think that's probably right, even though a few FP constants are cheap. Loads from L1 usually have a 4 (or 5) cycle latency, which can make spilling constants expensive around calls. We could be smarter about this.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From jamsheed.c.m at oracle.com Thu Sep 3 08:43:09 2020
From: jamsheed.c.m at oracle.com (Jamsheed C M)
Date: Thu, 3 Sep 2020 14:13:09 +0530
Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions
In-Reply-To: <1cc00e89-467e-f976-56a7-630f8782d61a@oracle.com>
References: <442caa21-ca0a-f6eb-60a5-1e74bf994894@oracle.com> <03df9364-817d-04d6-6434-80be93a66526@oracle.com> <1cc00e89-467e-f976-56a7-630f8782d61a@oracle.com>
Message-ID:

Hi David,

Thank you for the review and feedback.

Revised webrev here: http://cr.openjdk.java.net/~jcm/8249451/webrev.02/

On 03/09/2020 05:50, David Holmes wrote:
> Hi Jamsheed,
>
> On 1/09/2020 10:36 pm, Jamsheed C M wrote:
>> Hi David,
>>
>> I reworked the patch, revised webrev here:
>> http://cr.openjdk.java.net/~jcm/8249451/webrev.01/
>
> Thanks. The new macros and injected field for InternalError look good.
>
> A couple of minor comments below but overall this looks good to me.
>
>> In addition I moved UnlockFlagSaver fs(this) to more local scope.
>>
>> also removed changes done for JDK-8246727, as it will be separately
>> handled by the bug.
>>
>> Testing: injected and tested async exceptions randomly at compilation
>> request path and deopt path.
>
> I noticed in deoptimization.cpp that here:
>
> 1965       load_class_by_index(constants, unloaded_class_index, THREAD);
>
> we can now return with a pending async exception and it is unclear
> whether the code following this will be able to handle that, or indeed
> whether the caller will be able to handle it. Did you specifically
> test this site?
>
Yes, I browsed through the code path; it is equipped to handle that (be it the C2-only UC or the various deopt variants). The JVMCI AOT code is also equipped to handle it (there is forwarding code present at the foreign call exit).

> ---
>
> src/hotspot/share/jvmci/jvmciRuntime.cpp
>
> The comment at:
>
>   80 //   1.
The pending exception is cleared > > should be updated now that asyncs are not cleared. Done. > > --- > > src/hotspot/share/compiler/tieredThresholdPolicy.* > > The changes from JavaThread* to Thread* look unnecessary for 90% of > the cases, but the overall change seems to be dictated by the few > methods that do use CHECK*. :( No point agonising over this now as I'm > trying to deal with this general problem as a separate RFE - JDK-8252685. > Thank you and Best regards, Jamsheed > Thanks, > David > ----- > >> Best regards, >> >> Jamsheed >> >> On 24/08/2020 11:06, Jamsheed C M wrote: >>> Hi David, >>> >>> Thank you for the review and feedback. Agree on all of them. I will >>> rework and get back. >>> >>> On 10/08/2020 07:33, David Holmes wrote: >>>> Hi Jamsheed, >>>> >>>> On 6/08/2020 10:07 pm, Jamsheed C M wrote: >>>>> Hi all, >>>>> >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249451 >>>>> >>>>> webrev: http://cr.openjdk.java.net/~jcm/8249451/webrev.00/ >>>> >>>> Thanks for tackling this messy issue. Overall I like the use of >>>> TRAPS to more clearly document which methods can return with an >>>> exception pending. I think there are some problems with the >>>> proposed changes. I'll start with those comments and then move on >>>> to more general comments. >>>> >>>> src/hotspot/share/utilities/exceptions.cpp >>>> src/hotspot/share/utilities/exceptions.hpp >>>> >>>> I don't think the changes here are correct or safe in general. >>>> >>>> First, adding the new macro and function to only clear non-async >>>> exceptions is fine itself. 
But naming wise the fact only non-async >>>> exceptions are cleared should be evident, and there is no "check" >>>> involved (in the sense of the existing CHECK_ macros) so I suggest: >>>> >>>> s/CHECK_CLEAR_PENDING_EXCEPTION/CLEAR_PENDING_NONASYNC_EXCEPTIONS/ >>>> s/check_clear_pending_exception/clear_pending_nonasync_exceptions/ >>>> >>> Ok >>>> But changing the existing CHECK_AND_CLEAR macros to now leave async >>>> exceptions pending seems potentially dangerous as calling code may >>>> not be prepared for there to now be a pending exception. For >>>> example the use in thread.cpp: >>>> >>>> ?JDK_Version::set_runtime_name(get_java_runtime_name(THREAD)); >>>> ?JDK_Version::set_runtime_version(get_java_runtime_version(THREAD)); >>>> >>>> get_java_runtime_name() is currently guaranteed to clear all >>>> exceptions, so all the other code is known to be safe to call. But >>>> that would no longer be true. That said, this is VM initialization >>>> code and an async exception is impossible at this stage. >>>> >>>> I think I would rather see CHECK_AND_CLEAR left as-is, and an >>>> actual CHECK_AND_CLEAR_NONASYNC introduced for those users of >>>> CHECK_AND_CLEAR that can encounter async exceptions and which >>>> should not clear them. >>>> >>>> +?? if >>>> (!_pending_exception->is_a(SystemDictionary::ThreadDeath_klass()) && >>>> +?????? _pending_exception->klass() != >>>> SystemDictionary::InternalError_klass()) { >>>> >>> Ok >>>> Flagging all InternalErrors as async exceptions is probably also >>>> not correct. I don't see a good solution to this at the moment. I >>>> think we would need to introduce a new subclass of InternalError >>>> for the unsafe access error case**. Now it may be that all the >>>> other InternalError usages are "impossible" in the context of where >>>> the new macros are to be used, but that is very difficult to >>>> establish or assert. 
>>>> >>>> ** Or perhaps we could inject a field that allows the VM to >>>> identify instances related to unsafe access errors ... Ideally of >>>> course these unsafe access errors would be distinct from the async >>>> exception mechanism - something I would still like to pursue. >>>> >>> Ok >>>> --- >>>> >>>> General comments ... >>>> >>>> There is a general change from "JavaThread* thread" to "Thread* >>>> THREAD" (or TRAPS) to allow the use of the CHECK macros. This is >>>> unfortunate because the fact the thread is restricted to being a >>>> JavaThread is no longer evident in the method signatures. That is a >>>> flaw with the TRAPS/CHECK mechanism unfortunately :( . But as the >>>> methods no longer take a JavaThread* arg, they should assert that >>>> THREAD->is_Java_thread(). I will also look at an RFE to have >>>> as_JavaThread() to avoid the need for separate assertion checks >>>> before casting from "Thread*" to "JavaThread*". >>>> >>> Ok >>>> Note there's no need to use CHECK when the enclosing method is >>>> going to return immediately after the call that contains the CHECK. >>>> It just adds unnecessary checking of the exception state. The use >>>> of TRAPS shows that the methods may return with an exception >>>> pending. I've flagged all such occurrences I spotted below. >>>> >>> Ok >>>> --- >>>> >>>> +?? // Only metaspace OOM is expected. no Java code executed. >>>> >>>> Nit: s/no/No >>>> >>>> >>>> src/hotspot/share/compiler/compilationPolicy.cpp >>>> >>>> >>>> ?410?????? method_invocation_event(method, CHECK_NULL); >>>> ?489?????? CompileBroker::compile_method(m, InvocationEntryBci, >>>> comp_level, m, hot_count, CompileTask::Reason_InvocationCount, CHECK); >>>> >>>> Nit: there's no need to use CHECK here. >>>> >>>> --- >>>> >>>> src/hotspot/share/compiler/tieredThresholdPolicy.cpp >>>> >>>> ?504???? method_invocation_event(method, inlinee, comp_level, nm, >>>> CHECK_NULL); >>>> ?570???????? compile(mh, bci, CompLevel_simple, CHECK); >>>> ?581???????? 
compile(mh, bci, CompLevel_simple, CHECK); >>>> ?595???? CompileBroker::compile_method(mh, bci, level, mh, >>>> hot_count, CompileTask::Reason_Tiered, CHECK); >>>> 1062?????? compile(mh, InvocationEntryBci, next_level, CHECK); >>>> >>>> Nit: there's no need to use CHECK here. >>>> >>>> 814 void TieredThresholdPolicy::create_mdo(const methodHandle& mh, >>>> Thread* THREAD) { >>>> >>>> Thank you for correcting this misuse of the THREAD name on a >>>> JavaThread* type. >>>> >>>> --- >>>> >>>> src/hotspot/share/interpreter/linkResolver.cpp >>>> >>>> ?128 CompilationPolicy::compile_if_required(selected_method, CHECK); >>>> >>>> Nit: there's no need to use CHECK here. >>>> >>>> --- >>>> >>>> src/hotspot/share/jvmci/compilerRuntime.cpp >>>> >>>> ?260???? CompilationPolicy::policy()->event(emh, mh, >>>> InvocationEntryBci, InvocationEntryBci, CompLevel_aot, cm, CHECK); >>>> ?280???? nmethod* osr_nm = CompilationPolicy::policy()->event(emh, >>>> mh, branch_bci, target_bci, CompLevel_aot, cm, CHECK); >>>> >>>> Nit: there's no need to use CHECK here. >>>> >>>> --- >>>> >>>> src/hotspot/share/jvmci/jvmciRuntime.cpp >>>> >>>> ?102???????? // Donot clear probable async exceptions. >>>> >>>> typo: s/Donot/Do not/ >>>> >>>> --- >>>> >>>> src/hotspot/share/runtime/deoptimization.cpp >>>> >>>> 1686 void Deoptimization::load_class_by_index(const >>>> constantPoolHandle& constant_pool, int index) { >>>> >>>> This method should be declared with TRAPS now. >>>> >>>> 1693???? // Donot clear probable Async Exceptions. >>>> >>>> typo: s/Donot/Do not/ >>>> >>>> >>> Ok >>>>> testing : mach1-5(links in jbs) >>>> >>>> There is very little existing testing that will actually test the >>>> key changes you have made here. You will need to do direct >>>> fault-injection testing anywhere you now allow async exceptions to >>>> remain, to see if the calling code can tolerate that. It will be >>>> difficult to test thoroughly. >>>> >>> Ok >>>> Thanks again for tackling this difficult problem! 
>>>
>>> Best regards,
>>>
>>> Jamsheed
>>>
>>>>
>>>> David
>>>> -----
>>>>
>>>>>
>>>>> While working on JDK-8246381 it was noticed that compilation
>>>>> request path clears all exceptions(including async) and doesn't
>>>>> propagate[1].
>>>>>
>>>>> Fix: patch restores the propagation behavior for the probable
>>>>> async exceptions.
>>>>>
>>>>> Compilation request path propagate exception as in [2]. MDO and
>>>>> MethodCounter doesn't expect any exception other than metaspace
>>>>> OOM(added comments).
>>>>>
>>>>> Deoptimization path doesn't clear probable async exceptions and
>>>>> take unpack_exception path for non uncommontraps.
>>>>>
>>>>> Added java_lang_InternalError to well known classes.
>>>>>
>>>>> Request for review.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Jamsheed
>>>>>
>>>>> [1] w.r.t changes done for JDK-7131259
>>>>>
>>>>> [2]
>>>>>
>>>>>      (a)
>>>>>      ----->
>>>>> c1_Runtime1.cpp/interpreterRuntime.cpp/compilerRuntime.cpp
>>>>>        |
>>>>>         ----- compilationPolicy.cpp/tieredThresholdPolicy.cpp
>>>>>           |
>>>>>            ------ compileBroker.cpp
>>>>>
>>>>>      (b)
>>>>>      Xcomp versions
>>>>>      ------> compilationPolicy.cpp
>>>>>         |
>>>>>          ------> compileBroker.cpp
>>>>>
>>>>>      (c)
>>>>>
>>>>>      Direct call to compile_method in compileBroker.cpp
>>>>>
>>>>>      JVMCI bootstrap, whitebox, replayCompile.
>>>>>
>>>>>

From xxinliu at amazon.com Thu Sep 3 08:56:45 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Thu, 3 Sep 2020 08:56:45 +0000
Subject: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes
In-Reply-To: <3c989485-754f-b7f5-e91f-c7c0adfdaf88@oracle.com>
References: , <3c989485-754f-b7f5-e91f-c7c0adfdaf88@oracle.com>
Message-ID: <1599123405363.99306@amazon.com>

Hi Nhat and Christian,

I reviewed this patch. Resuming the old worklist looks suspicious to me. PhaseRenumberLive renumbers the nodes associated with gvn, so the old _worklist might have wrong idx and types.
Instead of resuming the old _worklist, you can copy the nodes out of new_worklist into your current _worklist.

diff --git a/src/hotspot/share/opto/compile.cpp b/src/hotspot/share/opto/compile.cpp
--- a/src/hotspot/share/opto/compile.cpp
+++ b/src/hotspot/share/opto/compile.cpp
@@ -2089,7 +2089,10 @@
       ResourceMark rm;
       PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist);
     }
-    set_for_igvn(&new_worklist);
+    for_igvn()->clear();
+    while (new_worklist.size() > 0) {
+      for_igvn()->push(new_worklist.rpop());
+    }
     igvn = PhaseIterGVN(initial_gvn());
     igvn.optimize();
   }

thanks,
--lx

________________________________________
From: hotspot-compiler-dev on behalf of Christian Hagedorn
Sent: Thursday, August 27, 2020 7:54 AM
To: Nhat Nguyen; hotspot-compiler-dev at openjdk.java.net
Subject: RE: [EXTERNAL] RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes

Hi Nhat

Looks good to me!

Just make sure that next time you assign the bug to yourself or a sponsor and/or leave a comment that you intend to work on it, to avoid the possibility of some duplicated work (was no problem in this case) ;-)

Best regards,
Christian

On 26.08.20 20:55, Nhat Nguyen wrote:
> Hi hotspot-compiler-dev,
>
> Please review the following patch to address https://bugs.openjdk.java.net/browse/JDK-8251271
> The bug is currently assigned to Christian Hagedorn, but he was supportive of me submitting the patch instead.
> I have run hotspot/tier1 and jdk/tier1 tests to make sure that the change is working as intended.
> > webrev: http://cr.openjdk.java.net/~burban/nhat/JDK-8251271/webrev.00/ > > Thank you, > Nhat > From goetz.lindenmaier at sap.com Thu Sep 3 08:59:09 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 3 Sep 2020 08:59:09 +0000 Subject: [16] RFR(S): 8250668: Clean up method_oop names in adlc In-Reply-To: <4a2f9e44-50ee-eaa4-55cd-4f4db1c3e014@oracle.com> References: <4a2f9e44-50ee-eaa4-55cd-4f4db1c3e014@oracle.com> Message-ID: Hi, I have put it into our testing queue. I'll see potential problems tomorrow. This includes ppc, s390 and x86_32 but no arm platforms. Best regards, Goetz > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Christian Hagedorn > Sent: Thursday, September 3, 2020 9:42 AM > To: Cesar Soares Lucas ; hotspot-compiler- > dev at openjdk.java.net > Cc: Brian Stafford ; Aditya Mandaleeka > > Subject: Re: [16] RFR(S): 8250668: Clean up method_oop names in adlc > > Hi Cesar > > Looks good to me but I have not tested the additional architectures. > Would be good if someone else could verify that. > > A small comment about existing code you touched (can be updated inline, > no new webrev required): > - s390.ad: A space is missing before the comment on L2466 and 2469 > > Best regards, > Christian > > > On 02.09.20 23:26, Cesar Soares Lucas wrote: > > Gentle ping. Can anyone PTAL? 
> > ------------------------------------------------------------------------
> > *From:* hotspot-compiler-dev on behalf of Cesar Soares Lucas
> > *Sent:* August 27, 2020 12:36 PM
> > *To:* hotspot-compiler-dev at openjdk.java.net
> > *Cc:* Brian Stafford; Aditya Mandaleeka; Christian Hagedorn
> > *Subject:* [16] RFR(S): 8250668: Clean up method_oop names in adlc
> > Hi there,
> >
> > RFE: https://bugs.openjdk.java.net/browse/JDK-8250668
> >
> > Webrev: https://cr.openjdk.java.net/~adityam/cesar/8250668/0/
> >
> > Need sponsor: Yes
> > Tested on: Windows/Linux/MacOS tiers 1-3
> >
> > Can I please get some reviews for the Webrev linked above? The work
> > consists of renaming "method_oop" occurrences all around the code
> > base to just "method". I've tested this on x86_64 only. Can someone
> > please help testing on other architectures as well: x86_32, PPC,
> > ARM32/64, S390?
> >
> >
> > Thank you,
> > Cesar

From jamsheed.c.m at oracle.com Thu Sep 3 09:09:12 2020
From: jamsheed.c.m at oracle.com (Jamsheed C M)
Date: Thu, 3 Sep 2020 14:39:12 +0530
Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions
In-Reply-To:
References: <442caa21-ca0a-f6eb-60a5-1e74bf994894@oracle.com> <03df9364-817d-04d6-6434-80be93a66526@oracle.com> <1cc00e89-467e-f976-56a7-630f8782d61a@oracle.com>
Message-ID: <082708b2-7a8a-22e7-d35b-99dbb96d6653@oracle.com>

Some edits

On 03/09/2020 14:13, Jamsheed C M wrote:
> Hi David,
>
> Thank you for the review and feedback.
> > Revised webrev here: http://cr.openjdk.java.net/~jcm/8249451/webrev.02/ > > On 03/09/2020 05:50, David Holmes wrote: >> Hi Jamsheed, >> >> On 1/09/2020 10:36 pm, Jamsheed C M wrote: >>> Hi David, >>> >>> I reworked the patch, revised webrev here: >>> http://cr.openjdk.java.net/~jcm/8249451/webrev.01/ >> >> Thanks. The new macros and injected field for InternalError look good. >> >> A couple of minor comments below but overall this looks good to me. >> >>> In addition I moved UnlockFlagSaver fs(this) to more local scope. >>> >>> also removed changes done for JDK-8246727, as it will be separately >>> handled by the bug. >>> >>> Testing: injected and tested async exceptions randomly at >>> compilation request path and deopt path. >> >> I noticed in deoptimization.cpp that here: >> >> 1965?????? load_class_by_index(constants, unloaded_class_index, THREAD); >> >> we can now return with a pending async exception and it is unclear >> whether the code following this will be able to handle that, or >> indeed whether the caller will be able to handle it. Did you >> specifically test this site? >> > Yes, I browsed through the code path, it is equipped(let it be c2 only > UC, or various deopt variant). Below comment is for aot compilation request path. sorry for mixing up > JVMCI aot code too is equipped to handle it( there are forwarding code > present at foreign call exit) > Best regards, Jamsheed >> --- >> >> src/hotspot/share/jvmci/jvmciRuntime.cpp >> >> The comment at: >> >> ? 80 //?? 1. The pending exception is cleared >> >> should be updated now that asyncs are not cleared. > Done. >> >> --- >> >> src/hotspot/share/compiler/tieredThresholdPolicy.* >> >> The changes from JavaThread* to Thread* look unnecessary for 90% of >> the cases, but the overall change seems to be dictated by the few >> methods that do use CHECK*. :( No point agonising over this now as >> I'm trying to deal with this general problem as a separate RFE - >> JDK-8252685. 
>> > Thank you and Best regards, > > Jamsheed > >> Thanks, >> David >> ----- >> >>> Best regards, >>> >>> Jamsheed >>> >>> On 24/08/2020 11:06, Jamsheed C M wrote: >>>> Hi David, >>>> >>>> Thank you for the review and feedback. Agree on all of them. I will >>>> rework and get back. >>>> >>>> On 10/08/2020 07:33, David Holmes wrote: >>>>> Hi Jamsheed, >>>>> >>>>> On 6/08/2020 10:07 pm, Jamsheed C M wrote: >>>>>> Hi all, >>>>>> >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249451 >>>>>> >>>>>> webrev: http://cr.openjdk.java.net/~jcm/8249451/webrev.00/ >>>>> >>>>> Thanks for tackling this messy issue. Overall I like the use of >>>>> TRAPS to more clearly document which methods can return with an >>>>> exception pending. I think there are some problems with the >>>>> proposed changes. I'll start with those comments and then move on >>>>> to more general comments. >>>>> >>>>> src/hotspot/share/utilities/exceptions.cpp >>>>> src/hotspot/share/utilities/exceptions.hpp >>>>> >>>>> I don't think the changes here are correct or safe in general. >>>>> >>>>> First, adding the new macro and function to only clear non-async >>>>> exceptions is fine itself. But naming wise the fact only non-async >>>>> exceptions are cleared should be evident, and there is no "check" >>>>> involved (in the sense of the existing CHECK_ macros) so I suggest: >>>>> >>>>> s/CHECK_CLEAR_PENDING_EXCEPTION/CLEAR_PENDING_NONASYNC_EXCEPTIONS/ >>>>> s/check_clear_pending_exception/clear_pending_nonasync_exceptions/ >>>>> >>>> Ok >>>>> But changing the existing CHECK_AND_CLEAR macros to now leave >>>>> async exceptions pending seems potentially dangerous as calling >>>>> code may not be prepared for there to now be a pending exception. 
>>>>> For example the use in thread.cpp: >>>>> >>>>> ?JDK_Version::set_runtime_name(get_java_runtime_name(THREAD)); >>>>> ?JDK_Version::set_runtime_version(get_java_runtime_version(THREAD)); >>>>> >>>>> get_java_runtime_name() is currently guaranteed to clear all >>>>> exceptions, so all the other code is known to be safe to call. But >>>>> that would no longer be true. That said, this is VM initialization >>>>> code and an async exception is impossible at this stage. >>>>> >>>>> I think I would rather see CHECK_AND_CLEAR left as-is, and an >>>>> actual CHECK_AND_CLEAR_NONASYNC introduced for those users of >>>>> CHECK_AND_CLEAR that can encounter async exceptions and which >>>>> should not clear them. >>>>> >>>>> +?? if >>>>> (!_pending_exception->is_a(SystemDictionary::ThreadDeath_klass()) && >>>>> +?????? _pending_exception->klass() != >>>>> SystemDictionary::InternalError_klass()) { >>>>> >>>> Ok >>>>> Flagging all InternalErrors as async exceptions is probably also >>>>> not correct. I don't see a good solution to this at the moment. I >>>>> think we would need to introduce a new subclass of InternalError >>>>> for the unsafe access error case**. Now it may be that all the >>>>> other InternalError usages are "impossible" in the context of >>>>> where the new macros are to be used, but that is very difficult to >>>>> establish or assert. >>>>> >>>>> ** Or perhaps we could inject a field that allows the VM to >>>>> identify instances related to unsafe access errors ... Ideally of >>>>> course these unsafe access errors would be distinct from the async >>>>> exception mechanism - something I would still like to pursue. >>>>> >>>> Ok >>>>> --- >>>>> >>>>> General comments ... >>>>> >>>>> There is a general change from "JavaThread* thread" to "Thread* >>>>> THREAD" (or TRAPS) to allow the use of the CHECK macros. This is >>>>> unfortunate because the fact the thread is restricted to being a >>>>> JavaThread is no longer evident in the method signatures. 
That is >>>>> a flaw with the TRAPS/CHECK mechanism unfortunately :( . But as >>>>> the methods no longer take a JavaThread* arg, they should assert >>>>> that THREAD->is_Java_thread(). I will also look at an RFE to have >>>>> as_JavaThread() to avoid the need for separate assertion checks >>>>> before casting from "Thread*" to "JavaThread*". >>>>> >>>> Ok >>>>> Note there's no need to use CHECK when the enclosing method is >>>>> going to return immediately after the call that contains the >>>>> CHECK. It just adds unnecessary checking of the exception state. >>>>> The use of TRAPS shows that the methods may return with an >>>>> exception pending. I've flagged all such occurrences I spotted below. >>>>> >>>> Ok >>>>> --- >>>>> >>>>> +?? // Only metaspace OOM is expected. no Java code executed. >>>>> >>>>> Nit: s/no/No >>>>> >>>>> >>>>> src/hotspot/share/compiler/compilationPolicy.cpp >>>>> >>>>> >>>>> ?410?????? method_invocation_event(method, CHECK_NULL); >>>>> ?489?????? CompileBroker::compile_method(m, InvocationEntryBci, >>>>> comp_level, m, hot_count, CompileTask::Reason_InvocationCount, >>>>> CHECK); >>>>> >>>>> Nit: there's no need to use CHECK here. >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/compiler/tieredThresholdPolicy.cpp >>>>> >>>>> ?504???? method_invocation_event(method, inlinee, comp_level, nm, >>>>> CHECK_NULL); >>>>> ?570???????? compile(mh, bci, CompLevel_simple, CHECK); >>>>> ?581???????? compile(mh, bci, CompLevel_simple, CHECK); >>>>> ?595???? CompileBroker::compile_method(mh, bci, level, mh, >>>>> hot_count, CompileTask::Reason_Tiered, CHECK); >>>>> 1062?????? compile(mh, InvocationEntryBci, next_level, CHECK); >>>>> >>>>> Nit: there's no need to use CHECK here. >>>>> >>>>> 814 void TieredThresholdPolicy::create_mdo(const methodHandle& mh, >>>>> Thread* THREAD) { >>>>> >>>>> Thank you for correcting this misuse of the THREAD name on a >>>>> JavaThread* type. 
>>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/interpreter/linkResolver.cpp >>>>> >>>>> ?128 CompilationPolicy::compile_if_required(selected_method, CHECK); >>>>> >>>>> Nit: there's no need to use CHECK here. >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/jvmci/compilerRuntime.cpp >>>>> >>>>> ?260???? CompilationPolicy::policy()->event(emh, mh, >>>>> InvocationEntryBci, InvocationEntryBci, CompLevel_aot, cm, CHECK); >>>>> ?280???? nmethod* osr_nm = CompilationPolicy::policy()->event(emh, >>>>> mh, branch_bci, target_bci, CompLevel_aot, cm, CHECK); >>>>> >>>>> Nit: there's no need to use CHECK here. >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/jvmci/jvmciRuntime.cpp >>>>> >>>>> ?102???????? // Donot clear probable async exceptions. >>>>> >>>>> typo: s/Donot/Do not/ >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/runtime/deoptimization.cpp >>>>> >>>>> 1686 void Deoptimization::load_class_by_index(const >>>>> constantPoolHandle& constant_pool, int index) { >>>>> >>>>> This method should be declared with TRAPS now. >>>>> >>>>> 1693???? // Donot clear probable Async Exceptions. >>>>> >>>>> typo: s/Donot/Do not/ >>>>> >>>>> >>>> Ok >>>>>> testing : mach1-5(links in jbs) >>>>> >>>>> There is very little existing testing that will actually test the >>>>> key changes you have made here. You will need to do direct >>>>> fault-injection testing anywhere you now allow async exceptions to >>>>> remain, to see if the calling code can tolerate that. It will be >>>>> difficult to test thoroughly. >>>>> >>>> Ok >>>>> Thanks again for tackling this difficult problem! >>>> >>>> Best regards, >>>> >>>> Jamsheed >>>> >>>>> >>>>> David >>>>> ----- >>>>> >>>>>> >>>>>> While working on JDK-8246381 it was noticed that compilation >>>>>> request path clears all exceptions(including async) and doesn't >>>>>> propagate[1]. >>>>>> >>>>>> Fix: patch restores the propagation behavior for the probable >>>>>> async exceptions. >>>>>> >>>>>> Compilation request path propagate exception as in [2]. 
MDO and >>>>>> MethodCounter doesn't expect any exception other than metaspace >>>>>> OOM(added comments). >>>>>> >>>>>> Deoptimization path doesn't clear probable async exceptions and >>>>>> take unpack_exception path for non uncommontraps. >>>>>> >>>>>> Added java_lang_InternalError to well known classes. >>>>>> >>>>>> Request for review. >>>>>> >>>>>> Best Regards, >>>>>> >>>>>> Jamsheed >>>>>> >>>>>> [1] w.r.t changes done for JDK-7131259 >>>>>> >>>>>> [2] >>>>>> >>>>>> ???? (a) >>>>>> ???? -----> >>>>>> c1_Runtime1.cpp/interpreterRuntime.cpp/compilerRuntime.cpp >>>>>> ?????? | >>>>>> ??????? ----- compilationPolicy.cpp/tieredThresholdPolicy.cpp >>>>>> ????????? | >>>>>> ?????????? ------ compileBroker.cpp >>>>>> >>>>>> ???? (b) >>>>>> ???? Xcomp versions >>>>>> ???? ------> compilationPolicy.cpp >>>>>> ??????? | >>>>>> ???????? ------> compileBroker.cpp >>>>>> >>>>>> ???? (c) >>>>>> >>>>>> ???? Direct call to? compile_method in compileBroker.cpp >>>>>> >>>>>> ???? JVMCI bootstrap, whitebox, replayCompile. >>>>>> >>>>>> From christian.hagedorn at oracle.com Thu Sep 3 09:31:53 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 3 Sep 2020 11:31:53 +0200 Subject: RFR(S): 8252696: Loop unswitching may cause out of bound array load to be executed In-Reply-To: <877dtb5li0.fsf@redhat.com> References: <877dtb5li0.fsf@redhat.com> Message-ID: <56f13a1d-bf0a-1990-efbb-ceb34ca9bd38@oracle.com> Hi Roland Nice analysis! This fix sounds reasonable to me. Have I understood that correctly that in your testcase, the main loop is just unrolled enough such that the loop nodes can be removed (so no over-unrolling)? > [..] and then eliminates the > dominating ones that it finds useless. Removing dominated predicates > that have no control dependent data nodes the way it's done in the > current logic is suspicious: data nodes are sometimes logically > dependent on multiple range checks but control dependent on only one. That's a valid point. 
Is there a better way we could find out which predicates are useless after cloning them to the slow and fast loop or is it either way not a problem to keep the original ones alive? > new_node != NULL && old_node->_idx >= idx_before_clone > > is simply an impossible condition: first part is true if old_node was > cloned to new_node and second part is true if old_node is a clone. I agree, I think I added this back there because I was temporarily modifying the old_new mapping in a non-conform way on L302. So, I just wanted to be sure that everything is reset and works as intended. But I'm also fine with just removing this assertion code. Some small comment: - You probably don't need the assertion on L257 in loopPredicate.cpp as you are already checking it in the if-condition on L253. - Same file, you should update the assert message on L268 to something like "... projection of an If node". Best regards, Christian On 03.09.20 10:10, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8252696/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8252696 > > This came up with testing of 8223051 (long counted loops): > > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039801.html > > Here are the steps that lead to the crash: > > The graph includes a counted loop. The loop body contains a LoadP for an > array access. Loop predication optimizes the array access range check > and the LoadP becomes control dependent on a predicate. The loop is > unswitched. 8240227 (Loop predicates should be copied to unswitched > loops) added logic so predicates are cloned for both loops on > unswitching and data nodes are made control dependent on the per branch > predicates. That logic only applies to RangeCheck predicates but in this > case the predicate is an If node because PhaseIdealLoop::rc_predicate() > built it with a CmpUL (to protect from overflow). 
So after unswitching, > the LoadP is still control dependent on the predicate that was added at > predication time and dominates both loops and the test that triggered > unswitching. A pre/post loop is then created and the main loop is > unrolled a few times. The exit condition of the main loop is proven to > be always false so the loop nodes of the main loop are optimized out but > the main loop body remains. The LoadP nodes (several of them now because > of unrolling) can float above the loop unswitching test which causes an > out of bound array load to be performed: the main loop is not executed > but a LoadP that's only valid when in the loop body "escapes" the loop > body. > > I couldn't write a test case that reproduces it with current jdk code > but this doesn't seem specific to the long counted loop patch. The root > cause is that 8240227 only considers RangeCheck predicates when it > should consider also If predicates. The current logic copies all > RangeCheck predicates to both unswitched loops and then eliminates the > dominating ones that it finds useless. Removing dominated predicates > that have no control dependent data nodes the way it's done in the > current logic is suspicious: data nodes are sometimes logically > dependent on multiple range checks but control dependent on only one. I > also believe only skeleton predicates need to be copied (so the > skeletons can be expanded when the pre/main/post loops are created and > on unrolling) and that's what I propose in the patch. I also removed the > assert because AFAIU: > > new_node != NULL && old_node->_idx >= idx_before_clone > > is simply an impossible condition: first part is true if old_node was > cloned to new_node and second part is true if old_node is a clone. > > Roland.
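
The argument about the removed assert can be checked on a toy model. The sketch below is only an illustration under stated assumptions, not HotSpot code: ToyNode, old_new and count_impossible_hits are hypothetical names, and the model assumes, as the explanation above does, that only original nodes get an entry in the old-to-new mapping and that every clone receives an index at or above idx_before_clone. It needs C++14 for std::make_unique.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for a C2 node: only the unique index matters here.
struct ToyNode {
    std::size_t idx;
};

// Builds `originals` nodes, clones each one once, and counts the nodes for
// which the asserted condition
//     new_node != NULL && old_node->_idx >= idx_before_clone
// would hold. All names are hypothetical; this is not HotSpot code.
std::size_t count_impossible_hits(std::size_t originals) {
    std::vector<std::unique_ptr<ToyNode>> graph;
    for (std::size_t i = 0; i < originals; ++i) {
        graph.push_back(std::make_unique<ToyNode>(ToyNode{i}));
    }

    // Any node created from here on is a clone with idx >= idx_before_clone.
    const std::size_t idx_before_clone = graph.size();

    // old_new[old->idx] is the clone of `old`; only originals get an entry.
    std::vector<ToyNode*> old_new(2 * originals, nullptr);
    for (std::size_t i = 0; i < idx_before_clone; ++i) {
        graph.push_back(std::make_unique<ToyNode>(ToyNode{graph.size()}));
        old_new[i] = graph.back().get();
    }

    std::size_t hits = 0;
    for (const auto& old_node : graph) {
        const ToyNode* new_node = old_new[old_node->idx];
        const bool was_cloned = (new_node != nullptr);                // first part
        const bool is_a_clone = (old_node->idx >= idx_before_clone);  // second part
        if (was_cloned && is_a_clone) {
            ++hits;
        }
    }
    return hits;
}
```

Under those assumptions the two halves of the condition are mutually exclusive, so the function returns zero for any number of original nodes. If clones were themselves cloned again, the model (and the argument) would no longer apply as written.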
> From christian.hagedorn at oracle.com Thu Sep 3 09:50:52 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 3 Sep 2020 11:50:52 +0200 Subject: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes In-Reply-To: <1599123405363.99306@amazon.com> References: <3c989485-754f-b7f5-e91f-c7c0adfdaf88@oracle.com> <1599123405363.99306@amazon.com> Message-ID: Hi Xin I'm not sure if we really want to copy all the nodes from new_worklist back to for_igvn() when it's not used anymore afterwards. I thought that this RFE just intends to keep a valid pointer in Compile::_for_igvn. Nevertheless, I also took a closer look at that code and it seems that for_igvn() is not even required for PhaseRenumberLive? It's cleared just before the phase on L2086 and then PhaseRenumberLive does not seem to add anything to it and neither does PhaseRemoveUseless. So we are basically restoring an empty list afterwards. However, this could be cleaned up in an additional RFE. Best regards, Christian On 03.09.20 10:56, Liu, Xin wrote: > hi, Nhat and Christian, > > I reviewed this patch. I feel resuming the old worklist looks suspicious. > PhaseRenumberLive renumbers nodes associated with gvn. old _worklist might have wrong idx and types. > > Instead of resuming old _worklist, you can copy out nodes from new_worklist to your current _worklist.
>
> diff --git a/src/hotspot/share/opto/compile.cpp b/src/hotspot/share/opto/compile.cpp
> --- a/src/hotspot/share/opto/compile.cpp
> +++ b/src/hotspot/share/opto/compile.cpp
> @@ -2089,7 +2089,10 @@
>        ResourceMark rm;
>        PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist);
>      }
> -    set_for_igvn(&new_worklist);
> +    for_igvn()->clear();
> +    while (new_worklist.size() > 0) {
> +      for_igvn()->push(new_worklist.rpop());
> +    }
>      igvn = PhaseIterGVN(initial_gvn());
>      igvn.optimize();
>    }
>
>
> thanks,
> --lx
>
> ________________________________________
> From: hotspot-compiler-dev on behalf of Christian Hagedorn
> Sent: Thursday, August 27, 2020 7:54 AM
> To: Nhat Nguyen; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: [EXTERNAL] RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes
>
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> Hi Nhat
>
> Looks good to me!
>
> Just make sure that next time you assign the bug to you or a sponsor
> and/or leave a comment that you intend to work on it to avoid the
> possibility of some duplicated work (was no problem in this case) ;-)
>
> Best regards,
> Christian
>
> On 26.08.20 20:55, Nhat Nguyen wrote:
>> Hi hotspot-compiler-dev,
>>
>> Please review the following patch to address https://bugs.openjdk.java.net/browse/JDK-8251271
>> The bug is currently assigned to Christian Hagedorn, but he was supportive of me submitting the patch instead.
>> I have run hotspot/tier1 and jdk/tier1 tests to make sure that the change is working as intended.
>> >> webrev: http://cr.openjdk.java.net/~burban/nhat/JDK-8251271/webrev.00/ >> >> Thank you, >> Nhat >> From jamsheed.c.m at oracle.com Thu Sep 3 10:08:56 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 3 Sep 2020 15:38:56 +0530 Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions In-Reply-To: <1cc00e89-467e-f976-56a7-630f8782d61a@oracle.com> References: <442caa21-ca0a-f6eb-60a5-1e74bf994894@oracle.com> <03df9364-817d-04d6-6434-80be93a66526@oracle.com> <1cc00e89-467e-f976-56a7-630f8782d61a@oracle.com> Message-ID: Hi David, On 03/09/2020 05:50, David Holmes wrote: > > we can now return with a pending async exception and it is unclear > whether the code following this will be able to handle that, or indeed > whether the caller will be able to handle it. Did you specifically > test this site? Yes i specifically tested this site for C2 UC trap case. It works fine. Best regards, Jamsheed From vladimir.x.ivanov at oracle.com Thu Sep 3 10:20:04 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 3 Sep 2020 13:20:04 +0300 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> <9a5ab727-7d9e-a179-e46c-0916aa10ff12@oracle.com> Message-ID: <7296fa39-4f3f-b6e5-d0c8-cfc81c5c550c@oracle.com> > I have also tried forcing String and other classes referenced in the > generated code to load into each OneShotClassLoader, but the problem > remains. After taking a closer look at your patch, I noticed 2 important aspects needed to make it work: (1) Class loading request should go through the JVM, so its result is persisted in SystemDictionary. It means that ClassLoader::loadClass() is not enough, Class::forName() should be used instead. (2) Class loading request should be performed in the context of the script. 
Since JRuby classes have protection domain set for them, there's additional check required for them as part of class loading request. It is implemented on Java side by ClassLoader::checkPackageAccess, but since JIT-compilers aren't allowed to call into Java, they rely on the cache which keeps all protection domains for which the check succeeded. So, there should be a class loading request performed which goes through the JVM with the very same protection domain instance. So, triggering class loading from static initializer of the script class should do the job. But, while looking through Class::forName() implementation, I noticed that it optionally fills caller info depending on whether SecurityManager is set, but JVM always checks protection domain. So, the only reliable option left to satisfy both #1 and #2 is to trigger loading on bytecode level (for example, by using ldc + CP_Class). Best regards, Vladimir Ivanov > Disabling tiered compilation again fixes it, but I suspect that's just > a lucky side effect for this case. > > So I am still without any leads on fixes or workarounds. > > - Charlie > From dmitry.chuyko at bell-sw.com Thu Sep 3 12:03:48 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Thu, 3 Sep 2020 15:03:48 +0300 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> <0cca5c0c-9240-3a9f-98f0-519384ea69cb@bell-sw.com> <11530b87-8124-19ca-936b-16dec5994411@redhat.com> Message-ID: <8cb53861-1c52-2fcb-3e7e-558aa48373c8@bell-sw.com> I pushed the change for signum, and opened an RFE [1] for fp re-materialization on aarch64. We've made some experiments already. On N1 simply enabling it makes workloads like Lucene geo search ~0.7% faster, will get back with more data. 
I haven't made RFE for "Constants as TEMPs" but will be glad to participate. -Dmitry [1] https://bugs.openjdk.java.net/browse/JDK-8252760 On 9/3/20 11:34 AM, Andrew Haley wrote: > On 02/09/2020 18:40, Vladimir Ivanov wrote: >> C2 doesn't rematerialize FP constants on AArch64 > I think that's probably right, even though a few FP constants are > cheap. Loads from L1 usually have a 4 (or 5) cycle latency, which can > make spilling constants expensive around calls. We could be smarter > about this. > From tobias.hartmann at oracle.com Thu Sep 3 12:06:45 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 3 Sep 2020 14:06:45 +0200 Subject: RFR(T) : 8252720 : clean up FileInstaller $test.src $cwd in vmTestbase/vm/compiler/optimizations tests In-Reply-To: <1E77E397-571E-48FF-BC5C-B88663C02A55@oracle.com> References: <1E77E397-571E-48FF-BC5C-B88663C02A55@oracle.com> Message-ID: <575176aa-79d5-96bb-e12d-696681e08097@oracle.com> Hi Igor, looks good and trivial. Best regards, Tobias On 02.09.20 22:18, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8252720/webrev.00 >> 8 lines changed: 0 ins; 4 del; 4 mod; > > > Hi all, > > could you please review this trivial patch? > from JBS: >> somehow I missed vmTestbase/vm/compiler/optimizations when was working on JDK-8251127. this sub-task is to remove FileInstaller actions from vm/compiler/optimizations tests. 
> > testing: vmTestbase/vm/compiler/optimizations tests > JBS: https://bugs.openjdk.java.net/browse/JDK-8252720 > webrev: http://cr.openjdk.java.net/~iignatyev//8252720/webrev.00 > > > Thanks, > -- Igor > From tobias.hartmann at oracle.com Thu Sep 3 12:08:34 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 3 Sep 2020 14:08:34 +0200 Subject: RFR(T) : 8251997 : remove usage of PropertyResolvingWrapper in vmTestbase/vm/mlvm/ In-Reply-To: <1D2DD286-0830-468A-8B7D-6B4928F43DB9@oracle.com> References: <1D2DD286-0830-468A-8B7D-6B4928F43DB9@oracle.com> Message-ID: Hi Igor, looks good and trivial to me. Best regards, Tobias On 03.09.20 00:18, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8251997/webrev.00/ >> 27 lines changed: 5 ins; 12 del; 10 mod; > > > Hi all, > > could you please review this small and trivial patch which removes usage of PropertyResolvingWrapper from vm/mlvm tests and reenables "smart action arguments"? > a bit of background from main bug (8219140): >> CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. 
> > testing: vmTestbase/vm/mlvm/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8251997 > webrev: http://cr.openjdk.java.net/~iignatyev//8251997/webrev.00/ > > Thanks, > -- Igor > From aph at redhat.com Thu Sep 3 12:11:32 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 3 Sep 2020 13:11:32 +0100 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <8cb53861-1c52-2fcb-3e7e-558aa48373c8@bell-sw.com> References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> <0cca5c0c-9240-3a9f-98f0-519384ea69cb@bell-sw.com> <11530b87-8124-19ca-936b-16dec5994411@redhat.com> <8cb53861-1c52-2fcb-3e7e-558aa48373c8@bell-sw.com> Message-ID: <1641b820-529f-ee61-0871-fe12f41a8cbc@redhat.com> On 03/09/2020 13:03, Dmitry Chuyko wrote: > I pushed the change for signum, and opened an RFE [1] for fp > re-materialization on aarch64. We've made some experiments already. On > N1 simply enabling it makes workloads like Lucene geo search ~0.7% > faster, will get back with more data. Even though we have to push FP constants into the constant pool? Interesting. I guess there's less spilling gets done. It'd be interesting to see the difference in the code. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Thu Sep 3 12:18:24 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 3 Sep 2020 14:18:24 +0200 Subject: RFR(S): 8250635: MethodArityHistogram should use Compile_lock in favour of fancy checks In-Reply-To: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com> References: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com> Message-ID: Hi Lutz, looks good to me. Best regards, Tobias On 26.08.20 17:18, Schmidt, Lutz wrote: > Dear all, > > may I please request reviews for this small enhancement? 
Instead of calling a method doing complicated and fancy (hard to understand) checks, the iteration over all nmethods is now protected by holding the Compile_lock in addition to the CodeCache_lock. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8250635 > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8250635.00/ > > Thank you! > Lutz > > From tobias.hartmann at oracle.com Thu Sep 3 12:30:11 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 3 Sep 2020 14:30:11 +0200 Subject: RFR: 8251464: make Node::dump(int depth) support indent In-Reply-To: <1598731717217.87517@amazon.com> References: <1598731717217.87517@amazon.com> Message-ID: <44f14e58-06d8-b1d9-baa8-88edfed6dd78@oracle.com> Hi Xin, I'm concerned that the output quickly becomes unreadable when dumping a large graph. For example, isn't -XX:+PrintIdeal affected by this as well? Also, looking at the example output you've posted [1], shouldn't the node id be indented as well? This might be helpful when dumping small parts of the graph but then it should be optional (i.e. can be turned on via a flag/argument if needed). Best regards, Tobias [1] https://bugs.openjdk.java.net/secure/attachment/89800/dump2.txt On 29.08.20 22:08, Liu, Xin wrote: > hi, Reviewers, > > > Could you review this patch? > > JBS:https://bugs.openjdk.java.net/browse/JDK-8251464 > > Webrev: > > http://cr.openjdk.java.net/~xliu/8251464/00/webrev/ > > > This patch attempts to improve the formatting of nodes when developers try to dump an ideal graph or snippet of a graph. In practice, I found it's pretty handy if Node::dump(int d) can support indent. > > The basic idea is to support indentation for the utility function: > > collect_nodes_i(GrowableArray* queue, const Node* start, int direction, uint depth, bool include_start, bool only_ctrl, bool only_data) > > It only affects Node::dump family and -XX:+PrintIdeal. It won't impact the output for igv. > This can help developers who try to inspect a cluster of nodes in gdb.
> > Another change is naming. collect_nodes_i uses breadth-first search. The container is used in a FIFO way instead of FILO. > I think the name "queue" serves better. > > TEST: > hotspot:tier1 and gtest. > mach-5 > > thanks, > --lx > > From tobias.hartmann at oracle.com Thu Sep 3 13:29:51 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 3 Sep 2020 15:29:51 +0200 Subject: [16] RFR(S): 8250668: Clean up method_oop names in adlc In-Reply-To: References: Message-ID: <78ea86f9-a4e1-5a4b-b102-d06bcdb3aaf8@oracle.com> Hi Cesar, looks good to me. constantPool.hpp:498 "Method" -> "Methods" Best regards, Tobias On 27.08.20 21:36, Cesar Soares Lucas wrote: > Hi there, > > RFE: https://bugs.openjdk.java.net/browse/JDK-8250668 > Webrev: https://cr.openjdk.java.net/~adityam/cesar/8250668/0/ > Need sponsor: Yes > Tested on: Windows/Linux/MacOS tiers 1-3 > > can I please get some reviews for the Webrev linked above? The work > consists of renaming "method_oop" occurrences all around the code > base to just "method". I've tested this on x86_64 only. Can someone > please help testing on other architectures as well: x86_32, PPC, > ARM32/64, S390? > > > Thank you, > Cesar > From rwestrel at redhat.com Thu Sep 3 15:35:21 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 03 Sep 2020 17:35:21 +0200 Subject: RFR(S): 8252696: Loop unswitching may cause out of bound array load to be executed In-Reply-To: <56f13a1d-bf0a-1990-efbb-ceb34ca9bd38@oracle.com> References: <877dtb5li0.fsf@redhat.com> <56f13a1d-bf0a-1990-efbb-ceb34ca9bd38@oracle.com> Message-ID: <874koe6fgm.fsf@redhat.com> Hi Christian, Thanks for looking at this. > Nice analysis! This fix sounds reasonable to me. Have I understood that > correctly that in your testcase, the main loop is just unrolled enough > such that the loop nodes can be removed (so no over-unrolling)? Right. >> [..] and then eliminates the >> dominating ones that it finds useless.
Removing dominated predicates >> that have no control dependent data nodes the way it's done in the >> current logic is suspicious: data nodes are sometimes logically >> dependent on multiple range checks but control dependent on only one. > > That's a valid point. Is there a better way we could find out which > predicates are useless after cloning them to the slow and fast loop or > is it either way not a problem to keep the original ones alive? AFAIU, that's not a problem with the patch I propose which only clones the skeleton predicates. All of them are required to be copied in case of unrolling. > I agree, I think I added this back there because I was temporarily > modifying the old_new mapping in a non-conform way on L302. So, I just > wanted to be sure that everything is reset and works as intended. But > I'm also fine with just removing this assertion code. Should I remove L302 as well? I'm confused by what's going on here. > Some small comment: > - You probably don't need the assertion on L257 in loopPredicate.cpp as > you are already checking it in the if-condition on L253. Right. I'll remove it. > - Same file, you should update the assert message on L268 to something > like "... projection of an If node". I'll make that change too. Roland. From igor.ignatyev at oracle.com Thu Sep 3 16:23:01 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 3 Sep 2020 09:23:01 -0700 Subject: RFR(T) : 8251997 : remove usage of PropertyResolvingWrapper in vmTestbase/vm/mlvm/ In-Reply-To: References: <1D2DD286-0830-468A-8B7D-6B4928F43DB9@oracle.com> Message-ID: Hi Tobias, thanks for your prompt review, pushed. -- Igor > On Sep 3, 2020, at 5:08 AM, Tobias Hartmann wrote: > > Hi Igor, > > looks good and trivial to me. 
> > Best regards, > Tobias > > On 03.09.20 00:18, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8251997/webrev.00/ >>> 27 lines changed: 5 ins; 12 del; 10 mod; >> >> >> Hi all, >> >> could you please review this small and trivial patch which removes usage of PropertyResolvingWrapper from vm/mlvm tests and reenables "smart action arguments"? >> a bit of background from main bug (8219140): >>> CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. >> >> testing: vmTestbase/vm/mlvm/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8251997 >> webrev: http://cr.openjdk.java.net/~iignatyev//8251997/webrev.00/ >> >> Thanks, >> -- Igor >> From igor.ignatyev at oracle.com Thu Sep 3 16:23:15 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 3 Sep 2020 09:23:15 -0700 Subject: RFR(T) : 8252720 : clean up FileInstaller $test.src $cwd in vmTestbase/vm/compiler/optimizations tests In-Reply-To: <575176aa-79d5-96bb-e12d-696681e08097@oracle.com> References: <1E77E397-571E-48FF-BC5C-B88663C02A55@oracle.com> <575176aa-79d5-96bb-e12d-696681e08097@oracle.com> Message-ID: <04A67E94-964A-4CDB-AD98-926968F75014@oracle.com> Hi Tobias, thanks for your prompt review, pushed. -- Igor > On Sep 3, 2020, at 5:06 AM, Tobias Hartmann wrote: > > Hi Igor, > > looks good and trivial. > > Best regards, > Tobias > > On 02.09.20 22:18, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8252720/webrev.00 >>> 8 lines changed: 0 ins; 4 del; 4 mod; >> >> >> Hi all, >> >> could you please review this trivial patch? >> from JBS: >>> somehow I missed vmTestbase/vm/compiler/optimizations when was working on JDK-8251127. this sub-task is to remove FileInstaller actions from vm/compiler/optimizations tests. 
>> >> testing: vmTestbase/vm/compiler/optimizations tests >> JBS: https://bugs.openjdk.java.net/browse/JDK-8252720 >> webrev: http://cr.openjdk.java.net/~iignatyev//8252720/webrev.00 >> >> >> Thanks, >> -- Igor >> From headius at headius.com Thu Sep 3 16:32:14 2020 From: headius at headius.com (Charles Oliver Nutter) Date: Thu, 3 Sep 2020 11:32:14 -0500 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: <7296fa39-4f3f-b6e5-d0c8-cfc81c5c550c@oracle.com> References: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> <9a5ab727-7d9e-a179-e46c-0916aa10ff12@oracle.com> <7296fa39-4f3f-b6e5-d0c8-cfc81c5c550c@oracle.com> Message-ID: Ok that's a lot to parse... On Thu, Sep 3, 2020 at 5:20 AM Vladimir Ivanov wrote: > (1) Class loading request should go through the JVM, so its result is > persisted in SystemDictionary. It means that ClassLoader::loadClass() is > not enough, Class::forName() should be used instead. I've switched these loads to forName throughout. > (2) Class loading request should be performed in the context of the > script. Since JRuby classes have protection domain set for them, there's > additional check required for them as part of class loading request. It > is implemented on Java side by ClassLoader::checkPackageAccess, but > since JIT-compilers aren't allowed to call into Java, they rely on the > cache which keeps all protection domains for which the check succeeded. > So, there should be a class loading request performed which goes through > the JVM with the very same protection domain instance. I'm not sure what you mean by "the context of the script". There is no such context; the secondary loaded script (inline.rb) is never itself compiled. It is interpreted for its initial load, and then the "foo" and "bar" methods later get jit-compiled to bytecode as they get hot enough. > So, triggering class loading from static initializer of the script class > should do the job. 
But, while looking through Class::forName() > implementation, I noticed that it optionally fills caller info depending > on whether SecurityManager is set, but JVM always checks protection > domain. So, the only reliable option left to satisfy both #1 and #2 is > to trigger loading on bytecode level (for example, by using ldc + > CP_Class). I'm not sure I understand fully, but I'm now generously sprinkling class LDCs into my jitted methods' static initializers now. This seems like an incredibly onerous series of requirements just to load some dynamic code. - Charlie From igor.ignatyev at oracle.com Thu Sep 3 17:12:55 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 3 Sep 2020 10:12:55 -0700 Subject: RFR(T) : 8252774 : remove jdk.test.lib.FileInstaller action from graalunit tests Message-ID: http://cr.openjdk.java.net/~iignatyev//8252774/webrev.00 > 341 lines changed: 0 ins; 240 del; 101 mod; Hi all, could you please review this small and trivial clean up in test/hotspot/jtreg/compiler/graalunit? from JBS: > test/hotspot/jtreg/compiler/graalunit tests use jdk.test.lib.FileInstaller to copy ProblemList-graal.txt from test/hotspot/jtreg/ to the current working directory as ExcludeList.txt, and then run compiler.graalunit.common.GraalUnitTestLauncher w/ -exclude ExcludeList.txt. > > j.t.l.FileInstaller actions aren't needed as c.g.c.GraalUnitTestLauncher interprets `-exclude`'s value as path to file (as opposed to the file name in current directory), so we can use ${test.root}/ProblemList-graal.txt instead of ExcludeList.txt there. the patch modifies generateTests.sh to use ${test.root}/ProblemList-graal.txt, cleans it up (removes trailing spaces, empty @summary tag, and redundant explicit @build) and regenerates graalunit tests.
testing: test/hotspot/jtreg/compiler/graalunit on {linux,windows,macos}-x64 JBS: https://bugs.openjdk.java.net/browse/JDK-8252774 webrev: http://cr.openjdk.java.net/~iignatyev//8252774/webrev.00 Thanks, -- Igor From xxinliu at amazon.com Thu Sep 3 17:29:41 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Thu, 3 Sep 2020 17:29:41 +0000 Subject: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes In-Reply-To: References: <3c989485-754f-b7f5-e91f-c7c0adfdaf88@oracle.com> <1599123405363.99306@amazon.com>, Message-ID: <1599154180784.19506@amazon.com> hi, Christian and Nhat, I notice that for_igvn() is cleared before PhaseRenumberLive too, but it's not true that it's an empty worklist in PhaseRenumberLive. The base constructor PhaseRemoveUseless appends some "unique_out" to the for_igvn(). Check out "record_for_igvn(n->unique_out())" in C->remove_useless_nodes(_useful); I guess the original code works because new_worklist is in the C->comp_arena. There's no demolition code for Unique_Node_List, so the object is still in good shape even if it is out of scope. No doubt that Nhat's patch restores the valid pointer. The bad thing is that its content is corrupted to use. We are lucky because nobody uses it to construct a PhaseIterGVN after RenumberLiveNodes. How about this? We can clear() the worklist out.
diff --git a/src/hotspot/share/opto/compile.cpp b/src/hotspot/share/opto/compile.cpp
--- a/src/hotspot/share/opto/compile.cpp
+++ b/src/hotspot/share/opto/compile.cpp
@@ -2089,9 +2089,12 @@
       ResourceMark rm;
       PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist);
     }
+    Unique_Node_List* save_for_igvn = for_igvn();
     set_for_igvn(&new_worklist);
     igvn = PhaseIterGVN(initial_gvn());
     igvn.optimize();
+    set_for_igvn(save_for_igvn);
+    for_igvn()->clear();
   }

thanks,
--lx

________________________________________
From: Christian Hagedorn
Sent: Thursday, September 3, 2020 2:50 AM
To: Liu, Xin; Nhat Nguyen; hotspot-compiler-dev at openjdk.java.net
Subject: RE: [EXTERNAL] RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes

Hi Xin

I'm not sure if we really want to copy all the nodes from new_worklist back to for_igvn() when it's not used anymore afterwards. I thought that this RFE just intends to keep a valid pointer in Compile::_for_igvn.

Nevertheless, I also took a closer look at that code and it seems that for_igvn() is not even required for PhaseRenumberLive? It's cleared just before the phase on L2086 and then PhaseRenumberLive does not seem to add anything to it and neither does PhaseRemoveUseless. So we are basically restoring an empty list afterwards. However, this could be cleaned up in an additional RFE.

Best regards,
Christian

On 03.09.20 10:56, Liu, Xin wrote:
> hi, Nhat and Christian,
>
> I reviewed this patch. I feel resuming the old worklist looks suspicious.
> PhaseRenumberLive renumbers nodes associated with gvn. old _worklist might have wrong idx and types.
>
> Instead of resuming old _worklist, you can copy out nodes from new_worklist to your current _worklist.
>
> diff --git a/src/hotspot/share/opto/compile.cpp b/src/hotspot/share/opto/compile.cpp
> --- a/src/hotspot/share/opto/compile.cpp
> +++ b/src/hotspot/share/opto/compile.cpp
> @@ -2089,7 +2089,10 @@
>        ResourceMark rm;
>        PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist);
>      }
> -    set_for_igvn(&new_worklist);
> +    for_igvn()->clear();
> +    while (new_worklist.size() > 0) {
> +      for_igvn()->push(new_worklist.rpop());
> +    }
>      igvn = PhaseIterGVN(initial_gvn());
>      igvn.optimize();
>    }
>
>
> thanks,
> --lx
>
> ________________________________________
> From: hotspot-compiler-dev on behalf of Christian Hagedorn
> Sent: Thursday, August 27, 2020 7:54 AM
> To: Nhat Nguyen; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: [EXTERNAL] RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes
>
> Hi Nhat
>
> Looks good to me!
>
> Just make sure that next time you assign the bug to you or a sponsor
> and/or leave a comment that you intend to work on it to avoid the
> possibility of some duplicated work (was no problem in this case) ;-)
>
> Best regards,
> Christian
>
> On 26.08.20 20:55, Nhat Nguyen wrote:
>> Hi hotspot-compiler-dev,
>>
>> Please review the following patch to address https://bugs.openjdk.java.net/browse/JDK-8251271
>> The bug is currently assigned to Christian Hagedorn, but he was supportive of me submitting the patch instead.
>> I have run hotspot/tier1 and jdk/tier1 tests to make sure that the change is working as intended.
>> >> webrev: http://cr.openjdk.java.net/~burban/nhat/JDK-8251271/webrev.00/ >> >> Thank you, >> Nhat >> From headius at headius.com Thu Sep 3 18:13:20 2020 From: headius at headius.com (Charles Oliver Nutter) Date: Thu, 3 Sep 2020 13:13:20 -0500 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> <9a5ab727-7d9e-a179-e46c-0916aa10ff12@oracle.com> <7296fa39-4f3f-b6e5-d0c8-cfc81c5c550c@oracle.com> Message-ID: OMG, it might be working... On Thu, Sep 3, 2020 at 11:32 AM Charles Oliver Nutter wrote: > I'm not sure I understand fully, but I'm now generously sprinkling > class LDCs into my jitted methods' static initializers now. This seems > like an incredibly onerous series of requirements just to load some > dynamic code. So I moved the Class.forName's out of the OneShot constructor and into the static initializer (as LDC) in the jitted methods' classes. I went ahead and loaded everything that is in the signatures in question. https://gist.github.com/headius/6408b8392096d7932020870022374a9d Running the original script, I no longer see "unloaded signature class" warnings from PrintInlining, and as shown in the above gist I eventually get the asm I expect! But this is a hacky workaround, right? Do other frameworks that dynamically generate code also have to do this aggressive classloading within that generated code? This doesn't seem right, does it? - Charlie > > - Charlie From gromero at linux.vnet.ibm.com Thu Sep 3 18:31:14 2020 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Thu, 3 Sep 2020 15:31:14 -0300 Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: Message-ID: On 8/18/20 5:38 AM, Michihiro Horie wrote: > > Dear all, > > Would you please review a small change? 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8251926 > Webrev: http://cr.openjdk.java.net/~mhorie/8251926/webrev.00/ > > The load_const_optimized function in assembler_ppc.cpp has an unused > variable named return_xd. It looks unnecessary in the current code. Hi Michi, The change looks good to me. As Thomas said that change can be considered trivial, so you can push it right through to jdk master repo without a second Review (only one Review is sufficient for trivial changes). Best regards, Gustavo From igor.ignatyev at oracle.com Thu Sep 3 18:40:59 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 3 Sep 2020 11:40:59 -0700 Subject: RFR(T) : 8252778 : remove jdk.test.lib.FileInstaller action from compiler/c2/stemmer test Message-ID: http://cr.openjdk.java.net/~iignatyev//8252778/webrev.00/ > 2 lines changed: 0 ins; 1 del; 1 mod; Hi all, could you please review this small and trivial cleanup? from JBS: > `compiler/c2/stemmer` test uses `jdk.test.lib.FileInstaller` to copy "words" file from the test source directory to the current working directory, `compiler.c2.stemmer.Stemmer` can read this file. 
yet, `c.c.s.Stemmer` class treats its 1st argument as a path to the file, given this isn't needed and we can pass "${test.src}/words" instead of "words" testing: compiler/c2/stemmer on {linux,windows,macos}-x64 JBS: https://bugs.openjdk.java.net/browse/JDK-8252778 webrev: http://cr.openjdk.java.net/~iignatyev//8252778/webrev.00/ Thanks, -- Igor From forax at univ-mlv.fr Thu Sep 3 19:42:51 2020 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 3 Sep 2020 21:42:51 +0200 (CEST) Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: <9a5ab727-7d9e-a179-e46c-0916aa10ff12@oracle.com> <7296fa39-4f3f-b6e5-d0c8-cfc81c5c550c@oracle.com> Message-ID: <345888053.671840.1599162171937.JavaMail.zimbra@u-pem.fr> ----- Mail original ----- > De: "Charles Oliver Nutter" > ?: "Vladimir Ivanov" > Cc: "hotspot compiler" > Envoy?: Jeudi 3 Septembre 2020 20:13:20 > Objet: Re: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby > OMG, it might be working... > > On Thu, Sep 3, 2020 at 11:32 AM Charles Oliver Nutter > wrote: >> I'm not sure I understand fully, but I'm now generously sprinkling >> class LDCs into my jitted methods' static initializers now. This seems >> like an incredibly onerous series of requirements just to load some >> dynamic code. > > So I moved the Class.forName's out of the OneShot constructor and into > the static initializer (as LDC) in the jitted methods' classes. I went > ahead and loaded everything that is in the signatures in question. > > https://gist.github.com/headius/6408b8392096d7932020870022374a9d > > Running the original script, I no longer see "unloaded signature > class" warnings from PrintInlining, and as shown in the above gist I > eventually get the asm I expect! > > But this is a hacky workaround, right? Do other frameworks that > dynamically generate code also have to do this aggressive classloading > within that generated code? 
This doesn't seem right, does it?

Hi Charles,
I will say something that doesn't help you:
I try not to use one classloader per method and instead use a combination of ahead-of-time compilation + lookup.defineClass + defineAnonymousClass.
I can do that because my base version is Java 11, not Java 8.
I plan to use hidden classes soon; I already have a prototype, but I still need to study where to use weak or strong hidden classes.

Rémi

>
> - Charlie
>
>>
>
> - Charlie

From headius at headius.com Thu Sep 3 19:48:10 2020
From: headius at headius.com (Charles Oliver Nutter)
Date: Thu, 3 Sep 2020 14:48:10 -0500
Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby
In-Reply-To: <345888053.671840.1599162171937.JavaMail.zimbra@u-pem.fr>
References: <9a5ab727-7d9e-a179-e46c-0916aa10ff12@oracle.com> <7296fa39-4f3f-b6e5-d0c8-cfc81c5c550c@oracle.com> <345888053.671840.1599162171937.JavaMail.zimbra@u-pem.fr>
Message-ID:

On Thu, Sep 3, 2020 at 2:42 PM Remi Forax wrote:
> I will say something that doesn't help you:
> I try not to use one classloader per method and instead use a combination of ahead-of-time compilation + lookup.defineClass + defineAnonymousClass.
> I can do that because my base version is Java 11, not Java 8.
> I plan to use hidden classes soon; I already have a prototype, but I still need to study where to use weak or strong hidden classes.

I'm willing to do any or all of those things! defineAnonymousClass is still not a publicly-accessible API, though, which is why I have not used it up to now. We don't have any plans to baseline on 11 yet, but could possibly be convinced.

I think you understand my use case pretty well... maybe you can point me at an example of what you think would be the "best" way, assuming we moved our baseline to 11?
- Charlie From vladimir.kozlov at oracle.com Thu Sep 3 19:56:44 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 3 Sep 2020 12:56:44 -0700 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <8cb53861-1c52-2fcb-3e7e-558aa48373c8@bell-sw.com> References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> <0cca5c0c-9240-3a9f-98f0-519384ea69cb@bell-sw.com> <11530b87-8124-19ca-936b-16dec5994411@redhat.com> <8cb53861-1c52-2fcb-3e7e-558aa48373c8@bell-sw.com> Message-ID: <586271e8-0b0e-cedf-8837-6df1cb95de31@oracle.com> Please, run changes through jdk/submit before push. Current fix caused tier1 failure: https://bugs.openjdk.java.net/browse/JDK-8252779 Thanks, Vladimir K PS: currently jdk/submit under planned maintenance until Sep 7 On 9/3/20 5:03 AM, Dmitry Chuyko wrote: > I pushed the change for signum, and opened an RFE [1] for fp re-materialization on aarch64. We've made some experiments > already. On N1 simply enabling it makes workloads like Lucene geo search ~0.7% faster, will get back with more data. > > I haven't made RFE for "Constants as TEMPs" but will be glad to participate. > > -Dmitry > > [1] https://bugs.openjdk.java.net/browse/JDK-8252760 > > On 9/3/20 11:34 AM, Andrew Haley wrote: >> On 02/09/2020 18:40, Vladimir Ivanov wrote: >>> C2 doesn't rematerialize FP constants on AArch64 >> I think that's probably right, even though a few FP constants are >> cheap. Loads from L1 usually have a 4 (or 5) cycle latency, which can >> make spilling constants expensive around calls. We could be smarter >> about this. 
>> From dmitry.chuyko at bell-sw.com Thu Sep 3 21:14:06 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Fri, 4 Sep 2020 00:14:06 +0300 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <586271e8-0b0e-cedf-8837-6df1cb95de31@oracle.com> References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> <0cca5c0c-9240-3a9f-98f0-519384ea69cb@bell-sw.com> <11530b87-8124-19ca-936b-16dec5994411@redhat.com> <8cb53861-1c52-2fcb-3e7e-558aa48373c8@bell-sw.com> <586271e8-0b0e-cedf-8837-6df1cb95de31@oracle.com> Message-ID: <5c983d58-a5ef-a672-96c0-51df4d6d2ba3@bell-sw.com> Sorry, I really should have used dev submit. Vladimir, thanks for immediate fix. It looks like the test was excluded in our CI for 16, will inspect that. -Dmitry On 9/3/20 10:56 PM, Vladimir Kozlov wrote: > Please, run changes through jdk/submit before push. Current fix caused > tier1 failure: > > https://bugs.openjdk.java.net/browse/JDK-8252779 > > Thanks, > Vladimir K > > PS: currently jdk/submit under planned maintenance until Sep 7 > > On 9/3/20 5:03 AM, Dmitry Chuyko wrote: >> I pushed the change for signum, and opened an RFE [1] for fp >> re-materialization on aarch64. We've made some experiments already. >> On N1 simply enabling it makes workloads like Lucene geo search ~0.7% >> faster, will get back with more data. >> >> I haven't made RFE for "Constants as TEMPs" but will be glad to >> participate. >> >> -Dmitry >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8252760 >> >> On 9/3/20 11:34 AM, Andrew Haley wrote: >>> On 02/09/2020 18:40, Vladimir Ivanov wrote: >>>> C2 doesn't rematerialize FP constants on AArch64 >>> I think that's probably right, even though a few FP constants are >>> cheap. Loads from L1 usually have a 4 (or 5) cycle latency, which can >>> make spilling constants expensive around calls. We could be smarter >>> about this. 
>>> From ekaterina.pavlova at oracle.com Thu Sep 3 21:15:34 2020 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 3 Sep 2020 14:15:34 -0700 Subject: RFR(T) : 8252774 : remove jdk.test.lib.FileInstaller action from graalunit tests In-Reply-To: References: Message-ID: <12adb4ea-a8cf-f331-6c23-66e592e2e47d@oracle.com> Looks good. Thanks Igor for fixing it! -katya On 9/3/20 10:12 AM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8252774/webrev.00 >> 341 lines changed: 0 ins; 240 del; 101 mod; > > Hi all, > > could you please review this small and trivial clean up in test/hotspot/jtreg/compiler/graalunit? > from JBS: >> test/hotspot/jtreg/compiler/graalunit tests use jdk.test.lib.FileInstaller to copy ProblemList-graal.txt from test/hotspot/jtreg/ to the current working directory as ExcludeList.txt, and then run compiler.graalunit.common.GraalUnitTestLauncher w/ -exclude ExcludeList.txt. >> >> j.t.l.FileInstaller actions aren't needed as c.g.c.GraalUnitTestLauncher interpeters `-exclude`'s value as path to file (as oppose to the file name in current directory), so we can use ${test.root}/ProblemList-graal.txt instead of ExcludeList.txt there. > > > the patch modifies generateTests.sh to use ${test.root}/ProblemList-graal.txt, cleans it up (removes trailing spaces, empty @summary tag, and redundant explicit @build) and regenerates graalunit tests. > > testing: test/hotspot/jtreg/compiler/graalunit on {linux,windows,macos}-x64 > JBS: https://bugs.openjdk.java.net/browse/JDK-8252774 > webrev: http://cr.openjdk.java.net/~iignatyev//8252774/webrev.00 > > Thanks, > -- Igor > From ekaterina.pavlova at oracle.com Thu Sep 3 21:16:36 2020 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 3 Sep 2020 14:16:36 -0700 Subject: RFR(T) : 8252778 : remove jdk.test.lib.FileInstaller action from compiler/c2/stemmer test In-Reply-To: References: Message-ID: <6189b097-a1e5-7236-8f62-8c88c1ee4f74@oracle.com> Looks good. 
-katya On 9/3/20 11:40 AM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8252778/webrev.00/ >> 2 lines changed: 0 ins; 1 del; 1 mod; > > Hi all, > > could you please review this small and trivial cleanup? > from JBS: >> `compiler/c2/stemmer` test uses `jdk.test.lib.FileInstaller` to copy "words" file from the test source directory to the current working directory, `compiler.c2.stemmer.Stemmer` can read this file. yet, `c.c.s.Stemmer` class treats its 1st argument as a path to the file, given this isn't needed and we can pass "${test.src}/words" instead of "words" > > testing: compiler/c2/stemmer on {linux,windows,macos}-x64 > JBS: https://bugs.openjdk.java.net/browse/JDK-8252778 > webrev: http://cr.openjdk.java.net/~iignatyev//8252778/webrev.00/ > > Thanks, > -- Igor > From jptatton at amazon.com Thu Sep 3 21:28:56 2020 From: jptatton at amazon.com (Tatton, Jason) Date: Thu, 3 Sep 2020 21:28:56 +0000 Subject: JDK-8173585: Intrinsify StringLatin1.indexOf(char) Message-ID: <0cbe7d8f594349b59504c42f89c6f268@EX13D46EUB003.ant.amazon.com> Hi All, Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8173585 http://cr.openjdk.java.net/~phh/8173585/webrev.00/ This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it I had to make a small change to StringLatin1.java (refactor of functionality to intrisified private method) as well as code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and java/lang/StringLatin1.java Details of testing: ============== I have created a jtreg test ?compiler/intrinsics/string/TestStringLatin1IndexOfChar? to cover this new intrinsic. Note that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the input String. 
Hence the test has been designed to cover all these cases. In summary they are:

- A "short" string of < 16 characters.
- A SIMD String of 16-31 characters.
- An AVX2 SIMD String of 32 characters+.

Hardware used for testing:
------------------------------------
- Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support)
- Intel i7 processor (with AVX2 support).
- AWS Graviton 2 (ARM 64 processor).

I also ran "run-test-tier1" and "run-test-tier2" for x86_64 and aarch64.

Possible future enhancements:
=========================
For the x86 implementation there may be two further improvements we can make in order to improve performance of both the StringUTF16 and StringLatin1 indexOf(char) intrinsics:

1. Make use of AVX-512 instructions.
2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD instructions instead of a loop.

JMH Benchmark results:
====================
The benchmarks examine the 3 codepaths for StringLatin1 and StringUTF16. Here are the results for Intel x86 (ARM is similar):

FYI, String lengths in characters (1 byte for Latin1, 2 bytes for UTF16):

        Latin1  UTF16
Short:  15      7
SSE4:   16      8
AVX2:   32      16

Without StringLatin1 indexofchar intrinsic:

Benchmark                              Mode  Cnt       Score      Error  Units
IndexOfBenchmark.latin1_AVX2_String   thrpt    5  121781.424 ±  355.085  ops/s
IndexOfBenchmark.latin1_AVX2_char     thrpt    5   46060.612 ±  151.274  ops/s
IndexOfBenchmark.latin1_SSE4_String   thrpt    5  197339.146 ±   90.333  ops/s
IndexOfBenchmark.latin1_SSE4_char     thrpt    5   61401.204 ±  426.761  ops/s
IndexOfBenchmark.latin1_Short_String  thrpt    5  175389.355 ±  294.976  ops/s
IndexOfBenchmark.latin1_Short_char    thrpt    5   60759.868 ±  124.349  ops/s
IndexOfBenchmark.utf16_AVX2_String    thrpt    5  123601.020 ±  111.981  ops/s
IndexOfBenchmark.utf16_AVX2_char      thrpt    5  141116.832 ±  380.489  ops/s
IndexOfBenchmark.utf16_SSE4_String    thrpt    5  178136.762 ±  143.227  ops/s
IndexOfBenchmark.utf16_SSE4_char      thrpt    5  181430.649 ±  120.097  ops/s
IndexOfBenchmark.utf16_Short_String   thrpt    5  158301.361 ±  182.738  ops/s
IndexOfBenchmark.utf16_Short_char     thrpt    5   84876.919 ±  247.769  ops/s

With StringLatin1 indexofchar intrinsic:

Benchmark                              Mode  Cnt       Score      Error  Units
IndexOfBenchmark.latin1_AVX2_String   thrpt    5  113621.676 ±   68.235  ops/s
IndexOfBenchmark.latin1_AVX2_char     thrpt    5  177757.909 ±  727.308  ops/s
IndexOfBenchmark.latin1_SSE4_String   thrpt    5  180529.049 ±   57.356  ops/s
IndexOfBenchmark.latin1_SSE4_char     thrpt    5  235087.776 ±  457.024  ops/s
IndexOfBenchmark.latin1_Short_String  thrpt    5  165914.990 ±  329.024  ops/s
IndexOfBenchmark.latin1_Short_char    thrpt    5   53989.544 ±   65.393  ops/s
IndexOfBenchmark.utf16_AVX2_String    thrpt    5  107632.783 ±  446.272  ops/s
IndexOfBenchmark.utf16_AVX2_char      thrpt    5  143131.734 ±  159.944  ops/s
IndexOfBenchmark.utf16_SSE4_String    thrpt    5  169882.703 ± 1024.367  ops/s
IndexOfBenchmark.utf16_SSE4_char      thrpt    5  175693.972 ±  775.423  ops/s
IndexOfBenchmark.utf16_Short_String   thrpt    5  163595.993 ±  225.089  ops/s
IndexOfBenchmark.utf16_Short_char     thrpt    5   90126.154 ±  365.642  ops/s

We can see above that indexOf(char) now behaves similarly between StringUTF16 and StringLatin1.
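
For reference, the operation that the Latin1 intrinsic accelerates can be written as a plain scalar loop over a one-byte-per-character array. The following is only a minimal sketch under that assumption; it is not the code from the webrev, and the class and method names (`Latin1IndexOfSketch`, `indexOfChar`) are hypothetical.

```java
// Hypothetical illustration only; not the StringLatin1 code from the webrev.
final class Latin1IndexOfSketch {

    // Returns the index of the first occurrence of ch in value at or after
    // fromIndex, or -1 if there is none.
    static int indexOfChar(byte[] value, int ch, int fromIndex) {
        // A code point above 0xFF cannot occur in a one-byte-per-character string.
        if (ch < 0 || ch > 0xFF) {
            return -1;
        }
        int start = Math.max(fromIndex, 0);
        byte target = (byte) ch;
        // Scalar loop; an intrinsic would compare many bytes per step instead.
        for (int i = start; i < value.length; i++) {
            if (value[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] text = {'h', 'e', 'l', 'l', 'o'};
        System.out.println(indexOfChar(text, 'l', 0));
        System.out.println(indexOfChar(text, 'z', 0));
    }
}
```

A SIMD version would presumably broadcast the target byte into a vector register and compare 16 or 32 bytes per iteration, which is why the length classes listed above take different code paths.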
JMH benchmark code:
------------------------------
package org.sample;

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class IndexOfBenchmark {
    private static final int loops = 100000;
    private static final Random rng = new Random(1999);
    private static final int pathCnt = 1000;
    private static final String [] latn1_short = new String[pathCnt];
    private static final String [] latn1_sse4  = new String[pathCnt];
    private static final String [] latn1_avx2  = new String[pathCnt];
    private static final String [] utf16_short = new String[pathCnt];
    private static final String [] utf16_sse4  = new String[pathCnt];
    private static final String [] utf16_avx2  = new String[pathCnt];

    static {
        for (int i = 0; i

https://cr.openjdk.java.net/~kvn/8252188/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8252188

Code added by 8248830 [1] uses a Node::is_Con() check when looking for constant shift values. Unfortunately it does not guarantee that it will be an Integer constant, because the TOP node is also a ConNode. I used C2 types to check and get shift values. I also refactored the code to consolidate checks.

Tested: tier1, hs-tier2, hs-tier3. Verified fix with replay file from bug report. I also checked that RotateBenchmark.java added by 8248830 still creates Rotate vectors after this fix.

I created a subtask to add a new regression test later because this fix is urgent and I did not have time to prepare it.

Thanks,
Vladimir

[1] https://bugs.openjdk.java.net/browse/JDK-8248830

From hohensee at amazon.com Thu Sep 3 22:46:27 2020
From: hohensee at amazon.com (Hohensee, Paul)
Date: Thu, 3 Sep 2020 22:46:27 +0000
Subject: RFR 8239090: Improve CPU feature support in VM_version
Message-ID: <8DBEA512-283E-443A-A820-E215D5EF03EA@amazon.com>

Taking over from Eric... Thank you for the review, Igor.
I took a completely different (and very old approach), however, and defined a method Abstract_VM_Version:: insert_features_names() that iterates over the feature flags set. If a feature bit is on, it appends to an output buffer a corresponding name string from an array indexed by the bit number. I've implemented it only for x86: using the mechanism for other platforms can be follow-on RFEs. I'd greatly appreciate a review. Webrev: http://cr.openjdk.java.net/~phh/8239090/webrev.00/ To add a feature bit, all one now has to do is add a CPU_ definition and corresponding name string in the FEATURES_NAMES macro. I've also included a few small changes to the x86 implementation beyond the above. 1. Unified the previous two bitset definitions into a single Feature_Flag enum and made it a uint64_t. 2. supports_tscinv_bit() referenced the CPU_TSCINV bit, which was a bit misleading, so added a new CPU_TSCINV_BIT mask and used it instead. 3. Repurposed CPU_TSCINV for supports_tscinv(), which was a "composite" property, but is now computed once in feature_flags(). 4. Made supports_clflushopt() and supports_clwb() common to both 32 and 64-bit rather than have 32-bit versions that always return 'false'. These bits are never set by the hardware on 32-bit, so no need for separate methods. 5. Renamed CPU_HV_PRESENT to CPU_HV to conform with the CPU_ bit naming scheme. "_PRESENT" is redundant and not used for any other CPU_ name, and the feature string uses "hv", not "hv_present". Added CPU_HV to vmStructs_x86.hpp and vmStructs_jvmci.cpp. Tested using -Xlog:os+cpu on my macbook pro: the same feature string is returned after the patch as before it. Suggestions for how to more thoroughly test the patch are very welcome. Thanks, Paul ?On 8/27/20, 6:22 PM, "hotspot-compiler-dev on behalf of Igor Veresov" wrote: You can actually make a constexpr array of feature objects and then use constexpr function with a loop to look it up. The c++ compiler will generate an O(1) table lookup for it. 
That would be a good way to get rid of the ugly macro (we allow c++14 now). For example foo() in this example:

    enum E { a, b, c };

    struct P {
      E _e;   // key
      int _v; // value
      constexpr P(E e, int v) : _e(e), _v(v) { }
    };

    constexpr static P ps[3] = { P(a, 0xdead), P(b, 0xbeef), P(c, 0xf00d)};

    constexpr int match(E e) {
      for (const auto& p : ps) {
        if (p._e == e) {
          return p._v;
        }
      }
      return -1;
    }

    int foo(E e) {
      return match(e);
    }

Will be compiled into:

    __Z3foo1E:                              ## @_Z3foo1E
            .cfi_startproc
    ## %bb.0:
            movl    $-1, %eax
            cmpl    $2, %edi
            ja      LBB0_2
    ## %bb.1:
            pushq   %rbp
            .cfi_def_cfa_offset 16
            .cfi_offset %rbp, -16
            movq    %rsp, %rbp
            .cfi_def_cfa_register %rbp
            movslq  %edi, %rax
            leaq    l_switch.table._Z3foo1E(%rip), %rcx
            movq    (%rcx,%rax,8), %rax
            movl    4(%rax), %eax
            popq    %rbp
    LBB0_2:
            retq
            .cfi_endproc
                                            ## -- End function
            .section        __TEXT,__const
            .p2align        4               ## @_ZL2ps
    __ZL2ps:
            .long   0                       ## 0x0
            .long   57005                   ## 0xdead
            .long   1                       ## 0x1
            .long   48879                   ## 0xbeef
            .long   2                       ## 0x2
            .long   61453                   ## 0xf00d

            .section        __DATA,__const
            .p2align        3               ## @switch.table._Z3foo1E
    l_switch.table._Z3foo1E:
            .quad   __ZL2ps
            .quad   __ZL2ps+8
            .quad   __ZL2ps+16

igor

> On Aug 27, 2020, at 11:08 AM, Eric, Chan wrote:
>
> Hi,
>
> Requesting review for
>
> Webrev : http://cr.openjdk.java.net/~phh/8239090/webrev.00/
> JBS : https://bugs.openjdk.java.net/browse/JDK-8239090
>
> Yesterday I sent a wrong one, so I send it again,
> I improve the "get_processor_features" method by store every cpu features in an enum array so that we don't have to count how many "%s" that need to added. I passed the tier1 test successfully.
> > Regards, > Eric Chen > From xxinliu at amazon.com Fri Sep 4 04:33:07 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Fri, 4 Sep 2020 04:33:07 +0000 Subject: RFR: 8251464: make Node::dump(int depth) support indent In-Reply-To: <44f14e58-06d8-b1d9-baa8-88edfed6dd78@oracle.com> References: <1598731717217.87517@amazon.com>, <44f14e58-06d8-b1d9-baa8-88edfed6dd78@oracle.com> Message-ID: <1599193987757.63967@amazon.com> Hi, Tobias, Thank you to review the patch. Yes, My change does affect PrintIdeal. Indeed, if line-wrapping happens, the readability will drop a lot. Taking your advice, I introduce a new c2 option PrintIdealIndentThreshold=5. if users attempt to dump an ideal graph deeper than that level, the indention function will be automatically disable. -XX:+PrintIdeal is root()->dump(9999) under the hook. Of course it doesn't indent. Indention still happens if users call "node->dump(-3)" in a debugger. Making a beautiful formation is a surprisingly hard task. Previously, I would like to treat Node::_idx as a line number in vim. That's why I make them align on left side. In this new revision, I indent a few whitespaces in first place. The result looks like this. What do you think? https://bugs.openjdk.java.net/secure/attachment/90041/indent_with_idx.log here is the new revison: http://cr.openjdk.java.net/~xliu/8251464/01/webrev/ thanks, --lx ________________________________________ From: Tobias Hartmann Sent: Thursday, September 3, 2020 5:30 AM To: Liu, Xin; 'hotspot-compiler-dev at openjdk.java.net' Subject: RE: [EXTERNAL] RFR: 8251464: make Node::dump(int depth) support indent CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi Xin, I'm concerned that the output quickly becomes unreadable when dumping a large graph. For example, isn't -XX:+PrintIdeal affected by this as well? 
Also, looking at the example output you've posted [1], shouldn't the node id be indented as well? This might be helpful when dumping small parts of the graph but then it should be optional (i.e. can be turned on via a flag/argument if needed). Best regards, Tobias [1] https://bugs.openjdk.java.net/secure/attachment/89800/dump2.txt On 29.08.20 22:08, Liu, Xin wrote: > hi, Reviewers, > > > Could you review this patch? > > JBS:https://bugs.openjdk.java.net/browse/JDK-8251464 > > Webrev: > > http://cr.openjdk.java.net/~xliu/8251464/00/webrev/ > > > This patch attempts to improve the formation of nodes when developers try to dump an ideal graph or snippet of a graph. In practice, I found it's pretty handy if Node::dump(int d) can support indent. > > The basic idea is to support indention for the utility function: > > collect_nodes_i(GrowableArray* queue, const Node* start, int direction, uint depth, bool include_start, bool only_ctrl, bool only_data) > > It only affects Node::dump family and -XX::PrintIdeal. It won't impact the output for igv. > This can help developers who try to inspect a cluster of nodes in gdb. > > Another change is naming. collect_nodes_i uses breadth-first search. the container is used in fifo way instead of filo. > I think the name "queue" serve better. > > TEST: > hotspot:tier1 and gtest. > mach-5 > > thanks, > --lx > > From jatin.bhateja at intel.com Fri Sep 4 05:08:24 2020 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Fri, 4 Sep 2020 05:08:24 +0000 Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 In-Reply-To: References: Message-ID: Hi Vladimir, Thanks for taking care of this. Similar strict check for constant shift is needed in OrVNode::Ideal routine in vectornode.cpp. 
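
As background for the pattern being discussed: the code added by 8248830 appears to look for an OR of a left shift and an unsigned right shift whose constant shift counts add up to the word size, and to treat that as a rotate. A minimal Java sketch of the source-level idiom, assuming that reading of the change, is shown here; the class and method names (`RotateIdiomSketch`, `rotateLeftByOr`) are hypothetical and this is not code from either webrev.

```java
// Hypothetical illustration of the shift/or rotate idiom; not HotSpot code.
final class RotateIdiomSketch {

    // Rotate left on a 32-bit value written as shift/or.
    // Java masks int shift counts to the low 5 bits, so s == 0 also works:
    // (32 - 0) is masked to 0 and the result is x | x == x.
    static int rotateLeftByOr(int x, int s) {
        return (x << s) | (x >>> (32 - s));
    }

    // Same idiom on a 64-bit value; long shift counts are masked to 6 bits.
    static long rotateLeftByOr(long x, int s) {
        return (x << s) | (x >>> (64 - s));
    }

    public static void main(String[] args) {
        int x = 0x80000001;
        // Compare the idiom against the library rotate for a few shift counts.
        for (int s = 0; s < 32; s += 7) {
            System.out.println(rotateLeftByOr(x, s) == Integer.rotateLeft(x, s));
        }
    }
}
```

The two shift counts are what the compiler has to prove are integer constants, which is where a check that only asks "is this a constant node" can presumably be too weak.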
Regards, Jatin > -----Original Message----- > From: hotspot-compiler-dev On > Behalf Of Vladimir Kozlov > Sent: Friday, September 4, 2020 3:14 AM > To: hotspot compiler > Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, > bool)+0x8b9 > > https://cr.openjdk.java.net/~kvn/8252188/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8252188 > > Code added by 8248830 [1] uses Node::is_Con() check when looking for > constant shift values. > Unfortunately it does not guarantee that it will be Integer constant > because TOP node is also ConNode. > I used C2 types to check and get shift values. I also refactor code to > consolidate checks. > > Tested: tier1, hs-tier2, hs-tier3. > Verified fix with replay file from bug report. > I also checked that RotateBenchmark.java added by 8248830 still creates > Rotate vectors after this fix. > > I created subtask to add new regerssion test later because this fix is > urgent and I did not have time to prepare it. > > Thanks, > Vladimir > > [1] https://bugs.openjdk.java.net/browse/JDK-8248830 From christian.hagedorn at oracle.com Fri Sep 4 05:19:05 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 4 Sep 2020 07:19:05 +0200 Subject: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes In-Reply-To: <1599154180784.19506@amazon.com> References: <3c989485-754f-b7f5-e91f-c7c0adfdaf88@oracle.com> <1599123405363.99306@amazon.com> <1599154180784.19506@amazon.com> Message-ID: Hi Xin On 03.09.20 19:29, Liu, Xin wrote: > hi, Christian and Nhat, > > I notice that for_igvn() is cleared before PhaseRenumberLive too, but it's not true that it's an empty worklist in PhaseRenumberLive. > > The base constructor PhaseRemoveUseless appends some "unique_out" to the for_igvn() > Check out "record_for_igvn(n->unique_out())" in C->remove_useless_nodes(_useful); You're right! I missed that one. > I guess the original code works because new_worklist is in the C->comp_arena. 
> There's no demolition code for Unique_Node_List, so the object is still in good shape even it is out of scope. > > No doubt that Nhat's patch restores the valid pointer. The bad thing is that its content is corrupted to use. > We are lucky because nobody uses it to construct a PhaseIterGVN after RenumberLiveNodes. Right. > How about this? We can clear() the worklist out. > > diff --git a/src/hotspot/share/opto/compile.cpp b/src/hotspot/share/opto/compile.cpp > --- a/src/hotspot/share/opto/compile.cpp > +++ b/src/hotspot/share/opto/compile.cpp > @@ -2089,9 +2089,12 @@ > ResourceMark rm; > PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist); > } > + Unique_Node_List* save_for_igvn = for_igvn(); > set_for_igvn(&new_worklist); > igvn = PhaseIterGVN(initial_gvn()); > igvn.optimize(); > + set_for_igvn(save_for_igvn); > + for_igvn()->clear(); > } This looks good while having no users afterwards. But I'm fine with both solutions. Best regards, Christian > > thanks, > --lx > ________________________________________ > From: Christian Hagedorn > Sent: Thursday, September 3, 2020 2:50 AM > To: Liu, Xin; Nhat Nguyen; hotspot-compiler-dev at openjdk.java.net > Subject: RE: [EXTERNAL] RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes > > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > Hi Xin > > I'm not sure if we really want to copy all the nodes from new_worklist > back to for_igvn() when it's not used anymore afterwards. I thought that > this RFE just intents to keep a valid pointer in Compile::_for_igvn. > > Nevertheless, I also took a closer look at that code and it seems that > for_igvn() is not even required for PhaseRenumberLive? It's cleared just > before the phase on L2086 and then PhaseRenumberLive does not seem to > add anything to it and neither does PhaseRemoveUseless. 
So we are > basically restoring an empty list afterwards. However, this could be > cleaned up in an additional RFE. > > Best regards, > Christian > > On 03.09.20 10:56, Liu, Xin wrote: >> hi, Nhat and Christian, >> >> I reviewed this patch. I feel resuming the old worklist looks suspicious. >> PhaseRenumberLive renumbers nodes associates with gvn. old _worklist might have wrong idx and types. >> >> Instead of resuming old _worklist, you can copy out nodes from new_worklist to your current _worklist. >> >> diff --git a/src/hotspot/share/opto/compile.cpp b/src/hotspot/share/opto/compile.cpp >> --- a/src/hotspot/share/opto/compile.cpp >> +++ b/src/hotspot/share/opto/compile.cpp >> @@ -2089,7 +2089,10 @@ >> ResourceMark rm; >> PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist); >> } >> - set_for_igvn(&new_worklist); >> + for_igvn()->clear(); >> + while (new_worklist.size() > 0) { >> + for_igvn()->push(new_worklist.rpop()); >> + } >> igvn = PhaseIterGVN(initial_gvn()); >> igvn.optimize(); >> } >> >> >> thanks, >> --lx >> >> ________________________________________ >> From: hotspot-compiler-dev on behalf of Christian Hagedorn >> Sent: Thursday, August 27, 2020 7:54 AM >> To: Nhat Nguyen; hotspot-compiler-dev at openjdk.java.net >> Subject: RE: [EXTERNAL] RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes >> >> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. >> >> >> >> Hi Nhat >> >> Looks good to me! 
>> >> Just make sure you that next time you assign the bug to you or a sponsor >> and/or leave a comment that you intend to work on it to avoid the >> possibility of some duplicated work (was no problem in this case) ;-) >> >> Best regards, >> Christian >> >> On 26.08.20 20:55, Nhat Nguyen wrote: >>> Hi hotspot-compiler-dev, >>> >>> Please review the following patch to address https://bugs.openjdk.java.net/browse/JDK-8251271 >>> The bug is currently assigned to Christian Hagedorn, but he was supportive of me submitting the patch instead. >>> I have run hotspot/tier1 and jdk/tier1 tests to make sure that the change is working as intended. >>> >>> webrev: http://cr.openjdk.java.net/~burban/nhat/JDK-8251271/webrev.00/ >>> >>> Thank you, >>> Nhat >>> From christian.hagedorn at oracle.com Fri Sep 4 05:43:27 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 4 Sep 2020 07:43:27 +0200 Subject: RFR(S): 8252696: Loop unswitching may cause out of bound array load to be executed In-Reply-To: <874koe6fgm.fsf@redhat.com> References: <877dtb5li0.fsf@redhat.com> <56f13a1d-bf0a-1990-efbb-ceb34ca9bd38@oracle.com> <874koe6fgm.fsf@redhat.com> Message-ID: <09ab280a-1e1b-f259-7fd4-108088c4a58d@oracle.com> On 03.09.20 17:35, Roland Westrelin wrote: > > Hi Christian, > > Thanks for looking at this. > >> Nice analysis! This fix sounds reasonable to me. Have I understood that >> correctly that in your testcase, the main loop is just unrolled enough >> such that the loop nodes can be removed (so no over-unrolling)? > > Right. > >>> [..] and then eliminates the >>> dominating ones that it finds useless. Removing dominated predicates >>> that have no control dependent data nodes the way it's done in the >>> current logic is suspicious: data nodes are sometimes logically >>> dependent on multiple range checks but control dependent on only one. >> >> That's a valid point. 
Is there a better way we could find out which >> predicates are useless after cloning them to the slow and fast loop or >> is it either way not a problem to keep the original ones alive? > > AFAIU, that's not a problem with the patch I propose which only clones > the skeleton predicates. All of them are required to be copied in case > of unrolling. I see, thanks for explaining. >> I agree, I think I added this back there because I was temporarily >> modifying the old_new mapping in a non-conform way on L302. So, I just >> wanted to be sure that everything is reset and works as intended. But >> I'm also fine with just removing this assertion code. > > Should I remove L302 as well? I'm confused by what's going on here. No this line should be fine. Just to be sure we are talking about the same line, I was referring to L302 in your patch (old_new.map(..)). I used the old_new mapping here to get a quick reference from the cloned node in the slow loop (when processing the slow loop after the fast loop) back to the original node in the fast loop on line L280. But I added additional asserts there to ensure that everything is reset as intended. So, it's fine that you removed the assertion code on L257-267. Best regards, Christian >> Some small comment: >> - You probably don't need the assertion on L257 in loopPredicate.cpp as >> you are already checking it in the if-condition on L253. > > Right. I'll remove it. > >> - Same file, you should update the assert message on L268 to something >> like "... projection of an If node". > > I'll make that change too. > > Roland. From HORIE at jp.ibm.com Fri Sep 4 05:47:21 2020 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Fri, 4 Sep 2020 14:47:21 +0900 Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: Message-ID: Hi Gustavo, Thanks for your additional review and guideline for the trivial change! 
Best regards,
Michihiro

From: "Gustavo Romero"
To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-compiler-dev at openjdk.java.net
Cc: "Thomas Stüfe"
Date: 2020/09/04 03:31
Subject: Re: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp

On 8/18/20 5:38 AM, Michihiro Horie wrote:
>
> Dear all,
>
> Would you please review a small change?
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8251926
> Webrev: http://cr.openjdk.java.net/~mhorie/8251926/webrev.00/
>
> The load_const_optimized function in assembler_ppc.cpp has an unused
> variable named return_xd. It looks unnecessary in the current code.

Hi Michi,

The change looks good to me.

As Thomas said that change can be considered trivial, so you can push it right through to jdk master repo without a second Review (only one Review is sufficient for trivial changes).

Best regards,
Gustavo

From tobias.hartmann at oracle.com Fri Sep 4 05:58:41 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 4 Sep 2020 07:58:41 +0200
Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9
In-Reply-To:
References:
Message-ID: <8eead99d-cc14-05b2-10da-e7378a9251ea@oracle.com>

Hi Vladimir,

looks good to me. Some minor comments:
- This notation is confusing "0/{32|64}" (looks like a division). I think it should be something like "{0|32|64}".
- Extra whitespaces in the lines with "return new" and "val , " should be removed.

Best regards,
Tobias

On 03.09.20 23:43, Vladimir Kozlov wrote:
> https://cr.openjdk.java.net/~kvn/8252188/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8252188
>
> Code added by 8248830 [1] uses Node::is_Con() check when looking for constant shift values.
> Unfortunately it does not guarantee that it will be Integer constant because TOP node is also ConNode.
> I used C2 types to check and get shift values. I also refactor code to consolidate checks.
>
> Tested: tier1, hs-tier2, hs-tier3.
> Verified fix with replay file from bug report. > I also checked that RotateBenchmark.java added by 8248830 still creates Rotate vectors after this fix. > > I created subtask to add new regerssion test later because this fix is urgent and I did not have > time to prepare it. > > Thanks, > Vladimir > > [1] https://bugs.openjdk.java.net/browse/JDK-8248830 From filipp.zhinkin at gmail.com Fri Sep 4 07:46:56 2020 From: filipp.zhinkin at gmail.com (Filipp Zhinkin) Date: Fri, 4 Sep 2020 10:46:56 +0300 Subject: RFR: 8251152: ARM32: jtreg c2 Test8202414 test crash In-Reply-To: <88E929A3-E9C5-4BC2-B4E4-9AC8F046623D@oracle.com> References: <88E929A3-E9C5-4BC2-B4E4-9AC8F046623D@oracle.com> Message-ID: Hi, updated webrev: http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.2/ (I've updated years in the copyright). IIRC it's enough to have only one reviewer for such a small change, right? Thanks, Filipp. On Wed, 2 Sep 2020 at 21:53, Igor Ignatyev wrote: > > > On Sep 2, 2020, at 1:06 AM, Filipp Zhinkin > wrote: > > Hi, > > updated webrev: > http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.1/ > > Tests are throwing SkippedException now, as Igor suggested. > > thanks, LGTM. > > Seems like I should also update a year in > JdkInternalMiscUnsafeUnalignedAccess' copyright, but I'm not sure if there > should be a line designating Oracle as intellectual property owner along > with SAP. Should I add it too? > > IANAL, but unless the file was modified by Oracle, it doesn't have to have > Oracle copyright notice. > > Thanks, > -- Igor > > > Thanks, > Filipp. > > On Tue, 1 Sep 2020 at 22:48, Filipp Zhinkin > wrote: > >> Hi Igor, >> >> On Tue, 1 Sep 2020 at 19:46, Igor Ignatyev >> wrote: >> >>> Hi Filipp, >>> >>> 1st of all, welcome back! >>> >> >> thanks! >> >> >>> >>> it would be better to throw jtreg.SkippedException at L#46 so jtreg will >>> reported the test as skipped (as opposed to just passed). 
>> >> Thanks, I'll update Test8202414 as well >> as compiler/unsafe/JdkInternalMiscUnsafeUnalignedAccess (which also skips >> execution the same way). >> >> >>> alternative, you could use '@requires vm.simpleArch != "arm"' to exclude >>> the test from arm32 execution. >>> >> >> I was thinking about adding something like vm.unalignedAccess.enabled, >> but it seems to be too complicated solution for two tests (Test8202414 >> and compiler/unsafe/JdkInternalMiscUnsafeUnalignedAccess). >> I don't want to use 'simpleArch', because if some new platform missing >> unaligned access support will be added in the future then someone will have >> to spend time to find out why a test crashes. >> >> Thanks, >> Filipp. >> >> >>> >>> Thanks, >>> -- Igor >>> >>> >>> > On Sep 1, 2020, at 8:29 AM, Filipp Zhinkin >>> wrote: >>> > >>> > Hi, >>> > >>> > Test8202414 crashes on ARM32 while writing to memory using an unaligned >>> > address. >>> > ARM32 supports unaligned memory accesses for some load/store >>> instructions >>> > under certain conditions, but LDRD (which is used when we're calling >>> > Unsafe::putLong) is always causing alignment fault when called with an >>> > unaligned address [1]. >>> > >>> > The fix is simply skipping the test execution if a platform does not >>> > support unaligned memory accesses. >>> > >>> > Bug: https://bugs.openjdk.java.net/browse/JDK-8251152 >>> > Webrev: >>> http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.0/ >>> > >>> > [1] ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition, >>> ?A3.2.1 >>> > Unaligned data access >>> https://developer.arm.com/documentation/ddi0406/cd >>> >>> > >>> > Thanks, >>> > Filipp. 
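A minimal sketch of the skip-instead-of-pass approach discussed in this thread. Everything here is an assumption made for illustration: the nested `SkippedException` is a local stand-in for the jtreg class of the same name, and the `unalignedAccessSupported` parameter stands in for whatever platform query the real tests use.

```java
class UnalignedAccessSkipSketch {
    // Hypothetical stand-in for jtreg's SkippedException, defined locally
    // only so that this sketch compiles on its own.
    static class SkippedException extends RuntimeException {
        SkippedException(String message) {
            super(message);
        }
    }

    // Hypothetical test body. The boolean parameter is an assumption that
    // replaces the real platform check.
    static String runTest(boolean unalignedAccessSupported) {
        if (!unalignedAccessSupported) {
            // Throwing lets the harness report "skipped" rather than "passed".
            throw new SkippedException("Platform does not support unaligned memory access");
        }
        // The unaligned write under test would happen here.
        return "executed";
    }

    public static void main(String[] args) {
        System.out.println(runTest(true));
        try {
            runTest(false);
        } catch (SkippedException e) {
            System.out.println("skipped: " + e.getMessage());
        }
    }
}
```

The point of throwing rather than returning early is that a silent return is indistinguishable from a real pass in the test report.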
>>> >>> >

From tobias.hartmann at oracle.com Fri Sep 4 08:58:08 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 4 Sep 2020 10:58:08 +0200
Subject: RFR(M): 8219586: CodeHeap State Analytics processes dead nmethods
In-Reply-To: <6DA47071-83F8-4E02-A6A9-E7FD8B9B5813@sap.com>
References: <6DA47071-83F8-4E02-A6A9-E7FD8B9B5813@sap.com>
Message-ID:

Hi Lutz,

hard to review but looks reasonable to me.

In compiledMethod.cpp:72, the brackets and comment are confusing. Why should these fields be treated specially?

Best regards,
Tobias

On 26.08.20 17:20, Schmidt, Lutz wrote:
> Dear all,
>
> may I please request reviews for this fix/improvement to CodeHeap State Analytics. Explained in a nutshell it removes the last holes through which the analysis code could potentially access memory which is no longer associated with the entity being inspected.
>
> There has been a long-lasting, off-list discussion with Erik Österlund until all pitfalls were identified and agreeable solutions were found. The important parts of that discussion are reflected in the bug comments. There are two major changes:
>
> 1) All accesses to the CodeHeap are now protected by continuously holding the CodeCache_lock and, in addition, the Compile_lock. Information is aggregated in local data structures for later printing without holding the above locks.
>
> 2) Printing the names of all code blobs has been disabled except for one operation mode where the locks can be held while printing.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8219586
> Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8219586.02/
>
> This change has JDK-8250635 (currently out for review) as a prerequisite. It will not compile without.
>
> Thank you!
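The "aggregate under the lock, print afterwards" idea in change 1) can be illustrated with a small sketch. This is not the HotSpot code: it is written in Java for illustration only, and the class, the single `heapLock`, and the string entries are all hypothetical stand-ins for the CodeHeap, the CodeCache_lock/Compile_lock pair, and the per-blob statistics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class SnapshotThenPrintSketch {
    // Hypothetical stand-ins for the shared structure and the lock guarding it.
    private final ReentrantLock heapLock = new ReentrantLock();
    private final List<String> liveEntries = new ArrayList<>();

    void add(String entry) {
        heapLock.lock();
        try {
            liveEntries.add(entry);
        } finally {
            heapLock.unlock();
        }
    }

    void clear() {
        heapLock.lock();
        try {
            liveEntries.clear();
        } finally {
            heapLock.unlock();
        }
    }

    // Phase 1: copy everything needed into a private structure while the
    // lock is held, so no entry can disappear during the walk.
    List<String> aggregate() {
        heapLock.lock();
        try {
            return new ArrayList<>(liveEntries);
        } finally {
            heapLock.unlock();
        }
    }

    // Phase 2: format from the private copy only. No lock is needed because
    // the shared structure is never touched here.
    static String print(List<String> snapshot) {
        StringBuilder sb = new StringBuilder();
        for (String entry : snapshot) {
            sb.append(entry).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SnapshotThenPrintSketch heap = new SnapshotThenPrintSketch();
        heap.add("blobA:128");
        heap.add("blobB:64");
        List<String> snapshot = heap.aggregate();
        heap.clear(); // the shared state may change freely after the snapshot
        System.out.print(print(snapshot));
    }
}
```

The trade-off is that the printed data may be stale by the time it is shown, but the printing phase can never dereference something that was freed in the meantime.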
> Lutz
>

From forax at univ-mlv.fr Fri Sep 4 09:03:06 2020
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Fri, 4 Sep 2020 11:03:06 +0200 (CEST)
Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby
In-Reply-To:
References: <9a5ab727-7d9e-a179-e46c-0916aa10ff12@oracle.com> <7296fa39-4f3f-b6e5-d0c8-cfc81c5c550c@oracle.com> <345888053.671840.1599162171937.JavaMail.zimbra@u-pem.fr>
Message-ID: <1701200058.843958.1599210186276.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Charles Oliver Nutter"
> To: "Remi Forax"
> Cc: "Vladimir Ivanov" , "hotspot compiler"
> Sent: Thursday, September 3, 2020 21:48:10
> Subject: Re: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby

> On Thu, Sep 3, 2020 at 2:42 PM Remi Forax wrote:
>> I will say something that doesn't help you,
>> I try not to use one classloader per method and instead use a combination of
>> ahead of time compilation + lookup.defineClass + defineAnonymousClass.
>> I can do that because my base version is Java 11 not Java 8.
>> I plan to use Hidden classes soon, I already have a prototype but I still need to
>> study where to use weak or strong hidden classes.
>
> I'm willing to do any or all of those things! defineAnonymousClass is
> still not a publicly-accessible API, though, which is why I have not
> used it up to now.

Yep, I should not have mentioned it, I use it in a very special case for doing loop customisation where I want to have access to annotations like @ForceInline.

I should have said that instead of using a new classloader, you can directly inject the class in an existing classloader by calling defineClass by reflection in 8 and using lookup.defineClass() in 11. It's far less resource hungry because each classloader has its own metaspace, but you are losing the ability to unload classes.

>
> We don't have any plans to baseline on 11 yet, but could possibly be convinced.
As i said above, you can call ClassLoader.defineClass by reflection in 8. > > I think you understand my use case pretty well... maybe you can point > me at an example of what you think would be the "best" way, assuming > we moved our baseline to 11? We moved to 11 for another reason, being able to use Shenandoah (and maybe in the future ZGC). > > - Charlie R?mi From tobias.hartmann at oracle.com Fri Sep 4 09:09:56 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 4 Sep 2020 11:09:56 +0200 Subject: RFR: 8251464: make Node::dump(int depth) support indent In-Reply-To: <1599193987757.63967@amazon.com> References: <1598731717217.87517@amazon.com> <44f14e58-06d8-b1d9-baa8-88edfed6dd78@oracle.com> <1599193987757.63967@amazon.com> Message-ID: <5fa23374-dfe0-d9ea-a6b9-d3cf432bab64@oracle.com> Hi Xin, thanks for making these changes. That looks good to me. But lets wait for some more opinions from other reviewers. Best regards, Tobias On 04.09.20 06:33, Liu, Xin wrote: > Hi, Tobias, > > Thank you to review the patch. > > Yes, My change does affect PrintIdeal. Indeed, if line-wrapping happens, the readability will drop a lot. > Taking your advice, I introduce a new c2 option PrintIdealIndentThreshold=5. if users attempt to dump an ideal graph > deeper than that level, the indention function will be automatically disable. > > -XX:+PrintIdeal is root()->dump(9999) under the hook. Of course it doesn't indent. Indention still happens if users call "node->dump(-3)" in a debugger. > > Making a beautiful formation is a surprisingly hard task. Previously, I would like to treat Node::_idx as a line number in vim. That's why I make them align on left side. > In this new revision, I indent a few whitespaces in first place. The result looks like this. What do you think? 
> https://bugs.openjdk.java.net/secure/attachment/90041/indent_with_idx.log > > here is the new revison: > http://cr.openjdk.java.net/~xliu/8251464/01/webrev/ > > thanks, > --lx > > > ________________________________________ > From: Tobias Hartmann > Sent: Thursday, September 3, 2020 5:30 AM > To: Liu, Xin; 'hotspot-compiler-dev at openjdk.java.net' > Subject: RE: [EXTERNAL] RFR: 8251464: make Node::dump(int depth) support indent > > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > Hi Xin, > > I'm concerned that the output quickly becomes unreadable when dumping a large graph. > For example, isn't -XX:+PrintIdeal affected by this as well? > > Also, looking at the example output you've posted [1], shouldn't the node id be indented as well? > > This might be helpful when dumping small parts of the graph but then it should be optional (i.e. can > be turned on via a flag/argument if needed). > > Best regards, > Tobias > > [1] https://bugs.openjdk.java.net/secure/attachment/89800/dump2.txt > > On 29.08.20 22:08, Liu, Xin wrote: >> hi, Reviewers, >> >> >> Could you review this patch? >> >> JBS:https://bugs.openjdk.java.net/browse/JDK-8251464 >> >> Webrev: >> >> http://cr.openjdk.java.net/~xliu/8251464/00/webrev/ >> >> >> This patch attempts to improve the formation of nodes when developers try to dump an ideal graph or snippet of a graph. In practice, I found it's pretty handy if Node::dump(int d) can support indent. >> >> The basic idea is to support indention for the utility function: >> >> collect_nodes_i(GrowableArray* queue, const Node* start, int direction, uint depth, bool include_start, bool only_ctrl, bool only_data) >> >> It only affects Node::dump family and -XX::PrintIdeal. It won't impact the output for igv. >> This can help developers who try to inspect a cluster of nodes in gdb. >> >> Another change is naming. 
collect_nodes_i uses breadth-first search. the container is used in fifo way instead of filo. >> I think the name "queue" serve better. >> >> TEST: >> hotspot:tier1 and gtest. >> mach-5 >> >> thanks, >> --lx >> >> From adinn at redhat.com Fri Sep 4 09:27:38 2020 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 4 Sep 2020 10:27:38 +0100 Subject: RFR: 8251464: make Node::dump(int depth) support indent In-Reply-To: <5fa23374-dfe0-d9ea-a6b9-d3cf432bab64@oracle.com> References: <1598731717217.87517@amazon.com> <44f14e58-06d8-b1d9-baa8-88edfed6dd78@oracle.com> <1599193987757.63967@amazon.com> <5fa23374-dfe0-d9ea-a6b9-d3cf432bab64@oracle.com> Message-ID: <83692f83-9de3-5b8c-7399-7da272690d6e@redhat.com> Hi Tobias/Xin On 04/09/2020 10:09, Tobias Hartmann wrote: > thanks for making these changes. That looks good to me. > But lets wait for some more opinions from other reviewers. I'll start by noting that this is not a review, merely user feedback. I use node dumps a lot when debugging C2 and, looking at the supplied example, I don't find the indentation helpful -- in fact, I actually find it slightly disrupts my reading the graph. Also, most of the time I feed graph dumps through a sort process so that nodes end up listed in id order, making it easier to track chains of links in both directions. That re-ordering makes the indentation much less useful. Of course, this is only a report of my way of working. I'm not against the patch per se, so long as it is easy to disable indentation (or have emacs remove it). regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From tobias.hartmann at oracle.com Fri Sep 4 09:29:02 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 4 Sep 2020 11:29:02 +0200 Subject: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes In-Reply-To: <1599154180784.19506@amazon.com> References: <3c989485-754f-b7f5-e91f-c7c0adfdaf88@oracle.com> <1599123405363.99306@amazon.com> <1599154180784.19506@amazon.com> Message-ID: <4fd6b45a-6c8a-e23d-66c9-d709d7561f3a@oracle.com> Hi Xin, On 03.09.20 19:29, Liu, Xin wrote: > How about this? We can clear() the worklist out. > > diff --git a/src/hotspot/share/opto/compile.cpp b/src/hotspot/share/opto/compile.cpp > --- a/src/hotspot/share/opto/compile.cpp > +++ b/src/hotspot/share/opto/compile.cpp > @@ -2089,9 +2089,12 @@ > ResourceMark rm; > PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist); > } > + Unique_Node_List* save_for_igvn = for_igvn(); > set_for_igvn(&new_worklist); > igvn = PhaseIterGVN(initial_gvn()); > igvn.optimize(); > + set_for_igvn(save_for_igvn); > + for_igvn()->clear(); > } That looks good to me. Best regards, Tobias From vladimir.x.ivanov at oracle.com Fri Sep 4 13:37:40 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 4 Sep 2020 16:37:40 +0300 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> <9a5ab727-7d9e-a179-e46c-0916aa10ff12@oracle.com> <7296fa39-4f3f-b6e5-d0c8-cfc81c5c550c@oracle.com> Message-ID: >> I'm not sure I understand fully, but I'm now generously sprinkling >> class LDCs into my jitted methods' static initializers now. This seems >> like an incredibly onerous series of requirements just to load some >> dynamic code. > > So I moved the Class.forName's out of the OneShot constructor and into > the static initializer (as LDC) in the jitted methods' classes. 
I went > ahead and loaded everything that is in the signatures in question. > > https://urldefense.com/v3/__https://gist.github.com/headius/6408b8392096d7932020870022374a9d__;!!GqivPVa7Brio!OkSq8FcGAVnPJ05eVhXtpKqO3AddrYi7X2GF732M6jh55-DknpgkGo2GUB3uPTsQOGKg5fU$ > > Running the original script, I no longer see "unloaded signature > class" warnings from PrintInlining, and as shown in the above gist I > eventually get the asm I expect! > > But this is a hacky workaround, right? Do other frameworks that > dynamically generate code also have to do this aggressive classloading > within that generated code? This doesn't seem right, does it? I agree it requires too much work on user side to workaround the limitation, so I'm looking into possible ways to relax the restriction on JVM side. Still, I don't see it as a general problem for dynamic bytecode generation. If loaded code is placed inside single class loader, stepping on unloaded class (either absent of not-yet-loaded) is much less likely to happen. Having loader per class already stresses JVM in multiple directions and all the downsides should be taken into account before adopting it. So, I see it as yet another initialization-related corner case and it would be nice to get it fixed to avoid the pathological behavior you observe. Best regards, Vladimir Ivanov From igor.ignatyev at oracle.com Fri Sep 4 14:25:15 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 4 Sep 2020 07:25:15 -0700 Subject: RFR: 8251152: ARM32: jtreg c2 Test8202414 test crash In-Reply-To: References: <88E929A3-E9C5-4BC2-B4E4-9AC8F046623D@oracle.com> Message-ID: <2102D49A-5672-4695-A271-3136EB299C3F@oracle.com> Hi Filipp, > On Sep 4, 2020, at 12:46 AM, Filipp Zhinkin wrote: > > Hi, > > updated webrev: http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.2/ (I've updated years in the copyright). 
I don't know the format used in SAP copyright notices, so I can't really tell if you should have added ", 2020" there, this I'd suggest you to seek comments from SAP folks. > IIRC it's enough to have only one reviewer for such a small change, right? right, one reviewer is sufficient for "trivial" changes, but as I said I can't really review changes in SAP copyright line. Cheers, -- Igor > > Thanks, > Filipp. > > > On Wed, 2 Sep 2020 at 21:53, Igor Ignatyev > wrote: > > >> On Sep 2, 2020, at 1:06 AM, Filipp Zhinkin > wrote: >> >> Hi, >> >> updated webrev: http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.1/ >> >> Tests are throwing SkippedException now, as Igor suggested. > thanks, LGTM. > >> Seems like I should also update a year in JdkInternalMiscUnsafeUnalignedAccess' copyright, but I'm not sure if there should be a line designating Oracle as intellectual property owner along with SAP. Should I add it too? > IANAL, but unless the file was modified by Oracle, it doesn't have to have Oracle copyright notice. > > Thanks, > -- Igor >> >> Thanks, >> Filipp. >> >> On Tue, 1 Sep 2020 at 22:48, Filipp Zhinkin > wrote: >> Hi Igor, >> >> On Tue, 1 Sep 2020 at 19:46, Igor Ignatyev > wrote: >> Hi Filipp, >> >> 1st of all, welcome back! >> >> thanks! >> >> >> it would be better to throw jtreg.SkippedException at L#46 so jtreg will reported the test as skipped (as opposed to just passed). >> Thanks, I'll update Test8202414 as well as compiler/unsafe/JdkInternalMiscUnsafeUnalignedAccess (which also skips execution the same way). >> >> alternative, you could use '@requires vm.simpleArch != "arm"' to exclude the test from arm32 execution. >> >> I was thinking about adding something like vm.unalignedAccess.enabled, but it seems to be too complicated solution for two tests (Test8202414 and compiler/unsafe/JdkInternalMiscUnsafeUnalignedAccess). 
>> I don't want to use 'simpleArch', because if some new platform missing unaligned access support will be added in the future then someone will have to spend time to find out why a test crashes. >> >> Thanks, >> Filipp. >> >> >> Thanks, >> -- Igor >> >> >> > On Sep 1, 2020, at 8:29 AM, Filipp Zhinkin > wrote: >> > >> > Hi, >> > >> > Test8202414 crashes on ARM32 while writing to memory using an unaligned >> > address. >> > ARM32 supports unaligned memory accesses for some load/store instructions >> > under certain conditions, but LDRD (which is used when we're calling >> > Unsafe::putLong) is always causing alignment fault when called with an >> > unaligned address [1]. >> > >> > The fix is simply skipping the test execution if a platform does not >> > support unaligned memory accesses. >> > >> > Bug: https://bugs.openjdk.java.net/browse/JDK-8251152 >> > Webrev: http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.0/ >> > >> > [1] ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition, ?A3.2.1 >> > Unaligned data access https://developer.arm.com/documentation/ddi0406/cd >> > >> > Thanks, >> > Filipp. >> > From Roger.Riggs at oracle.com Fri Sep 4 15:07:55 2020 From: Roger.Riggs at oracle.com (Roger Riggs) Date: Fri, 4 Sep 2020 11:07:55 -0400 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> <65ed7919-86fc-adfa-3cd5-58dd96a3487f@linux.ibm.com> Message-ID: <4bc83479-1ed9-8cd8-22a0-1f19f315df7e@oracle.com> Hi Corey, The idea I had in mind is refactoring the fast path into the method you call decodeBlock. Base64: lines 751-768. It leaves all the unknown/illegal character handling to the Java code. And yes, it does not need to handle MIME, except to return on illegal characters. The patch is attached. 
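The fast-path contract described above (decode whole four-character groups, stop at the first group containing a non-Base64 byte, report how many bytes were produced) can be sketched on its own. This is a simplified illustration, not the attached patch: the class name, the locally built lookup table, and the `charsConsumed` helper are assumptions, and output is always written from index 0 of `dst`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DecodeBlockSketch {
    // Lookup table for the standard alphabet; -1 marks any byte that is not
    // a Base64 character (including '=' in this simplified sketch).
    private static final int[] FROM_BASE64 = new int[256];
    static {
        Arrays.fill(FROM_BASE64, -1);
        String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < alphabet.length(); i++) {
            FROM_BASE64[alphabet.charAt(i)] = i;
        }
    }

    // Decodes whole 4-character groups from src[sp, sl) into dst, starting at
    // dst[0]. Stops at the first group that contains a non-Base64 byte and
    // returns the number of bytes written so far.
    static int decodeBlock(byte[] src, int sp, int sl, byte[] dst) {
        int dp = 0;
        int sl0 = sp + ((sl - sp) & ~0b11); // end of the last full group
        while (sp < sl0) {
            int b1 = FROM_BASE64[src[sp++] & 0xff];
            int b2 = FROM_BASE64[src[sp++] & 0xff];
            int b3 = FROM_BASE64[src[sp++] & 0xff];
            int b4 = FROM_BASE64[src[sp++] & 0xff];
            if ((b1 | b2 | b3 | b4) < 0) { // at least one lookup returned -1
                return dp;
            }
            int bits = b1 << 18 | b2 << 12 | b3 << 6 | b4;
            dst[dp++] = (byte) (bits >> 16);
            dst[dp++] = (byte) (bits >> 8);
            dst[dp++] = (byte) bits;
        }
        return dp;
    }

    // The caller can recover how many source characters were consumed:
    // every 3 output bytes correspond to 4 input characters.
    static int charsConsumed(int bytesWritten) {
        return (bytesWritten / 3) * 4;
    }

    public static void main(String[] args) {
        byte[] src = "QUJDREVG".getBytes(StandardCharsets.US_ASCII);
        byte[] dst = new byte[6];
        int written = decodeBlock(src, 0, src.length, dst);
        System.out.println(new String(dst, 0, written, StandardCharsets.US_ASCII));
        System.out.println(charsConsumed(written));
    }
}
```

Because the method only reports bytes written, the slow path can resume at `sp + charsConsumed(written)` and handle the illegal or padding character itself, which keeps exception handling out of any intrinsic replacement.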
Regards, Roger On 8/31/20 6:22 PM, Corey Ashford wrote: > On 8/29/20 1:19 PM, Corey Ashford wrote: >> Hi Roger, >> >> Thanks for your reply and thoughts!? Comments interspersed below: >> >> On 8/28/20 10:54 AM, Roger Riggs wrote: > ... >>> Comparing with the way that the Base64 encoder was intrinsified, the >>> method that is intrinsified should have a method body that does >>> the same function, so it is interchangable.? That likely will just >>> shift >>> the "fast path" code into the decodeBlock method. >>> Keeping the symmetry between encoder and decoder will >>> make it easier to maintain the code. >> >> Good point.? I'll investigate what this looks like in terms of the >> actual code, and will report back (perhaps in a new webrev). >> > > Having looked at this again, I don't think it makes sense.? One thing > that differs significantly from the encodeBlock intrinsic is that the > decodeBlock intrinsic only needs to process a prefix of the data, and > so it can leave virtually any amount of data at the end of the src > buffer unprocessed, where as with the encodeBlock intrinsic, if it > exists, it must process the entire buffer. > > In the (common) case where the decodeBlock intrinsic returns not > having processed everything, it still needs to call the Java code, and > if that Java code is "replaced" by the intrinsic, it's inaccessible. > > Is there something I'm overlooking here?? Basically I want the decode > API to behave differently than the encode API, mostly to make the > arch-specific intrinsic easier to implement. If that's not acceptable, > then I need to rethink the API, and also figure out how to deal with > the illegal character case.? The latter could perhaps be done by > throwing an exception from the intrinsic, or maybe by returning a > negative length that specifies the index of the illegal src byte, and > then have the Java code throw the exception). 
> > Regards, > > - Corey > -------------- next part -------------- diff --git a/src/java.base/share/classes/java/util/Base64.java b/src/java.base/share/classes/java/util/Base64.java index 34b39b18a54..e2b3a686d70 100644 --- a/src/java.base/share/classes/java/util/Base64.java +++ b/src/java.base/share/classes/java/util/Base64.java @@ -741,6 +741,27 @@ public class Base64 { return 3 * (int) ((len + 3L) / 4) - paddings; } + // Fast path for full 4 byte -> 3 byte conversion w/o errors + private int decodeBlock(byte[] src, int sp, int sl, byte[] dst, boolean isURL) { + int[] base64 = isURL ? fromBase64URL : fromBase64; + int dp = 0; + int sl0 = sp + ((sl - sp) & ~0b11); + while (sp < sl0) { + int b1 = base64[src[sp++] & 0xff]; + int b2 = base64[src[sp++] & 0xff]; + int b3 = base64[src[sp++] & 0xff]; + int b4 = base64[src[sp++] & 0xff]; + if ((b1 | b2 | b3 | b4) < 0) { // non base64 byte + return dp; + } + int bits0 = b1 << 18 | b2 << 12 | b3 << 6 | b4; + dst[dp++] = (byte)(bits0 >> 16); + dst[dp++] = (byte)(bits0 >> 8); + dst[dp++] = (byte)(bits0); + } + return dp; + } + private int decode0(byte[] src, int sp, int sl, byte[] dst) { int[] base64 = isURL ? fromBase64URL : fromBase64; int dp = 0; @@ -749,23 +770,34 @@ public class Base64 { while (sp < sl) { if (shiftto == 18 && sp + 4 < sl) { // fast path - int sl0 = sp + ((sl - sp) & ~0b11); - while (sp < sl0) { - int b1 = base64[src[sp++] & 0xff]; - int b2 = base64[src[sp++] & 0xff]; - int b3 = base64[src[sp++] & 0xff]; - int b4 = base64[src[sp++] & 0xff]; - if ((b1 | b2 | b3 | b4) < 0) { // non base64 byte - sp -= 4; - break; - } - int bits0 = b1 << 18 | b2 << 12 | b3 << 6 | b4; - dst[dp++] = (byte)(bits0 >> 16); - dst[dp++] = (byte)(bits0 >> 8); - dst[dp++] = (byte)(bits0); - } - if (sp >= sl) - break; + int dl = decodeBlock(src, sp, sl, dst, isURL); + /* + * Calculate how many characters were processed by how many + * bytes of data were returned. 
+ */ + + /* + * Base64 characters always come in groups of four, + * producing three bytes of binary data (except for on + * the final four-character piece where it can produce + * one to three data bytes depending on how many fill + * characters there - zero, one, or two). The only + * case where there should be a non-multiple of three + * returned is if the intrinsic has processed all of + * the characters passed to it. At this point in the + * logic, however, we know the instrinsic hasn't + * processed all of the chracters. + * + * Round dl down to the nearest three-byte boundary. + */ + dl = (dl / 3) * 3; + + // Recalculate chars_decoded based on the rounded dl + int chars_decoded = (dl / 3) * 4; + + sp += chars_decoded; + dp += dl; + continue; } int b = src[sp++] & 0xff; if ((b = base64[b]) < 0) { From igor.ignatyev at oracle.com Fri Sep 4 16:13:40 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 4 Sep 2020 09:13:40 -0700 Subject: RFR(T) : 8252778 : remove jdk.test.lib.FileInstaller action from compiler/c2/stemmer test In-Reply-To: <6189b097-a1e5-7236-8f62-8c88c1ee4f74@oracle.com> References: <6189b097-a1e5-7236-8f62-8c88c1ee4f74@oracle.com> Message-ID: thanks for your review, Katya. @list, can I get a Review? Cheers, -- Igor > On Sep 3, 2020, at 2:16 PM, Ekaterina Pavlova wrote: > > Looks good. > > -katya > > On 9/3/20 11:40 AM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8252778/webrev.00/ >>> 2 lines changed: 0 ins; 1 del; 1 mod; >> Hi all, >> could you please review this small and trivial cleanup? >> from JBS: >>> `compiler/c2/stemmer` test uses `jdk.test.lib.FileInstaller` to copy "words" file from the test source directory to the current working directory, `compiler.c2.stemmer.Stemmer` can read this file. 
yet, `c.c.s.Stemmer` class treats its 1st argument as a path to the file, given this isn't needed and we can pass "${test.src}/words" instead of "words" >> testing: compiler/c2/stemmer on {linux,windows,macos}-x64 >> JBS: https://bugs.openjdk.java.net/browse/JDK-8252778 >> webrev: http://cr.openjdk.java.net/~iignatyev//8252778/webrev.00/ >> Thanks, >> -- Igor > From igor.ignatyev at oracle.com Fri Sep 4 16:15:22 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 4 Sep 2020 09:15:22 -0700 Subject: RFR(T) : 8252774 : remove jdk.test.lib.FileInstaller action from graalunit tests In-Reply-To: <12adb4ea-a8cf-f331-6c23-66e592e2e47d@oracle.com> References: <12adb4ea-a8cf-f331-6c23-66e592e2e47d@oracle.com> Message-ID: <07381AC0-BC58-4D72-889E-206704E63499@oracle.com> Katya, thank you for your review! @list, can I get a Review? Cheers, -- Igor > On Sep 3, 2020, at 2:15 PM, Ekaterina Pavlova wrote: > > Looks good. > Thanks Igor for fixing it! > > -katya > > On 9/3/20 10:12 AM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8252774/webrev.00 >>> 341 lines changed: 0 ins; 240 del; 101 mod; >> Hi all, >> could you please review this small and trivial clean up in test/hotspot/jtreg/compiler/graalunit? >> from JBS: >>> test/hotspot/jtreg/compiler/graalunit tests use jdk.test.lib.FileInstaller to copy ProblemList-graal.txt from test/hotspot/jtreg/ to the current working directory as ExcludeList.txt, and then run compiler.graalunit.common.GraalUnitTestLauncher w/ -exclude ExcludeList.txt. >>> >>> j.t.l.FileInstaller actions aren't needed as c.g.c.GraalUnitTestLauncher interpeters `-exclude`'s value as path to file (as oppose to the file name in current directory), so we can use ${test.root}/ProblemList-graal.txt instead of ExcludeList.txt there. >> the patch modifies generateTests.sh to use ${test.root}/ProblemList-graal.txt, cleans it up (removes trailing spaces, empty @summary tag, and redundant explicit @build) and regenerates graalunit tests. 
>> testing: test/hotspot/jtreg/compiler/graalunit on {linux,windows,macos}-x64
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8252774
>> webrev: http://cr.openjdk.java.net/~iignatyev//8252774/webrev.00
>> Thanks,
>> -- Igor
>

From hohensee at amazon.com Fri Sep 4 17:39:30 2020
From: hohensee at amazon.com (Hohensee, Paul)
Date: Fri, 4 Sep 2020 17:39:30 +0000
Subject: RFR 8239090: Improve CPU feature support in VM_version
In-Reply-To: <8DBEA512-283E-443A-A820-E215D5EF03EA@amazon.com>
References: <8DBEA512-283E-443A-A820-E215D5EF03EA@amazon.com>
Message-ID: <0641DA2E-3BFA-400D-8214-DE3D8904CB3B@amazon.com>

Slightly adjusted patch.

http://cr.openjdk.java.net/~phh/8239090/webrev.02/

Thanks,
Paul

On 9/3/20, 3:47 PM, "hotspot-compiler-dev on behalf of Hohensee, Paul" wrote:

Taking over from Eric...

Thank you for the review, Igor. I took a completely different (and very old) approach, however, and defined a method Abstract_VM_Version::insert_features_names() that iterates over the feature flags set. If a feature bit is on, it appends to an output buffer a corresponding name string from an array indexed by the bit number. I've implemented it only for x86: using the mechanism for other platforms can be follow-on RFEs. I'd greatly appreciate a review.

Webrev: http://cr.openjdk.java.net/~phh/8239090/webrev.00/

To add a feature bit, all one now has to do is add a CPU_ definition and corresponding name string in the FEATURES_NAMES macro.

I've also included a few small changes to the x86 implementation beyond the above.

1. Unified the previous two bitset definitions into a single Feature_Flag enum and made it a uint64_t.
2. supports_tscinv_bit() referenced the CPU_TSCINV bit, which was a bit misleading, so added a new CPU_TSCINV_BIT mask and used it instead.
3. Repurposed CPU_TSCINV for supports_tscinv(), which was a "composite" property, but is now computed once in feature_flags().
4. Made supports_clflushopt() and supports_clwb() common to both 32 and 64-bit rather than have 32-bit versions that always return 'false'. These bits are never set by the hardware on 32-bit, so no need for separate methods.
5. Renamed CPU_HV_PRESENT to CPU_HV to conform with the CPU_ bit naming scheme. "_PRESENT" is redundant and not used for any other CPU_ name, and the feature string uses "hv", not "hv_present". Added CPU_HV to vmStructs_x86.hpp and vmStructs_jvmci.cpp.

Tested using -Xlog:os+cpu on my macbook pro: the same feature string is returned after the patch as before it. Suggestions for how to more thoroughly test the patch are very welcome.

Thanks,
Paul

On 8/27/20, 6:22 PM, "hotspot-compiler-dev on behalf of Igor Veresov" wrote:

You can actually make a constexpr array of feature objects and then use a constexpr function with a loop to look it up. The C++ compiler will generate an O(1) table lookup for it.
That would be a good way to get rid of the ugly macro (we allow C++14 now).
For example foo() in this example:

enum E { a, b, c };

struct P {
  E _e;    // key
  int _v;  // value
  constexpr P(E e, int v) : _e(e), _v(v) { }
};

constexpr static P ps[3] = { P(a, 0xdead), P(b, 0xbeef), P(c, 0xf00d)};

constexpr int match(E e) {
  for (const auto& p : ps) {
    if (p._e == e) {
      return p._v;
    }
  }
  return -1;
}

int foo(E e) {
  return match(e);
}

Will be compiled into:

__Z3foo1E:                              ## @_Z3foo1E
        .cfi_startproc
## %bb.0:
        movl    $-1, %eax
        cmpl    $2, %edi
        ja      LBB0_2
## %bb.1:
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register %rbp
        movslq  %edi, %rax
        leaq    l_switch.table._Z3foo1E(%rip), %rcx
        movq    (%rcx,%rax,8), %rax
        movl    4(%rax), %eax
        popq    %rbp
LBB0_2:
        retq
        .cfi_endproc
                                        ## -- End function
        .section        __TEXT,__const
        .p2align        4               ## @_ZL2ps
__ZL2ps:
        .long   0                       ## 0x0
        .long   57005                   ## 0xdead
        .long   1                       ## 0x1
        .long   48879                   ## 0xbeef
        .long   2                       ## 0x2
        .long   61453                   ## 0xf00d

        .section        __DATA,__const
        .p2align        3               ## @switch.table._Z3foo1E
l_switch.table._Z3foo1E:
        .quad   __ZL2ps
        .quad   __ZL2ps+8
        .quad   __ZL2ps+16

igor

> On Aug 27, 2020, at 11:08 AM, Eric, Chan wrote:
>
> Hi,
>
> Requesting review for
>
> Webrev : http://cr.openjdk.java.net/~phh/8239090/webrev.00/
> JBS : https://bugs.openjdk.java.net/browse/JDK-8239090
>
> Yesterday I sent a wrong one, so I send it again,
> I improve the "get_processor_features" method by store every cpu features in an enum array so that we don't have to count how many "%s" that need to added. I passed the tier1 test successfully.
>
> Regards,
> Eric Chen
>

From vladimir.kozlov at oracle.com Fri Sep 4 18:39:06 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 4 Sep 2020 11:39:06 -0700
Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9
In-Reply-To: <8eead99d-cc14-05b2-10da-e7378a9251ea@oracle.com>
References: <8eead99d-cc14-05b2-10da-e7378a9251ea@oracle.com>
Message-ID:

Thank you, Tobias

On 9/3/20 10:58 PM, Tobias Hartmann wrote:
> Hi Vladimir,
>
> looks good to me.
> > Some minor comments: > - This notation is confusing "0/{32|64}" (looks like a division). I think it should be something > like "{0|32|64}". Right. Fixed. > - Extra whitespaces in the lines with "return new" and "val , " should be removed. Fixed. Thanks, Vladimir K > > Best regards, > Tobias > > On 03.09.20 23:43, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8252188/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8252188 >> >> Code added by 8248830 [1] uses Node::is_Con() check when looking for constant shift values. >> Unfortunately it does not guarantee that it will be Integer constant because TOP node is also ConNode. >> I used C2 types to check and get shift values. I also refactor code to consolidate checks. >> >> Tested: tier1, hs-tier2, hs-tier3. >> Verified fix with replay file from bug report. >> I also checked that RotateBenchmark.java added by 8248830 still creates Rotate vectors after this fix. >> >> I created subtask to add new regerssion test later because this fix is urgent and I did not have >> time to prepare it. >> >> Thanks, >> Vladimir >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8248830 From vladimir.x.ivanov at oracle.com Fri Sep 4 18:48:00 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 4 Sep 2020 21:48:00 +0300 Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 In-Reply-To: References: Message-ID: <1c7d0be9-1e4b-01a3-a855-3b65e819f385@oracle.com> > https://cr.openjdk.java.net/~kvn/8252188/webrev.00/ Looks good. Best regards, Vladimir Ivanov > https://bugs.openjdk.java.net/browse/JDK-8252188 > > Code added by 8248830 [1] uses Node::is_Con() check when looking for > constant shift values. > Unfortunately it does not guarantee that it will be Integer constant > because TOP node is also ConNode. > I used C2 types to check and get shift values. I also refactor code to > consolidate checks. > > Tested: tier1, hs-tier2, hs-tier3. 
> Verified fix with replay file from bug report. > I also checked that RotateBenchmark.java added by 8248830 still creates > Rotate vectors after this fix. > > I created subtask to add new regerssion test later because this fix is > urgent and I did not have time to prepare it. > > Thanks, > Vladimir > > [1] https://bugs.openjdk.java.net/browse/JDK-8248830 From vladimir.kozlov at oracle.com Fri Sep 4 19:04:05 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 4 Sep 2020 12:04:05 -0700 Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 In-Reply-To: <1c7d0be9-1e4b-01a3-a855-3b65e819f385@oracle.com> References: <1c7d0be9-1e4b-01a3-a855-3b65e819f385@oracle.com> Message-ID: <7c91d3b2-fa5d-ea81-eda6-a2e5ce1e4f67@oracle.com> Thank you, Vladimir Vladimir K On 9/4/20 11:48 AM, Vladimir Ivanov wrote: > >> https://cr.openjdk.java.net/~kvn/8252188/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> https://bugs.openjdk.java.net/browse/JDK-8252188 >> >> Code added by 8248830 [1] uses Node::is_Con() check when looking for constant shift values. >> Unfortunately it does not guarantee that it will be Integer constant because TOP node is also ConNode. >> I used C2 types to check and get shift values. I also refactor code to consolidate checks. >> >> Tested: tier1, hs-tier2, hs-tier3. >> Verified fix with replay file from bug report. >> I also checked that RotateBenchmark.java added by 8248830 still creates Rotate vectors after this fix. >> >> I created subtask to add new regerssion test later because this fix is urgent and I did not have time to prepare it. 
>> >> Thanks, >> Vladimir >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8248830 From igor.veresov at oracle.com Fri Sep 4 23:04:15 2020 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 4 Sep 2020 16:04:15 -0700 Subject: RFR 8239090: Improve CPU feature support in VM_version In-Reply-To: <0641DA2E-3BFA-400D-8214-DE3D8904CB3B@amazon.com> References: <8DBEA512-283E-443A-A820-E215D5EF03EA@amazon.com> <0641DA2E-3BFA-400D-8214-DE3D8904CB3B@amazon.com> Message-ID: <73324AFF-8AB1-4E82-B9B0-F1C11327B26D@oracle.com> This looks good. Did you make FEATURES_NAMES a macro just so that it?s close to the flags enum? igor > On Sep 4, 2020, at 10:39 AM, Hohensee, Paul wrote: > > Slightly adjusted patch. > > http://cr.openjdk.java.net/~phh/8239090/webrev.02/ > > Thanks, > Paul > > ?On 9/3/20, 3:47 PM, "hotspot-compiler-dev on behalf of Hohensee, Paul" wrote: > > Taking over from Eric... > > Thank you for the review, Igor. I took a completely different (and very old approach), however, and defined a method Abstract_VM_Version:: insert_features_names() that iterates over the feature flags set. If a feature bit is on, it appends to an output buffer a corresponding name string from an array indexed by the bit number. I've implemented it only for x86: using the mechanism for other platforms can be follow-on RFEs. I'd greatly appreciate a review. > > Webrev: http://cr.openjdk.java.net/~phh/8239090/webrev.00/ > > To add a feature bit, all one now has to do is add a CPU_ definition and corresponding name string in the FEATURES_NAMES macro. > > I've also included a few small changes to the x86 implementation beyond the above. > > 1. Unified the previous two bitset definitions into a single Feature_Flag enum and made it a uint64_t. > 2. supports_tscinv_bit() referenced the CPU_TSCINV bit, which was a bit misleading, so added a new CPU_TSCINV_BIT mask and used it instead. > 3. 
Repurposed CPU_TSCINV for supports_tscinv(), which was a "composite" property, but is now computed once in feature_flags(). > 4. Made supports_clflushopt() and supports_clwb() common to both 32 and 64-bit rather than have 32-bit versions that always return 'false'. These bits are never set by the hardware on 32-bit, so no need for separate methods. > 5. Renamed CPU_HV_PRESENT to CPU_HV to conform with the CPU_ bit naming scheme. "_PRESENT" is redundant and not used for any other CPU_ name, and the feature string uses "hv", not "hv_present". Added CPU_HV to vmStructs_x86.hpp and vmStructs_jvmci.cpp. > > Tested using -Xlog:os+cpu on my macbook pro: the same feature string is returned after the patch as before it. Suggestions for how to more thoroughly test the patch are very welcome. > > Thanks, > Paul > > On 8/27/20, 6:22 PM, "hotspot-compiler-dev on behalf of Igor Veresov" wrote: > > You can actually make a constexpr array of feature objects and then use constexpr function with a loop to look it up. The c++ compiler will generate an O(1) table lookup for it. > That would be a good way to get rid of the ugly macro (we allow c++14 now). 
> > For example foo() in this example: > > enum E { a, b, c }; > > struct P { > E _e; // key > int _v; // value > constexpr P(E e, int v) : _e(e), _v(v) { } > }; > > > constexpr static P ps[3] = { P(a, 0xdead), P(b, 0xbeef), P(c, 0xf00d)}; > > constexpr int match(E e) { > for (const auto& p : ps) { > if (p._e == e) { > return p._v; > } > } > return -1; > } > > > int foo(E e) { > return match(e); > } > > Will be compiled into: > > __Z3foo1E: ## @_Z3foo1E > .cfi_startproc > ## %bb.0: > movl $-1, %eax > cmpl $2, %edi > ja LBB0_2 > ## %bb.1: > pushq %rbp > .cfi_def_cfa_offset 16 > .cfi_offset %rbp, -16 > movq %rsp, %rbp > .cfi_def_cfa_register %rbp > movslq %edi, %rax > leaq l_switch.table._Z3foo1E(%rip), %rcx > movq (%rcx,%rax,8), %rax > movl 4(%rax), %eax > popq %rbp > LBB0_2: > retq > .cfi_endproc > ## -- End function > .section __TEXT,__const > .p2align 4 ## @_ZL2ps > __ZL2ps: > .long 0 ## 0x0 > .long 57005 ## 0xdead > .long 1 ## 0x1 > .long 48879 ## 0xbeef > .long 2 ## 0x2 > .long 61453 ## 0xf00d > > .section __DATA,__const > .p2align 3 ## @switch.table._Z3foo1E > l_switch.table._Z3foo1E: > .quad __ZL2ps > .quad __ZL2ps+8 > .quad __ZL2ps+16 > > > igor > > >> On Aug 27, 2020, at 11:08 AM, Eric, Chan wrote: >> >> Hi, >> >> Requesting review for >> >> Webrev : http://cr.openjdk.java.net/~phh/8239090/webrev.00/ >> JBS : https://bugs.openjdk.java.net/browse/JDK-8239090 >> >> Yesterday I sent a wrong one, so I send it again, >> I improve the ?get_processor_features? method by store every cpu features in an enum array so that we don?t have to count how many ?%s? that need to added. I passed the tier1 test successfully. 
>>
>> Regards,
>> Eric Chen
>>
>
>
>

From honguye at microsoft.com Fri Sep 4 18:34:05 2020
From: honguye at microsoft.com (Nhat Nguyen)
Date: Fri, 4 Sep 2020 18:34:05 +0000
Subject: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes
Message-ID:

Hi everyone,

I'm sorry for overlooking that for_igvn is not empty once PhaseRenumberLive finishes.

On 03.09.20 19:29, Liu, Xin wrote:
> How about this? We can clear() the worklist out.
>
> diff --git a/src/hotspot/share/opto/compile.cpp b/src/hotspot/share/opto/compile.cpp
> --- a/src/hotspot/share/opto/compile.cpp
> +++ b/src/hotspot/share/opto/compile.cpp
> @@ -2089,9 +2089,12 @@
>        ResourceMark rm;
>        PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist);
>      }
> +    Unique_Node_List* save_for_igvn = for_igvn();
>      set_for_igvn(&new_worklist);
>      igvn = PhaseIterGVN(initial_gvn());
>      igvn.optimize();
> +    set_for_igvn(save_for_igvn);
> +    for_igvn()->clear();
>    }

Thank you Xin for the suggestion -- I totally agree with it.

Nhat

-----Original Message-----
From: Tobias Hartmann
Sent: Friday, September 4, 2020 2:29 AM
To: Liu, Xin ; Christian Hagedorn ; Nhat Nguyen ; hotspot-compiler-dev at openjdk.java.net
Subject: [EXTERNAL] Re: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes

Hi Xin,

On 03.09.20 19:29, Liu, Xin wrote:
> How about this? We can clear() the worklist out.
>
> diff --git a/src/hotspot/share/opto/compile.cpp b/src/hotspot/share/opto/compile.cpp
> --- a/src/hotspot/share/opto/compile.cpp
> +++ b/src/hotspot/share/opto/compile.cpp
> @@ -2089,9 +2089,12 @@
>        ResourceMark rm;
>        PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist);
>      }
> +    Unique_Node_List* save_for_igvn = for_igvn();
>      set_for_igvn(&new_worklist);
>      igvn = PhaseIterGVN(initial_gvn());
>      igvn.optimize();
> +    set_for_igvn(save_for_igvn);
> +    for_igvn()->clear();
>    }

That looks good to me.
Best regards,
Tobias

From vladimir.kozlov at oracle.com Sat Sep 5 02:02:01 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 4 Sep 2020 19:02:01 -0700
Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9
In-Reply-To:
References:
Message-ID: <56ad28d7-1733-7cd7-2fb4-a1a53af8a311@oracle.com>

Thank you, Jatin

On 9/3/20 10:08 PM, Bhateja, Jatin wrote:
> Hi Vladimir,
>
> Thanks for taking care of this.
> Similar strict check for constant shift is needed in OrVNode::Ideal routine in vectornode.cpp.

It took me some time to analyze your code for "lazy de-generation" of Rotate vectors. As I understand it, you want to preserve the scalar optimization which creates Rotate nodes, but have to revert it to keep vectorization of Java code.

Method degenerate_vector_rotate() has an is_Con() check and, in general, the node could be TOP because we do loop optimizations after vectorization. I added an isa_int() check and treat 'cnt' in the other case as a variable, to do the transformation on the 'else' branch and let the sub-graph collapse there. I also refactored degenerate_vector_rotate() to make it compact.

Second, about OrVNode::Ideal(). I am not sure how safe it is without additional investigation because currently it is not executed. Based on the comment it was added for VectorAPI, which is experimental and not pushed yet. The code is convoluted and does not match the scalar Or::Ideal() code.

OrINode::Ideal() does the following checks for left rotation:

if (Matcher::match_rule_supported(Op_RotateLeft) &&
    lopcode == Op_LShiftI && ropcode == Op_URShiftI && in(1)->in(1) == in(2)->in(1)) {

but OrVNode::Ideal() does:

if (Matcher::match_rule_supported_vector(Op_RotateLeftV, vec_len, bt) &&
    ((ropcode == Op_LShiftVI && lopcode == Op_URShiftVI) ||

Why does it check the RIGHT operator for LShiftV? And the asserts are contradictory:

assert(Op_RShiftCntV == in(1)->in(2)->Opcode(), "LShiftCntV operand expected");

Was this code tested?
My immediate reaction is to simply delete it now and add a reworked and tested version back, with an EnableVectorSupport flag check, after VectorAPI is integrated. The reworked version may use the same new rotate_shift() I added.

I started rewriting it, but I can't test it and I am not sure: maybe the edges are indeed swapped. I am suggesting to remove it.

Also, VectorAPI should use Rotate vectors from the start, which we can degenerate if they are not supported. So I am not sure how OrVNode::Ideal() will be useful for VectorAPI either.

http://cr.openjdk.java.net/~kvn/8252188/webrev.01/

About testing. I see you used a lot of -128, 128 and similar values, which are larger than the number of bits in Java Integer and Long. But Java masks the shift count by default before executing a shift. I would prefer if something like 31 (or 63 for Long) were used instead. Otherwise Rotate vectors are not generated and tested.

compiler/intrinsics/TestRotate.java calls verify() after each operation; as a result it is really hard to see the generated assembler. I think we should at least exclude inlining of verify().

I will work on the tests and have another update.

Thanks,
Vladimir K

>
> Regards,
> Jatin
>
>> -----Original Message-----
>> From: hotspot-compiler-dev On
>> Behalf Of Vladimir Kozlov
>> Sent: Friday, September 4, 2020 3:14 AM
>> To: hotspot compiler
>> Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*,
>> bool)+0x8b9
>>
>> https://cr.openjdk.java.net/~kvn/8252188/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8252188
>>
>> Code added by 8248830 [1] uses Node::is_Con() check when looking for
>> constant shift values.
>> Unfortunately it does not guarantee that it will be Integer constant
>> because TOP node is also ConNode.
>> I used C2 types to check and get shift values. I also refactor code to
>> consolidate checks.
>>
>> Tested: tier1, hs-tier2, hs-tier3.
>> Verified fix with replay file from bug report.
>> I also checked that RotateBenchmark.java added by 8248830 still creates
>> Rotate vectors after this fix.
>>
>> I created subtask to add new regression test later because this fix is
>> urgent and I did not have time to prepare it.
>>
>> Thanks,
>> Vladimir
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8248830

From aph at redhat.com Sat Sep 5 14:47:00 2020
From: aph at redhat.com (Andrew Haley)
Date: Sat, 5 Sep 2020 15:47:00 +0100
Subject: JDK-8173585: Intrinsify StringLatin1.indexOf(char)
In-Reply-To: <0cbe7d8f594349b59504c42f89c6f268@EX13D46EUB003.ant.amazon.com>
References: <0cbe7d8f594349b59504c42f89c6f268@EX13D46EUB003.ant.amazon.com>
Message-ID:

On 03/09/2020 22:28, Tatton, Jason wrote:
>
> JMH Benchmark results:
> ====================
> The benchmarks examine the 3 codepaths for StringLatin1 and
> StringUTF16. Here are the results for Intel x86 (ARM is similar):
>
> FYI, String lengths in characters (1byte for Latin1, 2bytes for UTF16):
>         Latin1  UTF16
> Short:  15      7
> SSE4:   16      8
> AVX2:   32      16
>
> Without StringLatin1 indexofchar intrinsic:
> Benchmark                              Mode  Cnt       Score      Error  Units
> IndexOfBenchmark.latin1_AVX2_String   thrpt    5  121781.424 ±  355.085  ops/s
> IndexOfBenchmark.latin1_AVX2_char     thrpt    5   46060.612 ±  151.274  ops/s
> IndexOfBenchmark.latin1_SSE4_String   thrpt    5  197339.146 ±   90.333  ops/s
> IndexOfBenchmark.latin1_SSE4_char     thrpt    5   61401.204 ±  426.761  ops/s
> IndexOfBenchmark.latin1_Short_String  thrpt    5  175389.355 ±  294.976  ops/s
> IndexOfBenchmark.latin1_Short_char    thrpt    5   60759.868 ±  124.349  ops/s
> IndexOfBenchmark.utf16_AVX2_String    thrpt    5  123601.020 ±  111.981  ops/s
> IndexOfBenchmark.utf16_AVX2_char      thrpt    5  141116.832 ±  380.489  ops/s
> IndexOfBenchmark.utf16_SSE4_String    thrpt    5  178136.762 ±  143.227  ops/s
> IndexOfBenchmark.utf16_SSE4_char      thrpt    5  181430.649 ±  120.097  ops/s
> IndexOfBenchmark.utf16_Short_String   thrpt    5  158301.361 ±  182.738  ops/s
> IndexOfBenchmark.utf16_Short_char     thrpt    5   84876.919 ±  247.769  ops/s
>
> With StringLatin1 indexofchar intrinsic:
> Benchmark                              Mode  Cnt       Score      Error  Units
> IndexOfBenchmark.latin1_AVX2_String   thrpt    5  113621.676 ±   68.235  ops/s
> IndexOfBenchmark.latin1_AVX2_char     thrpt    5  177757.909 ±  727.308  ops/s
> IndexOfBenchmark.latin1_SSE4_String   thrpt    5  180529.049 ±   57.356  ops/s
> IndexOfBenchmark.latin1_SSE4_char     thrpt    5  235087.776 ±  457.024  ops/s
> IndexOfBenchmark.latin1_Short_String  thrpt    5  165914.990 ±  329.024  ops/s
> IndexOfBenchmark.latin1_Short_char    thrpt    5   53989.544 ±   65.393  ops/s
> IndexOfBenchmark.utf16_AVX2_String    thrpt    5  107632.783 ±  446.272  ops/s
> IndexOfBenchmark.utf16_AVX2_char      thrpt    5  143131.734 ±  159.944  ops/s
> IndexOfBenchmark.utf16_SSE4_String    thrpt    5  169882.703 ± 1024.367  ops/s
> IndexOfBenchmark.utf16_SSE4_char      thrpt    5  175693.972 ±  775.423  ops/s
> IndexOfBenchmark.utf16_Short_String   thrpt    5  163595.993 ±  225.089  ops/s
> IndexOfBenchmark.utf16_Short_char     thrpt    5   90126.154 ±  365.642  ops/s
>
> We can see above that indexOf(char) now behaves similarly between
> StringUTF16 and StringLatin1.

This is confusing. Can you please make the times nanoseconds? It's quite a struggle trying to think in reciprocal units for these very low-level benchmarks. Maybe it's just me.

There are 1000 strings of length 32 bytes, so I guess that makes everything fit in L1, just. I guess that was the idea?

> //'a is never present in rnd string

So you only benchmark searches that always fail? I don't get that at all.

I'd also vary string lengths. 32 characters is a good average, so you should have a decent spread of different lengths, averaging 32 over the whole set. I'd place a terminating character randomly in *at least* 50% of the strings. I think that would be much more representative.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From christoph.langer at sap.com Sat Sep 5 20:48:57 2020
From: christoph.langer at sap.com (Langer, Christoph)
Date: Sat, 5 Sep 2020 20:48:57 +0000
Subject: RFR: 8251152: ARM32: jtreg c2 Test8202414 test crash
In-Reply-To: <2102D49A-5672-4695-A271-3136EB299C3F@oracle.com>
References: <88E929A3-E9C5-4BC2-B4E4-9AC8F046623D@oracle.com> <2102D49A-5672-4695-A271-3136EB299C3F@oracle.com>
Message-ID:

Hi,

> > updated webrev:
> http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.2/
> (I've
> updated years in the copyright).
>
> I don't know the format used in SAP copyright notices, so I can't really tell if
> you should have added ", 2020" there, this I'd suggest you to seek comments
> from SAP folks.

As for the SAP copyright, the webrev looks perfect. For the SAP copyright notice, we don't append a comma after the trailing year.

Best regards
Christoph

From iignatyev at openjdk.java.net Sun Sep 6 16:37:51 2020
From: iignatyev at openjdk.java.net (Igor Ignatyev)
Date: Sun, 6 Sep 2020 16:37:51 GMT
Subject: RFR: 8252778: remove jdk.test.lib.FileInstaller action from compiler/c2/stemmer test
Message-ID:

pre-Skara RFR [thread](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039878.html)

Hi all,

could you please review this small and trivial cleanup?

from [JBS](https://bugs.openjdk.java.net/browse/JDK-8252778):
> `compiler/c2/stemmer` test uses `jdk.test.lib.FileInstaller` to copy "words" file from the test source directory to the
> current working directory, `compiler.c2.stemmer.Stemmer` can read this file.
yet, `c.c.s.Stemmer` class treats its 1st > argument as a path to the file, given this isn't needed and we can pass "${test.src}/words" instead of "words" testing: compiler/c2/stemmer on {linux,windows,macos}-x64 ------------- Commit messages: - 8252778: remove jdk.test.lib.FileInstaller action from compiler/c2/stemmer test Changes: https://git.openjdk.java.net/jdk/pull/33/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=33&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252778 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/33.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/33/head:pull/33 PR: https://git.openjdk.java.net/jdk/pull/33 From iignatyev at openjdk.java.net Sun Sep 6 16:44:13 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Sun, 6 Sep 2020 16:44:13 GMT Subject: RFR: 8252774: remove jdk.test.lib.FileInstaller action from graalunit tests Message-ID: <_i2vOeOxRNWdga2jASmgHTwNYz0H3fLX-xdomiBOyG0=.3230004b-7bc1-4d36-b81b-152671890b40@github.com> [pre-Skara RFR](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039874.html) Hi all, could you please review this small and trivial clean up in `test/hotspot/jtreg/compiler/graalunit`? from [JBS](https://bugs.openjdk.java.net/browse/JDK-8252774): > `test/hotspot/jtreg/compiler/graalunit` tests use `jdk.test.lib.FileInstaller` to copy `ProblemList-graal.txt` from > `test/hotspot/jtreg/` to the current working directory as `ExcludeList.txt`, and then run > `compiler.graalunit.common.GraalUnitTestLauncher` w/ `-exclude ExcludeList.txt`. `j.t.l.FileInstaller` actions aren't needed as `c.g.c.GraalUnitTestLauncher` interpeters `-exclude`'s value as path to file (as oppose to the file name in current directory), so we can use `${test.root}/ProblemList-graal.txt` instead of `ExcludeList.txt` there. 
the patch modifies `generateTests.sh` to use `${test.root}/ProblemList-graal.txt`, cleans it up (removes trailing spaces, empty `@summary` tag, and redundant explicit `@build`) and regenerates graalunit tests. testing: `test/hotspot/jtreg/compiler/graalunit` on {linux,windows,macos}-x64 ------------- Commit messages: - 8252774: remove jdk.test.lib.FileInstaller action from graalunit tests Changes: https://git.openjdk.java.net/jdk/pull/34/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=34&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252774 Stats: 341 lines in 48 files changed: 0 ins; 240 del; 101 mod Patch: https://git.openjdk.java.net/jdk/pull/34.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/34/head:pull/34 PR: https://git.openjdk.java.net/jdk/pull/34 From iignatyev at openjdk.java.net Sun Sep 6 16:50:38 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Sun, 6 Sep 2020 16:50:38 GMT Subject: RFR: 8252774: remove jdk.test.lib.FileInstaller action from graalunit tests In-Reply-To: <_i2vOeOxRNWdga2jASmgHTwNYz0H3fLX-xdomiBOyG0=.3230004b-7bc1-4d36-b81b-152671890b40@github.com> References: <_i2vOeOxRNWdga2jASmgHTwNYz0H3fLX-xdomiBOyG0=.3230004b-7bc1-4d36-b81b-152671890b40@github.com> Message-ID: <1GpFfYfl_s2ovcj4VN-eIV9jfbpedDC3st-clMh37nU=.4b0aaea4-686f-4134-a4f3-092f873fa648@github.com> On Sun, 6 Sep 2020 16:37:47 GMT, Igor Ignatyev wrote: > [pre-Skara RFR](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039874.html) > Hi all, > > could you please review this small and trivial clean up in `test/hotspot/jtreg/compiler/graalunit`? > from [JBS](https://bugs.openjdk.java.net/browse/JDK-8252774): >> `test/hotspot/jtreg/compiler/graalunit` tests use `jdk.test.lib.FileInstaller` to copy `ProblemList-graal.txt` from >> `test/hotspot/jtreg/` to the current working directory as `ExcludeList.txt`, and then run >> `compiler.graalunit.common.GraalUnitTestLauncher` w/ `-exclude ExcludeList.txt`. 
> > `j.t.l.FileInstaller` actions aren't needed as `c.g.c.GraalUnitTestLauncher` interpeters `-exclude`'s value as path to > file (as oppose to the file name in current directory), so we can use `${test.root}/ProblemList-graal.txt` instead of > `ExcludeList.txt` there. > the patch modifies `generateTests.sh` to use `${test.root}/ProblemList-graal.txt`, cleans it up (removes trailing > spaces, empty `@summary` tag, and redundant explicit `@build`) and regenerates graalunit tests. > testing: `test/hotspot/jtreg/compiler/graalunit` on {linux,windows,macos}-x64 > @iignatev This change now passes all automated pre-integration checks. ... no, it's not. `epavlova` isn't a Reviewer in JDK project, so this PR and #33 aren't ready to be integrated. @edvbld , @rwestberg, could you please take a look? ------------- PR: https://git.openjdk.java.net/jdk/pull/34 From dnsimon at openjdk.java.net Sun Sep 6 19:53:53 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Sun, 6 Sep 2020 19:53:53 GMT Subject: RFR: 8252543: [JVMCI] Libgraal can deadlock in blocking compilation mode Message-ID: <5IFx6Bjetu0eYmCpkgbsRmMVj6i-Xf1tt4LMHvWx91w=.7f520889-ab7d-49de-aa58-0c0608627edb@github.com> To prevent a deadlock in libgraal under `-Xcomp` or `-Xbatch` due to a lock being held in libgraal, a new mechanism is added by this change that allow JVMCI compiler threads to communicate their "progress" to HotSpot: * Each JVMCI compiler thread has a "compilation ticks" counter. * There is also a global JVMCI compilation ticks counter. * Each JVMCI VM call increments the JVMCI compiler thread-local compilation ticks counter. * Every 512 increments of such a counter also increments the global counter. * A thread waiting on a blocking JVMCI compilation will be unblocked if these counters indicate no progress after a defined period. 
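The tick scheme listed above can be sketched in standalone C++ as follows. This is only an illustration of the idea under stated assumptions, not the code in the pull request: the names `CompilerThreadTicks`, `ProgressWatch`, `on_vm_call` and `poll`, and the rule "give up after N polls without any counter change", are hypothetical. Only the per-thread counter, the global counter and the period of 512 come from the description.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

namespace sketch {

// Global counter: bumped once per kGlobalPeriod ticks of a compiler thread.
std::atomic<uint64_t> global_ticks{0};
constexpr uint64_t kGlobalPeriod = 512;  // period taken from the description

// Per-compiler-thread state (hypothetical name).
struct CompilerThreadTicks {
  std::atomic<uint64_t> local{0};

  // Called on every call the compiler thread makes into the VM.
  void on_vm_call() {
    uint64_t n = local.fetch_add(1, std::memory_order_relaxed) + 1;
    if (n % kGlobalPeriod == 0) {
      global_ticks.fetch_add(1, std::memory_order_relaxed);
    }
  }
};

// Used by a thread blocked on a compilation. poll() is meant to be called
// once per wait interval; the interval itself is outside this sketch.
class ProgressWatch {
 public:
  ProgressWatch(const CompilerThreadTicks& thread, int max_stale_polls)
      : thread_(thread),
        max_stale_polls_(max_stale_polls),
        last_local_(thread.local.load(std::memory_order_relaxed)),
        last_global_(global_ticks.load(std::memory_order_relaxed)) {}

  // Returns true while the waiter should keep waiting, false once
  // max_stale_polls consecutive polls have seen no counter change.
  bool poll() {
    uint64_t l = thread_.local.load(std::memory_order_relaxed);
    uint64_t g = global_ticks.load(std::memory_order_relaxed);
    if (l != last_local_ || g != last_global_) {
      last_local_ = l;
      last_global_ = g;
      stale_polls_ = 0;  // progress was made since the last poll
    } else {
      ++stale_polls_;
    }
    return stale_polls_ < max_stale_polls_;
  }

 private:
  const CompilerThreadTicks& thread_;
  const int max_stale_polls_;
  uint64_t last_local_;
  uint64_t last_global_;
  int stale_polls_ = 0;
};

}  // namespace sketch
```

A waiter would construct a `ProgressWatch` for the compiler thread it depends on, call `poll()` after each timed wait, and stop blocking once it returns false. Whether the real change consults the local counter, the global one, or both in this way is an assumption of the sketch.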
-------------

Commit messages:
 - add compilation ticks for mitigating against deadlock due to blocking JVMCI compilation

Changes: https://git.openjdk.java.net/jdk/pull/35/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=35&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8252543
Stats: 126 lines in 11 files changed: 72 ins; 14 del; 40 mod
Patch: https://git.openjdk.java.net/jdk/pull/35.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/35/head:pull/35

PR: https://git.openjdk.java.net/jdk/pull/35

From goetz.lindenmaier at sap.com Mon Sep 7 06:35:04 2020
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 7 Sep 2020 06:35:04 +0000
Subject: Crashes on ppc/s390 after 8231441: AArch64: Initial SVE backend support
Message-ID:

Hi

Since that change was pushed, the vm crashes in the build:

To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/type.cpp:1022
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/usr/work/... /share/opto/type.cpp:1022), pid=28717, tid=28983
# assert(_type_info[base()].dual_type != Bad) failed: implement with v-call
#
# JRE version: OpenJDK Runtime Environment (16.0.0.1) (fastdebug build 16.0.0.1-internal+0-adhoc.openjdk.jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 16.0.0.1-internal+0-adhoc.openjdk.jdk, mixed mode, tiered, compressed oops, g1 gc, linux-ppc64)
# Problematic frame:
# V [libjvm.so+0x1bfe22c] Type::xdual() const+0xfc
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

Do you have an ad-hoc idea of the problem?

I locally backed out the change which fixed the issue.

Best regards,
Goetz.

From Ningsheng.Jian at arm.com Mon Sep 7 06:50:36 2020
From: Ningsheng.Jian at arm.com (Ningsheng Jian)
Date: Mon, 7 Sep 2020 06:50:36 +0000
Subject: Crashes on ppc/s390 after 8231441: AArch64: Initial SVE backend support
In-Reply-To:
References:
Message-ID:

Hi Goetz,

I am sorry about that and thanks for helping to identify the issue. As I cannot reproduce this on x86 and AArch64, a quick guess is that I may have missed the code like:

diff --git a/src/hotspot/share/opto/type.cpp b/src/hotspot/share/opto/type.cpp
index 2f4047dfa8e..f7d2f5b2320 100644
--- a/src/hotspot/share/opto/type.cpp
+++ b/src/hotspot/share/opto/type.cpp
@@ -62,12 +62,14 @@ const Type::TypeInfo Type::_type_info[Type::lastype] = {
   { Bad, T_ARRAY, "array:", false, Node::NotAMachineReg, relocInfo::none }, // Array

 #if defined(PPC64)
+  { Bad, T_ILLEGAL, "vectora:", false, Op_VecA, relocInfo::none }, // VectorA.
   { Bad, T_ILLEGAL, "vectors:", false, 0, relocInfo::none }, // VectorS
   { Bad, T_ILLEGAL, "vectord:", false, Op_RegL, relocInfo::none }, // VectorD
   { Bad, T_ILLEGAL, "vectorx:", false, Op_VecX, relocInfo::none }, // VectorX
   { Bad, T_ILLEGAL, "vectory:", false, 0, relocInfo::none }, // VectorY
   { Bad, T_ILLEGAL, "vectorz:", false, 0, relocInfo::none }, // VectorZ
 #elif defined(S390)
+  { Bad, T_ILLEGAL, "vectora:", false, Op_VecA, relocInfo::none }, // VectorA.
   { Bad, T_ILLEGAL, "vectors:", false, 0, relocInfo::none }, // VectorS
   { Bad, T_ILLEGAL, "vectord:", false, Op_RegL, relocInfo::none }, // VectorD
   { Bad, T_ILLEGAL, "vectorx:", false, 0, relocInfo::none }, // VectorX

Could you please help to have a try?
Thanks,
Ningsheng

> -----Original Message-----
> From: Lindenmaier, Goetz
> Sent: Monday, September 7, 2020 2:35 PM
> To: Ningsheng Jian ; Andrew Dinn ;
> hotspot-compiler-dev at openjdk.java.net; build-dev at openjdk.java.net; Vladimir
> Ivanov ; Erik Österlund
> Cc: aarch64-port-dev at openjdk.java.net; Doerr, Martin
> Subject: Crashes on ppc/s390 after 8231441: AArch64: Initial SVE backend support
>
> Hi
>
> Since that change was pushed, the vm crashes in the build:
>
> # To suppress the following error report, specify this argument
> # after -XX: or in .hotspotrc: SuppressErrorAt=/type.cpp:1022
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (/usr/work/... /share/opto/type.cpp:1022), pid=28717, tid=28983
> # assert(_type_info[base()].dual_type != Bad) failed: implement with v-call
> #
> # JRE version: OpenJDK Runtime Environment (16.0.0.1) (fastdebug build 16.0.0.1-internal+0-adhoc.openjdk.jdk)
> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 16.0.0.1-internal+0-adhoc.openjdk.jdk, mixed mode, tiered, compressed oops, g1 gc, linux-ppc64)
> # Problematic frame:
> # V [libjvm.so+0x1bfe22c] Type::xdual() const+0xfc
> #
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>
> Do you have an ad-hoc idea of the problem?
>
> I locally backed out the change which fixed the issue.
>
> Best regards,
> Goetz.

From goetz.lindenmaier at sap.com Mon Sep 7 07:06:50 2020
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 7 Sep 2020 07:06:50 +0000
Subject: Crashes on ppc/s390 after 8231441: AArch64: Initial SVE backend support
In-Reply-To:
References:
Message-ID:

Hi,

Thanks for the hint! Sounds good, I'll give it a try.

... I removed some emails to reduce traffic.

Best regards,
Goetz.
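Editorial sketch, not part of the thread: one plausible reading of the failed assertion is that the PPC64 and S390 variants of the table fell one row behind the enum that indexes it once a new vector type was added. The Java analogue below only illustrates that effect. The names `TypeTableSketch`, `Kind`, `staleTable` and `fixedTable` are hypothetical, and the real table is a C++ array in type.cpp with more columns.

```java
import java.util.Arrays;

class TypeTableSketch {
    // Hypothetical, much simplified stand-in for the enum that indexes the table.
    enum Kind { ARRAY, VECTOR_A, VECTOR_S, VECTOR_D }

    // Table as it looked before VECTOR_A was added to the enum: one row short,
    // so every row after ARRAY sits one slot too early and the last slot
    // keeps its default value (null here).
    static String[] staleTable() {
        String[] oldRows = { "array:", "vectors:", "vectord:" };
        return Arrays.copyOf(oldRows, Kind.values().length);
    }

    // Table with the missing row inserted at the position of the new constant.
    static String[] fixedTable() {
        return new String[] { "array:", "vectora:", "vectors:", "vectord:" };
    }

    // Enum-indexed lookup, as done with base() in the assertion above.
    static String lookup(String[] table, Kind kind) {
        return table[kind.ordinal()];
    }

    public static void main(String[] args) {
        for (Kind kind : Kind.values()) {
            System.out.println(kind + ": stale=" + lookup(staleTable(), kind)
                    + ", fixed=" + lookup(fixedTable(), kind));
        }
    }
}
```

With the stale table, every lookup past the insertion point returns a neighbouring row and the last one returns the default value, which is the kind of inconsistency a sanity check on the looked-up row would trip over.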
From rwestrel at redhat.com Mon Sep 7 07:31:07 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 07 Sep 2020 09:31:07 +0200
Subject: RFR(S): 8252696: Loop unswitching may cause out of bound array load to be executed
In-Reply-To: <09ab280a-1e1b-f259-7fd4-108088c4a58d@oracle.com>
References: <877dtb5li0.fsf@redhat.com> <56f13a1d-bf0a-1990-efbb-ceb34ca9bd38@oracle.com> <874koe6fgm.fsf@redhat.com> <09ab280a-1e1b-f259-7fd4-108088c4a58d@oracle.com>
Message-ID: <87y2lm3ux0.fsf@redhat.com>

> No this line should be fine. Just to be sure we are talking about the
> same line, I was referring to L302 in your patch (old_new.map(..)). I
> used the old_new mapping here to get a quick reference from the cloned
> node in the slow loop (when processing the slow loop after the fast
> loop) back to the original node in the fast loop on line L280. But I
> added additional asserts there to ensure that everything is reset as
> intended. So, it's fine that you removed the assertion code on L257-267.

I see. So would that logic break if the slow loop is processed first?

Aren't the tests:

slow_node != NULL && slow_node->_idx > idx_before_clone

redundant? Wouldn't:

slow_node->_idx > idx_before_clone

or

fast_node->_idx <= idx_before_clone

be sufficient to figure out whether a node is part of the slow or fast loop?

Roland.

From lutz.schmidt at sap.com Mon Sep 7 08:02:08 2020
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Mon, 7 Sep 2020 08:02:08 +0000
Subject: RFR(S): 8250635: MethodArityHistogram should use Compile_lock in favour of fancy checks
In-Reply-To:
References: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com>
Message-ID: <1E234123-9501-40DF-90DE-5B840BE72B6F@sap.com>

Hi Tobias,

thanks a lot for the review. And sorry for my delayed response. I was out of home office for a few days.

Best Regards,
Lutz

On 03.09.20, 14:18, "Tobias Hartmann" wrote:

Hi Lutz,

looks good to me.
Best regards,
Tobias

On 26.08.20 17:18, Schmidt, Lutz wrote:
> Dear all,
>
> may I please request reviews for this small enhancement? Instead of calling a method doing complicated and fancy (hard to understand) checks, the iteration over all nmethods is now protected by holding the Compile_lock in addition to the CodeCache_lock.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8250635
> Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8250635.00/
>
> Thank you!
> Lutz
>
>

From rwestrel at redhat.com Mon Sep 7 08:09:33 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 07 Sep 2020 10:09:33 +0200
Subject: RFR(S): 8252696: Loop unswitching may cause out of bound array load to be executed
In-Reply-To: <87y2lm3ux0.fsf@redhat.com>
References: <877dtb5li0.fsf@redhat.com> <56f13a1d-bf0a-1990-efbb-ceb34ca9bd38@oracle.com> <874koe6fgm.fsf@redhat.com> <09ab280a-1e1b-f259-7fd4-108088c4a58d@oracle.com> <87y2lm3ux0.fsf@redhat.com>
Message-ID: <87v9gq3t4y.fsf@redhat.com>

I see Christian is out for a couple weeks. I added the verification code that I had removed back as it's not directly related to the bug fix and can be discussed further when Christian is back. I also made the small changes he suggested:

http://cr.openjdk.java.net/~roland/8252696/webrev.01/

Roland.

From goetz.lindenmaier at sap.com Mon Sep 7 09:46:00 2020
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 7 Sep 2020 09:46:00 +0000
Subject: Crashes on ppc/s390 after 8231441: AArch64: Initial SVE backend support
In-Reply-To:
References:
Message-ID:

Hi Ningsheng,

I also opened an issue for this: https://bugs.openjdk.java.net/browse/JDK-8252846

So far, your proposed fix helps, I now put it into our testing queue. I would already have posted the webrev, but I guess I need to do it with skara ... where I first have to look at the new tooling.

Best regards
Goetz.

From enikitin at openjdk.java.net Mon Sep 7 10:05:44 2020
From: enikitin at openjdk.java.net (Evgeny Nikitin)
Date: Mon, 7 Sep 2020 10:05:44 GMT
Subject: RFR: 8166554: Avoid compilation blocking in OverloadCompileQueueTest.java
Message-ID:

**Problem explanation**

1. The stress test [uses 0.8 of allowed running time](https://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/CodeCacheStressRunner.java#l40) + 10 seconds, thus exceeding the limit;
2. 10 additional seconds are [given by the VM](https://hg.openjdk.java.net/jdk/jdk/file/6db0cb3893c5/src/hotspot/share/runtime/vmOperations.cpp#l388) for stuck compiler threads to finish;
3. Compiler threads aren't progressing due to the compilation being blocked;
4. Compilation is blocked via WhiteBox by the test, its lockUnlock thread;
5. The lockUnlock thread doesn't unblock the compilation because [it is a daemon](https://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/Helper.java#l59), it is stopped in the middle of the sleep.

**Solution**

Since the 'lockUnlock' is started via InfiniteLoop, it's not possible to un-daemon it. So I just turned the lockUnlock method into a Thread descendant, which got joined in the end.
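Editorial sketch, not the actual patch: the idea described in the solution above can be pictured as a non-daemon worker that is told to stop and then joined, so the block is always released before the test ends. Everything named here (`LockUnlockSketch`, `LockUnlockWorker`, `compilationBlocked`, `runAndJoin`) is hypothetical, and an `AtomicBoolean` stands in for whatever WhiteBox call the real test uses to block compilation.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class LockUnlockSketch {
    // Hypothetical stand-in for the "compilation is blocked" state that the
    // real test toggles through WhiteBox.
    static final AtomicBoolean compilationBlocked = new AtomicBoolean(false);

    static class LockUnlockWorker extends Thread {
        volatile boolean isActive = true;

        LockUnlockWorker() {
            // Explicitly non-daemon: a daemon thread can be abandoned at JVM
            // exit in the middle of a sleep, leaving the block in place.
            setDaemon(false);
        }

        @Override
        public void run() {
            try {
                while (isActive) {
                    compilationBlocked.set(true);   // block
                    Thread.sleep(5);
                    compilationBlocked.set(false);  // unblock before re-checking the flag
                    Thread.sleep(5);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                compilationBlocked.set(false);      // never exit while still blocking
            }
        }
    }

    // Runs the worker for roughly runMillis, then stops and joins it.
    // Returns whether the block is still held after the join.
    static boolean runAndJoin(long runMillis) {
        LockUnlockWorker worker = new LockUnlockWorker();
        worker.start();
        try {
            Thread.sleep(runMillis);
        } catch (InterruptedException e) {
            throw new IllegalStateException("sketch was interrupted", e);
        } finally {
            worker.isActive = false;                // always ask the worker to stop
        }
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException("sketch was interrupted", e);
        }
        return compilationBlocked.get();
    }

    public static void main(String[] args) {
        System.out.println("still blocked after join: " + runAndJoin(50));
    }
}
```

The join is what makes the difference in this sketch: the caller cannot proceed to shutdown until the worker has passed through its `finally` block.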
-------------

Commit messages:
 - 8166554: Avoid compilation blocking in OverloadCompileQueueTest.java

Changes: https://git.openjdk.java.net/jdk/pull/46/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=46&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8166554
  Stats: 46 lines in 1 file changed: 28 ins; 14 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/46.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/46/head:pull/46

PR: https://git.openjdk.java.net/jdk/pull/46

From martin.doerr at sap.com Mon Sep 7 10:08:57 2020
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 7 Sep 2020 10:08:57 +0000
Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding
In-Reply-To: <5ac1ac19-af1c-4ee6-b478-873031710081@linux.ibm.com>
References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <5ac1ac19-af1c-4ee6-b478-873031710081@linux.ibm.com>
Message-ID:

Hi Corey,

thanks for investigating. Note that we use xlclang++ on AIX. It may possibly understand the directives as gcc on linux. Gcc 7.3.1 is the minimum for BE linux. But if you protect your code by #ifdef VM_LITTLE_ENDIAN no compiler except gcc >= 7.4.0 should ever look at it.

Best regards,
Martin

> -----Original Message-----
> From: Corey Ashford
> Sent: Tuesday, 1 September 2020 02:17
> To: Doerr, Martin
> Cc: Michihiro Horie ; hotspot-compiler-dev at openjdk.java.net; core-libs-dev ;
> Kazunori Ogata ; joserz at br.ibm.com
> Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and
> API for Base64 decoding
>
> On 8/27/20 8:07 AM, Doerr, Martin wrote:
> >>> I will use __attribute__ ((align(16))) instead of __vector, and make
> >> them arrays of 16 unsigned char.
> > Maybe __vectors works as expected, too, now. Whatever we use, I'd
> > appreciate to double-check the alignment e.g. by using gdb.
> > I don't remember what we had tried and why it didn't work as desired.
>
> I just now tried on gcc-7.5.0, declaring a __vector at 1, 2, 3, 8, 9,
> and 15 byte offsets in a struct, trying to force a misalignment, but the
> compiler realigned all of them on 16-byte boundaries.
>
> If someone decides to make the intrinsic work on AIX (big endian), and
> compiles with 7.3.1, I don't know what will happen w.r.t. alignment, so
> to be on the safe side, I will make the declarations 16-byte unsigned
> char arrays with an align attribute.
>
> Looking a bit deeper, I see that the __vector type comes out of the C
> preprocessor as: __attribute__((altivec(vector__))). It's part of the
> compiler's basic set of predefined macros, and can be seen using this
> command:
>
> % gcc -dM -E - < /dev/null | grep __vector
>
> #define __vector __attribute__((altivec(vector__)))
>
> Some information here:
> https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Type-Attributes.html
>
> I don't know if this is helpful or not, but it might answer part of your
> question about the meaning of __vector.
>
> Regards,
>
> - Corey

From Ningsheng.Jian at arm.com Mon Sep 7 10:13:51 2020
From: Ningsheng.Jian at arm.com (Ningsheng Jian)
Date: Mon, 7 Sep 2020 10:13:51 +0000
Subject: Crashes on ppc/s390 after 8231441: AArch64: Initial SVE backend support
In-Reply-To:
References: ,
Message-ID:

Hi Goetz,

Thanks for taking care of this. I will mark https://bugs.openjdk.java.net/browse/JDK-8252855 as duplicate.

Thanks,
Ningsheng

From fzhinkin at openjdk.java.net Mon Sep 7 11:22:52 2020
From: fzhinkin at openjdk.java.net (Filipp Zhinkin)
Date: Mon, 7 Sep 2020 11:22:52 GMT
Subject: RFR: 8251152: ARM32: jtreg c2 Test8202414 test crash
Message-ID: <3wWzppMEFKf-SJW9ZxQ6oKY7PLRlWUFEWOrIhGFVvt4=.0298938f-7781-4696-9891-6d61b78a31af@github.com>

Some CPUs (like ARM32) does not support unaligned memory accesses. To avoid JVM crashes tests that perform such accesses should be skipped on corresponding platforms.

-------------

Commit messages:
 - 8251152: Skip Test8202414 on CPUs missing unaligned memory accesses support

Changes: https://git.openjdk.java.net/jdk/pull/48/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=48&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8251152
  Stats: 15 lines in 2 files changed: 11 ins; 1 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/48.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/48/head:pull/48

PR: https://git.openjdk.java.net/jdk/pull/48

From filipp.zhinkin at gmail.com Mon Sep 7 11:26:49 2020
From: filipp.zhinkin at gmail.com (Filipp Zhinkin)
Date: Mon, 7 Sep 2020 14:26:49 +0300
Subject: RFR: 8251152: ARM32: jtreg c2 Test8202414 test crash
In-Reply-To:
References: <88E929A3-E9C5-4BC2-B4E4-9AC8F046623D@oracle.com> <2102D49A-5672-4695-A271-3136EB299C3F@oracle.com>
Message-ID:

Hi Christoph,

thanks for the review!

I've created pull request on github for this change: https://github.com/openjdk/jdk/pull/48
Webrev: https://openjdk.github.io/cr/?repo=jdk&pr=48&range=00

Thanks,
Filipp.

On Sat, 5 Sep 2020 at 23:49, Langer, Christoph wrote:

> Hi,
>
> > updated webrev:
> > http://cr.openjdk.java.net/~bulasevich/fzhinkin/8251152/webrev.2/
> > (I've updated years in the copyright).
> >
> > I don't know the format used in SAP copyright notices, so I can't really tell if
> > you should have added ", 2020" there, this I'd suggest you to seek comments
> > from SAP folks.
>
> As for the SAP copyright, the webrev looks perfect. For the SAP
> copyright notice, we don't append a comma after the trailing year.
>
> Best regards
> Christoph
>
>

From lutz.schmidt at sap.com Mon Sep 7 12:40:11 2020
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Mon, 7 Sep 2020 12:40:11 +0000
Subject: RFR(M): 8219586: CodeHeap State Analytics processes dead nmethods
In-Reply-To:
References: <6DA47071-83F8-4E02-A6A9-E7FD8B9B5813@sap.com>
Message-ID: <6316DC4C-A2FB-4B0C-93D9-61D2243EF369@sap.com>

Hi Tobias,

I used the brackets for "optical separation", to emphasize what was added. There is no "hidden purpose". The comment attempts to explain why these fields are first initialized with NULL/false, although they are filled with "more meaningful" data shortly after.

I will remove the brackets and try to express my thoughts more clearly by rephrasing the comment. May I then regard this change as reviewed?

Thank you!
Lutz

On 04.09.20, 10:58, "Tobias Hartmann" wrote:

Hi Lutz,

hard to review but looks reasonable to me.

In compiledMethod.cpp:72, the brackets and comment are confusing. Why should these fields be treated specially?

Best regards,
Tobias

On 26.08.20 17:20, Schmidt, Lutz wrote:
> Dear all,
>
> may I please request reviews for this fix/improvement to CodeHeap State Analytics. Explained in a nutshell it removes the last holes through which the analysis code could potentially access memory which is no longer associated with the entity being inspected.
>
> There has been a long-lasting, off-list discussion with Erik Österlund until all pitfalls were identified and agreeable solutions were found. The important parts of that discussion are reflected in the bug comments.
There are two major changes: > > 1) All accesses to the CodeHeap are now protected by continuously holding the CodeCache_lock and, in addition, the Compile_lock. Information is aggregated in local data structures for later printing without holding the above locks. > > 2) Printing the names of all code blobs has been disabled except for one operation mode where the locks can be held while printing. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8219586 > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8219586.02/ > > This change has JDK-8250635 (currently out for review) as a prerequisite. It will not compile without. > > Thank you! > Lutz > From martin.doerr at sap.com Mon Sep 7 12:54:26 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 7 Sep 2020 12:54:26 +0000 Subject: RFR(S): 8250635: MethodArityHistogram should use Compile_lock in favour of fancy checks In-Reply-To: <73F65D4B-5970-41A3-B678-1F947BEE7392@sap.com> References: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com> <73F65D4B-5970-41A3-B678-1F947BEE7392@sap.com> Message-ID: Hi Lutz, thanks for the explanations. I'm fine with it. Best regards, Martin > -----Original Message----- > From: Schmidt, Lutz > Sent: Freitag, 28. August 2020 18:01 > To: Doerr, Martin ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(S): 8250635: MethodArityHistogram should use > Compile_lock in favour of fancy checks > > Hi Martin, > > good question. > > Originally, the iteration was only protected by the CodeCache_lock. That > proved insufficient: the CodeCache_lock only protects against structural > changes in the CodeCache. The contents of the individual code blobs can be, > and is, modified independently. > > By acquiring the Compile_lock, those modifications are blocked while > iterating. > > With the help of a consistency check (not contained in the RFR code), it was > found that there is a slight chance to see the case (nm != NULL) && > (method() == NULL). 
That chance is eliminated by adding the is_alive() check
> which is less invasive compared to adding a new nmethods_do() variant.
>
> Regards,
> Lutz
>
> On 28.08.20, 17:02, "Doerr, Martin" wrote:
>
> Hi Lutz,
>
> just for my understanding: What exactly are we protecting against by
> holding Compile_lock?
> Is it for concurrent initialization or concurrent unloading?
>
> Note that it's also possible to iterate only over alive nmethods:
> NMethodIterator iter(NMethodIterator::only_alive);
>
> Best regards,
> Martin
>
>
> > -----Original Message-----
> > From: hotspot-compiler-dev On Behalf Of Schmidt, Lutz
> > Sent: Wednesday, 26 August 2020 17:19
> > To: hotspot-compiler-dev at openjdk.java.net
> > Subject: [CAUTION] RFR(S): 8250635: MethodArityHistogram should use
> > Compile_lock in favour of fancy checks
> >
> > Dear all,
> >
> > may I please request reviews for this small enhancement? Instead of calling a
> > method doing complicated and fancy (hard to understand) checks, the
> > iteration over all nmethods is now protected by holding the Compile_lock in
> > addition to the CodeCache_lock.
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8250635
> > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8250635.00/
> >
> > Thank you!
> > Lutz
> >
>

From iignatyev at openjdk.java.net Mon Sep 7 12:55:44 2020
From: iignatyev at openjdk.java.net (Igor Ignatyev)
Date: Mon, 7 Sep 2020 12:55:44 GMT
Subject: RFR: 8251152: ARM32: jtreg c2 Test8202414 test crash
In-Reply-To: <3wWzppMEFKf-SJW9ZxQ6oKY7PLRlWUFEWOrIhGFVvt4=.0298938f-7781-4696-9891-6d61b78a31af@github.com>
References: <3wWzppMEFKf-SJW9ZxQ6oKY7PLRlWUFEWOrIhGFVvt4=.0298938f-7781-4696-9891-6d61b78a31af@github.com>
Message-ID:

On Mon, 7 Sep 2020 11:14:04 GMT, Filipp Zhinkin wrote:

> Some CPUs (like ARM32) does not support unaligned memory accesses. To avoid JVM crashes tests that perform such
> accesses should be skipped on corresponding platforms.
Marked as reviewed by iignatyev (Reviewer). test/hotspot/jtreg/compiler/c2/Test8202414.java line 48: > 46: // alignment check failure on such CPUs. > 47: if (!jdk.internal.misc.Unsafe.getUnsafe().unalignedAccess()) { > 48: throw new SkippedException( nit: I don't think we need a line break here. ------------- PR: https://git.openjdk.java.net/jdk/pull/48 From fzhinkin at openjdk.java.net Mon Sep 7 12:59:02 2020 From: fzhinkin at openjdk.java.net (Filipp Zhinkin) Date: Mon, 7 Sep 2020 12:59:02 GMT Subject: RFR: 8251152: ARM32: jtreg c2 Test8202414 test crash In-Reply-To: References: <3wWzppMEFKf-SJW9ZxQ6oKY7PLRlWUFEWOrIhGFVvt4=.0298938f-7781-4696-9891-6d61b78a31af@github.com> Message-ID: On Mon, 7 Sep 2020 12:53:03 GMT, Igor Ignatyev wrote: >> Some CPUs (like ARM32) does not support unaligned memory accesses. To avoid JVM crashes tests that perform such >> accesses should be skipped on corresponding platforms. > > test/hotspot/jtreg/compiler/c2/Test8202414.java line 48: > >> 46: // alignment check failure on such CPUs. >> 47: if (!jdk.internal.misc.Unsafe.getUnsafe().unalignedAccess()) { >> 48: throw new SkippedException( > > nit: I don't think we need a line break here. Previously, all lines within this file were no longer than 80 chars, so I decided to follow the same restriction. ------------- PR: https://git.openjdk.java.net/jdk/pull/48 From shade at openjdk.java.net Mon Sep 7 13:26:48 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 7 Sep 2020 13:26:48 GMT Subject: RFR: 8166554: Avoid compilation blocking in OverloadCompileQueueTest.java In-Reply-To: References: Message-ID: On Mon, 7 Sep 2020 09:59:40 GMT, Evgeny Nikitin wrote: > **Problem explanation** > > 1. The stress test [uses 0.8 of allowed running > time](https://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/CodeCacheStressRunner.java#l40) > + 10 seconds, thus exceeding the limit; 2. 
10 additional seconds are [given by the > VM](https://hg.openjdk.java.net/jdk/jdk/file/6db0cb3893c5/src/hotspot/share/runtime/vmOperations.cpp#l388) for stuck > compiler threads to finish; 3. Compiler threads aren't progressing due to the compilation being blocked; 4. Compilation > is blocked via WhiteBox by the test, its lockUnlock thread; 5. The lockUnlock thread doesn't unblock the compilation > because [it is a > daemon](https://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/Helper.java#l59), > it is stopped in the middle of the sleep. > **Solution** > > Since the 'lockUnlock' is started via InfiniteLoop, it's not possible to un-daemon it. So I just turned the lockUnlock > method into a Thread descendant, which got joined in the end. Looks good, modulo the minor issues. test/hotspot/jtreg/compiler/codecache/stress/OverloadCompileQueueTest.java line 82: > 80: > 81: public class OverloadCompileQueueTest implements Runnable { > 82: private static final LockUnlockThread lockUnlockThread = new LockUnlockThread(); `static final` field should be `LOCK_UNLOCK_THREAD`? test/hotspot/jtreg/compiler/codecache/stress/OverloadCompileQueueTest.java line 74: > 72: } > 73: } catch (InterruptedException e) { > 74: throw new Error("TESTBUG: lockUnlocker thread was unexpectedly interrupted", e); Since `lockUnlocker` method is gone, the message should be "LockUnlockThread was unexpectedly interrupted"? Also, throwing the error from this thread would not be rethrown with `join` later. If we care about this error, should we instead store it into `static final` field here, check it after `join`, and rethrow if ` != null`. The old code seems to have the same problem, though, so we can keep ignoring it. 
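Editorial sketch, not taken from the patch: the point raised in the review comment above, that an error thrown inside the worker is not rethrown by a later `join`, can be handled by recording the failure in a field and checking it after the join. The class name `FailureCapturingThread` and the method `joinAndRethrow` are hypothetical, and the sketch uses an instance field rather than the `static final` field mentioned in the comment.

```java
class FailureCapturingThread extends Thread {
    private final Runnable body;
    // Failure raised by the body, or null if it completed normally.
    private volatile Throwable failure;

    FailureCapturingThread(Runnable body) {
        this.body = body;
    }

    @Override
    public void run() {
        try {
            body.run();
        } catch (Throwable t) {
            failure = t; // remember it instead of letting it vanish with the thread
        }
    }

    // Waits for the thread to finish and rethrows whatever it recorded.
    void joinAndRethrow() {
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new Error("TESTBUG: interrupted while joining worker", e);
        }
        if (failure != null) {
            throw new Error("TESTBUG: worker thread failed", failure);
        }
    }

    public static void main(String[] args) {
        FailureCapturingThread ok = new FailureCapturingThread(() -> { });
        ok.start();
        ok.joinAndRethrow(); // returns normally, nothing was recorded

        FailureCapturingThread bad = new FailureCapturingThread(() -> {
            throw new IllegalStateException("simulated failure");
        });
        bad.start();
        try {
            bad.joinAndRethrow();
        } catch (Error e) {
            System.out.println("propagated: " + e.getCause());
        }
    }
}
```

Whether the extra bookkeeping is worth it is a judgment call; as the comment notes, the old code had the same gap.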
test/hotspot/jtreg/compiler/codecache/stress/OverloadCompileQueueTest.java line 114: > 112: > 113: lockUnlockThread.isActive = false; > 114: lockUnlockThread.join(); So this now relies on `lockUnlockThread` unblocking from its `Thread.sleep`-s, and then `joining` here? We could wait here for up to `MAX_SLEEP` seconds then? Maybe we should command `lockUnlockThread.interrupt()` before `join()` to make the test a tad faster? ------------- PR: https://git.openjdk.java.net/jdk/pull/46 From richard.reingruber at sap.com Mon Sep 7 14:09:11 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Mon, 7 Sep 2020 14:09:11 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <682ee88d-097a-df57-7374-b3413b7964fd@oracle.com> <3ae58a8e-405a-d98c-79c5-c6a0bdf5cc27@oracle.com> Message-ID: Hi, I would like to close the review of this change. It has received a lot of helpful feedback during the process and 2 full Reviews. Thanks everybody! I'm planning to push it this week on Thursday as solution for JBS items: https://bugs.openjdk.java.net/browse/JDK-8227745 https://bugs.openjdk.java.net/browse/JDK-8233915 Version to be pushed: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8/ Hope to get my GIT/Skara setup going until then... :) Thanks, Richard. -----Original Message----- From: hotspot-compiler-dev On Behalf Of Reingruber, Richard Sent: Mittwoch, 2. September 2020 23:27 To: Robbin Ehn ; serviceability-dev ; hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime Subject: [CAUTION] RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Robin, > On 2020-09-02 15:48, Reingruber, Richard wrote: > > Hi Robbin, > > > > // taking the discussion back to the mailing lists > > > > > I still don't understand why you don't deoptimize the objects inside the > > > handshake/safepoint instead? 
> So for handshakes using asynch handshake and allowing blocking inside > would fix that. (future fix, I'm working on that now) Just to make it clear: I'm not fond of the extra suspension mechanism currently used for JDK-8227745 either. I want to get rid of it and I will work on it. Asynch handshakes (JDK-8238761) could be a replacement for it. At least I think they can be used to suspend the target thread. > For safepoint, since we have suspended all threads, ~'safepointed them' > with a JavaThread, you _could_ just execute the action directly (e.g. > skipping VM_HeapWalkOperation safepoint) since they are suppose to be > safely suspended until the destructor of EB, no? Yes, this should be possible. This would be an advanced change though. I would like EscapeBarriers to be a no-op and fall back to current implementation, if C2-EscapeAnalysis/Graal are disabled. > So I suggest future work to instead just execute the safepoint with the > requesting JT instead of having a this special safepoiting mechanism. > Since you are missing above functionality I see why you went this way. > If you need to push it, it's fine by me. We will work on further improvements. Top of the list would be eliminating the extra suspend mechanism. The implementation has matured for more than 12 months now [1]. It's been tested extensively at SAP over that time and passed also extended testing at Oracle kindly conducted by Vladimir Kozlov. We've got two full Reviews and incorporated extensive feedback from a number of OpenJDK Reviewers (including you, thanks!). Based on that I reckon we're good to push the change as enhancement (JDK-8227745) and bug fix (JDK-8233915). > Thanks for explaining once again :) Pleasure :) Thanks, Richard. [1] http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028729.html -----Original Message----- From: Robbin Ehn Sent: Mittwoch, 2. 
September 2020 16:54 To: Reingruber, Richard ; serviceability-dev ; hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, On 2020-09-02 15:48, Reingruber, Richard wrote: > Hi Robbin, > > // taking the discussion back to the mailing lists > > > I still don't understand why you don't deoptimize the objects inside the > > handshake/safepoint instead? So for handshakes using asynch handshake and allowing blocking inside would fix that. (future fix, I'm working on that now) For safepoint, since we have suspended all threads, ~'safepointed them' with a JavaThread, you _could_ just execute the action directly (e.g. skipping VM_HeapWalkOperation safepoint) since they are suppose to be safely suspended until the destructor of EB, no? So I suggest future work to instead just execute the safepoint with the requesting JT instead of having a this special safepoiting mechanism. Since you are missing above functionality I see why you went this way. If you need to push it, it's fine by me. Thanks for explaining once again :) /Robbin > > This is unfortunately not possible. Deoptimizing objects includes reallocating > scalar replaced objects, i.e. calling Deoptimization::realloc_objects(). This > cannot be done at a safepoint or handshake. > > 1. The vm thread is not allowed to allocate on the java heap > See for instance assertions in ParallelScavengeHeap::mem_allocate() > https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/4c73e045ce815d52abcdc99499266ccf2e6e9b4c/src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp*L258__;Iw!!GqivPVa7Brio!K0f5chjtePI6MKBSBOoBKya9YZTJlVhsExQYMDO96v3Af_Klc_E4R26_dSyowotF$ > > This is not easy to change, I suppose, because it will be difficult to gc if > necessary. > > 2. Using a direct handshake would not work either. The problem there is again > gc. 
Let J be the JavaThread that is executing the direct handshake. The vm > would deadlock if the vm thread waits for J to execute the closure of a > handshake-all and J waits for the vm thread to execute a gc vm operation. > Patricio Chilano made me aware of this: https://bugs.openjdk.java.net/browse/JDK-8230594 > > Cheers, Richard. > > -----Original Message----- > From: Robbin Ehn > Sent: Mittwoch, 2. September 2020 13:56 > To: Reingruber, Richard > Cc: Lindenmaier, Goetz ; Vladimir Kozlov ; David Holmes > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hi, > > I still don't understand why you don't deoptimize the objects inside the > handshake/safepoint instead? > > E.g. > > JvmtiEnv::GetOwnedMonitorInfo you only should need the execute the code > from: > eb.deoptimize_objects(MaxJavaStackTraceDepth)) before looping over the > stack, so: > > void > GetOwnedMonitorInfoClosure::do_thread(Thread *target) { > assert(target->is_Java_thread(), "just checking"); > JavaThread *jt = (JavaThread *)target; > > if (!jt->is_exiting() && (jt->threadObj() != NULL)) { > + if (EscapeBarrier::deoptimize_objects(jt, MaxJavaStackTraceDepth)) { > _result = > ((JvmtiEnvBase*)_env)->get_owned_monitors(_calling_thread, jt, > _owned_monitors_list); > } else { > _result = JVMTI_ERROR_OUT_OF_MEMORY; > } > } > } > > Why try 'suspend' the thread first? > > > When we de-optimize all threads why not just in the following safepoint? > E.g. > VM_HeapWalkOperation::doit() { > + EscapeBarrier::deoptimize_objects_all_threads(); > ... 
> }
>
> Thanks, Robbin
>
>

From fzhinkin at openjdk.java.net Mon Sep 7 15:36:03 2020
From: fzhinkin at openjdk.java.net (Filipp Zhinkin)
Date: Mon, 7 Sep 2020 15:36:03 GMT
Subject: Integrated: 8251152: ARM32: jtreg c2 Test8202414 test crash
In-Reply-To: <3wWzppMEFKf-SJW9ZxQ6oKY7PLRlWUFEWOrIhGFVvt4=.0298938f-7781-4696-9891-6d61b78a31af@github.com>
References: <3wWzppMEFKf-SJW9ZxQ6oKY7PLRlWUFEWOrIhGFVvt4=.0298938f-7781-4696-9891-6d61b78a31af@github.com>
Message-ID:

On Mon, 7 Sep 2020 11:14:04 GMT, Filipp Zhinkin wrote:

> Some CPUs (like ARM32) does not support unaligned memory accesses. To avoid JVM crashes tests that perform such
> accesses should be skipped on corresponding platforms.

This pull request has now been integrated.

Changeset: 70d5cac9
Author: Filipp Zhinkin
URL: https://git.openjdk.java.net/jdk/commit/70d5cac9
Stats: 15 lines in 2 files changed: 1 ins; 11 del; 3 mod

8251152: ARM32: jtreg c2 Test8202414 test crash

Some CPUs (like ARM32) does not support unaligned memory accesses. To avoid JVM crashes tests that perform such accesses should be skipped on corresponding platforms.

Reviewed-by: iignatyev, clanger

-------------

PR: https://git.openjdk.java.net/jdk/pull/48

From lutz.schmidt at sap.com Mon Sep 7 15:39:54 2020
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Mon, 7 Sep 2020 15:39:54 +0000
Subject: RFR(S): 8250635: MethodArityHistogram should use Compile_lock in favour of fancy checks
In-Reply-To:
References: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com> <73F65D4B-5970-41A3-B678-1F947BEE7392@sap.com>
Message-ID:

Hi Martin,

thank you for your review!

Best,
Lutz

On 07.09.20, 14:54, "Doerr, Martin" wrote:

Hi Lutz,

thanks for the explanations. I'm fine with it.

Best regards,
Martin

> -----Original Message-----
> From: Schmidt, Lutz
> Sent: Friday, 28.
August 2020 18:01 > To: Doerr, Martin ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(S): 8250635: MethodArityHistogram should use > Compile_lock in favour of fancy checks > > Hi Martin, > > good question. > > Originally, the iteration was only protected by the CodeCache_lock. That > proved insufficient: the CodeCache_lock only protects against structural > changes in the CodeCache. The contents of the individual code blobs can be, > and is, modified independently. > > By acquiring the Compile_lock, those modifications are blocked while > iterating. > > With the help of a consistency check (not contained in the RFR code), it was > found that there is a slight chance to see the case (nm != NULL) && > (method() == NULL). That chance is eliminated by adding the is_alive() check > which is less invasive compared to adding a new nmethods_do() variant. > > Regards, > Lutz > > On 28.08.20, 17:02, "Doerr, Martin" wrote: > > Hi Lutz, > > just for my understanding: What exactly are we protecting against by > holding Compile_lock? > Is it for concurrent initialization or concurrent unloading? > > Note that it's also possible to iterate only over alive nmethods: > NMethodIterator iter(NMethodIterator::only_alive); > > Best regards, > Martin > > > > -----Original Message----- > > From: hotspot-compiler-dev > retn at openjdk.java.net> On Behalf Of Schmidt, Lutz > > Sent: Mittwoch, 26. August 2020 17:19 > > To: hotspot-compiler-dev at openjdk.java.net > > Subject: [CAUTION] RFR(S): 8250635: MethodArityHistogram should use > > Compile_lock in favour of fancy checks > > > > Dear all, > > > > may I please request reviews for this small enhancement? Instead of > calling a > > method doing complicated and fancy (hard to understand) checks, the > > iteration over all nmethods is now protected by holding the Compile_lock > in > > addition to the CodeCache_lock. 
> > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8250635 > > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8250635.00/ > > > > Thank you! > > Lutz > > > From enikitin at openjdk.java.net Mon Sep 7 20:56:01 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Mon, 7 Sep 2020 20:56:01 GMT Subject: RFR: 8166554: Avoid compilation blocking in OverloadCompileQueueTest.java [v2] In-Reply-To: References: Message-ID: <1PkUbSyLD4FaTjWXXmM50l9EpHKgbD7IX77w1avwSCo=.3e7eb1f9-b4dd-4a60-bc71-a302e9b9e71f@github.com> On Mon, 7 Sep 2020 13:11:07 GMT, Aleksey Shipilev wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Interrupt LockUnlockThread before test finish, fix coding style issues. > > test/hotspot/jtreg/compiler/codecache/stress/OverloadCompileQueueTest.java line 82: > >> 80: >> 81: public class OverloadCompileQueueTest implements Runnable { >> 82: private static final LockUnlockThread lockUnlockThread = new LockUnlockThread(); > > `static final` field should be `LOCK_UNLOCK_THREAD`? Fixed by moving the static field into the 'main' method (thus making it function-local) > test/hotspot/jtreg/compiler/codecache/stress/OverloadCompileQueueTest.java line 74: > >> 72: } >> 73: } catch (InterruptedException e) { >> 74: throw new Error("TESTBUG: lockUnlocker thread was unexpectedly interrupted", e); > > Since `lockUnlocker` method is gone, the message should be "LockUnlockThread was unexpectedly interrupted"? > > Also, throwing the error from this thread would not be rethrown with `join` later. If we care about this error, should > we instead store it into `static final` field here, check it after `join`, and rethrow if ` != null`. The old code > seems to have the same problem, though, so we can keep ignoring it. Fixed the message. Regarding the error - it is an Error, not a Throwable, it gets noticed and recorded by the jtreg (I checked that). 
> test/hotspot/jtreg/compiler/codecache/stress/OverloadCompileQueueTest.java line 114: > >> 112: >> 113: lockUnlockThread.isActive = false; >> 114: lockUnlockThread.join(); > > So this now relies on `lockUnlockThread` unblocking from its `Thread.sleep`-s, and then `joining` here? We could wait > here for up to `MAX_SLEEP` seconds then? Maybe we should command `lockUnlockThread.interrupt()` before `join()` to make > the test a tad faster? Added an interrupt, thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/46 From enikitin at openjdk.java.net Mon Sep 7 20:55:56 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Mon, 7 Sep 2020 20:55:56 GMT Subject: RFR: 8166554: Avoid compilation blocking in OverloadCompileQueueTest.java [v2] In-Reply-To: References: Message-ID: > **Problem explanation** > > 1. The stress test [uses 0.8 of allowed running > time](https://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/CodeCacheStressRunner.java#l40) > + 10 seconds, thus exceeding the limit; 2. 10 additional seconds are [given by the > VM](https://hg.openjdk.java.net/jdk/jdk/file/6db0cb3893c5/src/hotspot/share/runtime/vmOperations.cpp#l388) for stuck > compiler threads to finish; 3. Compiler threads aren't progressing due to the compilation being blocked; 4. Compilation > is blocked via WhiteBox by the test, its lockUnlock thread; 5. The lockUnlock thread doesn't unblock the compilation > because [it is a > daemon](https://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/Helper.java#l59), > it is stopped in the middle of the sleep. > **Solution** > > Since the 'lockUnlock' is started via InfiniteLoop, it's not possible to un-daemon it. So I just turned the lockUnlock > method into a Thread descendant, which got joined in the end. 
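The shutdown sequence discussed in this review (clear the flag, interrupt the sleeping thread, then join it) can be sketched roughly as follows. This is only an illustration, not the code from the PR: `InterruptBeforeJoin`, `Worker`, `stopAndJoin` and the sleep interval are hypothetical, and treating the `InterruptedException` as a wake-up signal is an assumption about how the interrupted sleep is handled.

```java
// Hypothetical sketch: a non-daemon helper thread that is stopped by
// clearing a flag, interrupting its sleep, and joining it.
class InterruptBeforeJoin {

    // Stand-in for a thread like LockUnlockThread; the names are assumptions.
    static class Worker extends Thread {
        volatile boolean isActive = true;
        private final long sleepMillis;

        Worker(long sleepMillis) {
            this.sleepMillis = sleepMillis;
        }

        @Override
        public void run() {
            while (isActive) {
                try {
                    // Placeholder for the periodic lock/unlock work.
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Assumption: an interrupt only wakes the thread up;
                    // the loop condition decides whether to exit.
                }
            }
        }
    }

    // Clears the flag, interrupts the worker so it leaves sleep() early,
    // joins it, and returns the elapsed time in milliseconds.
    static long stopAndJoin(Worker worker) {
        long start = System.nanoTime();
        worker.isActive = false;
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        Worker worker = new Worker(10_000);
        worker.start();
        System.out.println("joined after " + stopAndJoin(worker) + " ms");
    }
}
```

Without the `interrupt()` call, the `join()` in this sketch would have to wait for the current sleep to finish, which is the delay the review comment points out.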
Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Interrupt LockUnlockThread before test finish, fix coding style issues. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/46/files - new: https://git.openjdk.java.net/jdk/pull/46/files/0651702e..ca6694b6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=46&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=46&range=00-01 Stats: 6 lines in 1 file changed: 4 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/46.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/46/head:pull/46 PR: https://git.openjdk.java.net/jdk/pull/46 From shade at openjdk.java.net Tue Sep 8 06:23:02 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 8 Sep 2020 06:23:02 GMT Subject: RFR: 8166554: Avoid compilation blocking in OverloadCompileQueueTest.java [v2] In-Reply-To: References: Message-ID: On Mon, 7 Sep 2020 20:55:56 GMT, Evgeny Nikitin wrote: >> **Problem explanation** >> >> 1. The stress test [uses 0.8 of allowed running >> time](https://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/CodeCacheStressRunner.java#l40) >> + 10 seconds, thus exceeding the limit; 2. 10 additional seconds are [given by the >> VM](https://hg.openjdk.java.net/jdk/jdk/file/6db0cb3893c5/src/hotspot/share/runtime/vmOperations.cpp#l388) for stuck >> compiler threads to finish; 3. Compiler threads aren't progressing due to the compilation being blocked; 4. Compilation >> is blocked via WhiteBox by the test, its lockUnlock thread; 5. The lockUnlock thread doesn't unblock the compilation >> because [it is a >> daemon](https://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/Helper.java#l59), >> it is stopped in the middle of the sleep. >> **Solution** >> >> Since the 'lockUnlock' is started via InfiniteLoop, it's not possible to un-daemon it. 
So I just turned the lockUnlock >> method into a Thread descendant, which got joined in the end. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Interrupt LockUnlockThread before test finish, fix coding style issues. Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/46 From enikitin at openjdk.java.net Tue Sep 8 08:27:41 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Tue, 8 Sep 2020 08:27:41 GMT Subject: Integrated: 8166554: Avoid compilation blocking in OverloadCompileQueueTest.java In-Reply-To: References: Message-ID: On Mon, 7 Sep 2020 09:59:40 GMT, Evgeny Nikitin wrote: > **Problem explanation** > > 1. The stress test [uses 0.8 of allowed running > time](https://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/CodeCacheStressRunner.java#l40) > + 10 seconds, thus exceeding the limit; 2. 10 additional seconds are [given by the > VM](https://hg.openjdk.java.net/jdk/jdk/file/6db0cb3893c5/src/hotspot/share/runtime/vmOperations.cpp#l388) for stuck > compiler threads to finish; 3. Compiler threads aren't progressing due to the compilation being blocked; 4. Compilation > is blocked via WhiteBox by the test, its lockUnlock thread; 5. The lockUnlock thread doesn't unblock the compilation > because [it is a > daemon](https://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/Helper.java#l59), > it is stopped in the middle of the sleep. > **Solution** > > Since the 'lockUnlock' is started via InfiniteLoop, it's not possible to un-daemon it. So I just turned the lockUnlock > method into a Thread descendant, which got joined in the end. This pull request has now been integrated. 
Changeset: 2cceeedf Author: Evgeny Nikitin Committer: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/2cceeedf Stats: 50 lines in 1 file changed: 15 ins; 32 del; 3 mod 8166554: Avoid compilation blocking in OverloadCompileQueueTest.java Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/46 From aph at redhat.com Tue Sep 8 09:12:05 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 8 Sep 2020 10:12:05 +0100 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> Message-ID: On 28/08/2020 17:19, Doerr, Martin wrote: > 1. Some people assume that all of Oracle's 11u changes should get > integrated into the open version. > 2. Others only want to take them on demand with a good reason. > > I think there are good arguments for and against both ones. > Personally, I think approach 1. is better at the beginning of an > updates branch while it may be reasonable to switch at some point of > time. > At the moment, I still prefer to stay in sync with Oracle as far as > we can. > > Regarding this change, I don't see a high risk. > What it basically does is that it reuses better code which is > already used by C2 for C1 and JVMCI compilers. So there's no > substantial new code. > It's tested by GraalVM and by our internal testing. There are no > known issues with it. > > So I'd rather vote for taking it. OK for this particular change. Please go ahead. On the wider point: A while ago I announced that we should not continue to independently approve backports that had already been approved by Oracle. My reasoning was that they'd done the work of balancing risk and reward, and that for us to do it again when the decision would almost inevitably be the same was not a good use of our time. 
After some very noisy protests which insisted that this project must continue to approve backports independently, I relented. But there is no point retaining that right if we're never going to use it! We should keep a close watch on Oracle backports for a while to try to understand what their criteria are, and whether those criteria are a good fit for our mission. If we see a significant number of minor performance tweaks and "cleanups" we should reconsider our policy. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From goetz.lindenmaier at sap.com Tue Sep 8 12:01:38 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 8 Sep 2020 12:01:38 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> Message-ID: Hi > On the wider point: > > A while ago I announced that we should not continue to independently > approve backports that had already been approved by Oracle. My > reasoning was that they'd done the work of balancing risk and reward, > and that for us to do it again when the decision would almost > inevitably be the same was not a good use of our time. After some very > noisy protests which insisted that this project must continue to > approve backports independently, I relented. But there is no point > retaining that right if we're never going to use it! I would appreciate this a lot, as it would reduce the effort of downporting considerably. Thanks! > We should keep a close watch on Oracle backports for a while to try to > understand what their criteria are, and whether those criteria are a > good fit for our mission. If we see a significant number of minor > performance tweaks and "cleanups" we should reconsider our policy. 
One possibility to watch what Oracle is doing is that maintainers keep
an eye on the corresponding filters of 11.0.x-oracle changes and flag them
jdk11u-fix-no ad hoc, before someone actually starts to work on
downporting the bug.
Others who also have an opinion on this could watch the filter too
and add comments on why they recommend not downporting the change.
Then again, if someone has a concrete interest in downporting, besides
the motivation that Oracle downported it, this can be discussed.

In the case of this change, it required a close look at the code,
which is why I had asked Martin to address it. Martin is an
expert in this area. The objections came just after he had completed
his work. It's really not helpful if a change is turned down just after
all the work is done. This could be avoided by the above approach.
Great that it is approved.

Best regards,
Goetz.

>
> --
> Andrew Haley (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From jptatton at amazon.com Tue Sep 8 12:02:14 2020
From: jptatton at amazon.com (Tatton, Jason)
Date: Tue, 8 Sep 2020 12:02:14 +0000
Subject: JDK-8173585: Intrinsify StringLatin1.indexOf(char)
In-Reply-To:
References: <0cbe7d8f594349b59504c42f89c6f268@EX13D46EUB003.ant.amazon.com>
Message-ID:

Hi Andrew,

thank you for taking the time to review this. Since we have now moved to git, I have raised a new PR for this RFR:

https://github.com/openjdk/jdk/pull/71
https://bugs.openjdk.java.net/browse/JDK-8173585

I have improved the micro benchmark in the ways which you and others have requested, namely:

+ The benchmark is now included in test/micro/org/openjdk/bench/java/lang as StringIndexOfChar (as advised by my colleagues here at AWS, Xin Liu and Volker Simonis).
+ Times are now in nanoseconds.
+ Terminating characters ('a') are in 66.666% of tested strings.
+ I have added four new benchmarks which operate on random-length strings (32 characters being the average) of type either StringLatin1 or StringUTF16 and call indexOf(char) or indexOf(String).

I have included the output of these tests below:

Without the new StringLatin1 indexOf(char) intrinsic:

Benchmark                           Mode  Cnt      Score     Error  Units
IndexOfBenchmark.latin1_mixed_char  avgt    5  26389.129 ± 182.581  ns/op
IndexOfBenchmark.utf16_mixed_char   avgt    5  17885.383 ± 435.933  ns/op

With the new StringLatin1 indexOf(char) intrinsic:

Benchmark                           Mode  Cnt      Score     Error  Units
IndexOfBenchmark.latin1_mixed_char  avgt    5  17875.185 ± 407.716  ns/op
IndexOfBenchmark.utf16_mixed_char   avgt    5  18292.802 ± 167.306  ns/op

The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when running on ARM.

Regards,

Jason

-----Original Message-----
From: Andrew Haley
Sent: 05 September 2020 15:47
To: Tatton, Jason ; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net
Subject: RE: [EXTERNAL] JDK-8173585: Intrinsify StringLatin1.indexOf(char)

On 03/09/2020 22:28, Tatton, Jason wrote:
>
> JMH Benchmark results:
> ====================
> The benchmarks examine the 3 codepaths for StringLatin1 and
> StringUTF16. Here are the results for Intel x86 (ARM is similar):
>
> FYI, String lengths in characters (1byte for Latin1, 2bytes for UTF16):
>          Latin1  UTF16
> Short:       15      7
> SSE4:        16      8
> AVX2:        32     16
>
> Without StringLatin1 indexofchar intrinsic:
> Benchmark                             Mode   Cnt       Score      Error  Units
> IndexOfBenchmark.latin1_AVX2_String   thrpt    5  121781.424 ±  355.085  ops/s
> IndexOfBenchmark.latin1_AVX2_char     thrpt    5   46060.612 ±  151.274  ops/s
> IndexOfBenchmark.latin1_SSE4_String   thrpt    5  197339.146 ±   90.333  ops/s
> IndexOfBenchmark.latin1_SSE4_char     thrpt    5   61401.204 ±  426.761  ops/s
> IndexOfBenchmark.latin1_Short_String  thrpt    5  175389.355 ±  294.976  ops/s
> IndexOfBenchmark.latin1_Short_char    thrpt    5   60759.868 ±  124.349  ops/s
> IndexOfBenchmark.utf16_AVX2_String    thrpt    5  123601.020 ±  111.981  ops/s
> IndexOfBenchmark.utf16_AVX2_char      thrpt    5  141116.832 ±  380.489  ops/s
> IndexOfBenchmark.utf16_SSE4_String    thrpt    5  178136.762 ±  143.227  ops/s
> IndexOfBenchmark.utf16_SSE4_char      thrpt    5  181430.649 ±  120.097  ops/s
> IndexOfBenchmark.utf16_Short_String   thrpt    5  158301.361 ±  182.738  ops/s
> IndexOfBenchmark.utf16_Short_char     thrpt    5   84876.919 ±  247.769  ops/s
>
> With StringLatin1 indexofchar intrinsic:
> Benchmark                             Mode   Cnt       Score      Error  Units
> IndexOfBenchmark.latin1_AVX2_String   thrpt    5  113621.676 ±   68.235  ops/s
> IndexOfBenchmark.latin1_AVX2_char     thrpt    5  177757.909 ±  727.308  ops/s
> IndexOfBenchmark.latin1_SSE4_String   thrpt    5  180529.049 ±   57.356  ops/s
> IndexOfBenchmark.latin1_SSE4_char     thrpt    5  235087.776 ±  457.024  ops/s
> IndexOfBenchmark.latin1_Short_String  thrpt    5  165914.990 ±  329.024  ops/s
> IndexOfBenchmark.latin1_Short_char    thrpt    5   53989.544 ±   65.393  ops/s
> IndexOfBenchmark.utf16_AVX2_String    thrpt    5  107632.783 ±  446.272  ops/s
> IndexOfBenchmark.utf16_AVX2_char      thrpt    5  143131.734 ±  159.944  ops/s
> IndexOfBenchmark.utf16_SSE4_String    thrpt    5  169882.703 ± 1024.367  ops/s
> IndexOfBenchmark.utf16_SSE4_char      thrpt    5  175693.972 ±  775.423  ops/s
> IndexOfBenchmark.utf16_Short_String   thrpt    5  163595.993 ±  225.089  ops/s
> IndexOfBenchmark.utf16_Short_char     thrpt    5   90126.154 ±  365.642  ops/s
>
> We can see above that indexOf(char) now behaves similarly between
> StringUTF16 and StringLatin1.

This is confusing. Can you please make the times nanoseconds? It's
quite a struggle trying to think in reciprocal units for these very
low-level benchmarks. Maybe it's just me.
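For readers who want to compare the throughput scores with the later average-time scores, the conversion is ns/op = 1e9 / (ops/s). The following is a minimal sketch of that arithmetic only; the class and method names are hypothetical, and it says nothing about what a single benchmark operation covers in the benchmark itself.

```java
import java.util.Locale;

// Hypothetical helper: converts a JMH throughput score (ops/s) into an
// average-time score (ns/op) by taking the reciprocal.
class ThroughputToTime {

    static double toNanosPerOp(double opsPerSecond) {
        if (!(opsPerSecond > 0.0)) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        // One second is 1e9 nanoseconds, so ns/op = 1e9 / (ops/s).
        return 1_000_000_000.0 / opsPerSecond;
    }

    public static void main(String[] args) {
        // latin1_AVX2_char scores taken from the two throughput tables.
        double withoutIntrinsic = 46060.612;
        double withIntrinsic = 177757.909;
        System.out.printf(Locale.ROOT, "without intrinsic: %.1f ns/op%n",
                toNanosPerOp(withoutIntrinsic));
        System.out.printf(Locale.ROOT, "with intrinsic:    %.1f ns/op%n",
                toNanosPerOp(withIntrinsic));
    }
}
```

In JMH itself the same effect is normally obtained by choosing an average-time mode and a nanosecond output unit rather than converting by hand.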
There are 1000 strings of length 32 bytes, so I guess that makes
everything fit in L1, just. I guess that was the idea?

> //'a' is never present in rnd string

So you only benchmark searches that always fail? I don't get that at
all.

I'd also vary string lengths. 32 characters is a good average, so you
should have a decent spread of different lengths, averaging 32 over the
whole set. I'd place a terminating character randomly in *at least*
50% of the strings. I think that would be much more representative.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From tobias.hartmann at oracle.com Tue Sep 8 12:06:08 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 8 Sep 2020 14:06:08 +0200
Subject: RFR(M): 8219586: CodeHeap State Analytics processes dead nmethods
In-Reply-To: <6316DC4C-A2FB-4B0C-93D9-61D2243EF369@sap.com>
References: <6DA47071-83F8-4E02-A6A9-E7FD8B9B5813@sap.com> <6316DC4C-A2FB-4B0C-93D9-61D2243EF369@sap.com>
Message-ID: <0e5868b3-6eb8-8036-a908-b3b5502cc6e2@oracle.com>

Hi Lutz,

On 07.09.20 14:40, Schmidt, Lutz wrote:
> Hi Tobias,
>
> I used the brackets for "optical separation", to emphasize what was added. There is no "hidden purpose".
>
> The comment attempts to explain why these fields are first initialized with NULL/false, although they are filled with "more meaningful" data shortly after.

Okay, thanks for the explanation.

> I will remove the brackets and try to express my thoughts more clearly by rephrasing the comment.
>
> May I then regard this change as reviewed?

Yes, looks good to me.

Best regards,
Tobias

From neugens at redhat.com Tue Sep 8 12:27:41 2020
From: neugens at redhat.com (Mario Torre)
Date: Tue, 8 Sep 2020 14:27:41 +0200
Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries.
In-Reply-To:
References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com>
Message-ID:

On Tue, Sep 8, 2020 at 2:02 PM Lindenmaier, Goetz wrote:

> In the case of this change, it required a close look at the coding,
> which is why I had asked Martin to address it. Martin is an
> expert in this area. The objections came just after he had completed
> his work. It's really not helpful if a change is turned down just after
> all the work is done. This could be avoided by above approach.
> Great it is approved.

The way I see it, although the maintainer can say no at any time, they
are not reviewing the patch, so their "no" usually comes later in the
game anyway, after some amount of work has been done. I think the
backporter (and reviewer) should be responsible for applying the first
line of filtering.

But this brings up an interesting point: we rarely discuss patches.
Your proposal seems to be a step in the right direction since it
forces the maintainer's attention earlier, but I think there's an
element of planning still missing. Also, we don't want to overload the
maintainers' time; this needs to be a shared effort from the ground up.

Regarding backporting in general, I do think we're backporting too
much. I think we had a slowdown in 8u, which is good, but 11u is still
getting a lot of attention; we should slow the pace there too, imho.

Cheers,
Mario

--
Mario Torre
Associate Manager, Software Engineering
Red Hat GmbH
9704 A60C B4BE A8B8 0F30 9205 5D7E 4952 3F65 7898

From aph at redhat.com Tue Sep 8 13:18:17 2020
From: aph at redhat.com (Andrew Haley)
Date: Tue, 8 Sep 2020 14:18:17 +0100
Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries.
In-Reply-To:
References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com>
Message-ID: <9ffe38f1-85dc-c2fd-613f-3a97f7354a32@redhat.com>

On 08/09/2020 13:01, Lindenmaier, Goetz wrote:
> One possibility to watch what Oracle is doing is that maintainers have
> an eye on the corresponding filters of 11.0.x-oracle changes and flag them
> jdk11u-fix-no ad-hoc, before someone starts to actually work and
> downport the bug.

Looks good to me. But we'd first have to agree on the principle of not
backporting some patches, and the criteria by which we'd decide.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From thartmann at openjdk.java.net Tue Sep 8 14:13:44 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Tue, 8 Sep 2020 14:13:44 GMT
Subject: RFR: 8252916: Newline in object field values list of ScopeDesc should be removed
Message-ID:

Given the following test:

    public class Test {

        static class MyClass {
            Object o1 = null;
            Object o2 = new Integer(42);
        }

        static Object test(boolean trap) {
            MyClass obj = new MyClass();
            if (trap) { }
            return obj.o1;
        }

        public static void main(String[] args) {
            for (int i = 0; i < 100_000; ++i) {
                test(false);
            }
        }
    }

The ScopeDesc for the uncommon trap in C2 compiled 'test' is printed like this:

    ScopeDesc(pc=0x00007f52a5160144 offset=84):
       Test::test@9 (line 11) reexecute=true
       Locals
        - l0: empty
        - l1: obj[52]
       Expression stack
        - @0: reg rbp [10],int
       Objects
        - 52: Test$MyClass NULL
    , stack[0],oop

There should be no newline after "NULL".

This is a regression from [JDK-8202171](https://bugs.openjdk.java.net/browse/JDK-8202171) in JDK 12. The fix is to not print a new line, to be consistent with 'print_value_on'.
Thanks,
Tobias

-------------

Commit messages:
 - 8252916: Newline in object field values list of ScopeDesc should be removed

Changes: https://git.openjdk.java.net/jdk/pull/75/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=75&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8252916
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/75.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/75/head:pull/75

PR: https://git.openjdk.java.net/jdk/pull/75

From erik.osterlund at oracle.com Tue Sep 8 14:31:37 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Tue, 8 Sep 2020 16:31:37 +0200
Subject: RFR(M): 8219586: CodeHeap State Analytics processes dead nmethods
In-Reply-To: <6DA47071-83F8-4E02-A6A9-E7FD8B9B5813@sap.com>
References: <6DA47071-83F8-4E02-A6A9-E7FD8B9B5813@sap.com>
Message-ID: <5955b523-9e75-7b6f-cbc0-21c1361f8754@oracle.com>

Hi Lutz,

This looks great, thanks for fixing this!

Thanks,
/Erik

On 2020-08-26 17:20, Schmidt, Lutz wrote:
> Dear all,
>
> may I please request reviews for this fix/improvement to CodeHeap State Analytics. Explained in a nutshell it removes the last holes through which the analysis code could potentially access memory which is no longer associated with the entity being inspected.
>
> There has been a long-lasting, off-list discussion with Erik Österlund until all pitfalls were identified and agreeable solutions were found. The important parts of that discussion are reflected in the bug comments. There are two major changes:
>
> 1) All accesses to the CodeHeap are now protected by continuously holding the CodeCache_lock and, in addition, the Compile_lock. Information is aggregated in local data structures for later printing without holding the above locks.
>
> 2) Printing the names of all code blobs has been disabled except for one operation mode where the locks can be held while printing.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8219586
> Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8219586.02/
>
> This change has JDK-8250635 (currently out for review) as a prerequisite. It will not compile without.
>
> Thank you!
> Lutz
>

From vlivanov at openjdk.java.net Tue Sep 8 14:31:43 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Tue, 8 Sep 2020 14:31:43 GMT
Subject: RFR: 8252916: Newline in object field values list of ScopeDesc should be removed
In-Reply-To:
References:
Message-ID:

Looks good and trivial.

-------------

PR: https://git.openjdk.java.net/jdk/pull/75

From thartmann at openjdk.java.net Tue Sep 8 14:37:37 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Tue, 8 Sep 2020 14:37:37 GMT
Subject: RFR: 8252916: Newline in object field values list of ScopeDesc should be removed
In-Reply-To:
References:
Message-ID:

On Tue, 8 Sep 2020 14:29:19 GMT, Vladimir Ivanov wrote:

> Looks good and trivial.

Thanks Vladimir!
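
For readers following the 8252916 thread, here is a minimal sketch of the formatting issue it describes. This is not HotSpot code: `FieldValue`, `print_fields` and the two printer methods are hypothetical stand-ins, assumed only to mirror the difference between a printer that ends its output with a newline and a 'print_value_on'-style printer that does not. The ", " separator and the sample strings are likewise taken from the quoted output for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one field value of a scalar-replaced object.
struct FieldValue {
  std::string text;  // e.g. "NULL" or "stack[0],oop"

  // Line-oriented printer: always terminates its output with a newline.
  void print_on(std::ostream& os) const { os << text << '\n'; }

  // Inline printer: no trailing newline, so it is safe inside a list.
  void print_value_on(std::ostream& os) const { os << text; }
};

// Builds "<id>: <klass> f0, f1, ..." on a single line.
// Using print_on() here instead would split the list after every field,
// which is the kind of stray line break the thread reports after "NULL".
std::string print_fields(int id, const std::string& klass,
                         const std::vector<FieldValue>& fields) {
  assert(!klass.empty());  // a class name is required for the prefix
  std::ostringstream os;
  os << id << ": " << klass << ' ';
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (i > 0) os << ", ";
    fields[i].print_value_on(os);
  }
  return os.str();
}
```

Calling `print_fields(52, "Test$MyClass", ...)` with the two field values from the example keeps the whole object entry on one line in this simplified model.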

-------------

PR: https://git.openjdk.java.net/jdk/pull/75

From vlivanov at openjdk.java.net Tue Sep 8 14:40:52 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Tue, 8 Sep 2020 14:40:52 GMT
Subject: RFR: 8252916: Newline in object field values list of ScopeDesc should be removed
In-Reply-To:
References:
Message-ID:

Marked as reviewed by vlivanov (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/75

From daniel.daugherty at oracle.com Tue Sep 8 16:15:36 2020
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 8 Sep 2020 12:15:36 -0400
Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
In-Reply-To:
References: <682ee88d-097a-df57-7374-b3413b7964fd@oracle.com> <3ae58a8e-405a-d98c-79c5-c6a0bdf5cc27@oracle.com>
Message-ID: <96ad21a3-cae4-2218-b047-6912e6a07b21@oracle.com>

Hi Richard,

I haven't seen a review from anyone on the Serviceability team and I think
you should get a review from them since JVM/TI is involved.
Perhaps I missed it...

Dan

On 9/7/20 10:09 AM, Reingruber, Richard wrote:
> Hi,
>
> I would like to close the review of this change.
>
> It has received a lot of helpful feedback during the process and 2 full
> Reviews. Thanks everybody!
>
> I'm planning to push it this week on Thursday as the solution for JBS items:
>
> https://bugs.openjdk.java.net/browse/JDK-8227745
> https://bugs.openjdk.java.net/browse/JDK-8233915
>
> Version to be pushed:
>
> http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8/
>
> Hope to get my GIT/Skara setup going by then... :)
>
> Thanks, Richard.
>
> -----Original Message-----
> From: hotspot-compiler-dev On Behalf Of Reingruber, Richard
> Sent: Wednesday, 2 September 2020 23:27
> To: Robbin Ehn; serviceability-dev; hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime
> Subject: [CAUTION] RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
>
> Hi Robbin,
>
>> On 2020-09-02 15:48, Reingruber, Richard wrote:
>>> Hi Robbin,
>>>
>>> // taking the discussion back to the mailing lists
>>>
>>> > I still don't understand why you don't deoptimize the objects inside the
>>> > handshake/safepoint instead?
>> So for handshakes using asynch handshake and allowing blocking inside
>> would fix that. (future fix, I'm working on that now)
> Just to make it clear: I'm not fond of the extra suspension mechanism currently
> used for JDK-8227745 either. I want to get rid of it and I will work on it. Asynch
> handshakes (JDK-8238761) could be a replacement for it. At least I think they
> can be used to suspend the target thread.
>
>> For safepoint, since we have suspended all threads, ~'safepointed them'
>> with a JavaThread, you _could_ just execute the action directly (e.g.
>> skipping VM_HeapWalkOperation safepoint) since they are supposed to be
>> safely suspended until the destructor of EB, no?
> Yes, this should be possible. This would be an advanced change though.
I would
> like EscapeBarriers to be a no-op and fall back to the current implementation, if
> C2-EscapeAnalysis/Graal are disabled.
>
>> So I suggest future work to instead just execute the safepoint with the
>> requesting JT instead of having this special safepointing mechanism.
>> Since you are missing above functionality I see why you went this way.
>> If you need to push it, it's fine by me.
> We will work on further improvements. Top of the list would
> be eliminating the extra suspend mechanism.
>
> The implementation has matured for more than 12 months now [1]. It's been tested
> extensively at SAP over that time and also passed extended testing at Oracle
> kindly conducted by Vladimir Kozlov. We've got two full Reviews and incorporated
> extensive feedback from a number of OpenJDK Reviewers (including you,
> thanks!). Based on that I reckon we're good to push the change as enhancement
> (JDK-8227745) and bug fix (JDK-8233915).
>
>> Thanks for explaining once again :)
> Pleasure :)
>
> Thanks, Richard.
>
> [1] http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028729.html
>
> -----Original Message-----
> From: Robbin Ehn
> Sent: Wednesday, 2 September 2020 16:54
> To: Reingruber, Richard; serviceability-dev; hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime
> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
>
> Hi Richard,
>
> On 2020-09-02 15:48, Reingruber, Richard wrote:
>> Hi Robbin,
>>
>> // taking the discussion back to the mailing lists
>>
>> > I still don't understand why you don't deoptimize the objects inside the
>> > handshake/safepoint instead?
> So for handshakes using asynch handshake and allowing blocking inside
> would fix that. (future fix, I'm working on that now)
>
> For safepoint, since we have suspended all threads, ~'safepointed them'
> with a JavaThread, you _could_ just execute the action directly (e.g.
> skipping VM_HeapWalkOperation safepoint) since they are supposed to be
> safely suspended until the destructor of EB, no?
>
> So I suggest future work to instead just execute the safepoint with the
> requesting JT instead of having this special safepointing mechanism.
>
> Since you are missing above functionality I see why you went this way.
> If you need to push it, it's fine by me.
>
> Thanks for explaining once again :)
>
> /Robbin
>
>> This is unfortunately not possible. Deoptimizing objects includes reallocating
>> scalar replaced objects, i.e. calling Deoptimization::realloc_objects(). This
>> cannot be done at a safepoint or handshake.
>>
>> 1. The vm thread is not allowed to allocate on the java heap
>>    See for instance assertions in ParallelScavengeHeap::mem_allocate()
>>    https://github.com/openjdk/jdk/blob/4c73e045ce815d52abcdc99499266ccf2e6e9b4c/src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp#L258
>>
>>    This is not easy to change, I suppose, because it will be difficult to gc if
>>    necessary.
>>
>> 2. Using a direct handshake would not work either. The problem there is again
>>    gc. Let J be the JavaThread that is executing the direct handshake. The vm
>>    would deadlock if the vm thread waits for J to execute the closure of a
>>    handshake-all and J waits for the vm thread to execute a gc vm operation.
>>    Patricio Chilano made me aware of this: https://bugs.openjdk.java.net/browse/JDK-8230594
>>
>> Cheers, Richard.
>>
>> -----Original Message-----
>> From: Robbin Ehn
>> Sent: Wednesday, 2 September 2020 13:56
>> To: Reingruber, Richard
>> Cc: Lindenmaier, Goetz; Vladimir Kozlov; David Holmes
>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
>>
>> Hi,
>>
>> I still don't understand why you don't deoptimize the objects inside the
>> handshake/safepoint instead?
>>
>> E.g.
>>
>> JvmtiEnv::GetOwnedMonitorInfo you should only need to execute the code
>> from:
>> eb.deoptimize_objects(MaxJavaStackTraceDepth)) before looping over the
>> stack, so:
>>
>> void
>> GetOwnedMonitorInfoClosure::do_thread(Thread *target) {
>>    assert(target->is_Java_thread(), "just checking");
>>    JavaThread *jt = (JavaThread *)target;
>>
>>    if (!jt->is_exiting() && (jt->threadObj() != NULL)) {
>> +    if (EscapeBarrier::deoptimize_objects(jt, MaxJavaStackTraceDepth)) {
>>        _result =
>>          ((JvmtiEnvBase*)_env)->get_owned_monitors(_calling_thread, jt,
>>                                                    _owned_monitors_list);
>>      } else {
>>        _result = JVMTI_ERROR_OUT_OF_MEMORY;
>>      }
>>    }
>> }
>>
>> Why try 'suspend' the thread first?
>>
>>
>> When we de-optimize all threads why not just in the following safepoint?
>> E.g.
>> VM_HeapWalkOperation::doit() {
>> +  EscapeBarrier::deoptimize_objects_all_threads();
>>    ...
>> }
>>
>> Thanks, Robbin
>>
>>

From sean.mullan at oracle.com Tue Sep 8 16:15:56 2020
From: sean.mullan at oracle.com (Sean Mullan)
Date: Tue, 8 Sep 2020 12:15:56 -0400
Subject: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic
In-Reply-To: <95965aeb-9d97-3b27-e684-967b6155eb34@redhat.com>
References: <1729f1b1-056d-76c9-c820-d38bd6c1235d@redhat.com> <95965aeb-9d97-3b27-e684-967b6155eb34@redhat.com>
Message-ID:

Since this change affects security code, please make sure you add
security-dev at openjdk.java.net on any followup code reviews.

Thanks,
Sean

On 9/1/20 10:44 AM, Andrew Haley wrote:
> On 01/09/2020 11:53, Yangfei (Felix) wrote:
>> Sure, I am happy if the original author of the assembly code or someone else from Linaro could help here.
>> I wasn't aware there was such a requirement here given that assembly code is licensed under GPL.
>
> There sure is. All code must be contributed by its owner and put on the
> cr.openjdk site. Especially GPL code.
>
>> Should we separate the patch into two parts: changes for the shared code part and the aarch64 port-specific changes?
>
> I think not.
>

From hohensee at amazon.com Tue Sep 8 16:33:50 2020
From: hohensee at amazon.com (Hohensee, Paul)
Date: Tue, 8 Sep 2020 16:33:50 +0000
Subject: RFR 8239090: Improve CPU feature support in VM_version
Message-ID: <2D081CC6-FCD4-4A31-BDC1-ECF7A4B60BAD@amazon.com>

Thank you for the review, Igor. I did indeed define FEATURES_NAMES to be close to the flags enum definition. And, as a macro, it might be useful in some future context. I also defined four CPU_ enum values and strings per line so it's easy to keep track of the correspondence.

Anyone else up for a review please? I expect I'll have to turn this into a PR though.

Paul

From: Igor Veresov
Date: Friday, September 4, 2020 at 4:04 PM
To: "Hohensee, Paul"
Cc: "hotspot-compiler-dev at openjdk.java.net"
Subject: RE: RFR 8239090: Improve CPU feature support in VM_version

This looks good. Did you make FEATURES_NAMES a macro just so that it's close to the flags enum?

igor

On Sep 4, 2020, at 10:39 AM, Hohensee, Paul wrote:

Slightly adjusted patch.

http://cr.openjdk.java.net/~phh/8239090/webrev.02/

Thanks,
Paul

On 9/3/20, 3:47 PM, "hotspot-compiler-dev on behalf of Hohensee, Paul" wrote:

Taking over from Eric...

Thank you for the review, Igor. I took a completely different (and very old) approach, however, and defined a method Abstract_VM_Version::insert_features_names() that iterates over the feature flags set. If a feature bit is on, it appends to an output buffer a corresponding name string from an array indexed by the bit number. I've implemented it only for x86: using the mechanism for other platforms can be follow-on RFEs. I'd greatly appreciate a review.

Webrev: http://cr.openjdk.java.net/~phh/8239090/webrev.00/

To add a feature bit, all one now has to do is add a CPU_ definition and corresponding name string in the FEATURES_NAMES macro.

I've also included a few small changes to the x86 implementation beyond the above.

1. Unified the previous two bitset definitions into a single Feature_Flag enum and made it a uint64_t.
2. supports_tscinv_bit() referenced the CPU_TSCINV bit, which was a bit misleading, so added a new CPU_TSCINV_BIT mask and used it instead.
3. Repurposed CPU_TSCINV for supports_tscinv(), which was a "composite" property, but is now computed once in feature_flags().
4. Made supports_clflushopt() and supports_clwb() common to both 32 and 64-bit rather than have 32-bit versions that always return 'false'. These bits are never set by the hardware on 32-bit, so no need for separate methods.
5. Renamed CPU_HV_PRESENT to CPU_HV to conform with the CPU_ bit naming scheme. "_PRESENT" is redundant and not used for any other CPU_ name, and the feature string uses "hv", not "hv_present". Added CPU_HV to vmStructs_x86.hpp and vmStructs_jvmci.cpp.

Tested using -Xlog:os+cpu on my macbook pro: the same feature string is returned after the patch as before it. Suggestions for how to more thoroughly test the patch are very welcome.

Thanks,
Paul

On 8/27/20, 6:22 PM, "hotspot-compiler-dev on behalf of Igor Veresov" wrote:

You can actually make a constexpr array of feature objects and then use a constexpr function with a loop to look it up. The C++ compiler will generate an O(1) table lookup for it. That would be a good way to get rid of the ugly macro (we allow C++14 now).
For example foo() in this example:

    enum E { a, b, c };
    struct P {
      E _e;   // key
      int _v; // value
      constexpr P(E e, int v) : _e(e), _v(v) { }
    };
    constexpr static P ps[3] = { P(a, 0xdead), P(b, 0xbeef), P(c, 0xf00d) };
    constexpr int match(E e) {
      for (const auto& p : ps) {
        if (p._e == e) {
          return p._v;
        }
      }
      return -1;
    }
    int foo(E e) { return match(e); }

Will be compiled into:

    __Z3foo1E:                              ## @_Z3foo1E
            .cfi_startproc
    ## %bb.0:
            movl    $-1, %eax
            cmpl    $2, %edi
            ja      LBB0_2
    ## %bb.1:
            pushq   %rbp
            .cfi_def_cfa_offset 16
            .cfi_offset %rbp, -16
            movq    %rsp, %rbp
            .cfi_def_cfa_register %rbp
            movslq  %edi, %rax
            leaq    l_switch.table._Z3foo1E(%rip), %rcx
            movq    (%rcx,%rax,8), %rax
            movl    4(%rax), %eax
            popq    %rbp
    LBB0_2:
            retq
            .cfi_endproc
                                            ## -- End function
            .section        __TEXT,__const
            .p2align        4               ## @_ZL2ps
    __ZL2ps:
            .long   0                       ## 0x0
            .long   57005                   ## 0xdead
            .long   1                       ## 0x1
            .long   48879                   ## 0xbeef
            .long   2                       ## 0x2
            .long   61453                   ## 0xf00d
            .section        __DATA,__const
            .p2align        3               ## @switch.table._Z3foo1E
    l_switch.table._Z3foo1E:
            .quad   __ZL2ps
            .quad   __ZL2ps+8
            .quad   __ZL2ps+16

igor

On Aug 27, 2020, at 11:08 AM, Eric, Chan wrote:

Hi,

Requesting review for

Webrev : http://cr.openjdk.java.net/~phh/8239090/webrev.00/
JBS : https://bugs.openjdk.java.net/browse/JDK-8239090

Yesterday I sent the wrong one, so I am sending it again. I improved the "get_processor_features" method by storing every CPU feature in an enum array, so that we don't have to count how many "%s" need to be added. The change passed the tier1 tests.
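
The name-table mechanism described earlier in this thread (walk the feature bits and, for each bit that is set, append the name stored at that bit's index) can be sketched roughly as follows. This is an illustration under assumptions, not the code in the webrev: `features_string`, `kFeatureNames`, the bit positions and the ", " separator are all hypothetical, and only the feature names themselves are borrowed from the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical name table: entry i names feature bit i.
// Adding a feature means adding one bit and one string at the same index.
static const char* const kFeatureNames[] = {"clflushopt", "clwb", "tscinv", "hv"};
constexpr std::size_t kFeatureCount =
    sizeof(kFeatureNames) / sizeof(kFeatureNames[0]);

// Returns the names of all set bits in 'features', in bit order.
std::string features_string(std::uint64_t features) {
  // A set bit without a name entry would mean the table and the
  // flag definitions have drifted apart.
  assert((features >> kFeatureCount) == 0);
  std::string out;
  for (std::size_t bit = 0; bit < kFeatureCount; ++bit) {
    if (features & (std::uint64_t{1} << bit)) {
      if (!out.empty()) out += ", ";
      out += kFeatureNames[bit];
    }
  }
  return out;
}
```

Because the string is built by a loop over the table, nothing has to count format specifiers by hand, which is the maintenance problem the original request set out to remove.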

Regards,
Eric Chen

From richard.reingruber at sap.com Tue Sep 8 16:45:15 2020
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Tue, 8 Sep 2020 16:45:15 +0000
Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
In-Reply-To: <96ad21a3-cae4-2218-b047-6912e6a07b21@oracle.com>
References: <682ee88d-097a-df57-7374-b3413b7964fd@oracle.com> <3ae58a8e-405a-d98c-79c5-c6a0bdf5cc27@oracle.com> <96ad21a3-cae4-2218-b047-6912e6a07b21@oracle.com>
Message-ID:

Hi Dan,

I'd be very happy about a review from somebody on the Serviceability team. I have asked for reviews many times (kindly I hope). And the change is for review for more than a year now.

According to [1] I'd think all requirements to push are met already. But maybe I missed something?

After renaming of methods in SafepointMechanism the change needs to be rebased (already done). I'll publish a pull request as soon as possible.

Thanks, Richard.

[1] https://wiki.openjdk.java.net/display/HotSpot/Pushing+a+HotSpot+change

From martin.thompson at oracle.com Tue Sep 8 16:54:37 2020
From: martin.thompson at oracle.com (Marty Thompson)
Date: Tue, 8 Sep 2020 09:54:37 -0700 (PDT)
Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
In-Reply-To:
References: <682ee88d-097a-df57-7374-b3413b7964fd@oracle.com> <3ae58a8e-405a-d98c-79c5-c6a0bdf5cc27@oracle.com> <96ad21a3-cae4-2218-b047-6912e6a07b21@oracle.com>
Message-ID:

Hello Richard,

It would be good if Serguei Spitsyn could review before this is pushed. Serguei is out this week. Can you wait until Serguei is back in the office the week of Sept 14?

Regards,
Marty

> -----Original Message-----
> From: Reingruber, Richard
> Sent: Tuesday, September 8, 2020 9:45 AM
> To: Daniel Daugherty; serviceability-dev; hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime
> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance
> in the Presence of JVMTI Agents
>
> Hi Dan,
>
> I'd be very happy about a review from somebody on the Serviceability team.
> I have asked for reviews many times (kindly I hope). And the change is for > review for more than a year now. > > According to [1] I'd think all requirements to push are met already. But > maybe I missed something? > > After renaming of methods in SafepointMechanism the change needs to be > rebased (already done). I'll publish a pull request as soon as possible. > > Thanks, Richard. > > [1] > https://wiki.openjdk.java.net/display/HotSpot/Pushing+a+HotSpot+change > > -----Original Message----- > From: Daniel D. Daugherty > Sent: Dienstag, 8. September 2020 18:16 > To: Reingruber, Richard ; serviceability-dev > ; hotspot-compiler- > dev at openjdk.java.net; Hotspot dev runtime dev at openjdk.java.net> > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance > in the Presence of JVMTI Agents > > Hi Richard, > > I haven't seen a review from anyone on the Serviceability team and I think > you should get a review from them since JVM/TI is involved. > Perhaps I missed it... > > Dan > > > On 9/7/20 10:09 AM, Reingruber, Richard wrote: > > Hi, > > > > I would like to close the review of this change. > > > > It has received a lot of helpful feedback during the process and 2 > > full Reviews. Thanks everybody! > > > > I'm planning to push it this week on Thursday as solution for JBS items: > > > > https://bugs.openjdk.java.net/browse/JDK-8227745 > > https://bugs.openjdk.java.net/browse/JDK-8233915 > > > > Version to be pushed: > > > > http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8/ > > > > Hope to get my GIT/Skara setup going until then... :) > > > > Thanks, Richard. > > > > -----Original Message----- > > From: hotspot-compiler-dev > > On Behalf Of Reingruber, > > Richard > > Sent: Mittwoch, 2. 
September 2020 23:27
> > To: Robbin Ehn; serviceability-dev;
> > hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime
> > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for
> > Better Performance in the Presence of JVMTI Agents
> >
> > Hi Robbin,
> >
> >> On 2020-09-02 15:48, Reingruber, Richard wrote:
> >>> Hi Robbin,
> >>>
> >>> // taking the discussion back to the mailing lists
> >>>
> >>> > I still don't understand why you don't deoptimize the objects inside the
> >>> > handshake/safepoint instead?
> >> So for handshakes using asynch handshake and allowing blocking inside
> >> would fix that. (future fix, I'm working on that now)
> > Just to make it clear: I'm not fond of the extra suspension mechanism
> > currently used for JDK-8227745 either. I want to get rid of it and I
> > will work on it. Asynch handshakes (JDK-8238761) could be a
> > replacement for it. At least I think they can be used to suspend the target thread.
> >
> >> For safepoint, since we have suspended all threads, ~'safepointed them'
> >> with a JavaThread, you _could_ just execute the action directly (e.g.
> >> skipping VM_HeapWalkOperation safepoint) since they are supposed to be
> >> safely suspended until the destructor of EB, no?
> > Yes, this should be possible. This would be an advanced change though.
> > I would like EscapeBarriers to be a no-op and fall back to current
> > implementation, if C2-EscapeAnalysis/Graal are disabled.
> >
> >> So I suggest future work to instead just execute the safepoint with
> >> the requesting JT instead of having this special safepointing mechanism.
> >> Since you are missing above functionality I see why you went this way.
> >> If you need to push it, it's fine by me.
> > We will work on further improvements. Top of the list would be
> > eliminating the extra suspend mechanism.
> >
> > The implementation has matured for more than 12 months now [1]. It's
> > been tested extensively at SAP over that time and passed also extended
> > testing at Oracle kindly conducted by Vladimir Kozlov. We've got two
> > full Reviews and incorporated extensive feedback from a number of
> > OpenJDK Reviewers (including you, thanks!). Based on that I reckon
> > we're good to push the change as enhancement
> > (JDK-8227745) and bug fix (JDK-8233915).
> >
> >> Thanks for explaining once again :)
> > Pleasure :)
> >
> > Thanks, Richard.
> >
> > [1]
> > http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028729.html
> >
> > -----Original Message-----
> > From: Robbin Ehn
> > Sent: Wednesday, 2 September 2020 16:54
> > To: Reingruber, Richard; serviceability-dev;
> > hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime
> > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
> > Performance in the Presence of JVMTI Agents
> >
> > Hi Richard,
> >
> > On 2020-09-02 15:48, Reingruber, Richard wrote:
> >> Hi Robbin,
> >>
> >> // taking the discussion back to the mailing lists
> >>
> >> > I still don't understand why you don't deoptimize the objects inside the
> >> > handshake/safepoint instead?
> > So for handshakes using asynch handshake and allowing blocking inside
> > would fix that. (future fix, I'm working on that now)
> >
> > For safepoint, since we have suspended all threads, ~'safepointed them'
> > with a JavaThread, you _could_ just execute the action directly (e.g.
> > skipping VM_HeapWalkOperation safepoint) since they are supposed to be
> > safely suspended until the destructor of EB, no?
> >
> > So I suggest future work to instead just execute the safepoint with
> > the requesting JT instead of having this special safepointing mechanism.
> >
> > Since you are missing above functionality I see why you went this way.
> > If you need to push it, it's fine by me.
> >
> > Thanks for explaining once again :)
> >
> > /Robbin
> >
> >> This is unfortunately not possible. Deoptimizing objects includes
> >> reallocating scalar replaced objects, i.e. calling
> >> Deoptimization::realloc_objects(). This cannot be done at a safepoint or handshake.
> >>
> >> 1. The vm thread is not allowed to allocate on the java heap
> >>    See for instance assertions in ParallelScavengeHeap::mem_allocate()
> >>    https://github.com/openjdk/jdk/blob/4c73e045ce815d52abcdc99499266ccf2e6e9b4c/src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp#L258
> >>
> >>    This is not easy to change, I suppose, because it will be difficult to gc if
> >>    necessary.
> >>
> >> 2. Using a direct handshake would not work either. The problem there is again
> >>    gc. Let J be the JavaThread that is executing the direct handshake. The vm
> >>    would deadlock if the vm thread waits for J to execute the closure of a
> >>    handshake-all and J waits for the vm thread to execute a gc vm operation.
> >>    Patricio Chilano made me aware of this:
> >>    https://bugs.openjdk.java.net/browse/JDK-8230594
> >>
> >> Cheers, Richard.
> >>
> >> -----Original Message-----
> >> From: Robbin Ehn
> >> Sent: Wednesday, 2 September 2020 13:56
> >> To: Reingruber, Richard
> >> Cc: Lindenmaier, Goetz; Vladimir Kozlov; David Holmes
> >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
> >> Performance in the Presence of JVMTI Agents
> >>
> >> Hi,
> >>
> >> I still don't understand why you don't deoptimize the objects inside
> >> the handshake/safepoint instead?
> >>
> >> E.g.
> >>
> >> JvmtiEnv::GetOwnedMonitorInfo you should only need to execute the code
> >> from:
> >> eb.deoptimize_objects(MaxJavaStackTraceDepth)) before looping over
> >> the stack, so:
> >>
> >> void
> >> GetOwnedMonitorInfoClosure::do_thread(Thread *target) {
> >>   assert(target->is_Java_thread(), "just checking");
> >>   JavaThread *jt = (JavaThread *)target;
> >>
> >>   if (!jt->is_exiting() && (jt->threadObj() != NULL)) {
> >> +   if (EscapeBarrier::deoptimize_objects(jt,
> >> +       MaxJavaStackTraceDepth)) {
> >>       _result =
> >>         ((JvmtiEnvBase*)_env)->get_owned_monitors(_calling_thread, jt,
> >>                                                   _owned_monitors_list);
> >>     } else {
> >>       _result = JVMTI_ERROR_OUT_OF_MEMORY;
> >>     }
> >>   }
> >> }
> >>
> >> Why try 'suspend' the thread first?
> >>
> >>
> >> When we de-optimize all threads why not just in the following safepoint?
> >> E.g.
> >> VM_HeapWalkOperation::doit() {
> >> + EscapeBarrier::deoptimize_objects_all_threads();
> >>   ...
> >> }
> >>
> >> Thanks, Robbin
> >>
> >>
>

From richard.reingruber at sap.com Tue Sep 8 17:02:29 2020
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Tue, 8 Sep 2020 17:02:29 +0000
Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
In-Reply-To:
References: <682ee88d-097a-df57-7374-b3413b7964fd@oracle.com> <3ae58a8e-405a-d98c-79c5-c6a0bdf5cc27@oracle.com> <96ad21a3-cae4-2218-b047-6912e6a07b21@oracle.com>
Message-ID:

Hello Marty,

Sure. I'd be happy if Serguei could review the change.

Thanks, Richard.

-----Original Message-----
From: Marty Thompson
Sent: Tuesday, 8 September 2020 18:55
To: Reingruber, Richard; Daniel Daugherty; serviceability-dev; hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime
Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents

Hello Richard,

It would be good if Serguei Spitsyn could review before this is pushed. Serguei is out this week.
Can you wait until Serguei is back in the office the week of Sept 14?

Regards,

Marty

> -----Original Message-----
> From: Reingruber, Richard
> Sent: Tuesday, September 8, 2020 9:45 AM
> To: Daniel Daugherty; serviceability-dev;
> hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime
> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance
> in the Presence of JVMTI Agents
>
> Hi Dan,
>
> I'd be very happy about a review from somebody on the Serviceability team.
> I have asked for reviews many times (kindly I hope). And the change has been
> up for review for more than a year now.
>
> According to [1] I'd think all requirements to push are met already. But
> maybe I missed something?
>
> After renaming of methods in SafepointMechanism the change needs to be
> rebased (already done). I'll publish a pull request as soon as possible.
>
> Thanks, Richard.
>
> [1]
> https://wiki.openjdk.java.net/display/HotSpot/Pushing+a+HotSpot+change
>
> -----Original Message-----
> From: Daniel D. Daugherty
> Sent: Tuesday, 8 September 2020 18:16
> To: Reingruber, Richard; serviceability-dev;
> hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime
> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance
> in the Presence of JVMTI Agents
>
> Hi Richard,
>
> I haven't seen a review from anyone on the Serviceability team and I think
> you should get a review from them since JVM/TI is involved.
> Perhaps I missed it...
>
> Dan
>
>
> On 9/7/20 10:09 AM, Reingruber, Richard wrote:
> > Hi,
> >
> > I would like to close the review of this change.
> >
> > It has received a lot of helpful feedback during the process and 2
> > full Reviews. Thanks everybody!
> >
> > I'm planning to push it this week on Thursday as solution for JBS items:
> >
> > https://bugs.openjdk.java.net/browse/JDK-8227745
> > https://bugs.openjdk.java.net/browse/JDK-8233915
> >
> > Version to be pushed:
> >
> > http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8/
> >
> > Hope to get my GIT/Skara setup going by then... :)
> >
> > Thanks, Richard.

From dnsimon at openjdk.java.net Tue Sep 8 21:19:09 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Tue, 8 Sep 2020 21:19:09 GMT
Subject: RFR: 8252543: [JVMCI] Libgraal can deadlock in blocking compilation mode [v2]
In-Reply-To: <5IFx6Bjetu0eYmCpkgbsRmMVj6i-Xf1tt4LMHvWx91w=.7f520889-ab7d-49de-aa58-0c0608627edb@github.com>
References: <5IFx6Bjetu0eYmCpkgbsRmMVj6i-Xf1tt4LMHvWx91w=.7f520889-ab7d-49de-aa58-0c0608627edb@github.com>
Message-ID:

> To prevent a deadlock in libgraal under `-Xcomp` or `-Xbatch` due to a lock being held in libgraal, a new mechanism is
> added by this change that allows JVMCI compiler threads to communicate their "progress" to HotSpot:
> * Each JVMCI compiler thread has a "compilation ticks" counter.
> * There is also a global JVMCI compilation ticks counter.
> * Each JVMCI VM call increments the JVMCI compiler thread-local compilation ticks counter.
> * Every 512 increments of such a counter also increments the global counter.
> * A thread waiting on a blocking JVMCI compilation will be unblocked if these counters indicate no progress after a
> defined period.

Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev
excludes the unrelated changes brought in by the merge/rebase.
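
The compilation-ticks scheme listed in the description above could be sketched roughly as follows. This is a minimal Java illustration under stated assumptions, not the code from the pull request: the class names `CompilationTicks` and `ProgressWatcher`, their methods, and the rule that movement of either counter counts as progress are hypothetical. Only the per-thread counter, the global counter, and the 512-increment ratio come from the description.

```java
/** Hypothetical per-compiler-thread tick counter (illustrative names only). */
final class CompilationTicks {
    // Global counter shared by all compiler threads.
    static final java.util.concurrent.atomic.AtomicLong GLOBAL_TICKS =
            new java.util.concurrent.atomic.AtomicLong();

    // Per the description: every 512 local increments also bump the global counter.
    static final int LOCAL_TICKS_PER_GLOBAL = 512;

    // Written only by the owning compiler thread, read by waiting threads.
    private volatile long localTicks;

    /** Called once per (simulated) VM call made by the compiler thread. */
    void tick() {
        long t = localTicks + 1;
        localTicks = t;
        if (t % LOCAL_TICKS_PER_GLOBAL == 0) {
            GLOBAL_TICKS.incrementAndGet();
        }
    }

    long localTicks() {
        return localTicks;
    }
}

/** Hypothetical helper used by a thread blocked on a compilation. */
final class ProgressWatcher {
    private final CompilationTicks compiler;
    private long lastLocal;
    private long lastGlobal;

    ProgressWatcher(CompilationTicks compiler) {
        this.compiler = compiler;
        this.lastLocal = compiler.localTicks();
        this.lastGlobal = CompilationTicks.GLOBAL_TICKS.get();
    }

    /**
     * Returns true if either counter moved since the previous check.
     * Assumption: "no progress" means neither counter changed.
     */
    boolean madeProgressSinceLastCheck() {
        long local = compiler.localTicks();
        long global = CompilationTicks.GLOBAL_TICKS.get();
        boolean progressed = local != lastLocal || global != lastGlobal;
        lastLocal = local;
        lastGlobal = global;
        return progressed;
    }
}
```

In this sketch a waiting thread would call `madeProgressSinceLastCheck()` once per polling period and give up on the blocking compilation after some number of consecutive `false` results; the length of that period is left open here, as it is in the description.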
The pull request contains one additional commit since the last revision:

  add compilation ticks for mitigating against deadlock due to blocking JVMCI compilation

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/35/files
  - new: https://git.openjdk.java.net/jdk/pull/35/files/57870a78..94e4a3f4

Webrevs:
  - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=35&range=01
  - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=35&range=00-01

  Stats: 7637 lines in 271 files changed: 5956 ins; 636 del; 1045 mod
  Patch: https://git.openjdk.java.net/jdk/pull/35.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/35/head:pull/35

PR: https://git.openjdk.java.net/jdk/pull/35

From david.holmes at oracle.com Tue Sep 8 22:29:03 2020
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 9 Sep 2020 08:29:03 +1000
Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
In-Reply-To:
References: <682ee88d-097a-df57-7374-b3413b7964fd@oracle.com> <3ae58a8e-405a-d98c-79c5-c6a0bdf5cc27@oracle.com> <96ad21a3-cae4-2218-b047-6912e6a07b21@oracle.com>
Message-ID: <9c1f2053-2055-42a0-1fd6-94793c0ff2e2@oracle.com>

Hi Richard,

I suspect this one fell off the radar due to the extended review period.
The actual review started last December (there was prior discussion
IIRC) and only seemed to get partial reviews. I only looked at some
parts. Robbin may have given things a deeper look, but seemed focused on
the handshake aspects. Vladimir said he would do a full review but I
can't find it. Eventually Martin and Goetz took over reviewing and
everyone else dropped off. :(

As this covers a number of areas it really does need "approval" from
each area (and yes the hotspot wiki should reflect this).

I will try to take another look while we await Serguei's return (and I
never did follow up on the problem I had with the nested lock
elimination handling. :( ).

Meanwhile this will need to be converted to a PR in any case.

Thanks,
David

On 9/09/2020 3:02 am, Reingruber, Richard wrote:
> Hello Marty,
>
> Sure. I'd be happy if Serguei could review the change.
>
> Thanks, Richard.

From thartmann at openjdk.java.net Wed Sep 9 06:00:31 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 9 Sep 2020 06:00:31 GMT
Subject: Integrated: 8252916: Newline in object field values list of ScopeDesc should be removed
In-Reply-To:
References:
Message-ID:

On Tue, 8 Sep 2020 14:06:48 GMT, Tobias Hartmann wrote:

> Given the following test:
>
> public class Test {
>
>     static class MyClass {
>         Object o1 = null;
>         Object o2 = new Integer(42);
>     }
>
>     static Object test(boolean trap) {
>         MyClass obj = new MyClass();
>         if (trap) { }
>         return obj.o1;
>     }
>
>     public static void main(String[] args) {
>         for (int i = 0; i < 100_000; ++i) {
>             test(false);
>         }
>     }
> }
>
> The ScopeDesc for the uncommon trap in C2 compiled 'test' is printed like this:
>
> ScopeDesc(pc=0x00007f52a5160144 offset=84):
>    Test::test@9 (line 11) reexecute=true
>    Locals
>     - l0: empty
>     - l1: obj[52]
>    Expression stack
>     - @0: reg rbp [10],int
>    Objects
>     - 52: Test$MyClass NULL
> , stack[0],oop
>
> There should be no newline after "NULL".
>
> This is a regression from [JDK-8202171](https://bugs.openjdk.java.net/browse/JDK-8202171) in JDK 12. The fix is to not
> print a newline, to be consistent with 'print_value_on'.
> Thanks,
> Tobias

This pull request has now been integrated.

Changeset: c655b703
Author:    Tobias Hartmann
URL:       https://git.openjdk.java.net/jdk/commit/c655b703
Stats:     1 line in 1 file changed: 0 ins; 0 del; 1 mod

8252916: Newline in object field values list of ScopeDesc should be removed

Reviewed-by: vlivanov

-------------

PR: https://git.openjdk.java.net/jdk/pull/75

From richard.reingruber at sap.com Wed Sep 9 07:14:15 2020
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Wed, 9 Sep 2020 07:14:15 +0000
Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
In-Reply-To: <9c1f2053-2055-42a0-1fd6-94793c0ff2e2@oracle.com>
References: <682ee88d-097a-df57-7374-b3413b7964fd@oracle.com> <3ae58a8e-405a-d98c-79c5-c6a0bdf5cc27@oracle.com> <96ad21a3-cae4-2218-b047-6912e6a07b21@oracle.com> <9c1f2053-2055-42a0-1fd6-94793c0ff2e2@oracle.com>
Message-ID:

> Hi Richard,

> I suspect this one fell off the radar due to the extended review period.
> The actual review started last December (there was prior discussion
> IIRC) and only seemed to get partial reviews. I only looked at some
> parts. Robbin may have given things a deeper look, but seemed focused on
> the handshake aspects. Vladimir said he would do a full review but I
> can't find it. Eventually Martin and Goetz took over reviewing and
> everyone else dropped off. :(

That's how it went, I reckon. I repeatedly asked for feedback and reviews, and
also tried to keep Vladimir, Robbin, and you in the loop by addressing you directly
(e.g. [1]).

> As this covers a number of areas it really does need "approval" from
> each area (and yes the hotspot wiki should reflect this).

I agree. The wiki should define that in a clear manner.
And the community should be involved in that definition.

> I will try to take another look while we await Serguei's return (and I
> never did follow up on the problem I had with the nested lock
> elimination handling. :( ).

Thanks for doing it.

> Meanwhile this will need to be converted to a PR in any case.

I hope to get the PR out later but we've got a team outing today... we haven't
seen each other for months... :)

Cheers, Richard.

[1] http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/030911.html

-----Original Message-----
From: David Holmes
Sent: Wednesday, 9 September 2020 00:29
To: Reingruber, Richard; Marty Thompson; Daniel Daugherty; serviceability-dev; hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime; Robbin Ehn; Vladimir Kozlov
Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents

Hi Richard,

I suspect this one fell off the radar due to the extended review period.
The actual review started last December (there was prior discussion
IIRC) and only seemed to get partial reviews. I only looked at some
parts. Robbin may have given things a deeper look, but seemed focused on
the handshake aspects. Vladimir said he would do a full review but I
can't find it. Eventually Martin and Goetz took over reviewing and
everyone else dropped off. :(

As this covers a number of areas it really does need "approval" from
each area (and yes the hotspot wiki should reflect this).

I will try to take another look while we await Serguei's return (and I
never did follow up on the problem I had with the nested lock
elimination handling. :( ).

Meanwhile this will need to be converted to a PR in any case.

Thanks,
David

On 9/09/2020 3:02 am, Reingruber, Richard wrote:
> Hello Marty,
>
> Sure. I'd be happy if Serguei could review the change.
>
> Thanks, Richard.
>
> -----Original Message-----
> From: Marty Thompson
> Sent: Tuesday, 8
September 2020 18:55 > To: Reingruber, Richard ; Daniel Daugherty ; serviceability-dev ; hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hello Richard, > > It would be good if Serguei Spitsyn could review before this is pushed. Serguei is out this week. Can you wait until Serguei is back in the office the week of Sept 14? > > Regards, > > Marty > >> -----Original Message----- >> From: Reingruber, Richard >> Sent: Tuesday, September 8, 2020 9:45 AM >> To: Daniel Daugherty ; serviceability-dev >> ; hotspot-compiler- >> dev at openjdk.java.net; Hotspot dev runtime > dev at openjdk.java.net> >> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance >> in the Presence of JVMTI Agents >> >> Hi Dan, >> >> I'd be very happy about a review from somebody on the Serviceability team. >> I have asked for reviews many times (kindly I hope). And the change is for >> review for more than a year now. >> >> According to [1] I'd think all requirements to push are met already. But >> maybe I missed something? >> >> After renaming of methods in SafepointMechanism the change needs to be >> rebased (already done). I'll publish a pull request as soon as possible. >> >> Thanks, Richard. >> >> [1] >> https://wiki.openjdk.java.net/display/HotSpot/Pushing+a+HotSpot+change >> >> -----Original Message----- >> From: Daniel D. Daugherty >> Sent: Dienstag, 8. September 2020 18:16 >> To: Reingruber, Richard ; serviceability-dev >> ; hotspot-compiler- >> dev at openjdk.java.net; Hotspot dev runtime > dev at openjdk.java.net> >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance >> in the Presence of JVMTI Agents >> >> Hi Richard, >> >> I haven't seen a review from anyone on the Serviceability team and I think >> you should get a review from them since JVM/TI is involved. >> Perhaps I missed it... 
>> >> Dan >> >> >> On 9/7/20 10:09 AM, Reingruber, Richard wrote: >>> Hi, >>> >>> I would like to close the review of this change. >>> >>> It has received a lot of helpful feedback during the process and 2 >>> full Reviews. Thanks everybody! >>> >>> I'm planning to push it this week on Thursday as solution for JBS items: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>> https://bugs.openjdk.java.net/browse/JDK-8233915 >>> >>> Version to be pushed: >>> >>> http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8/ >>> >>> Hope to get my GIT/Skara setup going until then... :) >>> >>> Thanks, Richard. >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> On Behalf Of Reingruber, >>> Richard >>> Sent: Mittwoch, 2. September 2020 23:27 >>> To: Robbin Ehn ; serviceability-dev >>> ; >>> hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime >>> >>> Subject: [CAUTION] RE: RFR(L) 8227745: Enable Escape Analysis for >>> Better Performance in the Presence of JVMTI Agents >>> >>> Hi Robin, >>> >>>> On 2020-09-02 15:48, Reingruber, Richard wrote: >>>>> Hi Robbin, >>>>> >>>>> // taking the discussion back to the mailing lists >>>>> >>>>> > I still don't understand why you don't deoptimize the objects inside >> the >>>>> > handshake/safepoint instead? >>>> So for handshakes using asynch handshake and allowing blocking inside >>>> would fix that. (future fix, I'm working on that now) >>> Just to make it clear: I'm not fond of the extra suspension mechanism >>> currently used for JDK-8227745 either. I want to get rid of it and I >>> will work on it. Asynch handshakes (JDK-8238761) could be a >>> replacement for it. At least I think they can be used to suspend the target >> thread. >>> >>>> For safepoint, since we have suspended all threads, ~'safepointed them' >>>> with a JavaThread, you _could_ just execute the action directly (e.g. 
>>>> skipping VM_HeapWalkOperation safepoint) since they are suppose to be >>>> safely suspended until the destructor of EB, no? >>> Yes, this should be possible. This would be an advanced change though. >>> I would like EscapeBarriers to be a no-op and fall back to current >>> implementation, if C2-EscapeAnalysis/Graal are disabled. >>> >>>> So I suggest future work to instead just execute the safepoint with >>>> the requesting JT instead of having a this special safepoiting mechanism. >>>> Since you are missing above functionality I see why you went this way. >>>> If you need to push it, it's fine by me. >>> We will work on further improvements. Top of the list would be >>> eliminating the extra suspend mechanism. >>> >>> The implementation has matured for more than 12 months now [1]. It's >>> been tested extensively at SAP over that time and passed also extended >>> testing at Oracle kindly conducted by Vladimir Kozlov. We've got two >>> full Reviews and incorporated extensive feedback from a number of >>> OpenJDK Reviewers (including you, thanks!). Based on that I reckon >>> we're good to push the change as enhancement >>> (JDK-8227745) and bug fix (JDK-8233915). >>> >>>> Thanks for explaining once again :) >>> Pleasure :) >>> >>> Thanks, Richard. >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/02 >>> 8729.html >>> >>> -----Original Message----- >>> From: Robbin Ehn >>> Sent: Mittwoch, 2. September 2020 16:54 >>> To: Reingruber, Richard ; >>> serviceability-dev ; >>> hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime >>> >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>> Performance in the Presence of JVMTI Agents >>> >>> Hi Richard, >>> >>> On 2020-09-02 15:48, Reingruber, Richard wrote: >>>> Hi Robbin, >>>> >>>> // taking the discussion back to the mailing lists >>>> >>>> > I still don't understand why you don't deoptimize the objects inside >> the >>>> > handshake/safepoint instead? 
>>> So for handshakes using asynch handshake and allowing blocking inside >>> would fix that. (future fix, I'm working on that now) >>> >>> For safepoint, since we have suspended all threads, ~'safepointed them' >>> with a JavaThread, you _could_ just execute the action directly (e.g. >>> skipping VM_HeapWalkOperation safepoint) since they are supposed to be >>> safely suspended until the destructor of EB, no? >>> >>> So I suggest future work to instead just execute the safepoint with >>> the requesting JT instead of having this special safepointing mechanism. >>> >>> Since you are missing above functionality I see why you went this way. >>> If you need to push it, it's fine by me. >>> >>> Thanks for explaining once again :) >>> >>> /Robbin >>> >>>> This is unfortunately not possible. Deoptimizing objects includes >>>> reallocating scalar replaced objects, i.e. calling >>>> Deoptimization::realloc_objects(). This cannot be done at a safepoint or handshake. >>>> >>>> 1. The vm thread is not allowed to allocate on the java heap >>>> See for instance assertions in ParallelScavengeHeap::mem_allocate() >>>> >>>> https://github.com/openjdk/jdk/blob/4c73e045ce815d52abcdc99499266ccf2e6e9b4c/src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp#L258 >>>> >>>> This is not easy to change, I suppose, because it will be difficult to gc if >>>> necessary. >>>> >>>> 2. Using a direct handshake would not work either. The problem there is again >>>> gc. Let J be the JavaThread that is executing the direct handshake. The vm >>>> would deadlock if the vm thread waits for J to execute the closure of a >>>> handshake-all and J waits for the vm thread to execute a gc vm operation. >>>> Patricio Chilano made me aware of this: >>>> https://bugs.openjdk.java.net/browse/JDK-8230594 >>>> >>>> Cheers, Richard.
>>>> >>>> -----Original Message----- >>>> From: Robbin Ehn >>>> Sent: Mittwoch, 2. September 2020 13:56 >>>> To: Reingruber, Richard >>>> Cc: Lindenmaier, Goetz ; Vladimir Kozlov >>>> ; David Holmes >> >>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>> Performance in the Presence of JVMTI Agents >>>> >>>> Hi, >>>> >>>> I still don't understand why you don't deoptimize the objects inside >>>> the handshake/safepoint instead? >>>> >>>> E.g. >>>> >>>> JvmtiEnv::GetOwnedMonitorInfo you only should need the execute the >>>> code >>>> from: >>>> eb.deoptimize_objects(MaxJavaStackTraceDepth)) before looping over >>>> the stack, so: >>>> >>>> void >>>> GetOwnedMonitorInfoClosure::do_thread(Thread *target) { >>>> assert(target->is_Java_thread(), "just checking"); >>>> JavaThread *jt = (JavaThread *)target; >>>> >>>> if (!jt->is_exiting() && (jt->threadObj() != NULL)) { >>>> + if (EscapeBarrier::deoptimize_objects(jt, >>>> + MaxJavaStackTraceDepth)) { >>>> _result = >>>> ((JvmtiEnvBase*)_env)->get_owned_monitors(_calling_thread, jt, >>>> _owned_monitors_list); >>>> } else { >>>> _result = JVMTI_ERROR_OUT_OF_MEMORY; >>>> } >>>> } >>>> } >>>> >>>> Why try 'suspend' the thread first? >>>> >>>> >>>> When we de-optimize all threads why not just in the following safepoint? >>>> E.g. >>>> VM_HeapWalkOperation::doit() { >>>> + EscapeBarrier::deoptimize_objects_all_threads(); >>>> ... >>>> } >>>> >>>> Thanks, Robbin >>>> >>>> >> From aph at redhat.com Wed Sep 9 08:23:41 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 9 Sep 2020 09:23:41 +0100 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. 
In-Reply-To: <9ffe38f1-85dc-c2fd-613f-3a97f7354a32@redhat.com> References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> <9ffe38f1-85dc-c2fd-613f-3a97f7354a32@redhat.com> Message-ID: On 08/09/2020 14:18, Andrew Haley wrote: > On 08/09/2020 13:01, Lindenmaier, Goetz wrote: >> One possibility to watch what Oracle is doing is that maintainers have >> an eye on the according filters of 11.0.x-oracle changes and flag them >> jdk11u-fix-no ad-hoc, before someone starts to actually work and >> downport the bug. > > Looks good to me. But we'd first have to agree on the principle of > not backporting some patches, and the criteria by which we'd decide. Thinking about this some more: it really should be a reviewer's job to object if a patch is likely to fail a risk-vs-reward test. Every patch should be considered carefully in this way. It's hard for inexperienced contributors to be able to make such judgements, so they should ask on the list if it isn't clear. But we can say this much: crashes and Java language specification failures will always qualify for fixes; performance improvements, especially small performance improvements, not so much. Compatibility bugs which break communications protocols must be fixed. Updates for new versions of communication protocols and new ciphers, probably. There's a wide grey area in between, it's true. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From gdub at openjdk.java.net Wed Sep 9 08:25:10 2020 From: gdub at openjdk.java.net (Gilles Duboscq) Date: Wed, 9 Sep 2020 08:25:10 GMT Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode Message-ID: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> [JDK-8232806](https://bugs.openjdk.java.net/browse/JDK-8232806) introduced the jdk.internal.lambda.disableEagerInitialization system property to be able to disable eager initialization of lambda classes. This was necessary to prevent side effects of class initializers triggered by such initialization in the context of the GraalVM native image tool. However, the change as it is implemented means that the behaviour of non-capturing lambdas depends on the value of `disableEagerInitialization`: when it is false (the default) such lambdas are actually a singleton while when it is true, a fresh instance is returned every time. Programs should definitely _not_ rely on reference equality since the Java spec does not guarantee it. However, in order to separate concern and ease debugging such bad programs, `disableEagerInitialization` shouldn't influence the singleton vs. fresh instance behaviour of lambdas in either direction. 
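A minimal Java sketch of the point made in the description above: a program may rely on what a lambda does, not on which object it gets. The class name `LambdaIdentityDemo` and the helper `make()` are hypothetical and exist only for illustration. Whether the two references compare equal is left unspecified by the Java language specification, so the sketch prints that result for diagnosis only and makes no claim about its value.

```java
import java.util.function.Supplier;

public class LambdaIdentityDemo {
    // Hypothetical helper: every call evaluates the same non-capturing lambda expression.
    static Supplier<String> make() {
        return () -> "hello";
    }

    public static void main(String[] args) {
        Supplier<String> a = make();
        Supplier<String> b = make();

        // Behavioural equality is what a correct program may rely on.
        System.out.println("same result:   " + a.get().equals(b.get()));

        // Reference equality is unspecified: the runtime may hand out a
        // singleton or a fresh instance. Do not branch on this in real code.
        System.out.println("same instance: " + (a == b));
    }
}
```

Running it with and without `-Djdk.internal.lambda.disableEagerInitialization=true` is one way to observe whether the second line changes between the two modes.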
------------- Commit messages: - 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode Changes: https://git.openjdk.java.net/jdk/pull/93/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=93&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8242451 Stats: 97 lines in 2 files changed: 73 ins; 5 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/93.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/93/head:pull/93 PR: https://git.openjdk.java.net/jdk/pull/93 From adinn at redhat.com Wed Sep 9 09:38:29 2020 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 9 Sep 2020 10:38:29 +0100 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> <9ffe38f1-85dc-c2fd-613f-3a97f7354a32@redhat.com> Message-ID: On 09/09/2020 09:23, Andrew Haley wrote: > Thinking about this some more: it really should be a reviewer's job to > object if a patch is likely to fail a risk-vs-reward test. Every patch > should be considered carefully in this way. It's hard for > inexperienced contributors to be able to make such judgements, so they > should ask on the list if it isn't clear. I agree. Indeed, I have already been doing that for some of the more complex patches. Another couple of things that should really be agreed between the backporter and a reviewer when a patch is complex or has wider-reaching side-effects than jst fixing the bug itself is whether 1) to limit the backport to some subset of the upstream change or even 2) to come up with a custom change that fixes the problem in a different way. That may involve the backporter asking for an early review of a preliminary patch. There's /nothing wrong/ with doing that. 
> But we can say this much: crashes and Java language specification > failures will always qualify for fixes; performance improvements, > especially small performance improvements, not so much. Compatibility > bugs which break communications protocols must be fixed. Updates for > new versions of communication protocols and new ciphers, probably. > > There's a wide grey area in between, it's true. Yes, and that is where reviewers can and should be brought in to help clarify things. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From jlahoda at openjdk.java.net Wed Sep 9 12:21:25 2020 From: jlahoda at openjdk.java.net (Jan Lahoda) Date: Wed, 9 Sep 2020 12:21:25 GMT Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode In-Reply-To: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> Message-ID: <9ARL_A2daS8-nEhhporpJpuRtdJJz8XY1mwyH_i99I8=.c3c3df72-8039-4243-b8c6-bd5040aabe64@github.com> On Wed, 9 Sep 2020 08:18:11 GMT, Gilles Duboscq wrote: > [JDK-8232806](https://bugs.openjdk.java.net/browse/JDK-8232806) introduced the > jdk.internal.lambda.disableEagerInitialization system property to be able to disable eager initialization of lambda > classes. This was necessary to prevent side effects of class initializers triggered by such initialization in the > context of the GraalVM native image tool. However, the change as it is implemented means that the behaviour of > non-capturing lambdas depends on the value of `disableEagerInitialization`: when it is false (the default) such lambdas > are actually a singleton while when it is true, a fresh instance is returned every time. 
Programs should definitely > _not_ rely on reference equality since the Java spec does not guarantee it. However, in order to separate concern and > ease debugging such bad programs, `disableEagerInitialization` shouldn't influence the singleton vs. fresh instance > behaviour of lambdas in either direction. test/langtools/tools/javac/lambda/lambdaExpression/LambdaTest6.java line 29: > 27: * @summary Add lambda tests > 28: * Test bridge methods for certain SAM conversions > 29: * Test the set of generate fields I would suggest to consider having the test under test/jdk/(java/lang/invoke/lambda), not under test/langtools/tools/javac. ------------- PR: https://git.openjdk.java.net/jdk/pull/93 From vladimir.x.ivanov at oracle.com Wed Sep 9 14:57:04 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 9 Sep 2020 17:57:04 +0300 Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 In-Reply-To: <56ad28d7-1733-7cd7-2fb4-a1a53af8a311@oracle.com> References: <56ad28d7-1733-7cd7-2fb4-a1a53af8a311@oracle.com> Message-ID: <658e9a0a-ffc0-3c66-6353-83282f3ea071@oracle.com> >> Similar strict check for constant shift is needed in OrVNode::Ideal >> routine in vectornode.cpp. > > It took me some time to analyze your code for "lazy de-generation" of > Rotate vectors. As I understand you want to preserve scalar optimization > which creates Rotate nodes but have to revert it to keep vectorization > of java code. Yes, the main motivation was to simplify the implementation: keep vectorization logic simple (just Rotate -> RotateV transformation) and don't mess with matching if RotateV is not supported (expand ideal nodes instead of adding special AD instructions). > Method degenerate_vector_rotate() has is_Con() check and, in general, it > could be TOP because we do loop optimizations after vectorization. I > added isa_int() check and treat 'cnt' in other case as variable to do > transformation on 'else' branch and let sub-graph collapse there. 
> I also refactor degenerate_vector_rotate() to make it compact. Good point. > Second, about OrVNode::Ideal(). I am not sure how safe it is without > additional investigation because currently it is not executed. Based on > comment it was added for VectorAPI which is experimental and not pushed > yet. > The code is convoluted and does not match scalar Or::Ideal() code. > > OrINode::Ideal() does next checks for left rotation: > if (Matcher::match_rule_supported(Op_RotateLeft) && > lopcode == Op_LShiftI && ropcode == Op_URShiftI && in(1)->in(1) > == in(2)->in(1)) { > > but OrVNode::Ideal() does: > if (Matcher::match_rule_supported_vector(Op_RotateLeftV, vec_len, bt) && > ((ropcode == Op_LShiftVI && lopcode == Op_URShiftVI) || > > Why it checks RIGHT operator for LShiftV???? > > And asserts are contradicting: > assert(Op_RShiftCntV == in(1)->in(2)->Opcode(), "LShiftCntV operand > expected"); > > Was this code tested? My immediate reaction is simply delete it now and > add reworked and tested version back with EnableVectorSupport flag check > after VectorAPI is integrated. Sounds good. My feedback on OrVNode::Ideal() during 8248830 review was: "> 6) Constant folding scenarios are covered in RotateLeft/RotateRight idealization, inferencing of vector rotate through OrV idealization covers the vector patterns generated through non SLP route i.e. VectorAPI. I'm fine with keeping OrV::Ideal(), but I'm concerned with the general direction here - duplication of scalar transformations to lane-wise vector operations. It definitely won't scale and in a longer run it risks to diverge. Would be nice to find a way to automatically "lift" scalar transformations to vectors and apply them uniformly. But right now it is just an idea which requires more experimentation." > Reworked version may use the same new rotate_shift() I added. I start > rewriting it but since I can't test it and I am not sure maybe the edges > are swapped indeed. I am suggesting to remove it.
> > Also VectorAPI should use Rotate vectors from start which we can > degenerate if not supported. So I am not sure how OrVNode::Ideal() > will be useful for VectorAPI too. Though the API exposes rotation as a dedicated operation, users are free to code it explicitly with vector shifts and vector or. Basically, the situation is similar to scalar case: there are dedicated methods available (Long.rotateLeft/rotateRight), but users are free to code their own variant. And sometimes more general code shapes may degenerate into rotates (as a result of other optimizations). And from Vector API implementation perspective, it is attractive to implement vector rotation purely in Java code as a composition of vector shifts/or operations rather than using JVM intrinsic for it. So, there's a number of use cases when transformations on vector nodes become profitable. > http://cr.openjdk.java.net/~kvn/8252188/webrev.01/ Looks good. Best regards, Vladimir Ivanov > > About testing. I see you used a lot of -128, 128 and similar values > which are larger than the number of bits in Java Integer and Long. > But Java masks the shift count by default before executing the shift. > I would prefer if something like 31 (or 63 for Long) were used instead. > Otherwise Rotate vectors are not generated and tested. > > compiler/intrinsics/TestRotate.java calls verify() after each operation > as a result it is really hard to see generated assembler. I think we > should at least exclude inlining of verify(). > > I will work on tests and have another update.
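For readers following the remark that users are free to code their own rotate variant, here is a small scalar sketch of that code shape. The class `RotateIdiomDemo` and the method `rotl` are hypothetical names for illustration. The sketch only shows that the hand-written "left shift OR unsigned right shift" form agrees with `Integer.rotateLeft`; it says nothing about what C2 actually emits for it.

```java
public class RotateIdiomDemo {
    // Hand-written rotate-left: a left shift OR-ed with an unsigned right
    // shift of the same value, with shift counts that add up to 32.
    static int rotl(int x, int n) {
        return (x << n) | (x >>> (32 - n));
    }

    public static void main(String[] args) {
        int x = 0x80000001;
        // Compare against the dedicated library method for every count 0..31.
        for (int n = 0; n < 32; n++) {
            if (rotl(x, n) != Integer.rotateLeft(x, n)) {
                throw new AssertionError("mismatch at n=" + n);
            }
        }
        System.out.println(Integer.toHexString(rotl(x, 4))); // prints 18
    }
}
```

The n = 0 case works only because Java masks the shift count, so `x >>> 32` is evaluated as `x >>> 0`.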
> > Thanks, > Vladimir K > > >> >> Regards, >> Jatin >> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> On >>> Behalf Of Vladimir Kozlov >>> Sent: Friday, September 4, 2020 3:14 AM >>> To: hotspot compiler >>> Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, >>> bool)+0x8b9 >>> >>> https://cr.openjdk.java.net/~kvn/8252188/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8252188 >>> >>> Code added by 8248830 [1] uses Node::is_Con() check when looking for >>> constant shift values. >>> Unfortunately it does not guarantee that it will be Integer constant >>> because TOP node is also ConNode. >>> I used C2 types to check and get shift values. I also refactor code to >>> consolidate checks. >>> >>> Tested: tier1, hs-tier2, hs-tier3. >>> Verified fix with replay file from bug report. >>> I also checked that RotateBenchmark.java added by 8248830 still creates >>> Rotate vectors after this fix. >>> >>> I created subtask to add new regerssion test later because this fix is >>> urgent and I did not have time to prepare it. >>> >>> Thanks, >>> Vladimir >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8248830 From goetz.lindenmaier at sap.com Wed Sep 9 15:49:22 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 9 Sep 2020 15:49:22 +0000 Subject: [11u] Sconpe of Review ... was RFR(S): 8241234: Unify monitor enter/exit runtime entries. Message-ID: Hi, It is pointless to ask reviewers to judge the risk because reviews are only done if the change had to be adapted. There are complex changes that just apply clean and thus are downported without review. Judging the risk is clearly a thing of the downporter. This is formulated in Rule 1 of Oracles's Updates description : http://openjdk.java.net/projects/jdk-updates/approval.html , and also mentioned in step 6 of jdk11us description: https://wiki.openjdk.java.net/display/JDKUpdates/How+to+contribute+a+fix This is supposed to help the maintainer to decide about the risk. 
The major task of the review is to make sure the downported change is correct, i.e. has the same effect on 11 as the original one. Nevertheless, if there is a reviewer and he feels bad about a change, he should communicate his concerns! Best regards, Goetz. > -----Original Message----- > From: Andrew Dinn > Sent: Wednesday, September 9, 2020 11:38 AM > To: Andrew Haley ; Lindenmaier, Goetz > ; Doerr, Martin ; > 'Severin Gehwolf' ; 'hotspot-compiler- > dev at openjdk.java.net' ; jdk- > updates-dev at openjdk.java.net > Cc: Langer, Christoph > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > On 09/09/2020 09:23, Andrew Haley wrote: > > Thinking about this some more: it really should be a reviewer's job to > > object if a patch is likely to fail a risk-vs-reward test. Every patch > > should be considered carefully in this way. It's hard for > > inexperienced contributors to be able to make such judgements, so they > > should ask on the list if it isn't clear. > > I agree. Indeed, I have already been doing that for some of the more > complex patches. > > Another couple of things that should really be agreed between the > backporter and a reviewer when a patch is complex or has wider-reaching > side-effects than jst fixing the bug itself is whether 1) to limit the > backport to some subset of the upstream change or even 2) to come up > with a custom change that fixes the problem in a different way. That may > involve the backporter asking for an early review of a preliminary > patch. There's /nothing wrong/ with doing that. > > > But we can say this much: crashes and Java language specification > > failures will always qualify for fixes; performance improvements, > > especially small performance improvements, not so much. Compatibility > > bugs which break communications protocols must be fixed. Updates for > > new versions of communication protocols and new ciphers, probably. > > > > There's a wide grey area in between, it's true. 
> Yes, and that is where reviewers can and should be brought in to help > clarify things. > > regards, > > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Wed Sep 9 16:04:50 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 9 Sep 2020 17:04:50 +0100 Subject: [11u] Sconpe of Review ... was RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: Message-ID: <876ec4a1-0228-6c4d-b6eb-c8c67d442ef7@redhat.com> On 09/09/2020 16:49, Lindenmaier, Goetz wrote: > It is pointless to ask reviewers to judge the risk > because reviews are only done if the change had to be adapted. That's an excellent point. In such a case, approvals are the only time risk is considered. However, if a patch applies cleanly I suppose it's probably less risky. Although in some cases that isn't true. > There are complex changes that just apply clean and thus are > downported without review. > > Judging the risk is clearly a thing of the downporter. This is > formulated in Rule 1 of Oracle's Updates description : > http://openjdk.java.net/projects/jdk-updates/approval.html , and > also mentioned in step 6 of jdk11u's description: > https://wiki.openjdk.java.net/display/JDKUpdates/How+to+contribute+a+fix > This is supposed to help the maintainer to decide about the risk. Yes, of course the person doing the porting must consider the risk. But so should the person reviewing it. > The major task of the review is to make sure the downported change > is correct, i.e. has the same effect on 11 as the original one. I disagree. Just like a review of a change to head, the reviewer must consider whether the justification put forward by the submitter is sufficient. > Nevertheless, if there is a reviewer and he feels bad about a > change, he should communicate his concerns! Yes, that should happen.
It's everybody's duty, all the time, to consider risk. We owe our users no less than that. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From joserz at linux.ibm.com Wed Sep 9 16:37:33 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Wed, 9 Sep 2020 13:37:33 -0300 Subject: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Message-ID: <20200909163733.GA422344@pacoca> Hello team! I'd like to backport the following patchset to 11u. It doesn't apply perfectly due to some positional changes and a copyright update. Please, let me know if you prefer another webrev addressing this backport. Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 Thank you very much, Jose R Ziviani From mchung at openjdk.java.net Wed Sep 9 16:39:20 2020 From: mchung at openjdk.java.net (Mandy Chung) Date: Wed, 9 Sep 2020 16:39:20 GMT Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode In-Reply-To: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> Message-ID: On Wed, 9 Sep 2020 08:18:11 GMT, Gilles Duboscq wrote: > [JDK-8232806](https://bugs.openjdk.java.net/browse/JDK-8232806) introduced the > jdk.internal.lambda.disableEagerInitialization system property to be able to disable eager initialization of lambda > classes. This was necessary to prevent side effects of class initializers triggered by such initialization in the > context of the GraalVM native image tool. 
However, the change as it is implemented means that the behaviour of > non-capturing lambdas depends on the value of `disableEagerInitialization`: when it is false (the default) such lambdas > are actually a singleton while when it is true, a fresh instance is returned every time. Programs should definitely > _not_ rely on reference equality since the Java spec does not guarantee it. However, in order to separate concern and > ease debugging such bad programs, `disableEagerInitialization` shouldn't influence the singleton vs. fresh instance > behaviour of lambdas in either direction. src/java.base/share/classes/java/lang/invoke/InnerClassLambdaMetafactory.java line 215: > 213: if (disableEagerInitialization) { > 214: try { > 215: return new ConstantCallSite(caller.findStaticGetter(innerClass, LAMBDA_INSTANCE_FIELD, > invokedType.returnType())); Nit: it'd be good to wrap this long line. There are a couple long lines in this patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/93 From mchung at openjdk.java.net Wed Sep 9 16:43:45 2020 From: mchung at openjdk.java.net (Mandy Chung) Date: Wed, 9 Sep 2020 16:43:45 GMT Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode In-Reply-To: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> Message-ID: On Wed, 9 Sep 2020 08:18:11 GMT, Gilles Duboscq wrote: > [JDK-8232806](https://bugs.openjdk.java.net/browse/JDK-8232806) introduced the > jdk.internal.lambda.disableEagerInitialization system property to be able to disable eager initialization of lambda > classes. This was necessary to prevent side effects of class initializers triggered by such initialization in the > context of the GraalVM native image tool. 
However, the change as it is implemented means that the behaviour of > non-capturing lambdas depends on the value of `disableEagerInitialization`: when it is false (the default) such lambdas > are actually a singleton while when it is true, a fresh instance is returned every time. Programs should definitely > _not_ rely on reference equality since the Java spec does not guarantee it. However, in order to separate concern and > ease debugging such bad programs, `disableEagerInitialization` shouldn't influence the singleton vs. fresh instance > behaviour of lambdas in either direction. Looks good. I agree with Jan's suggestion that it's good to move the test to test/jdk/java/lang/invoke/lambda which is a better home for it. ------------- Marked as reviewed by mchung (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/93 From mchung at openjdk.java.net Wed Sep 9 16:43:45 2020 From: mchung at openjdk.java.net (Mandy Chung) Date: Wed, 9 Sep 2020 16:43:45 GMT Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode In-Reply-To: <9ARL_A2daS8-nEhhporpJpuRtdJJz8XY1mwyH_i99I8=.c3c3df72-8039-4243-b8c6-bd5040aabe64@github.com> References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> <9ARL_A2daS8-nEhhporpJpuRtdJJz8XY1mwyH_i99I8=.c3c3df72-8039-4243-b8c6-bd5040aabe64@github.com> Message-ID: On Wed, 9 Sep 2020 12:19:04 GMT, Jan Lahoda wrote: >> [JDK-8232806](https://bugs.openjdk.java.net/browse/JDK-8232806) introduced the >> jdk.internal.lambda.disableEagerInitialization system property to be able to disable eager initialization of lambda >> classes. This was necessary to prevent side effects of class initializers triggered by such initialization in the >> context of the GraalVM native image tool. 
However, the change as it is implemented means that the behaviour of >> non-capturing lambdas depends on the value of `disableEagerInitialization`: when it is false (the default) such lambdas >> are actually a singleton while when it is true, a fresh instance is returned every time. Programs should definitely >> _not_ rely on reference equality since the Java spec does not guarantee it. However, in order to separate concern and >> ease debugging such bad programs, `disableEagerInitialization` shouldn't influence the singleton vs. fresh instance >> behaviour of lambdas in either direction. > > test/langtools/tools/javac/lambda/lambdaExpression/LambdaTest6.java line 29: > >> 27: * @summary Add lambda tests >> 28: * Test bridge methods for certain SAM conversions >> 29: * Test the set of generate fields > > I would suggest to consider having the test under test/jdk/(java/lang/invoke/lambda), not under > test/langtools/tools/javac. It's a good suggestion as `disableEagerInitialization` support is not part of javac. ------------- PR: https://git.openjdk.java.net/jdk/pull/93 From vladimir.kozlov at oracle.com Wed Sep 9 17:47:13 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 9 Sep 2020 10:47:13 -0700 Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 In-Reply-To: <658e9a0a-ffc0-3c66-6353-83282f3ea071@oracle.com> References: <56ad28d7-1733-7cd7-2fb4-a1a53af8a311@oracle.com> <658e9a0a-ffc0-3c66-6353-83282f3ea071@oracle.com> Message-ID: Thank you, Vladimir, for comments and review. Regards, Vladimir K On 9/9/20 7:57 AM, Vladimir Ivanov wrote: > >>> Similar strict check for constant shift is needed in OrVNode::Ideal routine in vectornode.cpp. >> >> It took me some time to analyze your code for "lazy de-generation" of Rotate vectors. As I understand you want to >> preserve scalar optimization which creates Rotate nodes but have to revert it to keep vectorization of java code. 
> > Yes, the main motivation was to simplify the implementation: keep vectorization logic simple (just Rotate -> RotateV > transformation) and don't mess with matching if RotateV is not supported (expand ideal nodes instead of adding special > AD instructions). > >> Method degenerate_vector_rotate() has is_Con() check and, in general, it could be TOP because we do loop optimizations >> after vectorization. I added isa_int() check and treat 'cnt' in other case as variable to do transformation on 'else' >> branch and let sub-graph collapse there. >> I also refactor degenerate_vector_rotate() to make it compact. > > Good point. > >> Second, about OrVNode::Ideal(). I am not sure how safe it is without additional investigation because currently it is >> not executed. Based on comment it was added for VectorAPI which is experimental and not pushed yet. >> The code is convoluted and does not match scalar Or::Ideal() code. >> >> OrINode::Ideal() does next checks for left rotation: >> ?? if (Matcher::match_rule_supported(Op_RotateLeft) && >> ?????? lopcode == Op_LShiftI && ropcode == Op_URShiftI && in(1)->in(1) == in(2)->in(1)) { >> >> but OrVNode::Ideal() does: >> ?? if (Matcher::match_rule_supported_vector(Op_RotateLeftV, vec_len, bt) && >> ?????? ((ropcode == Op_LShiftVI && lopcode == Op_URShiftVI) || >> >> Why it checks RIGHT operator for LShiftV???? >> >> And asserts are contradicting: >> ??? assert(Op_RShiftCntV == in(1)->in(2)->Opcode(), "LShiftCntV operand expected"); >> >> Was this code tested? My immediate reaction is simple delete it now and add reworked and tested version back with >> EnableVectorSupport flag check after VectorAPI is integrated. > > Sounds good. > > My feedback on OrVNode::Ideal() during 8248830 review was: > > "> 6) Constant folding scenarios are covered in RotateLeft/RotateRight idealization, inferencing of vector rotate > through OrV idealization covers the vector patterns generated though non SLP route i.e. VectorAPI. 
> > I'm fine with keeping OrV::Ideal(), but I'm concerned with the general direction here - duplication of scalar > transformations to lane-wise vector operations. It definitely won't scale and in a longer run it risks to diverge. Would > be nice to find a way to automatically "lift" scalar transformations to vectors and apply them uniformly. But right now > it is just an idea which requires more experimentation." > >> Reworked version may use the same new rotate_shift() I added. I start rewriting it but since I can't test it and I am >> not sure may be edges are swapped indeed. I am suggesting to remove it. >> >> Also VectorAPI should use Rotate vectors from start which we can de-generation if not supported. So I am not sure how >> OrVNode::Ideal() will be usefull for VEctorAPI too. > > Though the API exposes rotation as a dedicated operation, users are free to code it explicitly with vector shifts and > vector or. Basically, the situation is similar to scalar case: there are dedicated methods available > (Long.rotateLeft/rotateRight), but users are free to code their own variant. And sometimes more general code shapes may > degenerate into rotates (as a result of other optimizations). > > And from Vector API implementation perspective, it is attractive to implement vector rotation purely in Java code as a > composition of vector shifts/or operations rather than using JVM intrinsic for it. > > So, there's a number of use cases when transformations on vector nodes becomes profitable. > >> http://cr.openjdk.java.net/~kvn/8252188/webrev.01/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> >> About testing. I see you used a lot of -128, 128 and similar values which are larger then bits in Java Integer and Long. >> But Java do masking of shift count by default before executing shift. >> I would prefer if something like 31 (or 63 for Long) were used instead. Otherwise Rotate vectors are not generated and >> tested. 
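Note on the shift counts discussed above: a minimal Java sketch, assuming nothing beyond the language rules for shift operators. The class and method names (ShiftCountMasking, effectiveIntCount, effectiveLongCount) are made up for illustration and are not part of TestRotate.java or the webrev. It shows why counts such as 128 or -128 behave like 0 once the count is masked, while 31 and 63 survive masking unchanged.

```java
final class ShiftCountMasking {
    // Java uses only the low 5 bits of the count for an int shift.
    public static int effectiveIntCount(int count) {
        return count & 31;
    }

    // Java uses only the low 6 bits of the count for a long shift.
    public static int effectiveLongCount(int count) {
        return count & 63;
    }

    public static void main(String[] args) {
        int i = 0x12345678;
        long l = 0x123456789ABCDEF0L;

        // 128 and -128 both mask to 0, so these shifts leave the value unchanged.
        System.out.println("int  << 128  is a no-op: " + ((i << 128) == i));
        System.out.println("int  >>> -128 is a no-op: " + ((i >>> -128) == i));
        System.out.println("long << 128  is a no-op: " + ((l << 128) == l));

        // 31 and 63 are already in range, so these shifts really move bits.
        System.out.println("int  << 31 changes value: " + ((i << 31) != i));
        System.out.println("long << 63 changes value: " + ((l << 63) != l));

        // The rotate helpers document the same masking of the distance.
        System.out.println("rotateLeft by 128 is a no-op: "
                + (Integer.rotateLeft(i, 128) == i));
    }
}
```

Under these rules a test that shifts or rotates by 128 exercises the same arithmetic as a shift by 0, which is one way to read the request for counts like 31 and 63.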
>> >> compiler/intrinsics/TestRotate.java calls verify() after each operation as result it is really hard to see generated >> assembler. I think we should at least exclude inlinining of verify(). >> >> I will work on tests and have an other update. >> >> Thanks, >> Vladimir K >> >> >>> >>> Regards, >>> Jatin >>> >>>> -----Original Message----- >>>> From: hotspot-compiler-dev On >>>> Behalf Of Vladimir Kozlov >>>> Sent: Friday, September 4, 2020 3:14 AM >>>> To: hotspot compiler >>>> Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, >>>> bool)+0x8b9 >>>> >>>> https://cr.openjdk.java.net/~kvn/8252188/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8252188 >>>> >>>> Code added by 8248830 [1] uses Node::is_Con() check when looking for >>>> constant shift values. >>>> Unfortunately it does not guarantee that it will be Integer constant >>>> because TOP node is also ConNode. >>>> I used C2 types to check and get shift values. I also refactor code to >>>> consolidate checks. >>>> >>>> Tested: tier1, hs-tier2, hs-tier3. >>>> Verified fix with replay file from bug report. >>>> I also checked that RotateBenchmark.java added by 8248830 still creates >>>> Rotate vectors after this fix. >>>> >>>> I created subtask to add new regerssion test later because this fix is >>>> urgent and I did not have time to prepare it. 
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8248830 From cjashfor at linux.ibm.com Wed Sep 9 19:13:55 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Wed, 9 Sep 2020 12:13:55 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: <4bc83479-1ed9-8cd8-22a0-1f19f315df7e@oracle.com> References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> <65ed7919-86fc-adfa-3cd5-58dd96a3487f@linux.ibm.com> <4bc83479-1ed9-8cd8-22a0-1f19f315df7e@oracle.com> Message-ID: On 9/4/20 8:07 AM, Roger Riggs wrote: > Hi Corey, > > The idea I had in mind is refactoring the fast path into the method you > call decodeBlock. > Base64: lines 751-768. > > It leaves all the unknown/illegal character handling to the Java code. > And yes, it does not need to handle MIME, except to return on illegal > characters. > > The patch is attached. Ah, I see what you mean now, and thanks for the patch! The patch as presented doesn't work, however, because the intrinsic processes fewer bytes than are in the src buffer, and then executes a "continue;", which then proceeds to loop infinitely because the intrinsic won't process any more bytes after that. I tried dropping the continue, but that doesn't work because the Java (non-intrinsic) code processes all of the bytes, and the line of code following the loop accesses one byte after the end of the src buffer causing an array bounds error. So this needs to be re-thought a little, but it shouldn't be too difficult. I will work on it. Regards, - Corey > > Regards, Roger > > > > On 8/31/20 6:22 PM, Corey Ashford wrote: >> On 8/29/20 1:19 PM, Corey Ashford wrote: >>> Hi Roger, >>> >>> Thanks for your reply and thoughts!? Comments interspersed below: >>> >>> On 8/28/20 10:54 AM, Roger Riggs wrote: >> ... 
>>>> Comparing with the way that the Base64 encoder was intrinsified, the >>>> method that is intrinsified should have a method body that does >>>> the same function, so it is interchangable.? That likely will just >>>> shift >>>> the "fast path" code into the decodeBlock method. >>>> Keeping the symmetry between encoder and decoder will >>>> make it easier to maintain the code. >>> >>> Good point.? I'll investigate what this looks like in terms of the >>> actual code, and will report back (perhaps in a new webrev). >>> >> >> Having looked at this again, I don't think it makes sense.? One thing >> that differs significantly from the encodeBlock intrinsic is that the >> decodeBlock intrinsic only needs to process a prefix of the data, and >> so it can leave virtually any amount of data at the end of the src >> buffer unprocessed, where as with the encodeBlock intrinsic, if it >> exists, it must process the entire buffer. >> >> In the (common) case where the decodeBlock intrinsic returns not >> having processed everything, it still needs to call the Java code, and >> if that Java code is "replaced" by the intrinsic, it's inaccessible. >> >> Is there something I'm overlooking here?? Basically I want the decode >> API to behave differently than the encode API, mostly to make the >> arch-specific intrinsic easier to implement. If that's not acceptable, >> then I need to rethink the API, and also figure out how to deal with >> the illegal character case.? The latter could perhaps be done by >> throwing an exception from the intrinsic, or maybe by returning a >> negative length that specifies the index of the illegal src byte, and >> then have the Java code throw the exception). 
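Note on the "negative length" idea floated above: a rough Java sketch of what such a return convention could look like. Everything here is hypothetical: NegativeIndexSketch and decodeGroups are invented names, this is not the proposed intrinsic or the java.util.Base64 code, and the encoding -(index + 1) is an assumption chosen so that an illegal byte at index 0 is still distinguishable from "0 bytes written". Padding ('=') is not handled and is simply reported as illegal.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class NegativeIndexSketch {
    private static final int[] FROM_BASE64 = new int[256];
    static {
        Arrays.fill(FROM_BASE64, -1);
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < alphabet.length(); i++) {
            FROM_BASE64[alphabet.charAt(i)] = i;
        }
    }

    // Hypothetical stand-in for an intrinsic: decodes whole 4-character groups
    // from src[sp..sl) into dst starting at index 0.
    // Returns the number of bytes written (>= 0), or -(index + 1) where index
    // is the position of the first illegal src byte.
    public static int decodeGroups(byte[] src, int sp, int sl, byte[] dst) {
        int dp = 0;
        while (sl - sp >= 4) {
            int bits = 0;
            for (int k = 0; k < 4; k++) {
                int v = FROM_BASE64[src[sp + k] & 0xff];
                if (v < 0) {
                    return -(sp + k + 1);   // report where decoding failed
                }
                bits = (bits << 6) | v;
            }
            dst[dp++] = (byte) (bits >> 16);
            dst[dp++] = (byte) (bits >> 8);
            dst[dp++] = (byte) bits;
            sp += 4;
        }
        return dp;
    }

    // Java-side wrapper: turns the negative return value into an exception.
    public static byte[] decode(byte[] src) {
        byte[] dst = new byte[src.length / 4 * 3];
        int n = decodeGroups(src, 0, src.length, dst);
        if (n < 0) {
            throw new IllegalArgumentException(
                "Illegal base64 character at index " + (-n - 1));
        }
        return Arrays.copyOf(dst, n);
    }

    public static void main(String[] args) {
        byte[] ok = "TWFu".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decode(ok), StandardCharsets.US_ASCII));
        byte[] bad = "TW*u".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decodeGroups(bad, 0, bad.length, new byte[3]));
    }
}
```

The point of the sketch is only that the exception can stay in Java code while the fast path reports a position; whether that is preferable to the prefix-only approach is the open question in the message above.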
>> >> Regards, >> >> - Corey >> >
From lutz.schmidt at sap.com Wed Sep 9 20:10:07 2020 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 9 Sep 2020 20:10:07 +0000 Subject: RFR(M): 8219586: CodeHeap State Analytics processes dead nmethods In-Reply-To: <5955b523-9e75-7b6f-cbc0-21c1361f8754@oracle.com> References: <6DA47071-83F8-4E02-A6A9-E7FD8B9B5813@sap.com> <5955b523-9e75-7b6f-cbc0-21c1361f8754@oracle.com> Message-ID:
Hi Erik, thanks a lot for your review and a big thank you for your help and patience in moving the matter forward to a solution. Regards, Lutz
On 08.09.20, 16:31, "Erik Österlund" wrote: Hi Lutz, This looks great, thanks for fixing this! Thanks, /Erik
On 2020-08-26 17:20, Schmidt, Lutz wrote: > Dear all, > > may I please request reviews for this fix/improvement to CodeHeap State Analytics. Explained in a nutshell it removes the last holes through which the analysis code could potentially access memory which is no longer associated with the entity being inspected. > > There has been a long-lasting, off-list discussion with Erik Österlund until all pitfalls were identified and agreeable solutions were found. The important parts of that discussion are reflected in the bug comments. There are two major changes: > > 1) All accesses to the CodeHeap are now protected by continuously holding the CodeCache_lock and, in addition, the Compile_lock. Information is aggregated in local data structures for later printing without holding the above locks. > > 2) Printing the names of all code blobs has been disabled except for one operation mode where the locks can be held while printing. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8219586 > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8219586.02/ > > This change has JDK-8250635 (currently out for review) as a prerequisite. It will not compile without. > > Thank you!
> Lutz >
From lutz.schmidt at sap.com Wed Sep 9 20:14:40 2020 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 9 Sep 2020 20:14:40 +0000 Subject: RFR(M): 8219586: CodeHeap State Analytics processes dead nmethods In-Reply-To: <0e5868b3-6eb8-8036-a908-b3b5502cc6e2@oracle.com> References: <6DA47071-83F8-4E02-A6A9-E7FD8B9B5813@sap.com> <6316DC4C-A2FB-4B0C-93D9-61D2243EF369@sap.com> <0e5868b3-6eb8-8036-a908-b3b5502cc6e2@oracle.com> Message-ID: <70CE6C62-4AD3-4B7C-AAC3-BAB4A549E3D5@sap.com>
Hi Tobias, thanks for the review! I'll post a pull request as soon as I can handle Git and Skara commands. Best Regards, Lutz
On 08.09.20, 14:06, "Tobias Hartmann" wrote: Hi Lutz, On 07.09.20 14:40, Schmidt, Lutz wrote: > Hi Tobias, > > I used the brackets for "optical separation", to emphasize what was added. There is no "hidden purpose". > > The comment attempts to explain why these fields are first initialized with NULL/false, although they are filled with "more meaningful" data shortly after.
Okay, thanks for the explanation.
> I will remove the brackets and try to express my thoughts more clearly by rephrasing the comment. > > May I then regard this change as reviewed?
Yes, looks good to me. Best regards, Tobias
From Roger.Riggs at oracle.com Wed Sep 9 21:04:21 2020 From: Roger.Riggs at oracle.com (Roger Riggs) Date: Wed, 9 Sep 2020 17:04:21 -0400 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> <65ed7919-86fc-adfa-3cd5-58dd96a3487f@linux.ibm.com> <4bc83479-1ed9-8cd8-22a0-1f19f315df7e@oracle.com> Message-ID: <196a4e58-0710-2f3e-6d1b-e78ab03a185d@oracle.com>
Hi Corey, Right, the continue was so it would go back and check if the conversion was complete. An alternative would be to repeat the check and return if there were no bytes left to process.
Thanks, Roger On 9/9/20 3:13 PM, Corey Ashford wrote: > On 9/4/20 8:07 AM, Roger Riggs wrote: >> Hi Corey, >> >> The idea I had in mind is refactoring the fast path into the method >> you call decodeBlock. >> Base64: lines 751-768. >> >> It leaves all the unknown/illegal character handling to the Java code. >> And yes, it does not need to handle MIME, except to return on illegal >> characters. >> >> The patch is attached. > > Ah, I see what you mean now, and thanks for the patch!? The patch as > presented doesn't work, however, because the intrinsic processes fewer > bytes than are in the src buffer, and then executes a "continue;", > which then proceeds to loop infinitely because the intrinsic won't > process any more bytes after that. > > I tried dropping the continue, but that doesn't work because the Java > (non-intrinsic) code processes all of the bytes, and the line of code > following the loop accesses one byte after the end of the src buffer > causing an array bounds error. > > So this needs to be re-thought a little, but it shouldn't be too > difficult.? I will work on it. > > Regards, > > - Corey > >> >> Regards, Roger >> >> >> >> On 8/31/20 6:22 PM, Corey Ashford wrote: >>> On 8/29/20 1:19 PM, Corey Ashford wrote: >>>> Hi Roger, >>>> >>>> Thanks for your reply and thoughts!? Comments interspersed below: >>>> >>>> On 8/28/20 10:54 AM, Roger Riggs wrote: >>> ... >>>>> Comparing with the way that the Base64 encoder was intrinsified, the >>>>> method that is intrinsified should have a method body that does >>>>> the same function, so it is interchangable.? That likely will just >>>>> shift >>>>> the "fast path" code into the decodeBlock method. >>>>> Keeping the symmetry between encoder and decoder will >>>>> make it easier to maintain the code. >>>> >>>> Good point.? I'll investigate what this looks like in terms of the >>>> actual code, and will report back (perhaps in a new webrev). >>>> >>> >>> Having looked at this again, I don't think it makes sense. 
One thing >>> that differs significantly from the encodeBlock intrinsic is that >>> the decodeBlock intrinsic only needs to process a prefix of the >>> data, and so it can leave virtually any amount of data at the end of >>> the src buffer unprocessed, where as with the encodeBlock intrinsic, >>> if it exists, it must process the entire buffer. >>> >>> In the (common) case where the decodeBlock intrinsic returns not >>> having processed everything, it still needs to call the Java code, >>> and if that Java code is "replaced" by the intrinsic, it's >>> inaccessible. >>> >>> Is there something I'm overlooking here?? Basically I want the >>> decode API to behave differently than the encode API, mostly to make >>> the arch-specific intrinsic easier to implement. If that's not >>> acceptable, then I need to rethink the API, and also figure out how >>> to deal with the illegal character case. The latter could perhaps be >>> done by throwing an exception from the intrinsic, or maybe by >>> returning a negative length that specifies the index of the illegal >>> src byte, and then have the Java code throw the exception). >>> >>> Regards, >>> >>> - Corey >>> >> > From cjashfor at linux.ibm.com Wed Sep 9 23:32:59 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Wed, 9 Sep 2020 16:32:59 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: <196a4e58-0710-2f3e-6d1b-e78ab03a185d@oracle.com> References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> <65ed7919-86fc-adfa-3cd5-58dd96a3487f@linux.ibm.com> <4bc83479-1ed9-8cd8-22a0-1f19f315df7e@oracle.com> <196a4e58-0710-2f3e-6d1b-e78ab03a185d@oracle.com> Message-ID: <91b1717e-f9f4-0b5c-d410-e25507206812@linux.ibm.com> On 9/9/20 2:04 PM, Roger Riggs wrote: > Hi Corey, > > Right,? the continue was so it would go back and check if the conversion > was > complete.? 
An alternative would be to repeat the check and return if > there was > no bytes left to process.

Another issue I just discovered is that, the way the loop is structured, decodeBlock could be called multiple times when isMIME is true, and in that case decodeBlock will try to write into dst[] starting at offset 0 again. My original intention was for the intrinsic to be called a single time, because it never attempted to process bytes in the isMIME==true case, and because of that the offset into the destination buffer would always be zero. With this loop, on the second and later calls, the offset into dst[] should be non-zero. This means that I also need to pass dp into decodeBlock. That necessitates a change in the parameter passing down to the intrinsic. Not a big deal, but it is a ripple. I'll get working on it.

The upside of this change is that it makes the decode and encode intrinsics closely mirror each other, and handles the isMIME==true case as a happy side effect. With the overhead of the call to the intrinsic, it's not clear there will be a performance gain when isMIME==true, but a benchmark should make that clear. I'm guessing maybe 1.5X to 2X is about the best that could be expected when linemax is the default 76.

- Corey

> > Thanks, Roger > > On 9/9/20 3:13 PM, Corey Ashford wrote: >> On 9/4/20 8:07 AM, Roger Riggs wrote: >>> Hi Corey, >>> >>> The idea I had in mind is refactoring the fast path into the method >>> you call decodeBlock. >>> Base64: lines 751-768. >>> >>> It leaves all the unknown/illegal character handling to the Java code. >>> And yes, it does not need to handle MIME, except to return on illegal >>> characters. >>> >>> The patch is attached. >> >> Ah, I see what you mean now, and thanks for the patch! 
The patch as >> presented doesn't work, however, because the intrinsic processes fewer >> bytes than are in the src buffer, and then executes a "continue;", >> which then proceeds to loop infinitely because the intrinsic won't >> process any more bytes after that. >> >> I tried dropping the continue, but that doesn't work because the Java >> (non-intrinsic) code processes all of the bytes, and the line of code >> following the loop accesses one byte after the end of the src buffer >> causing an array bounds error. >> >> So this needs to be re-thought a little, but it shouldn't be too >> difficult.? I will work on it. >> >> Regards, >> >> - Corey >> >>> >>> Regards, Roger >>> >>> >>> >>> On 8/31/20 6:22 PM, Corey Ashford wrote: >>>> On 8/29/20 1:19 PM, Corey Ashford wrote: >>>>> Hi Roger, >>>>> >>>>> Thanks for your reply and thoughts!? Comments interspersed below: >>>>> >>>>> On 8/28/20 10:54 AM, Roger Riggs wrote: >>>> ... >>>>>> Comparing with the way that the Base64 encoder was intrinsified, the >>>>>> method that is intrinsified should have a method body that does >>>>>> the same function, so it is interchangable.? That likely will just >>>>>> shift >>>>>> the "fast path" code into the decodeBlock method. >>>>>> Keeping the symmetry between encoder and decoder will >>>>>> make it easier to maintain the code. >>>>> >>>>> Good point.? I'll investigate what this looks like in terms of the >>>>> actual code, and will report back (perhaps in a new webrev). >>>>> >>>> >>>> Having looked at this again, I don't think it makes sense. One thing >>>> that differs significantly from the encodeBlock intrinsic is that >>>> the decodeBlock intrinsic only needs to process a prefix of the >>>> data, and so it can leave virtually any amount of data at the end of >>>> the src buffer unprocessed, where as with the encodeBlock intrinsic, >>>> if it exists, it must process the entire buffer. 
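Note on the dst[] offset issue raised earlier in this message: a small Java sketch of a prefix-only fast path that is called more than once and therefore has to be told where in dst to continue. This is an illustration under stated assumptions, not the webrev code: DecodeOffsetSketch and decodeMimeLike are invented names, the five-argument decodeBlock signature is hypothetical, only CR/LF are treated as skippable separators, separators are assumed to fall on 4-character group boundaries, and padding is not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class DecodeOffsetSketch {
    private static final int[] FROM_BASE64 = new int[256];
    static {
        Arrays.fill(FROM_BASE64, -1);
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < alphabet.length(); i++) {
            FROM_BASE64[alphabet.charAt(i)] = i;
        }
    }

    // Hypothetical fast path: decodes whole 4-character groups from
    // src[sp..sl) into dst starting at dp, and stops before the first group
    // that contains a non-alphabet byte. Returns the number of bytes written.
    public static int decodeBlock(byte[] src, int sp, int sl, byte[] dst, int dp) {
        int start = dp;
        while (sl - sp >= 4) {
            int b0 = FROM_BASE64[src[sp] & 0xff];
            int b1 = FROM_BASE64[src[sp + 1] & 0xff];
            int b2 = FROM_BASE64[src[sp + 2] & 0xff];
            int b3 = FROM_BASE64[src[sp + 3] & 0xff];
            if ((b0 | b1 | b2 | b3) < 0) {
                break;                      // leave this group to the caller
            }
            int bits = b0 << 18 | b1 << 12 | b2 << 6 | b3;
            dst[dp++] = (byte) (bits >> 16);
            dst[dp++] = (byte) (bits >> 8);
            dst[dp++] = (byte) bits;
            sp += 4;
        }
        return dp - start;
    }

    // Caller loop: the fast path may run several times, once per line.
    // Without passing dp, every call after the first would overwrite dst[0..].
    public static byte[] decodeMimeLike(byte[] src) {
        byte[] dst = new byte[src.length / 4 * 3];
        int sp = 0;
        int dp = 0;
        while (sp < src.length) {
            int written = decodeBlock(src, sp, src.length, dst, dp);
            sp += written / 3 * 4;          // 4 src bytes per 3 dst bytes
            dp += written;
            int before = sp;
            while (sp < src.length && (src[sp] == '\r' || src[sp] == '\n')) {
                sp++;                       // skip line separators
            }
            if (written == 0 && sp == before) {
                throw new IllegalArgumentException(
                    "Cannot decode input at index " + sp);
            }
        }
        return Arrays.copyOf(dst, dp);
    }

    public static void main(String[] args) {
        byte[] src = "TWFu\r\nTWFu".getBytes(StandardCharsets.US_ASCII);
        System.out.println(
            new String(decodeMimeLike(src), StandardCharsets.US_ASCII));
    }
}
```

In this sketch the second call to decodeBlock receives dp == 3, so the second line's bytes land after the first line's output instead of on top of it.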
>>>> >>>> In the (common) case where the decodeBlock intrinsic returns not >>>> having processed everything, it still needs to call the Java code, >>>> and if that Java code is "replaced" by the intrinsic, it's >>>> inaccessible. >>>> >>>> Is there something I'm overlooking here?? Basically I want the >>>> decode API to behave differently than the encode API, mostly to make >>>> the arch-specific intrinsic easier to implement. If that's not >>>> acceptable, then I need to rethink the API, and also figure out how >>>> to deal with the illegal character case. The latter could perhaps be >>>> done by throwing an exception from the intrinsic, or maybe by >>>> returning a negative length that specifies the index of the illegal >>>> src byte, and then have the Java code throw the exception). >>>> >>>> Regards, >>>> >>>> - Corey >>>> >>> >> > From shade at openjdk.java.net Thu Sep 10 05:18:04 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 10 Sep 2020 05:18:04 GMT Subject: RFR: 8252778: remove jdk.test.lib.FileInstaller action from compiler/c2/stemmer test In-Reply-To: References: Message-ID: On Sun, 6 Sep 2020 16:30:29 GMT, Igor Ignatyev wrote: > pre-Skara RFR [thread](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039878.html) > > Hi all, > > could you please review this small and trivial cleanup? > > from [JBS](https://bugs.openjdk.java.net/browse/JDK-8252778): >> `compiler/c2/stemmer` test uses `jdk.test.lib.FileInstaller` to copy "words" file from the test source directory to the >> current working directory, `compiler.c2.stemmer.Stemmer` can read this file. yet, `c.c.s.Stemmer` class treats its 1st >> argument as a path to the file, given this isn't needed and we can pass "${test.src}/words" instead of "words" > > testing: compiler/c2/stemmer on {linux,windows,macos}-x64 Marked as reviewed by shade (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/33
From shade at openjdk.java.net Thu Sep 10 05:21:45 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 10 Sep 2020 05:21:45 GMT Subject: RFR: 8252774: remove jdk.test.lib.FileInstaller action from graalunit tests In-Reply-To: <_i2vOeOxRNWdga2jASmgHTwNYz0H3fLX-xdomiBOyG0=.3230004b-7bc1-4d36-b81b-152671890b40@github.com> References: <_i2vOeOxRNWdga2jASmgHTwNYz0H3fLX-xdomiBOyG0=.3230004b-7bc1-4d36-b81b-152671890b40@github.com> Message-ID: <_e2WO-GFurAiAe54CIZ-VP_hY4cYq5kPUhkHuZuDBJY=.83a03e4e-21aa-4799-8135-1693e8db5831@github.com>
On Sun, 6 Sep 2020 16:37:47 GMT, Igor Ignatyev wrote: > [pre-Skara RFR](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039874.html) > Hi all, > > could you please review this small and trivial cleanup in `test/hotspot/jtreg/compiler/graalunit`? > from [JBS](https://bugs.openjdk.java.net/browse/JDK-8252774): >> `test/hotspot/jtreg/compiler/graalunit` tests use `jdk.test.lib.FileInstaller` to copy `ProblemList-graal.txt` from >> `test/hotspot/jtreg/` to the current working directory as `ExcludeList.txt`, and then run >> `compiler.graalunit.common.GraalUnitTestLauncher` w/ `-exclude ExcludeList.txt`. > > `j.t.l.FileInstaller` actions aren't needed as `c.g.c.GraalUnitTestLauncher` interprets `-exclude`'s value as a path to > a file (as opposed to a file name in the current directory), so we can use `${test.root}/ProblemList-graal.txt` instead of > `ExcludeList.txt` there. > the patch modifies `generateTests.sh` to use `${test.root}/ProblemList-graal.txt`, cleans it up (removes trailing > spaces, empty `@summary` tag, and redundant explicit `@build`) and regenerates graalunit tests. > testing: `test/hotspot/jtreg/compiler/graalunit` on {linux,windows,macos}-x64
Marked as reviewed by shade (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/34 From jatin.bhateja at intel.com Thu Sep 10 05:37:09 2020 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Thu, 10 Sep 2020 05:37:09 +0000 Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 In-Reply-To: References: <56ad28d7-1733-7cd7-2fb4-a1a53af8a311@oracle.com> <658e9a0a-ffc0-3c66-6353-83282f3ea071@oracle.com> Message-ID: Hi VladimirK, Removing OrVNode::Ideal looks correct, it was added to cover vector API use case. Regards, Jatin > -----Original Message----- > From: hotspot-compiler-dev On > Behalf Of Vladimir Kozlov > Sent: Wednesday, September 9, 2020 11:17 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, > bool)+0x8b9 > > Thank you, Vladimir, for comments and review. > > Regards, > Vladimir K > > On 9/9/20 7:57 AM, Vladimir Ivanov wrote: > > > >>> Similar strict check for constant shift is needed in OrVNode::Ideal > routine in vectornode.cpp. > >> > >> It took me some time to analyze your code for "lazy de-generation" of > >> Rotate vectors. As I understand you want to preserve scalar optimization > which creates Rotate nodes but have to revert it to keep vectorization of > java code. > > > > Yes, the main motivation was to simplify the implementation: keep > > vectorization logic simple (just Rotate -> RotateV > > transformation) and don't mess with matching if RotateV is not > > supported (expand ideal nodes instead of adding special AD instructions). > > > >> Method degenerate_vector_rotate() has is_Con() check and, in general, > >> it could be TOP because we do loop optimizations after vectorization. I > added isa_int() check and treat 'cnt' in other case as variable to do > transformation on 'else' > >> branch and let sub-graph collapse there. > >> I also refactor degenerate_vector_rotate() to make it compact. > > > > Good point. > > > >> Second, about OrVNode::Ideal(). 
I am not sure how safe it is without > >> additional investigation because currently it is not executed. Based on > comment it was added for VectorAPI which is experimental and not pushed > yet. > >> The code is convoluted and does not match scalar Or::Ideal() code. > >> > >> OrINode::Ideal() does next checks for left rotation: > >> ?? if (Matcher::match_rule_supported(Op_RotateLeft) && > >> ?????? lopcode == Op_LShiftI && ropcode == Op_URShiftI && > >> in(1)->in(1) == in(2)->in(1)) { > >> > >> but OrVNode::Ideal() does: > >> ?? if (Matcher::match_rule_supported_vector(Op_RotateLeftV, vec_len, > >> bt) && > >> ?????? ((ropcode == Op_LShiftVI && lopcode == Op_URShiftVI) || > >> > >> Why it checks RIGHT operator for LShiftV???? > >> > >> And asserts are contradicting: > >> ??? assert(Op_RShiftCntV == in(1)->in(2)->Opcode(), "LShiftCntV > >> operand expected"); > >> > >> Was this code tested? My immediate reaction is simple delete it now > >> and add reworked and tested version back with EnableVectorSupport flag > check after VectorAPI is integrated. > > > > Sounds good. > > > > My feedback on OrVNode::Ideal() during 8248830 review was: > > > > "> 6) Constant folding scenarios are covered in RotateLeft/RotateRight > > idealization, inferencing of vector rotate through OrV idealization > covers the vector patterns generated though non SLP route i.e. VectorAPI. > > > > I'm fine with keeping OrV::Ideal(), but I'm concerned with the general > > direction here - duplication of scalar transformations to lane-wise > > vector operations. It definitely won't scale and in a longer run it > > risks to diverge. Would be nice to find a way to automatically "lift" > scalar transformations to vectors and apply them uniformly. But right now > it is just an idea which requires more experimentation." > > > >> Reworked version may use the same new rotate_shift() I added. I start > >> rewriting it but since I can't test it and I am not sure may be edges > are swapped indeed. 
I am suggesting to remove it. > >> > >> Also VectorAPI should use Rotate vectors from start which we can > >> de-generation if not supported. So I am not sure how > >> OrVNode::Ideal() will be usefull for VEctorAPI too. > > > > Though the API exposes rotation as a dedicated operation, users are > > free to code it explicitly with vector shifts and vector or. > > Basically, the situation is similar to scalar case: there are > > dedicated methods available (Long.rotateLeft/rotateRight), but users are > free to code their own variant. And sometimes more general code shapes may > degenerate into rotates (as a result of other optimizations). > > > > And from Vector API implementation perspective, it is attractive to > > implement vector rotation purely in Java code as a composition of vector > shifts/or operations rather than using JVM intrinsic for it. > > > > So, there's a number of use cases when transformations on vector nodes > becomes profitable. > > > >> http://cr.openjdk.java.net/~kvn/8252188/webrev.01/ > > > > Looks good. > > > > Best regards, > > Vladimir Ivanov > > > >> > >> About testing. I see you used a lot of -128, 128 and similar values > which are larger then bits in Java Integer and Long. > >> But Java do masking of shift count by default before executing shift. > >> I would prefer if something like 31 (or 63 for Long) were used > >> instead. Otherwise Rotate vectors are not generated and tested. > >> > >> compiler/intrinsics/TestRotate.java calls verify() after each > >> operation as result it is really hard to see generated assembler. I > think we should at least exclude inlinining of verify(). > >> > >> I will work on tests and have an other update. 
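Note on the hand-coded rotation mentioned earlier in this quoted thread (shifts combined with or): a minimal scalar Java sketch, with invented names (RotateIdiom, rotlByHand). It only demonstrates that the "left shift OR unsigned right shift of the same input" shape computes the same result as Integer.rotateLeft and Long.rotateLeft; whether a given JIT build turns this shape into a Rotate node is not something the sketch shows or assumes.

```java
final class RotateIdiom {
    // Hand-written left rotation for int: an OR of two opposite shifts
    // applied to the same input value.
    public static int rotlByHand(int x, int s) {
        return (x << s) | (x >>> (32 - s));
    }

    // Same shape for long, using 64 as the word size.
    public static long rotlByHand(long x, int s) {
        return (x << s) | (x >>> (64 - s));
    }

    public static void main(String[] args) {
        int x = 0x80000001;
        boolean allMatch = true;
        // s == 0 also works: the count 32 is masked to 0, giving x | x.
        for (int s = 0; s < 32; s++) {
            allMatch &= rotlByHand(x, s) == Integer.rotateLeft(x, s);
        }
        System.out.println("int rotations match the library method: " + allMatch);

        long y = 0x8000000000000001L;
        boolean allMatchLong = true;
        for (int s = 0; s < 64; s++) {
            allMatchLong &= rotlByHand(y, s) == Long.rotateLeft(y, s);
        }
        System.out.println("long rotations match the library method: " + allMatchLong);
    }
}
```

This is the scalar analogue of the situation described for vectors: a dedicated method exists, but equivalent code written with shifts and or is legal and may be what the compiler actually sees.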
> >> > >> Thanks, > >> Vladimir K > >> > >> > >>> > >>> Regards, > >>> Jatin > >>> > >>>> -----Original Message----- > >>>> From: hotspot-compiler-dev > >>>> On Behalf Of Vladimir > >>>> Kozlov > >>>> Sent: Friday, September 4, 2020 3:14 AM > >>>> To: hotspot compiler > >>>> Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, > >>>> bool)+0x8b9 > >>>> > >>>> https://cr.openjdk.java.net/~kvn/8252188/webrev.00/ > >>>> https://bugs.openjdk.java.net/browse/JDK-8252188 > >>>> > >>>> Code added by 8248830 [1] uses Node::is_Con() check when looking > >>>> for constant shift values. > >>>> Unfortunately it does not guarantee that it will be Integer > >>>> constant because TOP node is also ConNode. > >>>> I used C2 types to check and get shift values. I also refactor code > >>>> to consolidate checks. > >>>> > >>>> Tested: tier1, hs-tier2, hs-tier3. > >>>> Verified fix with replay file from bug report. > >>>> I also checked that RotateBenchmark.java added by 8248830 still > >>>> creates Rotate vectors after this fix. > >>>> > >>>> I created subtask to add new regerssion test later because this fix > >>>> is urgent and I did not have time to prepare it. > >>>> > >>>> Thanks, > >>>> Vladimir > >>>> > >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8248830 From tobias.hartmann at oracle.com Thu Sep 10 05:45:32 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 10 Sep 2020 07:45:32 +0200 Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 In-Reply-To: References: <56ad28d7-1733-7cd7-2fb4-a1a53af8a311@oracle.com> <658e9a0a-ffc0-3c66-6353-83282f3ea071@oracle.com> Message-ID: +1 Best regards, Tobias On 10.09.20 07:37, Bhateja, Jatin wrote: > Hi VladimirK, > Removing OrVNode::Ideal looks correct, it was added to cover vector API use case. 
> > Regards, > Jatin > >> -----Original Message----- >> From: hotspot-compiler-dev On >> Behalf Of Vladimir Kozlov >> Sent: Wednesday, September 9, 2020 11:17 PM >> To: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, >> bool)+0x8b9 >> >> Thank you, Vladimir, for comments and review. >> >> Regards, >> Vladimir K >> >> On 9/9/20 7:57 AM, Vladimir Ivanov wrote: >>> >>>>> Similar strict check for constant shift is needed in OrVNode::Ideal >> routine in vectornode.cpp. >>>> >>>> It took me some time to analyze your code for "lazy de-generation" of >>>> Rotate vectors. As I understand you want to preserve scalar optimization >> which creates Rotate nodes but have to revert it to keep vectorization of >> java code. >>> >>> Yes, the main motivation was to simplify the implementation: keep >>> vectorization logic simple (just Rotate -> RotateV >>> transformation) and don't mess with matching if RotateV is not >>> supported (expand ideal nodes instead of adding special AD instructions). >>> >>>> Method degenerate_vector_rotate() has is_Con() check and, in general, >>>> it could be TOP because we do loop optimizations after vectorization. I >> added isa_int() check and treat 'cnt' in other case as variable to do >> transformation on 'else' >>>> branch and let sub-graph collapse there. >>>> I also refactor degenerate_vector_rotate() to make it compact. >>> >>> Good point. >>> >>>> Second, about OrVNode::Ideal(). I am not sure how safe it is without >>>> additional investigation because currently it is not executed. Based on >> comment it was added for VectorAPI which is experimental and not pushed >> yet. >>>> The code is convoluted and does not match scalar Or::Ideal() code. >>>> >>>> OrINode::Ideal() does next checks for left rotation: >>>> ?? if (Matcher::match_rule_supported(Op_RotateLeft) && >>>> ?????? 
lopcode == Op_LShiftI && ropcode == Op_URShiftI &&
>>>> in(1)->in(1) == in(2)->in(1)) {
>>>>
>>>> but OrVNode::Ideal() does:
>>>>   if (Matcher::match_rule_supported_vector(Op_RotateLeftV, vec_len, bt) &&
>>>>       ((ropcode == Op_LShiftVI && lopcode == Op_URShiftVI) ||
>>>>
>>>> Why it checks RIGHT operator for LShiftV????
>>>>
>>>> And asserts are contradicting:
>>>>    assert(Op_RShiftCntV == in(1)->in(2)->Opcode(), "LShiftCntV operand expected");
>>>>
>>>> Was this code tested? My immediate reaction is simple delete it now
>>>> and add reworked and tested version back with EnableVectorSupport flag
>>>> check after VectorAPI is integrated.
>>>
>>> Sounds good.
>>>
>>> My feedback on OrVNode::Ideal() during 8248830 review was:
>>>
>>> "> 6) Constant folding scenarios are covered in RotateLeft/RotateRight
>>> idealization, inferencing of vector rotate through OrV idealization
>>> covers the vector patterns generated though non SLP route i.e. VectorAPI.
>>>
>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the general
>>> direction here - duplication of scalar transformations to lane-wise
>>> vector operations. It definitely won't scale and in a longer run it
>>> risks to diverge. Would be nice to find a way to automatically "lift"
>>> scalar transformations to vectors and apply them uniformly. But right now
>>> it is just an idea which requires more experimentation."
>>>
>>>> Reworked version may use the same new rotate_shift() I added. I start
>>>> rewriting it but since I can't test it and I am not sure may be edges
>>>> are swapped indeed. I am suggesting to remove it.
>>>>
>>>> Also VectorAPI should use Rotate vectors from start which we can
>>>> de-generation if not supported. So I am not sure how
>>>> OrVNode::Ideal() will be usefull for VEctorAPI too.
>>>
>>> Though the API exposes rotation as a dedicated operation, users are
>>> free to code it explicitly with vector shifts and vector or.
>>> Basically, the situation is similar to scalar case: there are >>> dedicated methods available (Long.rotateLeft/rotateRight), but users are >> free to code their own variant. And sometimes more general code shapes may >> degenerate into rotates (as a result of other optimizations). >>> >>> And from Vector API implementation perspective, it is attractive to >>> implement vector rotation purely in Java code as a composition of vector >> shifts/or operations rather than using JVM intrinsic for it. >>> >>> So, there's a number of use cases when transformations on vector nodes >> becomes profitable. >>> >>>> http://cr.openjdk.java.net/~kvn/8252188/webrev.01/ >>> >>> Looks good. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> >>>> About testing. I see you used a lot of -128, 128 and similar values >> which are larger then bits in Java Integer and Long. >>>> But Java do masking of shift count by default before executing shift. >>>> I would prefer if something like 31 (or 63 for Long) were used >>>> instead. Otherwise Rotate vectors are not generated and tested. >>>> >>>> compiler/intrinsics/TestRotate.java calls verify() after each >>>> operation as result it is really hard to see generated assembler. I >> think we should at least exclude inlinining of verify(). >>>> >>>> I will work on tests and have an other update. >>>> >>>> Thanks, >>>> Vladimir K >>>> >>>> >>>>> >>>>> Regards, >>>>> Jatin >>>>> >>>>>> -----Original Message----- >>>>>> From: hotspot-compiler-dev >>>>>> On Behalf Of Vladimir >>>>>> Kozlov >>>>>> Sent: Friday, September 4, 2020 3:14 AM >>>>>> To: hotspot compiler >>>>>> Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, >>>>>> bool)+0x8b9 >>>>>> >>>>>> https://cr.openjdk.java.net/~kvn/8252188/webrev.00/ >>>>>> https://bugs.openjdk.java.net/browse/JDK-8252188 >>>>>> >>>>>> Code added by 8248830 [1] uses Node::is_Con() check when looking >>>>>> for constant shift values. 
>>>>>> Unfortunately it does not guarantee that it will be Integer
>>>>>> constant because TOP node is also ConNode.
>>>>>> I used C2 types to check and get shift values. I also refactor code
>>>>>> to consolidate checks.
>>>>>>
>>>>>> Tested: tier1, hs-tier2, hs-tier3.
>>>>>> Verified fix with replay file from bug report.
>>>>>> I also checked that RotateBenchmark.java added by 8248830 still
>>>>>> creates Rotate vectors after this fix.
>>>>>>
>>>>>> I created subtask to add new regerssion test later because this fix
>>>>>> is urgent and I did not have time to prepare it.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8248830

From goetz at openjdk.java.net Thu Sep 10 07:55:50 2020
From: goetz at openjdk.java.net (Goetz Lindenmaier)
Date: Thu, 10 Sep 2020 07:55:50 GMT
Subject: RFR: 8252846: Fix ppc/s390 after "8231441: AArch64: Initial SVE backend su…
Message-ID:

…pport"

Hi,

this change fixes the issues after 8231441. The VM crashes right in the build after this change.
The fix is trivial.

Please review.
-------------

Commit messages:
 - 8252846: Fix ppc/s390 after "8231441: AArch64: Initial SVE backend support"

Changes: https://git.openjdk.java.net/jdk/pull/105/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=105&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8252846
  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/105.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/105/head:pull/105

PR: https://git.openjdk.java.net/jdk/pull/105

From shade at openjdk.java.net Thu Sep 10 08:01:47 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 10 Sep 2020 08:01:47 GMT
Subject: RFR: 8252846: Fix ppc/s390 after "8231441: AArch64: Initial SVE backend su…
In-Reply-To:
References:
Message-ID:

On Thu, 10 Sep 2020 07:48:54 GMT, Goetz Lindenmaier wrote:

> …pport"
>
> Hi,
>
> this change fixes the issues after 8231441. The VM crashes right in the build after this change.
> The fix is trivial.
>
> Please review.

Marked as reviewed by shade (Reviewer).

src/hotspot/share/opto/type.cpp line 65:

> 63:
> 64: #if defined(PPC64)
> 65: { Bad, T_ILLEGAL, "vectora:", false, Op_VecA, relocInfo::none }, // VectorA.

I think trailing "." in the comment is unnecessary. Please consider dropping before push.

src/hotspot/share/opto/type.cpp line 72:

> 70: { Bad, T_ILLEGAL, "vectorz:", false, 0, relocInfo::none }, // VectorZ
> 71: #elif defined(S390)
> 72: { Bad, T_ILLEGAL, "vectora:", false, Op_VecA, relocInfo::none }, // VectorA.

I think trailing "." in the comment is unnecessary. Please consider dropping before push.
-------------

PR: https://git.openjdk.java.net/jdk/pull/105

From shade at openjdk.java.net Thu Sep 10 08:17:53 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 10 Sep 2020 08:17:53 GMT
Subject: RFR: 8252846: Fix ppc/s390 after "8231441: AArch64: Initial SVE backend su…
In-Reply-To:
References:
Message-ID:

On Thu, 10 Sep 2020 07:48:54 GMT, Goetz Lindenmaier wrote:

> …pport"
>
> Hi,
>
> this change fixes the issues after 8231441. The VM crashes right in the build after this change.
> The fix is trivial.
>
> Please review.

Marked as reviewed by shade (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/105

From goetz at openjdk.java.net Thu Sep 10 08:17:55 2020
From: goetz at openjdk.java.net (Goetz Lindenmaier)
Date: Thu, 10 Sep 2020 08:17:55 GMT
Subject: RFR: 8252846: Fix ppc/s390 after "8231441: AArch64: Initial SVE backend su…
In-Reply-To:
References:
Message-ID:

On Thu, 10 Sep 2020 07:58:58 GMT, Aleksey Shipilev wrote:

>> …pport"
>>
>> Hi,
>>
>> this change fixes the issues after 8231441. The VM crashes right in the build after this change.
>> The fix is trivial.
>>
>> Please review.
>
> src/hotspot/share/opto/type.cpp line 65:
>
>> 63:
>> 64: #if defined(PPC64)
>> 65: { Bad, T_ILLEGAL, "vectora:", false, Op_VecA, relocInfo::none }, // VectorA.
>
> I think trailing "." in the comment is unnecessary. Please consider dropping before push.

Hi Aleksey, I would not have put a dot there, but I just copied the line from the #else part which also has the dot.
So I'd rather leave it as-is.
Thanks for reviewing!
-------------

PR: https://git.openjdk.java.net/jdk/pull/105

From shade at openjdk.java.net Thu Sep 10 08:17:55 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 10 Sep 2020 08:17:55 GMT
Subject: RFR: 8252846: Fix ppc/s390 after "8231441: AArch64: Initial SVE backend su…
In-Reply-To:
References:
Message-ID:

On Thu, 10 Sep 2020 08:13:17 GMT, Goetz Lindenmaier wrote:

>> src/hotspot/share/opto/type.cpp line 65:
>>
>>> 63:
>>> 64: #if defined(PPC64)
>>> 65: { Bad, T_ILLEGAL, "vectora:", false, Op_VecA, relocInfo::none }, // VectorA.
>>
>> I think trailing "." in the comment is unnecessary. Please consider dropping before push.
>
> Hi Aleksey, I would not have put a dot there, but I just copied the line from the #else part which also has the dot.
> So I'd rather leave it as-is.
> Thanks for reviewing!

Right. No biggie, ship it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/105

From goetz at openjdk.java.net Thu Sep 10 09:19:24 2020
From: goetz at openjdk.java.net (Goetz Lindenmaier)
Date: Thu, 10 Sep 2020 09:19:24 GMT
Subject: Integrated: 8252846: Fix ppc/s390 after "8231441: AArch64: Initial SVE backend su…
In-Reply-To:
References:
Message-ID:

On Thu, 10 Sep 2020 07:48:54 GMT, Goetz Lindenmaier wrote:

> …pport"
>
> Hi,
>
> this change fixes the issues after 8231441. The VM crashes right in the build after this change.
> The fix is trivial.
>
> Please review.

This pull request has now been integrated.

Changeset: 7ccf4358
Author: Goetz Lindenmaier
URL: https://git.openjdk.java.net/jdk/commit/7ccf4358
Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod

8252846: Fix ppc/s390 after "8231441: AArch64: Initial SVE backend su…

Reviewed-by: shade

-------------

PR: https://git.openjdk.java.net/jdk/pull/105

From adinn at redhat.com Thu Sep 10 09:46:26 2020
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 10 Sep 2020 10:46:26 +0100
Subject: [11u] Sconpe of Review ...
was RFR(S): 8241234: Unify monitor enter/exit runtime entries.
In-Reply-To:
References:
Message-ID: <835d5832-04a6-cbf5-7a7f-6495abf8658f@redhat.com>

On 09/09/2020 16:49, Lindenmaier, Goetz wrote:
> It is pointless to ask reviewers to judge the risk
> because reviews are only done if the change had to be adapted.
> There are complex changes that just apply clean and thus are
> downported without review.

I think you have very much overstated your case in that opening line. If a patch needs adapting in anything other than a trivial manner then it cannot be a bad thing for the reviewer to consider the risk involved. That benefit cannot be removed by the fact that some changes bypass the review process. It might be diminished, of course.

But by the same token, if a complex patch does not need adapting then it might still be a good thing for the maintainer who is going to approve or disapprove it to consider the risk. Indeed, I hope that this consideration is foremost in the maintainers' minds.

Either way, it's far from /pointless/ for reviewers or maintainers to consider risks and, where they think it important, to negotiate the details with the downporter. Likewise, if the downporter is unsure then they really ought to initiate such a dialogue. Such action might well be redundant in many cases but I am sure those will be easy to spot.

> Judging the risk is clearly a thing of the downporter. This is formulated
> in Rule 1 of Oracles's Updates
> description : http://openjdk.java.net/projects/jdk-updates/approval.html ,
> and also mentioned in step 6 of jdk11us description:
> https://wiki.openjdk.java.net/display/JDKUpdates/How+to+contribute+a+fix
> This is supposed to help the maintainer to decide about the risk.

That would be fine if every downporter was happy and able to make the assessment. However, there are definitely going to be cases where that is not so. I'm not sure why you are stating the requirements here in such absolute terms.
Note that in my previous post I did not recommend that /all/ responsibility be shifted to the reviewer or maintainer. I recommended that reviewers consider helping downporters with complex backports, including those that might benefit from trimming or re-implementing a fix. > The major task of the review is to make sure the downported change > is correct, i.e. has the same effect on 11 as the original one. > Nevertheless, if there is a reviewer and he feels bad about a > change, he should communicate his concerns! I think it is misguided to separate the decision as to correctness from concerns over the risk associated with a change. My experience is that cases where it is hard to foresee all the consequences of downporting a change are far from rare. For a lot of complex changes -- and even some simple ones -- a 100% foolproof assessment of correctness is never going to be available. In such cases the risk of breaking something really ought to be factored in by the reviewer because s/he can only speak to the /likely/ correctness (as opposed to /absolute/ correctness) of the patch. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From martin.doerr at sap.com Thu Sep 10 09:54:04 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 10 Sep 2020 09:54:04 +0000 Subject: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <20200909163733.GA422344@pacoca> References: <20200909163733.GA422344@pacoca> Message-ID: Hi Jose, if manual adaptation/integration is needed we also need to review the backport webrev (except trivial differences like in copyright headers). Best regards, Martin > -----Original Message----- > From: jdk-updates-dev On > Behalf Of joserz at linux.ibm.com > Sent: Mittwoch, 9. 
September 2020 18:38 > To: hotspot-compiler-dev at openjdk.java.net; jdk-updates- > dev at openjdk.java.net > Subject: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use new > byte-reverse instructions > > Hello team! > > I'd like to backport the following patchset to 11u. It doesn't apply perfectly > due to some positional changes and a copyright update. > Please, let me know if you prefer another webrev addressing this backport. > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > Thank you very much, > > Jose R Ziviani From gdub at openjdk.java.net Thu Sep 10 11:52:47 2020 From: gdub at openjdk.java.net (Gilles Duboscq) Date: Thu, 10 Sep 2020 11:52:47 GMT Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode In-Reply-To: References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> <9ARL_A2daS8-nEhhporpJpuRtdJJz8XY1mwyH_i99I8=.c3c3df72-8039-4243-b8c6-bd5040aabe64@github.com> Message-ID: On Wed, 9 Sep 2020 16:39:25 GMT, Mandy Chung wrote: >> test/langtools/tools/javac/lambda/lambdaExpression/LambdaTest6.java line 29: >> >>> 27: * @summary Add lambda tests >>> 28: * Test bridge methods for certain SAM conversions >>> 29: * Test the set of generate fields >> >> I would suggest to consider having the test under test/jdk/(java/lang/invoke/lambda), not under >> test/langtools/tools/javac. > > It's a good suggestion as `disableEagerInitialization` support is not part of javac. OK makes sense. I guess it's still good to clean the test comments of the old `disableEagerInitialization` references? 
-------------

PR: https://git.openjdk.java.net/jdk/pull/93

From enikitin at openjdk.java.net Thu Sep 10 12:26:33 2020
From: enikitin at openjdk.java.net (Evgeny Nikitin)
Date: Thu, 10 Sep 2020 12:26:33 GMT
Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures
Message-ID:

pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html)

Error reporting was improved by writing C-style escaped string representations of the variables passed to the methods being tested. For array comparisons, a dedicated diff-formatter was implemented.

Sample output for comparing byte arrays (with artificial failure):

----------System.err:(21/1553)----------
Result: (false) of 'arrayEqualsB' is not equal to expected (true)
Arrays differ starting from [index: 7]:
... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
          ^^^^
java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not equal to expected (true)
    at compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273)
    at ... stack trace continues - E.N.

Sample output for comparing char arrays:

----------System.err:(21/1579)*----------
Result: (false) of 'arrayEqualsC' is not equal to expected (true)
Arrays differ starting from [index: 7]:
... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ...
... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ...
                      ^^^^^^^
java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not equal to expected (true)
    at compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280)
    at ... and so on - E.N.

testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx.
------------- Commit messages: - 8229186: Improve error messages for TestStringIntrinsics failures Changes: https://git.openjdk.java.net/jdk/pull/112/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8229186 Stats: 1078 lines in 6 files changed: 1066 ins; 1 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/112.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/112/head:pull/112 PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Thu Sep 10 12:31:49 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Thu, 10 Sep 2020 12:31:49 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures In-Reply-To: References: Message-ID: <8n8KvhWjyaRXASTq32-P5KXojlOFB2-JIvPTu88oJRg=.b32b5bd1-5ddc-43a6-bcf0-18b1bc24de26@github.com> On Thu, 10 Sep 2020 12:20:05 GMT, Evgeny Nikitin wrote: > pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) > > Error reporting was improved by writing a C-style escaped string representations for the variables passed to the > methods being tested. For array comparisons, a dedicated diff-formatter was implemented. > Sample output for comparing byte arrays (with artificial failure): > ----------System.err:(21/1553)---------- > Result: (false) of 'arrayEqualsB' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) > at ... stack trace continues - E.N. 
> Sample output for comparing char arrays:
> ----------System.err:(21/1579)*----------
> Result: (false) of 'arrayEqualsC' is not equal to expected (true)
> Arrays differ starting from [index: 7]:
> ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ...
> ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ...
>                       ^^^^^^^
> java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not equal to expected (true)
>     at compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280)
>     at ... and so on - E.N.
>
> testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx.

Responding to the comments from pre-Skara thread:

> test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java:
> I'd prefer invokeAndCompareArrays and invokeAndCheck to be as close as possible: have both of them to accept either
> boolean or Object as 2nd arg; print/throw the same error message

The invokeAndCheck method is very generic: it can be called with different objects and expect any kind of result, not only boolean. Therefore its output format radically differs from what an array comparator should show.

> maybe I'm missing smth, but I don't understand why ArrayCodec supports only char and byte arrays; and hence I don't
> understand why you need ArrayCodec::of methods, as you can simply do new
> ArrayCoded(Arrays.stream(a).collect(Collectors.toList()) where a is an array of any type

For Object arrays, one can use that.
For integer primitives one needs Arrays.stream(a).boxed().collect(Collectors.toList()); please note 'boxed()' - it is required and not generic.
For bytes or chars, there is none (Arrays.stream(a) has no overload for them).

To sum up, I can't see how, with the given type system and set of utilities, I can make it better or less wordy. I've added int and long overloads and support for String and Object arrays to make it more complete.
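A minimal sketch of the boxing point made above, for readers who want to see it compile. The class and method names (`ArrayBoxingSketch`, `toList`) are hypothetical and are not taken from the patch; only the `java.util.Arrays.stream` behaviour is standard: it has overloads for `T[]`, `int[]`, `long[]` and `double[]`, but none for `byte[]` or `char[]`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class, not part of the reviewed change.
public class ArrayBoxingSketch {

    // Object arrays: Arrays.stream(T[]) yields Stream<T>, so no boxing step is needed.
    static <T> List<T> toList(T[] a) {
        return Arrays.stream(a).collect(Collectors.toList());
    }

    // int arrays: Arrays.stream(int[]) yields IntStream, so boxed() is required.
    static List<Integer> toList(int[] a) {
        return Arrays.stream(a).boxed().collect(Collectors.toList());
    }

    // byte arrays: there is no Arrays.stream(byte[]) overload, so box by hand.
    static List<Byte> toList(byte[] a) {
        List<Byte> result = new ArrayList<>(a.length);
        for (byte b : a) {
            result.add(b);
        }
        return result;
    }

    // char arrays: likewise, there is no Arrays.stream(char[]) overload.
    static List<Character> toList(char[] a) {
        List<Character> result = new ArrayList<>(a.length);
        for (char c : a) {
            result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toList(new String[] {"a", "b"}));
        System.out.println(toList(new int[] {1, 2, 3}));
        System.out.println(toList(new byte[] {5, 6, 125}));
        System.out.println(toList(new char[] {'x', '}'}));
    }
}
```

This is why a per-primitive set of overloads ends up being the least wordy option: each primitive element type needs its own conversion path.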
> it seems that ArrayCodec should be an inner static class of ArrayDiff

I would dispute that: I find it useful for printing arrays (and it is used that way in TestStringIntrinsics.java). In addition, I don't like the practice of making such large classes inner classes, as this reduces readability and modularity.

Other issues have been fixed. I added support for int, long, Object and String arrays.

-------------

PR: https://git.openjdk.java.net/jdk/pull/112

From iignatyev at openjdk.java.net Thu Sep 10 13:25:32 2020
From: iignatyev at openjdk.java.net (Igor Ignatyev)
Date: Thu, 10 Sep 2020 13:25:32 GMT
Subject: Integrated: 8252778: remove jdk.test.lib.FileInstaller action from compiler/c2/stemmer test
In-Reply-To:
References:
Message-ID:

On Sun, 6 Sep 2020 16:30:29 GMT, Igor Ignatyev wrote:

> pre-Skara RFR [thread](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039878.html)
>
> Hi all,
>
> could you please review this small and trivial cleanup?
>
> from [JBS](https://bugs.openjdk.java.net/browse/JDK-8252778):
>> `compiler/c2/stemmer` test uses `jdk.test.lib.FileInstaller` to copy "words" file from the test source directory to the
>> current working directory, `compiler.c2.stemmer.Stemmer` can read this file. yet, `c.c.s.Stemmer` class treats its 1st
>> argument as a path to the file, given this isn't needed and we can pass "${test.src}/words" instead of "words"
>
> testing: compiler/c2/stemmer on {linux,windows,macos}-x64

This pull request has now been integrated.
Changeset: 5b30a831 Author: Igor Ignatyev URL: https://git.openjdk.java.net/jdk/commit/5b30a831 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8252778: remove jdk.test.lib.FileInstaller action from compiler/c2/stemmer test Reviewed-by: shade, epavlova ------------- PR: https://git.openjdk.java.net/jdk/pull/33 From iignatyev at openjdk.java.net Thu Sep 10 13:28:41 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Thu, 10 Sep 2020 13:28:41 GMT Subject: Integrated: 8252774: remove jdk.test.lib.FileInstaller action from graalunit tests In-Reply-To: <_i2vOeOxRNWdga2jASmgHTwNYz0H3fLX-xdomiBOyG0=.3230004b-7bc1-4d36-b81b-152671890b40@github.com> References: <_i2vOeOxRNWdga2jASmgHTwNYz0H3fLX-xdomiBOyG0=.3230004b-7bc1-4d36-b81b-152671890b40@github.com> Message-ID: On Sun, 6 Sep 2020 16:37:47 GMT, Igor Ignatyev wrote: > [pre-Skara RFR](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039874.html) > Hi all, > > could you please review this small and trivial clean up in `test/hotspot/jtreg/compiler/graalunit`? > from [JBS](https://bugs.openjdk.java.net/browse/JDK-8252774): >> `test/hotspot/jtreg/compiler/graalunit` tests use `jdk.test.lib.FileInstaller` to copy `ProblemList-graal.txt` from >> `test/hotspot/jtreg/` to the current working directory as `ExcludeList.txt`, and then run >> `compiler.graalunit.common.GraalUnitTestLauncher` w/ `-exclude ExcludeList.txt`. > > `j.t.l.FileInstaller` actions aren't needed as `c.g.c.GraalUnitTestLauncher` interpeters `-exclude`'s value as path to > file (as oppose to the file name in current directory), so we can use `${test.root}/ProblemList-graal.txt` instead of > `ExcludeList.txt` there. > the patch modifies `generateTests.sh` to use `${test.root}/ProblemList-graal.txt`, cleans it up (removes trailing > spaces, empty `@summary` tag, and redundant explicit `@build`) and regenerates graalunit tests. 
> testing: `test/hotspot/jtreg/compiler/graalunit` on {linux,windows,macos}-x64 This pull request has now been integrated. Changeset: 41d29b75 Author: Igor Ignatyev URL: https://git.openjdk.java.net/jdk/commit/41d29b75 Stats: 341 lines in 48 files changed: 240 ins; 0 del; 101 mod 8252774: remove jdk.test.lib.FileInstaller action from graalunit tests Reviewed-by: shade, epavlova ------------- PR: https://git.openjdk.java.net/jdk/pull/34 From dnsimon at openjdk.java.net Thu Sep 10 14:14:05 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 10 Sep 2020 14:14:05 GMT Subject: RFR: 8252898: remove bulk registration of JFR CompilerPhaseType names In-Reply-To: References: Message-ID: On Tue, 8 Sep 2020 15:07:04 GMT, Doug Simon wrote: > The changes made in [JDK-8193210](https://bugs.openjdk.java.net/browse/JDK-8193210) support only [bulk > registration](https://github.com/openjdk/jdk/blob/4e6a4af1866d0007d368b78bf78b6a8e1c8be425/src/hotspot/share/compiler/compilerEvent.hpp#L75) > of compiler phase names with JFR. However, Graal only registers compiler phase names upon first execution of the phase > since the set of phases is not known during VM initialization. This means registration of a Graal phase name needs to > do unnecessary work, wrapping a single name into an array to conform to the bulk registration API. This pull request > updates the registration API to be in terms of a registering a single phase name. @jamsheedcm could you please have a look at this change. 
------------- PR: https://git.openjdk.java.net/jdk/pull/77 From jvernee at openjdk.java.net Thu Sep 10 15:13:13 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Thu, 10 Sep 2020 15:13:13 GMT Subject: RFR: 8253002: Remove the unused SafePointNode::_oop_map field Message-ID: Hi, I've been looking a lot at the code for generating oop maps for call nodes lately, and noticed that SafePointNode had an oopMap field that was unused (which led to some confusion as to where the oop map was actually set). The oop map is instead generated and set in buildOopMap OopFlow::compute_reach after matching: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/buildOopMap.cpp#L122 So the field on the ideal node is unused. This patch removes the field and cleans up related code. I've left a comment in SafePointNode to point people looking for the oop map at buildOopMap.cpp Thanks, Jorn Testing: local build + tier1,tier2,tier3 ------------- Commit messages: - Remove the unused SafePointNode::_oop_map field Changes: https://git.openjdk.java.net/jdk/pull/109/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=109&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253002 Stats: 8 lines in 2 files changed: 1 ins; 6 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/109/head:pull/109 PR: https://git.openjdk.java.net/jdk/pull/109 From gdub at openjdk.java.net Thu Sep 10 15:23:14 2020 From: gdub at openjdk.java.net (Gilles Duboscq) Date: Thu, 10 Sep 2020 15:23:14 GMT Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode [v2] In-Reply-To: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> Message-ID: > [JDK-8232806](https://bugs.openjdk.java.net/browse/JDK-8232806) introduced the > 
jdk.internal.lambda.disableEagerInitialization system property to be able to disable eager initialization of lambda > classes. This was necessary to prevent side effects of class initializers triggered by such initialization in the > context of the GraalVM native image tool. However, the change as it is implemented means that the behaviour of > non-capturing lambdas depends on the value of `disableEagerInitialization`: when it is false (the default) such lambdas > are actually a singleton while when it is true, a fresh instance is returned every time. Programs should definitely > _not_ rely on reference equality since the Java spec does not guarantee it. However, in order to separate concern and > ease debugging such bad programs, `disableEagerInitialization` shouldn't influence the singleton vs. fresh instance > behaviour of lambdas in either direction. Gilles Duboscq has updated the pull request incrementally with three additional commits since the last revision: - Remove extra field test from LambdaTest6 - Wrap long lines - Add deciated test in the jdk tests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/93/files - new: https://git.openjdk.java.net/jdk/pull/93/files/979186b8..422cd01d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=93&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=93&range=00-01 Stats: 111 lines in 3 files changed: 76 ins; 32 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/93.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/93/head:pull/93 PR: https://git.openjdk.java.net/jdk/pull/93 From gdub at openjdk.java.net Thu Sep 10 15:23:16 2020 From: gdub at openjdk.java.net (Gilles Duboscq) Date: Thu, 10 Sep 2020 15:23:16 GMT Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode [v2] In-Reply-To: References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> Message-ID: On Wed, 9 Sep 
2020 16:36:43 GMT, Mandy Chung wrote: >> Gilles Duboscq has updated the pull request incrementally with three additional commits since the last revision: >> >> - Remove extra field test from LambdaTest6 >> - Wrap long lines >> - Add deciated test in the jdk tests > > src/java.base/share/classes/java/lang/invoke/InnerClassLambdaMetafactory.java line 215: > >> 213: if (disableEagerInitialization) { >> 214: try { >> 215: return new ConstantCallSite(caller.findStaticGetter(innerClass, LAMBDA_INSTANCE_FIELD, >> invokedType.returnType())); > > Nit: it'd be good to wrap this long line. There are a couple long lines in this patch. I have wrapped some lines that were longer than the typical line in this file. Let me know if the wrapping looks good to you. ------------- PR: https://git.openjdk.java.net/jdk/pull/93 From gdub at openjdk.java.net Thu Sep 10 15:23:17 2020 From: gdub at openjdk.java.net (Gilles Duboscq) Date: Thu, 10 Sep 2020 15:23:17 GMT Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode [v2] In-Reply-To: References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> <9ARL_A2daS8-nEhhporpJpuRtdJJz8XY1mwyH_i99I8=.c3c3df72-8039-4243-b8c6-bd5040aabe64@github.com> Message-ID: On Thu, 10 Sep 2020 11:50:04 GMT, Gilles Duboscq wrote: >> It's a good suggestion as `disableEagerInitialization` support is not part of javac. > > OK makes sense. I guess it's still good to clean the test comments of the old `disableEagerInitialization` references? 
I have created a new test under `test/jdk/java/lang/invoke/lambda` and cleaned up `LambdaTest6.java` ------------- PR: https://git.openjdk.java.net/jdk/pull/93 From mchung at openjdk.java.net Thu Sep 10 16:36:51 2020 From: mchung at openjdk.java.net (Mandy Chung) Date: Thu, 10 Sep 2020 16:36:51 GMT Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode [v2] In-Reply-To: References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> Message-ID: On Thu, 10 Sep 2020 15:23:14 GMT, Gilles Duboscq wrote: >> [JDK-8232806](https://bugs.openjdk.java.net/browse/JDK-8232806) introduced the >> jdk.internal.lambda.disableEagerInitialization system property to be able to disable eager initialization of lambda >> classes. This was necessary to prevent side effects of class initializers triggered by such initialization in the >> context of the GraalVM native image tool. However, the change as it is implemented means that the behaviour of >> non-capturing lambdas depends on the value of `disableEagerInitialization`: when it is false (the default) such lambdas >> are actually a singleton while when it is true, a fresh instance is returned every time. Programs should definitely >> _not_ rely on reference equality since the Java spec does not guarantee it. However, in order to separate concern and >> ease debugging such bad programs, `disableEagerInitialization` shouldn't influence the singleton vs. fresh instance >> behaviour of lambdas in either direction. 
> > Gilles Duboscq has updated the pull request incrementally with three additional commits since the last revision: > > - Remove extra field test from LambdaTest6 > - Wrap long lines > - Add deciated test in the jdk tests test/langtools/tools/javac/lambda/lambdaExpression/LambdaTest6.java line 33: > 31: * @compile LambdaTest6.java > 32: * @run main LambdaTest6 > 33: */ This line was added by JDK-8232806 (https://hg.openjdk.java.net/jdk/jdk/rev/27c2d2a4b695). I assume you want to move the test case for JDK-8232806 to test/jdk/java/lang/invoke? If so, BridgeMethod.java should be looked at too. ------------- PR: https://git.openjdk.java.net/jdk/pull/93 From jatin.bhateja at intel.com Thu Sep 10 16:47:25 2020 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Thu, 10 Sep 2020 16:47:25 +0000 Subject: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy In-Reply-To: References: Message-ID: Summary: 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy. 2) Special instruction sequence blocks for copy sizes between 32 and 192 bytes. 3) Block copy operations above 192 bytes are performed using a destination-address-aligned PRE-MAIN-POST loop. The main loop copies 192 bytes per iteration and the tail part falls through to the special instruction sequence blocks. 4) Both the small copy blocks and the aligned loop use 32-byte vector registers to prevent any frequency penalty for copy sizes less than AVX3Threshold. 5) For block sizes above AVX3Threshold both the special blocks and the loop operate using 64-byte registers. 6) In case the user sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient REP MOVS for copy sizes above 4096 bytes.
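The size-based selection in points 2) to 6) of the summary above can be sketched roughly as follows. This is only an illustration of the decision logic as described in the mail, not the stub code from the patch: `plan_copy`, `CopyPlan` and `CopyPath` are hypothetical names, the behaviour below 32 bytes is not described in the summary and is reported as such, and the handling of sizes exactly equal to the threshold is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classification of the copy strategies named in the summary.
enum class CopyPath { NotDescribed, SpecialBlocks, AlignedLoop, RepMovs };

struct CopyPlan {
  CopyPath path;
  int vector_bytes;  // 32 or 64; 0 when no vector width applies
};

// Illustrative only: picks a strategy from the copy size, the AVX3 threshold,
// the maximum vector size and whether the copy is a forward (disjoint) copy.
CopyPlan plan_copy(std::size_t bytes, std::size_t avx3_threshold,
                   int max_vector_bytes, bool disjoint) {
  if (bytes < 32) {
    // The summary does not say what happens below 32 bytes.
    return {CopyPath::NotDescribed, 0};
  }
  if (max_vector_bytes == 32 && disjoint && bytes > 4096) {
    // Point 6: REP MOVS for large forward copies with 32-byte vectors.
    return {CopyPath::RepMovs, 0};
  }
  // Points 4 and 5: 32-byte registers below the threshold, 64-byte above.
  // Treating "equal to the threshold" as "above" is an assumption.
  int vec = (max_vector_bytes >= 64 && bytes >= avx3_threshold) ? 64 : 32;
  // Points 2 and 3: special blocks up to 192 bytes, aligned loop beyond.
  CopyPath path = (bytes <= 192) ? CopyPath::SpecialBlocks
                                 : CopyPath::AlignedLoop;
  return {path, vec};
}
```

For example, with a threshold of 4096 and 64-byte vectors enabled, `plan_copy(64, 4096, 64, true)` selects the special blocks with 32-byte registers, while `plan_copy(8192, 4096, 64, true)` selects the aligned loop with 64-byte registers.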
JMH Results: System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java Baseline : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt WithOpt : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt ------------- Commit messages: - 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy. Changes: https://git.openjdk.java.net/jdk/pull/61/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=61&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252847 Stats: 1315 lines in 12 files changed: 1213 ins; 69 del; 33 mod Patch: https://git.openjdk.java.net/jdk/pull/61.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/61/head:pull/61 PR: https://git.openjdk.java.net/jdk/pull/61 From joserz at linux.ibm.com Fri Sep 11 01:20:40 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Thu, 10 Sep 2020 22:20:40 -0300 Subject: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <20200909163733.GA422344@pacoca> Message-ID: <20200911012040.GA518622@pacoca> Hello Martin, Here is my new webrev. https://cr.openjdk.java.net/~mhorie/8248190/jdk11u/webrev.00/ (Thanks again, Michi) 8<---------------------------------------------------------------------- Some evidence from a Power10 emulator: $ openjdk/jdk11u-dev/build/jdk/bin/java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintOptoAssembly -XX:-UseSIGTRAP -XX:+UseByteReverseInstructions ReverseBytes | grep 'BRD\|BRH\|BRW\|EXTSH' ...
054 BRW R17, R18 070 BRD R14, R14 080 BRH R14, R15 EXTSH R14, R14 0a0 BRH R17, R15 $ openjdk/jdk11u-dev/build/jdk/bin/java -XX:+Verbose -XX:PowerArchitecturePPC64=10 -version dscr value was 0x10 Version: ppc64 fsqrt isel lxarxeh cmpb popcntb popcntw fcfids vand lqarx aes vpmsumb mfdscr vsx ldbrx stdbrx sha darn brw L1_data_cache_line_size=128 ContendedPaddingWidth 128 openjdk version "11.0.10-internal" 2021-01-19 OpenJDK Runtime Environment (slowdebug build 11.0.10-internal+0-adhoc.ziviani.jdk11u-dev) OpenJDK 64-Bit Server VM (slowdebug build 11.0.10-internal+0-adhoc.ziviani.jdk11u-dev, mixed mode) 8<---------------------------------------------------------------------- Thank you very much! Jose On Thu, Sep 10, 2020 at 09:54:04AM +0000, Doerr, Martin wrote: > Hi Jose, > > if manual adaptation/integration is needed we also need to review the backport webrev (except trivial differences like in copyright headers). > > Best regards, > Martin > > > > -----Original Message----- > > From: jdk-updates-dev On > > Behalf Of joserz at linux.ibm.com > > Sent: Mittwoch, 9. September 2020 18:38 > > To: hotspot-compiler-dev at openjdk.java.net; jdk-updates- > > dev at openjdk.java.net > > Subject: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use new > > byte-reverse instructions > > > > Hello team! > > > > I'd like to backport the following patchset to 11u. It doesn't apply perfectly > > due to some positional changes and a copyright update. > > Please, let me know if you prefer another webrev addressing this backport. > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > Thank you very much, > > > > Jose R Ziviani From goetz.lindenmaier at sap.com Fri Sep 11 06:05:54 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 11 Sep 2020 06:05:54 +0000 Subject: [11u] Sconpe of Review ... was RFR(S): 8241234: Unify monitor enter/exit runtime entries. 
In-Reply-To: <835d5832-04a6-cbf5-7a7f-6495abf8658f@redhat.com> References: <835d5832-04a6-cbf5-7a7f-6495abf8658f@redhat.com> Message-ID: Hi Andrew, Sorry I didn't mean to offend you. But this thread is about optimizing the rules for changes Oracle downported to 11. The majority of these changes are not reviewed. See the example I assembled below. So I just wanted to point to the existing rules, which say the maintainer must judge the risk. This is a mandatory step in the process. A review is not mandatory. Obviously, any useful input is welcome to make the decision of the maintainer more easy, or to spot overseen risks. The point of this discussion here is under which criteria Oracle changes should not be downported, see also Andrew H's last post: http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2020-September/003785.html Best regards, Goetz. I checked a snapshot of 19 changes I did for 11.0.9 (see below): 12 applied clean. 3 had a review because of Copyright/context conflicts. 4 had a review because I had to touch code. clean https://bugs.openjdk.java.net/browse/JDK-8248472: javax/management/MBeanServer/OldMBeanServerTest fails with AssertionError code https://bugs.openjdk.java.net/browse/JDK-8248471: [aarch64] assert(false) failed: wrong size of mach node code https://bugs.openjdk.java.net/browse/JDK-8248469: java.net HTTP/2 client does not decrease stream count when receives 204 response clean https://bugs.openjdk.java.net/browse/JDK-8248229: 2020-04-24 public suffix list update v ff6fcea code, relevant https://bugs.openjdk.java.net/browse/JDK-8248213: TestEliminateArrayCopy fails with -XX:+StressReflectiveCode clean https://bugs.openjdk.java.net/browse/JDK-8248210: TestClone crashes with "all collected exceptions must come from the same place" clean https://bugs.openjdk.java.net/browse/JDK-8248209: C1 assert(known_holder == NULL || (known_holder->is_instance_klass() && (!known_holder->is_interface() ... 
copyright https://bugs.openjdk.java.net/browse/JDK-8248208: Possible NPE in ENC-PA-REP search in AS-REQ clean https://bugs.openjdk.java.net/browse/JDK-8248207: Minimal fastdebug build broken after JDK-8245801 clean https://bugs.openjdk.java.net/browse/JDK-8248206: StressRecompilation triggers assert "redundunt OSR recompilation detected. memory leak in CodeCache!" clean https://bugs.openjdk.java.net/browse/JDK-8248204: Enhance BaseLdapServer to support starttls extended request clean https://bugs.openjdk.java.net/browse/JDK-8248203: [macos] macos10.14 Mojave returns anti-aliased glyphs instead of aliased B&W glyphs context https://bugs.openjdk.java.net/browse/JDK-8248202: Jconsole can't connect to itself clean https://bugs.openjdk.java.net/browse/JDK-8248201: java.rmi.NoSuchObjectException: no such object in table clean https://bugs.openjdk.java.net/browse/JDK-8248197: Choose the default SecureRandom algo based on registration ordering clean https://bugs.openjdk.java.net/browse/JDK-8248196: SSLSocket.getSession() doesn't close connection for timeout/ interrupts context https://bugs.openjdk.java.net/browse/JDK-8248141: HttpConnection not returned to the pool after 204 response code https://bugs.openjdk.java.net/browse/JDK-8246690: Tools should warn if weak algorithms are used before restricting them clean https://bugs.openjdk.java.net/browse/JDK-8248002: Test javax/swing/border/TestTitledBorderLeak.java should be marked as headful > -----Original Message----- > From: Andrew Dinn > Sent: Thursday, September 10, 2020 11:46 AM > To: Lindenmaier, Goetz ; Andrew Haley > ; Doerr, Martin ; 'Severin > Gehwolf' ; 'hotspot-compiler- > dev at openjdk.java.net' ; jdk- > updates-dev at openjdk.java.net > Cc: Langer, Christoph > Subject: Re: [11u] Sconpe of Review ... was RFR(S): 8241234: Unify monitor > enter/exit runtime entries. 
> > On 09/09/2020 16:49, Lindenmaier, Goetz wrote: > > It is pointless to ask reviewers to judge the risk > > because reviews are only done if the change had to be adapted. > > There are complex changes that just apply clean and thus are > > downported without review. > > I think you have very much overstated your case in that opening line. If > a patch needs adapting in anything other than a trivial manner then it > cannot be a bad thing for the reviewer to consider the risk involved. > That benefit cannot be removed by the fact that some changes bypass the > review process. > > It might be diminished, of course. But by the same token, if a complex > patch does not need adapting then it might still be a good thing for the > maintainer who is going to approve or disapprove it to consider the > risk. Indeed, I hope that this consideration is foremost in the > maintainers' minds. > > Either way, it's far from /pointless/ for reviewers or maintainers to > consider risks and, where they think it important, negotiate the details > with the downporter. Likewise if the downporter is unsure then they > really ought to initiate such a dialogue. Such action might well be > redundant in many cases but I am sure those will be easy to spot. > > > Judging the risk is clearly a thing of the downporter. This is formulated > > in Rule 1 of Oracle's Updates > > description : http://openjdk.java.net/projects/jdk-updates/approval.html , > > and also mentioned in step 6 of jdk11us description: > > > https://wiki.openjdk.java.net/display/JDKUpdates/How+to+contribute+a+fix > > This is supposed to help the maintainer to decide about the risk. > > That would be fine if every downporter was happy and able to make the > assessment. However, there are definitely going to be cases where that > is not so. > > I'm not sure why you are stating the requirements here in such absolute > terms.
Note that in my previous post I did not recommend that /all/ > responsibility be shifted to the reviewer or maintainer. I recommended > that reviewers consider helping downporters with complex backports, > including those that might benefit from trimming or re-implementing a fix. > > > The major task of the review is to make sure the downported change > > is correct, i.e. has the same effect on 11 as the original one. > > Nevertheless, if there is a reviewer and he feels bad about a > > change, he should communicate his concerns! > I think it is misguided to separate the decision as to correctness from > concerns over the risk associated with a change. My experience is that > cases where it is hard to foresee all the consequences of downporting a > change are far from rare. For a lot of complex changes -- and even some > simple ones -- a 100% foolproof assessment of correctness is never going > to be available. In such cases the risk of breaking something really > ought to be factored in by the reviewer because s/he can only speak to > the /likely/ correctness (as opposed to /absolute/ correctness) of the > patch. > > regards, > > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 
03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill From dnsimon at openjdk.java.net Fri Sep 11 08:08:22 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 11 Sep 2020 08:08:22 GMT Subject: RFR: 8252543: [JVMCI] Libgraal can deadlock in blocking compilation mode [v3] In-Reply-To: <5IFx6Bjetu0eYmCpkgbsRmMVj6i-Xf1tt4LMHvWx91w=.7f520889-ab7d-49de-aa58-0c0608627edb@github.com> References: <5IFx6Bjetu0eYmCpkgbsRmMVj6i-Xf1tt4LMHvWx91w=.7f520889-ab7d-49de-aa58-0c0608627edb@github.com> Message-ID: > To prevent a deadlock in libgraal under `-Xcomp` or `-Xbatch` due to a lock being held in libgraal, a new mechanism is > added by this change that allows JVMCI compiler threads to communicate their "progress" to HotSpot: > * Each JVMCI compiler thread has a "compilation ticks" counter. > * There is also a global JVMCI compilation ticks counter. > * Each JVMCI VM call increments the JVMCI compiler thread-local compilation ticks counter. > * Every 512 increments of such a counter also increments the global counter. > * A thread waiting on a blocking JVMCI compilation will be unblocked if these counters indicate no progress after a > defined period. Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
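The "compilation ticks" scheme listed in the description above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from the pull request: `g_global_ticks`, `CompilerThreadTicks` and `ProgressWatcher` are hypothetical names, and how the waiting thread actually samples the counters and what the "defined period" is are not given in the mail.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical global counter shared by all JVMCI compiler threads.
static std::atomic<uint64_t> g_global_ticks{0};

// Hypothetical per-compiler-thread state.
struct CompilerThreadTicks {
  std::atomic<uint64_t> local_ticks{0};

  // Called on every VM call made by the compiler thread.
  void on_vm_call() {
    uint64_t n = local_ticks.fetch_add(1, std::memory_order_relaxed) + 1;
    if (n % 512 == 0) {
      // Every 512 local increments also bump the global counter.
      g_global_ticks.fetch_add(1, std::memory_order_relaxed);
    }
  }
};

// Used by a thread blocked on a compilation: it samples both counters
// periodically and gives up waiting when neither has moved.
class ProgressWatcher {
  uint64_t last_local_ = 0;
  uint64_t last_global_ = 0;

 public:
  // Returns true if either counter changed since the previous sample.
  bool made_progress(uint64_t local_now, uint64_t global_now) {
    bool progressed = local_now != last_local_ || global_now != last_global_;
    last_local_ = local_now;
    last_global_ = global_now;
    return progressed;
  }
};
```

A waiting thread would call `made_progress` once per sampling period and stop blocking after it returns false; the length of that period and the exact unblocking policy are assumptions left open here.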
The pull request contains one new commit since the last revision: 8252543: add compilation ticks for mitigating against deadlock due to blocking JVMCI compilation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/35/files - new: https://git.openjdk.java.net/jdk/pull/35/files/94e4a3f4..6f779acf Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=35&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=35&range=01-02 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/35.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/35/head:pull/35 PR: https://git.openjdk.java.net/jdk/pull/35 From adinn at redhat.com Fri Sep 11 09:57:13 2020 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 11 Sep 2020 10:57:13 +0100 Subject: [11u] Sconpe of Review ... was RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <835d5832-04a6-cbf5-7a7f-6495abf8658f@redhat.com> Message-ID: <4a14cee1-1457-dbe9-48e5-5bcfab4b467e@redhat.com> Hi Goetz, On 11/09/2020 07:05, Lindenmaier, Goetz wrote: > Sorry I didn't mean to offend you. Of course. I never thought otherwise and I certainly was not offended by anything you said. > But this thread is about optimizing the rules for > changes Oracle downported to 11. The majority of > these changes are not reviewed. See the example > I assembled below. > So I just wanted to point to the existing rules, which > say the maintainer must judge the risk. This is a > mandatory step in the process. A review is not > mandatory. > > Obviously, any useful input is welcome to make > the decision of the maintainer more easy, or to > spot overseen risks. I was not clear that this was your original position, specifically because of these words: "It is pointless to ask reviewers to judge the risk . . ." "Judging the risk is clearly a thing of the downporter. This is formulated in Rule 1 of Oracles's Updates . . ." 
It seems we are both in agreement that downporters, reviewers and maintainers all have a part to play and that any doubt is best addressed through open discussion and negotiation. > The point of this discussion here is under which > criteria Oracle changes should not be downported, see > also Andrew H's last post: > http://mail.openjdk.java.net/pipermail/jdk-updates-dev/2020-September/003785.html Yes, I understood that. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 11 10:24:52 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 11 Sep 2020 10:24:52 GMT Subject: RFR: 8252494: C2: TypeAryPtr::cast_to_autobox_cache does not use ptr_type Message-ID: Remove dead definition of `ptr_type` in `TypeAryPtr::cast_to_autobox_cache`. Also remove unnecessary `cache` parameter (always true) for simplicity.
------------- Commit messages: - 8252494: C2: TypeAryPtr::cast_to_autobox_cache does not use ptr_type Changes: https://git.openjdk.java.net/jdk/pull/106/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=106&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252494 Stats: 6 lines in 2 files changed: 0 ins; 1 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/106.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/106/head:pull/106 PR: https://git.openjdk.java.net/jdk/pull/106 From shade at openjdk.java.net Fri Sep 11 10:24:56 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 11 Sep 2020 10:24:56 GMT Subject: RFR: 8252494: C2: TypeAryPtr::cast_to_autobox_cache does not use ptr_type In-Reply-To: References: Message-ID: On Thu, 10 Sep 2020 08:08:24 GMT, Roberto Castañeda Lozano wrote: > Remove dead definition of `ptr_type` in `TypeAryPtr::cast_to_autobox_cache`. Also remove unnecessary `cache` parameter > (always true) for simplicity. I read Vladimir's explanations in the bug, and the patch looks good based on that. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/106 From vlivanov at openjdk.java.net Fri Sep 11 10:24:57 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 11 Sep 2020 10:24:57 GMT Subject: RFR: 8252494: C2: TypeAryPtr::cast_to_autobox_cache does not use ptr_type In-Reply-To: References: Message-ID: On Thu, 10 Sep 2020 08:08:24 GMT, Roberto Castañeda Lozano wrote: > Remove dead definition of `ptr_type` in `TypeAryPtr::cast_to_autobox_cache`. Also remove unnecessary `cache` parameter > (always true) for simplicity. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/106 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 11 10:24:57 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 11 Sep 2020 10:24:57 GMT Subject: RFR: 8252494: C2: TypeAryPtr::cast_to_autobox_cache does not use ptr_type In-Reply-To: References: Message-ID: On Thu, 10 Sep 2020 08:12:58 GMT, Aleksey Shipilev wrote: >> Remove dead definition of `ptr_type` in `TypeAryPtr::cast_to_autobox_cache`. Also remove unnecessary `cache` parameter >> (always true) for simplicity. > > I read Vladimir's explanations in the bug, and the patch looks good based on that. Thanks @shipilev! ------------- PR: https://git.openjdk.java.net/jdk/pull/106 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 11 10:25:17 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 11 Sep 2020 10:25:17 GMT Subject: RFR: 8250914: Matcher::stack_direction() is unused Message-ID: Remove unused `Matcher::stack_direction()` together with related ADL entries and ADLC support.
------------- Commit messages: - Matcher::stack_direction() is unused Changes: https://git.openjdk.java.net/jdk/pull/125/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=125&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8250914 Stats: 44 lines in 11 files changed: 0 ins; 44 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/125.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/125/head:pull/125 PR: https://git.openjdk.java.net/jdk/pull/125 From vlivanov at openjdk.java.net Fri Sep 11 10:25:18 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 11 Sep 2020 10:25:18 GMT Subject: RFR: 8250914: Matcher::stack_direction() is unused In-Reply-To: References: Message-ID: <1QtAaT0lXzWawv4jpqgywhDRqeuPfy6ZbMsKYrIuGBM=.273f5f5b-c780-41a1-b485-6b735dfc6f18@github.com> On Fri, 11 Sep 2020 06:56:10 GMT, Roberto Castañeda Lozano wrote: > Remove unused `Matcher::stack_direction()` together with related ADL entries and > ADLC support. Marked as reviewed by vlivanov (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/125 From vlivanov at openjdk.java.net Fri Sep 11 10:25:18 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 11 Sep 2020 10:25:18 GMT Subject: RFR: 8250914: Matcher::stack_direction() is unused In-Reply-To: <1QtAaT0lXzWawv4jpqgywhDRqeuPfy6ZbMsKYrIuGBM=.273f5f5b-c780-41a1-b485-6b735dfc6f18@github.com> References: <1QtAaT0lXzWawv4jpqgywhDRqeuPfy6ZbMsKYrIuGBM=.273f5f5b-c780-41a1-b485-6b735dfc6f18@github.com> Message-ID: On Fri, 11 Sep 2020 09:55:40 GMT, Vladimir Ivanov wrote: >> Remove unused `Matcher::stack_direction()` together with related ADL entries and >> ADLC support. > > Marked as reviewed by vlivanov (Reviewer). Looks good.
------------- PR: https://git.openjdk.java.net/jdk/pull/125 From thartmann at openjdk.java.net Fri Sep 11 11:13:45 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 11 Sep 2020 11:13:45 GMT Subject: RFR: 8252494: C2: TypeAryPtr::cast_to_autobox_cache does not use ptr_type In-Reply-To: References: Message-ID: On Thu, 10 Sep 2020 08:08:24 GMT, Roberto Castañeda Lozano wrote: > Remove dead definition of `ptr_type` in `TypeAryPtr::cast_to_autobox_cache`. Also remove unnecessary `cache` parameter > (always true) for simplicity. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/106 From thartmann at openjdk.java.net Fri Sep 11 11:14:16 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 11 Sep 2020 11:14:16 GMT Subject: RFR: 8250914: Matcher::stack_direction() is unused In-Reply-To: References: Message-ID: On Fri, 11 Sep 2020 06:56:10 GMT, Roberto Castañeda Lozano wrote: > Remove unused `Matcher::stack_direction()` together with related ADL entries and > ADLC support. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/125 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 11 11:32:53 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 11 Sep 2020 11:32:53 GMT Subject: RFR: 8252494: C2: TypeAryPtr::cast_to_autobox_cache does not use ptr_type In-Reply-To: References: Message-ID: On Fri, 11 Sep 2020 11:11:15 GMT, Tobias Hartmann wrote: >> Remove dead definition of `ptr_type` in `TypeAryPtr::cast_to_autobox_cache`. Also remove unnecessary `cache` parameter >> (always true) for simplicity. > > Looks good to me. Remove dead definition of ptr_type in TypeAryPtr::cast_to_autobox_cache. Also remove unnecessary cache parameter (always true) for simplicity.
------------- PR: https://git.openjdk.java.net/jdk/pull/106 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 11 11:32:53 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 11 Sep 2020 11:32:53 GMT Subject: RFR: 8252494: C2: TypeAryPtr::cast_to_autobox_cache does not use ptr_type In-Reply-To: References: Message-ID: On Fri, 11 Sep 2020 11:29:18 GMT, Roberto Castañeda Lozano wrote: >> Looks good to me. > > Remove dead definition of ptr_type in TypeAryPtr::cast_to_autobox_cache. Also remove > unnecessary cache parameter (always true) for simplicity. Thanks Vladimir and Tobias! ------------- PR: https://git.openjdk.java.net/jdk/pull/106 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 11 11:38:56 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 11 Sep 2020 11:38:56 GMT Subject: RFR: 8250914: Matcher::stack_direction() is unused In-Reply-To: References: Message-ID: On Fri, 11 Sep 2020 11:11:57 GMT, Tobias Hartmann wrote: >> Remove unused `Matcher::stack_direction()` together with related ADL entries and >> ADLC support. > > Looks good to me. Thanks Vladimir and Tobias! ------------- PR: https://git.openjdk.java.net/jdk/pull/125 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 11 11:38:56 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 11 Sep 2020 11:38:56 GMT Subject: RFR: 8250914: Matcher::stack_direction() is unused In-Reply-To: References: Message-ID: <659jxnwKhocjffhClVo1AGuIjwvSr-yW8MUddNxtJ4o=.68d7a74f-9e63-4cfb-b29c-db0c0e8544a5@github.com> On Fri, 11 Sep 2020 11:35:15 GMT, Roberto Castañeda Lozano wrote: >> Looks good to me. > > Thanks Vladimir and Tobias! Remove unused Matcher::stack_direction() together with related ADL entries and ADLC support.
------------- PR: https://git.openjdk.java.net/jdk/pull/125 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 11 11:58:21 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 11 Sep 2020 11:58:21 GMT Subject: Integrated: 8252494: C2: TypeAryPtr::cast_to_autobox_cache does not use ptr_type In-Reply-To: References: Message-ID: On Thu, 10 Sep 2020 08:08:24 GMT, Roberto Castañeda Lozano wrote: > Remove dead definition of `ptr_type` in `TypeAryPtr::cast_to_autobox_cache`. Also remove unnecessary `cache` parameter > (always true) for simplicity. This pull request has now been integrated. Changeset: 9687dcab Author: Roberto Castaneda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/9687dcab Stats: 6 lines in 2 files changed: 1 ins; 0 del; 5 mod 8252494: C2: TypeAryPtr::cast_to_autobox_cache does not use ptr_type Remove dead definition of ptr_type in TypeAryPtr::cast_to_autobox_cache. Also remove unnecessary cache parameter (always true) for simplicity. Reviewed-by: shade, vlivanov, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/106 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 11 12:00:31 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 11 Sep 2020 12:00:31 GMT Subject: Integrated: 8250914: Matcher::stack_direction() is unused In-Reply-To: References: Message-ID: <8Alz6sFFK32WwxaeBFMxZnUDR_90SoYZj5IrzQUAvtk=.d77b937e-a3c4-4e55-a767-5f1435d64639@github.com> On Fri, 11 Sep 2020 06:56:10 GMT, Roberto Castañeda Lozano wrote: > Remove unused `Matcher::stack_direction()` together with related ADL entries and > ADLC support. This pull request has now been integrated.
Changeset: 040c8f58 Author: Roberto Castaneda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/040c8f58 Stats: 44 lines in 11 files changed: 44 ins; 0 del; 0 mod 8250914: Matcher::stack_direction() is unused Remove unused Matcher::stack_direction() together with related ADL entries and ADLC support. Reviewed-by: vlivanov, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/125 From volker.simonis at gmail.com Fri Sep 11 12:00:56 2020 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 11 Sep 2020 14:00:56 +0200 Subject: RFR 8239090: Improve CPU feature support in VM_version In-Reply-To: <2D081CC6-FCD4-4A31-BDC1-ECF7A4B60BAD@amazon.com> References: <2D081CC6-FCD4-4A31-BDC1-ECF7A4B60BAD@amazon.com> Message-ID: On Tue, Sep 8, 2020 at 6:41 PM Hohensee, Paul wrote: > > Thank you for the review, Igor. > > I did indeed define FEATURES_NAMES to be close to the flags enum definition. And, as a macro, it might be useful in some future context. I also defined four CPU_ enum values and strings per line so it's easy to keep track of the correspondence. > > Anyone else up for a review please? I expect I'll have to turn this into a PR though. > Hi Paul, I like this cleanup and it looks good to me. I wonder if we shouldn't assert (e.g. in "VM_Version::get_processor_features()") that the number of elements in "_features_names" equals the number of values in "Feature_Flag". Unfortunately we can't query the number of elements in an enum in C++, but we could do something like the following: enum Feature_Flag : uint64_t { ... CPU_HV = (1ULL << 46), // Hypervisor instructions MAX_FEATURE = (1ULL << 46) // Always the same as the largest feature }; assert(exact_log(MAX_FEATURE) == sizeof(_features_names)/sizeof(char*), ...) It's just an idea and I leave it up to you if you want to add it to the change. As I said before, the change looks good to me even without this extra check.
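The consistency check suggested above, together with the bit-indexed name lookup discussed in this thread, can be sketched as a compile-time assertion. This is a minimal sketch under assumptions, not the code from the webrev: the three features, the `features_names` array, `bit_index` and `features_string` are all illustrative names, and the sketch assumes the feature bits start at bit 0, in which case the expected number of names is the highest bit index plus one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Illustrative stand-in for the real feature enum (three made-up features).
enum Feature_Flag : uint64_t {
  CPU_A = (1ULL << 0),
  CPU_B = (1ULL << 1),
  CPU_C = (1ULL << 2),
  MAX_FEATURE = CPU_C  // must be kept equal to the largest feature bit
};

// One name per bit, indexed by bit number (hypothetical names).
static const char* const features_names[] = { "a", "b", "c" };

constexpr std::size_t kNumNames =
    sizeof(features_names) / sizeof(features_names[0]);

// Bit index of a value with exactly one bit set (C++11-friendly recursion).
constexpr int bit_index(uint64_t single_bit) {
  return single_bit <= 1 ? 0 : 1 + bit_index(single_bit >> 1);
}

// Fails to compile if a feature is added without a matching name.
static_assert(static_cast<std::size_t>(bit_index(MAX_FEATURE)) + 1 == kNumNames,
              "features_names must have one entry per feature bit");

// Appends the name of every set feature bit, separated by ", ".
std::string features_string(uint64_t features) {
  std::string out;
  for (std::size_t i = 0; i < kNumNames; ++i) {
    if (features & (1ULL << i)) {
      if (!out.empty()) out += ", ";
      out += features_names[i];
    }
  }
  return out;
}
```

Using a `static_assert` rather than a runtime `assert` is a design choice of this sketch: the mismatch is caught at build time, at the cost of having to keep `MAX_FEATURE` up to date by hand.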
Best regards, Volker > Paul > > From: Igor Veresov > Date: Friday, September 4, 2020 at 4:04 PM > To: "Hohensee, Paul" > Cc: "hotspot-compiler-dev at openjdk.java.net" > Subject: RE: RFR 8239090: Improve CPU feature support in VM_version > > This looks good. Did you make FEATURES_NAMES a macro just so that it's close to the flags enum? > > igor > > > > > > On Sep 4, 2020, at 10:39 AM, Hohensee, Paul > wrote: > > Slightly adjusted patch. > > http://cr.openjdk.java.net/~phh/8239090/webrev.02/ > > Thanks, > Paul > > On 9/3/20, 3:47 PM, "hotspot-compiler-dev on behalf of Hohensee, Paul" wrote: > > Taking over from Eric... > > Thank you for the review, Igor. I took a completely different (and very old) approach, however, and defined a method Abstract_VM_Version::insert_features_names() that iterates over the feature flags set. If a feature bit is on, it appends to an output buffer a corresponding name string from an array indexed by the bit number. I've implemented it only for x86: using the mechanism for other platforms can be follow-on RFEs. I'd greatly appreciate a review. > > Webrev: http://cr.openjdk.java.net/~phh/8239090/webrev.00/ > > To add a feature bit, all one now has to do is add a CPU_ definition and corresponding name string in the FEATURES_NAMES macro. > > I've also included a few small changes to the x86 implementation beyond the above. > > 1. Unified the previous two bitset definitions into a single Feature_Flag enum and made it a uint64_t. > 2. supports_tscinv_bit() referenced the CPU_TSCINV bit, which was a bit misleading, so added a new CPU_TSCINV_BIT mask and used it instead. > 3. Repurposed CPU_TSCINV for supports_tscinv(), which was a "composite" property, but is now computed once in feature_flags(). > 4. Made supports_clflushopt() and supports_clwb() common to both 32 and 64-bit rather than have 32-bit versions that always return 'false'. These bits are never set by the hardware on 32-bit, so no need for separate methods. > 5.
Renamed CPU_HV_PRESENT to CPU_HV to conform with the CPU_ bit naming scheme. "_PRESENT" is redundant and not used for any other CPU_ name, and the feature string uses "hv", not "hv_present". Added CPU_HV to vmStructs_x86.hpp and vmStructs_jvmci.cpp.
>
> Tested using -Xlog:os+cpu on my macbook pro: the same feature string is returned after the patch as before it. Suggestions for how to more thoroughly test the patch are very welcome.
>
> Thanks,
> Paul
>
> On 8/27/20, 6:22 PM, "hotspot-compiler-dev on behalf of Igor Veresov" wrote:
>
> You can actually make a constexpr array of feature objects and then use constexpr function with a loop to look it up. The c++ compiler will generate an O(1) table lookup for it.
> That would be a good way to get rid of the ugly macro (we allow c++14 now).
>
> For example foo() in this example:
>
> enum E { a, b, c };
>
> struct P {
>   E _e; // key
>   int _v; // value
>   constexpr P(E e, int v) : _e(e), _v(v) { }
> };
>
>
> constexpr static P ps[3] = { P(a, 0xdead), P(b, 0xbeef), P(c, 0xf00d)};
>
> constexpr int match(E e) {
>   for (const auto& p : ps) {
>     if (p._e == e) {
>       return p._v;
>     }
>   }
>   return -1;
> }
>
>
> int foo(E e) {
>   return match(e);
> }
>
> Will be compiled into:
>
> __Z3foo1E: ## @_Z3foo1E
>         .cfi_startproc
> ## %bb.0:
>         movl $-1, %eax
>         cmpl $2, %edi
>         ja LBB0_2
> ## %bb.1:
>         pushq %rbp
>         .cfi_def_cfa_offset 16
>         .cfi_offset %rbp, -16
>         movq %rsp, %rbp
>         .cfi_def_cfa_register %rbp
>         movslq %edi, %rax
>         leaq l_switch.table._Z3foo1E(%rip), %rcx
>         movq (%rcx,%rax,8), %rax
>         movl 4(%rax), %eax
>         popq %rbp
> LBB0_2:
>         retq
>         .cfi_endproc
> ## -- End function
>         .section __TEXT,__const
>         .p2align 4 ## @_ZL2ps
> __ZL2ps:
>         .long 0 ## 0x0
>         .long 57005 ## 0xdead
>         .long 1 ## 0x1
>         .long 48879 ## 0xbeef
>         .long 2 ## 0x2
>         .long 61453 ## 0xf00d
>
>         .section __DATA,__const
>         .p2align 3 ## @switch.table._Z3foo1E
> l_switch.table._Z3foo1E:
>         .quad __ZL2ps
>         .quad __ZL2ps+8
>         .quad __ZL2ps+16
>
>
> igor
>
>
>
> On Aug 27, 2020, at
11:08 AM, Eric, Chan wrote: > > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~phh/8239090/webrev.00/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8239090 > > Yesterday I sent a wrong one, so I send it again, > I improve the ?get_processor_features? method by store every cpu features in an enum array so that we don?t have to count how many ?%s? that need to added. I passed the tier1 test successfully. > > Regards, > Eric Chen > > > From martin.doerr at sap.com Fri Sep 11 12:26:09 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 11 Sep 2020 12:26:09 +0000 Subject: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <20200911012040.GA518622@pacoca> References: <20200909163733.GA422344@pacoca> <20200911012040.GA518622@pacoca> Message-ID: Hi Jose, looks good. Thanks for backporting. Best regards, Martin > -----Original Message----- > From: joserz at linux.ibm.com > Sent: Freitag, 11. September 2020 03:21 > To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net; jdk-updates- > dev at openjdk.java.net; HORIE at jp.ibm.com > Subject: Re: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use > new byte-reverse instructions > > Hello Martin, > > Here is my new webrev. > https://cr.openjdk.java.net/~mhorie/8248190/jdk11u/webrev.00/ > (Thanks again, Michi) > > 8<---------------------------------------------------------------------- > Some evidences in a Power10 emulator: > $ openjdk/jdk11u-dev/build/jdk/bin/java -Xcomp -XX:CompileThreshold=1 - > XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions - > XX:+PrintOptoAssembly -XX:-UseSIGTRAP -XX:+UseByteReverseInstructions > ReverseBytes | grep 'BRD\|BRH\|BRW\|EXTSH' > ... 
> 054 BRW R17, R18
> 070 BRD R14, R14
> 080 BRH R14, R15
> EXTSH R14, R14
> 0a0 BRH R17, R15
>
> $ openjdk/jdk11u-dev/build/jdk/bin/java -XX:+Verbose -XX:PowerArchitecturePPC64=10 -version
> dscr value was 0x10
> Version: ppc64 fsqrt isel lxarxeh cmpb popcntb popcntw fcfids vand lqarx aes vpmsumb mfdscr vsx ldbrx stdbrx sha darn brw
> L1_data_cache_line_size=128
>
> ContendedPaddingWidth 128
>
> openjdk version "11.0.10-internal" 2021-01-19
> OpenJDK Runtime Environment (slowdebug build 11.0.10-internal+0-adhoc.ziviani.jdk11u-dev)
> OpenJDK 64-Bit Server VM (slowdebug build 11.0.10-internal+0-adhoc.ziviani.jdk11u-dev, mixed mode)
> 8<----------------------------------------------------------------------
>
> Thank you very much!
>
> Jose
>
> On Thu, Sep 10, 2020 at 09:54:04AM +0000, Doerr, Martin wrote:
> > Hi Jose,
> >
> > if manual adaptation/integration is needed we also need to review the backport webrev (except trivial differences like in copyright headers).
> >
> > Best regards,
> > Martin
> >
> >
> > > -----Original Message-----
> > > From: jdk-updates-dev On Behalf Of joserz at linux.ibm.com
> > > Sent: Wednesday, 9 September 2020 18:38
> > > To: hotspot-compiler-dev at openjdk.java.net; jdk-updates-dev at openjdk.java.net
> > > Subject: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions
> > >
> > > Hello team!
> > >
> > > I'd like to backport the following patchset to 11u. It doesn't apply perfectly
> > > due to some positional changes and a copyright update.
> > > Please, let me know if you prefer another webrev addressing this backport.
> > >
> > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.02/
> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190
> > >
> > > Thank you very much,
> > >
> > > Jose R Ziviani

From gdub at openjdk.java.net Fri Sep 11 15:07:23 2020
From: gdub at openjdk.java.net (Gilles Duboscq)
Date: Fri, 11 Sep 2020 15:07:23 GMT
Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode [v3]
In-Reply-To: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com>
References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com>
Message-ID:

> [JDK-8232806](https://bugs.openjdk.java.net/browse/JDK-8232806) introduced the
> jdk.internal.lambda.disableEagerInitialization system property to be able to disable eager initialization of lambda
> classes. This was necessary to prevent side effects of class initializers triggered by such initialization in the
> context of the GraalVM native image tool. However, the change as it is implemented means that the behaviour of
> non-capturing lambdas depends on the value of `disableEagerInitialization`: when it is false (the default) such lambdas
> are actually a singleton while when it is true, a fresh instance is returned every time. Programs should definitely
> _not_ rely on reference equality since the Java spec does not guarantee it. However, in order to separate concern and
> ease debugging such bad programs, `disableEagerInitialization` shouldn't influence the singleton vs. fresh instance
> behaviour of lambdas in either direction.
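
As a small illustration of the point about reference equality, here is a minimal sketch; the class and method names (`LambdaIdentitySketch`, `greeting`, `sameBehaviour`) are hypothetical and not part of the patch. It evaluates the same non-capturing lambda expression twice and prints whether the two results are the same object. That answer is an implementation detail and may differ between execution modes, so the sketch compares behaviour instead of identity.

```java
import java.util.function.Supplier;

class LambdaIdentitySketch {
    // A non-capturing lambda: it uses no variables from the enclosing scope.
    // Whether repeated evaluations return one shared object or fresh objects
    // is not something a program should depend on.
    static Supplier<String> greeting() {
        return () -> "hello";
    }

    // Compare what the suppliers produce rather than which object they are.
    static boolean sameBehaviour(Supplier<String> a, Supplier<String> b) {
        return a.get().equals(b.get());
    }

    public static void main(String[] args) {
        Supplier<String> first = greeting();
        Supplier<String> second = greeting();
        // The result of == is deliberately not asserted anywhere.
        System.out.println("same instance:  " + (first == second));
        System.out.println("same behaviour: " + sameBehaviour(first, second));
    }
}
```

Code that needs a stable identity for a function object can store the instance in a field and reuse that field, rather than re-evaluating the lambda expression.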
Gilles Duboscq has updated the pull request incrementally with one additional commit since the last revision:

  Remove disableEagerInitialization concerns from BridgeMethod.java

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/93/files
  - new: https://git.openjdk.java.net/jdk/pull/93/files/422cd01d..625feb94

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=93&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=93&range=01-02

  Stats: 32 lines in 1 file changed: 0 ins; 22 del; 10 mod
  Patch: https://git.openjdk.java.net/jdk/pull/93.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/93/head:pull/93

PR: https://git.openjdk.java.net/jdk/pull/93

From gdub at openjdk.java.net Fri Sep 11 15:07:24 2020
From: gdub at openjdk.java.net (Gilles Duboscq)
Date: Fri, 11 Sep 2020 15:07:24 GMT
Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode [v3]
In-Reply-To:
References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com>
Message-ID:

On Thu, 10 Sep 2020 16:34:02 GMT, Mandy Chung wrote:

>> Gilles Duboscq has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Remove disableEagerInitialization concerns from BridgeMethod.java
>
> test/langtools/tools/javac/lambda/lambdaExpression/LambdaTest6.java line 33:
>
>> 31: * @compile LambdaTest6.java
>> 32: * @run main LambdaTest6
>> 33: */
>
> This line was added by JDK-8232806 (https://hg.openjdk.java.net/jdk/jdk/rev/27c2d2a4b695).
> I assume you want to move the test case for JDK-8232806 to test/jdk/java/lang/invoke? If so,
> BridgeMethod.java should be looked at too.

I have removed all the `disableEagerInitialization` tests from `BridgeMethod.java`. It is now restored to its pre-JDK-8232806 state.
-------------

PR: https://git.openjdk.java.net/jdk/pull/93

From mchung at openjdk.java.net Fri Sep 11 18:22:56 2020
From: mchung at openjdk.java.net (Mandy Chung)
Date: Fri, 11 Sep 2020 18:22:56 GMT
Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode [v3]
In-Reply-To:
References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com>
Message-ID: <5L4AuJbxZaEaS14dbG3o9stCf4ZfdvJX8Db2xnJplcs=.ebfe1a20-d39e-4119-9c34-e4c70a6a5a3e@github.com>

On Wed, 9 Sep 2020 16:41:22 GMT, Mandy Chung wrote:

>> Gilles Duboscq has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Remove disableEagerInitialization concerns from BridgeMethod.java
>
> Looks good. I agree with Jan's suggestion that it's good to move the test to test/jdk/java/lang/invoke/lambda which
> is a better home for it.

Thanks for updating BridgeMethod.java.

I expected that the new LambdaEagerInitTest.java would be updated to verify the capturing lambda case that does not have the static `LAMBDA_INSTANCE$` field. BTW, this new regression test should be moved to `test/jdk/java/lang/invoke/lambda/` instead of `test/jdk/java/lang/invoke`.

The test uses `assert`. Note that Java assertions are not enabled by default, so regression tests should do an explicit check and throw a runtime exception when the test fails. You can also use the JDK test library `jdk.test.lib.Asserts` or make this a TestNG test to use the TestNG Asserts API.

I updated and improved the test for your reference (sent to you separately).
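
To make the remark about `assert` concrete, here is a minimal sketch of the explicit-check style. The class `ExplicitCheckSketch` and its helper `check` are hypothetical names used only for illustration; they are not the updated test mentioned above and not the `jdk.test.lib.Asserts` API.

```java
class ExplicitCheckSketch {
    // Explicit check: runs regardless of whether the JVM was started with -ea.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new RuntimeException("Test failed: " + message);
        }
    }

    // Stand-in for the code under test.
    static int square(int x) {
        return x * x;
    }

    public static void main(String[] args) {
        // An `assert square(3) == 9;` statement here would be skipped unless
        // assertions were enabled (for example with -ea), so a broken
        // implementation could still pass.
        check(square(3) == 9, "square(3) should be 9");
        System.out.println("passed");
    }
}
```

The helper throws an unchecked exception, so a failing check terminates `main` with a non-zero exit status without any extra launcher flags.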
-------------

PR: https://git.openjdk.java.net/jdk/pull/93

From kvn at openjdk.java.net Fri Sep 11 22:46:25 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Fri, 11 Sep 2020 22:46:25 GMT
Subject: RFR: 8252543: [JVMCI] Libgraal can deadlock in blocking compilation mode
In-Reply-To: <5IFx6Bjetu0eYmCpkgbsRmMVj6i-Xf1tt4LMHvWx91w=.7f520889-ab7d-49de-aa58-0c0608627edb@github.com>
References: <5IFx6Bjetu0eYmCpkgbsRmMVj6i-Xf1tt4LMHvWx91w=.7f520889-ab7d-49de-aa58-0c0608627edb@github.com>
Message-ID:

On Sun, 6 Sep 2020 19:48:08 GMT, Doug Simon wrote:

> To prevent a deadlock in libgraal under `-Xcomp` or `-Xbatch` due to a lock being held in libgraal, a new mechanism is
> added by this change that allows JVMCI compiler threads to communicate their "progress" to HotSpot:
> * Each JVMCI compiler thread has a "compilation ticks" counter.
> * There is also a global JVMCI compilation ticks counter.
> * Each JVMCI VM call increments the JVMCI compiler thread-local compilation ticks counter.
> * Every 512 increments of such a counter also increments the global counter.
> * A thread waiting on a blocking JVMCI compilation will be unblocked if these counters indicate no progress after a
> defined period.

Looks good.

-------------

PR: https://git.openjdk.java.net/jdk/pull/35

From vladimir.kozlov at oracle.com Fri Sep 11 23:32:49 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 11 Sep 2020 16:32:49 -0700
Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9
In-Reply-To:
References: <56ad28d7-1733-7cd7-2fb4-a1a53af8a311@oracle.com> <658e9a0a-ffc0-3c66-6353-83282f3ea071@oracle.com>
Message-ID:

Thank you, Jatin

Vladimir K

On 9/9/20 10:37 PM, Bhateja, Jatin wrote:
> Hi VladimirK,
> Removing OrVNode::Ideal looks correct, it was added to cover vector API use case.
> > Regards, > Jatin > >> -----Original Message----- >> From: hotspot-compiler-dev On >> Behalf Of Vladimir Kozlov >> Sent: Wednesday, September 9, 2020 11:17 PM >> To: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, >> bool)+0x8b9 >> >> Thank you, Vladimir, for comments and review. >> >> Regards, >> Vladimir K >> >> On 9/9/20 7:57 AM, Vladimir Ivanov wrote: >>> >>>>> Similar strict check for constant shift is needed in OrVNode::Ideal >> routine in vectornode.cpp. >>>> >>>> It took me some time to analyze your code for "lazy de-generation" of >>>> Rotate vectors. As I understand you want to preserve scalar optimization >> which creates Rotate nodes but have to revert it to keep vectorization of >> java code. >>> >>> Yes, the main motivation was to simplify the implementation: keep >>> vectorization logic simple (just Rotate -> RotateV >>> transformation) and don't mess with matching if RotateV is not >>> supported (expand ideal nodes instead of adding special AD instructions). >>> >>>> Method degenerate_vector_rotate() has is_Con() check and, in general, >>>> it could be TOP because we do loop optimizations after vectorization. I >> added isa_int() check and treat 'cnt' in other case as variable to do >> transformation on 'else' >>>> branch and let sub-graph collapse there. >>>> I also refactor degenerate_vector_rotate() to make it compact. >>> >>> Good point. >>> >>>> Second, about OrVNode::Ideal(). I am not sure how safe it is without >>>> additional investigation because currently it is not executed. Based on >> comment it was added for VectorAPI which is experimental and not pushed >> yet. >>>> The code is convoluted and does not match scalar Or::Ideal() code. >>>> >>>> OrINode::Ideal() does next checks for left rotation: >>>> ?? if (Matcher::match_rule_supported(Op_RotateLeft) && >>>> ?????? 
lopcode == Op_LShiftI && ropcode == Op_URShiftI && >>>> in(1)->in(1) == in(2)->in(1)) { >>>> >>>> but OrVNode::Ideal() does: >>>> ?? if (Matcher::match_rule_supported_vector(Op_RotateLeftV, vec_len, >>>> bt) && >>>> ?????? ((ropcode == Op_LShiftVI && lopcode == Op_URShiftVI) || >>>> >>>> Why it checks RIGHT operator for LShiftV???? >>>> >>>> And asserts are contradicting: >>>> ??? assert(Op_RShiftCntV == in(1)->in(2)->Opcode(), "LShiftCntV >>>> operand expected"); >>>> >>>> Was this code tested? My immediate reaction is simple delete it now >>>> and add reworked and tested version back with EnableVectorSupport flag >> check after VectorAPI is integrated. >>> >>> Sounds good. >>> >>> My feedback on OrVNode::Ideal() during 8248830 review was: >>> >>> "> 6) Constant folding scenarios are covered in RotateLeft/RotateRight >>> idealization, inferencing of vector rotate through OrV idealization >> covers the vector patterns generated though non SLP route i.e. VectorAPI. >>> >>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the general >>> direction here - duplication of scalar transformations to lane-wise >>> vector operations. It definitely won't scale and in a longer run it >>> risks to diverge. Would be nice to find a way to automatically "lift" >> scalar transformations to vectors and apply them uniformly. But right now >> it is just an idea which requires more experimentation." >>> >>>> Reworked version may use the same new rotate_shift() I added. I start >>>> rewriting it but since I can't test it and I am not sure may be edges >> are swapped indeed. I am suggesting to remove it. >>>> >>>> Also VectorAPI should use Rotate vectors from start which we can >>>> de-generation if not supported. So I am not sure how >>>> OrVNode::Ideal() will be usefull for VEctorAPI too. >>> >>> Though the API exposes rotation as a dedicated operation, users are >>> free to code it explicitly with vector shifts and vector or. 
>>> Basically, the situation is similar to scalar case: there are >>> dedicated methods available (Long.rotateLeft/rotateRight), but users are >> free to code their own variant. And sometimes more general code shapes may >> degenerate into rotates (as a result of other optimizations). >>> >>> And from Vector API implementation perspective, it is attractive to >>> implement vector rotation purely in Java code as a composition of vector >> shifts/or operations rather than using JVM intrinsic for it. >>> >>> So, there's a number of use cases when transformations on vector nodes >> becomes profitable. >>> >>>> http://cr.openjdk.java.net/~kvn/8252188/webrev.01/ >>> >>> Looks good. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> >>>> About testing. I see you used a lot of -128, 128 and similar values >> which are larger then bits in Java Integer and Long. >>>> But Java do masking of shift count by default before executing shift. >>>> I would prefer if something like 31 (or 63 for Long) were used >>>> instead. Otherwise Rotate vectors are not generated and tested. >>>> >>>> compiler/intrinsics/TestRotate.java calls verify() after each >>>> operation as result it is really hard to see generated assembler. I >> think we should at least exclude inlinining of verify(). >>>> >>>> I will work on tests and have an other update. >>>> >>>> Thanks, >>>> Vladimir K >>>> >>>> >>>>> >>>>> Regards, >>>>> Jatin >>>>> >>>>>> -----Original Message----- >>>>>> From: hotspot-compiler-dev >>>>>> On Behalf Of Vladimir >>>>>> Kozlov >>>>>> Sent: Friday, September 4, 2020 3:14 AM >>>>>> To: hotspot compiler >>>>>> Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, >>>>>> bool)+0x8b9 >>>>>> >>>>>> https://cr.openjdk.java.net/~kvn/8252188/webrev.00/ >>>>>> https://bugs.openjdk.java.net/browse/JDK-8252188 >>>>>> >>>>>> Code added by 8248830 [1] uses Node::is_Con() check when looking >>>>>> for constant shift values. 
>>>>>> Unfortunately it does not guarantee that it will be Integer >>>>>> constant because TOP node is also ConNode. >>>>>> I used C2 types to check and get shift values. I also refactor code >>>>>> to consolidate checks. >>>>>> >>>>>> Tested: tier1, hs-tier2, hs-tier3. >>>>>> Verified fix with replay file from bug report. >>>>>> I also checked that RotateBenchmark.java added by 8248830 still >>>>>> creates Rotate vectors after this fix. >>>>>> >>>>>> I created subtask to add new regerssion test later because this fix >>>>>> is urgent and I did not have time to prepare it. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8248830 From vladimir.kozlov at oracle.com Fri Sep 11 23:33:47 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 11 Sep 2020 16:33:47 -0700 Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 In-Reply-To: References: <56ad28d7-1733-7cd7-2fb4-a1a53af8a311@oracle.com> <658e9a0a-ffc0-3c66-6353-83282f3ea071@oracle.com> Message-ID: Thank you, Tobias Vladimir K On 9/9/20 10:45 PM, Tobias Hartmann wrote: > +1 > > Best regards, > Tobias > > On 10.09.20 07:37, Bhateja, Jatin wrote: >> Hi VladimirK, >> Removing OrVNode::Ideal looks correct, it was added to cover vector API use case. >> >> Regards, >> Jatin >> >>> -----Original Message----- >>> From: hotspot-compiler-dev On >>> Behalf Of Vladimir Kozlov >>> Sent: Wednesday, September 9, 2020 11:17 PM >>> To: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, >>> bool)+0x8b9 >>> >>> Thank you, Vladimir, for comments and review. >>> >>> Regards, >>> Vladimir K >>> >>> On 9/9/20 7:57 AM, Vladimir Ivanov wrote: >>>> >>>>>> Similar strict check for constant shift is needed in OrVNode::Ideal >>> routine in vectornode.cpp. >>>>> >>>>> It took me some time to analyze your code for "lazy de-generation" of >>>>> Rotate vectors. 
As I understand you want to preserve scalar optimization >>> which creates Rotate nodes but have to revert it to keep vectorization of >>> java code. >>>> >>>> Yes, the main motivation was to simplify the implementation: keep >>>> vectorization logic simple (just Rotate -> RotateV >>>> transformation) and don't mess with matching if RotateV is not >>>> supported (expand ideal nodes instead of adding special AD instructions). >>>> >>>>> Method degenerate_vector_rotate() has is_Con() check and, in general, >>>>> it could be TOP because we do loop optimizations after vectorization. I >>> added isa_int() check and treat 'cnt' in other case as variable to do >>> transformation on 'else' >>>>> branch and let sub-graph collapse there. >>>>> I also refactor degenerate_vector_rotate() to make it compact. >>>> >>>> Good point. >>>> >>>>> Second, about OrVNode::Ideal(). I am not sure how safe it is without >>>>> additional investigation because currently it is not executed. Based on >>> comment it was added for VectorAPI which is experimental and not pushed >>> yet. >>>>> The code is convoluted and does not match scalar Or::Ideal() code. >>>>> >>>>> OrINode::Ideal() does next checks for left rotation: >>>>> ??? if (Matcher::match_rule_supported(Op_RotateLeft) && >>>>> ??????? lopcode == Op_LShiftI && ropcode == Op_URShiftI && >>>>> in(1)->in(1) == in(2)->in(1)) { >>>>> >>>>> but OrVNode::Ideal() does: >>>>> ??? if (Matcher::match_rule_supported_vector(Op_RotateLeftV, vec_len, >>>>> bt) && >>>>> ??????? ((ropcode == Op_LShiftVI && lopcode == Op_URShiftVI) || >>>>> >>>>> Why it checks RIGHT operator for LShiftV???? >>>>> >>>>> And asserts are contradicting: >>>>> ???? assert(Op_RShiftCntV == in(1)->in(2)->Opcode(), "LShiftCntV >>>>> operand expected"); >>>>> >>>>> Was this code tested? My immediate reaction is simple delete it now >>>>> and add reworked and tested version back with EnableVectorSupport flag >>> check after VectorAPI is integrated. >>>> >>>> Sounds good. 
>>>> >>>> My feedback on OrVNode::Ideal() during 8248830 review was: >>>> >>>> "> 6) Constant folding scenarios are covered in RotateLeft/RotateRight >>>> idealization, inferencing of vector rotate through OrV idealization >>> covers the vector patterns generated though non SLP route i.e. VectorAPI. >>>> >>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the general >>>> direction here - duplication of scalar transformations to lane-wise >>>> vector operations. It definitely won't scale and in a longer run it >>>> risks to diverge. Would be nice to find a way to automatically "lift" >>> scalar transformations to vectors and apply them uniformly. But right now >>> it is just an idea which requires more experimentation." >>>> >>>>> Reworked version may use the same new rotate_shift() I added. I start >>>>> rewriting it but since I can't test it and I am not sure may be edges >>> are swapped indeed. I am suggesting to remove it. >>>>> >>>>> Also VectorAPI should use Rotate vectors from start which we can >>>>> de-generation if not supported. So I am not sure how >>>>> OrVNode::Ideal() will be usefull for VEctorAPI too. >>>> >>>> Though the API exposes rotation as a dedicated operation, users are >>>> free to code it explicitly with vector shifts and vector or. >>>> Basically, the situation is similar to scalar case: there are >>>> dedicated methods available (Long.rotateLeft/rotateRight), but users are >>> free to code their own variant. And sometimes more general code shapes may >>> degenerate into rotates (as a result of other optimizations). >>>> >>>> And from Vector API implementation perspective, it is attractive to >>>> implement vector rotation purely in Java code as a composition of vector >>> shifts/or operations rather than using JVM intrinsic for it. >>>> >>>> So, there's a number of use cases when transformations on vector nodes >>> becomes profitable. >>>> >>>>> http://cr.openjdk.java.net/~kvn/8252188/webrev.01/ >>>> >>>> Looks good. 
>>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> >>>>> About testing. I see you used a lot of -128, 128 and similar values >>> which are larger then bits in Java Integer and Long. >>>>> But Java do masking of shift count by default before executing shift. >>>>> I would prefer if something like 31 (or 63 for Long) were used >>>>> instead. Otherwise Rotate vectors are not generated and tested. >>>>> >>>>> compiler/intrinsics/TestRotate.java calls verify() after each >>>>> operation as result it is really hard to see generated assembler. I >>> think we should at least exclude inlinining of verify(). >>>>> >>>>> I will work on tests and have an other update. >>>>> >>>>> Thanks, >>>>> Vladimir K >>>>> >>>>> >>>>>> >>>>>> Regards, >>>>>> Jatin >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: hotspot-compiler-dev >>>>>>> On Behalf Of Vladimir >>>>>>> Kozlov >>>>>>> Sent: Friday, September 4, 2020 3:14 AM >>>>>>> To: hotspot compiler >>>>>>> Subject: [16] RFR(M) 8252188: Crash in OrINode::Ideal(PhaseGVN*, >>>>>>> bool)+0x8b9 >>>>>>> >>>>>>> https://cr.openjdk.java.net/~kvn/8252188/webrev.00/ >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8252188 >>>>>>> >>>>>>> Code added by 8248830 [1] uses Node::is_Con() check when looking >>>>>>> for constant shift values. >>>>>>> Unfortunately it does not guarantee that it will be Integer >>>>>>> constant because TOP node is also ConNode. >>>>>>> I used C2 types to check and get shift values. I also refactor code >>>>>>> to consolidate checks. >>>>>>> >>>>>>> Tested: tier1, hs-tier2, hs-tier3. >>>>>>> Verified fix with replay file from bug report. >>>>>>> I also checked that RotateBenchmark.java added by 8248830 still >>>>>>> creates Rotate vectors after this fix. >>>>>>> >>>>>>> I created subtask to add new regerssion test later because this fix >>>>>>> is urgent and I did not have time to prepare it. 
>>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8248830 From kvn at openjdk.java.net Sat Sep 12 00:27:10 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 12 Sep 2020 00:27:10 GMT Subject: RFR: 8252543: [JVMCI] Libgraal can deadlock in blocking compilation mode [v3] In-Reply-To: References: <5IFx6Bjetu0eYmCpkgbsRmMVj6i-Xf1tt4LMHvWx91w=.7f520889-ab7d-49de-aa58-0c0608627edb@github.com> Message-ID: <0LAN21rJH37Umce6KSX9OAoHtp6fArEGQiHM15AMmos=.e4578c8e-c3a6-40d6-b4c5-e29e8f246a12@github.com> On Fri, 11 Sep 2020 08:08:22 GMT, Doug Simon wrote: >> To prevent a deadlock in libgraal under `-Xcomp` or `-Xbatch` due to a lock being held in libgraal, a new mechanism is >> added by this change that allow JVMCI compiler threads to communicate their "progress" to HotSpot: >> * Each JVMCI compiler thread has a "compilation ticks" counter. >> * There is also a global JVMCI compilation ticks counter. >> * Each JVMCI VM call increments the JVMCI compiler thread-local compilation ticks counter. >> * Every 512 increments of such a counter also increments the global counter. >> * A thread waiting on a blocking JVMCI compilation will be unblocked if these counters indicate no progress after a >> defined period. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental > views will show differences compared to the previous content of the PR. The pull request contains one new commit since > the last revision: > 8252543: add compilation ticks for mitigating against deadlock due to blocking JVMCI compilation Marked as reviewed by kvn (Reviewer). 
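
The "compilation ticks" mechanism described in the quoted text can be sketched roughly as follows. This is a simplified model in Java, not the HotSpot implementation: the class names, the `ProgressWatcher` helper, and the rule that any change in either counter counts as progress are assumptions made for illustration. Only the period of 512 is taken from the description.

```java
import java.util.concurrent.atomic.AtomicLong;

class CompilationTicksSketch {
    // From the description: every 512 local increments bump the global counter.
    static final int GLOBAL_TICK_PERIOD = 512;
    static final AtomicLong GLOBAL_TICKS = new AtomicLong();

    // One instance per (hypothetical) compiler thread.
    static final class CompilerThreadTicks {
        private final AtomicLong localTicks = new AtomicLong();

        // Called on every VM call made by the compiler thread.
        void onVmCall() {
            long ticks = localTicks.incrementAndGet();
            if (ticks % GLOBAL_TICK_PERIOD == 0) {
                GLOBAL_TICKS.incrementAndGet();
            }
        }

        long get() {
            return localTicks.get();
        }
    }

    // Used by a thread blocked on a compilation: it samples the counters
    // periodically and gives up waiting once a sample shows no change.
    // Assumption: a change in either counter counts as progress.
    static final class ProgressWatcher {
        private long lastLocal;
        private long lastGlobal;

        ProgressWatcher(CompilerThreadTicks thread) {
            lastLocal = thread.get();
            lastGlobal = GLOBAL_TICKS.get();
        }

        boolean madeProgressSinceLastCheck(CompilerThreadTicks thread) {
            long local = thread.get();
            long global = GLOBAL_TICKS.get();
            boolean progressed = local != lastLocal || global != lastGlobal;
            lastLocal = local;
            lastGlobal = global;
            return progressed;
        }
    }

    public static void main(String[] args) {
        CompilerThreadTicks thread = new CompilerThreadTicks();
        ProgressWatcher watcher = new ProgressWatcher(thread);
        for (int i = 0; i < 1024; i++) {
            thread.onVmCall();
        }
        System.out.println("progress after VM calls: " + watcher.madeProgressSinceLastCheck(thread));
        System.out.println("progress while idle:     " + watcher.madeProgressSinceLastCheck(thread));
    }
}
```

Batching the global update keeps the common path cheap: most VM calls touch only the thread's own counter, and the shared counter is written once per 512 calls.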
-------------

PR: https://git.openjdk.java.net/jdk/pull/35

From kvn at openjdk.java.net Sat Sep 12 01:09:40 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Sat, 12 Sep 2020 01:09:40 GMT
Subject: RFR: 8252898: remove bulk registration of JFR CompilerPhaseType names [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 8 Sep 2020 21:05:21 GMT, Doug Simon wrote:

>> The changes made in [JDK-8193210](https://bugs.openjdk.java.net/browse/JDK-8193210) support only [bulk
>> registration](https://github.com/openjdk/jdk/blob/4e6a4af1866d0007d368b78bf78b6a8e1c8be425/src/hotspot/share/compiler/compilerEvent.hpp#L75)
>> of compiler phase names with JFR. However, Graal only registers compiler phase names upon first execution of the phase
>> since the set of phases is not known during VM initialization. This means registration of a Graal phase name needs to
>> do unnecessary work, wrapping a single name into an array to conform to the bulk registration API. This pull request
>> updates the registration API to be in terms of registering a single phase name.
>
> Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental
> views will show differences compared to the previous content of the PR.

Looks good.
I ran jfr/event/compiler tests with changes and they passed except TestCompilerInlining.java which failed before changes.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/77

From kvn at openjdk.java.net Sat Sep 12 01:09:40 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Sat, 12 Sep 2020 01:09:40 GMT
Subject: RFR: 8252898: remove bulk registration of JFR CompilerPhaseType names [v2]
In-Reply-To:
References:
Message-ID:

On Sat, 12 Sep 2020 01:05:41 GMT, Vladimir Kozlov wrote:

>> Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental
>> views will show differences compared to the previous content of the PR.
>
> Looks good.
> I ran jfr/event/compiler tests with changes and they passed except TestCompilerInlining.java which failed before
> changes.

I still want @jamsheedcm to review it too.

-------------

PR: https://git.openjdk.java.net/jdk/pull/77

From dnsimon at openjdk.java.net Sat Sep 12 05:29:16 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Sat, 12 Sep 2020 05:29:16 GMT
Subject: Integrated: 8252543: [JVMCI] Libgraal can deadlock in blocking compilation mode
In-Reply-To: <5IFx6Bjetu0eYmCpkgbsRmMVj6i-Xf1tt4LMHvWx91w=.7f520889-ab7d-49de-aa58-0c0608627edb@github.com>
References: <5IFx6Bjetu0eYmCpkgbsRmMVj6i-Xf1tt4LMHvWx91w=.7f520889-ab7d-49de-aa58-0c0608627edb@github.com>
Message-ID: <4PhN_O0Qm8BjhF-CQ-exlIwJJoaqOXj8Ne4dgcnFJn8=.a8e5a29a-9d38-48f0-8d2e-77bcbceacbea@github.com>

On Sun, 6 Sep 2020 19:48:08 GMT, Doug Simon wrote:

> To prevent a deadlock in libgraal under `-Xcomp` or `-Xbatch` due to a lock being held in libgraal, a new mechanism is
> added by this change that allows JVMCI compiler threads to communicate their "progress" to HotSpot:
> * Each JVMCI compiler thread has a "compilation ticks" counter.
> * There is also a global JVMCI compilation ticks counter.
> * Each JVMCI VM call increments the JVMCI compiler thread-local compilation ticks counter.
> * Every 512 increments of such a counter also increments the global counter.
> * A thread waiting on a blocking JVMCI compilation will be unblocked if these counters indicate no progress after a
> defined period.

This pull request has now been integrated.

Changeset: 998ce78e
Author:    Doug Simon
URL:       https://git.openjdk.java.net/jdk/commit/998ce78e
Stats:     126 lines in 11 files changed: 14 ins; 72 del; 40 mod

8252543: [JVMCI] Libgraal can deadlock in blocking compilation mode

Reviewed-by: kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/35

From jbhateja at openjdk.java.net Sun Sep 13 19:12:03 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Sun, 13 Sep 2020 19:12:03 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions
Message-ID:

Summary:

1) Partial in-lining technique avoids call overhead penalty for sub-word type small array copy operations with size less than 32 bytes.
2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
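
The length-based dispatch in point 2 can be modelled in plain Java as below. This is only a conceptual sketch: the real change emits the check and the masked vector instructions in JIT-compiled code, whereas here the "inline" path is an ordinary loop standing in for a single masked load and store. The class name, the method names, and the inclusive comparison against the 32-byte limit are assumptions.

```java
class PartialInlineCopySketch {
    // Models the default of ArrayCopyPartialInlineSize (32 bytes).
    static final int PARTIAL_INLINE_SIZE_BYTES = 32;

    // Decide by copy size in bytes, not by element count.
    // Assumption: the limit itself still takes the inline path.
    static boolean takesInlinePath(int length, int elementSizeBytes) {
        return (long) length * elementSizeBytes <= PARTIAL_INLINE_SIZE_BYTES;
    }

    static void copyBytes(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        if (length >= 0 && takesInlinePath(length, Byte.BYTES)) {
            // Stand-in for one masked vector load followed by one masked store:
            // read all lanes first, then write them.
            byte[] lanes = new byte[length];
            for (int i = 0; i < length; i++) {
                lanes[i] = src[srcPos + i];
            }
            for (int i = 0; i < length; i++) {
                dst[dstPos + i] = lanes[i];
            }
        } else {
            // Stand-in for the call to the array-copy stub.
            System.arraycopy(src, srcPos, dst, dstPos, length);
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[100];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        byte[] dst = new byte[100];
        copyBytes(src, 0, dst, 0, 20);   // 20 bytes: inline path
        copyBytes(src, 20, dst, 20, 80); // 80 bytes: stub path
        System.out.println(java.util.Arrays.equals(src, dst));
    }
}
```

Under these assumptions, 20 `byte` elements (20 bytes) take the inline path while 20 `char` elements (40 bytes) do not.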
Performance Results:
System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
ArrayCopyPartialInlineSize : 32

JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
-- | -- | -- | -- | --
ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550835
ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975

Detailed Reports:
Baseline : http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt
WithOpt : http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt

-------------

Commit messages:
 - 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions.
Changes: https://git.openjdk.java.net/jdk/pull/144/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=144&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252848 Stats: 561 lines in 27 files changed: 545 ins; 1 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/144.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/144/head:pull/144 PR: https://git.openjdk.java.net/jdk/pull/144 From dholmes at openjdk.java.net Mon Sep 14 04:04:52 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 14 Sep 2020 04:04:52 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions In-Reply-To: References: Message-ID: On Sun, 13 Sep 2020 19:02:59 GMT, Jatin Bhateja wrote: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for sub-word type small array copy operations with size > less than 32 bytes. 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes > an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. 3) New runtime flag > ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. 
> [...]

Adding a new product flag requires a CSR request to be filed.
-------------

PR: https://git.openjdk.java.net/jdk/pull/144

From jbhateja at openjdk.java.net Mon Sep 14 05:04:24 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Mon, 14 Sep 2020 05:04:24 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions
In-Reply-To:
References:
Message-ID:

On Mon, 14 Sep 2020 04:02:10 GMT, David Holmes wrote:

>> Summary:
>>
>> 1) Partial in-lining technique avoids call overhead penalty for sub-word type small array copy operations with size
>> less than 32 bytes. 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes
>> an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. 3) New runtime flag
>> ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
>> [...]
>
> Adding a new product flag requires a CSR request to be filed.

> /csr needed
>
> Adding a new product flag requires a CSR request to be filed.

@dholmes-ora , with https://github.com/openjdk/jdk/commit/5144190e there has been a clean up of options and product options now accept DIAGNOSTIC as an additional parameter. Newly added flag is a DIAGNOSTIC flag.

-------------

PR: https://git.openjdk.java.net/jdk/pull/144

From jcm at openjdk.java.net Mon Sep 14 06:49:09 2020
From: jcm at openjdk.java.net (Jamsheed Mohammed C M)
Date: Mon, 14 Sep 2020 06:49:09 GMT
Subject: RFR: 8252898: remove bulk registration of JFR CompilerPhaseType names [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 8 Sep 2020 21:05:21 GMT, Doug Simon wrote:

>> The changes made in [JDK-8193210](https://bugs.openjdk.java.net/browse/JDK-8193210) support only [bulk
>> registration](https://github.com/openjdk/jdk/blob/4e6a4af1866d0007d368b78bf78b6a8e1c8be425/src/hotspot/share/compiler/compilerEvent.hpp#L75)
>> of compiler phase names with JFR.
However, Graal only registers compiler phase names upon first execution of the phase >> since the set of phases is not known during VM initialization. This means registration of a Graal phase name needs to >> do unnecessary work, wrapping a single name into an array to conform to the bulk registration API. This pull request >> updates the registration API to be in terms of a registering a single phase name. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental > views will show differences compared to the previous content of the PR. The pull request contains one new commit since > the last revision: > 8252898: remove bulk registration of JFR CompilerPhaseType names Marked as reviewed by jcm (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/77 From dnsimon at openjdk.java.net Mon Sep 14 07:51:39 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Mon, 14 Sep 2020 07:51:39 GMT Subject: Integrated: 8252898: remove bulk registration of JFR CompilerPhaseType names In-Reply-To: References: Message-ID: On Tue, 8 Sep 2020 15:07:04 GMT, Doug Simon wrote: > The changes made in [JDK-8193210](https://bugs.openjdk.java.net/browse/JDK-8193210) support only [bulk > registration](https://github.com/openjdk/jdk/blob/4e6a4af1866d0007d368b78bf78b6a8e1c8be425/src/hotspot/share/compiler/compilerEvent.hpp#L75) > of compiler phase names with JFR. However, Graal only registers compiler phase names upon first execution of the phase > since the set of phases is not known during VM initialization. This means registration of a Graal phase name needs to > do unnecessary work, wrapping a single name into an array to conform to the bulk registration API. This pull request > updates the registration API to be in terms of a registering a single phase name. This pull request has now been integrated. 
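The difference between bulk and single-name registration can be illustrated with a small Java model. This is a sketch only and does not reproduce the HotSpot/JFR API: `PhaseNameRegistrySketch`, `register` and `registerAll` are hypothetical names. It shows why a compiler that discovers phases lazily is better served by a per-name entry point, with bulk registration layered on top where all names are known up front.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PhaseNameRegistrySketch {
    private final Map<String, Integer> idsByName = new HashMap<>();
    private final List<String> namesById = new ArrayList<>();

    // Single-name registration: callable on first execution of a phase.
    // Registering the same name again returns the id assigned earlier.
    synchronized int register(String phaseName) {
        Integer existing = idsByName.get(phaseName);
        if (existing != null) {
            return existing;
        }
        int id = namesById.size();
        namesById.add(phaseName);
        idsByName.put(phaseName, id);
        return id;
    }

    // Bulk registration expressed in terms of the single-name call, for
    // callers that know their full phase list during initialization.
    synchronized int[] registerAll(List<String> phaseNames) {
        int[] ids = new int[phaseNames.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = register(phaseNames.get(i));
        }
        return ids;
    }
}
```

With only a bulk entry point, a lazy caller would have to wrap each name in a one-element list before registering it, which is the unnecessary work the description mentions.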
Changeset: b05290aa Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/b05290aa Stats: 114 lines in 6 files changed: 43 ins; 31 del; 40 mod 8252898: remove bulk registration of JFR CompilerPhaseType names Reviewed-by: kvn, jcm ------------- PR: https://git.openjdk.java.net/jdk/pull/77 From aph at redhat.com Mon Sep 14 08:40:14 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 14 Sep 2020 09:40:14 +0100 Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions In-Reply-To: References: Message-ID: On 13/09/2020 20:12, Jatin Bhateja wrote: > 1) Partial in-lining technique avoids call overhead penalty for > sub-word type small array copy operations with size less than 32 > bytes. 2) At runtime, a conditional check based on copy length > either calls an array-copy stub or executes an optimized instruction > sequence using AVX-512 masked instructions emitted at the call site. This may not be a good idea. See my reply at https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-September/043114.html https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-September/043155.html -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From github.com+1981974+kuaiwei at openjdk.java.net Mon Sep 14 10:52:44 2020 From: github.com+1981974+kuaiwei at openjdk.java.net (kuaiwei) Date: Mon, 14 Sep 2020 10:52:44 GMT Subject: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64 Message-ID: Now itable_stub will go through instanceKlass's itable twice to look up a method entry. resolved klass is used for type checking and method holder klass is used to find method entry. In many cases , we observed resolved klass is as same as holder klass. So we can improve itable stub based on it. If they are same klass, stub uses a fast loop to check only one klass. If not, a slow loop is used to checking both klasses. 
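The two-pass versus fast/slow-loop lookup described above can be modelled in Java. This is a simplified sketch under stated assumptions, not the generated stub: the itable is modelled as an array of (interface, offset) entries, `ItableLookupSketch`, `Entry`, `lookupTwoPass` and `lookupFused` are hypothetical names, and the model says nothing about whether the real assembly variant is correct.

```java
final class ItableLookupSketch {
    static final class Entry {
        final Class<?> iface;   // interface klass recorded in the itable
        final int offset;       // where that interface's method table starts
        Entry(Class<?> iface, int offset) {
            this.iface = iface;
            this.offset = offset;
        }
    }

    // Existing scheme: one scan for the resolved klass (type check),
    // then a second scan for the holder klass (method table offset).
    static int lookupTwoPass(Entry[] itable, Class<?> resolved, Class<?> holder) {
        boolean typeOk = false;
        for (Entry e : itable) {
            if (e.iface == resolved) { typeOk = true; break; }
        }
        if (!typeOk) throw new IncompatibleClassChangeError();
        for (Entry e : itable) {
            if (e.iface == holder) return e.offset;
        }
        throw new IncompatibleClassChangeError();
    }

    // Proposed scheme: a fast loop when both klasses are identical,
    // otherwise a slow loop that checks both klasses in a single scan.
    static int lookupFused(Entry[] itable, Class<?> resolved, Class<?> holder) {
        if (resolved == holder) {
            for (Entry e : itable) {
                if (e.iface == holder) return e.offset;
            }
            throw new IncompatibleClassChangeError();
        }
        boolean sawResolved = false;
        int offset = -1;
        for (Entry e : itable) {
            if (e.iface == resolved) sawResolved = true;
            if (e.iface == holder) offset = e.offset;
            if (sawResolved && offset >= 0) return offset;
        }
        throw new IncompatibleClassChangeError();
    }
}
```

In this model both variants return the same offset or fail in the same cases; the fused variant simply touches each itable entry at most once.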
Even entering in slow loop, new implementation can be better than old one in some cases. Because new stub just need go through itable once and reduce memory operations. bug: https://bugs.openjdk.java.net/browse/JDK-8253049 ------------- Commit messages: - 8253049: Enhance itable_stub for AArch64 and x86_64 Changes: https://git.openjdk.java.net/jdk/pull/128/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=128&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253049 Stats: 220 lines in 7 files changed: 172 ins; 35 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/128.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/128/head:pull/128 PR: https://git.openjdk.java.net/jdk/pull/128 From vladimir.x.ivanov at oracle.com Mon Sep 14 12:25:16 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 14 Sep 2020 15:25:16 +0300 Subject: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64 In-Reply-To: References: Message-ID: <7640e033-f590-0b63-79e3-59dd5b96f55f@oracle.com> Hi Kevin, Very interesting observations. I like the idea to optimize for the case when REFC == DECC. Fusing 2 passes over the itable into one does look attractive, but I'm not sure the proposed variant is correct. I suggest to split the patch into 2 enhancements and handle them separately. I'm curious what kind of benchmarks you used and what are the improvements observed with the patch. One suggestion about the implementation: src/hotspot/cpu/x86/macroAssembler_x86.cpp: +void MacroAssembler::lookup_interface_method_in_stub(Register recv_klass, I'd like to avoid having 2 independent implementations of itable lookup (MacroAssembler::lookup_interface_method_in_stub() and MacroAssembler::lookup_interface_method()). It would be nice to keep the implementation unified between itable and MethodHandle linkToInterface linker stubs. 
What MacroAssembler::lookup_interface_method(..., true /*return_method*/) does is interface method lookup w/o proper subtype check and it is equivalent to fast loop in MacroAssembler::lookup_interface_method_in_stub(). As a possible path forward, you could introduce the fast path check first by moving the fast path check into VtableStubs::create_itable_stub() and guard the first path over the itable. It would make the type checking pass over itable optional based on runtime check. Then you could refactor MacroAssembler::lookup_interface_method() to optionally do REFC and DECC checks on every iteration and migrate VtableStubs::create_itable_stub() and MethodHandles::generate_method_handle_dispatch() to it. Best regards, Vladimir Ivanov On 14.09.2020 13:52, kuaiwei wrote: > Now itable_stub will go through instanceKlass's itable twice to look up a method entry. resolved klass is used for type > checking and method holder klass is used to find method entry. In many cases , we observed resolved klass is as same as > holder klass. So we can improve itable stub based on it. If they are same klass, stub uses a fast loop to check only > one klass. If not, a slow loop is used to checking both klasses. > > Even entering in slow loop, new implementation can be better than old one in some cases. Because new stub just need go > through itable once and reduce memory operations. 
> > > bug: https://bugs.openjdk.java.net/browse/JDK-8253049 > > ------------- > > Commit messages: > - 8253049: Enhance itable_stub for AArch64 and x86_64 > > Changes: https://git.openjdk.java.net/jdk/pull/128/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=128&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8253049 > Stats: 220 lines in 7 files changed: 172 ins; 35 del; 13 mod > Patch: https://git.openjdk.java.net/jdk/pull/128.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/128/head:pull/128 > > PR: https://git.openjdk.java.net/jdk/pull/128 > From jbhateja at openjdk.java.net Mon Sep 14 13:18:39 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 14 Sep 2020 13:18:39 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions In-Reply-To: References: Message-ID: On Mon, 14 Sep 2020 05:01:24 GMT, Jatin Bhateja wrote: >> Adding a new product flag requires a CSR request to be filed. > >> /csr needed >> >> Adding a new product flag requires a CSR request to be filed. > > @dholmes-ora , with https://github.com/openjdk/jdk/commit/5144190e there has been a clean up of options and product > options now accept DIAGNOSTIC as an additional parameter. Newly added flag is a DIAGNOSTIC flag. > Mailing list message from Andrew Haley on hotspot-dev: > On 13/09/2020 20:12, Jatin Bhateja wrote: > > 1) Partial in-lining technique avoids call overhead penalty for > sub-word type small array copy operations with size less than 32 > bytes. 2) At runtime, a conditional check based on copy length > either calls an array-copy stub or executes an optimized instruction > sequence using AVX-512 masked instructions emitted at the call site. > > This may not be a good idea. 
> See my reply at
> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-September/043114.html
> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-September/043155.html

Frequency level switchover is sensitive to vector size; this has been taken care of by using 32-byte masked vector operations in the default mode.

The default value of ArrayCopyPartialInlineSize is 32, i.e. copy sizes between 1 and 32 bytes are partially inlined at the call site using masked vector moves operating over YMM registers. Only if the user sets it to 64 do we use ZMM registers, which forces a switch to a lower frequency level (LVL1).

So an AVX512 lite instruction working over a 32-byte vector (YMM) will operate at the maximum frequency level (LVL0).

> --
> Andrew Haley (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

-------------

PR: https://git.openjdk.java.net/jdk/pull/144

From shade at openjdk.java.net Mon Sep 14 13:22:16 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 14 Sep 2020 13:22:16 GMT
Subject: RFR: 8253002: Remove the unused SafePointNode::_oop_map field
In-Reply-To:
References:
Message-ID:

On Thu, 10 Sep 2020 10:17:42 GMT, Jorn Vernee wrote:

> Hi,
>
> I've been looking a lot at the code for generating oop maps for call nodes lately, and noticed that SafePointNode had
> an oopMap field that was unused (which led to some confusion as to where the oop map was actually set).
> The oop map is instead generated and set in buildOopMap OopFlow::compute_reach after matching:
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/buildOopMap.cpp#L122 So the field on the ideal node
> is unused. This patch removes the field and cleans up related code.
I've left a comment in SafePointNode to point > people looking for the oop map at buildOopMap.cpp > Thanks, > Jorn > > Testing: local build + tier1,tier2,tier3 There is also a forward declaration of `class OopMap;` in `callnode.hpp`, do we still need it? src/hotspot/share/opto/callnode.hpp line 339: > 337: > 338: // There is no OopMap field, the oop map is set after matching in > 339: // OopFlow::compute_reach on the MachSafePointNode. (See buildOopMap.cpp) I don't think we need this comment. I think it would bitrot eventually. It seems the code to construct `OopMap` is easily discoverable already. ------------- Changes requested by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/109 From aph at redhat.com Mon Sep 14 16:43:53 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 14 Sep 2020 17:43:53 +0100 Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions In-Reply-To: References: Message-ID: <5fa811bb-59f2-1b33-0b87-1eb82e69c136@redhat.com> On 14/09/2020 14:18, Jatin Bhateja wrote: > Frequency level switchover is sensitive to vector size, this has > been taken care of by using a 32 byte vector masked operations in > default mode. > > Default value of ArrayCopyPartialInlineSize is 32 i.e. copy sizes > b/w 1-32 are partially in lined at the call site using masked vector > moves operating over YMM registers. Only if user sets it to 64 we > use ZMMs registers which forces a frequency level switch over to a > lower frequency level (LVL1). > > So an AVX512 lite instruction working over a 32 byte vector (YMM) > will operate a maximum frequency level (LVL0). OK, as long as you're keeping watch on this issue. We really do not want all Java workloads to be running at lower frequency or higher power just because of some intrinsics. Sure, if we're doing high-power vector calculations that's fine. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From Divino.Cesar at microsoft.com Mon Sep 14 18:51:42 2020
From: Divino.Cesar at microsoft.com (Cesar Soares Lucas)
Date: Mon, 14 Sep 2020 18:51:42 +0000
Subject: [16] RFR(S): 8250668: Clean up method_oop names in adlc
In-Reply-To: <78ea86f9-a4e1-5a4b-b102-d06bcdb3aaf8@oracle.com>
References: , <78ea86f9-a4e1-5a4b-b102-d06bcdb3aaf8@oracle.com>
Message-ID:

Hi,

thank you all for the reviews & testing. I was away on vacation but now I can get back on this.

Lindenmaier, can you please share the results of your tests?

AFAIU, now that Openjdk moved to GitHub I should proceed to opening a Pull Request there instead of updating the Webrev, right?

________________________________
From: Tobias Hartmann
Sent: September 3, 2020 6:29 AM
To: Cesar Soares Lucas ; hotspot-compiler-dev at openjdk.java.net
Cc: Brian Stafford ; Aditya Mandaleeka ; Christian Hagedorn
Subject: Re: [16] RFR(S): 8250668: Clean up method_oop names in adlc

Hi Cesar,

looks good to me.

constantPool.hpp:498 "Method" -> "Methods"

Best regards,
Tobias

On 27.08.20 21:36, Cesar Soares Lucas wrote:
> Hi there,
>
> RFE: https://bugs.openjdk.java.net/browse/JDK-8250668
> Webrev: https://cr.openjdk.java.net/~adityam/cesar/8250668/0/
> Need sponsor: Yes
> Tested on: Windows/Linux/MacOS tiers 1-3
>
> can I please get some reviews for the Webrev linked above?
The work
> consists of renaming "method_oop" occurrences all around the code
> base to just "method". I've tested this on x86_64 only. Can someone
> please help testing on other architectures as well: x86_32, PPC,
> ARM32/64, S390?
>
>
> Thank you,
> Cesar
>

From dholmes at openjdk.java.net Tue Sep 15 02:25:13 2020
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 15 Sep 2020 02:25:13 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions
In-Reply-To:
References:
Message-ID:

On Mon, 14 Sep 2020 13:16:02 GMT, Jatin Bhateja wrote:

>>> /csr needed
>>>
>>> Adding a new product flag requires a CSR request to be filed.
>>
>> @dholmes-ora , with https://github.com/openjdk/jdk/commit/5144190e there has been a clean up of options and product
>> options now accept DIAGNOSTIC as an additional parameter. Newly added flag is a DIAGNOSTIC flag.
>
>> Mailing list message from Andrew Haley on hotspot-dev:
>> On 13/09/2020 20:12, Jatin Bhateja wrote:
>>
>> 1) Partial in-lining technique avoids call overhead penalty for
>> sub-word type small array copy operations with size less than 32
>> bytes. 2) At runtime, a conditional check based on copy length
>> either calls an array-copy stub or executes an optimized instruction
>> sequence using AVX-512 masked instructions emitted at the call site.
>>
>> This may not be a good idea. See my reply at
>> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-September/043114.html
>> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-September/043155.html
>
> Frequency level switchover is sensitive to vector size, this has been taken care of by using a 32 byte vector masked
> operations in default mode.
> Default value of ArrayCopyPartialInlineSize is 32 i.e. copy sizes b/w 1-32 are partially inlined at the call site
> using masked vector moves operating over YMM registers.
Only if user sets it to 64 we use ZMMs registers which forces > a frequency level switch over to a lower frequency level (LVL1). > So an AVX512 lite instruction working over a 32 byte vector (YMM) will operate a maximum frequency level (LVL0). > >> -- >> Andrew Haley (he/him) >> Java Platform Lead Engineer >> Red Hat UK Ltd. >> https://keybase.io/andrewhaley >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > > > /csr needed > > Adding a new product flag requires a CSR request to be filed. > > @dholmes-ora , with [5144190](https://github.com/openjdk/jdk/commit/5144190e) there has been a clean up of options and > product options now accept DIAGNOSTIC as an additional parameter. Newly added flag is a DIAGNOSTIC flag. Apologies for that. Yes I got caught out by the new format. ------------- PR: https://git.openjdk.java.net/jdk/pull/144 From github.com+1981974+kuaiwei at openjdk.java.net Tue Sep 15 07:14:32 2020 From: github.com+1981974+kuaiwei at openjdk.java.net (kuaiwei) Date: Tue, 15 Sep 2020 07:14:32 GMT Subject: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64 [v2] In-Reply-To: References: Message-ID: > Now itable_stub will go through instanceKlass's itable twice to look up a method entry. resolved klass is used for type > checking and method holder klass is used to find method entry. In many cases , we observed resolved klass is as same as > holder klass. So we can improve itable stub based on it. If they are same klass, stub uses a fast loop to check only > one klass. If not, a slow loop is used to checking both klasses. Even entering in slow loop, new implementation can be > better than old one in some cases. Because new stub just need go through itable once and reduce memory operations. > bug: https://bugs.openjdk.java.net/browse/JDK-8253049 kuaiwei has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: 8253049: Enhance itable_stub for AArch64 and x86_64 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/128/files - new: https://git.openjdk.java.net/jdk/pull/128/files/b2f12ccc..6d79d5ed Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=128&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=128&range=00-01 Stats: 126 lines in 1 file changed: 126 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/128.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/128/head:pull/128 PR: https://git.openjdk.java.net/jdk/pull/128 From shade at openjdk.java.net Tue Sep 15 07:21:28 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 15 Sep 2020 07:21:28 GMT Subject: RFR: 8253146: C2: Purge unused MachCallNode::_arg_size field Message-ID: While doing changes in related code, noticed that `MachCallNode::_arg_size` is computed needlessly, taking memory without a good reason. Let's purge it. 
Testing: text search for _argsize; Linux x86_64 {release,fastdebug,slowdebug} builds; Linux x86_64 tier1; Linux ppc64 fastdebug build ------------- Commit messages: - 8253146: C2: Purge unused MachCallNode::_arg_size field Changes: https://git.openjdk.java.net/jdk/pull/167/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=167&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253146 Stats: 9 lines in 3 files changed: 0 ins; 9 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/167.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/167/head:pull/167 PR: https://git.openjdk.java.net/jdk/pull/167 From shade at openjdk.java.net Tue Sep 15 07:21:29 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 15 Sep 2020 07:21:29 GMT Subject: RFR: 8253146: C2: Purge unused MachCallNode::_arg_size field In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 07:14:58 GMT, Aleksey Shipilev wrote: > While doing changes in related code, noticed that `MachCallNode::_arg_size` is computed needlessly, taking memory > without a good reason. Let's purge it. > Testing: text search for _argsize; Linux x86_64 {release,fastdebug,slowdebug} builds; Linux x86_64 tier1; Linux ppc64 > fastdebug build This touches `ppc.ad`, attention @GoeLin, @TheRealMDoerr. ------------- PR: https://git.openjdk.java.net/jdk/pull/167 From tobias.hartmann at oracle.com Tue Sep 15 07:24:37 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 15 Sep 2020 09:24:37 +0200 Subject: [16] RFR(S): 8250668: Clean up method_oop names in adlc In-Reply-To: References: <78ea86f9-a4e1-5a4b-b102-d06bcdb3aaf8@oracle.com> Message-ID: <619a2215-9525-22f3-61c6-d0f1cb5fe3e0@oracle.com> On 14.09.20 20:51, Cesar Soares Lucas wrote: > AFAIU, now that Openjdk moved to GitHub I should proceed to opening a > Pull Request there instead of updating the Webrev, right? Yes, please open a PR for this. 
Best regards,
Tobias

From thartmann at openjdk.java.net Tue Sep 15 07:34:14 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Tue, 15 Sep 2020 07:34:14 GMT
Subject: RFR: 8253146: C2: Purge unused MachCallNode::_arg_size field
In-Reply-To:
References:
Message-ID:

On Tue, 15 Sep 2020 07:14:58 GMT, Aleksey Shipilev wrote:

> While doing changes in related code, noticed that `MachCallNode::_arg_size` is computed needlessly, taking memory
> without a good reason. Let's purge it.
> Testing: text search for _argsize; Linux x86_64 {release,fastdebug,slowdebug} builds; Linux x86_64 tier1; Linux ppc64
> fastdebug build

Looks good.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/167

From kuaiwei.kw at alibaba-inc.com Tue Sep 15 07:38:33 2020
From: kuaiwei.kw at alibaba-inc.com (Kuai Wei)
Date: Tue, 15 Sep 2020 15:38:33 +0800
Subject: Re: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64
In-Reply-To: <7640e033-f590-0b63-79e3-59dd5b96f55f@oracle.com>
References: , <7640e033-f590-0b63-79e3-59dd5b96f55f@oracle.com>
Message-ID: <522cf585-60c4-4662-bbbd-d51158d11f3a.kuaiwei.kw@alibaba-inc.com>

Hi Vladimir,

Thanks for your review.

I updated my test cases in test/micro/org/openjdk/bench/vm/compiler/InterfaceCalls.java. My tests do not inline interface methods, so most of the CPU time is spent in itable_stub. Every test runs 10 warmup iterations and 5 measurement iterations for one score. I took 3 scores for every test.
Below are the test results on my machines. It looks like the slow loop improves more over the original one.

aarch64:
=== testStubPoly3 ===
orig: 38430308.215 38438769.040 38325616.152
opt : 39425275.311 39626194.985 39374242.065
=== testStubPoly5 ===
orig: 23227433.053 23210843.937 23212518.073
opt : 23805995.657 23797837.061 23861764.978
=== testSlowStubPoly3 ===
orig: 30838750.839 30886603.202 30841314.152
opt : 36166775.967 36242733.807 36041506.263
=== testSlowStubPoly5 ===
orig: 18713218.115 18706994.686 18686729.040
opt : 21827549.808 21836822.173 21861920.069

x86:
=== testStubPoly3 ===
orig: 36339726.912 36322863.060 36363196.132
opt : 38631086.341 38465649.400 38466044.926
=== testStubPoly5 ===
orig: 22240149.674 22218724.450 22225970.358
opt : 23498941.840 23454580.221 23497053.570
=== testSlowStubPoly3 ===
orig: 28693696.199 28700714.257 28587900.429
opt : 34187319.519 34171321.762 34138648.599
=== testSlowStubPoly5 ===
orig: 17388480.977 17389247.386 17177206.666
opt : 20697609.518 20771108.051 20699215.655

I think lookup_interface_method can be reused as the fast path. It is also used by templateTable::invoke_interface and generate_method_handle_dispatch.
My slow path implementation needs more registers (6 registers so far), so I need to check whether there is a register conflict in these methods. I'd like to keep a separate slow path implementation. What do you think about it?

Thanks,
Kevin

------------------------------------------------------------------
From:Vladimir Ivanov
Send Time:2020-09-14 (Mon) 22:10
To:kuaiwei ; hotspot-dev ; hotspot-compiler-dev
Subject:Re: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64

Hi Kevin,

Very interesting observations. I like the idea to optimize for the case when REFC == DECC.

Fusing 2 passes over the itable into one does look attractive, but I'm not sure the proposed variant is correct. I suggest to split the patch into 2 enhancements and handle them separately.

I'm curious what kind of benchmarks you used and what are the improvements observed with the patch.
One suggestion about the implementation: src/hotspot/cpu/x86/macroAssembler_x86.cpp: +void MacroAssembler::lookup_interface_method_in_stub(Register recv_klass, I'd like to avoid having 2 independent implementations of itable lookup (MacroAssembler::lookup_interface_method_in_stub() and MacroAssembler::lookup_interface_method()). It would be nice to keep the implementation unified between itable and MethodHandle linkToInterface linker stubs. What MacroAssembler::lookup_interface_method(..., true /*return_method*/) does is interface method lookup w/o proper subtype check and it is equivalent to fast loop in MacroAssembler::lookup_interface_method_in_stub(). As a possible path forward, you could introduce the fast path check first by moving the fast path check into VtableStubs::create_itable_stub() and guard the first path over the itable. It would make the type checking pass over itable optional based on runtime check. Then you could refactor MacroAssembler::lookup_interface_method() to optionally do REFC and DECC checks on every iteration and migrate VtableStubs::create_itable_stub() and MethodHandles::generate_method_handle_dispatch() to it. Best regards, Vladimir Ivanov On 14.09.2020 13:52, kuaiwei wrote: > Now itable_stub will go through instanceKlass's itable twice to look up a method entry. resolved klass is used for type > checking and method holder klass is used to find method entry. In many cases , we observed resolved klass is as same as > holder klass. So we can improve itable stub based on it. If they are same klass, stub uses a fast loop to check only > one klass. If not, a slow loop is used to checking both klasses. > > Even entering in slow loop, new implementation can be better than old one in some cases. Because new stub just need go > through itable once and reduce memory operations. 
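
[Editorial sketch, not part of the original thread] The lookup strategy described in the quoted proposal above can be modelled roughly as follows. This is a plain Python model under stated assumptions, not HotSpot code: the itable is represented as a hypothetical list of (interface, method table) pairs with no duplicate interfaces, `refc` and `decc` stand for the resolved klass and the method holder klass, and all function names are made up for illustration. The `remaining` counter mirrors the early-exit counter Kevin mentions later in the thread.

```python
from typing import List, Sequence, Tuple

# Hypothetical model: each itable entry pairs an interface name with that
# interface's method table. Real itables hold Klass pointers and offsets.
Itable = Sequence[Tuple[str, List[str]]]


class IncompatibleClassChange(Exception):
    """Stands in for the error path taken when an interface is not found."""


def lookup_two_pass(itable: Itable, refc: str, decc: str, index: int) -> str:
    """Baseline shape: scan once for the type check, then again for the method."""
    if all(iface != refc for iface, _ in itable):   # pass 1: REFC type check
        raise IncompatibleClassChange(refc)
    for iface, methods in itable:                   # pass 2: locate DECC's methods
        if iface == decc:
            return methods[index]
    raise IncompatibleClassChange(decc)


def lookup_single_pass(itable: Itable, refc: str, decc: str, index: int) -> str:
    """Proposed shape: one scan, with a fast loop when REFC == DECC."""
    if refc == decc:
        for iface, methods in itable:               # fast loop: one klass to match
            if iface == decc:
                return methods[index]
        raise IncompatibleClassChange(decc)

    remaining = 2                                   # matches still needed
    found = None
    for iface, methods in itable:                   # slow loop: match both klasses
        if iface == refc:
            remaining -= 1
        elif iface == decc:
            found = methods
            remaining -= 1
        if remaining == 0:                          # both seen: exit early
            return found[index]
    raise IncompatibleClassChange((refc, decc))
```

In this model REFC and DECC may appear in either order, which is the case Vladimir raises; whether the real stub handles every ordering correctly is exactly what the review is questioning, so the sketch should not be read as a statement about the patch.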
> > > bug: https://bugs.openjdk.java.net/browse/JDK-8253049 > > ------------- > > Commit messages: > - 8253049: Enhance itable_stub for AArch64 and x86_64 > > Changes: https://git.openjdk.java.net/jdk/pull/128/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=128&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8253049 > Stats: 220 lines in 7 files changed: 172 ins; 35 del; 13 mod > Patch: https://git.openjdk.java.net/jdk/pull/128.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/128/head:pull/128 > > PR: https://git.openjdk.java.net/jdk/pull/128 > From vladimir.x.ivanov at oracle.com Tue Sep 15 09:02:49 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 15 Sep 2020 12:02:49 +0300 Subject: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64 In-Reply-To: <522cf585-60c4-4662-bbbd-d51158d11f3a.kuaiwei.kw@alibaba-inc.com> References: <7640e033-f590-0b63-79e3-59dd5b96f55f@oracle.com> <522cf585-60c4-4662-bbbd-d51158d11f3a.kuaiwei.kw@alibaba-inc.com> Message-ID: <08236fa7-213b-a84f-a927-59b74baee5d8@oracle.com> > I updated my test cases in test/micro/org/openjdk/bench/vm/compiler/InterfaceCalls.java . My tests will not inline interface methods and most cpu are used by itable_stub. > every test will run 10 warmup iterations and 5 measure iterations for one score. I took 3 score for every test. > Below is test result on my machines, it looks slow loop has more improvement than origin one. Good, thanks for the numbers. I'm curious have you observed any improvements on larger scale benchmarks or real world apps? I'm asking because linear scan is already far from optimal when there are many superinterfaces present. > I think lookup_interface_method can be reused as fast path. And it is also used by templateTable::invoke_interface and generate_method_handle_dispatch. > My implementation in slow path need more registers (6 registers so far), I need to check if there's register conflict in these methods. 
I'd like to keep a separate > slow path implementation. How do you think about it? Frankly speaking, I'd like to avoid the duplication. Also, absence of guarantees about order of interfaces in the itable complicates things: REFC and DECC can be encountered in arbitrary order and the pass should take that into account. For example, I don't see early exit on success in slow variant, so every lookup has to go through the whole itable irrespective of whether it succeeds or fails. I attribute that to the complications induced by aforementioned aspect. And speaking of the overall approach (as it is implemented now), IMO increased complexity doesn't worth it. If interface calls become a bottleneck, the problem lies not in itable stub, but the overall design which requires linear scan over itables. It's better to put the effort there than micro-optimizing the stub. But I'm happy to change my mind if the rewritten implementation makes it easier to reason about the code. (FTR subtype checks suffer from a similar problem: unless Klass::_secondary_super_cache catches it, subtype check for an interface does a linear scan over _secondary_supers array.) Best regards, Vladimir Ivanov > ------------------------------------------------------------------ > From:Vladimir Ivanov > Send Time:2020?9?14?(???) 22:10 > To:kuaiwei ; hotspot-dev ; hotspot-compiler-dev > Subject:Re: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64 > > Hi Kevin, > > Very interesting observations. I like the idea to optimize for the case > when REFC == DECC. > > Fusing 2 passes over the itable into one does look attractive, but I'm > not sure the proposed variant is correct. I suggest to split the patch > into 2 enhancements and handle them separately. > > I'm curious what kind of benchmarks you used and what are the > improvements observed with the patch. 
> > One suggestion about the implementation: > > src/hotspot/cpu/x86/macroAssembler_x86.cpp: > > +void MacroAssembler::lookup_interface_method_in_stub(Register recv_klass, > > I'd like to avoid having 2 independent implementations of itable lookup > (MacroAssembler::lookup_interface_method_in_stub() and > MacroAssembler::lookup_interface_method()). It would be nice to keep the > implementation unified between itable and MethodHandle linkToInterface > linker stubs. > > What MacroAssembler::lookup_interface_method(..., true > /*return_method*/) does is interface method lookup w/o proper subtype > check and it is equivalent to fast loop in > MacroAssembler::lookup_interface_method_in_stub(). > > As a possible path forward, you could introduce the fast path check > first by moving the fast path check into > VtableStubs::create_itable_stub() and guard the first path over the > itable. It would make the type checking pass over itable optional based > on runtime check. > > Then you could refactor MacroAssembler::lookup_interface_method() to > optionally do REFC and DECC checks on every iteration and migrate > VtableStubs::create_itable_stub() and > MethodHandles::generate_method_handle_dispatch() to it. > > Best regards, > Vladimir Ivanov > > On 14.09.2020 13:52, kuaiwei wrote: >> Now itable_stub will go through instanceKlass's itable twice to look up a method entry. resolved klass is used for type >> checking and method holder klass is used to find method entry. In many cases , we observed resolved klass is as same as >> holder klass. So we can improve itable stub based on it. If they are same klass, stub uses a fast loop to check only >> one klass. If not, a slow loop is used to checking both klasses. >> >> Even entering in slow loop, new implementation can be better than old one in some cases. Because new stub just need go >> through itable once and reduce memory operations. 
>> >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8253049 >> >> ------------- >> >> Commit messages: >> - 8253049: Enhance itable_stub for AArch64 and x86_64 >> >> Changes: https://git.openjdk.java.net/jdk/pull/128/files >> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=128&range=00 >> Issue: https://bugs.openjdk.java.net/browse/JDK-8253049 >> Stats: 220 lines in 7 files changed: 172 ins; 35 del; 13 mod >> Patch: https://git.openjdk.java.net/jdk/pull/128.diff >> Fetch: git fetch https://git.openjdk.java.net/jdk pull/128/head:pull/128 >> >> PR: https://git.openjdk.java.net/jdk/pull/128 >> > From aph at redhat.com Tue Sep 15 09:13:58 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Sep 2020 10:13:58 +0100 Subject: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64 In-Reply-To: <08236fa7-213b-a84f-a927-59b74baee5d8@oracle.com> References: <7640e033-f590-0b63-79e3-59dd5b96f55f@oracle.com> <522cf585-60c4-4662-bbbd-d51158d11f3a.kuaiwei.kw@alibaba-inc.com> <08236fa7-213b-a84f-a927-59b74baee5d8@oracle.com> Message-ID: On 15/09/2020 10:02, Vladimir Ivanov wrote: > And speaking of the overall approach (as it is implemented now), IMO > increased complexity doesn't worth it. If interface calls become a > bottleneck, the problem lies not in itable stub, but the overall design > which requires linear scan over itables. It's better to put the effort > there than micro-optimizing the stub. Indeed. When I first came to HotSpot after working on GCJ for years I was very surprised to see a linear scan used for interface dispatch. The code improvements look to be fairly minor. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From dnsimon at openjdk.java.net Tue Sep 15 09:26:41 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Tue, 15 Sep 2020 09:26:41 GMT
Subject: RFR: 8252518: cache result of CompilerToVM.getComponentType
Message-ID:

Linux perf profiles of CompileTheWorld with libgraal show that `CompilerToVM.getComponentType` is the most expensive
JVMCI VM entry point with almost 2% of total execution time:

+    1.87%     0.04%  [.] c2v_getComponentType
+    0.54%     0.00%  [.] c2v_installCode
     0.39%     0.00%  [.] c2v_getResolvedJavaType0
     0.04%     0.00%  [.] c2v_resolvePossiblyCachedConstantInPool
     0.03%     0.00%  [.] c2v_interpreterFrameSize
     0.03%     0.01%  [.] c2v_isAssignableFrom
     0.02%     0.00%  [.] c2v_translate
     0.01%     0.00%  [.] c2v_getIdentityHashCode

It's worth caching the result of this call.

-------------

Commit messages:
 - 8252518: cache result of CompilerToVM.getComponentType

Changes: https://git.openjdk.java.net/jdk/pull/172/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=172&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8252518
  Stats: 13 lines in 1 file changed: 8 ins; 0 del; 5 mod
  Patch: https://git.openjdk.java.net/jdk/pull/172.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/172/head:pull/172

PR: https://git.openjdk.java.net/jdk/pull/172

From jvernee at openjdk.java.net Tue Sep 15 09:29:12 2020
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Tue, 15 Sep 2020 09:29:12 GMT
Subject: RFR: 8253002: Remove the unused SafePointNode::_oop_map field [v2]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> I've been looking a lot at the code for generating oop maps for call nodes lately, and noticed that SafePointNode had
> an oopMap field that was unused (which led to some confusion as to where the oop map was actually set).
> The oop map is instead generated and set in buildOopMap OopFlow::compute_reach after matching: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/buildOopMap.cpp#L122 So the field on the ideal node > is unused. This patch removes the field and cleans up related code. I've left a comment in SafePointNode to point > people looking for the oop map at buildOopMap.cpp > Thanks, > Jorn > > Testing: local build + tier1,tier2,tier3 Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Review comments: - Remove comment in SafePointNode - Remove forward declaration of class OopMap ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/109/files - new: https://git.openjdk.java.net/jdk/pull/109/files/05bdc4dc..d636dcd3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=109&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=109&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/109/head:pull/109 PR: https://git.openjdk.java.net/jdk/pull/109 From jvernee at openjdk.java.net Tue Sep 15 09:29:14 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 15 Sep 2020 09:29:14 GMT Subject: RFR: 8253002: Remove the unused SafePointNode::_oop_map field [v2] In-Reply-To: References: Message-ID: On Mon, 14 Sep 2020 13:19:20 GMT, Aleksey Shipilev wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments: >> - Remove comment in SafePointNode >> - Remove forward declaration of class OopMap > > There is also a forward declaration of `class OopMap;` in `callnode.hpp`, do we still need it? @shipilev Thanks for the review! > There is also a forward declaration of class OopMap; in callnode.hpp, do we still need it? Doesn't look like it, good catch! 
> I don't think we need this comment. I think it would bitrot eventually. It seems the code to construct OopMap is easily > discoverable already. Yeah, good point, I've removed it. I've ran a build on Windows and Linux (WSL). Still builds with the new changes, I hope that's enough to test them. ------------- PR: https://git.openjdk.java.net/jdk/pull/109 From thartmann at openjdk.java.net Tue Sep 15 09:34:59 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 15 Sep 2020 09:34:59 GMT Subject: RFR: 8253002: Remove the unused SafePointNode::_oop_map field [v2] In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 09:29:12 GMT, Jorn Vernee wrote: >> Hi, >> >> I've been looking a lot at the code for generating oop maps for call nodes lately, and noticed that SafePointNode had >> an oopMap field that was unused (which led to some confusion as to where the oop map was actually set). >> The oop map is instead generated and set in buildOopMap OopFlow::compute_reach after matching: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/buildOopMap.cpp#L122 So the field on the ideal node >> is unused. This patch removes the field and cleans up related code. I've left a comment in SafePointNode to point >> people looking for the oop map at buildOopMap.cpp >> Thanks, >> Jorn >> >> Testing: local build + tier1,tier2,tier3 > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > Review comments: > - Remove comment in SafePointNode > - Remove forward declaration of class OopMap Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/109

From kuaiwei.kw at alibaba-inc.com Tue Sep 15 10:04:10 2020
From: kuaiwei.kw at alibaba-inc.com (Kuai Wei)
Date: Tue, 15 Sep 2020 18:04:10 +0800
Subject: Re: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64
In-Reply-To: <08236fa7-213b-a84f-a927-59b74baee5d8@oracle.com>
References: <7640e033-f590-0b63-79e3-59dd5b96f55f@oracle.com> <522cf585-60c4-4662-bbbd-d51158d11f3a.kuaiwei.kw@alibaba-inc.com>, <08236fa7-213b-a84f-a927-59b74baee5d8@oracle.com>
Message-ID: <8dc563d3-7aff-48cc-b44f-30171562c208.kuaiwei.kw@alibaba-inc.com>

Thanks for your quick reply.

> I updated my test cases in test/micro/org/openjdk/bench/vm/compiler/InterfaceCalls.java . My tests will not inline interface methods and most cpu are used by itable_stub.
> every test will run 10 warmup iterations and 5 measure iterations for one score. I took 3 score for every test.
> Below is test result on my machines, it looks slow loop has more improvement than origin one.

Good, thanks for the numbers.

I'm curious have you observed any improvements on larger scale
benchmarks or real world apps? I'm asking because linear scan is already
far from optimal when there are many superinterfaces present.

Kevin: itable_stub was found to be hot in several online applications, so I started to work on this. I don't have a chance to verify it online right now, so I use microbenchmarks to verify it. I will also test with some benchmarks.

> I think lookup_interface_method can be reused as fast path. And it is also used by templateTable::invoke_interface and generate_method_handle_dispatch.
> My implementation in slow path need more registers (6 registers so far), I need to check if there's register conflict in these methods. I'd like to keep a separate
> slow path implementation. How do you think about it?

Frankly speaking, I'd like to avoid the duplication.

Kevin: Ok, I will try to merge them.

Also, absence of guarantees about order of interfaces in the itable
complicates things: REFC and DECC can be encountered in arbitrary order
and the pass should take that into account. For example, I don't see
early exit on success in slow variant, so every lookup has to go through
the whole itable irrespective of whether it succeeds or fails. I
attribute that to the complications induced by aforementioned aspect.

Kevin: I use a counter for matching. If it reaches zero, the iteration can exit early.

And speaking of the overall approach (as it is implemented now), IMO
increased complexity doesn't worth it. If interface calls become a
bottleneck, the problem lies not in itable stub, but the overall design
which requires linear scan over itables. It's better to put the effort
there than micro-optimizing the stub.

Kevin: I agree we can improve the itable design. My initial thought is that the JVM may reorder the itable at a safepoint. I can take that as a follow-up optimization.

But I'm happy to change my mind if the rewritten implementation makes it
easier to reason about the code.

(FTR subtype checks suffer from a similar problem: unless
Klass::_secondary_super_cache catches it, subtype check for an interface
does a linear scan over _secondary_supers array.)

Regards,
Kevin

------------------------------------------------------------------
From:Vladimir Ivanov
Send Time:2020-09-15 (Tue) 17:03
To:??(??) ; hotspot-dev ; kuaiwei ; hotspot-dev ; hotspot-compiler-dev
Subject:Re: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64

> I updated my test cases in test/micro/org/openjdk/bench/vm/compiler/InterfaceCalls.java . My tests will not inline interface methods and most cpu are used by itable_stub.
> every test will run 10 warmup iterations and 5 measure iterations for one score. I took 3 score for every test.
> Below is test result on my machines, it looks slow loop has more improvement than origin one.

Good, thanks for the numbers.
I'm curious whether you have observed any improvements on larger-scale benchmarks or real-world apps.

I'm asking because a linear scan is already far from optimal when there are many superinterfaces present.

> I think lookup_interface_method can be reused as the fast path. It is also used by templateTable::invoke_interface and generate_method_handle_dispatch.
> My slow-path implementation needs more registers (6 so far), so I need to check whether there is a register conflict in these methods. I'd like to keep a separate
> slow-path implementation. What do you think?

Frankly speaking, I'd like to avoid the duplication.

Also, the absence of guarantees about the order of interfaces in the itable complicates things: REFC and DECC can be encountered in arbitrary order, and the pass should take that into account. For example, I don't see an early exit on success in the slow variant, so every lookup has to go through the whole itable irrespective of whether it succeeds or fails. I attribute that to the complications induced by the aforementioned aspect.

And speaking of the overall approach (as it is implemented now), IMO the increased complexity isn't worth it. If interface calls become a bottleneck, the problem lies not in the itable stub but in the overall design, which requires a linear scan over itables. It's better to put the effort there than to micro-optimize the stub.

But I'm happy to change my mind if the rewritten implementation makes it easier to reason about the code.

(FTR subtype checks suffer from a similar problem: unless Klass::_secondary_super_cache catches it, a subtype check for an interface does a linear scan over the _secondary_supers array.)

Best regards,
Vladimir Ivanov

> ------------------------------------------------------------------
> From: Vladimir Ivanov
> Send Time: 2020-09-14 (Mon) 22:10
> To: kuaiwei; hotspot-dev; hotspot-compiler-dev
> Subject: Re: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64
>
> Hi Kevin,
>
> Very interesting observations. I like the idea to optimize for the case
> when REFC == DECC.
>
> Fusing 2 passes over the itable into one does look attractive, but I'm
> not sure the proposed variant is correct. I suggest splitting the patch
> into 2 enhancements and handling them separately.
>
> I'm curious what kind of benchmarks you used and what improvements
> you observed with the patch.
>
> One suggestion about the implementation:
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp:
>
> +void MacroAssembler::lookup_interface_method_in_stub(Register recv_klass,
>
> I'd like to avoid having 2 independent implementations of itable lookup
> (MacroAssembler::lookup_interface_method_in_stub() and
> MacroAssembler::lookup_interface_method()). It would be nice to keep the
> implementation unified between itable and MethodHandle linkToInterface
> linker stubs.
>
> What MacroAssembler::lookup_interface_method(..., true
> /*return_method*/) does is interface method lookup w/o a proper subtype
> check, and it is equivalent to the fast loop in
> MacroAssembler::lookup_interface_method_in_stub().
>
> As a possible path forward, you could introduce the fast path check
> first by moving the fast path check into
> VtableStubs::create_itable_stub() and guard the first pass over the
> itable. It would make the type checking pass over the itable optional based
> on a runtime check.
>
> Then you could refactor MacroAssembler::lookup_interface_method() to
> optionally do REFC and DECC checks on every iteration and migrate
> VtableStubs::create_itable_stub() and
> MethodHandles::generate_method_handle_dispatch() to it.
>
> Best regards,
> Vladimir Ivanov
>
> On 14.09.2020 13:52, kuaiwei wrote:
>> Currently itable_stub goes through the instanceKlass's itable twice to look up a method entry: the resolved klass is used for type
>> checking, and the method holder klass is used to find the method entry. In many cases we observed that the resolved klass is the same as
>> the holder klass, so we can improve the itable stub based on that. If they are the same klass, the stub uses a fast loop that checks only
>> one klass. If not, a slow loop checks both klasses.
>>
>> Even when it enters the slow loop, the new implementation can be better than the old one in some cases, because the new stub needs to go
>> through the itable only once, which reduces memory operations.
>>
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8253049
>>
>> -------------
>>
>> Commit messages:
>>  - 8253049: Enhance itable_stub for AArch64 and x86_64
>>
>> Changes: https://git.openjdk.java.net/jdk/pull/128/files
>>  Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=128&range=00
>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8253049
>>   Stats: 220 lines in 7 files changed: 172 ins; 35 del; 13 mod
>>   Patch: https://git.openjdk.java.net/jdk/pull/128.diff
>>   Fetch: git fetch https://git.openjdk.java.net/jdk pull/128/head:pull/128
>>
>> PR: https://git.openjdk.java.net/jdk/pull/128
>>
>

From jbhateja at openjdk.java.net Tue Sep 15 10:26:04 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 15 Sep 2020 10:26:04 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v2]

> Summary:
>
> 1) Partial in-lining technique avoids call overhead penalty for sub-word type small array copy operations with size
>    less than 32 bytes.
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes
>    an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
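As an illustration only, the runtime length check described in point 2 of the summary can be modelled in portable C++. This is a hedged sketch, not HotSpot code: `kPartialInlineSize`, `copy_stub`, `masked_copy_inline`, and `partial_inline_copy` are hypothetical names, the byte loops driven by a bit mask merely stand in for one AVX-512 masked load/store pair, and treating the threshold as inclusive is an assumption the summary does not settle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the ArrayCopyPartialInlineSize flag, in bytes.
constexpr std::size_t kPartialInlineSize = 32;

// Hypothetical stand-in for the out-of-line array-copy stub.
inline void copy_stub(std::uint8_t* dst, const std::uint8_t* src, std::size_t len) {
  std::memmove(dst, src, len);
}

// Models one masked vector load followed by one masked vector store:
// lane i is touched only if bit i of the mask is set.
inline void masked_copy_inline(std::uint8_t* dst, const std::uint8_t* src, std::size_t len) {
  assert(len <= kPartialInlineSize);
  // len is at most 32, so this shift on a 64-bit value is well defined.
  const std::uint64_t mask = (std::uint64_t{1} << len) - 1;
  std::uint8_t vec[kPartialInlineSize] = {};
  for (std::size_t lane = 0; lane < kPartialInlineSize; ++lane) {
    if (mask & (std::uint64_t{1} << lane)) vec[lane] = src[lane];  // masked load
  }
  for (std::size_t lane = 0; lane < kPartialInlineSize; ++lane) {
    if (mask & (std::uint64_t{1} << lane)) dst[lane] = vec[lane];  // masked store
  }
}

// Returns true when the inlined path was taken and false when the stub was called.
inline bool partial_inline_copy(std::uint8_t* dst, const std::uint8_t* src, std::size_t len) {
  if (len <= kPartialInlineSize) {  // assumption: the threshold is inclusive
    masked_copy_inline(dst, src, len);
    return true;
  }
  copy_stub(dst, src, len);
  return false;
}
```

The mask keeps the inlined path from touching bytes past the requested length, which is what lets a single fixed-width vector move serve every small size without a per-size branch.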
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain (Baseline / Partial Inlining)
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550835
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline :
> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Update arraycopynode.cpp

  Missed safety check.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/144/files
  - new: https://git.openjdk.java.net/jdk/pull/144/files/1601fba2..f6c46479

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=144&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=144&range=00-01

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/144.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/144/head:pull/144

PR: https://git.openjdk.java.net/jdk/pull/144

From jcm at openjdk.java.net Tue Sep 15 10:59:06 2020
From: jcm at openjdk.java.net (Jamsheed Mohammed C M)
Date: Tue, 15 Sep 2020 10:59:06 GMT
Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions. [v2]
In-Reply-To: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com>
References: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com>
Message-ID: <_2RfxBOE39VhwtDZe2F2qLb52IfF_JiCWwE2cJsEuiM=.01bb1177-808a-45ea-a8bf-3dccfab6ea38@github.com>

> Hi
>
> Moving the review that is based on mercurial repo to github.
> The history of conversation is
> [here](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039861.html)
> Issue:[ JDK-8249451 ](https://bugs.openjdk.java.net/browse/JDK-8249451)
>
> @dholmes-ora could you please have a look.

Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision:

  removing unused definition load_class_by_index

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/169/files
  - new: https://git.openjdk.java.net/jdk/pull/169/files/cfc2d719..1c0786a5

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=169&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=169&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/169.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/169/head:pull/169

PR: https://git.openjdk.java.net/jdk/pull/169

From roland at openjdk.java.net Tue Sep 15 11:52:14 2020
From: roland at openjdk.java.net (Roland Westrelin)
Date: Tue, 15 Sep 2020 11:52:14 GMT
Subject: RFR: 8252696: Loop unswitching may cause out of bound array load to be executed

Review thread so far:
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039853.html

-------------

Commit messages:
 - Loop unswitching may cause out of bound array load to be executed

Changes: https://git.openjdk.java.net/jdk/pull/176/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=176&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8252696
  Stats: 24 lines in 3 files changed: 1 ins; 6 del; 17 mod
  Patch: https://git.openjdk.java.net/jdk/pull/176.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/176/head:pull/176

PR: https://git.openjdk.java.net/jdk/pull/176

From jvernee at openjdk.java.net Tue Sep 15 12:26:26 2020
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Tue, 15 Sep 2020 12:26:26 GMT
Subject: Integrated: 8253002: Remove the unused SafePointNode::_oop_map field

On Thu, 10 Sep 2020 10:17:42 GMT, Jorn Vernee wrote:

> Hi,
>
> I've been looking a lot at the code for generating oop maps for call nodes lately, and noticed that SafePointNode had
> an oopMap field that was unused (which led to some confusion as to where the oop map was actually set).
> The oop map is instead generated and set in OopFlow::compute_reach (buildOopMap.cpp) after matching:
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/buildOopMap.cpp#L122 So the field on the ideal node
> is unused. This patch removes the field and cleans up related code. I've left a comment in SafePointNode to point
> people looking for the oop map at buildOopMap.cpp
> Thanks,
> Jorn
>
> Testing: local build + tier1,tier2,tier3

This pull request has now been integrated.

Changeset: d219d8b9
Author:    Jorn Vernee
URL:       https://git.openjdk.java.net/jdk/commit/d219d8b9
Stats:     8 lines in 2 files changed: 8 ins; 0 del; 0 mod

8253002: Remove the unused SafePointNode::_oop_map field

Reviewed-by: thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/109

From neliasso at openjdk.java.net Tue Sep 15 13:18:30 2020
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Tue, 15 Sep 2020 13:18:30 GMT
Subject: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy

On Mon, 7 Sep 2020 14:28:18 GMT, Jatin Bhateja wrote:

> Summary:
>
> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy.
> 2) Special instruction sequence blocks for copy sizes between 32 and 192 bytes.
> 3) Block copy operations above 192 bytes are performed using a destination-address-aligned PRE-MAIN-POST loop. The main loop
>    copies 192 bytes per iteration, and the tail part falls through to the special instruction sequence blocks.
> 4) Both the small copy blocks and the aligned loop use 32-byte vector registers to prevent any frequency penalty for copy sizes less than AVX3Threshold.
> 5) For block sizes above AVX3Threshold, both the special blocks and the loop operate using 64-byte registers.
> 6) In case the user sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient REP MOVS for copy
>    sizes above 4096 bytes.
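To make point 3 of the summary concrete, here is a rough model of how a destination-aligned PRE-MAIN-POST copy splits a request. It is only a sketch under stated assumptions: `CopyPlan`, `plan_copy`, and `pre_main_post_copy` are hypothetical names, the 32-byte alignment granule is an assumption (the summary gives only the 192-byte main-loop step), and `memcpy` stands in for the vector moves of the real stub.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kAlignBytes = 32;      // assumed alignment granule (one 32-byte vector)
constexpr std::size_t kMainLoopBytes = 192;  // bytes per main-loop iteration, from the summary

struct CopyPlan {
  std::size_t pre;         // bytes copied before the destination is aligned
  std::size_t main_iters;  // number of 192-byte main-loop iterations
  std::size_t post;        // tail bytes left for the small-copy blocks
};

// Splits a copy of len bytes to dst_addr into the three phases.
inline CopyPlan plan_copy(std::uintptr_t dst_addr, std::size_t len) {
  const std::size_t misalign = dst_addr % kAlignBytes;
  std::size_t pre = (misalign == 0) ? 0 : kAlignBytes - misalign;
  if (pre > len) pre = len;  // very short copies never reach the main loop
  const std::size_t rest = len - pre;
  return CopyPlan{pre, rest / kMainLoopBytes, rest % kMainLoopBytes};
}

// Forward copy for disjoint (non-overlapping) ranges only.
inline void pre_main_post_copy(std::uint8_t* dst, const std::uint8_t* src, std::size_t len) {
  const CopyPlan plan = plan_copy(reinterpret_cast<std::uintptr_t>(dst), len);
  std::memcpy(dst, src, plan.pre);  // PRE: reach an aligned destination
  dst += plan.pre;
  src += plan.pre;
  for (std::size_t i = 0; i < plan.main_iters; ++i) {  // MAIN: aligned 192-byte blocks
    std::memcpy(dst, src, kMainLoopBytes);
    dst += kMainLoopBytes;
    src += kMainLoopBytes;
  }
  std::memcpy(dst, src, plan.post);  // POST: tail handled by the small-copy path
}
```

For example, under these assumptions a 1000-byte copy to an address that is 8 bytes past a 32-byte boundary splits into 24 bytes of PRE, five MAIN iterations, and a 16-byte POST tail.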
> JMH Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> Baseline : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt
> WithOpt : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt

Changes requested by neliasso (Reviewer).

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 7972:

> 7970: assert(MaxVectorSize >= 32, "vector length < 32");
> 7971: use64byteVector |= MaxVectorSize > 32 && AVX3Threshold == 0;
> 7972: if (use64byteVector == false) {

Change to "!use64byteVector"

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8012:

> 8010: assert(MaxVectorSize == 64 || MaxVectorSize == 32, "vector length mismatch");
> 8011: use64byteVector |= MaxVectorSize > 32 && AVX3Threshold == 0;
> 8012: if (use64byteVector == false) {

Change to "!use64byteVector"

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8026:

> 8024: assert(MaxVectorSize == 64 || MaxVectorSize == 32, "vector length mismatch");
> 8025: use64byteVector |= MaxVectorSize > 32 && AVX3Threshold == 0;
> 8026: if (use64byteVector == false) {

Change to "!use64byteVector"

src/hotspot/cpu/x86/vm_version_x86.cpp line 1167:

> 1165:
> 1166: if (!FLAG_IS_DEFAULT(AVX3Threshold)) {
> 1167: if (AVX3Threshold !=0 && !is_power_of_2(AVX3Threshold)) {

Missing space before '0'

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 7970:

> 7968: KRegister mask, Register length, Register temp,
> 7969: BasicType type, int offset, bool use64byteVector) {
> 7970: assert(MaxVectorSize >= 32, "vector length < 32");

Why does "MaxVectorSize >= 32" imply that "vector length < 32"? This assert appears in multiple locations.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 7971:

> 7969: BasicType type, int offset, bool use64byteVector) {
> 7970: assert(MaxVectorSize >= 32, "vector length < 32");
> 7971: use64byteVector |= MaxVectorSize > 32 && AVX3Threshold == 0;

When do you expect AVX3Threshold to be 0?

-------------

PR: https://git.openjdk.java.net/jdk/pull/61

From neliasso at openjdk.java.net Tue Sep 15 13:54:29 2020
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Tue, 15 Sep 2020 13:54:29 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v2]

On Tue, 15 Sep 2020 10:26:04 GMT, Jatin Bhateja wrote:

>> Summary:
>>
>> 1) Partial in-lining technique avoids call overhead penalty for sub-word type small array copy operations with size
>>    less than 32 bytes.
>> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes
>>    an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
>> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update arraycopynode.cpp
>
>   Missed safety check.

This PR includes the changes for JDK-8252847. It makes it hard to review.

-------------

PR: https://git.openjdk.java.net/jdk/pull/144

From vladimir.x.ivanov at oracle.com Tue Sep 15 15:36:38 2020
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 15 Sep 2020 18:36:38 +0300
Subject: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64
References: <7640e033-f590-0b63-79e3-59dd5b96f55f@oracle.com> <522cf585-60c4-4662-bbbd-d51158d11f3a.kuaiwei.kw@alibaba-inc.com> <08236fa7-213b-a84f-a927-59b74baee5d8@oracle.com>
Message-ID: <52ff5a6e-f134-8cd5-94a3-be6e0d5549bf@oracle.com>

>> And speaking of the overall approach (as it is implemented now), IMO
>> the increased complexity isn't worth it. If interface calls become a
>> bottleneck, the problem lies not in the itable stub but in the overall design,
>> which requires a linear scan over itables. It's better to put the effort
>> there than to micro-optimize the stub.
>
> Indeed. When I first came to HotSpot after working on GCJ for years
> I was very surprised to see a linear scan used for interface dispatch.

FTR Erik has been looking into rewriting virtual dispatch logic:

   http://openjdk.java.net/jeps/8221828

Best regards,
Vladimir Ivanov

>
> The code improvements look to be fairly minor.
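
For readers following the thread, the linear itable scan under discussion can be modelled outside HotSpot roughly as below. This is a hedged sketch, not the stub's code: `ItableEntry`, `lookup_single_pass`, and the integer interface ids are hypothetical stand-ins for the VM's klass pointers and offsets, and the two state bits are just one way to do a single pass that tolerates REFC and DECC appearing in either order and exits as soon as both have been seen.

```cpp
#include <vector>

// Hypothetical model of one itable entry: which interface it describes
// and where that interface's method block starts.
struct ItableEntry {
  int interface_id;   // stand-in for a klass pointer in the real VM
  int method_offset;  // stand-in for the offset of the interface's method block
};

constexpr unsigned kSeenRefc = 1u;  // bit 0: resolved class (REFC) found
constexpr unsigned kSeenDecc = 2u;  // bit 1: declaring class (DECC) found

// Single pass over the itable. Returns DECC's method offset, or -1 if either
// REFC or DECC is missing (a real stub would branch to its failure path).
inline int lookup_single_pass(const std::vector<ItableEntry>& itable, int refc, int decc) {
  unsigned state = 0;
  int offset = -1;
  for (const ItableEntry& e : itable) {
    if (e.interface_id == refc) state |= kSeenRefc;
    if (e.interface_id == decc) {
      state |= kSeenDecc;
      offset = e.method_offset;
    }
    if (state == (kSeenRefc | kSeenDecc)) return offset;  // early exit on success
  }
  return -1;
}
```

When REFC and DECC are the same interface, both bits are set by the same entry, so the loop degenerates into the single-check fast case without a separate code path.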
> From vladimir.x.ivanov at oracle.com Tue Sep 15 17:11:27 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 15 Sep 2020 20:11:27 +0300 Subject: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64 In-Reply-To: <8dc563d3-7aff-48cc-b44f-30171562c208.kuaiwei.kw@alibaba-inc.com> References: <7640e033-f590-0b63-79e3-59dd5b96f55f@oracle.com> <522cf585-60c4-4662-bbbd-d51158d11f3a.kuaiwei.kw@alibaba-inc.com> <08236fa7-213b-a84f-a927-59b74baee5d8@oracle.com> <8dc563d3-7aff-48cc-b44f-30171562c208.kuaiwei.kw@alibaba-inc.com> Message-ID: >>????I?updated?my?test?cases?in?test/micro/org/openjdk/bench/vm/compiler/InterfaceCalls.java?.?My?tests?will?not?inline?interface?methods?and?most?cpu?are?used?by?itable_stub. >>?every?test?will?run?10?warmup?iterations?and?5?measure?iterations?for?one?score.?I?took?3?score?for?every?test. >>????Below?is?test?result?on?my?machines,?it?looks?slow?loop?has?more?improvement?than?origin?one. > > Good,?thanks?for?the?numbers.?I'm?curious?have?you?observed?any > improvements?on?larger?scale?benchmarks?or?real?world?apps? > > I'm?asking?because?linear?scan?is?already?far?from?optimal?when?there > are?many?superinterfaces?present. > > Kevin: itable_stub was found hot on several online applications. So I > started to work on this. Now I don't have chance to verify it online. So > I uses microbenchmarks to verify. I will > test with some benchmarks. That's unfortunate. It would be very helpful to confirm the results of the micro-benchmarks (nano-, in this particular case). >>????I?think?lookup_interface_method?can?be?reused?as?fast?path.?And?it?is?also?used?by?templateTable::invoke_interface?and?generate_method_handle_dispatch. >>?My?implementation?in?slow?path?need?more?registers?(6?registers?so?far),?I?need?to?check?if?there's?register?conflict?in?these?methods.?I'd?like?to?keep?a?separate >>?slow?path?implementation.?How?do?you?think?about?it? > > Frankly?speaking,?I'd?like?to?avoid?the?duplication. 
> > Kevin: Ok, I will try to merge them. > > Also,?absence?of?guarantees?about?order?of?interfaces?in?the?itable > complicates?things:?REFC?and?DECC?can?be?encountered?in?arbitrary?order > and?the?pass?should?take?that?into?account.?For?example,?I?don't?see > early?exit?on?success?in?slow?variant,?so?every?lookup?has?to?go?through > the?whole?itable?irrespective?of?whether?it?succeeds?or?fails.?I > attribute?that?to?the?complications?induced?by?aforementioned?aspect. > > Kevin: I use a counter for matching. If it reaches zero, the iteration > can exit early. Good. Thanks for the clarification. Alternatively, you could use 2 bits in the temp register to code the state. IMO it's clearer and more robust w.r.t. possible bugs. Or even explicitly encode the state in the code as an automaton by generating 3 loop variants (check REFC + check DECC + check both). But IMO it falls into over-engineering category :-) Also, on naming: I find it hard to reason about the logic. Registers are re-used for different purposes and the names don't help at all (even adds to the confusion). As an example: movptr(method_result, Address(recv_klass, holder_klass, Address::times_1)); > And?speaking?of?the?overall?approach?(as?it?is?implemented?now),?IMO > increased?complexity?doesn't?worth?it.?If?interface?calls?become?a > bottleneck,?the?problem?lies?not?in?itable?stub,?but?the?overall?design > which?requires?linear?scan?over?itables.?It's?better?to?put?the?effort > there?than?micro-optimizing?the?stub. > > Kevin: I agree we can improve itable design. My initial think is jvm may > reorder itable at safepoint. I can take it as a follow up optimization. Well, I would definitely prefer to avoid additional runtime changes (to sort interfaces in itables and verify their order later) just to support minor improvements in itable stubs. Best regards, Vladimir Ivanov > ------------------------------------------------------------------ > From:Vladimir Ivanov > Send Time:2020?9?15?(???) 
17:03 > To:??(??) ; hotspot-dev > ; kuaiwei > ; hotspot-dev > ; hotspot-compiler-dev > > Subject:Re: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64 > > > >????I?updated?my?test?cases?in?test/micro/org/openjdk/bench/vm/compiler/InterfaceCalls.java?.?My?tests?will?not?inline?interface?methods?and?most?cpu?are?used?by?itable_stub. > >?every?test?will?run?10?warmup?iterations?and?5?measure?iterations?for?one?score.?I?took?3?score?for?every?test. > >????Below?is?test?result?on?my?machines,?it?looks?slow?loop?has?more?improvement?than?origin?one. > > Good,?thanks?for?the?numbers.?I'm?curious?have?you?observed?any > improvements?on?larger?scale?benchmarks?or?real?world?apps? > > I'm?asking?because?linear?scan?is?already?far?from?optimal?when?there > are?many?superinterfaces?present. > > >????I?think?lookup_interface_method?can?be?reused?as?fast?path.?And?it?is?also?used?by?templateTable::invoke_interface?and?generate_method_handle_dispatch. > >?My?implementation?in?slow?path?need?more?registers?(6?registers?so?far),?I?need?to?check?if?there's?register?conflict?in?these?methods.?I'd?like?to?keep?a?separate > >?slow?path?implementation.?How?do?you?think?about?it? > > Frankly?speaking,?I'd?like?to?avoid?the?duplication. > > Also,?absence?of?guarantees?about?order?of?interfaces?in?the?itable > complicates?things:?REFC?and?DECC?can?be?encountered?in?arbitrary?order > and?the?pass?should?take?that?into?account.?For?example,?I?don't?see > early?exit?on?success?in?slow?variant,?so?every?lookup?has?to?go?through > > the?whole?itable?irrespective?of?whether?it?succeeds?or?fails.?I > attribute?that?to?the?complications?induced?by?aforementioned?aspect. 
> > And?speaking?of?the?overall?approach?(as?it?is?implemented?now),?IMO > increased?complexity?doesn't?worth?it.?If?interface?calls?become?a > bottleneck,?the?problem?lies?not?in?itable?stub,?but?the?overall?design > which?requires?linear?scan?over?itables.?It's?better?to?put?the?effort > there?than?micro-optimizing?the?stub. > > But?I'm?happy?to?change?my?mind?if?the?rewritten?implementation?makes?it > > easier?to?reason?about?the?code. > > (FTR?subtype?checks?suffer?from?a?similar?problem:?unless > Klass::_secondary_super_cache?catches?it,?subtype?check?for?an?interface > > does?a?linear?scan?over?_secondary_supers?array.) > > Best?regards, > Vladimir?Ivanov > > > >?------------------------------------------------------------------ > >?From:Vladimir?Ivanov? > >?Send?Time:2020?9?14?(???)?22:10 > >?To:kuaiwei?;?hotspot-dev?;?hotspot-compiler-dev? > >?Subject:Re:?RFR:?8253049:?Enhance?itable_stub?for?AArch64?and?x86_64 > > > >?Hi?Kevin, > > > >?Very?interesting?observations.?I?like?the?idea?to?optimize?for?the?case > >?when?REFC?==?DECC. > > > >?Fusing?2?passes?over?the?itable?into?one?does?look?attractive,?but?I'm > >?not?sure?the?proposed?variant?is?correct.?I?suggest?to?split?the?patch > >?into?2?enhancements?and?handle?them?separately. > > > >?I'm?curious?what?kind?of?benchmarks?you?used?and?what?are?the > >?improvements?observed?with?the?patch. > > > >?One?suggestion?about?the?implementation: > > > >?src/hotspot/cpu/x86/macroAssembler_x86.cpp: > > > >?+void?MacroAssembler::lookup_interface_method_in_stub(Register?recv_klass, > > > >?I'd?like?to?avoid?having?2?independent?implementations?of?itable?lookup > >?(MacroAssembler::lookup_interface_method_in_stub()?and > >?MacroAssembler::lookup_interface_method()).?It?would?be?nice?to?keep?the > >?implementation?unified?between?itable?and?MethodHandle?linkToInterface > >?linker?stubs. 
> > > >?What?MacroAssembler::lookup_interface_method(...,?true > >?/*return_method*/)?does?is?interface?method?lookup?w/o?proper?subtype > >?check?and?it?is?equivalent?to?fast?loop?in > >?MacroAssembler::lookup_interface_method_in_stub(). > > > >?As?a?possible?path?forward,?you?could?introduce?the?fast?path?check > >?first?by?moving?the?fast?path?check?into > >?VtableStubs::create_itable_stub()?and?guard?the?first?path?over?the > >?itable.?It?would?make?the?type?checking?pass?over?itable?optional?based > >?on?runtime?check. > > > >?Then?you?could?refactor?MacroAssembler::lookup_interface_method()?to > >?optionally?do?REFC?and?DECC?checks?on?every?iteration?and?migrate > >?VtableStubs::create_itable_stub()??and > >?MethodHandles::generate_method_handle_dispatch()?to?it. > > > >?Best?regards, > >?Vladimir?Ivanov > > > >?On?14.09.2020?13:52,?kuaiwei?wrote: > >>?Now?itable_stub?will?go?through?instanceKlass's?itable?twice?to?look?up?a?method?entry.?resolved?klass?is?used?for?type > >>?checking?and?method?holder?klass?is?used?to?find?method?entry.?In?many?cases?,?we?observed?resolved?klass?is?as?same?as > >>?holder?klass.?So?we?can?improve?itable?stub?based?on?it.?If?they?are?same?klass,?stub?uses?a?fast?loop?to?check?only > >>?one?klass.?If?not,?a?slow?loop?is?used?to?checking?both?klasses. > >> > >>?Even?entering?in?slow?loop,?new?implementation?can?be?better?than?old?one?in?some?cases.?Because?new?stub?just?need?go > >>?through?itable?once?and?reduce?memory?operations. 
> >> > >> > >> bug: https://bugs.openjdk.java.net/browse/JDK-8253049 > >> > >> ------------- > >> > >> Commit messages: > >>    - 8253049: Enhance itable_stub for AArch64 and x86_64 > >> > >> Changes: https://git.openjdk.java.net/jdk/pull/128/files > >>    Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=128&range=00 > >>     Issue: https://bugs.openjdk.java.net/browse/JDK-8253049 > >>     Stats: 220 lines in 7 files changed: 172 ins; 35 del; 13 mod > >>     Patch: https://git.openjdk.java.net/jdk/pull/128.diff > >>     Fetch: git fetch https://git.openjdk.java.net/jdk pull/128/head:pull/128 > >> > >> PR: https://git.openjdk.java.net/jdk/pull/128 > >> > > > > From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 15 17:49:18 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 15 Sep 2020 17:49:18 GMT Subject: RFR: 8252966: CI: Remove KILL_COMPILE_ON_FATAL_ and KILL_COMPILE_ON_ANY macros Message-ID: Remove the `KILL_COMPILE_ON_FATAL_` and `KILL_COMPILE_ON_ANY` macros, replacing uses of `KILL_COMPILE_ON_FATAL_` with `CHECK_AND_CLEAR_`. Unlike `KILL_COMPILE_ON_FATAL_`, `CHECK_AND_CLEAR_` ignores `ThreadDeath` exceptions, which compiler threads should not receive anyway.
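The difference between the two styles of exception handling can be sketched as plain functions. This is a toy model, not the HotSpot macros: `ToyThread` and both function names are hypothetical, and the assumption that the removed macro aborted on a pending `ThreadDeath` (modelled here as a C++ exception) is my reading of the description rather than something the message states.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a VM thread with a pending-exception slot.
struct ToyThread {
    std::optional<std::string> pending_exception;  // exception class name, if any
};

// Sketch of the removed style: ThreadDeath is singled out and treated as
// fatal (assumption), every other pending exception is cleared.
int kill_compile_on_fatal(ToyThread& t, int value, int fallback) {
    if (t.pending_exception) {
        if (*t.pending_exception == "java.lang.ThreadDeath") {
            throw std::runtime_error("unhandled ci exception");
        }
        t.pending_exception.reset();
        return fallback;
    }
    return value;
}

// Sketch of the replacement style: no special case, any pending exception
// (ThreadDeath included) is cleared and the fallback result is returned.
int check_and_clear(ToyThread& t, int value, int fallback) {
    if (t.pending_exception) {
        t.pending_exception.reset();
        return fallback;
    }
    return value;
}
```

Under this model the two helpers behave identically unless a `ThreadDeath` is pending, which matches the stated rationale that compiler threads should not receive that exception anyway.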
------------- Commit messages: - 8252966: CI: Remove KILL_COMPILE_ON_FATAL_ and KILL_COMPILE_ON_ANY macros Changes: https://git.openjdk.java.net/jdk/pull/191/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=191&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252966 Stats: 26 lines in 3 files changed: 0 ins; 23 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/191.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/191/head:pull/191 PR: https://git.openjdk.java.net/jdk/pull/191 From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 15 17:49:20 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 15 Sep 2020 17:49:20 GMT Subject: RFR: 8252966: CI: Remove KILL_COMPILE_ON_FATAL_ and KILL_COMPILE_ON_ANY macros In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 17:41:07 GMT, Roberto Castañeda Lozano wrote: > Remove the `KILL_COMPILE_ON_FATAL_` and `KILL_COMPILE_ON_ANY` macros, replacing uses of `KILL_COMPILE_ON_FATAL_` with > `CHECK_AND_CLEAR_`. Unlike `KILL_COMPILE_ON_FATAL_`, `CHECK_AND_CLEAR_` ignores `ThreadDeath` exceptions, which > compiler threads should not receive anyway. Remove the KILL_COMPILE_ON_FATAL_ and KILL_COMPILE_ON_ANY macros, replacing uses of KILL_COMPILE_ON_FATAL_ with CHECK_AND_CLEAR_. Unlike KILL_COMPILE_ON_FATAL_, CHECK_AND_CLEAR_ ignores ThreadDeath exceptions, which compiler threads should not receive anyway. ------------- PR: https://git.openjdk.java.net/jdk/pull/191 From divcesar at gmail.com Tue Sep 15 18:14:48 2020 From: divcesar at gmail.com (César) Date: Tue, 15 Sep 2020 11:14:48 -0700 Subject: [16] RFR(S): 8250668: Clean up method_oop names in adlc In-Reply-To: <619a2215-9525-22f3-61c6-d0f1cb5fe3e0@oracle.com> References: <78ea86f9-a4e1-5a4b-b102-d06bcdb3aaf8@oracle.com> <619a2215-9525-22f3-61c6-d0f1cb5fe3e0@oracle.com> Message-ID: Done that, here it is: https://github.com/openjdk/jdk/pull/164 César.
On Tue, Sep 15, 2020 at 12:25 AM Tobias Hartmann wrote: > > On 14.09.20 20:51, Cesar Soares Lucas wrote: > > AFAIU, now that OpenJDK moved to GitHub I should proceed to opening a > > Pull Request there instead of updating the Webrev, right? > > Yes, please open a PR for this. > > Best regards, > Tobias > From vlivanov at openjdk.java.net Tue Sep 15 18:41:09 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 15 Sep 2020 18:41:09 GMT Subject: RFR: 8252966: CI: Remove KILL_COMPILE_ON_FATAL_ and KILL_COMPILE_ON_ANY macros In-Reply-To: References: Message-ID: <7p6wzwK9775fV-yNlWGPei55AYmZOzvWio4Kivw3KRc=.c76ed6bc-dee9-4dcb-9dcf-be5c03429a28@github.com> On Tue, 15 Sep 2020 17:41:07 GMT, Roberto Castañeda Lozano wrote: > Remove the `KILL_COMPILE_ON_FATAL_` and `KILL_COMPILE_ON_ANY` macros, replacing uses of `KILL_COMPILE_ON_FATAL_` with > `CHECK_AND_CLEAR_`. Unlike `KILL_COMPILE_ON_FATAL_`, `CHECK_AND_CLEAR_` ignores `ThreadDeath` exceptions, which > compiler threads should not receive anyway. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/191 From neliasso at openjdk.java.net Tue Sep 15 18:52:58 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 15 Sep 2020 18:52:58 GMT Subject: RFR: 8252966: CI: Remove KILL_COMPILE_ON_FATAL_ and KILL_COMPILE_ON_ANY macros In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 17:41:07 GMT, Roberto Castañeda Lozano wrote: > Remove the `KILL_COMPILE_ON_FATAL_` and `KILL_COMPILE_ON_ANY` macros, replacing uses of `KILL_COMPILE_ON_FATAL_` with > `CHECK_AND_CLEAR_`. Unlike `KILL_COMPILE_ON_FATAL_`, `CHECK_AND_CLEAR_` ignores `ThreadDeath` exceptions, which > compiler threads should not receive anyway. Looks good! ------------- Marked as reviewed by neliasso (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/191 From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 15 18:58:50 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 15 Sep 2020 18:58:50 GMT Subject: RFR: 8252966: CI: Remove KILL_COMPILE_ON_FATAL_ and KILL_COMPILE_ON_ANY macros In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 18:50:16 GMT, Nils Eliasson wrote: >> Remove the `KILL_COMPILE_ON_FATAL_` and `KILL_COMPILE_ON_ANY` macros, replacing uses of `KILL_COMPILE_ON_FATAL_` with >> `CHECK_AND_CLEAR_`. Unlike `KILL_COMPILE_ON_FATAL_`, `CHECK_AND_CLEAR_` ignores `ThreadDeath` exceptions, which >> compiler threads should not receive anyway. > > Looks good! Thank you Vladimir and Nils! ------------- PR: https://git.openjdk.java.net/jdk/pull/191 From github.com+2249648+JohnTortugo at openjdk.java.net Tue Sep 15 19:19:25 2020 From: github.com+2249648+JohnTortugo at openjdk.java.net (John Tortugo) Date: Tue, 15 Sep 2020 19:19:25 GMT Subject: RFR: 8253040 : Remove unused Matcher::regnum_to_fpu_offset() Message-ID: Relates to: https://bugs.openjdk.java.net/browse/JDK-8253040 Tested on: x86_64 - Linux - Tier1 Remove unused method `Matcher::regnum_to_fpu_offset()` from source base.
------------- Commit messages: - revert adding test files - Remove unused method Matcher::regnum_to_fpu_offset Changes: https://git.openjdk.java.net/jdk/pull/194/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=194&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253040 Stats: 33 lines in 7 files changed: 0 ins; 33 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/194.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/194/head:pull/194 PR: https://git.openjdk.java.net/jdk/pull/194 From adityam at openjdk.java.net Tue Sep 15 20:22:39 2020 From: adityam at openjdk.java.net (Aditya Mandaleeka) Date: Tue, 15 Sep 2020 20:22:39 GMT Subject: RFR: 8253040 : Remove unused Matcher::regnum_to_fpu_offset() In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 19:10:29 GMT, John Tortugo wrote: > Relates to: https://bugs.openjdk.java.net/browse/JDK-8253040 > Tested on: x86_64 - Linux - Tier1 > > Remove unused method `Matcher::regnum_to_fpu_offset()` from source base. LGTM ------------- Marked as reviewed by adityam (Author). PR: https://git.openjdk.java.net/jdk/pull/194 From serguei.spitsyn at oracle.com Tue Sep 15 20:28:50 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 15 Sep 2020 13:28:50 -0700 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <682ee88d-097a-df57-7374-b3413b7964fd@oracle.com> <3ae58a8e-405a-d98c-79c5-c6a0bdf5cc27@oracle.com> <96ad21a3-cae4-2218-b047-6912e6a07b21@oracle.com> Message-ID: <0686c4e5-ee04-9da7-e88e-a6730d69c6a9@oracle.com> Hi Richard, This is on my review list. I'll try to get it reviewed by the end of this week. Thanks, Serguei On 9/8/20 10:02, Reingruber, Richard wrote: > Hello Marty, > > Sure. I'd be happy if Serguei could review the change. > > Thanks, Richard. > > -----Original Message----- > From: Marty Thompson > Sent: Dienstag, 8. 
September 2020 18:55 > To: Reingruber, Richard ; Daniel Daugherty ; serviceability-dev ; hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents > > Hello Richard, > > It would be good if Serguei Spitsyn could review before this is pushed. Serguei is out this week. Can you wait until Serguei is back in the office the week of Sept 14? > > Regards, > > Marty > >> -----Original Message----- >> From: Reingruber, Richard >> Sent: Tuesday, September 8, 2020 9:45 AM >> To: Daniel Daugherty ; serviceability-dev >> ; hotspot-compiler- >> dev at openjdk.java.net; Hotspot dev runtime > dev at openjdk.java.net> >> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance >> in the Presence of JVMTI Agents >> >> Hi Dan, >> >> I'd be very happy about a review from somebody on the Serviceability team. >> I have asked for reviews many times (kindly I hope). And the change is for >> review for more than a year now. >> >> According to [1] I'd think all requirements to push are met already. But >> maybe I missed something? >> >> After renaming of methods in SafepointMechanism the change needs to be >> rebased (already done). I'll publish a pull request as soon as possible. >> >> Thanks, Richard. >> >> [1] >> https://wiki.openjdk.java.net/display/HotSpot/Pushing+a+HotSpot+change >> >> -----Original Message----- >> From: Daniel D. Daugherty >> Sent: Dienstag, 8. September 2020 18:16 >> To: Reingruber, Richard ; serviceability-dev >> ; hotspot-compiler- >> dev at openjdk.java.net; Hotspot dev runtime > dev at openjdk.java.net> >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance >> in the Presence of JVMTI Agents >> >> Hi Richard, >> >> I haven't seen a review from anyone on the Serviceability team and I think >> you should get a review from them since JVM/TI is involved. >> Perhaps I missed it... 
>> >> Dan >> >> >> On 9/7/20 10:09 AM, Reingruber, Richard wrote: >>> Hi, >>> >>> I would like to close the review of this change. >>> >>> It has received a lot of helpful feedback during the process and 2 >>> full Reviews. Thanks everybody! >>> >>> I'm planning to push it this week on Thursday as solution for JBS items: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>> https://bugs.openjdk.java.net/browse/JDK-8233915 >>> >>> Version to be pushed: >>> >>> http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8/ >>> >>> Hope to get my GIT/Skara setup going until then... :) >>> >>> Thanks, Richard. >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> On Behalf Of Reingruber, >>> Richard >>> Sent: Mittwoch, 2. September 2020 23:27 >>> To: Robbin Ehn ; serviceability-dev >>> ; >>> hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime >>> >>> Subject: [CAUTION] RE: RFR(L) 8227745: Enable Escape Analysis for >>> Better Performance in the Presence of JVMTI Agents >>> >>> Hi Robin, >>> >>>> On 2020-09-02 15:48, Reingruber, Richard wrote: >>>>> Hi Robbin, >>>>> >>>>> // taking the discussion back to the mailing lists >>>>> >>>>> > I still don't understand why you don't deoptimize the objects inside >> the >>>>> > handshake/safepoint instead? >>>> So for handshakes using asynch handshake and allowing blocking inside >>>> would fix that. (future fix, I'm working on that now) >>> Just to make it clear: I'm not fond of the extra suspension mechanism >>> currently used for JDK-8227745 either. I want to get rid of it and I >>> will work on it. Asynch handshakes (JDK-8238761) could be a >>> replacement for it. At least I think they can be used to suspend the target >> thread. >>>> For safepoint, since we have suspended all threads, ~'safepointed them' >>>> with a JavaThread, you _could_ just execute the action directly (e.g. 
>>>> skipping VM_HeapWalkOperation safepoint) since they are suppose to be >>>> safely suspended until the destructor of EB, no? >>> Yes, this should be possible. This would be an advanced change though. >>> I would like EscapeBarriers to be a no-op and fall back to current >>> implementation, if C2-EscapeAnalysis/Graal are disabled. >>> >>>> So I suggest future work to instead just execute the safepoint with >>>> the requesting JT instead of having a this special safepoiting mechanism. >>>> Since you are missing above functionality I see why you went this way. >>>> If you need to push it, it's fine by me. >>> We will work on further improvements. Top of the list would be >>> eliminating the extra suspend mechanism. >>> >>> The implementation has matured for more than 12 months now [1]. It's >>> been tested extensively at SAP over that time and passed also extended >>> testing at Oracle kindly conducted by Vladimir Kozlov. We've got two >>> full Reviews and incorporated extensive feedback from a number of >>> OpenJDK Reviewers (including you, thanks!). Based on that I reckon >>> we're good to push the change as enhancement >>> (JDK-8227745) and bug fix (JDK-8233915). >>> >>>> Thanks for explaining once again :) >>> Pleasure :) >>> >>> Thanks, Richard. >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/02 >>> 8729.html >>> >>> -----Original Message----- >>> From: Robbin Ehn >>> Sent: Mittwoch, 2. September 2020 16:54 >>> To: Reingruber, Richard ; >>> serviceability-dev ; >>> hotspot-compiler-dev at openjdk.java.net; Hotspot dev runtime >>> >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>> Performance in the Presence of JVMTI Agents >>> >>> Hi Richard, >>> >>> On 2020-09-02 15:48, Reingruber, Richard wrote: >>>> Hi Robbin, >>>> >>>> // taking the discussion back to the mailing lists >>>> >>>> > I still don't understand why you don't deoptimize the objects inside >> the >>>> > handshake/safepoint instead? 
>>> So for handshakes using asynch handshake and allowing blocking inside >>> would fix that. (future fix, I'm working on that now) >>> >>> For safepoint, since we have suspended all threads, ~'safepointed them' >>> with a JavaThread, you _could_ just execute the action directly (e.g. >>> skipping VM_HeapWalkOperation safepoint) since they are supposed to be >>> safely suspended until the destructor of EB, no? >>> >>> So I suggest future work to instead just execute the safepoint with >>> the requesting JT instead of having this special safepointing mechanism. >>> >>> Since you are missing the above functionality I see why you went this way. >>> If you need to push it, it's fine by me. >>> >>> Thanks for explaining once again :) >>> >>> /Robbin >>> >>>> This is unfortunately not possible. Deoptimizing objects includes >>>> reallocating scalar replaced objects, i.e. calling >>>> Deoptimization::realloc_objects(). This cannot be done at a safepoint or handshake. >>>> 1. The vm thread is not allowed to allocate on the java heap >>>> See for instance assertions in ParallelScavengeHeap::mem_allocate() >>>> >>>> https://github.com/openjdk/jdk/blob/4c73e045ce815d52abcdc99499266ccf2e6e9b4c/src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp#L258 >>>> >>>> This is not easy to change, I suppose, because it will be difficult to gc if >>>> necessary. >>>> >>>> 2. Using a direct handshake would not work either. The problem there is again >>>> gc. Let J be the JavaThread that is executing the direct handshake. The vm >>>> would deadlock if the vm thread waits for J to execute the closure of a >>>> handshake-all and J waits for the vm thread to execute a gc vm operation. >>>> Patricio Chilano made me aware of this: >>>> https://bugs.openjdk.java.net/browse/JDK-8230594 >>>> >>>> Cheers, Richard.
>>>> >>>> -----Original Message----- >>>> From: Robbin Ehn >>>> Sent: Mittwoch, 2. September 2020 13:56 >>>> To: Reingruber, Richard >>>> Cc: Lindenmaier, Goetz ; Vladimir Kozlov >>>> ; David Holmes >> >>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>> Performance in the Presence of JVMTI Agents >>>> >>>> Hi, >>>> >>>> I still don't understand why you don't deoptimize the objects inside >>>> the handshake/safepoint instead? >>>> >>>> E.g. >>>> >>>> JvmtiEnv::GetOwnedMonitorInfo you only should need the execute the >>>> code >>>> from: >>>> eb.deoptimize_objects(MaxJavaStackTraceDepth)) before looping over >>>> the stack, so: >>>> >>>> void >>>> GetOwnedMonitorInfoClosure::do_thread(Thread *target) { >>>> assert(target->is_Java_thread(), "just checking"); >>>> JavaThread *jt = (JavaThread *)target; >>>> >>>> if (!jt->is_exiting() && (jt->threadObj() != NULL)) { >>>> + if (EscapeBarrier::deoptimize_objects(jt, >>>> + MaxJavaStackTraceDepth)) { >>>> _result = >>>> ((JvmtiEnvBase*)_env)->get_owned_monitors(_calling_thread, jt, >>>> _owned_monitors_list); >>>> } else { >>>> _result = JVMTI_ERROR_OUT_OF_MEMORY; >>>> } >>>> } >>>> } >>>> >>>> Why try 'suspend' the thread first? >>>> >>>> >>>> When we de-optimize all threads why not just in the following safepoint? >>>> E.g. >>>> VM_HeapWalkOperation::doit() { >>>> + EscapeBarrier::deoptimize_objects_all_threads(); >>>> ... >>>> } >>>> >>>> Thanks, Robbin >>>> >>>> From vlivanov at openjdk.java.net Tue Sep 15 22:45:57 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 15 Sep 2020 22:45:57 GMT Subject: RFR: 8253040 : Remove unused Matcher::regnum_to_fpu_offset() In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 19:10:29 GMT, John Tortugo wrote: > Relates to: https://bugs.openjdk.java.net/browse/JDK-8253040 > Tested on: x86_64 - Linux - Tier1 > > Remove unused method `Matcher::regnum_to_fpu_offset()` from source base. Looks good. 
------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/194 From github.com+2249648+JohnTortugo at openjdk.java.net Wed Sep 16 05:16:46 2020 From: github.com+2249648+JohnTortugo at openjdk.java.net (John Tortugo) Date: Wed, 16 Sep 2020 05:16:46 GMT Subject: RFR: 8253040 : Remove unused Matcher::regnum_to_fpu_offset() In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 22:43:06 GMT, Vladimir Ivanov wrote: >> Relates to: https://bugs.openjdk.java.net/browse/JDK-8253040 >> Tested on: x86_64 - Linux - Tier1 >> >> Remove unused method `Matcher::regnum_to_fpu_offset()` from source base. > > Looks good. @iwanowww - Can you please merge this PR? ------------- PR: https://git.openjdk.java.net/jdk/pull/194 From adityam at openjdk.java.net Wed Sep 16 06:20:02 2020 From: adityam at openjdk.java.net (Aditya Mandaleeka) Date: Wed, 16 Sep 2020 06:20:02 GMT Subject: RFR: 8253146: C2: Purge unused MachCallNode::_arg_size field In-Reply-To: References: Message-ID: <7CyQxEy3_ckoojsRqz6mWA8elqfjQAQYvpAC2wH_M5I=.b5132c68-32e5-4405-a36c-9c0ce64abeab@github.com> On Tue, 15 Sep 2020 07:14:58 GMT, Aleksey Shipilev wrote: > While doing changes in related code, noticed that `MachCallNode::_arg_size` is computed needlessly, taking memory > without a good reason. Let's purge it. > Testing: text search for _argsize; Linux x86_64 {release,fastdebug,slowdebug} builds; Linux x86_64 tier1; Linux ppc64 > fastdebug build Marked as reviewed by adityam (Author). 
------------- PR: https://git.openjdk.java.net/jdk/pull/167 From shade at openjdk.java.net Wed Sep 16 06:42:03 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 16 Sep 2020 06:42:03 GMT Subject: Integrated: 8253146: C2: Purge unused MachCallNode::_arg_size field In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 07:14:58 GMT, Aleksey Shipilev wrote: > While doing changes in related code, noticed that `MachCallNode::_arg_size` is computed needlessly, taking memory > without a good reason. Let's purge it. > Testing: text search for _argsize; Linux x86_64 {release,fastdebug,slowdebug} builds; Linux x86_64 tier1; Linux ppc64 > fastdebug build This pull request has now been integrated. Changeset: 7c564e13 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/7c564e13 Stats: 9 lines in 3 files changed: 9 ins; 0 del; 0 mod 8253146: C2: Purge unused MachCallNode::_arg_size field Reviewed-by: thartmann, adityam ------------- PR: https://git.openjdk.java.net/jdk/pull/167 From github.com+2249648+JohnTortugo at openjdk.java.net Wed Sep 16 06:45:39 2020 From: github.com+2249648+JohnTortugo at openjdk.java.net (John Tortugo) Date: Wed, 16 Sep 2020 06:45:39 GMT Subject: Integrated: 8253040 : Remove unused Matcher::regnum_to_fpu_offset() In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 19:10:29 GMT, John Tortugo wrote: > Relates to: https://bugs.openjdk.java.net/browse/JDK-8253040 > Tested on: x86_64 - Linux - Tier1 > > Remove unused method `Matcher::regnum_to_fpu_offset()` from source base. This pull request has now been integrated. 
Changeset: fbf4699d Author: Cesar Committer: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/fbf4699d Stats: 33 lines in 7 files changed: 0 ins; 33 del; 0 mod 8253040: Remove unused Matcher::regnum_to_fpu_offset() Reviewed-by: adityam, vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/194 From github.com+8792647+robcasloz at openjdk.java.net Wed Sep 16 06:50:34 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 16 Sep 2020 06:50:34 GMT Subject: Integrated: 8252966: CI: Remove KILL_COMPILE_ON_FATAL_ and KILL_COMPILE_ON_ANY macros In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 17:41:07 GMT, Roberto Castañeda Lozano wrote: > Remove the `KILL_COMPILE_ON_FATAL_` and `KILL_COMPILE_ON_ANY` macros, replacing uses of `KILL_COMPILE_ON_FATAL_` with > `CHECK_AND_CLEAR_`. Unlike `KILL_COMPILE_ON_FATAL_`, `CHECK_AND_CLEAR_` ignores `ThreadDeath` exceptions, which > compiler threads should not receive anyway. This pull request has now been integrated. Changeset: efe3540d Author: Roberto Castaneda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/efe3540d Stats: 26 lines in 3 files changed: 0 ins; 23 del; 3 mod 8252966: CI: Remove KILL_COMPILE_ON_FATAL_ and KILL_COMPILE_ON_ANY macros Remove the KILL_COMPILE_ON_FATAL_ and KILL_COMPILE_ON_ANY macros, replacing uses of KILL_COMPILE_ON_FATAL_ with CHECK_AND_CLEAR_. Unlike KILL_COMPILE_ON_FATAL_, CHECK_AND_CLEAR_ ignores ThreadDeath exceptions, which compiler threads should not receive anyway. Reviewed-by: vlivanov, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/191 From dholmes at openjdk.java.net Wed Sep 16 07:00:55 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 16 Sep 2020 07:00:55 GMT Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions.
[v2] In-Reply-To: <_2RfxBOE39VhwtDZe2F2qLb52IfF_JiCWwE2cJsEuiM=.01bb1177-808a-45ea-a8bf-3dccfab6ea38@github.com> References: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> <_2RfxBOE39VhwtDZe2F2qLb52IfF_JiCWwE2cJsEuiM=.01bb1177-808a-45ea-a8bf-3dccfab6ea38@github.com> Message-ID: <8uyHRtbTr67w0rqGE-VS-SCGrD0uVnBNV7rURU6WZII=.424ce1f3-98fa-4bbe-b971-9df6fdef239b@github.com> On Tue, 15 Sep 2020 10:59:06 GMT, Jamsheed Mohammed C M wrote: >> Hi >> >> Moving the review that is based on mercurial repo to github. >> The history of conversation is >> [here](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039861.html) >> Issue:[ JDK-8249451 ](https://bugs.openjdk.java.net/browse/JDK-8249451) >> >> @dholmes-ora could you please have a look. > > Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: > > removing unused definition load_class_by_index Looks good to me! No further comments. src/hotspot/share/runtime/thread.cpp line 2392: > 2390: if (check_unsafe_error && > 2391: condition == _async_unsafe_access_error && !has_pending_exception()) { > 2392: // May be we are at method entry and requires to save do not unlock flag. Suggest: // We may be at method entry which requires we save the do-not-unlock flag. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/169 From jcm at openjdk.java.net Wed Sep 16 08:56:01 2020 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Wed, 16 Sep 2020 08:56:01 GMT Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions. 
[v2] In-Reply-To: <8uyHRtbTr67w0rqGE-VS-SCGrD0uVnBNV7rURU6WZII=.424ce1f3-98fa-4bbe-b971-9df6fdef239b@github.com> References: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> <_2RfxBOE39VhwtDZe2F2qLb52IfF_JiCWwE2cJsEuiM=.01bb1177-808a-45ea-a8bf-3dccfab6ea38@github.com> <8uyHRtbTr67w0rqGE-VS-SCGrD0uVnBNV7rURU6WZII=.424ce1f3-98fa-4bbe-b971-9df6fdef239b@github.com> Message-ID: On Wed, 16 Sep 2020 06:56:31 GMT, David Holmes wrote: >> Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: >> >> removing unused definition load_class_by_index > > src/hotspot/share/runtime/thread.cpp line 2392: > >> 2390: if (check_unsafe_error && >> 2391: condition == _async_unsafe_access_error && !has_pending_exception()) { >> 2392: // May be we are at method entry and requires to save do not unlock flag. > > Suggest: > // We may be at method entry which requires we save the do-not-unlock flag. Thank you @dholmes-ora , Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/169 From jcm at openjdk.java.net Wed Sep 16 09:09:45 2020 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Wed, 16 Sep 2020 09:09:45 GMT Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions. 
[v2] In-Reply-To: <8uyHRtbTr67w0rqGE-VS-SCGrD0uVnBNV7rURU6WZII=.424ce1f3-98fa-4bbe-b971-9df6fdef239b@github.com> References: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> <_2RfxBOE39VhwtDZe2F2qLb52IfF_JiCWwE2cJsEuiM=.01bb1177-808a-45ea-a8bf-3dccfab6ea38@github.com> <8uyHRtbTr67w0rqGE-VS-SCGrD0uVnBNV7rURU6WZII=.424ce1f3-98fa-4bbe-b971-9df6fdef239b@github.com> Message-ID: On Wed, 16 Sep 2020 06:58:18 GMT, David Holmes wrote: >> Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: >> >> removing unused definition load_class_by_index > > Looks good to me! No further comments. Could I get a second review? ------------- PR: https://git.openjdk.java.net/jdk/pull/169 From jcm at openjdk.java.net Wed Sep 16 09:09:43 2020 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Wed, 16 Sep 2020 09:09:43 GMT Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions. [v3] In-Reply-To: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> References: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> Message-ID: > Hi > > Moving the review that is based on mercurial repo to github. > The history of conversation is > [here](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039861.html) > Issue:[ JDK-8249451 ](https://bugs.openjdk.java.net/browse/JDK-8249451) > > @dholmes-ora could you please have a look.
Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: comment modified wrt review feedback ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/169/files - new: https://git.openjdk.java.net/jdk/pull/169/files/1c0786a5..506094bf Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=169&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=169&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/169.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/169/head:pull/169 PR: https://git.openjdk.java.net/jdk/pull/169 From jcm at openjdk.java.net Wed Sep 16 09:36:28 2020 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Wed, 16 Sep 2020 09:36:28 GMT Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions. [v4] In-Reply-To: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> References: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> Message-ID: > Hi > > Moving the review that is based on mercurial repo to github. > The history of conversation is > [here](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039861.html) > Issue:[ JDK-8249451 ](https://bugs.openjdk.java.net/browse/JDK-8249451) > > @dholmes-ora could you please have a look. Jamsheed Mohammed C M has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: comment modified wrt review feedback ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/169/files - new: https://git.openjdk.java.net/jdk/pull/169/files/506094bf..9777e8c4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=169&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=169&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/169.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/169/head:pull/169 PR: https://git.openjdk.java.net/jdk/pull/169 From jbhateja at openjdk.java.net Wed Sep 16 12:42:12 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 16 Sep 2020 12:42:12 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v3] In-Reply-To: References: Message-ID: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for sub-word type small array copy operations with size > less than 32 bytes. 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes > an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. 3) New runtime flag > ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. 
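The runtime check described in the summary can be sketched in portable C++. This is an emulation under stated assumptions, not the code C2 emits: `kPartialInlineSize` is a hypothetical stand-in for the `ArrayCopyPartialInlineSize` flag, the `<=` comparison is my guess at the boundary, `arraycopy_stub` stands in for the out-of-line stub, and the scalar loops model what a single AVX-512 masked load/store pair would do in hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the ArrayCopyPartialInlineSize flag, in bytes.
constexpr std::size_t kPartialInlineSize = 32;

// Byte mask with the low `len` bits set (one bit per byte lane), the shape
// a masked vector move consumes. Lengths of 64 or more select every lane.
inline std::uint64_t byte_mask(std::size_t len) {
    return len >= 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << len) - 1);
}

// Emulated masked load followed by masked store: only lanes whose mask bit
// is set are read or written, so no byte past the copy length is touched.
inline void masked_copy(std::uint8_t* dst, const std::uint8_t* src,
                        std::uint64_t mask) {
    std::uint8_t lanes[64] = {0};
    for (std::size_t i = 0; i < 64; ++i) {
        if (mask & (std::uint64_t{1} << i)) lanes[i] = src[i];
    }
    for (std::size_t i = 0; i < 64; ++i) {
        if (mask & (std::uint64_t{1} << i)) dst[i] = lanes[i];
    }
}

// Stand-in for the out-of-line array-copy stub.
inline void arraycopy_stub(std::uint8_t* dst, const std::uint8_t* src,
                           std::size_t len) {
    std::memmove(dst, src, len);
}

// Conditional check at the call site: short copies stay inline, longer
// ones pay the call. Returns true when the inline path was taken.
inline bool copy_bytes(std::uint8_t* dst, const std::uint8_t* src,
                       std::size_t len) {
    if (len <= kPartialInlineSize) {
        masked_copy(dst, src, byte_mask(len));
        return true;
    }
    arraycopy_stub(dst, src, len);
    return false;
}
```

The point of the mask is that one fixed-width instruction sequence handles every short length without a tail loop, which is where the saved call overhead for small copies would come from.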
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550835
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512 > ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831 > ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362 > ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974 > ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475 > ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064 > ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095 > ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621 > ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903 > ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081 > ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886 > ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287 > ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945 > ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067 > ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604 > ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204 > ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213 > ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003 > ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627 > ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174 > ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741 > ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313 > ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734 > ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345 > ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994 > ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803 > ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055 > ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957 > ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541 > 
ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416 > ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692 > ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774 > ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467 > ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137 > ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237 > ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138 > ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353 > ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523 > ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588 > ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453 > ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983 > ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321 > ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053 > ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507 > ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559 > ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747 > ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095 > ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273 > ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139 > ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629 > ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935 > ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385 > ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096 > ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975 > > Detailed Reports: > Baseline : > [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) > WithOpt : > 
> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - 8252848: Rebase patch with branch tip.
 - Update arraycopynode.cpp

   Missed safety check.
 - 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions.
 - 8252848: Strengthening the check to detect partially in-lined array copy before Memory Barrier.
 - 8252848: Updating pull request-144, added a safety check on node type during pattern matching in ArrayCopyNode::may_modify().
 - 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/144/files
  - new: https://git.openjdk.java.net/jdk/pull/144/files/f6c46479..b9eaa468

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=144&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=144&range=01-02

  Stats: 4673 lines in 137 files changed: 2387 ins; 1700 del; 586 mod
  Patch: https://git.openjdk.java.net/jdk/pull/144.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/144/head:pull/144

PR: https://git.openjdk.java.net/jdk/pull/144

From jbhateja at openjdk.java.net Wed Sep 16 12:52:53 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 16 Sep 2020 12:52:53 GMT
Subject: Withdrawn: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions
In-Reply-To:
References:
Message-ID:

On Sun, 13 Sep 2020 19:02:59 GMT, Jatin Bhateja wrote:

> Summary:
>
> 1) Partial in-lining technique avoids call overhead penalty for sub-word type small array copy operations with size less than 32 bytes.
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain (Baseline / Partial Inlining)
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550835
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline :
> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/144

From jatin.bhateja at intel.com Wed Sep 16 13:08:56 2020
From: jatin.bhateja at intel.com (Bhateja, Jatin)
Date: Wed, 16 Sep 2020 13:08:56 +0000
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v2]
In-Reply-To:
References:
Message-ID:

Hi Nils,

I have closed pull request 144 and will open a new one for partial in-lining.

There is a code overlap with PR-61 because both issues are sub-tasks of one parent JBS issue (JDK-8251871). Separate pull requests, PR-61 and PR-144, were created for the two sub-tasks (JDK-8252847 and JDK-8252848). To keep each patch complete on its own, some assembler routines are duplicated in both.

However, I guess it will be difficult to integrate them after review, since the bot may encounter merge conflicts.

Is there a way to get them reviewed in parallel as independent patches, without creating one unified patch?
Regards,
Jatin

> -----Original Message-----
> From: hotspot-compiler-dev On Behalf Of Nils Eliasson
> Sent: Tuesday, September 15, 2020 7:24 PM
> To: hotspot-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v2]
>
> On Tue, 15 Sep 2020 10:26:04 GMT, Jatin Bhateja wrote:
>
> >> Summary:
> >>
> >> 1) Partial in-lining technique avoids call overhead penalty for sub-word type small array copy operations with size less than 32 bytes.
> >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
> >>
> >> Performance Results:
> >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> >> ArrayCopyPartialInlineSize : 32
> >>
> >> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain (Baseline / Partial Inlining)
> >> -- | -- | -- | -- | --
> >> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> >> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> >> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> >> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> >> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> >> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> >> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> >> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> >> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> >> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> >> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> >> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> >> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> >> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> >> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> >> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> >> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> >> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> >> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> >> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> >> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> >> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> >> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> >> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> >> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550835
> >> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> >> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> >> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> >> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> >> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> >> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> >> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> >> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> >> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> >> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> >> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> >> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> >> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> >> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> >> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> >> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> >> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> >> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> >> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> >> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> >> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> >> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> >> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> >> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> >> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> >> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> >> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> >> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> >> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> >> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> >> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> >> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> >> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> >> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> >> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> >> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> >> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> >> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> >> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> >> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> >> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> >> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> >> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> >> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> >> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> >> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> >> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> >> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> >> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> >> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> >> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> >> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> >> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> >> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> >> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
> >>
> >> Detailed Reports:
> >> Baseline :
> >> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> >> WithOpt :
> >> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)
> >
> > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> >
> >   Update arraycopynode.cpp
> >
> >   Missed safety check.
>
> This PR includes the changes for JDK-8252847. It makes it hard to review.
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/144

From jbhateja at openjdk.java.net Thu Sep 17 05:16:52 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Thu, 17 Sep 2020 05:16:52 GMT
Subject: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v2]
In-Reply-To:
References:
Message-ID:

> Summary:
>
> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy.
> 2) Special instruction sequence blocks for copy sizes between 32 and 192 bytes.
> 3) Block copy operations above 192 bytes are performed using a destination-address-aligned PRE-MAIN-POST loop. The main loop copies 192 bytes per iteration, and the tail part is handled by the special instruction sequence blocks.
> 4) Both the small copy blocks and the aligned loop use 32-byte vector registers to prevent any frequency penalty for copy sizes less than AVX3Threshold.
> 5) For block sizes above AVX3Threshold, both the special blocks and the loop operate using 64-byte registers.
> 6) If the user sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient REP MOVS for copy sizes above 4096 bytes.
>
> JMH Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> Baseline : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt
> WithOpt : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8252847: Review comments resolution

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/61/files
  - new: https://git.openjdk.java.net/jdk/pull/61/files/b4f4081f..0cd42ccb

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=61&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=61&range=00-01

  Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/61.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/61/head:pull/61

PR: https://git.openjdk.java.net/jdk/pull/61

From jbhateja at openjdk.java.net Thu Sep 17 05:20:24 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Thu, 17 Sep 2020 05:20:24 GMT
Subject: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 15 Sep 2020 13:12:48 GMT, Nils Eliasson wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> 8252847: Review comments resolution
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 7971:
>
>> 7969: BasicType type, int offset, bool use64byteVector) {
>> 7970: assert(MaxVectorSize >= 32, "vector length < 32");
>> 7971: use64byteVector |= MaxVectorSize > 32 && AVX3Threshold == 0;
>
> When do you expect AVX3Threshold to be 0?

As of now, only when the user explicitly passes -XX:AVX3Threshold=0; the default value of AVX3Threshold is 4096.
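The size rules in points 4) to 6) of the summary, together with the quoted `use64byteVector` line, can be written down as a small decision function. The Java below is a reader's model built only from that text, not the stub generator's logic: the class, enum and method names are hypothetical, the handling of a copy size exactly equal to AVX3Threshold is an assumption (the summary says "less than" for 32-byte and "above" for 64-byte registers), and so is the reading that REP MOVS applies only when the maximum vector size is exactly 32 bytes.

```java
public class Avx3StubSketch {
    /** Hypothetical labels for the three copy strategies named in the summary. */
    public enum Strategy { VECTOR_32, VECTOR_64, REP_MOVS }

    // Point 6 of the summary: REP MOVS is used for copy sizes above 4096 bytes.
    static final int REP_MOVS_BYTES = 4096;

    public static Strategy choose(long copyBytes, boolean disjoint,
                                  int maxVectorSize, int avx3Threshold) {
        // Mirrors the quoted assert(MaxVectorSize >= 32, "vector length < 32").
        if (maxVectorSize < 32) {
            throw new IllegalArgumentException("vector length < 32");
        }
        if (maxVectorSize == 32) {
            // Point 6: with 32-byte vectors, large forward (disjoint) copies use REP MOVS.
            return (disjoint && copyBytes > REP_MOVS_BYTES)
                    ? Strategy.REP_MOVS : Strategy.VECTOR_32;
        }
        // Points 4 and 5, plus the quoted line: a threshold of 0 forces 64-byte vectors.
        // Assumption: a copy of exactly avx3Threshold bytes counts as "not less than".
        boolean use64 = avx3Threshold == 0 || copyBytes >= avx3Threshold;
        return use64 ? Strategy.VECTOR_64 : Strategy.VECTOR_32;
    }

    public static void main(String[] args) {
        System.out.println(choose(100, true, 64, 4096));   // small copy, default threshold
        System.out.println(choose(8192, true, 64, 4096));  // large copy, default threshold
        System.out.println(choose(100, true, 64, 0));      // -XX:AVX3Threshold=0
        System.out.println(choose(8192, true, 32, 4096));  // MaxVectorSize=32, disjoint
    }
}
```

Laying the rules out this way also shows how many combinations of MaxVectorSize, AVX3Threshold and copy size there are to cover in testing.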
-------------

PR: https://git.openjdk.java.net/jdk/pull/61

From felix.yang at huawei.com Thu Sep 17 06:40:56 2020
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Thu, 17 Sep 2020 06:40:56 +0000
Subject: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic
In-Reply-To:
References: <1729f1b1-056d-76c9-c820-d38bd6c1235d@redhat.com> <95965aeb-9d97-3b27-e684-967b6155eb34@redhat.com>
Message-ID:

Hi all,

> -----Original Message-----
> From: Sean Mullan [mailto:sean.mullan at oracle.com]
> Sent: Wednesday, September 9, 2020 12:16 AM
> To: Andrew Haley ; Yangfei (Felix) ; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net
> Cc: aarch64-port-dev at openjdk.java.net; Stuart Monteith
> Subject: Re: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic
>
> Since this change affects security code, please make sure you add security-dev at openjdk.java.net on any followup code reviews.

Thanks for the reminder.
I have just switched to GitHub and created a PR for this issue: https://github.com/openjdk/jdk/pull/207
Let's switch to this PR for the follow-up code reviews.
I see that the security and hotspot labels were added automatically by the bots. I also added labels for core-libs and hotspot-compiler.

>
> On 9/1/20 10:44 AM, Andrew Haley wrote:
> > On 01/09/2020 11:53, Yangfei (Felix) wrote:
> >> Sure, I am happy if the original author of the assembly code or someone else from Linaro could help here.
> >> I wasn't aware there was such a requirement here given that the assembly code is licensed under GPL.
> >
> > There sure is. All code must be contributed by its owner and put on
> > the cr.openjdk site. Especially GPL code.

I have added ard.biesheuvel at linaro.org to Contributed-by: in the git commit message.
The newly created PR was also Acked-by: Ard Biesheuvel ard.biesheuvel at linaro.org
I hope this addresses Andrew's concern.
Best regards,
Felix

From neliasso at openjdk.java.net Thu Sep 17 13:24:54 2020
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Thu, 17 Sep 2020 13:24:54 GMT
Subject: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 17 Sep 2020 05:16:52 GMT, Jatin Bhateja wrote:

>> Summary:
>>
>> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy.
>> 2) Special instruction sequence blocks for copy sizes between 32 and 192 bytes.
>> 3) Block copy operations above 192 bytes are performed using a destination-address-aligned PRE-MAIN-POST loop. The main loop copies 192 bytes per iteration, and the tail part is handled by the special instruction sequence blocks.
>> 4) Both the small copy blocks and the aligned loop use 32-byte vector registers to prevent any frequency penalty for copy sizes less than AVX3Threshold.
>> 5) For block sizes above AVX3Threshold, both the special blocks and the loop operate using 64-byte registers.
>> 6) If the user sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient REP MOVS for copy sizes above 4096 bytes.
>>
>> JMH Results:
>> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>> Baseline : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt
>> WithOpt : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> 8252847: Review comments resolution

I like that you have extracted the avx512 stub code from the rest - that makes it a lot more readable! Overall the new code feels easy to understand and read. I found one more minor issue (it appears in four places).

My only concern is that it's getting hard to follow under what circumstances avx3 instructions are used:

Could it be the case that different thresholds are needed for when you are using avx3 instructions with 32 or 64 byte vectors?

Are we sure all variants are tested?

Also - have you thought about supporting oop-copies? You only have to call the BarrierSetAssembler::arraycopy_prologue/epilogue like in the old versions. It's not a requirement for me to approve this - but an encouragement for a future patch.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2390:

> 2388: address *entry, const char *name,
> 2389: bool dest_uninitialized = false) {
> 2390: if (VM_Version::supports_avx512vlbw() && false == is_oop && MaxVectorSize >= 32) {

"false == is_oop" => !is_oop

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2501:

> 2499: address generate_disjoint_long_oop_copy(bool aligned, bool is_oop, address *entry,
> 2500: const char *name, bool dest_uninitialized = false) {
> 2501: if (VM_Version::supports_avx512vlbw() && false == is_oop && MaxVectorSize >= 32) {

false == is_oop => !is_oop

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2608:

> 2606: address nooverlap_target, address *entry,
> 2607: const char *name, bool dest_uninitialized = false) {
> 2608: if (VM_Version::supports_avx512vlbw() && false == is_oop && MaxVectorSize >= 32) {

false == is_oop => !is_oop

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2282:

> 2280: address generate_disjoint_int_oop_copy(bool aligned, bool is_oop, address* entry,
> 2281: const char *name, bool dest_uninitialized = false) {
> 2282: if (VM_Version::supports_avx512vlbw() && false == is_oop && MaxVectorSize >= 32) {

false == is_oop => !is_oop

-------------

Changes requested by neliasso (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/61

From dnsimon at openjdk.java.net Thu Sep 17 14:29:34 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Thu, 17 Sep 2020 14:29:34 GMT
Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread
In-Reply-To:
References:
Message-ID: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com>

On Thu, 10 Sep 2020 11:52:34 GMT, Allan Gregersen wrote:

> The idea is to add a more powerful API for cases where the current iterateFrames API cannot be used.
>
> For example, a debugger needs access to the content of stack frames such as local variables or monitors. In cases where threads execute in the runtime or in native code, it's not possible to obtain a thread suspension hook, for which iterateFrames can be used on the suspended thread. The getStackFrames method enables an immediate stack frames lookup regardless of the status of the underlying thread. Another use case would be for lookup of backtraces for non-current threads. The implementation is done by means of a VM operation that collects vframe data for each thread during a safepoint, whereafter required object reallocation/reassign fields is performed based on the collected snapshot.

@vnkozlov @veresov @dean-long it would be great to get some reviews of this soon.

To assist, let me try to provide a little more context on the area of JVMCI which is motivated by [Truffle](https://www.graalvm.org/truffle/javadoc/com/oracle/truffle/api/package-summary.html).

The [`StackIntrospection.iterateFrames()`](https://github.com/openjdk/jdk/blob/6bab0f539fba8fb441697846347597b4a0ade428/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/src/jdk/vm/ci/code/stack/StackIntrospection.java#L45) API exists for Truffle to support walking guest language frames. In Truffle, all values, including primitives, are boxed, which explains why there is no `InspectedFrame.setLocal(int index, Object value)` method. All Truffle guest language frame locals are accessed/updated by reading/updating the boxed value returned by `InspectedFrame.getLocal()`. With this existing API, a thread can only inspect its own stack frames.

As the description of this PR states, it extends `StackIntrospection.iterateFrames()` with `StackIntrospection.getStackFrames()` so that a thread can inspect the frames of other threads. It is somewhat analogous to `java.lang.Thread.getAllStackTraces()` except it allows the local variables of the frames to be accessed as well.

src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/src/jdk/vm/ci/code/stack/StackIntrospection.java line 52:

> 50: * that any client inspecting and possibly mutating the frame contents will do so under the
> 51: * assumption that the underlying threads might have continued, executing potentially
> 52: * invalidating the frame state.

"... might have continued, executing potentially invalidating the frame state."

Not sure what this is trying to say. Maybe you mean something like "... might have continued, in which case mutations are not reflected in the running thread state."

src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/src/jdk/vm/ci/code/stack/StackIntrospection.java line 55:

> 53: *
> 54: * Note that the locals of the {@link InspectedFrame}s will be collected as copies when the
> 55: * underlying frame was compiled, whereas they'll be references for interpreted frames. Use

"will be collected as copies when the underlying frame was compiled" - what does that mean?
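For readers unfamiliar with the analogy, the standard `java.lang.Thread.getAllStackTraces()` call mentioned above can be exercised as follows. This is a minimal sketch using only the public JDK API: it returns `StackTraceElement` snapshots for other threads but, unlike the proposed JVMCI method, gives no access to locals. The class name, the helper `stackOf` and the worker thread are illustrative only.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;

public class OtherThreadStacks {
    /** Returns a stack snapshot for the given thread, or null if it is not alive. */
    public static StackTraceElement[] stackOf(Thread target) {
        // One call captures a snapshot for every live thread, not only the caller.
        Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
        return all.get(target);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // A worker that blocks until released, so it has a stable stack to look at.
        Thread worker = new Thread(() -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "worker");
        worker.start();
        started.await();

        // Inspect the worker's frames from the main thread.
        StackTraceElement[] frames = stackOf(worker);
        if (frames != null) {
            for (StackTraceElement frame : frames) {
                System.out.println(frame);
            }
        }

        release.countDown();
        worker.join();
    }
}
```

The snapshot only describes where the worker was when it was taken; the worker may have moved on by the time the frames are printed, which is the same caveat the reviewed Javadoc is trying to express.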
------------- PR: https://git.openjdk.java.net/jdk/pull/110 From github.com+8448088+ardbiesheuvel at openjdk.java.net Thu Sep 17 15:35:28 2020 From: github.com+8448088+ardbiesheuvel at openjdk.java.net (Ard Biesheuvel) Date: Thu, 17 Sep 2020 15:35:28 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic In-Reply-To: References: Message-ID: On Thu, 17 Sep 2020 05:18:28 GMT, Fei Yang wrote: >> Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com >> >> This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. >> Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 >> >> Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. >> For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. >> "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. >> >> We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. >> Patch passed jtreg tier1-3 tests with QEMU system emulator. >> Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make >> sure that there's no regression. >> We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java >> We measured the performance benefit with an aarch64 cycle-accurate simulator. >> Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. >> >> For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on >> real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is >> detected. 
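The block-size calculation from "digestLength" mentioned in the description follows from the SHA-3 parameters: the Keccak state is 200 bytes and the capacity is twice the digest length, so the block size (rate) is 200 - 2 * digestLength bytes. A minimal sketch of that arithmetic, with a made-up class name; it is not the stub's code:

```java
public class Sha3BlockSizeSketch {
    static final int STATE_BYTES = 200; // Keccak-f[1600] state size in bytes

    // digestLength is in bytes: 28, 32, 48 or 64 for SHA3-224/256/384/512.
    static int blockSize(int digestLength) {
        return STATE_BYTES - 2 * digestLength;
    }

    public static void main(String[] args) {
        for (int d : new int[] {28, 32, 48, 64}) {
            System.out.println("SHA3-" + (d * 8) + ": block size " + blockSize(d) + " bytes");
        }
    }
}
```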
> > @ardbiesheuvel : Ard, could you please ack this patch? Thanks. Acked-by: Ard Biesheuvel ------------- PR: https://git.openjdk.java.net/jdk/pull/207 From github.com+70893615+jasontatton-aws at openjdk.java.net Thu Sep 17 18:22:35 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Thu, 17 Sep 2020 18:22:35 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> On Fri, 11 Sep 2020 23:04:01 GMT, Jason Tatton wrote: >> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for >> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it >> I had to make a small change to StringLatin1.java (refactor of functionality to intrinsified private method) as well as >> code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and >> java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585 >> >> Details of testing: >> ============ >> I have created a jtreg test "compiler/intrinsics/string/TestStringLatin1IndexOfChar" to cover this new intrinsic. Note >> that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the >> input String. Hence the test has been designed to cover all these cases. In summary they are: >> - A "short" string of < 16 characters. >> - A SIMD String of 16 - 31 characters. >> - An AVX2 SIMD String of 32 characters+. >> >> Hardware used for testing: >> ----------------------------- >> >> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) - Intel i7 processor (with AVX2 support). >> - AWS Graviton 2 (ARM 64 processor).
>> >> I also ran; "run-test-tier1" and "run-test-tier2" for: x86_64 and aarch64. >> >> Possible future enhancements: >> ==================== >> For the x86 implementation there may be two further improvements we can make in order to improve performance of both >> the StringUTF16 and StringLatin1 indexOf(char) intrinsics: >> 1. Make use of AVX-512 instructions. >> 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD >> instructions instead of a loop. >> Benchmark results: >> ============ >> **Without** the new StringLatin1 indexOf(char) intrinsic: >> >> | Benchmark | Mode | Cnt | Score | Error | Units | >> | ------------- | ------------- |------------- |------------- |------------- |------------- | >> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | ns/op | >> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933 | ns/op | >> >> >> **With** the new StringLatin1 indexOf(char) intrinsic: >> >> | Benchmark | Mode | Cnt | Score | Error | Units | >> | ------------- | ------------- |------------- |------------- |------------- |------------- | >> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 | ns/op | >> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | ns/op | >> >> >> The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 >> indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when >> running on ARM. > > Hi Andrew, > > The current indexOf char intrinsics for StringUTF16 and the new one here for StringLatin1 both use the AVX2 - i.e. 256 > bit instructions, these are also affected by the frequency scaling which affects the AVX-512 instructions you pointed > out.
Of course in a world where all the work taking place is AVX instructions this wouldn't be an issue but in mixed > mode execution this is a problem. However, the compiler does have knowledge of the capability of the CPU upon which > it's optimizing code for and is able to decide whether to use AVX instructions if they are supported by the CPU AND if > it wouldn't be detrimental for performance. In fact, there is a flag which one can use to interact with > this: -XX:UseAVX=version. This of course made testing this patch an interesting experience as the AVX2 instructions > were not enabled on the Xeon processors which I had access to at AWS, but in the end I was able to use an i7 on my > corporate macbook to validate the code. From: mlbridge[bot] Sent: 11 September 2020 17:01 > To: openjdk/jdk Cc: Tatton, Jason ; Mention > Subject: Re: [openjdk/jdk] 8173585: Intrinsify StringLatin1.indexOf(char) (#71) > > > Mailing list message from Andrew Haley on hotspot-dev: > > On 11/09/2020 11:23, Jason Tatton wrote: > > For the x86 implementation there may be two further improvements we > can make in order to improve performance of both the StringUTF16 and > StringLatin1 indexOf(char) intrinsics: > > 1. Make use of AVX-512 instructions. > > Is this really a good idea? > > When the processor detects Intel AVX instructions, additional > voltage is applied to the core. With the additional voltage applied, > the processor could run hotter, requiring the operating frequency to > be reduced to maintain operations within the TDP limits. The higher > voltage is maintained for 1 millisecond after the last Intel AVX > instruction completes, and then the voltage returns to the nominal > TDP voltage level. > > https://computing.llnl.gov/tutorials/linux_clusters/intelAVXperformanceWhitePaper.pdf > > So, if StringLatin1.indexOf(char) executes enough to make a difference > to any real-world program, the effect may well be to slow down the clock > for all of the code that does not use AVX.
> > 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD > instructions instead of a loop. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 Hi everyone, This patch is just missing a couple of reviewers... Please can someone step forward? I think it's a fairly straightforward change. -Jason ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From iveresov at openjdk.java.net Thu Sep 17 21:04:32 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Thu, 17 Sep 2020 21:04:32 GMT Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions. [v4] In-Reply-To: References: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> Message-ID: On Wed, 16 Sep 2020 09:36:28 GMT, Jamsheed Mohammed C M wrote: >> Hi >> >> Moving the review that is based on mercurial repo to github. >> The history of conversation is >> [here](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039861.html) >> Issue:[ JDK-8249451 ](https://bugs.openjdk.java.net/browse/JDK-8249451) >> >> @dholmes-ora could you please have a look. > > Jamsheed Mohammed C M has refreshed the contents of this pull request, and previous commits have been removed. The > incremental views will show differences compared to the previous content of the PR.
The pull request contains one new > commit since the last revision: > comment modified wrt review feedback src/hotspot/share/prims/whitebox.cpp line 1056: > 1054: > 1055: // Compile method and check result > 1056: nmethod* nm = CompileBroker::compile_method(mh, bci, comp_level, mh, mh->invocation_count(), > CompileTask::Reason_Whitebox, CHECK_false); Shouldn't this be CHECK_NULL ? The function returns a pointer. ------------- PR: https://git.openjdk.java.net/jdk/pull/169 From jcm at openjdk.java.net Fri Sep 18 05:34:30 2020 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Fri, 18 Sep 2020 05:34:30 GMT Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions. [v4] In-Reply-To: References: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> Message-ID: On Wed, 16 Sep 2020 18:17:41 GMT, Igor Veresov wrote: >> Jamsheed Mohammed C M has refreshed the contents of this pull request, and previous commits have been removed. The >> incremental views will show differences compared to the previous content of the PR. > > src/hotspot/share/prims/whitebox.cpp line 1056: > >> 1054: >> 1055: // Compile method and check result >> 1056: nmethod* nm = CompileBroker::compile_method(mh, bci, comp_level, mh, mh->invocation_count(), >> CompileTask::Reason_Whitebox, CHECK_false); > > Shouldn't this be CHECK_NULL ? The function returns a pointer. bool WhiteBox::compile_method returns a bool ------------- PR: https://git.openjdk.java.net/jdk/pull/169 From iveresov at openjdk.java.net Fri Sep 18 05:34:29 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 18 Sep 2020 05:34:29 GMT Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions. 
[v4] In-Reply-To: References: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> Message-ID: On Wed, 16 Sep 2020 09:36:28 GMT, Jamsheed Mohammed C M wrote: >> Hi >> >> Moving the review that is based on mercurial repo to github. >> The history of conversation is >> [here](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039861.html) >> Issue:[ JDK-8249451 ](https://bugs.openjdk.java.net/browse/JDK-8249451) >> >> @dholmes-ora could you please have a look. > > Jamsheed Mohammed C M has refreshed the contents of this pull request, and previous commits have been removed. The > incremental views will show differences compared to the previous content of the PR. Marked as reviewed by iveresov (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/169 From iveresov at openjdk.java.net Fri Sep 18 05:34:30 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 18 Sep 2020 05:34:30 GMT Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions. [v4] In-Reply-To: References: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> Message-ID: On Fri, 18 Sep 2020 05:29:36 GMT, Jamsheed Mohammed C M wrote: >> src/hotspot/share/prims/whitebox.cpp line 1056: >> >>> 1054: >>> 1055: // Compile method and check result >>> 1056: nmethod* nm = CompileBroker::compile_method(mh, bci, comp_level, mh, mh->invocation_count(), >>> CompileTask::Reason_Whitebox, CHECK_false); >> >> Shouldn't this be CHECK_NULL ? The function returns a pointer. > > bool WhiteBox::compile_method returns a bool Yes, you're right of course. Reviewed. 
------------- PR: https://git.openjdk.java.net/jdk/pull/169 From jcm at openjdk.java.net Fri Sep 18 05:51:07 2020 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Fri, 18 Sep 2020 05:51:07 GMT Subject: Integrated: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions. In-Reply-To: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> References: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> Message-ID: <9JLPHYjgFC7rjN0KZsRYO-biZ8dAGpqvaSiPLUr8hTw=.e0c6159b-0c6f-4f0c-92b9-cdf3a97553b7@github.com> On Tue, 15 Sep 2020 08:35:01 GMT, Jamsheed Mohammed C M wrote: > Hi > > Moving the review that is based on mercurial repo to github. > The history of conversation is > [here](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039861.html) > Issue:[ JDK-8249451 ](https://bugs.openjdk.java.net/browse/JDK-8249451) > > @dholmes-ora could you please have a look. This pull request has now been integrated. Changeset: 73c9088b Author: Jamsheed Mohammed C M URL: https://git.openjdk.java.net/jdk/commit/73c9088b Stats: 218 lines in 21 files changed: 39 ins; 104 del; 75 mod 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions. 
Reviewed-by: dholmes, iveresov ------------- PR: https://git.openjdk.java.net/jdk/pull/169 From adinn at redhat.com Fri Sep 18 08:48:42 2020 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 18 Sep 2020 09:48:42 +0100 Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) In-Reply-To: <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> Message-ID: On 17/09/2020 19:22, Jason Tatton wrote: > This patch is just missing a couple of reviewers... Please can someone step forward? > > I think it's a fairly straightforward change. I believe you got a review from Andrew Haley -- it was quoted in your follow-up from which I selected the above response. What you did not get was license to proceed and push this change. That's because what is actually missing is the justification he asked for. As Andrew pointed out the change is simple but the reason for implementing it is not. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From jbhateja at openjdk.java.net Fri Sep 18 08:52:30 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 18 Sep 2020 08:52:30 GMT Subject: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v3] In-Reply-To: References: Message-ID: > Summary: > > 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy. > 2) Special instruction sequence blocks for copy sizes b/w 32-192 bytes. > 3) Block copy operation above 192 bytes is performed using destination address aligned PRE-MAIN-POST loop. Main loop > copies 192 byte in one iteration and tail part fall over special instruction sequence blocks. 
4) Both small copy block > and aligned loop use 32 byte vector register to prevent any frequency penalty for copy sizes less than AVX3Threshold. > 5) For block size above AVX3Threshold both special blocks and loop operate using 64 byte register. 6) In case user > sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient REP MOVS for copy > sizes above 4096 bytes. JMH Results: > System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz > Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java > Baseline : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt > WithOpt : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8252847: Review comments resolution ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/61/files - new: https://git.openjdk.java.net/jdk/pull/61/files/0cd42ccb..271b6457 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=61&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=61&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/61.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/61/head:pull/61 PR: https://git.openjdk.java.net/jdk/pull/61 From jbhateja at openjdk.java.net Fri Sep 18 08:52:30 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 18 Sep 2020 08:52:30 GMT Subject: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v2] In-Reply-To: References: Message-ID: On Thu, 17 Sep 2020 13:22:07 GMT, Nils Eliasson wrote: > My only concern is that it's getting hard to follow under what circumstances avx3 instructions are used: > Could it be the case that different thresholds are needed for when you are using avx3 instructions with 32 or 64
byte > vectors? Are we sure all variants are tested? The following two runtime flags influence the implementation: - MaxVectorSize: Determined during VM initialization based on the CPUID of the target. - AVX3Threshold: Set to a default value of 4096 bytes based on prior performance analysis. The following general rules were followed during implementation: 1) If the target supports AVX3 features (BW+VL+F), the copy uses 32 byte vectors (YMMs) for both the special cases and the aligned copy loop. This is the default configuration. 2) If the copy length is above AVX3Threshold, we can safely use 64 byte vectors (ZMMs) for the main copy loop (and tail), since the bulk of the cycles will be consumed there. 3) Leaf-level macro assembler routines can dynamically choose between YMM and ZMM registers based on the AVX3Threshold value. 4) If the user forces MaxVectorSize=32, REP MOVS is seen to perform better for disjoint copies above 4096 bytes. For conjoint/backward copies, vector-based copy performs better. Thus, for 32 byte vectors we do not need any threshold, since they execute at the maximum frequency level. tier1, tier2 and tier3 did not show any new issues with the changes. > Also - have you thought about supporting oop-copies? You only have to call the We may not see a significant performance improvement, considering that the prologue and epilogue barriers do considerable processing over object arrays. ------------- PR: https://git.openjdk.java.net/jdk/pull/61 From jbhateja at openjdk.java.net Fri Sep 18 09:10:05 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 18 Sep 2020 09:10:05 GMT Subject: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v2] In-Reply-To: References: Message-ID: On Thu, 17 Sep 2020 13:22:07 GMT, Nils Eliasson wrote: > BarrierSetAssembler::arraycopy_prologue/epilogue like in the old versions. It's not a requirement for me to approve > this - but an encouragement for a future patch.
Yes, that's a good pointer; I can explore extending existing GC barriers for array copy as a separate next step. ------------- PR: https://git.openjdk.java.net/jdk/pull/61 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 18 10:57:44 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 18 Sep 2020 10:57:44 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing Message-ID: Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the seed. In either case, the seed is logged if `LogCompilation` is enabled. The generation or specification of seeds also affects the randomization triggered by `StressLCM` and `StressGCM`. The new options are declared as production+diagnostic for consistency with these existing options. ------------- Commit messages: - 8252219: C2: Randomize IGVN worklist for stress testing Changes: https://git.openjdk.java.net/jdk/pull/242/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252219 Stats: 172 lines in 7 files changed: 172 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/242.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/242/head:pull/242 PR: https://git.openjdk.java.net/jdk/pull/242 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 18 10:57:44 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 18 Sep 2020 10:57:44 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing In-Reply-To: References: Message-ID: On Fri, 18 Sep 2020 10:28:28 GMT, Roberto Castañeda Lozano wrote: > Add `StressIGVN` option to let C2 randomize IGVN worklist order.
When enabled, the worklist is shuffled before each > main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the > seed. In either case, the seed is logged if `LogCompilation` is enabled. The generation or specification of seeds also > affects the randomization triggered by `StressLCM` and `StressGCM`. The new options are declared as > production+diagnostic for consistency with these existing options. Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the seed. In either case, the seed is logged if `LogCompilation` is enabled. The generation or specification of seeds also affects the randomization triggered by `StressLCM` and `StressGCM`. The new options are declared as production+diagnostic for consistency with these existing options. ------------- PR: https://git.openjdk.java.net/jdk/pull/242 From github.com+70893615+jasontatton-aws at openjdk.java.net Fri Sep 18 11:06:55 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Fri, 18 Sep 2020 11:06:55 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) In-Reply-To: <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> Message-ID: <-jCc20O2YVYuUT1bu-9h-nHSx0zP35TMleyo1-fdlRI=.f8e3fe1f-3a7e-44a7-b24b-8c809cf727c8@github.com> On Thu, 17 Sep 2020 18:20:08 GMT, Jason Tatton wrote: >> Hi Andrew, >> >> The current indexOf char intrinsics for StringUTF16 and the new one here for StringLatin1 both use the AVX2 - i.e.
256 >> bit instructions, these are also affected by the frequency scaling which affects the AVX-512 instructions you pointed >> out. Of course in a world where all the work taking place is AVX instructions this wouldn't be an issue but in mixed >> mode execution this is a problem. However, the compiler does have knowledge of the capability of the CPU upon which >> it's optimizing code for and is able to decide whether to use AVX instructions if they are supported by the CPU AND if >> it wouldn't be detrimental for performance. In fact, there is a flag which one can use to interact with >> this: -XX:UseAVX=version. This of course made testing this patch an interesting experience as the AVX2 instructions >> were not enabled on the Xeon processors which I had access to at AWS, but in the end I was able to use an i7 on my >> corporate macbook to validate the code. From: mlbridge[bot] Sent: 11 September 2020 17:01 >> To: openjdk/jdk Cc: Tatton, Jason ; Mention >> Subject: Re: [openjdk/jdk] 8173585: Intrinsify StringLatin1.indexOf(char) (#71) >> >> >> Mailing list message from Andrew Haley on hotspot-dev: >> >> On 11/09/2020 11:23, Jason Tatton wrote: >> >> For the x86 implementation there may be two further improvements we >> can make in order to improve performance of both the StringUTF16 and >> StringLatin1 indexOf(char) intrinsics: >> >> 1. Make use of AVX-512 instructions. >> >> Is this really a good idea? >> >> When the processor detects Intel AVX instructions, additional >> voltage is applied to the core. With the additional voltage applied, >> the processor could run hotter, requiring the operating frequency to >> be reduced to maintain operations within the TDP limits. The higher >> voltage is maintained for 1 millisecond after the last Intel AVX >> instruction completes, and then the voltage returns to the nominal >> TDP voltage level.
>> >> https://computing.llnl.gov/tutorials/linux_clusters/intelAVXperformanceWhitePaper.pdf >> >> So, if StringLatin1.indexOf(char) executes enough to make a difference >> to any real-world program, the effect may well be to slow down the clock >> for all of the code that does not use AVX. >> >> 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD >> instructions instead of a loop. >> >> -- >> Andrew Haley (he/him) >> Java Platform Lead Engineer >> Red Hat UK Ltd. >> https://keybase.io/andrewhaley >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > Hi everyone, > > This patch is just missing a couple of reviewers... Please can someone step forward? > > I think it's a fairly straightforward change. > > -Jason Hi Andrew, Thanks for coming back to me. Looking at the GitHub [PR](https://github.com/openjdk/jdk/pull/71), nobody is tagged as a reviewer for this (perhaps this is a feature which is not being used). > That's > because what is actually missing is the justification he asked for. As > Andrew pointed out the change is simple but the reason for implementing > it is not. There are two separate things here: 1). Justification for the change itself: -The objective and justification for this patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 indexOf(char) for x86 and ARM64. This solves the problem as raised in [JDK-8173585](https://bugs.openjdk.java.net/browse/JDK-8173585), and also on the [mailing list](http://mail.openjdk.java.net/pipermail/jdk9-dev/2017-January/005539.html). 2). Discussion around future enhancements - concerning potential use of AVX-512 instructions and a more optimal implementation for short strings. -These would be separate JBS's I'm not advocating for/against this, they are just ideas separate from this JBS.
------------- PR: https://git.openjdk.java.net/jdk/pull/71 From neliasso at openjdk.java.net Fri Sep 18 11:22:43 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 18 Sep 2020 11:22:43 GMT Subject: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v3] In-Reply-To: References: Message-ID: On Fri, 18 Sep 2020 08:52:30 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy. >> 2) Special instruction sequence blocks for copy sizes b/w 32-192 bytes. >> 3) Block copy operation above 192 bytes is performed using destination address aligned PRE-MAIN-POST loop. Main loop >> copies 192 byte in one iteration and tail part fall over special instruction sequence blocks. 4) Both small copy block >> and aligned loop use 32 byte vector register to prevent any frequency penalty for copy sizes less than AVX3Threshold. >> 5) For block size above AVX3Threshold both special blocks and loop operate using 64 byte register. 6) In case user >> sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient REP MOVS for copy >> sizes above 4096 bytes. JMH Results: >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java >> Baseline : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt >> WithOpt : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8252847: Review comments resolution Thanks for the clarification and update! Reviewed. ------------- Marked as reviewed by neliasso (Reviewer).
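As a point of reference for this discussion, the result the intrinsic has to reproduce can be written as a plain byte scan. The sketch below is not the JDK's `StringLatin1` code; the class name, method names and the `filler` helper are hypothetical. Its `main` exercises one string from each of the three length classes listed in the testing notes (fewer than 16, 16 to 31, and 32 or more characters).

```java
import java.nio.charset.StandardCharsets;

public class Latin1IndexOfSketch {
    // Scalar reference: index of the first byte equal to ch at or after fromIndex, else -1.
    static int indexOf(byte[] value, int ch, int fromIndex) {
        if ((ch >>> 8) != 0) {
            return -1; // ch is outside Latin-1, so it cannot occur in a 1-byte-per-char string
        }
        byte target = (byte) ch;
        for (int i = Math.max(fromIndex, 0); i < value.length; i++) {
            if (value[i] == target) {
                return i;
            }
        }
        return -1;
    }

    // Convenience wrapper for strings that are encodable as Latin-1.
    static int indexOf(String s, int ch) {
        return indexOf(s.getBytes(StandardCharsets.ISO_8859_1), ch, 0);
    }

    // Hypothetical helper: a string of n 'a' characters.
    static String filler(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One input per length class: < 16, 16 to 31, and 32+ characters.
        for (int n : new int[] {8, 20, 40}) {
            String s = filler(n) + "x";
            System.out.println("length " + s.length() + ": index of 'x' = " + indexOf(s, 'x'));
        }
    }
}
```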
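The selection rules described in this thread can be condensed into a small decision function. This is only one reading of the prose, written in Java for illustration; the class, enum and parameter names are made up, the real stubs are generated by the macro assembler, and the sketch ignores the split between the 32-192 byte special blocks and the aligned loop.

```java
public class ArrayCopyStrategySketch {
    enum Strategy { YMM_32_BYTE, ZMM_64_BYTE, REP_MOVS }

    // Size above which REP MOVS is reported to win for disjoint copies (bytes).
    static final long REP_MOVS_CUTOFF = 4096;

    // maxVectorSize and avx3Threshold stand in for the MaxVectorSize and
    // AVX3Threshold flags; disjoint is true for forward (non-overlapping) copies.
    static Strategy choose(long copyBytes, int maxVectorSize, long avx3Threshold, boolean disjoint) {
        if (maxVectorSize <= 32) {
            // Only 32-byte vectors allowed: large forward copies go to REP MOVS,
            // everything else (including conjoint copies) stays on YMM.
            return (disjoint && copyBytes > REP_MOVS_CUTOFF) ? Strategy.REP_MOVS : Strategy.YMM_32_BYTE;
        }
        // 64-byte vectors available: use them only above the threshold,
        // so that smaller copies avoid the frequency penalty.
        return copyBytes > avx3Threshold ? Strategy.ZMM_64_BYTE : Strategy.YMM_32_BYTE;
    }

    public static void main(String[] args) {
        System.out.println(choose(100, 64, 4096, true));
        System.out.println(choose(8192, 64, 4096, true));
        System.out.println(choose(8192, 32, 4096, true));
        System.out.println(choose(8192, 32, 4096, false));
    }
}
```

Keeping the threshold as a parameter mirrors the point made above that it is a tunable flag rather than a fixed constant.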
PR: https://git.openjdk.java.net/jdk/pull/61 From github.com+70893615+jasontatton-aws at openjdk.java.net Fri Sep 18 11:56:09 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Fri, 18 Sep 2020 11:56:09 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2] In-Reply-To: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: <-DRWR4_f5u6DsSGHAuPnpHrhaG8Una8BXf4zDekQjLM=.469b08b6-a8b2-4a6b-8ab8-1a40810aede0@github.com> > This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for > x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it > I had to make a small change to StringLatin1.java (refactor of functionality to intrinsified private method) as well as > code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and > java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585 > > Details of testing: > ============ > I have created a jtreg test "compiler/intrinsics/string/TestStringLatin1IndexOfChar" to cover this new intrinsic. Note > that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the > input String. Hence the test has been designed to cover all these cases. In summary they are: > - A "short" string of < 16 characters. > - A SIMD String of 16 - 31 characters. > - An AVX2 SIMD String of 32 characters+. > > Hardware used for testing: > ----------------------------- > > - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) - Intel i7 processor (with AVX2 support). > - AWS Graviton 2 (ARM 64 processor). > > I also ran; "run-test-tier1" and "run-test-tier2" for: x86_64 and aarch64.
> > Possible future enhancements: > ==================== > For the x86 implementation there may be two further improvements we can make in order to improve performance of both > the StringUTF16 and StringLatin1 indexOf(char) intrinsics: > 1. Make use of AVX-512 instructions. > 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD > instructions instead of a loop. > Benchmark results: > ============ > **Without** the new StringLatin1 indexOf(char) intrinsic: > > | Benchmark | Mode | Cnt | Score | Error | Units | > | ------------- | ------------- |------------- |------------- |------------- |------------- | > | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | ns/op | > | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933 | ns/op | > > > **With** the new StringLatin1 indexOf(char) intrinsic: > > | Benchmark | Mode | Cnt | Score | Error | Units | > | ------------- | ------------- |------------- |------------- |------------- |------------- | > | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 | ns/op | > | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | ns/op | > > > The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 > indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when > running on ARM. Jason Tatton has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains four commits: - Merge master - 8173585: further whitespace changes required by jcheck - JDK-8173585 - whitespace changes required by jcheck - JDK-8173585 ------------- Changes: https://git.openjdk.java.net/jdk/pull/71/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=71&range=01 Stats: 523 lines in 16 files changed: 506 ins; 0 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/71.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/71/head:pull/71 PR: https://git.openjdk.java.net/jdk/pull/71 From adinn at redhat.com Fri Sep 18 12:42:23 2020 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 18 Sep 2020 13:42:23 +0100 Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) In-Reply-To: <-jCc20O2YVYuUT1bu-9h-nHSx0zP35TMleyo1-fdlRI=.f8e3fe1f-3a7e-44a7-b24b-8c809cf727c8@github.com> References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> <-jCc20O2YVYuUT1bu-9h-nHSx0zP35TMleyo1-fdlRI=.f8e3fe1f-3a7e-44a7-b24b-8c809cf727c8@github.com> Message-ID: On 18/09/2020 12:06, Jason Tatton wrote: > There are two separate things here: > 1). Justification for the change itself: > -The objective and justification for this patch is to bring the performance of StringLatin1 indexOf(char) in > line with StringUTF16 indexOf(char) for x86 and ARM64. This solves the problem as raised in > [JDK-8173585](https://bugs.openjdk.java.net/browse/JDK-8173585), and also on the [mailing > list](http://mail.openjdk.java.net/pipermail/jdk9-dev/2017-January/005539.html). > 2). Discussion around future enhancements - concerning potential use of AVX-512 instructions and a more optimal > implementation for short strings. > -These would be separate JBS's I'm not advocating for/against this, they are just ideas separate from this JBS. I don't agree that these two things are separable. 
Andrew's point applies to both. In the first case the problem is that the 'evidence' we have does not testify to the possibility Andrew outlines. Both code examples used to justify the idea that StringLatin1.indexOf(char) will perform 'better' with an AVX-based intrinsic are micro-benchmarks that do a lot of intensive String manipulation and nothing else, i.e. they won't get hit by the possible cost of ramping up power for AVX because they make extensive use of AVX and then stop. That's very unlikely to happen in a real world case. So, the fact that this change removes the disparity seen with these benchmarks is still not evidence for a general improvement. So, I don't (yet) see a reason to make this change and the possibility still stands that adopting this change may end up making most code that uses StringLatin1.indexOf(char) worse. It might be a good idea to consider finding some way to test whether the cost Andrew has highlighted makes a difference before committing this change. I know the same argument might be raised against the existing intrinsics but surely that's an a fortiori argument for not proceeding. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From fyang at openjdk.java.net Fri Sep 18 15:54:57 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Fri, 18 Sep 2020 15:54:57 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v2] In-Reply-To: References: Message-ID: > Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. > Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 > > Trivial adaptation in SHA3.
implCompress is needed for the purpose of adding the intrinsic. > For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. > "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. > > We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make > sure that there's no regression. > We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on > real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is > detected. Fei Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: - Merge master - Fix trailing whitespace issue - 8252204: AArch64: Implement SHA3 accelerator/intrinsic Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com ------------- Changes: https://git.openjdk.java.net/jdk/pull/207/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=01 Stats: 1012 lines in 29 files changed: 940 ins; 13 del; 59 mod Patch: https://git.openjdk.java.net/jdk/pull/207.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/207/head:pull/207 PR: https://git.openjdk.java.net/jdk/pull/207 From github.com+70893615+jasontatton-aws at openjdk.java.net Fri Sep 18 16:00:00 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Fri, 18 Sep 2020 16:00:00 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) In-Reply-To: <-jCc20O2YVYuUT1bu-9h-nHSx0zP35TMleyo1-fdlRI=.f8e3fe1f-3a7e-44a7-b24b-8c809cf727c8@github.com> References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> <-jCc20O2YVYuUT1bu-9h-nHSx0zP35TMleyo1-fdlRI=.f8e3fe1f-3a7e-44a7-b24b-8c809cf727c8@github.com> Message-ID: <2dlrgVtF5twY9WGHEKS2ZcLEm1jPmGgOfFF6o-ZyUHc=.c33e5814-724a-4569-9eed-97e8afaaa440@github.com> On Fri, 18 Sep 2020 11:04:34 GMT, Jason Tatton wrote: >> Hi everyone, >> >> This patch is just missing a couple of reviewers... Please can someone step forward? >> >> I think it's a fairly straightforward change. >> >> -Jason > > Hi Andrew, > > Thanks for coming back to me. Looking on the github [PR](https://github.com/openjdk/jdk/pull/71) nobody is tagged as a > reviewer for this (perhaps this is a feature which is not being used). >> That's >> because what is actually missing is the justification he asked for. As >> Andrew pointed out the change is simple but the reason for implementing >> it is not. 
> > There are two separate things here: > 1). Justification for the change itself: > -The objective and justification for this patch is to bring the performance of StringLatin1 indexOf(char) in line with > StringUTF16 indexOf(char) for x86 and ARM64. This solves the problem as raised in > [JDK-8173585](https://bugs.openjdk.java.net/browse/JDK-8173585), and also on the [mailing > list](http://mail.openjdk.java.net/pipermail/jdk9-dev/2017-January/005539.html). > > 2). Discussion around future enhancements - concerning potential use of AVX-512 instructions and a more optimal > implementation for short strings. > -These would be separate JBS's I'm not advocating for/against this, they are just ideas separate from this JBS. The AVX2 code path represents approximately 1/6th of the patch (1/7th including the infrastructure code around this). I don't think we should discard the entire patch because 1/6th of the code may have unintended consequences. This is especially the case when the rest of the code has been benchmarked, with certainty, to show the desired performance improvement has been achieved. Additionally, I do not see how those unintended consequences will ever be realised because the JVM has knowledge of the AVX capability of the chip it's running on and disables the AVX2 code path for chips which suffer from the performance degradation which has been outlined in this discussion. Thus protecting us from unintended consequences. Unless we are asserting that this mechanism to globally control the use of AVX2 instructions is broken or otherwise non functional I see no reason to remove the AVX2 code. And to be consistent we would really need to look at removing all instances of AVX2 code in the JVM (of which there is quite a lot). As I see it there are three ways forward: 1. Accept the patch as is. 2. Modify the patch to remove the AVX code path for x86, and/or any other modifications needed. 3. Discard the patch entirely.
At this point I am in favour of approach 1 but happy to accept 2 if advised that this is the right thing to do. ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From fyang at openjdk.java.net Fri Sep 18 16:05:31 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Fri, 18 Sep 2020 16:05:31 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v3] In-Reply-To: References: Message-ID: <0uUIKwe29qJd177yzRttb-vvIkccG_wV5YuEQG4VmNY=.d2b684c1-e098-4384-9ea6-bba1dded72cf@github.com> > Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. > Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 > > Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. > For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. > "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. > > We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make > sure that there's no regression. > We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on > real hardware. 
But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is > detected. Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Rebase ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/207/files - new: https://git.openjdk.java.net/jdk/pull/207/files/32f0bdc3..3e155193 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=01-02 Stats: 9 lines in 2 files changed: 7 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/207.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/207/head:pull/207 PR: https://git.openjdk.java.net/jdk/pull/207 From adinn at openjdk.java.net Fri Sep 18 16:34:04 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Fri, 18 Sep 2020 16:34:04 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) In-Reply-To: <2dlrgVtF5twY9WGHEKS2ZcLEm1jPmGgOfFF6o-ZyUHc=.c33e5814-724a-4569-9eed-97e8afaaa440@github.com> References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> <-jCc20O2YVYuUT1bu-9h-nHSx0zP35TMleyo1-fdlRI=.f8e3fe1f-3a7e-44a7-b24b-8c809cf727c8@github.com> <2dlrgVtF5twY9WGHEKS2ZcLEm1jPmGgOfFF6o-ZyUHc=.c33e5814-724a-4569-9eed-97e8afaaa440@github.com> Message-ID: On Fri, 18 Sep 2020 15:55:52 GMT, Jason Tatton wrote: >> Hi Andrew, >> >> Thanks for coming back to me. Looking on the github [PR](https://github.com/openjdk/jdk/pull/71) nobody is tagged as a >> reviewer for this (perhaps this is a feature which is not being used). >>> That's >>> because what is actually missing is the justification he asked for. As >>> Andrew pointed out the change is simple but the reason for implementing >>> it is not. >> >> There are two separate things here: >> 1). 
Justification for the change itself: >> -The objective and justification for this patch is to bring the performance of StringLatin1 indexOf(char) in line with >> StringUTF16 indexOf(char) for x86 and ARM64. This solves the problem as raised in >> [JDK-8173585](https://bugs.openjdk.java.net/browse/JDK-8173585), and also on the [mailing >> list](http://mail.openjdk.java.net/pipermail/jdk9-dev/2017-January/005539.html). >> >> 2). Discussion around future enhancements - concerning potential use of AVX-512 instructions and a more optimal >> implementation for short strings. >> -These would be separate JBS's I'm not advocating for/against this, they are just ideas separate from this JBS. > > The AVX2 code path represents approximately 1/6th of the patch (1/7th including the infrastructure code around this). > I don't think we should discard the entire patch because 1/6th of the code may have unintended consequences. This is > especially the case when the rest of the code has been benchmarked, with certainty, to show the desired performance > improvement has been achieved. Additionally, I do not see how those unintended consequences will ever be realised > because the JVM has knowledge of the AVX capability of the chip it's running on and disables the AVX2 code path for > chips which suffer from the performance degradation which has been outlined in this discussion. Thus protecting us from > unintended consequences. Unless we are asserting that this mechanism to globally control the use of AVX2 instructions > is broken or otherwise non functional I see no reason to remove the AVX2 code. And to be consistent we would really > need to look at removing all instances of AVX2 code in the JVM (of which there is quite a lot). As I see it there are > three ways forward: 1. Accept the patch as is. 2. Modify the patch to remove the AVX code path for x86, and/or any > other modifications needed. 3. Discard the patch entirely.
At this point I am in favour of approach 1 but happy to > accept 2 if advised that this is the right thing to do. "the JVM has knowledge of the AVX capability of the chip it's running on and disables the AVX2 code path for chips which suffer from the performance degradation which has been outlined in this discussion" Does it? The white paper Andrew cited doesn't mention this as being specific to only some chips that implement AVX2. Can you explain where this restricted effect is documented? Also, I assume you are referring to the code in vm_version_x86.cpp with this comment: "// Don't use AVX-512 on older Skylakes unless explicitly requested". Is that correct? ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From github.com+70893615+jasontatton-aws at openjdk.java.net Fri Sep 18 23:14:30 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Fri, 18 Sep 2020 23:14:30 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> <-jCc20O2YVYuUT1bu-9h-nHSx0zP35TMleyo1-fdlRI=.f8e3fe1f-3a7e-44a7-b24b-8c809cf727c8@github.com> <2dlrgVtF5twY9WGHEKS2ZcLEm1jPmGgOfFF6o-ZyUHc=.c33e5814-724a-4569-9eed-97e8afaaa440@github.com> Message-ID: On Fri, 18 Sep 2020 16:31:23 GMT, Andrew Dinn wrote: > Can you explain where this restricted effect is documented? Certainly!
I've found that determining the capability of the CPU, and whether to enable AVX2 support if the chip supports it, is mostly controlled in [vm_version_x86.cpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp), specifically in [get_processor_features](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L684-L755) and in [generate_get_cpu_info](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L69-L611). In order to test the patch comprehensively I had to track down an Intel Core i7 (I7-9750H) processor which the aforementioned code permitted AVX2 instructions for (maybe this is an error and it should not be enabled for this processor though) as most of the infrastructure I personally use here at AWS runs on Intel Xeon processors - I also tested on an E5-2680 which the JVM does not enable AVX2 for. However, this is just the Intel side of things. When it comes to AMD I read that the AMD Zen 2 architecture, on which the current flagship, the Threadripper 3990X, is based, is able to support AVX2 [without the frequency scaling](https://www.anandtech.com/Show/Index/14525?cPage=7&all=False&sort=0&page=9&slug=amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome) which some/all(?) of the Intel chips incur. I personally don't have access to one of these chips so I cannot confirm how it is classified in the JVM. Also, I found when investigating this that there is actually a JVM flag which can be used to control what level of AVX is enabled: `-XX:UseAVX=version`. ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From minqi at openjdk.java.net Fri Sep 18 23:54:11 2020 From: minqi at openjdk.java.net (Yumin Qi) Date: Fri, 18 Sep 2020 23:54:11 GMT Subject: RFR: 8253208: Move CDS related code to a separate class Message-ID: With more CDS related code added to VM, it is time to move CDS code to a separate class. CDS is the new class which is specific to CDS.
Tests: tier1-4 ------------- Commit messages: - 8253208: Move CDS related code to a separate class - Merge branch 'master' of https://github.com/yminqi/jdk into jdk-8253208 - Merge remote-tracking branch 'origin/jdk-8252689' - 8252689: Classes are loaded from jrt:/java.base even when CDS is used - 8252689: Classes are loaded from jrt:/java.base even when CDS is used - 8252689: Classes are loaded from jrt:/java.base even when CDS is used Changes: https://git.openjdk.java.net/jdk/pull/261/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=261&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253208 Stats: 196 lines in 20 files changed: 110 ins; 53 del; 33 mod Patch: https://git.openjdk.java.net/jdk/pull/261.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/261/head:pull/261 PR: https://git.openjdk.java.net/jdk/pull/261 From kvn at openjdk.java.net Sat Sep 19 01:57:14 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 19 Sep 2020 01:57:14 GMT Subject: RFR: 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 Message-ID: Code added by [8248830](https://bugs.openjdk.java.net/browse/JDK-8248830) uses Node::is_Con() check when looking for constant shift values. Unfortunately it does not guarantee that it will be Integer constant because TOP node is also ConNode. I used C2 types to check and get shift values. I also refactor code to consolidate checks. Method degenerate_vector_rotate() in vectornode.cpp has is_Con() check and, in general, it could be TOP because we do loop optimizations after vectorization. I added isa_int() check and treat 'cnt' in other case as variable to do transformation on 'else' branch and let sub-graph collapse there. I also refactor degenerate_vector_rotate() to make it compact. I removed OrVNode::Ideal() method added by 8248830 because it is currently not used (it is for Vector API). And its code is convoluted and does not match code in OrINode::Ideal(). 
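To illustrate the is_Con() pitfall described above, here is a minimal standalone sketch. It uses toy types (`ToyNode`, `ToyKind` and `get_const_shift` are all hypothetical names) rather than C2's real Node and Type classes, and it only shows the idea of asking specifically for an integer constant instead of trusting a generic constant check that TOP also satisfies.

```cpp
#include <cassert>

// Toy model for illustration only; these are NOT C2's Node/Type classes.
enum ToyKind { TOY_TOP, TOY_INT_CON, TOY_VARIABLE };

struct ToyNode {
  ToyKind kind;
  int value;  // meaningful only when kind == TOY_INT_CON

  // Mirrors the pitfall: TOP is reported as a constant too.
  bool is_Con() const { return kind == TOY_TOP || kind == TOY_INT_CON; }

  // Type-based query: true only for a genuine integer constant.
  bool is_int_con() const { return kind == TOY_INT_CON; }
};

// Stores the shift count (masked to 5 bits, as for 32-bit int shifts) in
// *count and returns true only when n is a genuine integer constant.
// TOP and variable inputs return false and leave *count untouched.
bool get_const_shift(const ToyNode& n, int* count) {
  assert(count != nullptr);
  if (!n.is_int_con()) {
    return false;
  }
  *count = n.value & 31;
  return true;
}
```

With this shape a TOP input is handled like a variable shift count, which is roughly the behaviour the fix describes for the 'else' branch.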
I moved Rotate vectorization tests into new test files TestIntVectRotate.java and TestLongVectRotate.java and added more test methods for which vectors are created. Tested: tier1, hs-tier2, hs-tier3. Verified fix with replay file from bug report. I also checked that RotateBenchmark.java added by 8248830 still creates Rotate vectors after this fix. I created a subtask to add a new regression test later because this fix is urgent and I did not have time to prepare it. The fix was already reviewed on the mailing list https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039886.html ------------- Commit messages: - 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 Changes: https://git.openjdk.java.net/jdk/pull/262/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=262&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252188 Stats: 1342 lines in 7 files changed: 964 ins; 346 del; 32 mod Patch: https://git.openjdk.java.net/jdk/pull/262.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/262/head:pull/262 PR: https://git.openjdk.java.net/jdk/pull/262 From kbarrett at openjdk.java.net Sat Sep 19 13:36:51 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 19 Sep 2020 13:36:51 GMT Subject: RFR: 8253311: Cleanup relocInfo constructors Message-ID: 8253311: Cleanup relocInfo constructors ------------- Commit messages: - cleanup constructor Changes: https://git.openjdk.java.net/jdk/pull/265/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=265&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253311 Stats: 44 lines in 2 files changed: 12 ins; 13 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/265.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/265/head:pull/265 PR: https://git.openjdk.java.net/jdk/pull/265 From kbarrett at openjdk.java.net Sat Sep 19 13:36:51 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 19 Sep 2020 13:36:51 GMT Subject: RFR: 8253311: Cleanup
relocInfo constructors In-Reply-To: References: Message-ID: On Sat, 19 Sep 2020 13:28:05 GMT, Kim Barrett wrote: > 8253311: Cleanup relocInfo constructors Please review this cleanup of the relocInfo constructors. We use constructor delegation to avoid code duplication. The old code seems to have been attempting to fake constructor delegation for similar reasons. Also fixed some comments about a nonexistent BoundRelocation class. Testing: mach5 tier1-3 ------------- PR: https://git.openjdk.java.net/jdk/pull/265 From alanb at openjdk.java.net Sat Sep 19 15:54:25 2020 From: alanb at openjdk.java.net (Alan Bateman) Date: Sat, 19 Sep 2020 15:54:25 GMT Subject: RFR: 8253208: Move CDS related code to a separate class In-Reply-To: References: Message-ID: On Fri, 18 Sep 2020 23:47:56 GMT, Yumin Qi wrote: > With more CDS related code added to VM, it is time to move CDS code to a separate class. CDS is the new class which is > specific to CDS. > Tests: tier1-4 src/java.base/share/classes/jdk/internal/misc/CDS.java line 42: > 40: public static native void defineArchivedModules(ClassLoader platformLoader, ClassLoader systemLoader); > 41: > 42: public static native long getRandomSeedForCDSDump(); The moving of the archive methods to CDS looks okay but inconsistent to only comment 3 of the 5 methods. ------------- PR: https://git.openjdk.java.net/jdk/pull/261 From kvn at openjdk.java.net Sat Sep 19 16:01:25 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 19 Sep 2020 16:01:25 GMT Subject: RFR: 8253311: Cleanup relocInfo constructors In-Reply-To: References: Message-ID: On Sat, 19 Sep 2020 13:28:05 GMT, Kim Barrett wrote: > 8253311: Cleanup relocInfo constructors Marked as reviewed by kvn (Reviewer). 
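For readers unfamiliar with the constructor delegation mentioned in the relocInfo review request, here is a minimal standalone sketch. The class `PackedReloc`, its `pack` helper and its bit layout are hypothetical and are not the actual relocInfo code; they only show how one constructor can forward to another so that validation and packing live in a single place.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packed record for illustration; NOT HotSpot's relocInfo.
class PackedReloc {
 public:
  enum { type_bits = 4, offset_bits = 12 };

  // Primary constructor: the only one that validates and packs the fields.
  PackedReloc(int type, int offset) : _value(pack(type, offset)) {}

  // Delegating constructor (C++11): forwards to the primary constructor
  // instead of repeating its initialization logic.
  explicit PackedReloc(int type) : PackedReloc(type, 0) {}

  int type() const { return _value >> offset_bits; }
  int offset() const { return _value & ((1 << offset_bits) - 1); }

 private:
  // Checks both fields fit their bit widths, then packs them into 16 bits.
  static uint16_t pack(int type, int offset) {
    assert(type >= 0 && type < (1 << type_bits));
    assert(offset >= 0 && offset < (1 << offset_bits));
    return static_cast<uint16_t>((type << offset_bits) | offset);
  }

  uint16_t _value;
};
```

Before C++11, the same effect was usually approximated with a shared init helper or by copying the initializer list, which is the kind of duplication delegation avoids.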
------------- PR: https://git.openjdk.java.net/jdk/pull/265 From kvn at openjdk.java.net Sat Sep 19 16:01:25 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 19 Sep 2020 16:01:25 GMT Subject: RFR: 8253311: Cleanup relocInfo constructors In-Reply-To: References: Message-ID: On Sat, 19 Sep 2020 15:58:38 GMT, Vladimir Kozlov wrote: >> 8253311: Cleanup relocInfo constructors > > Marked as reviewed by kvn (Reviewer). Looks good. ------------- PR: https://git.openjdk.java.net/jdk/pull/265 From vlivanov at openjdk.java.net Sat Sep 19 16:02:06 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Sat, 19 Sep 2020 16:02:06 GMT Subject: RFR: 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 In-Reply-To: References: Message-ID: On Sat, 19 Sep 2020 01:45:57 GMT, Vladimir Kozlov wrote: > Code added by [8248830](https://bugs.openjdk.java.net/browse/JDK-8248830) uses Node::is_Con() check when looking for > constant shift values. > Unfortunately it does not guarantee that it will be Integer constant because TOP node is also ConNode. I used C2 types > to check and get shift values. I also refactor code to consolidate checks. > Method degenerate_vector_rotate() in vectornode.cpp has is_Con() check and, in general, it could be TOP because we do > loop optimizations after vectorization. I added isa_int() check and treat 'cnt' in other case as variable to do > transformation on 'else' branch and let sub-graph collapse there. I also refactor degenerate_vector_rotate() to make it > compact. I removed OrVNode::Ideal() method added by 8248830 because it is currently not used (it is for Vector API). > And its code is convoluted and does not match code in OrINode::Ideal(). I moved Rotate vectorization tests into new > test files TestIntVectRotate.java and TestLongVectRotate.java and added more tests methods for which vectors are > created. Tested: tier1, hs-tier2, hs-tier3. > Verified fix with replay file from bug report. 
> I also checked that RotateBenchmark.java added by 8248830 still creates Rotate vectors after this fix. > > I created subtask to add new regerssion test later because this fix is urgent and I did not have time to prepare it. > > The fix was already reviewed on mailing list > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039886.html Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/262 From kvn at openjdk.java.net Sat Sep 19 16:09:13 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 19 Sep 2020 16:09:13 GMT Subject: Integrated: 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 In-Reply-To: References: Message-ID: On Sat, 19 Sep 2020 01:45:57 GMT, Vladimir Kozlov wrote: > Code added by [8248830](https://bugs.openjdk.java.net/browse/JDK-8248830) uses Node::is_Con() check when looking for > constant shift values. > Unfortunately it does not guarantee that it will be Integer constant because TOP node is also ConNode. I used C2 types > to check and get shift values. I also refactor code to consolidate checks. > Method degenerate_vector_rotate() in vectornode.cpp has is_Con() check and, in general, it could be TOP because we do > loop optimizations after vectorization. I added isa_int() check and treat 'cnt' in other case as variable to do > transformation on 'else' branch and let sub-graph collapse there. I also refactor degenerate_vector_rotate() to make it > compact. I removed OrVNode::Ideal() method added by 8248830 because it is currently not used (it is for Vector API). > And its code is convoluted and does not match code in OrINode::Ideal(). I moved Rotate vectorization tests into new > test files TestIntVectRotate.java and TestLongVectRotate.java and added more tests methods for which vectors are > created. Tested: tier1, hs-tier2, hs-tier3. > Verified fix with replay file from bug report. 
> I also checked that RotateBenchmark.java added by 8248830 still creates Rotate vectors after this fix. > > I created subtask to add new regerssion test later because this fix is urgent and I did not have time to prepare it. > > The fix was already reviewed on mailing list > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039886.html This pull request has now been integrated. Changeset: 1438ce09 Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/1438ce09 Stats: 1342 lines in 7 files changed: 346 ins; 964 del; 32 mod 8252188: Crash in OrINode::Ideal(PhaseGVN*, bool)+0x8b9 Reviewed-by: vlivanov, thartmann, jbhateja ------------- PR: https://git.openjdk.java.net/jdk/pull/262 From kim.barrett at oracle.com Sat Sep 19 17:04:31 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Sat, 19 Sep 2020 13:04:31 -0400 Subject: RFR: 8253311: Cleanup relocInfo constructors In-Reply-To: References: Message-ID: <4E99E362-0E67-47D0-A77A-E086BF02E405@oracle.com> > On Sep 19, 2020, at 12:01 PM, Vladimir Kozlov wrote: > > On Sat, 19 Sep 2020 15:58:38 GMT, Vladimir Kozlov wrote: > >>> 8253311: Cleanup relocInfo constructors >> >> Marked as reviewed by kvn (Reviewer). > > Looks good. > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/265 Thanks. From iklam at openjdk.java.net Sun Sep 20 06:13:47 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Sun, 20 Sep 2020 06:13:47 GMT Subject: RFR: 8253208: Move CDS related code to a separate class In-Reply-To: References: Message-ID: On Fri, 18 Sep 2020 23:47:56 GMT, Yumin Qi wrote: > With more CDS related code added to VM, it is time to move CDS code to a separate class. CDS is the new class which is > specific to CDS. > Tests: tier1-4 src/java.base/share/classes/jdk/internal/misc/CDS.java line 52: > 50: * Check if CDS sharing is enabled by via the UseSharedSpaces flag. 
> 51: */ > 52: public static native boolean isCDSSharingEnabled(); I think the word CDS is redundant in the method names. How about getRandomSeedForCDSDump() -> getRandomSeedForDumping() isCDSDumpingEnabled() -> isDynamicDumpingEnabled() // doesn't return true if we're doing a static dump isCDSSharingEnabled() -> isSharingEnabled() src/java.base/share/native/libjava/CDS.c line 49: > 47: JNIEXPORT jboolean JNICALL > 48: Java_jdk_internal_misc_CDS_isCDSDumpingEnabled(JNIEnv *env, jclass jcls) { > 49: return JVM_IsCDSDumpingEnabled(env); Maybe: return JVM_IsCDSDynamicDumpingEnabled(env) ------------- PR: https://git.openjdk.java.net/jdk/pull/261 From fyang at openjdk.java.net Sun Sep 20 13:53:00 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Sun, 20 Sep 2020 13:53:00 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v4] In-Reply-To: References: Message-ID: > Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. > Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 > > Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. > For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. > "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. > > We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make > sure that there's no regression. 
> We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on > real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is > detected. Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Add sha3 instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in assembler_aarch64.cpp:asm_check ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/207/files - new: https://git.openjdk.java.net/jdk/pull/207/files/3e155193..04bdb42e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=02-03 Stats: 474 lines in 2 files changed: 61 ins; 9 del; 404 mod Patch: https://git.openjdk.java.net/jdk/pull/207.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/207/head:pull/207 PR: https://git.openjdk.java.net/jdk/pull/207 From github.com+8792647+robcasloz at openjdk.java.net Sun Sep 20 17:04:51 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Sun, 20 Sep 2020 17:04:51 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing In-Reply-To: References: Message-ID: On Fri, 18 Sep 2020 10:29:48 GMT, Roberto Castañeda Lozano wrote: >> Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each >> main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the >> seed.
In either case, the seed is logged if `LogCompilation` is enabled. The generation or specification of seeds also >> affects the randomization triggered by `StressLCM` and `StressGCM`. The new options are declared as >> production+diagnostic for consistency with these existing options. > > Add 'StressIGVN' option to let C2 randomize IGVN worklist order. When enabled, > the worklist is shuffled before each main run of the IGVN loop. Also add > 'GenerateStressSeed' and 'StressSeed=N' options to randomly generate or specify > the seed. In either case, the seed is logged if 'LogCompilation' is enabled. > > The generation or specification of seeds also affects the randomization > triggered by 'StressLCM' and 'StressGCM'. The new options are declared as > production+diagnostic for consistency with these existing options. Reverted to "draft mode", as I just realized the design is not repeatable since it relies on global PRNG state. ------------- PR: https://git.openjdk.java.net/jdk/pull/242 From kvn at openjdk.java.net Sun Sep 20 23:06:31 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 20 Sep 2020 23:06:31 GMT Subject: RFR: 8247251: Assert (_pcs_length == 0 || last_pc()->pc_offset() < =?UTF-8?B?cGNfb2Zmc+KApg==?= Message-ID: <8qQAkpOU6JCqbX4iwm12oWd_Q3-P9wxln5_veFF7EJk=.4d3e4f99-e294-4d43-8cc6-8abda6a2125b@github.com> Fix frame state recording for System.arraycopy() intrinsic. This is port of Graal fix: https://github.com/oracle/graal/commit/438a7cb0257 Graal unit test ArrayCopyIntrinsificationTest.java is updated to catch this case. I ran tier1 and tier3-graal testing. I also ran CodeCacheInfoOnCompilation/Test.java and c2/Test6603011.java jtreg tests 100 times in Mach5 and got 6 failures with latest JDK. With fix tests passed. 
------------- Commit messages: - 8247251: Assert (_pcs_length == 0 || last_pc()->pc_offset() < pc_offset) failed: must specify a new, larger pc offset failure Changes: https://git.openjdk.java.net/jdk/pull/272/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=272&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8247251 Stats: 49 lines in 6 files changed: 37 ins; 2 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/272.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/272/head:pull/272 PR: https://git.openjdk.java.net/jdk/pull/272 From felix.yang at huawei.com Mon Sep 21 02:59:47 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 21 Sep 2020 02:59:47 +0000 Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic In-Reply-To: <28d501ee-0f96-48e0-bfc4-230721018968@redhat.com> References: <28d501ee-0f96-48e0-bfc4-230721018968@redhat.com> Message-ID: Hi, > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-retn at openjdk.java.net] On Behalf > Of Andrew Haley > Sent: Friday, September 18, 2020 9:48 PM > To: Fei Yang ; hotspot-dev at openjdk.java.net; > security-dev at openjdk.java.net > Subject: Re: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic > > On 17/09/2020 05:26, Fei Yang wrote: > > For now, this feature will not be enabled automatically for aarch64. > > We can auto-enable this when it is fully tested on real hardware. But > > for the above testing purposes, this is auto-enabled when the > corresponding hardware feature is detected. > > Given that there's no real hardware, it's extra-important to add the new > instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in > assembler_aarch64.cc:asm_check. I have added on commit in PR resolving this: https://github.com/openjdk/jdk/pull/207/commits/04bdb42e971aa1c2f78bb5c916db62910e167053?file-filters%5B%5D= I grouped SHA512SIMDOp, SHA3SIMDOp and SVEVectorOp after the # ARMv8.2A comment. 
So anticipate more changes in file assembler_aarch64.cpp. BTW: If this feature is not auto-enabled when the SHA3 hardware feature is there, we will have one failure for the following test: test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java 15 #-----testresult----- 16 description=file\:/home/yangfei/github/jdk/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java 17 elapsed=31546 0\:00\:31.546 18 end=Mon Sep 21 10\:27\:58 CST 2020 19 environment=regtest 20 execStatus=Failed. Execution failed\: `main' threw exception\: java.lang.AssertionError\: Option 'UseSHA3Intrinsics' is expected to have 'true' value Option 'UseSHA3Intrinsics' should be enabled by default Any suggestions for this? Thanks, Felix From github.com+8792647+robcasloz at openjdk.java.net Mon Sep 21 06:35:57 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 21 Sep 2020 06:35:57 GMT Subject: RFR: 8252583: Make PhiNode::is_copy() debug only Message-ID: Convert `PhiNode::is_copy()` into an actual, debug-only predicate. Replace calls to `is_copy()` from non-debug code with explicit assertions. Remove dead loop in debug-only `MergeMemStream::match_memory()`. 
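The idea in the description above (a predicate that exists only in debug builds, with product code asserting the invariant instead of branching on it) can be sketched roughly as follows. This is a minimal illustration, not the HotSpot patch: `Node`, `is_copy` and `is_loop_phi` are hypothetical stand-ins for the C2 classes, and the standard `assert`/`NDEBUG` pair stands in for HotSpot's own `assert`/`ASSERT` macros.

```cpp
#include <cassert>

// Hypothetical stand-in for a C2 node; not the real HotSpot class.
struct Node {
  Node* control = nullptr;  // models in(0), the control input
  bool  phi     = false;    // true if this node models a Phi
  bool is_phi() const { return phi; }
};

#ifndef NDEBUG
// Debug-only predicate: a Phi that has lost its control input would be a "copy".
inline bool is_copy(const Node* n) {
  return n->is_phi() && n->control == nullptr;
}
#endif

// Product code no longer branches on is_copy(); it only asserts the invariant.
// In release builds (NDEBUG) the assert and the predicate both compile away.
inline bool is_loop_phi(const Node* n) {
  assert(!is_copy(n) && "phi cannot be a copy");
  return n->is_phi() && n->control != nullptr;
}
```

Because the predicate is only referenced inside `assert`, a release build never evaluates it, which is the property the description asks for.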
------------- Commit messages: - 8252583: Make PhiNode::is_copy() debug only Changes: https://git.openjdk.java.net/jdk/pull/275/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=275&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252583 Stats: 32 lines in 7 files changed: 15 ins; 4 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/275.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/275/head:pull/275 PR: https://git.openjdk.java.net/jdk/pull/275 From github.com+8792647+robcasloz at openjdk.java.net Mon Sep 21 06:35:57 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 21 Sep 2020 06:35:57 GMT Subject: RFR: 8252583: Make PhiNode::is_copy() debug only In-Reply-To: References: Message-ID: On Mon, 21 Sep 2020 06:27:41 GMT, Roberto Casta?eda Lozano wrote: > Convert `PhiNode::is_copy()` into an actual, debug-only predicate. Replace calls to `is_copy()` from non-debug code > with explicit assertions. Remove dead loop in debug-only `MergeMemStream::match_memory()`. Convert PhiNode::is_copy() into an actual, debug-only predicate. Replace calls to is_copy() from non-debug code with explicit assertions. Remove dead loop in debug-only MergeMemStream::match_memory(). ------------- PR: https://git.openjdk.java.net/jdk/pull/275 From neliasso at openjdk.java.net Mon Sep 21 07:33:03 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 21 Sep 2020 07:33:03 GMT Subject: RFR: 8252583: Make PhiNode::is_copy() debug only In-Reply-To: References: Message-ID: On Mon, 21 Sep 2020 06:27:41 GMT, Roberto Casta?eda Lozano wrote: > Convert `PhiNode::is_copy()` into an actual, debug-only predicate. Replace calls to `is_copy()` from non-debug code > with explicit assertions. Remove dead loop in debug-only `MergeMemStream::match_memory()`. Changes requested by neliasso (Reviewer). 
src/hotspot/share/opto/addnode.cpp line 104: > 102: assert(!in2->as_Phi()->is_copy(), "in2 cannot be a copy"); > 103: #endif > 104: if( in2->is_Phi() && (phi = in2->as_Phi()) && phi->region()->is_Loop() && phi->in(2)==add){ Since the row is touched - please add whitespace where its missing: "phi->in(2)==add" -> "phi->in(2) == add" src/hotspot/share/opto/addnode.cpp line 98: > 96: assert(!in1->as_Phi()->is_copy(), "in1 cannot be a copy"); > 97: #endif > 98: if( in1->is_Phi() && (phi = in1->as_Phi()) && phi->region()->is_Loop() && phi->in(2)==add) Since the row is touched - please add whitespace where its missing. "phi->in(2)==add" -> "phi->in(2) == add" ------------- PR: https://git.openjdk.java.net/jdk/pull/275 From adinn at openjdk.java.net Mon Sep 21 09:23:26 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Mon, 21 Sep 2020 09:23:26 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> <-jCc20O2YVYuUT1bu-9h-nHSx0zP35TMleyo1-fdlRI=.f8e3fe1f-3a7e-44a7-b24b-8c809cf727c8@github.com> <2dlrgVtF5twY9WGHEKS2ZcLEm1jPmGgOfFF6o-ZyUHc=.c33e5814-724a-4569-9eed-97e8afaaa440@github.com> Message-ID: On Fri, 18 Sep 2020 23:11:46 GMT, Jason Tatton wrote: > > Can you explain where this restricted effect is documented? > Certainly! I?ve found that determining the capability of the CPU and whether to enable AVX2 support if the chip > supports it is mostly controlled in: vm_version_x86.cpp specifically: get_processor_features and in > generate_get_cpu_info. Yes, I can see what the code does. I was asking where the cpu behaviour is documented independent of the code. 
> In order to test the patch comprehensively I had to track down an Intel Core i7 (I7-9750H) processor which the > aforementioned code permitted AVX2 instructions for (maybe this is an error and it should not be enabled for this > processor though) as most of the infrastructure I personally use here at AWS runs on Intel Xeon processors - I also > tested on an E5-2680 which the JVM does not enable AVX2 for. 'maybe'? The documentation Andrew provided mentioned Xeon E5 v3 which I believe is a Skylake design. However, the code I pointed you at in vm_version_x86 which claims to detect 'early Skylake' designs is only disabling AVX512 support. It still enables AVX2. Similarly, the code that generates machine code to check the processor capabilities has a special check if use_evex is set (i.e. AVX3 is requested) for Skylake which disables AVX512 but does not disable AVX2 support. However, this is just the Intel side of things. When it comes to AMD I read that the AMD Zen 2 architecture, on which the current flagship, the Threadripper 3990X, is based, is able to support AVX2 without the frequency scaling which some/all(?) of the Intel chips incur. I personally don't have access to one of these chips so I cannot confirm how it is classified in the JVM. Also, I found when investigating this that there is actually a JVM flag which can be used to control what level of AVX is enabled: -XX:UseAVX=version. ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From xliu at openjdk.java.net Mon Sep 21 09:37:01 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 21 Sep 2020 09:37:01 GMT Subject: RFR: 8253392: remove PhaseCCP_DCE declaration Message-ID: <_IkVnn1okRZs98acNtvrIvXw3tEKUAtAseQCHAm0H7E=.354a8b7e-2e6c-4c7d-a7f5-9fcc50d89d83@github.com> Hello reviewers, may I ask you to review this trivial patch? It's a clean-up. The forward declaration of PhaseCCP_DCE is not useful.
------------- Commit messages: - 8253392: remove PhaseCCP_DCE declaration Changes: https://git.openjdk.java.net/jdk/pull/277/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=277&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253392 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/277.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/277/head:pull/277 PR: https://git.openjdk.java.net/jdk/pull/277 From neliasso at openjdk.java.net Mon Sep 21 09:45:11 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 21 Sep 2020 09:45:11 GMT Subject: RFR: 8253392: remove PhaseCCP_DCE declaration In-Reply-To: <_IkVnn1okRZs98acNtvrIvXw3tEKUAtAseQCHAm0H7E=.354a8b7e-2e6c-4c7d-a7f5-9fcc50d89d83@github.com> References: <_IkVnn1okRZs98acNtvrIvXw3tEKUAtAseQCHAm0H7E=.354a8b7e-2e6c-4c7d-a7f5-9fcc50d89d83@github.com> Message-ID: On Mon, 21 Sep 2020 08:01:27 GMT, Xin Liu wrote: > hello, reviewers, > May I ask to review this trivial patch? > it's a clean-up. The forward declaration of PhaseCCP_DCE is not useful. Marked as reviewed by neliasso (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/277 From neliasso at openjdk.java.net Mon Sep 21 09:51:11 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 21 Sep 2020 09:51:11 GMT Subject: RFR: 8252696: Loop unswitching may cause out of bound array load to be executed In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 11:45:51 GMT, Roland Westrelin wrote: > Review thread so far: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039853.html src/hotspot/share/opto/loopPredicate.cpp line 254: > 252: break; > 253: if (iff->in(1)->Opcode() == Op_Opaque4 && skeleton_predicate_has_opaque(iff)) { > 254: // Only need to clone range check predicates as those can be changed and duplicated by inserting > pre/main/post loops Is the comment still correct? It is still always a range check predicate? 
-------------

PR: https://git.openjdk.java.net/jdk/pull/176

From simonis at openjdk.java.net Mon Sep 21 10:00:30 2020
From: simonis at openjdk.java.net (Volker Simonis)
Date: Mon, 21 Sep 2020 10:00:30 GMT
Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2]
In-Reply-To: <-DRWR4_f5u6DsSGHAuPnpHrhaG8Una8BXf4zDekQjLM=.469b08b6-a8b2-4a6b-8ab8-1a40810aede0@github.com>
References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <-DRWR4_f5u6DsSGHAuPnpHrhaG8Una8BXf4zDekQjLM=.469b08b6-a8b2-4a6b-8ab8-1a40810aede0@github.com>
Message-ID: <7DJmBVYGzIaMRYO0Ad5UumqLK3_ehORlOFTAtX_aSP0=.e020152a-a2a0-4db5-ac82-dc17736975a9@github.com>

On Fri, 18 Sep 2020 11:56:09 GMT, Jason Tatton wrote:

>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for
>> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it
>> I had to make a small change to StringLatin1.java (refactor of functionality to intrinsified private method) as well as
>> code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and
>> java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585
>>
>> Details of testing:
>> ============
>> I have created a jtreg test "compiler/intrinsics/string/TestStringLatin1IndexOfChar" to cover this new intrinsic. Note
>> that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the
>> input String. Hence the test has been designed to cover all these cases. In summary they are:
>> - A "short" string of < 16 characters.
>> - A SIMD String of 16 - 31 characters.
>> - An AVX2 SIMD String of 32+ characters.
>>
>> Hardware used for testing:
>> -----------------------------
>>
>> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support).
>> - Intel i7 processor (with AVX2 support).
>> - AWS Graviton 2 (ARM 64 processor).
>>
>> I also ran "run-test-tier1" and "run-test-tier2" for x86_64 and aarch64.
>>
>> Possible future enhancements:
>> ====================
>> For the x86 implementation there may be two further improvements we can make in order to improve performance of both
>> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
>> 1. Make use of AVX-512 instructions.
>> 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD
>> instructions instead of a loop.
>>
>> Benchmark results:
>> ============
>> **Without** the new StringLatin1 indexOf(char) intrinsic:
>>
>> | Benchmark | Mode | Cnt | Score | Error | Units |
>> | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933 | ns/op |
>>
>>
>> **With** the new StringLatin1 indexOf(char) intrinsic:
>>
>> | Benchmark | Mode | Cnt | Score | Error | Units |
>> | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | ns/op |
>>
>>
>> The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16
>> indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when
>> running on ARM.
>
> Jason Tatton has updated the pull request with a new target base due to a merge or a rebase.
The pull request now > contains four commits: > - Merge master > - 8173585: further whitespace changes required by jcheck > - JDK-8173585 - whitespace changes required by jcheck > - JDK-8173585 src/hotspot/share/classfile/vmSymbols.cpp line 295: > 293: if (symbol == NULL) return NO_SID; > 294: return find_sid(symbol); > 295: } I think it is common sense to have a newline at the end of a file. ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From github.com+8792647+robcasloz at openjdk.java.net Mon Sep 21 12:01:20 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 21 Sep 2020 12:01:20 GMT Subject: RFR: 8252583: Make PhiNode::is_copy() debug only [v2] In-Reply-To: References: Message-ID: > Convert `PhiNode::is_copy()` into an actual, debug-only predicate. Replace calls to `is_copy()` from non-debug code > with explicit assertions. Remove dead loop in debug-only `MergeMemStream::match_memory()`. Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: Fix spacing in touched lines ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/275/files - new: https://git.openjdk.java.net/jdk/pull/275/files/6175d16a..53962988 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=275&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=275&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/275.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/275/head:pull/275 PR: https://git.openjdk.java.net/jdk/pull/275 From github.com+8792647+robcasloz at openjdk.java.net Mon Sep 21 12:04:08 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 21 Sep 2020 12:04:08 GMT Subject: RFR: 8252583: Make PhiNode::is_copy() debug only [v2] In-Reply-To: References: Message-ID: On Mon, 21 Sep 2020 
07:30:15 GMT, Nils Eliasson wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix spacing in touched lines > > Changes requested by neliasso (Reviewer). Thanks Nils! I just updated the pull request. ------------- PR: https://git.openjdk.java.net/jdk/pull/275 From thartmann at openjdk.java.net Mon Sep 21 12:36:17 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 21 Sep 2020 12:36:17 GMT Subject: RFR: 8252583: Make PhiNode::is_copy() debug only [v2] In-Reply-To: References: Message-ID: On Mon, 21 Sep 2020 12:01:20 GMT, Roberto Castañeda Lozano wrote: >> Convert `PhiNode::is_copy()` into an actual, debug-only predicate. Replace calls to `is_copy()` from non-debug code >> with explicit assertions. Remove dead loop in debug-only `MergeMemStream::match_memory()`. > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Fix spacing in touched lines Changes requested by thartmann (Reviewer).
src/hotspot/share/opto/memnode.cpp line 4892: > 4890: if (mem == n) return true; > 4891: if (n->is_Phi()) > 4892: assert(!n->as_Phi()->is_copy(), "n cannot be a copy"); The if condition should be added to the assert: `assert(!n->is_Phi() || !n->as_Phi()->is_copy() ...` src/hotspot/share/opto/loopnode.cpp line 53: > 51: bool Node::is_cloop_ind_var() const { > 52: #ifdef ASSERT > 53: if (is_Phi()) assert(!as_Phi()->is_copy(), "this phi cannot be a copy"); The if condition should be added to the assert (the `#ifdef ASSERT` can then be removed): `assert(!is_Phi() || !as_Phi()->is_copy() ...)` src/hotspot/share/opto/addnode.cpp line 96: > 94: #ifdef ASSERT > 95: if (in1->is_Phi()) > 96: assert(!in1->as_Phi()->is_copy(), "in1 cannot be a copy"); The if condition should be added to the assert (the `#ifdef ASSERT` can then be removed): `assert(!in1->is_Phi() || !in1->as_Phi()->is_copy() ...` src/hotspot/share/opto/addnode.cpp line 102: > 100: #ifdef ASSERT > 101: if (in2->is_Phi()) > 102: assert(!in2->as_Phi()->is_copy(), "in2 cannot be a copy"); The if condition should be added to the assert. ------------- PR: https://git.openjdk.java.net/jdk/pull/275 From github.com+70893615+jasontatton-aws at openjdk.java.net Mon Sep 21 12:45:55 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Mon, 21 Sep 2020 12:45:55 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v3] In-Reply-To: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: > This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for > x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. 
To incorporate it
> I had to make a small change to StringLatin1.java (refactor of functionality to intrinsified private method) as well as
> code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and
> java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585
>
> Details of testing:
> ============
> I have created a jtreg test "compiler/intrinsics/string/TestStringLatin1IndexOfChar" to cover this new intrinsic. Note
> that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the
> input String. Hence the test has been designed to cover all these cases. In summary they are:
> - A "short" string of < 16 characters.
> - A SIMD String of 16 - 31 characters.
> - An AVX2 SIMD String of 32+ characters.
>
> Hardware used for testing:
> -----------------------------
>
> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support).
> - Intel i7 processor (with AVX2 support).
> - AWS Graviton 2 (ARM 64 processor).
>
> I also ran "run-test-tier1" and "run-test-tier2" for x86_64 and aarch64.
>
> Possible future enhancements:
> ====================
> For the x86 implementation there may be two further improvements we can make in order to improve performance of both
> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
> 1. Make use of AVX-512 instructions.
> 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD
> instructions instead of a loop.
>
> Benchmark results:
> ============
> **Without** the new StringLatin1 indexOf(char) intrinsic:
>
> | Benchmark | Mode | Cnt | Score | Error | Units |
> | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933 | ns/op |
>
>
> **With** the new StringLatin1 indexOf(char) intrinsic:
>
> | Benchmark | Mode | Cnt | Score | Error | Units |
> | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | ns/op |
>
>
> The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16
> indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when
> running on ARM.

Jason Tatton has updated the pull request incrementally with one additional commit since the last revision:

  Add missing newline to end of vmSymbols.cpp

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/71/files
  - new: https://git.openjdk.java.net/jdk/pull/71/files/b85a7fb4..c8cc441e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=71&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=71&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/71.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/71/head:pull/71

PR: https://git.openjdk.java.net/jdk/pull/71

From thartmann at openjdk.java.net Mon Sep 21 12:50:25 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Mon, 21 Sep 2020 12:50:25 GMT
Subject: RFR: 8253311: Cleanup relocInfo constructors
In-Reply-To:
References:
Message-ID:

On Sat, 19 Sep 2020 13:28:05 GMT, Kim Barrett wrote:

> 8253311: Cleanup relocInfo constructors

Looks good.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/265

From kim.barrett at oracle.com Mon Sep 21 13:12:09 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 21 Sep 2020 09:12:09 -0400
Subject: RFR: 8253311: Cleanup relocInfo constructors
In-Reply-To:
References:
Message-ID: <0E985FFC-6C73-4D88-9455-8693895969D7@oracle.com>

> On Sep 21, 2020, at 8:50 AM, Tobias Hartmann wrote:
>
> On Sat, 19 Sep 2020 13:28:05 GMT, Kim Barrett wrote:
>
>> 8253311: Cleanup relocInfo constructors
>
> Looks good.
>
> -------------
>
> Marked as reviewed by thartmann (Reviewer).
>
> PR: https://git.openjdk.java.net/jdk/pull/265

Thanks.

From sergei.tsypanov at yandex.ru Mon Sep 21 13:57:55 2020
From: sergei.tsypanov at yandex.ru (=?utf-8?B?0KHQtdGA0LPQtdC5INCm0YvQv9Cw0L3QvtCy?=)
Date: Mon, 21 Sep 2020 15:57:55 +0200
Subject: C2 does not elide the zeroing of the array in String.repeat()
Message-ID: <121011600695117@mail.yandex.ru>

Hello,

as it appears from https://shipilev.net/blog/2016/arrays-wisdom-ancients/ C2 can sometimes eliminate the zeroing of a newly allocated array (particularly in ArrayList.toArray(T[])).

However, in the case of String.repeat() the VM does not elide the zeroing of the array, even when the repeated String is represented with 1 byte:

    if (len == 1) {
        final byte[] single = new byte[count];
        Arrays.fill(single, value[0]);
        return new String(single, coder);
    }

Here we can be sure that the array is local and will be completely filled, yet the zeroing is still present.

When I run the benchmark [1] with a freshly built JDK it gives

                         (length)  Mode  Cnt    Score   Error  Units
    repeatOneByteString         8  avgt   50   14.020 ± 1.928  ns/op
    repeatOneByteString        64  avgt   50   24.618 ± 2.712  ns/op
    repeatOneByteString       128  avgt   50   36.555 ± 1.394  ns/op
    repeatOneByteString      1024  avgt   50  134.731 ± 7.022  ns/op

Then, if in String.repeat() I replace

    final byte[] single = new byte[count];

with

    final byte[] single = StringConcatHelper.newArray(count);

where StringConcatHelper.newArray(int) delegates directly to UNSAFE.allocateUninitializedArray(Class, int), the same benchmark demonstrates a good improvement:

                         (length)  Mode  Cnt   Score   Error  Units
    repeatOneByteString         8  avgt   50  12.545 ± 0.164  ns/op
    repeatOneByteString        64  avgt   50  18.393 ± 0.686  ns/op
    repeatOneByteString       128  avgt   50  25.550 ± 0.378  ns/op
    repeatOneByteString      1024  avgt   50  90.454 ± 1.015  ns/op

So the question is whether there is an issue in C2 here, and if so whether it is fixable.

Originally the question appeared in https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-September/068641.html

Cheers,
Sergey Tsypanov

1.
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
    public class MiscStringBenchmark {

        @Benchmark
        public String repeatOneByteString(Data data) {
            return data.oneByteString.repeat(data.length);
        }

        @State(Scope.Thread)
        public static class Data {
            @Param({"8", "64", "128", "1024"})
            private int length;

            private final String oneByteString = "a";
        }
    }

From kbarrett at openjdk.java.net Mon Sep 21 14:53:02 2020
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Mon, 21 Sep 2020 14:53:02 GMT
Subject: Integrated: 8253311: Cleanup relocInfo constructors
In-Reply-To:
References:
Message-ID:

On Sat, 19 Sep 2020 13:28:05 GMT, Kim Barrett wrote:

> 8253311: Cleanup relocInfo constructors

This pull request has now been integrated.

Changeset: 2e30ff61 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/2e30ff61 Stats: 44 lines in 2 files changed: 13 ins; 12 del; 19 mod 8253311: Cleanup relocInfo constructors Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/265 From roland at openjdk.java.net Mon Sep 21 15:26:16 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 21 Sep 2020 15:26:16 GMT Subject: RFR: 8252696: Loop unswitching may cause out of bound array load to be executed In-Reply-To: References: Message-ID: On Mon, 21 Sep 2020 09:47:19 GMT, Nils Eliasson wrote: >> Review thread so far: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039853.html > > src/hotspot/share/opto/loopPredicate.cpp line 254: > >> 252: break; >> 253: if (iff->in(1)->Opcode() == Op_Opaque4 && skeleton_predicate_has_opaque(iff)) { >> 254: // Only need to clone range check predicates as those can be changed and duplicated by inserting >> pre/main/post loops > > Is the comment still correct? It is still always a range check predicate? Hi Nils, Yes, the comment is still correct. Skeleton predicates are always range checks. But in some cases to protect against overflow, predicates perform range checks with long computations in which case they use an IfNode and not a RangeCheckNode. ------------- PR: https://git.openjdk.java.net/jdk/pull/176 From xxinliu at amazon.com Mon Sep 21 16:14:19 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Mon, 21 Sep 2020 16:14:19 +0000 Subject: RFR: 8253392: remove PhaseCCP_DCE declaration In-Reply-To: References: <_IkVnn1okRZs98acNtvrIvXw3tEKUAtAseQCHAm0H7E=.354a8b7e-2e6c-4c7d-a7f5-9fcc50d89d83@github.com>, Message-ID: <1600704859184.59191@amazon.com> Thanks, Nils! First time to try Skara CLI, so far so good! The only problem is I forgot to config user.name in my git. My name is 'ubuntu' this time. fixed. I will use my name in next PR. if it looks good, please help me to 'integrate' this PR. 
thanks, --lx ________________________________________ From: hotspot-compiler-dev on behalf of Nils Eliasson Sent: Monday, September 21, 2020 2:45 AM To: hotspot-compiler-dev at openjdk.java.net Subject: RE: [EXTERNAL] RFR: 8253392: remove PhaseCCP_DCE declaration On Mon, 21 Sep 2020 08:01:27 GMT, Xin Liu wrote: > hello, reviewers, > May I ask to review this trivial patch? > it's a clean-up. The forward declaration of PhaseCCP_DCE is not useful. Marked as reviewed by neliasso (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/277 From kvn at openjdk.java.net Mon Sep 21 18:48:24 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 21 Sep 2020 18:48:24 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> Message-ID: On Thu, 17 Sep 2020 14:26:56 GMT, Doug Simon wrote: >> The idea is to add a more powerful API for cases where the current iterateFrames API cannot be used. >> >> For example, a debugger needs access to the content of stack frames such as local variables or monitors. In cases where >> threads execute in the runtime or in native code, it's not possible to obtain a thread suspension hook, for which >> iterateFrames can be used on the suspended thread. The getStackFrames method enables an immediate stack frames lookup >> regardless of the status of the underlying thread. Another use case would be for lookup of backtraces for non-current >> threads.
The implementation is done by means of a VM operation that collects vframe data for each thread during a >> safepoint, whereafter required object reallocation/reassign fields is performed based on the collected snapshot. > > @vnkozlov @veresov @dean-long it would be great to get some reviews of this soon. To assist, let me try provide a > little more context on the area of JVMCI which is motivated by > [Truffle](https://www.graalvm.org/truffle/javadoc/com/oracle/truffle/api/package-summary.html). The > [`StackIntrospection.iterateFrames()`](https://github.com/openjdk/jdk/blob/6bab0f539fba8fb441697846347597b4a0ade428/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/src/jdk/vm/ci/code/stack/StackIntrospection.java#L45) > API exists for Truffle to support walking guest language frames. In Truffle, all values, including primitives, are > boxed which explains why there is no `InspectedFrame.setLocal(int index, Object value)` method. All Truffle guest > language frame locals are accessed/updated by reading/updating the boxed value returned by `InspectedFrame.getLocal()`. > With this existing API, a thread can only inspect its own stack frames. As the description of this PR states, it > extends `StackIntrospection.iterateFrames()` with `StackIntrospection.getStackFrames()` so that a thread can inspect > the frames of other threads. It is somewhat analogous to `java.lang.Thread.getAllStackTraces()` except it allows the > local variables of the frames to be accessed as well. I would like to hear answer to @dholmes-ora question in JBS: "Do we really need yet another stack dumping interface in the VM? Why isn't a debugger using JVM TI?" 
-------------

PR: https://git.openjdk.java.net/jdk/pull/110

From neliasso at openjdk.java.net Mon Sep 21 20:02:15 2020
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Mon, 21 Sep 2020 20:02:15 GMT
Subject: RFR: 8252696: Loop unswitching may cause out of bound array load to be executed
In-Reply-To:
References:
Message-ID:

On Tue, 15 Sep 2020 11:45:51 GMT, Roland Westrelin wrote:

> Review thread so far: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039853.html

Thanks for your answer! Looks good!

-------------

Marked as reviewed by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/176

From kvn at openjdk.java.net Mon Sep 21 20:54:42 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Mon, 21 Sep 2020 20:54:42 GMT
Subject: RFR: 8252583: Make PhiNode::is_copy() debug only [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 21 Sep 2020 12:32:36 GMT, Tobias Hartmann wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix spacing in touched lines
>
> Changes requested by thartmann (Reviewer).

Sorry, I am not sure this is better than the original code, where the assert is in one place - in PhiNode::is_copy().
The method is in a .hpp file and is inlined - the NULL check will be eliminated. It will only be executed in a slowdebug build.
Maybe we should "bite the bullet" and remove this method altogether - we haven't hit the assert in years.
We can replace the is_copy() check with an assert in PhiNode::Ideal() - that should be enough to guarantee that we don't
create a Phi with a NULL control edge.
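
To illustrate the kind of debug-only check being discussed in this thread, here is a minimal sketch. It assumes toy types (`ToyNode` and `is_induction_candidate` are hypothetical names, not HotSpot's Node classes) and uses the standard `<cassert>` macro rather than HotSpot's own `assert(cond, msg)`. It only shows how a side-effect-free predicate can live entirely inside one assert expression, so that nothing is left behind in a build where asserts are disabled.

```cpp
#include <cassert>

// Toy stand-in for illustration only; this is not HotSpot's Node or PhiNode.
struct ToyNode {
  bool is_phi = false;
  const void* control = nullptr;  // in this toy model, a phi without control is a "copy"

  // Side-effect-free predicate, so it is safe to evaluate inside an assert.
  bool is_copy() const { return is_phi && control == nullptr; }
};

// The whole check is one assert expression. With NDEBUG defined it
// compiles away completely, so product code contains no leftover branch.
inline bool is_induction_candidate(const ToyNode& n) {
  assert((!n.is_phi || !n.is_copy()) && "this phi cannot be a copy");
  return n.is_phi;
}
```

Whether the real code should keep such a predicate at all is exactly the open question in the review; the sketch takes no position on that.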
-------------

PR: https://git.openjdk.java.net/jdk/pull/275

From xliu at openjdk.java.net Mon Sep 21 21:00:16 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Mon, 21 Sep 2020 21:00:16 GMT
Subject: RFR: 8252583: Make PhiNode::is_copy() debug only [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 21 Sep 2020 12:01:20 GMT, Roberto Castañeda Lozano wrote:

>> Convert `PhiNode::is_copy()` into an actual, debug-only predicate. Replace calls to `is_copy()` from non-debug code
>> with explicit assertions. Remove dead loop in debug-only `MergeMemStream::match_memory()`.
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix spacing in touched lines

src/hotspot/share/opto/loopnode.cpp line 53:

> 51: bool Node::is_cloop_ind_var() const {
> 52: #ifdef ASSERT
> 53: if (is_Phi()) assert(!as_Phi()->is_copy(), "this phi cannot be a copy");

This line doesn't seem right, because `assert` expands to nothing if NDEBUG is set, so it will alter the original C++ code.
It would be better to put the predicate inside the assert, as Tobias said:
`assert(!is_Phi() || !as_Phi()->is_copy() ...)`

-------------

PR: https://git.openjdk.java.net/jdk/pull/275

From jcm at openjdk.java.net Tue Sep 22 01:56:31 2020
From: jcm at openjdk.java.net (Jamsheed Mohammed C M)
Date: Tue, 22 Sep 2020 01:56:31 GMT
Subject: RFR: 8253447: Remove buggy code introduced by 8249451
Message-ID:

if ((thread->has_pending_exception() || thread->frames_to_pop_failed_realloc() > 0) && exec_mode !=
Unpack_uncommon_trap) {
assert(thread->has_pending_exception(), "should have thrown OOME/Async");

introduced a buggy code checking, clearing pending exception and taking Unpack_exception route.

This can have consequences as the deopt entries may have additional logic depending on bci's. and the change introduced
in 8249451 doesn't honor deopt exception checking and forward logic. Thank you @fisk for pointing the bug in the code.
Request for review.
------------- Commit messages: - fixing buggy code introduced in 8249451 Changes: https://git.openjdk.java.net/jdk/pull/292/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=292&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253447 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/292.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/292/head:pull/292 PR: https://git.openjdk.java.net/jdk/pull/292 From jcm at openjdk.java.net Tue Sep 22 02:05:01 2020 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Tue, 22 Sep 2020 02:05:01 GMT Subject: RFR: 8253447: Remove buggy code introduced by 8249451 In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 01:48:53 GMT, Jamsheed Mohammed C M wrote: > if ((thread->has_pending_exception() || thread->frames_to_pop_failed_realloc() > 0) && exec_mode != > Unpack_uncommon_trap) { > assert(thread->has_pending_exception(), "should have thrown OOME/Async"); > > introduced a buggy code checking, clearing pending exception and taking Unpack_exception route. > > This can have consequences as the deopt entries may have additional logic depending on bci's. and the change introduced > in 8249451 doesn't honor deopt exception checking and forward logic. Thank you @fisk for pointing the bug in the code. > Request for review. @dholmes-ora @veresov @fisk could you please have a look. 
------------- PR: https://git.openjdk.java.net/jdk/pull/292 From jcm at openjdk.java.net Tue Sep 22 02:05:01 2020 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Tue, 22 Sep 2020 02:05:01 GMT Subject: RFR: 8253447: Remove buggy code introduced by 8249451 [v2] In-Reply-To: References: Message-ID: <-1narCtDccARvrE8LbvdKMuaDtBU9f_08MJjysQ7xB0=.19e1536d-5d9d-4e4b-a1fc-9249e735fbd1@github.com> > if ((thread->has_pending_exception() || thread->frames_to_pop_failed_realloc() > 0) && exec_mode != > Unpack_uncommon_trap) { > assert(thread->has_pending_exception(), "should have thrown OOME/Async"); > > introduced a buggy code checking, clearing pending exception and taking Unpack_exception route. > > This can have consequences as the deopt entries may have additional logic depending on bci's. and the change introduced > in 8249451 doesn't honor deopt exception checking and forward logic. Thank you @fisk for pointing the bug in the code. > Request for review. Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: fixing the assert message too ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/292/files - new: https://git.openjdk.java.net/jdk/pull/292/files/d81ce188..4eea9a95 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=292&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=292&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/292.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/292/head:pull/292 PR: https://git.openjdk.java.net/jdk/pull/292 From iveresov at openjdk.java.net Tue Sep 22 02:25:21 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Tue, 22 Sep 2020 02:25:21 GMT Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions. 
[v4] In-Reply-To: References: <2zjS36Nz0zH4AorRbppunfKPFkciaMD865WyBdMzOFI=.fc7a6fd1-96b4-4769-ab0b-b71e7f5bdc9b@github.com> Message-ID: <2cAu7aYUryynOP5VQhG5B7OR3LCDqjjOfyXvCmEK1cE=.66a98839-c47f-446e-acfc-e65ec1f70407@github.com> On Wed, 16 Sep 2020 09:36:28 GMT, Jamsheed Mohammed C M wrote: >> Hi >> >> Moving the review that is based on mercurial repo to github. >> The history of conversation is >> [here](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039861.html) >> Issue:[ JDK-8249451 ](https://bugs.openjdk.java.net/browse/JDK-8249451) >> >> @dholmes-ora could you please have a look. > > Jamsheed Mohammed C M has refreshed the contents of this pull request, and previous commits have been removed. The > incremental views will show differences compared to the previous content of the PR. The pull request contains one new > commit since the last revision: > comment modified wrt review feedback Looks good. ------------- PR: https://git.openjdk.java.net/jdk/pull/169 From iveresov at openjdk.java.net Tue Sep 22 02:26:07 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Tue, 22 Sep 2020 02:26:07 GMT Subject: RFR: 8253447: Remove buggy code introduced by 8249451 [v2] In-Reply-To: <-1narCtDccARvrE8LbvdKMuaDtBU9f_08MJjysQ7xB0=.19e1536d-5d9d-4e4b-a1fc-9249e735fbd1@github.com> References: <-1narCtDccARvrE8LbvdKMuaDtBU9f_08MJjysQ7xB0=.19e1536d-5d9d-4e4b-a1fc-9249e735fbd1@github.com> Message-ID: On Tue, 22 Sep 2020 02:05:01 GMT, Jamsheed Mohammed C M wrote: >> if ((thread->has_pending_exception() || thread->frames_to_pop_failed_realloc() > 0) && exec_mode != >> Unpack_uncommon_trap) { >> assert(thread->has_pending_exception(), "should have thrown OOME/Async"); >> >> introduced a buggy code checking, clearing pending exception and taking Unpack_exception route. >> >> This can have consequences as the deopt entries may have additional logic depending on bci's. 
and the change introduced
>> in 8249451 doesn't honor deopt exception checking and forward logic. Thank you @fisk for pointing the bug in the code.
>> Request for review.
>
> Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision:
>
> fixing the assert message too

Marked as reviewed by iveresov (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/292

From kvn at openjdk.java.net Tue Sep 22 02:34:04 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 22 Sep 2020 02:34:04 GMT
Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)
In-Reply-To:
References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> <-jCc20O2YVYuUT1bu-9h-nHSx0zP35TMleyo1-fdlRI=.f8e3fe1f-3a7e-44a7-b24b-8c809cf727c8@github.com> <2dlrgVtF5twY9WGHEKS2ZcLEm1jPmGgOfFF6o-ZyUHc=.c33e5814-724a-4569-9eed-97e8afaaa440@github.com>
Message-ID:

On Mon, 21 Sep 2020 09:20:56 GMT, Andrew Dinn wrote:

>>> Can you explain where this restricted effect is documented?
>>
>> Certainly! I've found that determining the capability of the CPU and whether to enable AVX2 support if the chip
>> supports it is mostly controlled in: [vm_version_x86.cpp](
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp) specifically:
>> [get_processor_features](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L684-L755)
>> and in [generate_get_cpu_info](
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L69-L611).
In order to test the
>> patch comprehensively I had to track down an Intel Core i7 (I7-9750H) processor which the aforementioned code permitted
>> AVX2 instructions for (maybe this is an error and it should not be enabled for this processor though) as most of the
>> infrastructure I personally use here at AWS runs on Intel Xeon processors - I also tested on a E5-2680 which the JVM
>> does not enable AVX2 for. However, this is just the Intel side of things. When it comes to AMD I read that the AMD Zen
>> 2 architecture, of which the current flagship: Threadripper 3990X, is based, is able to support AVX2 [without the
>> frequency scaling](
>> https://www.anandtech.com/Show/Index/14525?cPage=7&all=False&sort=0&page=9&slug=amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome)
>> which some/all(?) of the Intel chips incur. I personally don't have access to one of these chips so I cannot confirm
>> how it is classified in the JVM. Also, I found when investigating this that there is actually a JVM flag which can be
>> used to control what level of AVX is enabled: `-XX:UseAVX=version.`
>
>> > Can you explain where this restricted effect is documented?
>
>> Certainly! I've found that determining the capability of the CPU and whether to enable AVX2 support if the chip
>> supports it is mostly controlled in: vm_version_x86.cpp specifically: get_processor_features and in
>> generate_get_cpu_info.
>
> Yes, I can see what the code does. I was asking where the cpu behaviour is documented independent of the code.
>
>> In order to test the patch comprehensively I had to track down an Intel Core i7 (I7-9750H) processor which the
>> aforementioned code permitted AVX2 instructions for (maybe this is an error and it should not be enabled for this
>> processor though) as most of the infrastructure I personally use here at AWS runs on Intel Xeon processors - I also
>> tested on a E5-2680 which the JVM does not enable AVX2 for.
>
> 'maybe'?
The documentation Andrew provided mentioned Xeon E5 v3 which I believe is a Skylake design. However, the code
> I pointed you at in vm_version_x86 which claims to detect 'early Skylake' designs is only disabling AVX512 support. It
> still enables AVX2. Similarly, the code that generates machine code to check the processor capabilities has a special
> check if use_evex is set (i.e. AVX3 is requested) which disables AVX512 for Skylake but does not disable AVX2 support.
>
>> However, this is just the Intel side of things. When it comes to AMD I read that the AMD Zen 2 architecture, of which
>> the current flagship: Threadripper 3990X, is based, is able to support AVX2 without the frequency scaling which
>> some/all(?) of the Intel chips incur. I personally don't have access to one of these chips so I cannot confirm how it
>> is classified in the JVM.
>
> Well, it would be good to know where you read that and to see if that confirms that the code is avoiding the issue
> Andrew raised.
>
>> Also, I found when investigating this that there is actually a JVM flag which can be used to control what level of AVX
>> is enabled: -XX:UseAVX=version.
>
> Yes, indeed. However, what I am trying to understand is whether the current code is bypassing the problem Andrew
> brought up in the cases where that problem actually exists. It doesn't look like it so far given that the problem
> applies to AVX2 and only AVX512 support is being disabled and, even then only for some (Skylake) processors. Without
> some clear documentation of what processors suffer from this power surge problem it will not be possible to decide
> whether this patch is doing the right thing or not.

Based on the comment by @jatin-bhateja (Intel), the frequency level switchover pointed out by @theRealAph is sensitive to vector size:
https://github.com/openjdk/jdk/pull/144#issuecomment-692044896

By keeping the vector size less than or equal to 32 bytes we should avoid it.
As far as I can see, this intrinsic code uses 32-byte (chars) and 16-byte vectors:
`pbroadcastb(vec1, vec1, Assembler::AVX_256bit);`

Also, we never had issues with AVX2, only with AVX512, regarding a performance hit:
https://bugs.openjdk.java.net/browse/JDK-8221092

I would like to see performance numbers for all values of the UseAVX flag: 0, 1, 2, 3.

The usage is guarded by the UseSSE42Intrinsics predicate in the .ad file. Make sure to test with UseAVX=0 to make sure that
no AVX instructions are mixed into non-AVX code, and also with UseSSE=2 (for example) to make sure the shared code
correctly recognizes that the intrinsic is not supported.

-------------

PR: https://git.openjdk.java.net/jdk/pull/71

From dholmes at openjdk.java.net Tue Sep 22 02:37:56 2020
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 22 Sep 2020 02:37:56 GMT
Subject: RFR: 8253447: Remove buggy code introduced by 8249451 [v2]
In-Reply-To: <-1narCtDccARvrE8LbvdKMuaDtBU9f_08MJjysQ7xB0=.19e1536d-5d9d-4e4b-a1fc-9249e735fbd1@github.com>
References: <-1narCtDccARvrE8LbvdKMuaDtBU9f_08MJjysQ7xB0=.19e1536d-5d9d-4e4b-a1fc-9249e735fbd1@github.com>
Message-ID:

On Tue, 22 Sep 2020 02:05:01 GMT, Jamsheed Mohammed C M wrote:

>> if ((thread->has_pending_exception() || thread->frames_to_pop_failed_realloc() > 0) && exec_mode !=
>> Unpack_uncommon_trap) {
>> assert(thread->has_pending_exception(), "should have thrown OOME/Async");
>>
>> introduced a buggy code checking, clearing pending exception and taking Unpack_exception route.
>>
>> This can have consequences as the deopt entries may have additional logic depending on bci's. and the change introduced
>> in 8249451 doesn't honor deopt exception checking and forward logic. Thank you @fisk for pointing the bug in the code.
>> Request for review.
>
> Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision:
>
> fixing the assert message too

src/hotspot/share/runtime/deoptimization.cpp line 531:

> 529: #endif
> 530:
> 531: if (thread->frames_to_pop_failed_realloc() > 0 && exec_mode != Unpack_uncommon_trap) {

I'm not at all clear on whether an async-exception could be pending at this point. The original change indicated it
could be, but now you are saying it can't. How is that known?

-------------

PR: https://git.openjdk.java.net/jdk/pull/292

From kvn at openjdk.java.net Tue Sep 22 03:02:16 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 22 Sep 2020 03:02:16 GMT
Subject: RFR: 8252518: cache result of CompilerToVM.getComponentType
In-Reply-To:
References:
Message-ID:

On Tue, 15 Sep 2020 09:19:57 GMT, Doug Simon wrote:

> Linux perf profiles of CompileTheWorld with libgraal show that `CompilerToVM.getComponentType` is the most expensive
> JVMCI VM entry point with almost 2% of total execution time:
>
> +    1.87%     0.04%  [.] c2v_getComponentType
> +    0.54%     0.00%  [.] c2v_installCode
>      0.39%     0.00%  [.] c2v_getResolvedJavaType0
>      0.04%     0.00%  [.] c2v_resolvePossiblyCachedConstantInPool
>      0.03%     0.00%  [.] c2v_interpreterFrameSize
>      0.03%     0.01%  [.] c2v_isAssignableFrom
>      0.02%     0.00%  [.] c2v_translate
>      0.01%     0.00%  [.] c2v_getIdentityHashCode
>
> It's worth caching the result of this call.

Marked as reviewed by kvn (Reviewer).
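
For readers unfamiliar with the idea, caching the result of an expensive lookup can be sketched roughly as follows. This is a generic illustration and not the actual JVMCI change: `ToyType`, `slow_component_type` and the call counter are hypothetical names invented for this example, and the string-based "component type" is only a stand-in for a real call into the VM.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for an expensive call into the VM.
// The counter exists only to make the effect of caching observable.
static int g_slow_lookups = 0;

static std::string slow_component_type(const std::string& type_name) {
  ++g_slow_lookups;
  const std::string suffix = "[]";
  if (type_name.size() >= suffix.size() &&
      type_name.compare(type_name.size() - suffix.size(), suffix.size(), suffix) == 0) {
    return type_name.substr(0, type_name.size() - suffix.size());
  }
  return std::string();  // not an array type in this toy model
}

// Hypothetical type descriptor that remembers the first lookup result.
// Not thread-safe: a real implementation would need to consider concurrent callers.
class ToyType {
 public:
  explicit ToyType(std::string name) : name_(std::move(name)) {}

  const std::string& component_type() {
    if (!cached_) {
      component_ = slow_component_type(name_);  // pay the cost once
      cached_ = true;
    }
    return component_;
  }

 private:
  std::string name_;
  std::string component_;
  bool cached_ = false;
};
```

A separate `cached_` flag is used so that "no component type" (the empty string) is also remembered and does not trigger repeated lookups.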
------------- PR: https://git.openjdk.java.net/jdk/pull/172 From kvn at openjdk.java.net Tue Sep 22 03:02:17 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 22 Sep 2020 03:02:17 GMT Subject: RFR: 8252518: cache result of CompilerToVM.getComponentType In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 02:59:37 GMT, Vladimir Kozlov wrote: >> Linux perf profiles of CompileTheWorld with libgraal show that `CompilerToVM.getComponentType` is the most expensive >> JVMCI VM entry point with almost 2% of total execution time: >> + 1.87% 0.04% [.] c2v_getComponentType >> + 0.54% 0.00% [.] c2v_installCode >> 0.39% 0.00% [.] c2v_getResolvedJavaType0 >> 0.04% 0.00% [.] c2v_resolvePossiblyCachedConstantInPool >> 0.03% 0.00% [.] c2v_interpreterFrameSize >> 0.03% 0.01% [.] c2v_isAssignableFrom >> 0.02% 0.00% [.] c2v_translate >> 0.01% 0.00% [.] c2v_getIdentityHashCode >> >> It's worth caching the result of this call. > > Marked as reviewed by kvn (Reviewer). Good. ------------- PR: https://git.openjdk.java.net/jdk/pull/172 From jcm at openjdk.java.net Tue Sep 22 03:15:59 2020 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Tue, 22 Sep 2020 03:15:59 GMT Subject: RFR: 8253447: Remove buggy code introduced by 8249451 [v2] In-Reply-To: References: <-1narCtDccARvrE8LbvdKMuaDtBU9f_08MJjysQ7xB0=.19e1536d-5d9d-4e4b-a1fc-9249e735fbd1@github.com> Message-ID: <4UXzkGK0txAGnug89GEPNmT58axvNWBQ1aNwtb85uBI=.f08c488e-79e3-4c73-adc6-54788999eff7@github.com> On Tue, 22 Sep 2020 02:35:17 GMT, David Holmes wrote: >> Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: >> >> fixing the assert message too > > src/hotspot/share/runtime/deoptimization.cpp line 531: > >> 529: #endif >> 530: >> 531: if (thread->frames_to_pop_failed_realloc() > 0 && exec_mode != Unpack_uncommon_trap) { > > I'm not at all clear on whether an async-exception could be pending at this point. 
The original change indicated it
> could be, but now you are saying it can't. How is that known?

Async-exception can be pending for Deoptimization::load_class_by_index (Java code executed case). This can happen for
C2 and probably for JVMCI too.
https://github.com/openjdk/jdk/blob/d1f9b8a8b54843f06a93078c4a058af86fcc2aac/src/hotspot/share/runtime/deoptimization.cpp#L1964

In all cases deopt entries are equipped to handle pending exceptions.
In my previous code I incorrectly tried to handle it using the Unpack_exception route. This can have the implication that I am
at method entry and I don't handle locks properly.
Now I simply leave it to the deopt entries to handle pending exceptions.

-------------

PR: https://git.openjdk.java.net/jdk/pull/292

From kvn at openjdk.java.net Tue Sep 22 03:37:25 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 22 Sep 2020 03:37:25 GMT
Subject: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 18 Sep 2020 11:19:42 GMT, Nils Eliasson wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> 8252847: Review comments resolution
>
> Thanks for the clarification and update!
>
> Reviewed.

@jatin-bhateja Can you put a summary of the performance improvement into JBS?
------------- PR: https://git.openjdk.java.net/jdk/pull/61 From dholmes at openjdk.java.net Tue Sep 22 04:16:30 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 22 Sep 2020 04:16:30 GMT Subject: RFR: 8253447: Remove buggy code introduced by 8249451 [v2] In-Reply-To: <4UXzkGK0txAGnug89GEPNmT58axvNWBQ1aNwtb85uBI=.f08c488e-79e3-4c73-adc6-54788999eff7@github.com> References: <-1narCtDccARvrE8LbvdKMuaDtBU9f_08MJjysQ7xB0=.19e1536d-5d9d-4e4b-a1fc-9249e735fbd1@github.com> <4UXzkGK0txAGnug89GEPNmT58axvNWBQ1aNwtb85uBI=.f08c488e-79e3-4c73-adc6-54788999eff7@github.com> Message-ID: <4AAQXW3Vy9cWsNdT55xzsW51g2voP0hktq_dGTwR6y0=.cab032ef-a8e3-441c-ad9c-ebf2a2fb8075@github.com> On Tue, 22 Sep 2020 03:13:04 GMT, Jamsheed Mohammed C M wrote: >> src/hotspot/share/runtime/deoptimization.cpp line 531: >> >>> 529: #endif >>> 530: >>> 531: if (thread->frames_to_pop_failed_realloc() > 0 && exec_mode != Unpack_uncommon_trap) { >> >> I'm not at all clear on whether an async-exception could be pending at this point. The original change indicated it >> could be, but now you are saying it can't. How is that known? > > Async-exception can be pending for Deoptimization::load_class_by_index(Java code executed case). This can happen for > C2/ and probably for JVMCI too. > https://github.com/openjdk/jdk/blob/d1f9b8a8b54843f06a93078c4a058af86fcc2aac/src/hotspot/share/runtime/deoptimization.cpp#L1964 > In all cases deopt entries are equipped to handle pending exceptions. > In my previous code I incorrectly tried to handle it using Unpack_exception route. this can have implication that I am > at method entry and I don't handle locks properly. > now i simply leave it to the deopt entries to handle pending exceptions. Okay. I'm not familiar with that code at all so will leave this for compiler folk. Thanks. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/292

From github.com+35809+mknjc at openjdk.java.net Tue Sep 22 04:56:09 2020
From: github.com+35809+mknjc at openjdk.java.net (mknjc)
Date: Tue, 22 Sep 2020 04:56:09 GMT
Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char)
In-Reply-To:
References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> <-jCc20O2YVYuUT1bu-9h-nHSx0zP35TMleyo1-fdlRI=.f8e3fe1f-3a7e-44a7-b24b-8c809cf727c8@github.com> <2dlrgVtF5twY9WGHEKS2ZcLEm1jPmGgOfFF6o-ZyUHc=.c33e5814-724a-4569-9eed-97e8afaaa440@github.com>
Message-ID:

On Fri, 18 Sep 2020 23:11:46 GMT, Jason Tatton wrote:

>> "the JVM has knowledge of the AVX capability of the chip it's running on and disables the AVX2 code path for chips
>> which suffer from the performance degradation which has been outlined in this discussion"
>> Does it? The white paper Andrew cited doesn't mention this as being specific to only some chips that implement AVX2.
>> Can you explain where this restricted effect is documented?
>> Also, I assume you are referring to the code in vm_version_x86.cpp with this comment
>>
>> // Don't use AVX-512 on older Skylakes unless explicitly requested
>>
>> is that correct?
>
>> Can you explain where this restricted effect is documented?
>
> Certainly! I've found that determining the capability of the CPU and whether to enable AVX2 support if the chip
> supports it is mostly controlled in: [vm_version_x86.cpp](
> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp) specifically:
> [get_processor_features](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L684-L755)
> and in [generate_get_cpu_info](
> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L69-L611).
In order to test the
> patch comprehensively I had to track down an Intel Core i7 (I7-9750H) processor which the aforementioned code permitted
> AVX2 instructions for (maybe this is an error and it should not be enabled for this processor though) as most of the
> infrastructure I personally use here at AWS runs on Intel Xeon processors - I also tested on a E5-2680 which the JVM
> does not enable AVX2 for. However, this is just the Intel side of things. When it comes to AMD I read that the AMD Zen
> 2 architecture, of which the current flagship: Threadripper 3990X, is based, is able to support AVX2 [without the
> frequency scaling](
> https://www.anandtech.com/Show/Index/14525?cPage=7&all=False&sort=0&page=9&slug=amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome)
> which some/all(?) of the Intel chips incur. I personally don't have access to one of these chips so I cannot confirm
> how it is classified in the JVM. Also, I found when investigating this that there is actually a JVM flag which can be
> used to control what level of AVX is enabled: `-XX:UseAVX=version.`

I really don't see the problem with using AVX for this. As long as the instructions used only activate license L0, the
CPU doesn't scale the frequency at all. For AVX2, these are most of the instructions which don't use the FMA or
floating-point ports. Additionally, the CPU doesn't scale down the frequency instantly, but runs the 256-bit
instructions with reduced throughput at full CPU clock until enough instructions use the AVX instruction set. For more
information see https://stackoverflow.com/a/56861355/130541

So as long as the 512-bit-wide instructions aren't used, there should be no frequency scaling happening.
-------------

PR: https://git.openjdk.java.net/jdk/pull/71

From xliu at openjdk.java.net Tue Sep 22 05:09:37 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Tue, 22 Sep 2020 05:09:37 GMT
Subject: RFR: 8252583: Make PhiNode::is_copy() debug only [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 21 Sep 2020 20:52:19 GMT, Vladimir Kozlov wrote:

> Sorry, I am not sure this is better than the original code, where the assert is in one place - in PhiNode::is_copy().
> The method is in a .hpp file and is inlined - the NULL check will be eliminated. It will only be executed in a slowdebug build.
> Maybe we should "bite the bullet" and remove this method altogether - we haven't hit the assert in years.
> We can replace the is_copy() check with an assert in PhiNode::Ideal() - that should be enough to guarantee that we don't
> create a Phi with a NULL control edge.

Another thing is that a node has a member called uint Node::is_Copy() const. PhiNode/Region nodes both have a member
called "is_copy()". Is it intentional? IMHO, it's not good style.

-------------

PR: https://git.openjdk.java.net/jdk/pull/275

From eosterlund at openjdk.java.net Tue Sep 22 06:23:34 2020
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 22 Sep 2020 06:23:34 GMT
Subject: RFR: 8253447: Remove buggy code introduced by 8249451 [v2]
In-Reply-To: <4AAQXW3Vy9cWsNdT55xzsW51g2voP0hktq_dGTwR6y0=.cab032ef-a8e3-441c-ad9c-ebf2a2fb8075@github.com>
References: <-1narCtDccARvrE8LbvdKMuaDtBU9f_08MJjysQ7xB0=.19e1536d-5d9d-4e4b-a1fc-9249e735fbd1@github.com> <4UXzkGK0txAGnug89GEPNmT58axvNWBQ1aNwtb85uBI=.f08c488e-79e3-4c73-adc6-54788999eff7@github.com> <4AAQXW3Vy9cWsNdT55xzsW51g2voP0hktq_dGTwR6y0=.cab032ef-a8e3-441c-ad9c-ebf2a2fb8075@github.com>
Message-ID:

On Tue, 22 Sep 2020 04:13:26 GMT, David Holmes wrote:

>> Async-exception can be pending for Deoptimization::load_class_by_index(Java code executed case). This can happen for
>> C2/ and probably for JVMCI too.
>> https://github.com/openjdk/jdk/blob/d1f9b8a8b54843f06a93078c4a058af86fcc2aac/src/hotspot/share/runtime/deoptimization.cpp#L1964 >> In all cases deopt entries are equipped to handle pending exceptions. >> In my previous code I incorrectly tried to handle it using Unpack_exception route. this can have implication that I am >> at method entry and I don't handle locks properly. >> now i simply leave it to the deopt entries to handle pending exceptions. > > Okay. I'm not familiar with that code at all so will leave this for compiler folk. Thanks. So basically when you get here through the uncommon trap path, you have just called a JRT_ENTRY function and returned from it by now. That means you can have an async exception installed as a pending exception. If you just leave it there, the deopt entry of the interpreter will check for it and throw it. So we were already equipped to deal with this, and that is what should happen. The code that clears the pending exception and sets the exception oop instead is for when you are unwinding due to exception throwing into a deoptimized frame. The deopt handler is returned as exception handler PC for such frames, and hence needs to quack like an exception handler. But that is not at all the scenario we are in when we go through the uncommon trap; we are not in the middle of throwing an exception. Conversely, we are just about to throw it - that difference is the crucial thing. So clearing the pending exception is likely to just make it disappear (or crash later). 
-------------

PR: https://git.openjdk.java.net/jdk/pull/292

From eosterlund at openjdk.java.net Tue Sep 22 06:23:34 2020
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 22 Sep 2020 06:23:34 GMT
Subject: RFR: 8253447: Remove buggy code introduced by 8249451 [v2]
In-Reply-To: <-1narCtDccARvrE8LbvdKMuaDtBU9f_08MJjysQ7xB0=.19e1536d-5d9d-4e4b-a1fc-9249e735fbd1@github.com>
References: <-1narCtDccARvrE8LbvdKMuaDtBU9f_08MJjysQ7xB0=.19e1536d-5d9d-4e4b-a1fc-9249e735fbd1@github.com>
Message-ID:

On Tue, 22 Sep 2020 02:05:01 GMT, Jamsheed Mohammed C M wrote:

>> if ((thread->has_pending_exception() || thread->frames_to_pop_failed_realloc() > 0) && exec_mode !=
>> Unpack_uncommon_trap) {
>> assert(thread->has_pending_exception(), "should have thrown OOME/Async");
>>
>> introduced a buggy code checking, clearing pending exception and taking Unpack_exception route.
>>
>> This can have consequences as the deopt entries may have additional logic depending on bci's. and the change introduced
>> in 8249451 doesn't honor deopt exception checking and forward logic. Thank you @fisk for pointing the bug in the code.
>> Request for review.
>
> Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision:
>
> fixing the assert message too

Looks good, thanks for fixing this.

-------------

Marked as reviewed by eosterlund (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/292 From jcm at openjdk.java.net Tue Sep 22 06:29:39 2020 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Tue, 22 Sep 2020 06:29:39 GMT Subject: Integrated: 8253447: Remove buggy code introduced by 8249451 In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 01:48:53 GMT, Jamsheed Mohammed C M wrote: > if ((thread->has_pending_exception() || thread->frames_to_pop_failed_realloc() > 0) && exec_mode != > Unpack_uncommon_trap) { > assert(thread->has_pending_exception(), "should have thrown OOME/Async"); > > introduced a buggy code checking, clearing pending exception and taking Unpack_exception route. > > This can have consequences as the deopt entries may have additional logic depending on bci's. and the change introduced > in 8249451 doesn't honor deopt exception checking and forward logic. Thank you @fisk for pointing the bug in the code. > Request for review. This pull request has now been integrated. Changeset: f7b1ce45 Author: Jamsheed Mohammed C M URL: https://git.openjdk.java.net/jdk/commit/f7b1ce45 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8253447: Remove buggy code introduced by 8249451 Reviewed-by: iveresov, eosterlund ------------- PR: https://git.openjdk.java.net/jdk/pull/292 From jcm at openjdk.java.net Tue Sep 22 06:29:38 2020 From: jcm at openjdk.java.net (Jamsheed Mohammed C M) Date: Tue, 22 Sep 2020 06:29:38 GMT Subject: RFR: 8253447: Remove buggy code introduced by 8249451 [v2] In-Reply-To: References: <-1narCtDccARvrE8LbvdKMuaDtBU9f_08MJjysQ7xB0=.19e1536d-5d9d-4e4b-a1fc-9249e735fbd1@github.com> Message-ID: On Tue, 22 Sep 2020 06:20:30 GMT, Erik ?sterlund wrote: >> Jamsheed Mohammed C M has updated the pull request incrementally with one additional commit since the last revision: >> >> fixing the assert message too > > Looks good, thanks for fixing this. 
Thank you @dholmes-ora @veresov @fisk

-------------

PR: https://git.openjdk.java.net/jdk/pull/292

From eosterlund at openjdk.java.net Tue Sep 22 06:44:41 2020
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 22 Sep 2020 06:44:41 GMT
Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread
In-Reply-To:
References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com>
Message-ID:

On Mon, 21 Sep 2020 18:45:24 GMT, Vladimir Kozlov wrote:

> I would like to hear answer to @dholmes-ora question in JBS:
>
> "Do we really need yet another stack dumping interface in the VM? Why isn't a debugger using JVM TI?"

I also want to know this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/110

From never at openjdk.java.net Tue Sep 22 07:26:30 2020
From: never at openjdk.java.net (Tom Rodriguez)
Date: Tue, 22 Sep 2020 07:26:30 GMT
Subject: RFR: 8247251: Assert (_pcs_length == 0 || last_pc()->pc_offset() < pc_offs…
In-Reply-To: <8qQAkpOU6JCqbX4iwm12oWd_Q3-P9wxln5_veFF7EJk=.4d3e4f99-e294-4d43-8cc6-8abda6a2125b@github.com>
References: <8qQAkpOU6JCqbX4iwm12oWd_Q3-P9wxln5_veFF7EJk=.4d3e4f99-e294-4d43-8cc6-8abda6a2125b@github.com>
Message-ID: <8slCuLPBvcpI2AztYuxjhLe2zrYlY994c6DVlckOAiA=.0607fac7-f114-4aaa-8e19-ee6f2a040c3d@github.com>

On Sun, 20 Sep 2020 23:00:23 GMT, Vladimir Kozlov wrote:

> Fix frame state recording for System.arraycopy() intrinsic. This is port of Graal fix:
> https://github.com/oracle/graal/commit/438a7cb0257
>
> Graal unit test ArrayCopyIntrinsificationTest.java is updated to catch this case.
>
> I ran tier1 and tier3-graal testing. I also ran CodeCacheInfoOnCompilation/Test.java and c2/Test6603011.java jtreg
> tests 100 times in Mach5 and got 6 failures with latest JDK. With fix tests passed.

Looks good.
------------- PR: https://git.openjdk.java.net/jdk/pull/272 From adinn at openjdk.java.net Tue Sep 22 09:01:22 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Tue, 22 Sep 2020 09:01:22 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <3X2uVszEC9NytfF9QvL6dJQ1SHztYEpAkMKNsN3VuKc=.9aec19c0-0e5d-4512-ac41-2eed8a0cd078@github.com> <-jCc20O2YVYuUT1bu-9h-nHSx0zP35TMleyo1-fdlRI=.f8e3fe1f-3a7e-44a7-b24b-8c809cf727c8@github.com> <2dlrgVtF5twY9WGHEKS2ZcLEm1jPmGgOfFF6o-ZyUHc=.c33e5814-724a-4569-9eed-97e8afaaa440@github.com> Message-ID: On Tue, 22 Sep 2020 02:31:14 GMT, Vladimir Kozlov wrote: >>> > Can you explain where this restricted effect is documented? >> >>> Certainly! I?ve found that determining the capability of the CPU and whether to enable AVX2 support if the chip >>> supports it is mostly controlled in: vm_version_x86.cpp specifically: get_processor_features and in >>> generate_get_cpu_info. >> >> Yes, I can see what the code does. I was asking where the cpu behaviour is documented independent of the code. >> >>> In order to test the patch comprehensively I had to track down an Intel Core i7 (I7-9750H) processor which the >>> aforementioned code permitted AVX2 instructions for (maybe this is an error and it should not be enabled for this >>> processor though) as most of the infrastructure I personally use here at AWS runs on Intel Xeon processors - I also >>> tested on a E5-2680 which the JVM does not enable AVX2 for. >> >> 'maybe'? The documentation Andrew provided mentioned Xeon E5 v3 which I believe is a Skylake design. However, the code >> I pointed you at in vm_version_x86 which claims to detect 'early Skylake' designs is only disabling AVX512 support. It >> still enables AVX2. Similarly, the code that generates machine code to check the processor capabilities has a special >> check if use_evex is set (i.e. 
AVX3 is requested) which disables AVX512 for Skylake but does not disable AVX2 support. >>> However, this is just the Intel side of things. When it comes to AMD I read that the AMD Zen 2 architecture, of which >>> the current flagship: Threadripper 3990X, is based, is able to support AVX2 without the frequency scaling which >>> some/all(?) of the Intel chips incur. I personally don?t have access to one of these chips so I cannot confirm how it >>> is classified in the JVM. >> >> Well, it would be good to know where you read that and to see if that confirms thar the code is avoiding the issue >> Andrew raised. >>> Also, I found when investigating this that there is actually a JVM flag which can be used to control what level of AVX >>> is enabled: -XX:UseAVX=version. >> >> Yes, indeed. However, what I am trying to understand is whether the current code is bypassing the problem Andrew >> brought up in the cases where that problem actually exists. It doesn't look like it so far given that the problem >> applies to AVX2 and only AVX512 support is being disabled and, even then only for some (Skylake) processors. Without >> some clear documentation of what processors suffer from this power surge problem it will not be possible to decide >> whether this patch is doing the right thing or not. > > Based on comment by @jatin-bhateja (Intel) frequency level switchover pointed by @theRealAph is sensitive to vector > size https://github.com/openjdk/jdk/pull/144#issuecomment-692044896 > > By keeping vector size less or equal to 32 bytes we should avoid it. And as I can see this intrinsic code is using 32 > bytes (chars) and 16 bytes vectors: `pbroadcastb(vec1, vec1, Assembler::AVX_256bit);` > Also we never had issues with AVX2. 
only with AVX512 regarding performance hit: > https://bugs.openjdk.java.net/browse/JDK-8221092 > > I would like to see performance numbers for for all values of UseAVX flag : 0, 1, 2, 3 > > The usage is guarded UseSSE42Intrinsics in UseSSE42Intrinsics predicate in .ad file. Make sure to test with UseAVX=0 to > make sure that some avx instructions are not mixed into non avx code. And also with UseSSE=2 (for example) to make sure > shared code correctly recognize that intrinsics is not supported. @vnkozlov @mknjc @jatin-bhateja Thanks for providing the relevant details. I'm now quite content that this patch avoids any potential frequency scaling problem. I'm also glad that an explanation of why this is so is now available -- although it's not perfect that we are relying on a stackoverflow post for the full details. ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 22 09:17:16 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 22 Sep 2020 09:17:16 GMT Subject: RFR: 8252583: Make PhiNode::is_copy() debug only [v3] In-Reply-To: References: Message-ID: > Convert `PhiNode::is_copy()` into an actual, debug-only predicate. Replace calls to `is_copy()` from non-debug code > with explicit assertions. Remove dead loop in debug-only `MergeMemStream::match_memory()`. Roberto Casta?eda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' of github.com:robcasloz/jdk into JDK-8252583 - Simplify assertions by folding in phi tests - Fix spacing in touched lines - 8252583: Make PhiNode::is_copy() debug only Convert PhiNode::is_copy() into an actual, debug-only predicate. 
Replace calls to is_copy() from non-debug code with explicit assertions. Remove dead loop in debug-only MergeMemStream::match_memory(). ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/275/files - new: https://git.openjdk.java.net/jdk/pull/275/files/53962988..844a31af Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=275&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=275&range=01-02 Stats: 7965 lines in 297 files changed: 4684 ins; 2611 del; 670 mod Patch: https://git.openjdk.java.net/jdk/pull/275.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/275/head:pull/275 PR: https://git.openjdk.java.net/jdk/pull/275 From github.com+4676506+javeleon at openjdk.java.net Tue Sep 22 09:19:57 2020 From: github.com+4676506+javeleon at openjdk.java.net (Allan Gregersen) Date: Tue, 22 Sep 2020 09:19:57 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> Message-ID: On Tue, 22 Sep 2020 06:42:00 GMT, Erik ?sterlund wrote: > I would like to hear answer to @dholmes-ora question in JBS: > "Do we really need yet another stack dumping interface in the VM? Why isn't a debugger using JVM TI?" One reason for having both the new getStackFrames API (set of threads) as well as the existing iterateFrames (current thread only) API in JVMCI is that Truffle would want a deopt-free read-only view of the values in a frame, which to the best of our knowledge is not possible through JVMTI. Only in rare cases, materialization of frames is required, so it boils down to the performance hit caused by deopting frames, which is even more of a concern with a set of threads than for the single current thread case. Another potential issue with a JVMTI-based approach is that there might be other drawbacks to having an always-on (or even late attached) JVMTI agent in a GraalVM? 
------------- PR: https://git.openjdk.java.net/jdk/pull/110 From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 22 09:25:54 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 22 Sep 2020 09:25:54 GMT Subject: RFR: 8252583: Make PhiNode::is_copy() debug only [v2] In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 05:04:56 GMT, Xin Liu wrote: >> Sorry, I am not sure this is better than original code where assert is in one place - in PhiNode::is_copy(). >> The method is in .hpp file and is inlined - NULL check will be eliminated. It will only be executed in slowdebug build. >> May be we should "bite the bullet" and remove this method at all - we don't hit the assert in years. >> We can replace is_copy() check with assert in PhiNode::Ideal() - that should be enough to guarantee that we don't >> create Phi with NULL control edge. > >> Sorry, I am not sure this is better than original code where assert is in one place - in PhiNode::is_copy(). >> The method is in .hpp file and is inlined - NULL check will be eliminated. It will only be executed in slowdebug build. >> May be we should "bite the bullet" and remove this method at all - we don't hit the assert in years. >> We can replace is_copy() check with assert in PhiNode::Ideal() - that should be enough to guarantee that we don't >> create Phi with NULL control edge. > > Another thing is that a node has a member called uint Node::is_Copy() const. PhiNode/Region nodes both have a member > callled "is_copy()". Is it intentional? IMHO, it's not good in style. Thanks Tobias and Xin for the reviews! I updated the PR for the record, even if we might end up moving the assertions somewhere else as per Vladimir's suggestion. 
------------- PR: https://git.openjdk.java.net/jdk/pull/275 From eosterlund at openjdk.java.net Tue Sep 22 10:53:53 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 22 Sep 2020 10:53:53 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> Message-ID: On Tue, 22 Sep 2020 09:17:08 GMT, Allan Gregersen wrote: > > I would like to hear answer to @dholmes-ora question in JBS: > > "Do we really need yet another stack dumping interface in the VM? Why isn't a debugger using JVM TI?" > > One reason for having both the new getStackFrames API (set of threads) as well as the existing iterateFrames (current > thread only) API in JVMCI is that Truffle would want a deopt-free read-only view of the values in a frame, which to the > best of our knowledge is not possible through JVMTI. Only in rare cases, materialization of frames is required, so it > boils down to the performance hit caused by deopting frames, which is even more of a concern with a set of threads than > for the single current thread case. Another potential issue with a JVMTI-based approach is that there might be other > drawbacks to having an always-on (or even late attached) JVMTI agent in a GraalVM? 1) You are describing that the main reason is performance. But you also say this is to be used by a debugger? So, not sure performance as a primary motive really makes sense then. Not sure why performance of debugging Truffle must be so much faster than debugging Java code (which I have not heard anyone complain about). And if this really was an actual performance problem, it seems like we would want a generic fix then, not a special Truffle stack walker for debugging Truffle code alone, to be maintained separately. 2) We are talking about JVMTI, not JVMCI. 
iterateFrames is defined in JVMCI, and that is something completely different, which I don't think any of us had in mind. It seems indeed to be limited to the current frame. I'm talking about e.g. JVMTI GetStackTrace and the JVMTI GetLocal* functions. It gives you a stack trace for any thread (not just the current one), and allows you to retrieve locals. 3) When you just read locals, (as you describe is your use case), there is no need to deoptimize anything. So yeah, that's just not something we do, unless you change the locals, which you said you are not. Please let me know if there is anything I missed. But so far it seems to me that the mentioned JVMTI functionality is all you really need for a debugger. What did I miss? I would like to better understand the problem domain before taking this further. ------------- PR: https://git.openjdk.java.net/jdk/pull/110 From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 22 12:57:42 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 22 Sep 2020 12:57:42 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing In-Reply-To: References: Message-ID: On Sun, 20 Sep 2020 17:02:29 GMT, Roberto Casta?eda Lozano wrote: >> Add 'StressIGVN' option to let C2 randomize IGVN worklist order. When enabled, >> the worklist is shuffled before each main run of the IGVN loop. Also add >> 'GenerateStressSeed' and 'StressSeed=N' options to randomly generate or specify >> the seed. In either case, the seed is logged if 'LogCompilation' is enabled. >> >> The generation or specification of seeds also affects the randomization >> triggered by 'StressLCM' and 'StressGCM'. The new options are declared as >> production+diagnostic for consistency with these existing options. > > Reverted to "draft mode", as I just realized the design is not repeatable since it relies on global PRNG state. Add 'StressIGVN' option to let C2 randomize IGVN worklist order. 
When enabled, the worklist is shuffled before each main run of the IGVN loop. Also add 'GenerateStressSeed' and 'StressSeed=N' options to randomly generate or specify the seed. In either case, the seed is logged if 'LogCompilation' is enabled. The new options are declared as production+diagnostic for consistency with the existing 'StressLCM' and 'StressGCM' options. ------------- PR: https://git.openjdk.java.net/jdk/pull/242 From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 22 13:04:47 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 22 Sep 2020 13:04:47 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 12:54:31 GMT, Roberto Casta?eda Lozano wrote: >> Reverted to "draft mode", as I just realized the design is not repeatable since it relies on global PRNG state. > > Add 'StressIGVN' option to let C2 randomize IGVN worklist order. When enabled, > the worklist is shuffled before each main run of the IGVN loop. Also add > 'GenerateStressSeed' and 'StressSeed=N' options to randomly generate or specify > the seed. In either case, the seed is logged if 'LogCompilation' is enabled. > The new options are declared as production+diagnostic for consistency with the > existing 'StressLCM' and 'StressGCM' options. This pull request is ready for review again. ------------- PR: https://git.openjdk.java.net/jdk/pull/242 From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 22 13:31:05 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 22 Sep 2020 13:31:05 GMT Subject: RFR: 8252583: Clean up unused phi-to-copy degradation mechanism [v4] In-Reply-To: References: Message-ID: > Remove unused notion of "PhiNode-to-copy degradation", where PhiNodes can be degraded to copies by setting their > RegionNode to NULL. 
Remove corresponding `PhiNode::is_copy()` test, which always returned NULL (false). Assert that > PhiNodes have an associated RegionNode in `PhiNode::Ideal()`. Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: Clean up unused PhiNode-to-copy degradation Remove unused notion of 'PhiNode-to-copy degradation', where PhiNodes can be degraded to copies by setting their RegionNode to NULL. Remove corresponding PhiNode::is_copy() test, which always returned NULL (false). Assert that PhiNodes have an associated RegionNode in PhiNode::Ideal(). ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/275/files - new: https://git.openjdk.java.net/jdk/pull/275/files/844a31af..91f30bae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=275&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=275&range=02-03 Stats: 27 lines in 7 files changed: 1 ins; 22 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/275.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/275/head:pull/275 PR: https://git.openjdk.java.net/jdk/pull/275 From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 22 13:31:05 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 22 Sep 2020 13:31:05 GMT Subject: RFR: 8252583: Clean up unused phi-to-copy degradation mechanism [v2] In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 09:23:08 GMT, Roberto Casta?eda Lozano wrote: >>> Sorry, I am not sure this is better than original code where assert is in one place - in PhiNode::is_copy(). >>> The method is in .hpp file and is inlined - NULL check will be eliminated. It will only be executed in slowdebug build. >>> May be we should "bite the bullet" and remove this method at all - we don't hit the assert in years. 
>>> We can replace is_copy() check with assert in PhiNode::Ideal() - that should be enough to guarantee that we don't
>>> create Phi with NULL control edge.
>>
>> Another thing is that a node has a member called uint Node::is_Copy() const. PhiNode/Region nodes both have a member
>> called "is_copy()". Is it intentional? IMHO, it's not good in style.
>
> Thanks Tobias and Xin for the reviews! I updated the PR for the record, even if we might end up moving the assertions
> somewhere else as per Vladimir's suggestion.

Remove unused notion of 'PhiNode-to-copy degradation', where PhiNodes can be
degraded to copies by setting their RegionNode to NULL. Remove corresponding
PhiNode::is_copy() test, which always returned NULL (false). Assert that
PhiNodes have an associated RegionNode in PhiNode::Ideal().

-------------

PR: https://git.openjdk.java.net/jdk/pull/275

From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 22 13:41:22 2020
From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano)
Date: Tue, 22 Sep 2020 13:41:22 GMT
Subject: RFR: 8252583: Clean up unused phi-to-copy degradation mechanism [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 13:24:35 GMT, Roberto Castañeda Lozano wrote:

>> Thanks Tobias and Xin for the reviews! I updated the PR for the record, even if we might end up moving the assertions
>> somewhere else as per Vladimir's suggestion.
>
> Remove unused notion of 'PhiNode-to-copy degradation', where PhiNodes can be
> degraded to copies by setting their RegionNode to NULL. Remove corresponding
> PhiNode::is_copy() test, which always returned NULL (false). Assert that
> PhiNodes have an associated RegionNode in PhiNode::Ideal().

Tested successfully on hs-tier1, hs-tier2, and hs-tier3.
------------- PR: https://git.openjdk.java.net/jdk/pull/275 From simonis at openjdk.java.net Tue Sep 22 15:21:58 2020 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 22 Sep 2020 15:21:58 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v3] In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: On Mon, 21 Sep 2020 12:45:55 GMT, Jason Tatton wrote: >> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for >> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it >> I had to make a small change to StringLatin1.java (refactor of functionality to intrisified private method) as well as >> code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and >> java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585 >> >> Details of testing: >> ============ >> I have created a jtreg test ?compiler/intrinsics/string/TestStringLatin1IndexOfChar? to cover this new intrinsic. Note >> that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the >> input String. Hence the test has been designed to cover all these cases. In summary they are: >> - A ?short? string of < 16 characters. >> - A SIMD String of 16 ? 31 characters. >> - A AVX2 SIMD String of 32 characters+. >> >> Hardware used for testing: >> ----------------------------- >> >> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) ? Intel i7 processor (with AVX2 support). >> - AWS Graviton 2 (ARM 64 processor). >> >> I also ran; ?run-test-tier1? and ?run-test-tier2? for: x86_64 and aarch64. 
>> >> Possible future enhancements: >> ==================== >> For the x86 implementation there may be two further improvements we can make in order to improve performance of both >> the StringUTF16 and StringLatin1 indexOf(char) intrinsics: >> 1. Make use of AVX-512 instructions. >> 2. For ?short? Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD >> instructions instead of a loop. >> Benchmark results: >> ============ >> **Without** the new StringLatin1 indexOf(char) intrinsic: >> >> | Benchmark | Mode | Cnt | Score | Error | Units | >> | ------------- | ------------- |------------- |------------- |------------- |------------- | >> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ? 182.581 | ns/op | >> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ? 435.933 | ns/op | >> >> >> **With** the new StringLatin1 indexOf(char) intrinsic: >> >> | Benchmark | Mode | Cnt | Score | Error | Units | >> | ------------- | ------------- |------------- |------------- |------------- |------------- | >> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ? 407.716 | ns/op | >> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ? 167.306 | ns/op | >> >> >> The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 >> indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when >> running on ARM. > > Jason Tatton has updated the pull request incrementally with one additional commit since the last revision: > > Add missing newline to end of vmSymbols.cpp src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1889: > 1887: bind(SCAN_TO_32_CHAR_LOOP); > 1888: vmovdqu(vec3, Address(result, 0)); > 1889: vpcmpeqb(vec3, vec3, vec1, 1); Using `Assembler::AVX_256bit` instead of `1` will make this easier to understand. 
------------- PR: https://git.openjdk.java.net/jdk/pull/71 From simonis at openjdk.java.net Tue Sep 22 15:21:56 2020 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 22 Sep 2020 15:21:56 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2] In-Reply-To: <-DRWR4_f5u6DsSGHAuPnpHrhaG8Una8BXf4zDekQjLM=.469b08b6-a8b2-4a6b-8ab8-1a40810aede0@github.com> References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <-DRWR4_f5u6DsSGHAuPnpHrhaG8Una8BXf4zDekQjLM=.469b08b6-a8b2-4a6b-8ab8-1a40810aede0@github.com> Message-ID: On Fri, 18 Sep 2020 11:56:09 GMT, Jason Tatton wrote: >> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for >> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it >> I had to make a small change to StringLatin1.java (refactor of functionality to intrisified private method) as well as >> code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and >> java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585 >> >> Details of testing: >> ============ >> I have created a jtreg test ?compiler/intrinsics/string/TestStringLatin1IndexOfChar? to cover this new intrinsic. Note >> that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the >> input String. Hence the test has been designed to cover all these cases. In summary they are: >> - A ?short? string of < 16 characters. >> - A SIMD String of 16 ? 31 characters. >> - A AVX2 SIMD String of 32 characters+. >> >> Hardware used for testing: >> ----------------------------- >> >> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) ? Intel i7 processor (with AVX2 support). >> - AWS Graviton 2 (ARM 64 processor). >> >> I also ran; ?run-test-tier1? 
and ?run-test-tier2? for: x86_64 and aarch64. >> >> Possible future enhancements: >> ==================== >> For the x86 implementation there may be two further improvements we can make in order to improve performance of both >> the StringUTF16 and StringLatin1 indexOf(char) intrinsics: >> 1. Make use of AVX-512 instructions. >> 2. For ?short? Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD >> instructions instead of a loop. >> Benchmark results: >> ============ >> **Without** the new StringLatin1 indexOf(char) intrinsic: >> >> | Benchmark | Mode | Cnt | Score | Error | Units | >> | ------------- | ------------- |------------- |------------- |------------- |------------- | >> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ? 182.581 | ns/op | >> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ? 435.933 | ns/op | >> >> >> **With** the new StringLatin1 indexOf(char) intrinsic: >> >> | Benchmark | Mode | Cnt | Score | Error | Units | >> | ------------- | ------------- |------------- |------------- |------------- |------------- | >> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ? 407.716 | ns/op | >> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ? 167.306 | ns/op | >> >> >> The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 >> indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when >> running on ARM. > > Jason Tatton has updated the pull request with a new target base due to a merge or a rebase. The pull request now > contains four commits: > - Merge master > - 8173585: further whitespace changes required by jcheck > - JDK-8173585 - whitespace changes required by jcheck > - JDK-8173585 Hi Jason, thanks for bringing String.indexOf() for latin strings up to date with the Unicode version. 
Your changes look good except for a few minor issues I've commented on right in the code.

I'd only like to ask you if you could possibly improve your test a little bit. As far as I understand, your search text is a consecutive sequence of "abc" characters, so you'll always find the character you're searching for within the next three characters of the source text. This won't exercise the loops of your intrinsic. Maybe you can also add some test versions where the search character will be found beyond the first 32/64 characters after "fromIndex"?

test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java line 24:

> 22:
> 23: public static void main(String[] args) throws Exception {
> 24: for (int i = 0; i < 100_0; ++i) {//repeat such that we enter into C2 code...

The placement of the underscore looks strange to me. I'd expect it to separate thousands (like 1_000), if used at all, but I'm not sure I'd use it for one thousand, as that's really not such a big number that it is hard to read. Also, the Tier4InvocationThreshold is 5000, so I'm not sure you're reaching C2?

src/hotspot/share/classfile/vmSymbols.cpp line 295:

> 293: if (symbol == NULL) return NO_SID;
> 294: return find_sid(symbol);
> 295: }

I think it is common sense to have a newline at the end of a file.

test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java line 84:

> 82: }
> 83:
> 84: }

Please put an EOL at the end of the file.

test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java line 199:

> 197: return ret;
> 198: }
> 199: }

Please put an EOL at the end of the file.

-------------

Changes requested by simonis (Reviewer).
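The request for test cases in which the match lies beyond the first 32/64 characters after "fromIndex" could be approached roughly as follows. This is only a sketch under stated assumptions and not the test from the patch: the class name `IndexOfBeyondVectorWidth` and the helpers `scalarIndexOf`, `haystack` and `check` are hypothetical names introduced for illustration. It compares `String.indexOf(int, int)` against a plain scalar loop for match distances on both sides of 16, 32 and 64 characters.

```java
import java.util.Arrays;

public class IndexOfBeyondVectorWidth {

    // Reference oracle: a plain scalar loop with the same contract as
    // String.indexOf(int ch, int fromIndex) for in-range BMP characters.
    static int scalarIndexOf(String s, char ch, int fromIndex) {
        for (int i = Math.max(fromIndex, 0); i < s.length(); i++) {
            if (s.charAt(i) == ch) {
                return i;
            }
        }
        return -1;
    }

    // Builds a Latin-1 string of 'a' padding with a single 'z' at matchAt
    // (or no 'z' at all when matchAt is negative).
    static String haystack(int length, int matchAt) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'a');
        if (matchAt >= 0) {
            chars[matchAt] = 'z';
        }
        return new String(chars);
    }

    // Runs one pass over all (fromIndex, distance) combinations and
    // returns the number of comparisons performed.
    static int check() {
        // Distances chosen to straddle 16, 32 and 64 characters.
        int[] distances = {0, 1, 15, 16, 17, 31, 32, 33, 63, 64, 65, 100};
        int checked = 0;
        for (int fromIndex = 0; fromIndex < 40; fromIndex++) {
            for (int d : distances) {
                int matchAt = fromIndex + d;
                String s = haystack(matchAt + 20, matchAt);
                int expected = scalarIndexOf(s, 'z', fromIndex);
                int actual = s.indexOf('z', fromIndex);
                if (expected != actual) {
                    throw new AssertionError("fromIndex=" + fromIndex + " distance=" + d
                            + " expected=" + expected + " actual=" + actual);
                }
                checked++;
            }
            // A haystack without any match must report -1.
            String none = haystack(fromIndex + 100, -1);
            if (none.indexOf('z', fromIndex) != -1) {
                throw new AssertionError("unexpected match, fromIndex=" + fromIndex);
            }
            checked++;
        }
        return checked;
    }

    public static void main(String[] args) {
        long total = 0;
        // Repeated well past 5000 iterations to give the JIT a chance to
        // compile the call sites; whether C2 is actually reached is not
        // verified by this sketch.
        for (int i = 0; i < 6_000; i++) {
            total += check();
        }
        System.out.println("comparisons performed: " + total);
    }
}
```

Whether the intrinsic is really exercised would still need to be confirmed separately, for example by inspecting the output of -XX:+PrintCompilation.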
PR: https://git.openjdk.java.net/jdk/pull/71

From jbhateja at openjdk.java.net Tue Sep 22 15:32:18 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 22 Sep 2020 15:32:18 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions
Message-ID:

Summary:

1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes.
2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
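
The dispatch described in points 1) to 3) can be modelled in plain Java. This is a conceptual sketch only: the real sequence is emitted by the JIT compiler at the call site, the class and method names are hypothetical, the `<=` comparison against the 32-byte default is an assumption, and the per-lane loops merely emulate what a single masked vector load and store would do.

```java
import java.util.Objects;

// Hypothetical model of partial inlining for byte[] copies; not HotSpot code.
final class PartialInlineCopySketch {

    // Mirrors the stated default of ArrayCopyPartialInlineSize (32 bytes).
    static final int PARTIAL_INLINE_SIZE_BYTES = 32;

    // Returns true if the copy took the modelled inline (masked) path,
    // false if it fell back to the stand-in for the array-copy stub.
    static boolean copy(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        Objects.checkFromIndexSize(srcPos, length, src.length);
        Objects.checkFromIndexSize(dstPos, length, dst.length);

        if (length > PARTIAL_INLINE_SIZE_BYTES) {
            // Stand-in for the call to the array-copy stub.
            System.arraycopy(src, srcPos, dst, dstPos, length);
            return false;
        }

        // One mask bit per byte lane; the low 'length' bits are set.
        long mask = (1L << length) - 1;

        // Emulated masked load: only active lanes touch memory.
        byte[] lanes = new byte[PARTIAL_INLINE_SIZE_BYTES];
        for (int lane = 0; lane < PARTIAL_INLINE_SIZE_BYTES; lane++) {
            if ((mask & (1L << lane)) != 0) {
                lanes[lane] = src[srcPos + lane];
            }
        }
        // Emulated masked store: runs after the full load, so overlapping
        // ranges behave as if copied through a temporary.
        for (int lane = 0; lane < PARTIAL_INLINE_SIZE_BYTES; lane++) {
            if ((mask & (1L << lane)) != 0) {
                dst[dstPos + lane] = lanes[lane];
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] src = {1, 2, 3, 4, 5};
        byte[] dst = new byte[5];
        System.out.println("inline path: " + copy(src, 1, dst, 0, 3));
    }
}
```

The mask is what lets one fixed-width sequence serve every length up to the threshold without a scalar tail loop.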
Performance Results:
System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
ArrayCopyPartialInlineSize : 32

JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
-- | -- | -- | -- | --
ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975

Detailed Reports:
Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)

-------------

Commit messages:
 - 8252848: Optimize small primitive arrayCopy operations through partial
inlining using AVX-512 masked instructions.

Changes: https://git.openjdk.java.net/jdk/pull/302/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8252848
  Stats: 515 lines in 23 files changed: 495 ins; 0 del; 20 mod
  Patch: https://git.openjdk.java.net/jdk/pull/302.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302

PR: https://git.openjdk.java.net/jdk/pull/302

From jbhateja at openjdk.java.net Tue Sep 22 15:35:29 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 22 Sep 2020 15:35:29 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 15:24:41 GMT, Jatin Bhateja wrote:

> Summary:
>
> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes.
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.

This pull request is incremental work on top of [pull request-61](https://github.com/openjdk/jdk/pull/61). It shares leaf-level assembler routines with that pull request; to build this patch standalone, the following common code patch needs to be applied on top of this PR:
http://cr.openjdk.java.net/~jbhateja/8252848/common_code_with_8252847.diff

-------------

PR: https://git.openjdk.java.net/jdk/pull/302

From never at openjdk.java.net Tue Sep 22 15:36:15 2020
From: never at openjdk.java.net (Tom Rodriguez)
Date: Tue, 22 Sep 2020 15:36:15 GMT
Subject: RFR: 8247251: Assert (_pcs_length == 0 || last_pc()->pc_offset() < pc_offs…
In-Reply-To: <8qQAkpOU6JCqbX4iwm12oWd_Q3-P9wxln5_veFF7EJk=.4d3e4f99-e294-4d43-8cc6-8abda6a2125b@github.com>
References: <8qQAkpOU6JCqbX4iwm12oWd_Q3-P9wxln5_veFF7EJk=.4d3e4f99-e294-4d43-8cc6-8abda6a2125b@github.com>
Message-ID:

On Sun, 20 Sep 2020 23:00:23 GMT, Vladimir Kozlov wrote:

> Fix frame state recording for System.arraycopy() intrinsic. This is port of Graal fix:
> https://github.com/oracle/graal/commit/438a7cb0257
>
> Graal unit test ArrayCopyIntrinsificationTest.java is updated to catch this case.
>
> I ran tier1 and tier3-graal testing. I also ran CodeCacheInfoOnCompilation/Test.java and c2/Test6603011.java jtreg
> tests 100 times in Mach5 and got 6 failures with latest JDK. With fix tests passed.

Marked as reviewed by never (Reviewer).
-------------

PR: https://git.openjdk.java.net/jdk/pull/272

From kvn at openjdk.java.net Tue Sep 22 15:44:50 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 22 Sep 2020 15:44:50 GMT
Subject: Integrated: 8247251: Assert (_pcs_length == 0 || last_pc()->pc_offset() < pc_offs…
In-Reply-To: <8qQAkpOU6JCqbX4iwm12oWd_Q3-P9wxln5_veFF7EJk=.4d3e4f99-e294-4d43-8cc6-8abda6a2125b@github.com>
References: <8qQAkpOU6JCqbX4iwm12oWd_Q3-P9wxln5_veFF7EJk=.4d3e4f99-e294-4d43-8cc6-8abda6a2125b@github.com>
Message-ID:

On Sun, 20 Sep 2020 23:00:23 GMT, Vladimir Kozlov wrote:

> Fix frame state recording for System.arraycopy() intrinsic. This is port of Graal fix:
> https://github.com/oracle/graal/commit/438a7cb0257
>
> Graal unit test ArrayCopyIntrinsificationTest.java is updated to catch this case.
>
> I ran tier1 and tier3-graal testing. I also ran CodeCacheInfoOnCompilation/Test.java and c2/Test6603011.java jtreg
> tests 100 times in Mach5 and got 6 failures with latest JDK. With fix tests passed.

This pull request has now been integrated.

Changeset: 24e12b38
Author: Vladimir Kozlov
URL: https://git.openjdk.java.net/jdk/commit/24e12b38
Stats: 49 lines in 6 files changed: 2 ins; 37 del; 10 mod

8247251: Assert (_pcs_length == 0 || last_pc()->pc_offset() < pc_offs…

Co-authored-by: Tom Rodriguez
Reviewed-by: never

-------------

PR: https://git.openjdk.java.net/jdk/pull/272

From jbhateja at openjdk.java.net Tue Sep 22 15:47:29 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 22 Sep 2020 15:47:29 GMT
Subject: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v4]
In-Reply-To:
References:
Message-ID:

> Summary:
>
> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy.
> 2) Special instruction sequence blocks for copy sizes between 32 and 192 bytes.
> 3) Block copy operations above 192 bytes are performed using a destination-address-aligned PRE-MAIN-POST loop. The main loop copies 192 bytes per iteration, and the tail part is handled by the special instruction sequence blocks.
> 4) Both the small copy blocks and the aligned loop use 32-byte vector registers to prevent any frequency penalty for copy sizes less than AVX3Threshold.
> 5) For block sizes above AVX3Threshold, both the special blocks and the loop operate using 64-byte registers.
> 6) In case the user sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient REP MOVS for copy sizes above 4096 bytes.
>
> JMH Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> Baseline : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt
> WithOpt : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8252847: Review comments resolution; code reorganized to cover arraycopy for reference types.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/61/files
  - new: https://git.openjdk.java.net/jdk/pull/61/files/271b6457..fadd3687

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=61&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=61&range=02-03

  Stats: 691 lines in 5 files changed: 340 ins; 198 del; 153 mod
  Patch: https://git.openjdk.java.net/jdk/pull/61.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/61/head:pull/61

PR: https://git.openjdk.java.net/jdk/pull/61

From jbhateja at openjdk.java.net Tue Sep 22 15:55:11 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 22 Sep 2020 15:55:11 GMT
Subject: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v5]
In-Reply-To:
References:
Message-ID:

> Summary:
>
> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy.
> 2) Special instruction sequence blocks for copy sizes between 32 and 192 bytes.
> 3) Block copy operations above 192 bytes are performed using a destination-address-aligned PRE-MAIN-POST loop. The main loop copies 192 bytes per iteration, and the tail part is handled by the special instruction sequence blocks.
> 4) Both the small copy blocks and the aligned loop use 32-byte vector registers to prevent any frequency penalty for copy sizes less than AVX3Threshold.
> 5) For block sizes above AVX3Threshold, both the special blocks and the loop operate using 64-byte registers.
> 6) In case the user sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient REP MOVS for copy sizes above 4096 bytes.
>
> JMH Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> Baseline : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt
> WithOpt : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8252847 : Modifying file permission to resolve jcheck failure.
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/61/files
  - new: https://git.openjdk.java.net/jdk/pull/61/files/fadd3687..78c4fe73

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=61&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=61&range=03-04

  Stats: 0 lines in 1 file changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/61.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/61/head:pull/61

PR: https://git.openjdk.java.net/jdk/pull/61

From never at openjdk.java.net Tue Sep 22 16:07:42 2020
From: never at openjdk.java.net (Tom Rodriguez)
Date: Tue, 22 Sep 2020 16:07:42 GMT
Subject: RFR: 8252518: cache result of CompilerToVM.getComponentType
In-Reply-To:
References:
Message-ID: <8lcCvH9CLMAMXGD70704_cRrkQ5PQX27pv8wTfIqxOI=.7982e01a-997c-40a6-b065-ddbb0f067811@github.com>

On Tue, 15 Sep 2020 09:19:57 GMT, Doug Simon wrote:

> Linux perf profiles of CompileTheWorld with libgraal show that `CompilerToVM.getComponentType` is the most expensive
> JVMCI VM entry point with almost 2% of total execution time:
> + 1.87% 0.04% [.] c2v_getComponentType
> + 0.54% 0.00% [.] c2v_installCode
> 0.39% 0.00% [.] c2v_getResolvedJavaType0
> 0.04% 0.00% [.] c2v_resolvePossiblyCachedConstantInPool
> 0.03% 0.00% [.] c2v_interpreterFrameSize
> 0.03% 0.01% [.] c2v_isAssignableFrom
> 0.02% 0.00% [.] c2v_translate
> 0.01% 0.00% [.] c2v_getIdentityHashCode
>
> It's worth caching the result of this call.

Marked as reviewed by never (Reviewer).
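
The general idea of caching the result of an expensive lookup can be sketched as follows. This is not the code from the pull request: `ComponentTypeCacheSketch`, `TypeHandle` and `expensiveComponentTypeLookup` are hypothetical names, the "VM call" is simulated with plain reflection, and the sketch ignores thread safety.

```java
// Hypothetical illustration of caching a per-type lookup result; not JVMCI code.
final class ComponentTypeCacheSketch {

    // Counts how often the simulated expensive entry point is invoked.
    static int vmCalls = 0;

    // Stand-in for an expensive VM entry point (an assumption for this sketch).
    static Class<?> expensiveComponentTypeLookup(Class<?> type) {
        vmCalls++;
        return type.getComponentType();
    }

    static final class TypeHandle {
        private final Class<?> type;
        private Class<?> componentType;
        // Separate flag because null is a valid result for non-array types.
        private boolean componentTypeResolved;

        TypeHandle(Class<?> type) {
            this.type = type;
        }

        Class<?> getComponentType() {
            if (!componentTypeResolved) {
                componentType = expensiveComponentTypeLookup(type);
                componentTypeResolved = true;
            }
            return componentType;
        }
    }

    public static void main(String[] args) {
        TypeHandle handle = new TypeHandle(int[][].class);
        handle.getComponentType();
        handle.getComponentType();
        System.out.println("lookups performed: " + vmCalls);
    }
}
```

Caching is safe in this model because the component type of a given type does not change after it has been resolved.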
-------------

PR: https://git.openjdk.java.net/jdk/pull/172

From kvn at openjdk.java.net Tue Sep 22 16:14:35 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 22 Sep 2020 16:14:35 GMT
Subject: RFR: 8252583: Clean up unused phi-to-copy degradation mechanism [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 13:31:05 GMT, Roberto Castañeda Lozano wrote:

>> Remove unused notion of "PhiNode-to-copy degradation", where PhiNodes can be degraded to copies by setting their
>> RegionNode to NULL. Remove corresponding `PhiNode::is_copy()` test, which always returned NULL (false). Assert that
>> PhiNodes have an associated RegionNode in `PhiNode::Ideal()`.
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
> Clean up unused PhiNode-to-copy degradation
>
> Remove unused notion of 'PhiNode-to-copy degradation', where PhiNodes can be
> degraded to copies by setting their RegionNode to NULL. Remove corresponding
> PhiNode::is_copy() test, which always returned NULL (false). Assert that
> PhiNodes have an associated RegionNode in PhiNode::Ideal().

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/275

From jbhateja at openjdk.java.net Tue Sep 22 16:38:56 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 22 Sep 2020 16:38:56 GMT
Subject: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 03:34:37 GMT, Vladimir Kozlov wrote:

> @jatin-bhateja Can you put summary of performance improvement into JBS?

yes, I have added the summary in JBS.
-------------

PR: https://git.openjdk.java.net/jdk/pull/61

From jbhateja at openjdk.java.net Tue Sep 22 16:41:48 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 22 Sep 2020 16:41:48 GMT
Subject: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 03:34:37 GMT, Vladimir Kozlov wrote:

> @jatin-bhateja Can you put summary of performance improvement into JBS?

Yes, I have added the summary to JBS

-------------

PR: https://git.openjdk.java.net/jdk/pull/61

From dnsimon at openjdk.java.net Tue Sep 22 16:59:24 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Tue, 22 Sep 2020 16:59:24 GMT
Subject: Integrated: 8252518: cache result of CompilerToVM.getComponentType
In-Reply-To:
References:
Message-ID:

On Tue, 15 Sep 2020 09:19:57 GMT, Doug Simon wrote:

> Linux perf profiles of CompileTheWorld with libgraal show that `CompilerToVM.getComponentType` is the most expensive
> JVMCI VM entry point with almost 2% of total execution time:
> + 1.87% 0.04% [.] c2v_getComponentType
> + 0.54% 0.00% [.] c2v_installCode
> 0.39% 0.00% [.] c2v_getResolvedJavaType0
> 0.04% 0.00% [.] c2v_resolvePossiblyCachedConstantInPool
> 0.03% 0.00% [.] c2v_interpreterFrameSize
> 0.03% 0.01% [.] c2v_isAssignableFrom
> 0.02% 0.00% [.] c2v_translate
> 0.01% 0.00% [.] c2v_getIdentityHashCode
>
> It's worth caching the result of this call.

This pull request has now been integrated.

Changeset: 0f26ab16
Author: Doug Simon
URL: https://git.openjdk.java.net/jdk/commit/0f26ab16
Stats: 13 lines in 1 file changed: 0 ins; 8 del; 5 mod

8252518: cache result of CompilerToVM.getComponentType

Reviewed-by: kvn, never

-------------

PR: https://git.openjdk.java.net/jdk/pull/172

From kvn at openjdk.java.net Tue Sep 22 17:07:19 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 22 Sep 2020 17:07:19 GMT
Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 12:44:33 GMT, Roberto Castañeda Lozano wrote:

>> Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each
>> main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the
>> seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as
>> production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options.
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The
> incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five
> additional commits since the last revision:
> - Apply minor rearrangements to simplify the patch
> - Do not use the per-compilation seed for StressLCM and StressGCM
>
> Use global seed for StressLCM and StressGCM as before to preserve their behavior
> for now. In the future, these options can also benefit from using the local
> (per-compilation) seed by simply using Compile::random() instead of os::random()
> and extending the Compile constructor accordingly.
> - Merge branch 'master' of github.com:robcasloz/jdk into JDK-8252219
> - Replace global seed with compilation-local seeds for repeatability
>
> Replace usage of global seed for stress testing with a seed per compilation, to
> make stress runs repeatable in the face of concurrent method compilations. This
> affects StressLCM, StressGCM, and StressIGVN. Reuse pseudorandom number generation
> logic in os::random_helper.
> - 8252219: C2: Randomize IGVN worklist for stress testing
>
> Add 'StressIGVN' option to let C2 randomize IGVN worklist order. When enabled,
> the worklist is shuffled before each main run of the IGVN loop. Also add
> 'GenerateStressSeed' and 'StressSeed=N' options to randomly generate or specify
> the seed. In either case, the seed is logged if 'LogCompilation' is enabled.
>
> The generation or specification of seeds also affects the randomization
> triggered by 'StressLCM' and 'StressGCM'. The new options are declared as
> production+diagnostic for consistency with these existing options.

src/hotspot/share/opto/compile.cpp line 4456:

> 4454:
> 4455: int Compile::random() {
> 4456: _stress_seed = os::next_random(_stress_seed);

I don't see os::next_random() in runtime/os.hpp

src/hotspot/share/opto/c2_globals.hpp line 58:

> 56: "Generate random seed for IGVN stress testing") \
> 57: \
> 58: product(uintx, StressSeed, 0, DIAGNOSTIC, \

uintx is a 64-bit type. Use uint as the type for the _stress_seed field.
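
The repeatability argument in the quoted commit messages (a seed per compilation, advanced locally) can be illustrated with a small model. This is a sketch under stated assumptions, not HotSpot code: `StressSeedSketch` is a hypothetical class, the linear congruential constants are illustrative and are not claimed to match `os::random`, and the 32-bit `int` state only loosely mirrors the review comment about using `uint`.

```java
import java.util.Arrays;

// Hypothetical model of a per-compilation stress seed; not HotSpot code.
final class StressSeedSketch {

    // 32-bit generator state owned by one "compilation".
    private int seed;

    StressSeedSketch(int seed) {
        this.seed = seed;
    }

    // Linear congruential step with illustrative constants
    // (an assumption, not the generator used by the VM).
    int nextRandom() {
        seed = seed * 1664525 + 1013904223;
        return seed >>> 1; // drop the sign bit so the result is non-negative
    }

    // Fisher-Yates shuffle driven only by the local seed.
    void shuffle(int[] worklist) {
        for (int i = worklist.length - 1; i > 0; i--) {
            int j = nextRandom() % (i + 1);
            int tmp = worklist[i];
            worklist[i] = worklist[j];
            worklist[j] = tmp;
        }
    }

    // Convenience helper: shuffles a fresh worklist 0..n-1 with the given seed.
    static int[] shuffled(int n, int seed) {
        int[] worklist = new int[n];
        for (int i = 0; i < n; i++) {
            worklist[i] = i;
        }
        new StressSeedSketch(seed).shuffle(worklist);
        return worklist;
    }

    public static void main(String[] args) {
        // Same seed, same order, regardless of what other instances do.
        System.out.println(Arrays.toString(shuffled(10, 42)));
        System.out.println(Arrays.toString(shuffled(10, 42)));
    }
}
```

Because each instance advances only its own state, two runs with the same seed shuffle identically even if other "compilations" draw random numbers in between, which is the property a shared global seed cannot give.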
-------------

PR: https://git.openjdk.java.net/jdk/pull/242

From gdub at openjdk.java.net Tue Sep 22 18:03:37 2020
From: gdub at openjdk.java.net (Gilles Duboscq)
Date: Tue, 22 Sep 2020 18:03:37 GMT
Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode [v4]
In-Reply-To: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com>
References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com>
Message-ID:

> [JDK-8232806](https://bugs.openjdk.java.net/browse/JDK-8232806) introduced the
> jdk.internal.lambda.disableEagerInitialization system property to be able to disable eager initialization of lambda
> classes. This was necessary to prevent side effects of class initializers triggered by such initialization in the
> context of the GraalVM native image tool. However, the change as it is implemented means that the behaviour of
> non-capturing lambdas depends on the value of `disableEagerInitialization`: when it is false (the default) such lambdas
> are actually a singleton while when it is true, a fresh instance is returned every time. Programs should definitely
> _not_ rely on reference equality since the Java spec does not guarantee it. However, in order to separate concerns and
> ease debugging of such bad programs, `disableEagerInitialization` shouldn't influence the singleton vs. fresh instance
> behaviour of lambdas in either direction.

Gilles Duboscq has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains six new commits since the last revision:

 - Move LambdaEagerInitTest to test/jdk/java/lang/invoke/lambda
 - Include capturing case test, use jdk.test.lib.Assert
 - Remove disableEagerInitialization concerns from BridgeMethod.java
 - Remove extra field test from LambdaTest6
 - Wrap long lines
 - Add dedicated test in the jdk tests

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/93/files
  - new: https://git.openjdk.java.net/jdk/pull/93/files/625feb94..5525f217

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=93&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=93&range=02-03

  Stats: 164 lines in 2 files changed: 91 ins; 73 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/93.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/93/head:pull/93

PR: https://git.openjdk.java.net/jdk/pull/93

From mchung at openjdk.java.net Tue Sep 22 18:20:22 2020
From: mchung at openjdk.java.net (Mandy Chung)
Date: Tue, 22 Sep 2020 18:20:22 GMT
Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode [v4]
In-Reply-To:
References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com>
Message-ID:

On Tue, 22 Sep 2020 18:03:37 GMT, Gilles Duboscq wrote:

>> [JDK-8232806](https://bugs.openjdk.java.net/browse/JDK-8232806) introduced the
>> jdk.internal.lambda.disableEagerInitialization system property to be able to disable eager initialization of lambda
>> classes. This was necessary to prevent side effects of class initializers triggered by such initialization in the
>> context of the GraalVM native image tool. However, the change as it is implemented means that the behaviour of
>> non-capturing lambdas depends on the value of `disableEagerInitialization`: when it is false (the default) such lambdas
>> are actually a singleton while when it is true, a fresh instance is returned every time. Programs should definitely
>> _not_ rely on reference equality since the Java spec does not guarantee it. However, in order to separate concerns and
>> ease debugging of such bad programs, `disableEagerInitialization` shouldn't influence the singleton vs. fresh instance
>> behaviour of lambdas in either direction.
>
> Gilles Duboscq has refreshed the contents of this pull request, and previous commits have been removed. The incremental
> views will show differences compared to the previous content of the PR.

Looks good. Thanks for the update.

-------------

Marked as reviewed by mchung (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/93

From gdub at openjdk.java.net Tue Sep 22 18:26:31 2020
From: gdub at openjdk.java.net (Gilles Duboscq)
Date: Tue, 22 Sep 2020 18:26:31 GMT
Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode [v4]
In-Reply-To:
References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com>
Message-ID:

On Tue, 22 Sep 2020 18:17:49 GMT, Mandy Chung wrote:

>> Gilles Duboscq has refreshed the contents of this pull request, and previous commits have been removed. The incremental
>> views will show differences compared to the previous content of the PR. The pull request contains six new commits since
>> the last revision:
>> - Move LambdaEagerInitTest to test/jdk/java/lang/invoke/lambda
>> - Include capturing case test, use jdk.test.lib.Assert
>> - Remove disableEagerInitialization concerns from BridgeMethod.java
>> - Remove extra field test from LambdaTest6
>> - Wrap long lines
>> - Add dedicated test in the jdk tests
>
> Looks good. Thanks for the update.

Thanks @mlchung. Do I need a second review?
-------------

PR: https://git.openjdk.java.net/jdk/pull/93

From mchung at openjdk.java.net Tue Sep 22 18:39:49 2020
From: mchung at openjdk.java.net (Mandy Chung)
Date: Tue, 22 Sep 2020 18:39:49 GMT
Subject: RFR: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode [v4]
In-Reply-To:
References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com>
Message-ID:

On Tue, 22 Sep 2020 18:24:02 GMT, Gilles Duboscq wrote:

> Thanks @mlchung. Do I need a second review?

No need. You can integrate once you run the regression tests (I usually run tier1-tier3).

-------------

PR: https://git.openjdk.java.net/jdk/pull/93

From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 22 19:27:20 2020
From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano)
Date: Tue, 22 Sep 2020 19:27:20 GMT
Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v3]
In-Reply-To:
References:
Message-ID:

> Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each
> main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the
> seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as
> production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options.
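Editorial illustration of the idea described above: shuffling a worklist from an explicit seed so that a stress run can be repeated. This is a rough sketch in Java, not the HotSpot C++ code; `SeededShuffleSketch` and `shuffled` are hypothetical names, and `java.util.Random` merely stands in for whatever generator the patch uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class SeededShuffleSketch {
    // Fisher-Yates shuffle driven by an explicit seed (assumption: any
    // deterministic generator works; java.util.Random is used for brevity).
    static <T> List<T> shuffled(List<T> worklist, long seed) {
        List<T> out = new ArrayList<>(worklist);
        Random rng = new Random(seed);
        // With a signed index, lists of size 0 or 1 skip the loop entirely.
        for (int i = out.size() - 1; i >= 1; i--) {
            int j = rng.nextInt(i + 1); // uniform in [0, i]
            T tmp = out.get(i);
            out.set(i, out.get(j));
            out.set(j, tmp);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> worklist = Arrays.asList("n1", "n2", "n3", "n4", "n5");
        long seed = 12345L; // hypothetical seed, as one might read it back from a log
        // The same seed reproduces the same order, which is what makes a failure replayable.
        System.out.println(shuffled(worklist, seed));
        System.out.println(shuffled(worklist, seed));
    }
}
```

Logging the seed, as the description says the patch does when `LogCompilation` is enabled, is what allows a failing order to be reproduced later.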
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  Define 'StressSeed' option as 'uint' rather than 'uintx'

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/242/files
  - new: https://git.openjdk.java.net/jdk/pull/242/files/73abfb27..e1131852

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=01-02

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/242.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/242/head:pull/242

PR: https://git.openjdk.java.net/jdk/pull/242

From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 22 19:42:20 2020
From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano)
Date: Tue, 22 Sep 2020 19:42:20 GMT
Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 16:56:46 GMT, Vladimir Kozlov wrote:

>> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The
>> incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five
>> additional commits since the last revision:
>> - Apply minor rearrangements to simplify the patch
>> - Do not use the per-compilation seed for StressLCM and StressGCM
>>
>> Use global seed for StressLCM and StressGCM as before to preserve their behavior
>> by now. In the future, these options can also benefit from using the local
>> (per-compilation) seed by simply using Compile::random() instead of os::random()
>> and extending the Compile constructor accordingly.
>> - Merge branch 'master' of github.com:robcasloz/jdk into JDK-8252219
>> - Replace global seed with compilation-local seeds for repeatability
>>
>> Replace usage of global seed for stress testing with a seed per compilation, to
>> make stress runs repeatable in the face of concurrent method compilations. This
>> affects StressLCM, StressGCM, and StressIGVN. Reuse pseudonumber generation
>> logic in os::random_helper.
>> - 8252219: C2: Randomize IGVN worklist for stress testing
>>
>> Add 'StressIGVN' option to let C2 randomize IGVN worklist order. When enabled,
>> the worklist is shuffled before each main run of the IGVN loop. Also add
>> 'GenerateStressSeed' and 'StressSeed=N' options to randomly generate or specify
>> the seed. In either case, the seed is logged if 'LogCompilation' is enabled.
>>
>> The generation or specification of seeds also affects the randomization
>> triggered by 'StressLCM' and 'StressGCM'. The new options are declared as
>> production+diagnostic for consistency with these existing options.
>
> src/hotspot/share/opto/compile.cpp line 4456:
>
>> 4454:
>> 4455: int Compile::random() {
>> 4456:   _stress_seed = os::next_random(_stress_seed);
>
> I don't see os::next_random() in runtime/os.hpp

It is declared in this PR's version of src/hotspot/share/runtime/os.hpp, line 759:

https://github.com/openjdk/jdk/blob/e11318520bc548fae1a534dccc7e07218279f15b/src/hotspot/share/runtime/os.hpp#L759

-------------

PR: https://git.openjdk.java.net/jdk/pull/242

From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 22 19:47:32 2020
From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano)
Date: Tue, 22 Sep 2020 19:47:32 GMT
Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 13:02:26 GMT, Roberto Castañeda Lozano wrote:

>> Add 'StressIGVN' option to let C2 randomize IGVN worklist order.
When enabled,
>> the worklist is shuffled before each main run of the IGVN loop. Also add
>> 'GenerateStressSeed' and 'StressSeed=N' options to randomly generate or specify
>> the seed. In either case, the seed is logged if 'LogCompilation' is enabled.
>> The new options are declared as production+diagnostic for consistency with the
>> existing 'StressLCM' and 'StressGCM' options.
>
> This pull request is ready for review again.

Thanks for reviewing, Vladimir! I just addressed your two comments.

-------------

PR: https://git.openjdk.java.net/jdk/pull/242

From phh at openjdk.java.net Tue Sep 22 20:03:55 2020
From: phh at openjdk.java.net (Paul Hohensee)
Date: Tue, 22 Sep 2020 20:03:55 GMT
Subject: RFR: 8253392: remove PhaseCCP_DCE declaration
In-Reply-To: <_IkVnn1okRZs98acNtvrIvXw3tEKUAtAseQCHAm0H7E=.354a8b7e-2e6c-4c7d-a7f5-9fcc50d89d83@github.com>
References: <_IkVnn1okRZs98acNtvrIvXw3tEKUAtAseQCHAm0H7E=.354a8b7e-2e6c-4c7d-a7f5-9fcc50d89d83@github.com>
Message-ID:

On Mon, 21 Sep 2020 08:01:27 GMT, Xin Liu wrote:

> hello, reviewers,
> May I ask to review this trivial patch?
> it's a clean-up. The forward declaration of PhaseCCP_DCE is not useful.

Lgtm.

-------------

Marked as reviewed by phh (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/277

From enikitin at openjdk.java.net Tue Sep 22 20:19:29 2020
From: enikitin at openjdk.java.net (Evgeny Nikitin)
Date: Tue, 22 Sep 2020 20:19:29 GMT
Subject: RFR: 8208257: [mlvm] Add randomness keyword to vm/mlvm/meth/func/jdi/breakpointOtherStratum
Message-ID:

Pre-Skara thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039557.html).

I tried to reproduce the test failure multiple times with different VM parameters, but the test always passes. I suggest removing it from ProblemList.txt.
The second change marks the test with the randomness keyword from [JDK-8243427](https://bugs.openjdk.java.net/browse/JDK-8243427) (using reproducible random for mlvm tests).
Tested using mach5 on the 4 platforms, 50 runs each.

-------------

Commit messages:
 - 8208257: [mlvm] Add randomness keyword to vm/mlvm/meth/func/jdi/breakpointOtherStratum

Changes: https://git.openjdk.java.net/jdk/pull/309/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=309&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8208257
  Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/309.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/309/head:pull/309

PR: https://git.openjdk.java.net/jdk/pull/309

From xliu at openjdk.java.net Tue Sep 22 20:28:00 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Tue, 22 Sep 2020 20:28:00 GMT
Subject: Integrated: 8253392: remove PhaseCCP_DCE declaration
In-Reply-To: <_IkVnn1okRZs98acNtvrIvXw3tEKUAtAseQCHAm0H7E=.354a8b7e-2e6c-4c7d-a7f5-9fcc50d89d83@github.com>
References: <_IkVnn1okRZs98acNtvrIvXw3tEKUAtAseQCHAm0H7E=.354a8b7e-2e6c-4c7d-a7f5-9fcc50d89d83@github.com>
Message-ID: <26qK6aASyy5TlyVdf8lSboU4Z8bR5jLQblQMFybnezk=.4c675352-60e2-42ce-9bcd-3c37a507dff4@github.com>

On Mon, 21 Sep 2020 08:01:27 GMT, Xin Liu wrote:

> hello, reviewers,
> May I ask to review this trivial patch?
> it's a clean-up. The forward declaration of PhaseCCP_DCE is not useful.

This pull request has now been integrated.
Changeset: 426c9049
Author: Xin Liu
Committer: Paul Hohensee
URL: https://git.openjdk.java.net/jdk/commit/426c9049
Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod

8253392: remove PhaseCCP_DCE declaration

remove the deprecated declaration PhaseCCP_DCE

Reviewed-by: neliasso, phh

-------------

PR: https://git.openjdk.java.net/jdk/pull/277

From kvn at openjdk.java.net Tue Sep 22 20:37:47 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 22 Sep 2020 20:37:47 GMT
Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 19:27:20 GMT, Roberto Castañeda Lozano wrote:

>> Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each
>> main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the
>> seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as
>> production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options.
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
> Define 'StressSeed' option as 'uint' rather than 'uintx'

Nice!

Did you try to run mach5 testing with IGV stress enabled by using --jvm-args "" mach5 option?

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/242

From enikitin at openjdk.java.net Tue Sep 22 20:40:23 2020
From: enikitin at openjdk.java.net (Evgeny Nikitin)
Date: Tue, 22 Sep 2020 20:40:23 GMT
Subject: RFR: 8208257: [mlvm] Add randomness keyword to vm/mlvm/meth/func/jdi/breakpointOtherStratum
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 20:13:13 GMT, Evgeny Nikitin wrote:

> Pre-Skara thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039557.html).
> > I tried to reproduce the test multiple times with different VM parameters, but it always passes. I suggest removing it > from ProblemList.txt. > Second change is marking the test with randomness keyword from the > [JDK-8243427](https://bugs.openjdk.java.net/browse/JDK-8243427) (using reproducible random for mlvm tests). > Tested using mach5 on the 4 platforms, 50 runs each. A question by Igor Ignatyev: > looks good to me, you will need to update 8208257's title in JBS and close 8058176 as CNR. **The answer:** 8058176 is creating lots of i2/c2i adapters and eats the code cache very intensively. I added a code cache monitor to the test (0e99acf3efb28108d79780aa7825af9d24cd84b3), and it eats 8M of cache almost immediately: > 00:00.001 Cache Monitor started 00:00.009 Cache Monitor: 16% (1.3 MB) used 00:00.131 Cache Monitor: 20% (1.6 MB) used 00:00.232 Cache Monitor: 27% (2.2 MB) used 00:00.332 Cache Monitor: 36% (2.9 MB) used 00:00.434 Cache Monitor: 45% (3.6 MB) used 00:00.535 Cache Monitor: 60% (4.9 MB) used 00:00.637 Cache Monitor: 71% (5.7 MB) used 00:00.745 Cache Monitor: 79% (6.3 MB) used 00:00.847 Cache Monitor: 86% (7 MB) used 00:00.982 Cache Monitor: 92% (7.4 MB) used 00:01.084 Cache Monitor: 84% (6.8 MB) used 00:01.188 Cache Monitor: 89% (7.2 MB) used 00:01.292 Cache Monitor: 88% (7.1 MB) used 00:01.394 Cache Monitor: 89% (7.2 MB) used 00:01.496 Cache Monitor: 92% (7.4 MB) used 00:01.598 Cache Monitor: 96% (7.7 MB) used 00:01.700 Cache Monitor: 98% (7.9 MB) used >[1.865s][warning][codecache] CodeCache is full. Compiler has been disabled. >[1.865s][warning][codecache] Try increasing the code cache size using -XX:ReservedCodeCacheSize= breakpointTheOtherStratum, in its turn, was hanging, I guess, because of some logical problems (debuggee hanged, debugger crashed, etc.). Not a performance/cache exhaustion. 
Please check similar output for the breakpointTheOtherStratum (b826254e5040a6af1305a2cd9242cd1069ff504a): > binder> Launching debugee 00:00.240 Cache Monitor: 24% (1.9 MB) used 00:00.341 Cache Monitor: 26% (2.1 MB) used binder> Waiting for VM initialized Initial VMStartEvent received: VMStartEvent in thread main 00:00.442 Cache Monitor: 29% (2.4 MB) used 00:00.543 Cache Monitor: 32% (2.6 MB) used 00:00.644 Cache Monitor: 33% (2.7 MB) used 00:00.745 Cache Monitor: 34% (2.7 MB) used 00:00.845 Cache Monitor: 35% (2.8 MB) used 00:00.946 Cache Monitor: 35% (2.8 MB) used 00:01.047 Cache Monitor: 33% (2.7 MB) used 00:01.149 Cache Monitor: 34% (2.7 MB) used 00:01.250 Cache Monitor: 34% (2.7 MB) used 00:01.351 Cache Monitor: 36% (3 MB) used 00:01.452 Cache Monitor: 39% (3.2 MB) used 00:01.554 Cache Monitor: 41% (3.3 MB) used 00:01.655 Cache Monitor: 43% (3.5 MB) used 00:01.756 Cache Monitor: 45% (3.6 MB) used 00:01.858 Cache Monitor: 47% (3.8 MB) used 00:01.959 Cache Monitor: 49% (4 MB) used debugee.stdout> ### TRACE 1: DEBUGGEE PASSED The second one does not eat memory that aggressively, the difference between lowest and highest amounts is just 2MB. ------------- PR: https://git.openjdk.java.net/jdk/pull/309 From iignatyev at openjdk.java.net Tue Sep 22 20:55:02 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 22 Sep 2020 20:55:02 GMT Subject: RFR: 8208257: [mlvm] Add randomness keyword to vm/mlvm/meth/func/jdi/breakpointOtherStratum In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 20:37:53 GMT, Evgeny Nikitin wrote: >> Pre-Scara thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039557.html). >> >> I tried to reproduce the test multiple times with different VM parameters, but it always passes. I suggest removing it >> from ProblemList.txt. 
>> Second change is marking the test with randomness keyword from the >> [JDK-8243427](https://bugs.openjdk.java.net/browse/JDK-8243427) (using reproducible random for mlvm tests). >> Tested using mach5 on the 4 platforms, 50 runs each. > > A question by Igor Ignatyev: > >> looks good to me, you will need to update 8208257's title in JBS and close 8058176 as CNR. > > **The answer:** > > 8058176 is creating lots of i2/c2i adapters and eats the code cache very intensively. I added a code cache monitor to > the test (0e99acf3efb28108d79780aa7825af9d24cd84b3), and it eats 8M of cache almost immediately: >> 00:00.001 Cache Monitor started > 00:00.009 Cache Monitor: 16% (1.3 MB) used > 00:00.131 Cache Monitor: 20% (1.6 MB) used > 00:00.232 Cache Monitor: 27% (2.2 MB) used > 00:00.332 Cache Monitor: 36% (2.9 MB) used > 00:00.434 Cache Monitor: 45% (3.6 MB) used > 00:00.535 Cache Monitor: 60% (4.9 MB) used > 00:00.637 Cache Monitor: 71% (5.7 MB) used > 00:00.745 Cache Monitor: 79% (6.3 MB) used > 00:00.847 Cache Monitor: 86% (7 MB) used > 00:00.982 Cache Monitor: 92% (7.4 MB) used > 00:01.084 Cache Monitor: 84% (6.8 MB) used > 00:01.188 Cache Monitor: 89% (7.2 MB) used > 00:01.292 Cache Monitor: 88% (7.1 MB) used > 00:01.394 Cache Monitor: 89% (7.2 MB) used > 00:01.496 Cache Monitor: 92% (7.4 MB) used > 00:01.598 Cache Monitor: 96% (7.7 MB) used > 00:01.700 Cache Monitor: 98% (7.9 MB) used >>[1.865s][warning][codecache] CodeCache is full. Compiler has been disabled. >>[1.865s][warning][codecache] Try increasing the code cache size using -XX:ReservedCodeCacheSize= > > breakpointTheOtherStratum, in its turn, was hanging, I guess, because of some logical problems (debuggee hanged, > debugger crashed, etc.). Not a performance/cache exhaustion. 
Please check similar output for the > breakpointTheOtherStratum (b826254e5040a6af1305a2cd9242cd1069ff504a): >> binder> Launching debugee > 00:00.240 Cache Monitor: 24% (1.9 MB) used > 00:00.341 Cache Monitor: 26% (2.1 MB) used > binder> Waiting for VM initialized > Initial VMStartEvent received: VMStartEvent in thread main > 00:00.442 Cache Monitor: 29% (2.4 MB) used > 00:00.543 Cache Monitor: 32% (2.6 MB) used > 00:00.644 Cache Monitor: 33% (2.7 MB) used > 00:00.745 Cache Monitor: 34% (2.7 MB) used > 00:00.845 Cache Monitor: 35% (2.8 MB) used > 00:00.946 Cache Monitor: 35% (2.8 MB) used > 00:01.047 Cache Monitor: 33% (2.7 MB) used > 00:01.149 Cache Monitor: 34% (2.7 MB) used > 00:01.250 Cache Monitor: 34% (2.7 MB) used > 00:01.351 Cache Monitor: 36% (3 MB) used > 00:01.452 Cache Monitor: 39% (3.2 MB) used > 00:01.554 Cache Monitor: 41% (3.3 MB) used > 00:01.655 Cache Monitor: 43% (3.5 MB) used > 00:01.756 Cache Monitor: 45% (3.6 MB) used > 00:01.858 Cache Monitor: 47% (3.8 MB) used > 00:01.959 Cache Monitor: 49% (4 MB) used > debugee.stdout> ### TRACE 1: DEBUGGEE PASSED > > The second one does not eat memory that aggressively, the difference between lowest and highest amounts is just 2MB. > A question by Igor Ignatyev: > > > looks good to me, you will need to update 8208257's title in JBS and close 8058176 as CNR. > > **The answer:** > > 8058176 is creating lots of i2/c2i adapters and eats the code cache very intensively. I added a code cache monitor to > the test ([0e99acf](https://github.com/openjdk/jdk/commit/0e99acf3efb28108d79780aa7825af9d24cd84b3)), and it eats 8M of > cache almost immediately: <...> > breakpointTheOtherStratum, in its turn, was hanging, I guess, because of some logical problems (debuggee hanged, > debugger crashed, etc.). Not a performance/cache exhaustion. 
Please check similar output for the > breakpointTheOtherStratum ([b826254](https://github.com/openjdk/jdk/commit/b826254e5040a6af1305a2cd9242cd1069ff504a)): > <...> the reason I suggested to close [8058176](https://bugs.openjdk.java.net/browse/JDK-8058176) as CNR is b/c this test is problem-listed due to [8208257](https://bugs.openjdk.java.net/browse/JDK-8208257) and [8058176](https://bugs.openjdk.java.net/browse/JDK-8058176). so the absence of the failure in your rerun would mean that both 8058176 and 8208257 can not be reproduced, hence both should be closed. I also would like to change my statement about changing 8208257's title, I think it would be better to restore the original title, open a new issue to unproblem-list the test, close both 8208257 and 8058176 as CNR. adding `randomness` k/w can be done as a part of this new issue or by a separate RFE. ------------- PR: https://git.openjdk.java.net/jdk/pull/309 From iignatyev at openjdk.java.net Tue Sep 22 20:55:03 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 22 Sep 2020 20:55:03 GMT Subject: RFR: 8208257: [mlvm] Add randomness keyword to vm/mlvm/meth/func/jdi/breakpointOtherStratum In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 20:50:27 GMT, Igor Ignatyev wrote: >> A question by Igor Ignatyev: >> >>> looks good to me, you will need to update 8208257's title in JBS and close 8058176 as CNR. >> >> **The answer:** >> >> 8058176 is creating lots of i2/c2i adapters and eats the code cache very intensively. 
I added a code cache monitor to >> the test (0e99acf3efb28108d79780aa7825af9d24cd84b3), and it eats 8M of cache almost immediately: >>> 00:00.001 Cache Monitor started >> 00:00.009 Cache Monitor: 16% (1.3 MB) used >> 00:00.131 Cache Monitor: 20% (1.6 MB) used >> 00:00.232 Cache Monitor: 27% (2.2 MB) used >> 00:00.332 Cache Monitor: 36% (2.9 MB) used >> 00:00.434 Cache Monitor: 45% (3.6 MB) used >> 00:00.535 Cache Monitor: 60% (4.9 MB) used >> 00:00.637 Cache Monitor: 71% (5.7 MB) used >> 00:00.745 Cache Monitor: 79% (6.3 MB) used >> 00:00.847 Cache Monitor: 86% (7 MB) used >> 00:00.982 Cache Monitor: 92% (7.4 MB) used >> 00:01.084 Cache Monitor: 84% (6.8 MB) used >> 00:01.188 Cache Monitor: 89% (7.2 MB) used >> 00:01.292 Cache Monitor: 88% (7.1 MB) used >> 00:01.394 Cache Monitor: 89% (7.2 MB) used >> 00:01.496 Cache Monitor: 92% (7.4 MB) used >> 00:01.598 Cache Monitor: 96% (7.7 MB) used >> 00:01.700 Cache Monitor: 98% (7.9 MB) used >>>[1.865s][warning][codecache] CodeCache is full. Compiler has been disabled. >>>[1.865s][warning][codecache] Try increasing the code cache size using -XX:ReservedCodeCacheSize= >> >> breakpointTheOtherStratum, in its turn, was hanging, I guess, because of some logical problems (debuggee hanged, >> debugger crashed, etc.). Not a performance/cache exhaustion. 
Please check similar output for the >> breakpointTheOtherStratum (b826254e5040a6af1305a2cd9242cd1069ff504a): >>> binder> Launching debugee >> 00:00.240 Cache Monitor: 24% (1.9 MB) used >> 00:00.341 Cache Monitor: 26% (2.1 MB) used >> binder> Waiting for VM initialized >> Initial VMStartEvent received: VMStartEvent in thread main >> 00:00.442 Cache Monitor: 29% (2.4 MB) used >> 00:00.543 Cache Monitor: 32% (2.6 MB) used >> 00:00.644 Cache Monitor: 33% (2.7 MB) used >> 00:00.745 Cache Monitor: 34% (2.7 MB) used >> 00:00.845 Cache Monitor: 35% (2.8 MB) used >> 00:00.946 Cache Monitor: 35% (2.8 MB) used >> 00:01.047 Cache Monitor: 33% (2.7 MB) used >> 00:01.149 Cache Monitor: 34% (2.7 MB) used >> 00:01.250 Cache Monitor: 34% (2.7 MB) used >> 00:01.351 Cache Monitor: 36% (3 MB) used >> 00:01.452 Cache Monitor: 39% (3.2 MB) used >> 00:01.554 Cache Monitor: 41% (3.3 MB) used >> 00:01.655 Cache Monitor: 43% (3.5 MB) used >> 00:01.756 Cache Monitor: 45% (3.6 MB) used >> 00:01.858 Cache Monitor: 47% (3.8 MB) used >> 00:01.959 Cache Monitor: 49% (4 MB) used >> debugee.stdout> ### TRACE 1: DEBUGGEE PASSED >> >> The second one does not eat memory that aggressively, the difference between lowest and highest amounts is just 2MB. > >> A question by Igor Ignatyev: >> >> > looks good to me, you will need to update 8208257's title in JBS and close 8058176 as CNR. >> >> **The answer:** >> >> 8058176 is creating lots of i2/c2i adapters and eats the code cache very intensively. I added a code cache monitor to >> the test ([0e99acf](https://github.com/openjdk/jdk/commit/0e99acf3efb28108d79780aa7825af9d24cd84b3)), and it eats 8M of >> cache almost immediately: <...> >> breakpointTheOtherStratum, in its turn, was hanging, I guess, because of some logical problems (debuggee hanged, >> debugger crashed, etc.). Not a performance/cache exhaustion. 
Please check similar output for the >> breakpointTheOtherStratum ([b826254](https://github.com/openjdk/jdk/commit/b826254e5040a6af1305a2cd9242cd1069ff504a)): >> <...> > > the reason I suggested to close [8058176](https://bugs.openjdk.java.net/browse/JDK-8058176) as CNR is b/c this test is > problem-listed due to [8208257](https://bugs.openjdk.java.net/browse/JDK-8208257) and > [8058176](https://bugs.openjdk.java.net/browse/JDK-8058176). so the absence of the failure in your rerun would mean > that both 8058176 and 8208257 can not be reproduced, hence both should be closed. I also would like to change my > statement about changing 8208257's title, I think it would be better to restore the original title, open a new issue to > unproblem-list the test, close both 8208257 and 8058176 as CNR. adding `randomness` k/w can be done as a part of this > new issue or by a separate RFE. or are you saying that `breakpointOtherStratum` test is not (and most probably has never been) affected by [8058176](https://bugs.openjdk.java.net/browse/JDK-8058176)? ------------- PR: https://git.openjdk.java.net/jdk/pull/309 From enikitin at openjdk.java.net Tue Sep 22 21:09:56 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Tue, 22 Sep 2020 21:09:56 GMT Subject: RFR: 8208257: [mlvm] Add randomness keyword to vm/mlvm/meth/func/jdi/breakpointOtherStratum In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 20:52:04 GMT, Igor Ignatyev wrote: > or are you saying that `breakpointOtherStratum` test is not (and most probably has never been) affected by > [8058176](https://bugs.openjdk.java.net/browse/JDK-8058176)? Yep. Well, I am still not very confident in how the code cache works, but in case of i2c_c2i I can clearly see how the adapters are created and the memory gets claimed. The breakpointOtherStratum does something different, and uses the code cache modestly. 
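Editorial illustration of the kind of code cache monitor discussed in this thread. This is only a sketch of how such a monitor could be written with the standard `java.lang.management` API; it is not the monitor from the commits referenced above. The class name, the pool-name matching ("CodeHeap", "CodeCache", "Code Cache") and the output format are assumptions modelled on the log lines quoted earlier.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Locale;

public class CodeCacheMonitorSketch {
    // Assumption: HotSpot names its code cache pools "CodeHeap '...'" (segmented)
    // or "CodeCache" / "Code Cache" (non-segmented). Other JVMs may differ.
    static boolean isCodeCachePool(String name) {
        return name.startsWith("CodeHeap") || name.startsWith("CodeCache") || name.startsWith("Code Cache");
    }

    // Returns {usedBytes, maxBytes} summed over all matching pools; max is 0 if unknown.
    static long[] codeCacheUsage() {
        long used = 0;
        long max = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!isCodeCachePool(pool.getName())) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            used += usage.getUsed();
            if (usage.getMax() > 0) {
                max += usage.getMax();
            }
        }
        return new long[] { used, max };
    }

    // Formats one sample in a style similar to the log lines quoted in this thread.
    static String format(long used, long max) {
        double usedMb = used / (1024.0 * 1024.0);
        if (max <= 0) {
            return String.format(Locale.ROOT, "Cache Monitor: %.1f MB used", usedMb);
        }
        return String.format(Locale.ROOT, "Cache Monitor: %d%% (%.1f MB) used", 100 * used / max, usedMb);
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample roughly every 100 ms, as the quoted logs appear to do.
        for (int i = 0; i < 5; i++) {
            long[] usage = codeCacheUsage();
            System.out.println(format(usage[0], usage[1]));
            Thread.sleep(100);
        }
    }
}
```

Running the sampling loop on a daemon thread next to a test would produce a usage trace comparable to the two quoted above, which is enough to tell steady growth towards the limit apart from modest use.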
-------------

PR: https://git.openjdk.java.net/jdk/pull/309

From iignatyev at openjdk.java.net Tue Sep 22 21:19:52 2020
From: iignatyev at openjdk.java.net (Igor Ignatyev)
Date: Tue, 22 Sep 2020 21:19:52 GMT
Subject: RFR: 8208257: [mlvm] Add randomness keyword to vm/mlvm/meth/func/jdi/breakpointOtherStratum
In-Reply-To:
References:
Message-ID: <7EcKs05vx-5cjor80XI0ZUmy0lZmQVYHXSr9IamX_ks=.9d4bb849-20f5-405d-bb30-03dfe1364434@github.com>

On Tue, 22 Sep 2020 21:06:48 GMT, Evgeny Nikitin wrote:

> > or are you saying that `breakpointOtherStratum` test is not (and most probably has never been) affected by
> > [8058176](https://bugs.openjdk.java.net/browse/JDK-8058176)?
>
> Yep. Well, I am still not very confident in how the code cache works, but in case of i2c_c2i I can clearly see how the
> adapters are created and the memory gets claimed. The breakpointOtherStratum does something different, and uses the
> code cache modestly.

I see. Then, although I still think it would be somewhat cleaner to restore 8208257's title and use separate issue(s) to unproblem-list the test and add the k/w, I won't insist on that. The patch looks good to me.

-------------

PR: https://git.openjdk.java.net/jdk/pull/309

From iignatyev at openjdk.java.net Tue Sep 22 21:19:51 2020
From: iignatyev at openjdk.java.net (Igor Ignatyev)
Date: Tue, 22 Sep 2020 21:19:51 GMT
Subject: RFR: 8208257: [mlvm] Add randomness keyword to vm/mlvm/meth/func/jdi/breakpointOtherStratum
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 20:13:13 GMT, Evgeny Nikitin wrote:

> Pre-Skara thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039557.html).
>
> I tried to reproduce the test multiple times with different VM parameters, but it always passes. I suggest removing it
> from ProblemList.txt.
> Second change is marking the test with randomness keyword from the
> [JDK-8243427](https://bugs.openjdk.java.net/browse/JDK-8243427) (using reproducible random for mlvm tests).
> Tested using mach5 on the 4 platforms, 50 runs each.

Marked as reviewed by iignatyev (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/309

From github.com+4676506+javeleon at openjdk.java.net Wed Sep 23 06:35:12 2020
From: github.com+4676506+javeleon at openjdk.java.net (Allan Gregersen)
Date: Wed, 23 Sep 2020 06:35:12 GMT
Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread
In-Reply-To:
References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com>
Message-ID:

On Tue, 22 Sep 2020 10:51:16 GMT, Erik Österlund wrote:

> > > I would like to hear answer to @dholmes-ora question in JBS:
> > > "Do we really need yet another stack dumping interface in the VM? Why isn't a debugger using JVM TI?"
> > >
> > > One reason for having both the new getStackFrames API (set of threads) as well as the existing iterateFrames (current
> > thread only) API in JVMCI is that Truffle would want a deopt-free read-only view of the values in a frame, which to the
> > best of our knowledge is not possible through JVMTI. Only in rare cases, materialization of frames is required, so it
> > boils down to the performance hit caused by deopting frames, which is even more of a concern with a set of threads than
> > for the single current thread case. Another potential issue with a JVMTI-based approach is that there might be other
> > drawbacks to having an always-on (or even late attached) JVMTI agent in a GraalVM?
>
> 1. You are describing that the main reason is performance. But you also say this is to be used by a debugger? So, not
> sure performance as a primary motive really makes sense then. Not sure why performance of debugging Truffle must be so
> much faster than debugging Java code (which I have not heard anyone complain about).
And if this really was an actual
> performance problem, it seems like we would want a generic fix then, not a special Truffle stack walker for debugging
> Truffle code alone, to be maintained separately. 2. We are talking about JVMTI, not JVMCI. iterateFrames is defined in
> JVMCI, and that is something completely different, which I don't think any of us had in mind. It seems indeed to be
> limited to the current frame. I'm talking about e.g. JVMTI GetStackTrace and the JVMTI GetLocal* functions. It gives
> you a stack trace for any thread (not just the current one), and allows you to retrieve locals. 3. When you just read
> locals, (as you describe is your use case), there is no need to deoptimize anything. So yeah, that's just not something
> we do, unless you change the locals, which you said you are not. Please let me know if there is anything I missed. But
> so far it seems to me that the mentioned JVMTI functionality is all you really need for a debugger. What did I miss? I
> would like to better understand the problem domain before taking this further.

Thanks for your quick follow-up.

1. It's not entirely made for the debugger use-case. For example, in some guest languages we need this for implementing Thread#getStackTrace or similar. In Espresso (Java as a Truffle guest language) we would also need this for implementing part of the management API.
2. I know that you suggested JVMTI and not JVMCI. Since I wasn't around when the decision to implement and include iterateFrames into JVMCI was made, I'm unaware of the exact reasoning behind that decision. I was assuming that whatever reason not to go with JVMTI back then would still hold true today. So say we wanted to adopt the JVMTI approach now. Would the design be an always-on, in-process JVMTI agent? Would there be security implications from such an approach, leaving any VM running anything Truffle more vulnerable?
3. It is good that nothing needs to be deoptimized when reading locals through JVMTI.
Thanks for clarifying that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/110

From chagedorn at openjdk.java.net Wed Sep 23 07:34:05 2020
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Wed, 23 Sep 2020 07:34:05 GMT
Subject: RFR: 8252696: Loop unswitching may cause out of bound array load to be executed
In-Reply-To:
References:
Message-ID:

On Tue, 15 Sep 2020 11:45:51 GMT, Roland Westrelin wrote:

> Review thread so far: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039853.html

That looks good to me!

As we have discussed offline, we should probably do the additional clean-ups of this code in a separate RFE.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/176

From roland at openjdk.java.net Wed Sep 23 07:34:06 2020
From: roland at openjdk.java.net (Roland Westrelin)
Date: Wed, 23 Sep 2020 07:34:06 GMT
Subject: RFR: 8252696: Loop unswitching may cause out of bound array load to be executed
In-Reply-To:
References:
Message-ID:

On Wed, 23 Sep 2020 07:30:24 GMT, Christian Hagedorn wrote:

>> Review thread so far: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039853.html
>
> That looks good to me!
>
> As we have discussed offline, we should probably do the additional clean-ups of this code in a separate RFE.

@chhagedorn @neliasso Thanks for the reviews

-------------

PR: https://git.openjdk.java.net/jdk/pull/176

From github.com+8792647+robcasloz at openjdk.java.net Wed Sep 23 07:36:46 2020
From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano)
Date: Wed, 23 Sep 2020 07:36:46 GMT
Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 20:35:05 GMT, Vladimir Kozlov wrote:

> Nice!
>
> Did you try to run mach5 testing with IGV stress enabled by using --jvm-args "" mach5 option?

Thanks Vladimir!
I tried something similar (but hackier) on tier1 and all tests passed, will try a more systematic run with multiple seeds, etc. ------------- PR: https://git.openjdk.java.net/jdk/pull/242 From roland at openjdk.java.net Wed Sep 23 07:37:56 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 23 Sep 2020 07:37:56 GMT Subject: Integrated: 8252696: Loop unswitching may cause out of bound array load to be executed In-Reply-To: References: Message-ID: On Tue, 15 Sep 2020 11:45:51 GMT, Roland Westrelin wrote: > Review thread so far: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039853.html This pull request has now been integrated. Changeset: 3fe5886b Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/3fe5886b Stats: 24 lines in 3 files changed: 6 ins; 1 del; 17 mod 8252696: Loop unswitching may cause out of bound array load to be executed Reviewed-by: neliasso, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/176 From chagedorn at openjdk.java.net Wed Sep 23 10:45:03 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 23 Sep 2020 10:45:03 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v3] In-Reply-To: References: Message-ID: <0SnXiN756cLrgglHnxd7OuYOWtgljtB8yCEEsVI6YzA=.7da4aeca-af80-4ee7-a642-3606a0f4557a@github.com> On Tue, 22 Sep 2020 19:27:20 GMT, Roberto Casta?eda Lozano wrote: >> Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each >> main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the >> seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as >> production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options. 
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
> Define 'StressSeed' option as 'uint' rather than 'uintx'

Maybe you could add an additional HelloWorld test which only runs with your new flags to sanity check them without any other flags.

src/hotspot/share/opto/phaseX.cpp line 1153:

> 1151: DEBUG_ONLY(uint num_processed = 0;)
> 1152: NOT_PRODUCT(init_verifyPhaseIterGVN();)
> 1153: if (StressIGVN) C->shuffle(&_worklist);

You should add curly braces. You could also move `shuffle` to `PhaseIterGVN`.

src/hotspot/share/opto/compile.cpp line 4462:

> 4460: void Compile::shuffle(Unique_Node_List* l) {
> 4461: if (l->size() < 2) return;
> 4462: for (uint i = l->size() - 1; i >= 1; i--) {

You can remove the if-check as the loop check already covers it (loop is only executed if size >= 2).

-------------

Changes requested by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/242

From neliasso at openjdk.java.net Wed Sep 23 11:12:57 2020
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Wed, 23 Sep 2020 11:12:57 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions
In-Reply-To:
References:
Message-ID: <_0-zfIDPieC0Xnc17GaSSsS7Sz9EEUrfjRyqWDtphfU=.298bacde-f330-486a-8bea-03ff1523d00c@github.com>

On Tue, 22 Sep 2020 15:33:11 GMT, Jatin Bhateja wrote:

>> Summary:
>>
>> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes.
>> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
>> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
>> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
>> Performance Results:
>> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>> ArrayCopyPartialInlineSize : 32
>>
>> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
>> -- | -- | -- | -- | --
>> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
>> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
>> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
>> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
>> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
>> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
>> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
>> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
>> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
>> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
>> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
>> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
>> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
>> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
>> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
>> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
>> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
>> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
>> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
>> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
>> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
>> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
>> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
>> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
>> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
>> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
>> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
>> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
>> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
>> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
>> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
>> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
>> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
>> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
>> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
>> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
>> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
>> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
>> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
>> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
>> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
>> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
>> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
>> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
>> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
>> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
>> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
>> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
>> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
>> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
>> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
>> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
>> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
>> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
>> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
>> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
>> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
>> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
>> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
>> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
>> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
>> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
>> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
>> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
>> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
>> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
>> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
>> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
>> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
>> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
>> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
>> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
>> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
>> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
>> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
>> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
>> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
>> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>>
>> Detailed Reports:
>> Baseline :
>> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
>> WithOpt :
>> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)
>
> This pull request is an incremental work over [pull request-61](https://github.com/openjdk/jdk/pull/61).
> It shares leaf level assembler routines with it, to build the standalone patch following common code patch needs to be
> applied on top of this PR. http://cr.openjdk.java.net/~jbhateja/8252848/common_code_with_8252847.diff

Can you explain why 32 bytes are such a distinct performance cliff? Is there any performance difference between doing a single 64 bytes masked copy or two 32 bytes?

-------------

PR: https://git.openjdk.java.net/jdk/pull/302

From jzhu at openjdk.java.net Wed Sep 23 11:18:22 2020
From: jzhu at openjdk.java.net (Joshua Zhu)
Date: Wed, 23 Sep 2020 11:18:22 GMT
Subject: RFR: 8253048: AArch64: When CallLeaf, no need to preserve callee-saved registers in caller
In-Reply-To: <9K43aYMMk0zdBfxGBwRA336lvjqGZyLFtZcQKVWLOec=.4a194ce2-db80-4dcc-9ef2-ed2a0d2cd088@github.com>
References: <9K43aYMMk0zdBfxGBwRA336lvjqGZyLFtZcQKVWLOec=.4a194ce2-db80-4dcc-9ef2-ed2a0d2cd088@github.com>
Message-ID:

On Fri, 11 Sep 2020 12:23:30 GMT, Joshua Zhu wrote:

> I noticed all Floating-Point and SIMD Registers are defined as SOC
> registers in c calling convention on AArch64.
> As AArch64 ABI tells, the bottom 64 bits of v8-v15 are callee-saved.
>
> When using CallRuntime, with the help of existing flag "exclude_soe" and
> function add_call_kills(), SOE registers are killed by the call
> because values that could show up in the RegisterMap aren't live in
> callee saved registers.
> But CallLeaf and CallLeafNoFP are ok because they don't have safepoint > and debug info. > > Therefore I submit this patch that aligns save-policy in c calling > convention with AArch64 ABI. It could help eliminate unnecessary SOE > registers spilling in caller across CallLeafNode. > > I wrote a simple test case: > http://cr.openjdk.java.net/~jzhu/8253048/Test.java > Original OptoAssembly is: > http://cr.openjdk.java.net/~jzhu/8253048/old_OptoAssembly > With the patch, unnecessary spillings are eliminated: > http://cr.openjdk.java.net/~jzhu/8253048/new_OptoAssembly > > And when a vector is alive across CallLeaf, with the help of existing > FatProjectionNode and RA, the whole vector register ( length > 64-bit ) > is still spilled to stack as usual. > > A test case using VectorAPI is written to verify: > http://cr.openjdk.java.net/~jzhu/8253048/TestVector.java > Test patch: > http://cr.openjdk.java.net/~jzhu/8253048/patch > OptoAssembly dump: > http://cr.openjdk.java.net/~jzhu/8253048/TestVector_OptoAssembly > > I also searched all occurrences of "V8-V15" in aarch64 codes. > The stubs for sin/cos don't save v10 before usage. > Therefore I replace it with caller-save register v24. > > Jtreg Testing: hotspot_all_no_apps, jdk_core and langtools:tier1 > > Could you please help review this change? > > Best Regards, > Joshua > > --------- Gentle ping. May I ask reviewers to approve this change? 
------------- PR: https://git.openjdk.java.net/jdk/pull/129 From adinn at openjdk.java.net Wed Sep 23 11:23:34 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Wed, 23 Sep 2020 11:23:34 GMT Subject: RFR: 8253048: AArch64: When CallLeaf, no need to preserve callee-saved registers in caller In-Reply-To: <9K43aYMMk0zdBfxGBwRA336lvjqGZyLFtZcQKVWLOec=.4a194ce2-db80-4dcc-9ef2-ed2a0d2cd088@github.com> References: <9K43aYMMk0zdBfxGBwRA336lvjqGZyLFtZcQKVWLOec=.4a194ce2-db80-4dcc-9ef2-ed2a0d2cd088@github.com> Message-ID: On Fri, 11 Sep 2020 12:23:30 GMT, Joshua Zhu wrote: > I noticed all Floating-Point and SIMD Registers are defined as SOC > registers in c calling convention on AArch64. > As AArch64 ABI tells, the bottom 64 bits of v8-v15 are callee-saved. > > When using CallRuntime, with the help of existing flag "exclude_soe" and > function add_call_kills(), SOE registers are killed by the call > because values that could show up in the RegisterMap aren't live in > callee saved registers. > But CallLeaf and CallLeafNoFP are ok because they don't have safepoint > and debug info. > > Therefore I submit this patch that aligns save-policy in c calling > convention with AArch64 ABI. It could help eliminate unnecessary SOE > registers spilling in caller across CallLeafNode. > > I wrote a simple test case: > http://cr.openjdk.java.net/~jzhu/8253048/Test.java > Original OptoAssembly is: > http://cr.openjdk.java.net/~jzhu/8253048/old_OptoAssembly > With the patch, unnecessary spillings are eliminated: > http://cr.openjdk.java.net/~jzhu/8253048/new_OptoAssembly > > And when a vector is alive across CallLeaf, with the help of existing > FatProjectionNode and RA, the whole vector register ( length > 64-bit ) > is still spilled to stack as usual. 
>
> A test case using VectorAPI is written to verify:
> http://cr.openjdk.java.net/~jzhu/8253048/TestVector.java
> Test patch:
> http://cr.openjdk.java.net/~jzhu/8253048/patch
> OptoAssembly dump:
> http://cr.openjdk.java.net/~jzhu/8253048/TestVector_OptoAssembly
>
> I also searched all occurrences of "V8-V15" in aarch64 codes.
> The stubs for sin/cos don't save v10 before usage.
> Therefore I replace it with caller-save register v24.
>
> Jtreg Testing: hotspot_all_no_apps, jdk_core and langtools:tier1
>
> Could you please help review this change?
>
> Best Regards,
> Joshua
>
> ---------

Marked as reviewed by adinn (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/129

From thartmann at openjdk.java.net Wed Sep 23 11:30:30 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 23 Sep 2020 11:30:30 GMT
Subject: RFR: 8252583: Clean up unused phi-to-copy degradation mechanism [v4]
In-Reply-To:
References:
Message-ID: <7YEM0R_L8fMtxRESD7O79pvhoaXHIRaXeCShUoP88tc=.6ae93e8b-020a-4b95-a03a-90358ae2099a@github.com>

On Tue, 22 Sep 2020 13:31:05 GMT, Roberto Castañeda Lozano wrote:

>> Remove unused notion of "PhiNode-to-copy degradation", where PhiNodes can be degraded to copies by setting their
>> RegionNode to NULL. Remove corresponding `PhiNode::is_copy()` test, which always returned NULL (false). Assert that
>> PhiNodes have an associated RegionNode in `PhiNode::Ideal()`.
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
> Clean up unused PhiNode-to-copy degradation
>
> Remove unused notion of 'PhiNode-to-copy degradation', where PhiNodes can be
> degraded to copies by setting their RegionNode to NULL. Remove corresponding
> PhiNode::is_copy() test, which always returned NULL (false). Assert that
> PhiNodes have an associated RegionNode in PhiNode::Ideal().

Looks good to me.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/275

From roland at openjdk.java.net Wed Sep 23 11:02:04 2020
From: roland at openjdk.java.net (Roland Westrelin)
Date: Wed, 23 Sep 2020 11:02:04 GMT
Subject: RFR: 8253524: C2: Refactor code that clones predicates during loop unswitching
Message-ID:

During unswitching, PhaseIdealLoop::create_slow_version_of_loop() calls PhaseIdealLoop::clone_predicates_to_unswitched_loop() twice, once for each loop, to clone some predicates above each loop. That code is fragile as it (implicitly) requires the fast loop to be processed first. I propose calling PhaseIdealLoop::clone_predicates_to_unswitched_loop() a single time and have it handle both loops in a single pass.

-------------

Commit messages:
 - remove trailing whitespaces
 - cleanup suggested by Christian
 - refactoring cloning of predicates when unswitching

Changes: https://git.openjdk.java.net/jdk/pull/317/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=317&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8253524
Stats: 113 lines in 3 files changed: 20 ins; 49 del; 44 mod
Patch: https://git.openjdk.java.net/jdk/pull/317.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/317/head:pull/317

PR: https://git.openjdk.java.net/jdk/pull/317

From thartmann at openjdk.java.net Wed Sep 23 11:44:10 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 23 Sep 2020 11:44:10 GMT
Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 22 Sep 2020 19:27:20 GMT, Roberto Castañeda Lozano wrote:

>> Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each
>> main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the
>> seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as
>> production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options.
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
> Define 'StressSeed' option as 'uint' rather than 'uintx'

We should think about adding the Stress* flags to some tier in the CI.

src/hotspot/share/opto/c2_globals.hpp line 55:

> 53: "Randomize worklist traversal in IGVN") \
> 54: \
> 55: product(bool, GenerateStressSeed, false, DIAGNOSTIC, \

Is this flag really required? We could simply generate the seed if StressSeed has not been specified on the command line (see `FLAG_IS_DEFAULT` macro).

src/hotspot/share/opto/node.cpp line 2333:

> 2331:
> 2332: //-----------------------------------------------------------------------------
> 2333: void Node_Array::swap(uint i, uint j) {

You can use the swap method from globalDefinitions.hpp.

-------------

Changes requested by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/242

From jzhu at openjdk.java.net Wed Sep 23 11:53:51 2020
From: jzhu at openjdk.java.net (Joshua Zhu)
Date: Wed, 23 Sep 2020 11:53:51 GMT
Subject: RFR: 8253048: AArch64: When CallLeaf, no need to preserve callee-saved registers in caller
In-Reply-To:
References: <9K43aYMMk0zdBfxGBwRA336lvjqGZyLFtZcQKVWLOec=.4a194ce2-db80-4dcc-9ef2-ed2a0d2cd088@github.com>
Message-ID:

On Wed, 23 Sep 2020 11:21:09 GMT, Andrew Dinn wrote:

>> I noticed all Floating-Point and SIMD Registers are defined as SOC
>> registers in c calling convention on AArch64.
>> As AArch64 ABI tells, the bottom 64 bits of v8-v15 are callee-saved.
>>
>> When using CallRuntime, with the help of existing flag "exclude_soe" and
>> function add_call_kills(), SOE registers are killed by the call
>> because values that could show up in the RegisterMap aren't live in
>> callee saved registers.
>> But CallLeaf and CallLeafNoFP are ok because they don't have safepoint
>> and debug info.
>> >> Therefore I submit this patch that aligns save-policy in c calling >> convention with AArch64 ABI. It could help eliminate unnecessary SOE >> registers spilling in caller across CallLeafNode. >> >> I wrote a simple test case: >> http://cr.openjdk.java.net/~jzhu/8253048/Test.java >> Original OptoAssembly is: >> http://cr.openjdk.java.net/~jzhu/8253048/old_OptoAssembly >> With the patch, unnecessary spillings are eliminated: >> http://cr.openjdk.java.net/~jzhu/8253048/new_OptoAssembly >> >> And when a vector is alive across CallLeaf, with the help of existing >> FatProjectionNode and RA, the whole vector register ( length > 64-bit ) >> is still spilled to stack as usual. >> >> A test case using VectorAPI is written to verify: >> http://cr.openjdk.java.net/~jzhu/8253048/TestVector.java >> Test patch: >> http://cr.openjdk.java.net/~jzhu/8253048/patch >> OptoAssembly dump: >> http://cr.openjdk.java.net/~jzhu/8253048/TestVector_OptoAssembly >> >> I also searched all occurrences of "V8-V15" in aarch64 codes. >> The stubs for sin/cos don't save v10 before usage. >> Therefore I replace it with caller-save register v24. >> >> Jtreg Testing: hotspot_all_no_apps, jdk_core and langtools:tier1 >> >> Could you please help review this change? >> >> Best Regards, >> Joshua >> >> --------- > > Marked as reviewed by adinn (Reviewer). Aph and Adinn, Thanks a lot for your review! ------------- PR: https://git.openjdk.java.net/jdk/pull/129 From chagedorn at openjdk.java.net Wed Sep 23 12:01:39 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 23 Sep 2020 12:01:39 GMT Subject: RFR: 8253524: C2: Refactor code that clones predicates during loop unswitching In-Reply-To: References: Message-ID: On Wed, 23 Sep 2020 08:30:45 GMT, Roland Westrelin wrote: > During unswitching, PhaseIdealLoop::create_slow_version_of_loop() > calls PhaseIdealLoop::clone_predicates_to_unswitched_loop() twice, one > for each loops, to clone some predicates above each loop. 
That code is > fragile as it (implicitly) requires the fast loop to be processed > first. I propose calling > PhaseIdealLoop::clone_predicates_to_unswitched_loop() a single time > and have it handle both loops in a single pass. Very nice clean-up! This makes it much easier. Apart from some minor code style comments, it looks good to me! src/hotspot/share/opto/loopPredicate.cpp line 233: > 231: // 'new_predicate_proj' and rewires the control edges of data nodes in > 232: // the loop from the old predicates to the new cloned predicates. > 233: void PhaseIdealLoop::clone_skeleton_predicates_to_unswitched_loop(IdealLoopTree *loop, const Node_List &old_new, > Deoptimization::DeoptReason reason, Asterisk: `IdealLoopTree*` src/hotspot/share/opto/loopnode.hpp line 1438: > 1436: void clone_skeleton_predicates_to_unswitched_loop(IdealLoopTree *loop, const Node_List &old_new, > Deoptimization::DeoptReason reason, 1437: ProjNode* old_predicate_proj, ProjNode* > iffast, ProjNode* ifslow); 1438: void check_created_predicate_for_unswitching(const Node *new_entry) const > PRODUCT_RETURN; Move the asterisks to the types. src/hotspot/share/opto/loopPredicate.cpp line 350: > 348: > 349: #ifndef PRODUCT > 350: void PhaseIdealLoop::check_created_predicate_for_unswitching(const Node *new_entry) const { Asterisk: `Node*` ------------- Changes requested by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/317 From roland at openjdk.java.net Wed Sep 23 12:14:01 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 23 Sep 2020 12:14:01 GMT Subject: RFR: 8253524: C2: Refactor code that clones predicates during loop unswitching [v2] In-Reply-To: References: Message-ID: > During unswitching, PhaseIdealLoop::create_slow_version_of_loop() > calls PhaseIdealLoop::clone_predicates_to_unswitched_loop() twice, one > for each loops, to clone some predicates above each loop. That code is > fragile as it (implicitly) requires the fast loop to be processed > first. 
I propose calling > PhaseIdealLoop::clone_predicates_to_unswitched_loop() a single time > and have it handle both loops in a single pass. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: code style ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/317/files - new: https://git.openjdk.java.net/jdk/pull/317/files/2e26a7cb..5ab947eb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=317&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=317&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/317.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/317/head:pull/317 PR: https://git.openjdk.java.net/jdk/pull/317 From roland at openjdk.java.net Wed Sep 23 12:14:10 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 23 Sep 2020 12:14:10 GMT Subject: RFR: 8253524: C2: Refactor code that clones predicates during loop unswitching [v2] In-Reply-To: References: Message-ID: On Wed, 23 Sep 2020 11:51:43 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> code style > > src/hotspot/share/opto/loopPredicate.cpp line 350: > >> 348: >> 349: #ifndef PRODUCT >> 350: void PhaseIdealLoop::check_created_predicate_for_unswitching(const Node *new_entry) const { > > Asterisk: `Node*` @chhagedorn thanks for the review (and offline comments). Should be fixed now. 
------------- PR: https://git.openjdk.java.net/jdk/pull/317 From eosterlund at openjdk.java.net Wed Sep 23 12:22:48 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 23 Sep 2020 12:22:48 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> Message-ID: On Wed, 23 Sep 2020 06:32:14 GMT, Allan Gregersen wrote: > > > > I would like to hear answer to @dholmes-ora question in JBS: > > > > "Do we really need yet another stack dumping interface in the VM? Why isn't a debugger using JVM TI?" > > > > > > > > > One reason for having both the new getStackFrames API (set of threads) as well as the existing iterateFrames (current > > > thread only) API in JVMCI is that Truffle would want a deopt-free read-only view of the values in a frame, which to the > > > best of our knowledge is not possible through JVMTI. Only in rare cases, materialization of frames is required, so it > > > boils down to the performance hit caused by deopting frames, which is even more of a concern with a set of threads than > > > for the single current thread case. Another potential issue with a JVMTI-based approach is that there might be other > > > drawbacks to having an always-on (or even late attached) JVMTI agent in a GraalVM? > > > > > > > > 1. You are describing that the main reason is performance. But you also say this is to be used by a debugger? So, not > > sure performance as a primary motive really makes sense then. Not sure why performance of debugging Truffle must be so > > much faster than debugging Java code (which I have not heard anyone complain about). And if this really was an actual > > performance problem, it seems like we would want a generic fix then, not a special Truffle stack walker for debugging > > Truffle code alone, to be maintained separately. 2. 
We are talking about JVMTI, not JVMCI. iterateFrames is defined in > > JVMCI, and that is something completely different, which I don't think any of us had in mind. It seems indeed to be > > limited to the current frame. I'm talking about e.g. JVMTI GetStackTrace and the JVMTI GetLocal* functions. It gives > > you a stack trace for any thread (not just the current one), and allows you to retrieve locals. 3. When you just read > > locals, (as you describe is your use case), there is no need to deoptimize anything. So yeah, that's just not something > > we do, unless you change the locals, which you said you are not. Please let me know if there is anything I missed. But > > so far it seems to me that the mentioned JVMTI functionality is all you really need for a debugger. What did I miss? I > > would like to better understand the problem domain before taking this further. > > Thanks for your quick follow-up. > > 1. It's not entirely made for the debugger use-case. For example in some guest languages we need this for implementing > Thread#getStackTrace or similar. In Espresso (Java as a Truffle guest language) we would need this also for > implementing part of the management API. > > 2. I know that you suggested JVMTI and no JVMCI. Since I wasn't around when the decision to implement and include > iterateFrames into JVMCI was made, I'm unaware of the exact reasoning behind that decision. I was assuming that > whatever reason not to go with JVMTI back then would still hold true today. So say we wanted to adopt the JVMTI > approach now. Would the design be an in-process always on and in-process JVMTI agent? Would there be security > implications from such an approach leaving any VM running anything Truffle more vulnerable? > > 3. No need for deoptimize anything when reading locals through JVMTI is good. Thanks for clarifying that. 
So I understand it, you really have 2 cases: 1) Using the debugger 2) To support other APIs that need a stack trace So if you use JVMTI for the debugging (like everybody else), that seems to be a solved problem. As for the second use case, I hope you can use java.lang.StackWalker? It should give you all the info you could dream of. If you can't use all classes in java.lang.* then I fear that you are in a lot of trouble using HotSpot in general. ------------- PR: https://git.openjdk.java.net/jdk/pull/110 From chagedorn at openjdk.java.net Wed Sep 23 13:03:40 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 23 Sep 2020 13:03:40 GMT Subject: RFR: 8253524: C2: Refactor code that clones predicates during loop unswitching [v2] In-Reply-To: References: Message-ID: On Wed, 23 Sep 2020 12:14:01 GMT, Roland Westrelin wrote: >> During unswitching, PhaseIdealLoop::create_slow_version_of_loop() >> calls PhaseIdealLoop::clone_predicates_to_unswitched_loop() twice, one >> for each loops, to clone some predicates above each loop. That code is >> fragile as it (implicitly) requires the fast loop to be processed >> first. I propose calling >> PhaseIdealLoop::clone_predicates_to_unswitched_loop() a single time >> and have it handle both loops in a single pass. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > code style Looks good to me! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/317 From github.com+8792647+robcasloz at openjdk.java.net Wed Sep 23 13:07:21 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 23 Sep 2020 13:07:21 GMT Subject: RFR: 8252583: Clean up unused phi-to-copy degradation mechanism [v4] In-Reply-To: <7YEM0R_L8fMtxRESD7O79pvhoaXHIRaXeCShUoP88tc=.6ae93e8b-020a-4b95-a03a-90358ae2099a@github.com> References: <7YEM0R_L8fMtxRESD7O79pvhoaXHIRaXeCShUoP88tc=.6ae93e8b-020a-4b95-a03a-90358ae2099a@github.com> Message-ID: <92rYdXESAAB4UgBrFIGvERu0bL7SS8bJ7wTx19I40yA=.7423994e-b6a9-4b5c-933b-a593a5e18142@github.com> On Wed, 23 Sep 2020 11:27:24 GMT, Tobias Hartmann wrote: > Looks good to me. Thanks Tobias! ------------- PR: https://git.openjdk.java.net/jdk/pull/275 From github.com+8792647+robcasloz at openjdk.java.net Wed Sep 23 14:20:36 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 23 Sep 2020 14:20:36 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v4] In-Reply-To: References: Message-ID: <3UU_QiZMIvAkgi4cOL4EiP43MVzLTUC2piCwoI9JcC8=.8977c043-3694-4071-83ab-9b613d45056d@github.com> > Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each > main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the > seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as > production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options. 

Roberto Castañeda Lozano has updated the pull request incrementally with three additional commits since the last revision:

 - Add basic sanity test for stress IGVN options
 - Fix typo
 - Move shuffle() to PhaseIterGVN

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/242/files
 - new: https://git.openjdk.java.net/jdk/pull/242/files/e1131852..829b1a1a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=02-03

Stats: 74 lines in 7 files changed: 58 ins; 10 del; 6 mod
Patch: https://git.openjdk.java.net/jdk/pull/242.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/242/head:pull/242

PR: https://git.openjdk.java.net/jdk/pull/242

From github.com+8792647+robcasloz at openjdk.java.net Wed Sep 23 14:20:41 2020
From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano)
Date: Wed, 23 Sep 2020 14:20:41 GMT
Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v3]
In-Reply-To: <0SnXiN756cLrgglHnxd7OuYOWtgljtB8yCEEsVI6YzA=.7da4aeca-af80-4ee7-a642-3606a0f4557a@github.com>
References: <0SnXiN756cLrgglHnxd7OuYOWtgljtB8yCEEsVI6YzA=.7da4aeca-af80-4ee7-a642-3606a0f4557a@github.com>
Message-ID: <9Tp1dmoKbJKhlAjMhxiBujaWUU0GyrazvEXFAGXLAFY=.e0f02e68-72a0-4c1b-ba32-4c359c678138@github.com>

On Wed, 23 Sep 2020 10:42:15 GMT, Christian Hagedorn wrote:

> Maybe you could add an additional HelloWorld test which only runs with your new flags to sanity check them without any
> other flags.

Good idea, I just did that.

> src/hotspot/share/opto/phaseX.cpp line 1153:
>
>> 1151: DEBUG_ONLY(uint num_processed = 0;)
>> 1152: NOT_PRODUCT(init_verifyPhaseIterGVN();)
>> 1153: if (StressIGVN) C->shuffle(&_worklist);
>
> You should add curly braces. You could also move `shuffle` to `PhaseIterGVN`.

Done.

> src/hotspot/share/opto/compile.cpp line 4462:
>
>> 4460: void Compile::shuffle(Unique_Node_List* l) {
>> 4461: if (l->size() < 2) return;
>> 4462: for (uint i = l->size() - 1; i >= 1; i--) {
>
> You can remove the if-check as the loop check already covers it (loop is only executed if size >= 2).

Note that `size()` could be 0, leading `l->size() - 1` to underflow.

-------------

PR: https://git.openjdk.java.net/jdk/pull/242

From kvn at openjdk.java.net Wed Sep 23 15:28:49 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 23 Sep 2020 15:28:49 GMT
Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v3]
In-Reply-To:
References:
Message-ID: <_0q3ksgDD2Wpu-a_aCt1zVOn6s7057srj3pPtc-Ro0I=.f3cafd74-a522-40b9-bb80-7006695ae750@github.com>

On Wed, 23 Sep 2020 11:36:12 GMT, Tobias Hartmann wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Define 'StressSeed' option as 'uint' rather than 'uintx'
>
> src/hotspot/share/opto/c2_globals.hpp line 55:
>
>> 53: "Randomize worklist traversal in IGVN") \
>> 54: \
>> 55: product(bool, GenerateStressSeed, false, DIAGNOSTIC, \
>
> Is this flag really required? We could simply generate the seed if StressSeed has not been specified on the command
> line (see `FLAG_IS_DEFAULT` macro).

Agree.

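
The suggestion above (drop the separate flag and generate a seed only when none was given) can be sketched as follows. This is an illustration under assumptions, not HotSpot code: `SeedFlag` and `resolve_stress_seed` are hypothetical names, an empty `std::optional` models what a `FLAG_IS_DEFAULT` check would report, and the seed is printed unconditionally here, whereas the pull request logs it only when `LogCompilation` is enabled.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>
#include <random>

// Hypothetical model of a command-line option: empty means "not specified".
using SeedFlag = std::optional<uint32_t>;

// Returns the seed for this run: the user's value if one was given,
// otherwise a freshly generated one. The chosen seed is printed either way
// so that a failing run can be repeated with an explicit seed.
uint32_t resolve_stress_seed(const SeedFlag& stress_seed) {
  const uint32_t seed = stress_seed.has_value()
                            ? *stress_seed
                            : static_cast<uint32_t>(std::random_device{}());
  std::printf("stress seed: %u\n", static_cast<unsigned>(seed));
  return seed;
}
```

With this shape a single option covers both uses: omitting it gives a random but recorded seed, and passing the recorded value replays the same run.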
-------------

PR: https://git.openjdk.java.net/jdk/pull/242

From jbhateja at openjdk.java.net Wed Sep 23 15:33:50 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 23 Sep 2020 15:33:50 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions
In-Reply-To: <_0-zfIDPieC0Xnc17GaSSsS7Sz9EEUrfjRyqWDtphfU=.298bacde-f330-486a-8bea-03ff1523d00c@github.com>
References: <_0-zfIDPieC0Xnc17GaSSsS7Sz9EEUrfjRyqWDtphfU=.298bacde-f330-486a-8bea-03ff1523d00c@github.com>
Message-ID:

On Wed, 23 Sep 2020 11:09:25 GMT, Nils Eliasson wrote:

> Can you explain why 32 bytes are such a distinct performance cliff?
>
> Is there any performance difference between doing a single 64 bytes masked copy or two 32 bytes?

Hi Nils,

A copy of up to 32 bytes can be done with one YMM register: the AVX-512 vector length extension allows masked
instructions to operate on YMM and XMM registers. With the newly added flag -XX:ArrayCopyPartialInlineSize=64 one can
inline copies of up to 64 bytes, but these use a ZMM register, so the CPU will operate at a lower frequency; depending
on the application this could still give better performance.

A single 64-byte masked copy may cause a performance hit if the CPU otherwise operates at its highest frequency for the
majority of the application's runtime. There is a switchover penalty from the higher to the lower frequency level,
along with some hysteresis that forces subsequent instructions to run at the lower frequency for some cycles.

The current implementation has been kept simple to avoid emitting too many instructions at the call site, considering
that arraycopy is a very frequently executed operation.

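
To illustrate what a masked copy through a YMM register looks like, here is a small sketch using compiler intrinsics. It is not the code C2 emits for this patch: `copy_up_to_32` is a hypothetical helper, and the masked path is only compiled when the build enables AVX-512BW and AVX-512VL, with a plain `memcpy` fallback otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

// Copies n bytes (n <= 32) from src to dst without a length-dependent loop.
void copy_up_to_32(void* dst, const void* src, std::size_t n) {
  assert(n <= 32);
#if defined(__AVX512BW__) && defined(__AVX512VL__)
  // One mask bit per byte lane: the low n bits are set, the rest are clear.
  const __mmask32 k = static_cast<__mmask32>((uint64_t{1} << n) - 1);
  // Masked-off lanes are zeroed on the load and skipped on the store, and
  // faults are suppressed for them, so the 32-byte wide access does not
  // touch memory beyond the n requested bytes.
  const __m256i v = _mm256_maskz_loadu_epi8(k, src);
  _mm256_mask_storeu_epi8(dst, k, v);
#else
  // Portable fallback for builds without AVX-512BW/VL.
  std::memcpy(dst, src, n);
#endif
}
```

The point of the masked form is that one load and one store handle every length from 0 to 32, which keeps the code emitted at the call site short. A 64-byte variant would use the ZMM-width counterparts and run into the frequency trade-off described above.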
-------------

PR: https://git.openjdk.java.net/jdk/pull/302

From chagedorn at openjdk.java.net Wed Sep 23 16:51:04 2020
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Wed, 23 Sep 2020 16:51:04 GMT
Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v3]
In-Reply-To: <9Tp1dmoKbJKhlAjMhxiBujaWUU0GyrazvEXFAGXLAFY=.e0f02e68-72a0-4c1b-ba32-4c359c678138@github.com>
References: <0SnXiN756cLrgglHnxd7OuYOWtgljtB8yCEEsVI6YzA=.7da4aeca-af80-4ee7-a642-3606a0f4557a@github.com> <9Tp1dmoKbJKhlAjMhxiBujaWUU0GyrazvEXFAGXLAFY=.e0f02e68-72a0-4c1b-ba32-4c359c678138@github.com>
Message-ID:

On Wed, 23 Sep 2020 14:17:25 GMT, Roberto Castañeda Lozano wrote:

>> src/hotspot/share/opto/compile.cpp line 4462:
>>
>>> 4460: void Compile::shuffle(Unique_Node_List* l) {
>>> 4461: if (l->size() < 2) return;
>>> 4462: for (uint i = l->size() - 1; i >= 1; i--) {
>>
>> You can remove the if-check as the loop check already covers it (loop is only executed if size >= 2).
>
> Note that `size()` could be 0, leading `l->size() - 1` to underflow.

You're right, it's `uint` and not `int` - then you can leave it as it is.

-------------

PR: https://git.openjdk.java.net/jdk/pull/242

From kvn at openjdk.java.net Wed Sep 23 17:23:02 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 23 Sep 2020 17:23:02 GMT
Subject: RFR: 8253524: C2: Refactor code that clones predicates during loop unswitching [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 23 Sep 2020 12:14:01 GMT, Roland Westrelin wrote:

>> During unswitching, PhaseIdealLoop::create_slow_version_of_loop()
>> calls PhaseIdealLoop::clone_predicates_to_unswitched_loop() twice, one
>> for each loops, to clone some predicates above each loop. That code is
>> fragile as it (implicitly) requires the fast loop to be processed
>> first. I propose calling
>> PhaseIdealLoop::clone_predicates_to_unswitched_loop() a single time
>> and have it handle both loops in a single pass.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> code style

Marked as reviewed by kvn (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/317

From github.com+4346215+chumer at openjdk.java.net Wed Sep 23 18:22:25 2020
From: github.com+4346215+chumer at openjdk.java.net (Christian Humer)
Date: Wed, 23 Sep 2020 18:22:25 GMT
Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread
In-Reply-To:
References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com>
Message-ID:

On Wed, 23 Sep 2020 12:19:52 GMT, Erik Österlund wrote:

>> > > > I would like to hear answer to @dholmes-ora question in JBS:
>> > > > "Do we really need yet another stack dumping interface in the VM? Why isn't a debugger using JVM TI?"
>> > >
>> > >
>> > > One reason for having both the new getStackFrames API (set of threads) as well as the existing iterateFrames (current
>> > > thread only) API in JVMCI is that Truffle would want a deopt-free read-only view of the values in a frame, which to the
>> > > best of our knowledge is not possible through JVMTI. Only in rare cases, materialization of frames is required, so it
>> > > boils down to the performance hit caused by deopting frames, which is even more of a concern with a set of threads than
>> > > for the single current thread case. Another potential issue with a JVMTI-based approach is that there might be other
>> > > drawbacks to having an always-on (or even late attached) JVMTI agent in a GraalVM?
>> >
>> >
>> >
>> > 1. You are describing that the main reason is performance. But you also say this is to be used by a debugger? So, not
>> > sure performance as a primary motive really makes sense then. Not sure why performance of debugging Truffle must be so
>> > much faster than debugging Java code (which I have not heard anyone complain about). And if this really was an actual
>> > performance problem, it seems like we would want a generic fix then, not a special Truffle stack walker for debugging
>> > Truffle code alone, to be maintained separately. 2. We are talking about JVMTI, not JVMCI. iterateFrames is defined in
>> > JVMCI, and that is something completely different, which I don't think any of us had in mind. It seems indeed to be
>> > limited to the current frame. I'm talking about e.g. JVMTI GetStackTrace and the JVMTI GetLocal* functions. It gives
>> > you a stack trace for any thread (not just the current one), and allows you to retrieve locals. 3. When you just read
>> > locals, (as you describe is your use case), there is no need to deoptimize anything. So yeah, that's just not something
>> > we do, unless you change the locals, which you said you are not. Please let me know if there is anything I missed. But
>> > so far it seems to me that the mentioned JVMTI functionality is all you really need for a debugger. What did I miss? I
>> > would like to better understand the problem domain before taking this further.
>>
>> Thanks for your quick follow-up.
>>
>> 1. It's not entirely made for the debugger use-case. For example in some guest languages we need this for implementing
>> Thread#getStackTrace or similar. In Espresso (Java as a Truffle guest language) we would need this also for
>> implementing part of the management API.
>>
>> 2. I know that you suggested JVMTI and no JVMCI. Since I wasn't around when the decision to implement and include
>> iterateFrames into JVMCI was made, I'm unaware of the exact reasoning behind that decision. I was assuming that
>> whatever reason not to go with JVMTI back then would still hold true today. So say we wanted to adopt the JVMTI
>> approach now. Would the design be an in-process always on and in-process JVMTI agent? Would there be security
>> implications from such an approach leaving any VM running anything Truffle more vulnerable?
>>
>> 3. No need for deoptimize anything when reading locals through JVMTI is good. Thanks for clarifying that.
>
> So I understand it, you really have 2 cases:
> 1) Using the debugger
> 2) To support other APIs that need a stack trace
>
> So if you use JVMTI for the debugging (like everybody else), that seems to be a solved problem.
> As for the second use case, I hope you can use java.lang.StackWalker? It should give you all the info you could dream
> of. If you can't use all classes in java.lang.* then I fear that you are in a lot of trouble using HotSpot in general.

Tuning in to provide some background on why Truffle needs this and why we spent a lot of time stabilizing this PR. If
we could have gone a different route we would have.

Truffle introduces the separation of guest and host language. By host language we mean the Java host VM. This is
either HotSpot (relevant for this PR) or SubstrateVM (Native Image). Guest languages are interpreters implemented on
top of Truffle, like JavaScript, Ruby, or Python, but also Espresso, our Java implementation based on Truffle. Truffle
uses Graal and JVMCI to compile these guest languages to optimized machine code using a technique called the first
Futamura projection. This Graal compilation is limited to JDKs that provide JVMCI APIs.

In Truffle we use the notion of guest and host stack frames. Guest stack frames represent a method activation in the
guest language and host stack frames represent host Java method activations. A guest stack frame entry consists of the
call location (Node) and the guest frame (VirtualFrame) that contains the guest local variables. Truffle languages need
to access guest frames of the current thread to construct a stack trace or to lazily access variables in a parent guest
frame. There are two techniques to do this:

1. Have a separate stack data structure on the heap that keeps track of the guest frames for each thread.
2. Walk the live local variables in Java host frames to access the guest frame and node call location.

We use technique (1) to implement Truffle guest stack traces on a JVM without JVMCI support. This is pretty simple
and allows us to walk the guest stack for any thread we need. But there are downsides to this:

* For each method invocation we have additional overhead for writing the external data structure.
* The frame always escapes the current compilation scope and can therefore not be escape analyzed by Graal.

Both of these issues are deal-breakers, performance-wise.
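
The two costs of technique (1) can be made concrete with a minimal sketch of an explicit guest stack. It is written in C++ only to match the other snippets in this digest (Truffle itself is Java), all names (`GuestFrame`, `GuestCallScope`, `guest_stack_trace`) are hypothetical, and for brevity it walks only the current thread's stack; walking other threads would need a registry of the per-thread stacks.

```cpp
#include <string>
#include <vector>

// Hypothetical guest frame: a call location plus the guest's local variables.
struct GuestFrame {
  std::string call_location;
  std::vector<int> locals;
};

// One explicit guest stack per thread, kept beside the native call stack.
thread_local std::vector<GuestFrame*> guest_stack;

// RAII helper: registers a frame on entry to a guest call and removes it on
// exit. These two writes are the per-invocation overhead, and publishing the
// frame's address in a shared structure is what makes the frame escape.
class GuestCallScope {
 public:
  explicit GuestCallScope(GuestFrame& frame) { guest_stack.push_back(&frame); }
  ~GuestCallScope() { guest_stack.pop_back(); }
  GuestCallScope(const GuestCallScope&) = delete;
  GuestCallScope& operator=(const GuestCallScope&) = delete;
};

// Walks the explicit stack from the innermost call outwards.
std::vector<std::string> guest_stack_trace() {
  std::vector<std::string> trace;
  for (auto it = guest_stack.rbegin(); it != guest_stack.rend(); ++it) {
    trace.push_back((*it)->call_location);
  }
  return trace;
}
```

Every guest call pays for a push and a pop even if nobody ever asks for a stack trace, which is the trade-off that technique (2) avoids by reading the frames back out of the host stack only when needed.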
With Truffle we want to be competitive with other specialized VMs, so technique (1) is not good enough. JVMCI exposes
stack walking APIs for Truffle that allow us to access the host frame local variables of the current thread. This
allows us to lazily reconstruct the guest frames from certain known, live local variables in the host frames. We also
have special logic to reconstruct read-only guest frames from optimized Truffle+Graal compiled methods without the need
to invalidate the optimized code.

We have been using technique (2) successfully for many years, but now with the growing maturity of Truffle we have new
requirements:

1. We need to be able to walk all the root pointers of a guest language. This includes all active guest frames. This
   is needed to allow languages to walk all live objects (e.g. Ruby needs that) and to compute the retained size of a
   Truffle guest language context.
2. We need to be able to read locals from other threads to produce the guest stack trace of other threads in the
   Truffle debugger. This was not a big issue before, because we were mostly dealing with single-threaded languages
   (JavaScript).

The Truffle debugger should not be confused with the Java host debugger. The Truffle debugger is based on the Truffle
instrumentation framework and cannot debug Java host code. It only shows guest stack frames and statements and is
entirely agnostic to which Java methods were used to implement it and to which Java VM it runs on. It is built entirely
in Java, without the use of JVMTI; this allows us to debug guest code without having the Java debugger attached. It
allows us to enable debugging on demand in a production scenario, when it is needed and only for a guest context that
needs it, without slowing down others (e.g. in an app server). Truffle debugging also works on SubstrateVM
(native-image), which currently has no support for JVMTI. Enabling the debugger without using it also comes without any
peak performance overhead (only some memory and warmup overhead).

To summarize:

1. We cannot use the StackWalker API as it does not allow us to access the local variables we need.
2. We cannot manually push/pop guest language frames, as this would be too bad for performance.
3. We cannot use JVMTI because:
   3a. We need it to implement language features, not just debugger features.
   3b. There is no way to enable it on demand for an individual guest application (we run multiple guest applications
       per host VM).
   3c. Using JVMTI would slow down the host VM.

Therefore our best idea was to introduce this new JVMCI API. We are of course open to other suggestions, if they solve
our problem. This is also not an entirely new feature: this PR is an extension to the existing JVMCI functionality to
walk the stack frames with local variable access.

I hope these clarifications were helpful.

-------------

PR: https://git.openjdk.java.net/jdk/pull/110

From eosterlund at openjdk.java.net Wed Sep 23 18:57:02 2020
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Wed, 23 Sep 2020 18:57:02 GMT
Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread
In-Reply-To:
References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com>
Message-ID:

On Wed, 23 Sep 2020 18:16:11 GMT, Christian Humer wrote:

>>> > > > I would like to hear answer to @dholmes-ora question in JBS:
>>> > > > "Do we really need yet another stack dumping interface in the VM? Why isn't a debugger using JVM TI?"
>>> > >
>>> > >
>>> > > One reason for having both the new getStackFrames API (set of threads) as well as the existing iterateFrames (current
>>> > > thread only) API in JVMCI is that Truffle would want a deopt-free read-only view of the values in a frame, which to the
>>> > > best of our knowledge is not possible through JVMTI.
>>> > > Only in rare cases, materialization of frames is required, so it
>>> > > boils down to the performance hit caused by deopting frames, which is even more of a concern with a set of threads than
>>> > > for the single current thread case. Another potential issue with a JVMTI-based approach is that there might be other
>>> > > drawbacks to having an always-on (or even late attached) JVMTI agent in a GraalVM?
>>> >
>>> >
>>> >
>>> > 1. You are describing that the main reason is performance. But you also say this is to be used by a debugger? So, not
>>> > sure performance as a primary motive really makes sense then. Not sure why performance of debugging Truffle must be so
>>> > much faster than debugging Java code (which I have not heard anyone complain about). And if this really was an actual
>>> > performance problem, it seems like we would want a generic fix then, not a special Truffle stack walker for debugging
>>> > Truffle code alone, to be maintained separately. 2. We are talking about JVMTI, not JVMCI. iterateFrames is defined in
>>> > JVMCI, and that is something completely different, which I don't think any of us had in mind. It seems indeed to be
>>> > limited to the current frame. I'm talking about e.g. JVMTI GetStackTrace and the JVMTI GetLocal* functions. It gives
>>> > you a stack trace for any thread (not just the current one), and allows you to retrieve locals. 3. When you just read
>>> > locals, (as you describe is your use case), there is no need to deoptimize anything. So yeah, that's just not something
>>> > we do, unless you change the locals, which you said you are not. Please let me know if there is anything I missed. But
>>> > so far it seems to me that the mentioned JVMTI functionality is all you really need for a debugger. What did I miss? I
>>> > would like to better understand the problem domain before taking this further.
>>>
>>> Thanks for your quick follow-up.
>>>
>>> 1. It's not entirely made for the debugger use-case. For example in some guest languages we need this for implementing
>>> Thread#getStackTrace or similar. In Espresso (Java as a Truffle guest language) we would need this also for
>>> implementing part of the management API.
>>>
>>> 2. I know that you suggested JVMTI and no JVMCI. Since I wasn't around when the decision to implement and include
>>> iterateFrames into JVMCI was made, I'm unaware of the exact reasoning behind that decision. I was assuming that
>>> whatever reason not to go with JVMTI back then would still hold true today. So say we wanted to adopt the JVMTI
>>> approach now. Would the design be an in-process always on and in-process JVMTI agent? Would there be security
>>> implications from such an approach leaving any VM running anything Truffle more vulnerable?
>>>
>>> 3. No need for deoptimize anything when reading locals through JVMTI is good. Thanks for clarifying that.
>>
>> So I understand it, you really have 2 cases:
>> 1) Using the debugger
>> 2) To support other APIs that need a stack trace
>>
>> So if you use JVMTI for the debugging (like everybody else), that seems to be a solved problem.
>> As for the second use case, I hope you can use java.lang.StackWalker? It should give you all the info you could dream
>> of. If you can't use all classes in java.lang.* then I fear that you are in a lot of trouble using HotSpot in general.
>
> Tuning in to provide some background on why Truffle needs this and why we spent a lot of time to stabilize this PR. If
> we could have gone a different route we would have.
> Truffle introduces the separation of guest and host language. As host language, we understand the Java host VM. This is
> either HotSpot (relevant for this PR) or SubstrateVM (Native Image). Guest languages are interpreters implemented on
> top of Truffle, like JavaScript, Ruby, or Python, but also Espresso our Java implementation based on Truffle. Truffle
> uses Graal and JVMCI to optimize these guest languages to optimized machine code using a technique called the first
> Futamura projection. This Graal compilation is limited to JDKs that provide JVMCI APIs. In Truffle we use the notion
> of guest and host stack frames. Guest stack frames represent a method activation in the guest language and host stack
> frames represent host Java method activations. A guest stack frame entry consists of the call location (Node) and guest
> frame (VirtualFrame) that contains the guest local variables. Truffle languages need to access guest frames of the
> current thread to construct a stack trace or to lazily access variables in a parent guest frame. There are two
> techniques to do this: 1. Have a separate stack data structure on the heap that keeps track of the guest frames for
> each thread. 2. Walk the alive local variables in Java host frames to access the guest frame and node call location.
> We use the technique (1) to implement Truffle guest stack traces on a JVM without JVMCI support. This is pretty simple
> and allows us to walk the guest stack for any thread we need. But, there are downsides with this though:
> * For each method invocation we have additional overhead for writing the external data structure.
> * The frame always escapes the current compilation scope and can therefore not be escape analyzed by Graal.
>
> Both of these issues are deal-breakers, performance-wise. With Truffle we want to be competitive with other specialized
> VMs, so technique (1) is not good enough. JVMCI exposes stack walking APIs for Truffle that allows us to access the
> host frame local variables of the current thread. This allows us to lazily reconstruct the guest frames from the host
> frames from certain known and alive local variables. We also have special logic to reconstruct read-only guest frames
> from optimized Truffle+Graal compiled methods without the need to invalidate the optimized code. We are using the
> technique (2) successfully for many years, but now with the growing maturity of Truffle we have new requirements: 1.
> We need to be able to walk all the root pointers of a guest language. This includes all active guest frames. This is
> needed to allow languages to walk all alive objects (e.g. Ruby needs that) and to compute the retained size of a
> truffle guest language context. 2. We need to be able to read locals from other threads to produce the guest stack
> trace of other threads in the Truffle debugger. This was not a big issue before, because we were mostly dealing with
> single-threaded languages (JavaScript). The Truffle debugger should not be confused with the Java host debugger. The
> Truffle debugger works based on the Truffle instrumentation framework and cannot debug Java host code. It only shows
> guest stack frames and statements and is entirely agnostic to which Java methods were used to implement it and on which
> Java VM it runs on. It is entirely built with Java, without the use of JVMTI, this allows us to debug guest code
> without having the Java debugger attached. It allows us to on-demand enable debugging in a production scenario when it
> is needed and only for a guest context that needs it without slowing down others (e.g. in an app server). Truffle
> debugging works on SubstrateVM (native-image) which has currently no support for JVMTI. Enabling and not using the
> debugger also comes without any peak performance overhead (some memory and warmup overhead). To summarize: 1. We
> cannot use the StackWalker API as it does not allow us to access the local variables we need. 2. We cannot manually
> push/pop guest language frames, as this would be too bad for performance. 3. We cannot use JVMTI because: 3a. We need
> it to implement language features, not just debugger features. 3b. There is no way to enable it on demand for an
> individual guest application (we run multiple guest applications per host VM). 3c. Using JVMTI would slow down the
> host VM. Therefore our best idea was to introduce this new JVMCI API. We are of course open to other suggestions, if
> they solve our problem. This is also not an entirely new feature, this PR is an extension to the existing JVMCI
> functionality to walk the stack frames with local variable access. I hope these clarifications were helpful.

java.lang.StackWalker does expose locals as well. What am I missing?

-------------

PR: https://git.openjdk.java.net/jdk/pull/110

From dnsimon at openjdk.java.net Wed Sep 23 19:31:37 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Wed, 23 Sep 2020 19:31:37 GMT
Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread
In-Reply-To:
References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com>
Message-ID: <-oU_AT5DuFwfL-rgxqGIrN7pAL0NPWzlrgPewnfTCyw=.6a50f1a0-a089-47b7-9687-68da101d5206@github.com>

On Wed, 23 Sep 2020 18:54:33 GMT, Erik Österlund wrote:

>> Tuning in to provide some background on why Truffle needs this and why we spent a lot of time to stabilize this PR. If
>> we could have gone a different route we would have.
>> Truffle introduces the separation of guest and host language. As host language, we understand the Java host VM. This is
>> either HotSpot (relevant for this PR) or SubstrateVM (Native Image). Guest languages are interpreters implemented on
>> top of Truffle, like JavaScript, Ruby, or Python, but also Espresso our Java implementation based on Truffle. Truffle
>> uses Graal and JVMCI to optimize these guest languages to optimized machine code using a technique called the first
>> Futamura projection. This Graal compilation is limited to JDKs that provide JVMCI APIs. In Truffle we use the notion
>> of guest and host stack frames. Guest stack frames represent a method activation in the guest language and host stack
>> frames represent host Java method activations.
>> A guest stack frame consists of the caller location and guest local
>> variables. Truffle languages need to access guest frames of the current thread to construct a stack trace or to lazily
>> access variables in a parent guest frame. There are two techniques to do this: 1. Have a separate stack data structure
>> on the heap that keeps track of the guest frames for each thread. 2. Walk the live local variables in Java host frames
>> to access the guest frame and caller location. We use the technique (1) to implement Truffle guest stack traces on a
>> JVM without JVMCI support. This is pretty simple and allows us to walk the guest stack for any thread we need. But,
>> there are downsides with this:
>> * For each method invocation we have additional overhead for maintaining the extra heap data structure.
>> * The frame object always escapes the current compilation scope and can therefore not be escape analyzed by Graal.
>>
>> Both of these issues are deal-breakers, performance-wise. With Truffle we want to be competitive with other specialized
>> VMs, so technique (1) is not good enough. JVMCI currently exposes stack walking APIs for Truffle that allows us to
>> access the host frame local variables of the current thread. This allows us to lazily reconstruct the guest frames from
>> the host frames from certain known and live local variables. We also have special logic to reconstruct read-only guest
>> frames from optimized Truffle+Graal compiled methods without the need to invalidate the optimized code. We are using
>> the technique (2) successfully for many years, but now with the growing maturity of Truffle we have new requirements:
>> 1. We need to be able to walk all the root pointers of a guest language. This includes all active guest frames. This is
>> needed to allow languages to walk all live objects (e.g. Ruby needs that) and to compute the size of the live objects
>> of a truffle language. 2. We need to be able to read locals from other threads to produce the guest stack trace of
>> other threads in the Truffle debugger. This was not a big issue before, because we were mostly dealing with
>> single-threaded languages (JavaScript). The Truffle debugger should not be confused with the Java host debugger. The
>> Truffle debugger works based on the Truffle instrumentation framework and cannot debug Java host code. It only shows
>> guest stack frames and statements and is entirely agnostic to which Java methods were used to implement it and on which
>> Java VM it runs on. It is entirely built with Java, without the use of JVMTI, this allows us to debug guest code
>> without having the Java debugger attached. It allows us to on-demand enable debugging in a production scenario when it
>> is needed and only for a guest language instance that needs it without slowing down other code in host VM (e.g. in an
>> app server). Truffle debugging works on SubstrateVM (native-image) which has currently no support for JVMTI. Enabling
>> and not using the debugger also comes without any peak performance overhead (some memory and warmup overhead). To
>> summarize: 1. We cannot use the StackWalker API as it does not allow us to access local variables. 2. We cannot
>> manually allocate extra objects for guest language frames, as this would hurt performance. 3. We cannot use JVMTI
>> because: 3a. We need it to implement language features, not just debugger features. 3b. There is no way to enable it on
>> demand for an individual guest application (we run multiple guest applications per host VM). 3c. There is no Java API
>> that allows it be used in the same process. Therefore our best idea was to introduce this new JVMCI API. We are of
>> course open to other suggestions, if they solve our problem. This is also not an entirely new feature, this PR is an
>> extension to the existing JVMCI functionality to walk the stack frames with local variable access. I hope these
>> clarifications were helpful.
>
> java.lang.StackWalker does expose locals as well. What am I missing?

It seems as though the `java.lang.LiveStackFrame` and `java.lang.LiveStackFrameInfo` classes are not public. However,
it may indeed provide what's needed for the Truffle use cases. Do you know how/where this internal interface is used
currently? In any case, the Truffle team will investigate further before coming back to this pull request.

Thanks for bringing up what may be a much simpler solution @fisk.

-------------

PR: https://git.openjdk.java.net/jdk/pull/110

From dnsimon at openjdk.java.net Wed Sep 23 19:56:49 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Wed, 23 Sep 2020 19:56:49 GMT
Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread
In-Reply-To:
References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com>
Message-ID: <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8XGxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com>

On Wed, 23 Sep 2020 18:54:33 GMT, Erik Österlund wrote:

>> Tuning in to provide some background on why Truffle needs this and why we spent a lot of time to stabilize this PR. If
>> we could have gone a different route we would have.
>> Truffle introduces the separation of guest and host language. As host language, we understand the Java host VM. This is
>> either HotSpot (relevant for this PR) or SubstrateVM (Native Image). Guest languages are interpreters implemented on
>> top of Truffle, like JavaScript, Ruby, or Python, but also Espresso our Java implementation based on Truffle. Truffle
>> uses Graal and JVMCI to optimize these guest languages to optimized machine code using a technique called the first
>> Futamura projection. This Graal compilation is limited to JDKs that provide JVMCI APIs. In Truffle we use the notion
>> of guest and host stack frames.
Guest stack frames represent a method activation in the guest language and host stack >> frames represent host Java method activations. A guest stack frame consists of the caller location and guest local >> variables. Truffle languages need to access guest frames of the current thread to construct a stack trace or to lazily >> access variables in a parent guest frame. There are two techniques to do this: 1. Have a separate stack data structure >> on the heap that keeps track of the guest frames for each thread. 2. Walk the live local variables in Java host frames >> to access the guest frame and caller location. We use the technique (1) to implement Truffle guest stack traces on a >> JVM without JVMCI support. This is pretty simple and allows us to walk the guest stack for any thread we need. But, >> there are downsides with this: >> * For each method invocation we have additional overhead for maintaining the extra heap data structure. >> * The frame object always escapes the current compilation scope and can therefore not be escape analyzed by Graal. >> >> Both of these issues are deal-breakers, performance-wise. With Truffle we want to be competitive with other specialized >> VMs, so technique (1) is not good enough. JVMCI currently exposes stack walking APIs for Truffle that allows us to >> access the host frame local variables of the current thread. This allows us to lazily reconstruct the guest frames from >> the host frames from certain known and live local variables. We also have special logic to reconstruct read-only guest >> frames from optimized Truffle+Graal compiled methods without the need to invalidate the optimized code. We are using >> the technique (2) successfully for many years, but now with the growing maturity of Truffle we have new requirements: >> 1. We need to be able to walk all the root pointers of a guest language. This includes all active guest frames. This is >> needed to allow languages to walk all live objects (e.g. 
Ruby needs that) and to compute the size of the live objects >> of a truffle language. 2. We need to be able to read locals from other threads to produce the guest stack trace of >> other threads in the Truffle debugger. This was not a big issue before, because we were mostly dealing with >> single-threaded languages (JavaScript). The Truffle debugger should not be confused with the Java host debugger. The >> Truffle debugger works based on the Truffle instrumentation framework and cannot debug Java host code. It only shows >> guest stack frames and statements and is entirely agnostic to which Java methods were used to implement it and on which >> Java VM it runs on. It is entirely built with Java, without the use of JVMTI, this allows us to debug guest code >> without having the Java debugger attached. It allows us to on-demand enable debugging in a production scenario when it >> is needed and only for a guest language instance that needs it without slowing down other code in host VM (e.g. in an >> app server). Truffle debugging works on SubstrateVM (native-image) which has currently no support for JVMTI. Enabling >> and not using the debugger also comes without any peak performance overhead (some memory and warmup overhead). To >> summarize: 1. We cannot use the StackWalker API as it does not allow us to access local variables. 2. We cannot >> manually allocate extra objects for guest language frames, as this would hurt performance. 3. We cannot use JVMTI >> because: 3a. We need it to implement language features, not just debugger features. 3b. There is no way to enable it on >> demand for an individual guest application (we run multiple guest applications per host VM). 3c. There is no Java API >> that allows it be used in the same process. Therefore our best idea was to introduce this new JVMCI API. We are of >> course open to other suggestions, if they solve our problem. 
This is also not an entirely new feature, this PR is an
>> extension to the existing JVMCI functionality to walk the stack frames with local variable access. I hope these
>> clarifications were helpful.
>
> java.lang.StackWalker does expose locals as well. What am I missing?

@fisk @coleenp , it appears as though StackWalker can only be used for the current thread. Am I missing some other, potentially internal, API that extends StackWalker to work on other threads?

-------------

PR: https://git.openjdk.java.net/jdk/pull/110

From eosterlund at openjdk.java.net Wed Sep 23 20:13:21 2020
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Wed, 23 Sep 2020 20:13:21 GMT
Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread
In-Reply-To: <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8XGxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com>
References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8XGxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com>
Message-ID: <9tn1QTeKWk7DB7fWgyw7O_PSuN5Y7nTUmvvZSuuWqHM=.e9a7dbbe-c8ec-40eb-aee4-74e08892db08@github.com>

On Wed, 23 Sep 2020 19:54:11 GMT, Doug Simon wrote:

>> java.lang.StackWalker does expose locals as well. What am I missing?
>
> @fisk @coleenp , it appears as though StackWalker can only be used for the current thread. Am I missing some other,
> potentially internal, API that extends StackWalker to allow one thread to walk the stack of another thread?

You are right; the java.lang.StackWalker does not have a thread parameter. If one is needed, I imagine we can add one (using a handshake). However, I was under the impression that only the debugger case needed this for other remote threads, in which case JVMTI seems like the natural solution. So yeah, is the non-debug case in need of remote stack traces with locals?
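
As a minimal sketch of the self-walking case discussed here (the class and method names are illustrative and not taken from the PR), `StackWalker.getInstance().walk(...)` always walks the stack of the calling thread; none of its methods takes a `Thread` argument:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: shows that the public StackWalker API walks the caller's own stack.
final class SelfWalkSketch {

    // Collects the method names of the frames on the *calling* thread,
    // starting with the method that invoked walk().
    static List<String> currentThreadMethods() {
        return StackWalker.getInstance().walk(frames ->
                frames.map(StackWalker.StackFrame::getMethodName)
                      .collect(Collectors.toList()));
    }

    // An extra caller so that the walk has more than one interesting frame.
    static List<String> nested() {
        return currentThreadMethods();
    }

    public static void main(String[] args) {
        System.out.println(nested());
    }
}
```

For another thread, the public API that comes closest is `Thread.getStackTrace()`, which returns `StackTraceElement`s and therefore carries no local variables.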
-------------

PR: https://git.openjdk.java.net/jdk/pull/110

From dnsimon at openjdk.java.net Wed Sep 23 20:26:11 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Wed, 23 Sep 2020 20:26:11 GMT
Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread
In-Reply-To: <9tn1QTeKWk7DB7fWgyw7O_PSuN5Y7nTUmvvZSuuWqHM=.e9a7dbbe-c8ec-40eb-aee4-74e08892db08@github.com>
References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8XGxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com> <9tn1QTeKWk7DB7fWgyw7O_PSuN5Y7nTUmvvZSuuWqHM=.e9a7dbbe-c8ec-40eb-aee4-74e08892db08@github.com>
Message-ID:

On Wed, 23 Sep 2020 20:10:37 GMT, Erik Österlund wrote:

>> @fisk @coleenp , it appears as though StackWalker can only be used for the current thread. Am I missing some other,
>> potentially internal, API that extends StackWalker to allow one thread to walk the stack of another thread?
>
> You are right; the java.lang.StackWalker does not have a thread parameter. If one is needed, I imagine we can add one
> (using a handshake). However, I was under the impression that only the debugger case needed this for other remote
> threads, in which case JVMTI seems like the natural solution. So yeah, is the non-debug case in need of remote stack
> traces with locals?

I'll let @chumer answer that question.

However, I have another one of my own. As far as I can see, the only uses of `java.lang.LiveStackFrame` and `java.lang.LiveStackFrameInfo` in the JDK code base are in the [LocalsAndOperands](https://github.com/openjdk/jdk/blob/6bab0f539fba8fb441697846347597b4a0ade428/test/jdk/java/lang/StackWalker/LocalsAndOperands.java) test, where they are accessed via reflection. Do you know if there are/were plans to make these classes public?
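
To check the visibility claim on a particular JDK, a small reflective probe can report whether a class is declared public. This is only a sketch: the helper class name is made up, and it assumes the class names mentioned in this thread still exist in the JDK being run.

```java
import java.lang.reflect.Modifier;

// Illustrative probe; not part of the JDK or of the PR under review.
final class LiveStackFrameProbe {

    // Loads the class without initializing it and reports whether it is declared public.
    static boolean isPublic(String className) {
        try {
            Class<?> cls = Class.forName(className, false, ClassLoader.getSystemClassLoader());
            return Modifier.isPublic(cls.getModifiers());
        } catch (ClassNotFoundException e) {
            // Assumption: the class exists; a missing class is reported as an error.
            throw new IllegalArgumentException("class not found: " + className, e);
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "java.lang.StackWalker",
            "java.lang.LiveStackFrame",
            "java.lang.LiveStackFrameInfo"
        };
        for (String name : names) {
            System.out.println(name + " public? " + isPublic(name));
        }
    }
}
```

Loading a non-public class by name is permitted; it is calling its members from outside `java.lang` that requires the reflective access used by the test.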
------------- PR: https://git.openjdk.java.net/jdk/pull/110 From jrose at openjdk.java.net Wed Sep 23 22:11:32 2020 From: jrose at openjdk.java.net (John R Rose) Date: Wed, 23 Sep 2020 22:11:32 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8X Gxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com> <9tn1QTeKWk7DB7fWgyw7O_PSuN5Y7nTUmvvZSuuWqHM=.e9a7dbbe-c8ec-40eb-aee4-74e08892db08@github.com> Message-ID: On Wed, 23 Sep 2020 20:23:18 GMT, Doug Simon wrote: >> You are right; the java.lang.StackWalker does not have a thread parameter. If one is needed, I imagine we can add one >> (using a handshake). However, I was under the impression that only the debugger case needed this for other remote >> threads, in which case JVMTI seems like the natural solution. So yeah, is the non-debug case in need of remote stack >> traces with locals? > > I'll let @chumer answer that question. > > However, I have another one of my own. As far as I can see, the only use of `java.lang.LiveStackFrame` and > `java.lang.LiveStackFrameInfo` in the JDK code base are in the > [LocalsAndOperands](https://github.com/openjdk/jdk/blob/6bab0f539fba8fb441697846347597b4a0ade428/test/jdk/java/lang/StackWalker/LocalsAndOperands.java) > test where they are used via reflection. Do you know if there are/were plans to make these classes public? The special case we support in the `StackWalker` API is intentionally limited, because a thread examining its own stack is the least risky and most performant scenario. The `StackWalker::walk` API point, in particular, is carefully designed so that its internal implementation can internally use unsafe "dangling" pointers from the thread into its own stack. 
This reduces copying and buffering, which is obviously the least expensive way to "take a quick peek" at what's on the stack.

It is reasonable to ask to extend such functionality to a second, uncooperative thread, but this brings in lots of extra baggage:
- How does the requesting thread get permission to look inside the target thread? (New security analysis.)
- At what point does the target thread get its state taken as a snapshot? Any random moment?
- How is the target thread "held still" while it is being sampled? (And then, "Where is this term 'safepoint' defined in the JVM specifications?")
- Can a target thread refuse or defer the request, to defend some particular encapsulation?
- How is that state stored, and what are the time and space costs for such storage?
- What happens if the requesting thread just wants to look at a few bits? Do we still buffer up a whole backtrace?
- Or, is the target thread required to execute callbacks provided by the requesting thread, with a temporary view, and if so, what limits are there on such callbacks?
- Can the observation process ever cause the target thread to fail, or will any and all failures (OOME, SOE, etc.) be attributed to the requesting thread?
- What happens if the requesting thread makes two requests in a row: Are there any guarantees about relations between the two sets of results? (To be fair, this is also an issue with the self-walking case.)
- What happens if the requesting thread asks to change a value in a frame or pop or re-invoke or replace a frame? (Not allowed in the self-walking case either, but a plausible extension.)

If only "just adding a thread parameter" were a straightforward extension? Instead, we have serious user model issues (see above), and serious implementation issues (see the PR).

I think we could perhaps add cross-thread access to the current `StackWalker` API, if we came up with answers to the above.
I think, in order to engineer it correctly, we would want to factor it as the composition of a self-walking request, *plus* a cross-call mechanism which would allow one thread to ask another thread to run a function. Jumbling these complex operations together into a big pile of new code would be the wrong way to do it. The self-walking API is pretty well understood, and there is a good literature on cross-call mechanisms too. Let's break the problem up.

BTW, the current `StackWalker` API could certainly accept minor extensions to inspect locals, and/or to perform frame replacement, as hinted above. The JVM currently benefits from performing on-stack replacement when it can tell that a slow loop is worth (re-)optimizing as a fast loop. There's no reason the JDK libraries (say, the streams runtime, in particular) shouldn't have a shot at doing something similar. That would require internal JDK hooks to self-inspect and replace loops with improved "customizations", on the fly.

All of the above comments apply only to what might be called the self-inspecting, self-reflective, or "introspective" modes of stack walking. Debuggers usually don't do this (except in one-world environments like Lisp and Smalltalk), but rather operate from the side, through a privileged channel "under the virtual metal" like JVMTI. I suppose for those use cases, JVMTI is plenty good. If there is some trick for self-attachment (either direct or through a conspirator process), then some introspection is also possible, via JVMTI. For best performance, a more "one world" implementation is desirable, but this implies that we create a whole category of "debugging/monitoring code". Such debugging/monitoring code would (like today's runtime internals like those that use `Unsafe`) have privileges beyond regular application code. It might also have eBPF-like limitations on resource usage, so that its executions could be hidden "under the metal" of regular executions. IMO these are promising ideas.
They might help us define better, more cooperative debugging/monitoring primitives. I raise the ideas here because I think there may be a root issue here: How can we use the JDK's on-line introspection APIs for more purposes? How can we inject privileged monitoring code into Java executions? Adding yet another stack walking mechanism to the JVM seems to me like an inefficient way to move, a little bit, in the direction of cooperative debugging/monitoring facilities in the JDK. Conversely, if we can create a way to do (privileged) cross-calls, then we won't need yet another stack walking mechanism. I guess this is where I end up: Please consider refactoring this into an extension (if any is needed) to the self-inspection API (`StackWalker`) and something like a cross-call API. Then we should consider hooking it up to JVMCI.

-------------

PR: https://git.openjdk.java.net/jdk/pull/110

From jzhu at openjdk.java.net Thu Sep 24 00:57:59 2020
From: jzhu at openjdk.java.net (Joshua Zhu)
Date: Thu, 24 Sep 2020 00:57:59 GMT
Subject: Integrated: 8253048: AArch64: When CallLeaf, no need to preserve callee-saved registers in caller
In-Reply-To: <9K43aYMMk0zdBfxGBwRA336lvjqGZyLFtZcQKVWLOec=.4a194ce2-db80-4dcc-9ef2-ed2a0d2cd088@github.com>
References: <9K43aYMMk0zdBfxGBwRA336lvjqGZyLFtZcQKVWLOec=.4a194ce2-db80-4dcc-9ef2-ed2a0d2cd088@github.com>
Message-ID:

On Fri, 11 Sep 2020 12:23:30 GMT, Joshua Zhu wrote:

> I noticed all Floating-Point and SIMD Registers are defined as SOC
> registers in c calling convention on AArch64.
> As AArch64 ABI tells, the bottom 64 bits of v8-v15 are callee-saved.
>
> When using CallRuntime, with the help of existing flag "exclude_soe" and
> function add_call_kills(), SOE registers are killed by the call
> because values that could show up in the RegisterMap aren't live in
> callee saved registers.
> But CallLeaf and CallLeafNoFP are ok because they don't have safepoint
> and debug info.
> > Therefore I submit this patch that aligns save-policy in c calling > convention with AArch64 ABI. It could help eliminate unnecessary SOE > registers spilling in caller across CallLeafNode. > > I wrote a simple test case: > http://cr.openjdk.java.net/~jzhu/8253048/Test.java > Original OptoAssembly is: > http://cr.openjdk.java.net/~jzhu/8253048/old_OptoAssembly > With the patch, unnecessary spillings are eliminated: > http://cr.openjdk.java.net/~jzhu/8253048/new_OptoAssembly > > And when a vector is alive across CallLeaf, with the help of existing > FatProjectionNode and RA, the whole vector register ( length > 64-bit ) > is still spilled to stack as usual. > > A test case using VectorAPI is written to verify: > http://cr.openjdk.java.net/~jzhu/8253048/TestVector.java > Test patch: > http://cr.openjdk.java.net/~jzhu/8253048/patch > OptoAssembly dump: > http://cr.openjdk.java.net/~jzhu/8253048/TestVector_OptoAssembly > > I also searched all occurrences of "V8-V15" in aarch64 codes. > The stubs for sin/cos don't save v10 before usage. > Therefore I replace it with caller-save register v24. > > Jtreg Testing: hotspot_all_no_apps, jdk_core and langtools:tier1 > > Could you please help review this change? > > Best Regards, > Joshua > > --------- This pull request has now been integrated. 
Changeset: ba174af3 Author: Joshua Zhu URL: https://git.openjdk.java.net/jdk/commit/ba174af3 Stats: 22 lines in 2 files changed: 0 ins; 0 del; 22 mod 8253048: AArch64: When CallLeaf, no need to preserve callee-saved registers in caller Reviewed-by: adinn, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/129 From thartmann at openjdk.java.net Thu Sep 24 05:54:00 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 24 Sep 2020 05:54:00 GMT Subject: RFR: 8253524: C2: Refactor code that clones predicates during loop unswitching [v2] In-Reply-To: References: Message-ID: On Wed, 23 Sep 2020 12:14:01 GMT, Roland Westrelin wrote: >> During unswitching, PhaseIdealLoop::create_slow_version_of_loop() >> calls PhaseIdealLoop::clone_predicates_to_unswitched_loop() twice, one >> for each loops, to clone some predicates above each loop. That code is >> fragile as it (implicitly) requires the fast loop to be processed >> first. I propose calling >> PhaseIdealLoop::clone_predicates_to_unswitched_loop() a single time >> and have it handle both loops in a single pass. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > code style Looks good. src/hotspot/share/opto/loopPredicate.cpp line 298: > 296: // Clone loop predicates to cloned loops when unswitching a loop. > 297: void PhaseIdealLoop::clone_predicates_to_unswitched_loop(IdealLoopTree* loop, const Node_List& old_new, ProjNode*& > iffast, ProjNode*& ifslow) { 298: LoopNode* head = loop->_head->as_Loop(); Extra whitespace after `head` ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/317 From eosterlund at openjdk.java.net Thu Sep 24 06:41:43 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 24 Sep 2020 06:41:43 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8X Gxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com> <9tn1QTeKWk7DB7fWgyw7O_PSuN5Y7nTUmvvZSuuWqHM=.e9a7dbbe-c8ec-40eb-aee4-74e08892db08@github.com> Message-ID: On Wed, 23 Sep 2020 22:06:33 GMT, John R Rose wrote: >> I'll let @chumer answer that question. >> >> However, I have another one of my own. As far as I can see, the only use of `java.lang.LiveStackFrame` and >> `java.lang.LiveStackFrameInfo` in the JDK code base are in the >> [LocalsAndOperands](https://github.com/openjdk/jdk/blob/6bab0f539fba8fb441697846347597b4a0ade428/test/jdk/java/lang/StackWalker/LocalsAndOperands.java) >> test where they are used via reflection. Do you know if there are/were plans to make these classes public? > > The special case we support in the `StackWalker` API is intentionally limited, because a thread examining its own stack > is the least risky and most performant scenario. The `StackWalker::walk` API point, in particular, is carefully > designed so that its internal implementation can internally use unsafe "dangling" pointers from the thread into its own > stack. This reduces copying and buffering, which is obviously the least expensive way to "take a quick peek" at what's > on the stack. It is reasonable to ask to extend such functionality to a second, uncooperative thread, but this brings > in lots of extra baggage: > - How does the requesting thread get permission to look inside the target thread? (New security analysis.) > - At what point does the target thread get its state taken as a snapshot? 
Any random moment? > - How is the target thread "held still" while it is being sampled? (And then, "Where is this term 'safepoint' defined in > the JVM specifications?") > - Can a target thread refuse or defer the request, to defend some particular encapsulation? > - How is that state stored, and what are the time and space costs for such storage? > - What happens if the requesting thread just wants to look at a few bits? Do we still buffer up a whole backtrace? > - Or, is the target thread required to execute callbacks provided by the requesting thread, with a temporary view, and if > so, that limits are there on such callbacks? > - Can the observation process ever cause the target thread to fail, or will any and all failures (OOME, SOE, etc.) be > attributed to the requesting thread? > - What happens if the requesting thread makes two requests in a row: Are there any guarantees about relations between > the two sets of results? (To be fair, this is also an issue with the self-walking case.) > - What happens if the requesting thread asks to change a value in a frame or pop or re-invoke or replace a frame? (Not > allowed in the self-walking case either, but a plausible extension.) > > If only "just adding a thread parameter" were a straightforward extension? Instead, we have serious user model issues > (see above), and serious implementation issues (see the PR). > I think we could perhaps add cross-thread access to the current `StackWalker` API, if we came up with answers to the > above. I think, in order to engineer it correctly, we would want to factor it as the composition of a self-walking > request, *plus* a cross-call mechanism which would allow one thread to ask another thread to run a function. Jumbling > these complex operations together into a big pile of new code would be the wrong way to do it. The self-walking API is > pretty well understood, and there is a good literature on cross-call mechanisms too. Let's break the problem up. 
> BTW, the current `StackWalker` API could certainly accept minor extensions to inspect locals, and/or to perform frame > replacement, as hinted above. The JVM currently benefits from performing on-stack replacement when it can tell that a > slow loop is worth (re-)optimizing as a fast loop. There's no reason the JDK libraries (say, the streams runtime, in > particular) shouldn't have a shot at doing something similar. That would require internal JDK hooks self-inspect and > replace loops with improved "customizations", on the fly. > > All of the above comments apply only to what might be called the self-inspecting, self-reflective, or "introspective" > modes of stack walking. Debuggers usually don't do this (except in one-world environments like Lisp and SmallTalk), > but rather operate from the side, through a privileged channel "under the virtual metal" like JVMTI. I suppose for > those use cases, JVMTI is plenty good. If there is some trick for self-attachment (either direct or through a > conspirator process), then some introspection is also possible, via JVMTI. For best performance, a more "one world" > implementation is desirable, but this implies that we create a whole category of "debugging/monitoring code". Such > debugging/monitoring code would (like today's runtime internals like those that use `Unsafe`) have privileges beyond > regular application code. It might also have eBPF-like limitations on resource usage, so that its executions could be > hidden "under the metal" of regular executions. IMO these are promising ideas. They might help us define a better, > more cooperative debugging/monitoring primitives. I raise the ideas here because I think there may be a root issue > here: How can we use the JDK's on-line introspection APIs for more purposes? How can we inject privileged monitoring > code into Java executions? 
Adding yet another stack walking mechanism to the JVM seems to me like an inefficient way > to move, a little bit, in the direction of cooperative debugging/monitoring facilities in the JDK. Conversely, if we > can create a way to do (privileged) cross-calls, then we won't need yet another stack walking mechanism. I guess this > is where I end up: Please consider refactoring this into an extension (if any is needed) to the self-inspection API > (`StackWalker`) and something a cross-call API. Then we should consider hooking it up to JVMCI. John there is a lot to be said here in the solution domain. But before we get there, I want to get answers about the problem domain, so I know if we are solving a real or imaginary problem. The crucial question it boils down to is: "is remote thread stack sampling with locals needed in the non-debugger case"? If so, we can start discussing the solution domain of that. But I suspect we already have all the APIs in place that are needed. ------------- PR: https://git.openjdk.java.net/jdk/pull/110 From roland at openjdk.java.net Thu Sep 24 06:49:37 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 24 Sep 2020 06:49:37 GMT Subject: RFR: 8253524: C2: Refactor code that clones predicates during loop unswitching [v2] In-Reply-To: References: Message-ID: On Wed, 23 Sep 2020 13:00:40 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> code style > > Looks good to me! 
@chhagedorn @vnkozlov @TobiHartmann thanks for the review ------------- PR: https://git.openjdk.java.net/jdk/pull/317 From roland at openjdk.java.net Thu Sep 24 06:49:36 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 24 Sep 2020 06:49:36 GMT Subject: RFR: 8253524: C2: Refactor code that clones predicates during loop unswitching [v3] In-Reply-To: References: Message-ID: > During unswitching, PhaseIdealLoop::create_slow_version_of_loop() > calls PhaseIdealLoop::clone_predicates_to_unswitched_loop() twice, one > for each loops, to clone some predicates above each loop. That code is > fragile as it (implicitly) requires the fast loop to be processed > first. I propose calling > PhaseIdealLoop::clone_predicates_to_unswitched_loop() a single time > and have it handle both loops in a single pass. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: extra space ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/317/files - new: https://git.openjdk.java.net/jdk/pull/317/files/5ab947eb..93f33e9b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=317&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=317&range=01-02 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/317.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/317/head:pull/317 PR: https://git.openjdk.java.net/jdk/pull/317 From roland at openjdk.java.net Thu Sep 24 06:49:42 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 24 Sep 2020 06:49:42 GMT Subject: RFR: 8253524: C2: Refactor code that clones predicates during loop unswitching [v2] In-Reply-To: References: Message-ID: On Thu, 24 Sep 2020 05:49:53 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> code style > > src/hotspot/share/opto/loopPredicate.cpp line 298: > >> 
296: // Clone loop predicates to cloned loops when unswitching a loop.
>> 297: void PhaseIdealLoop::clone_predicates_to_unswitched_loop(IdealLoopTree* loop, const Node_List& old_new, ProjNode*&
>> iffast, ProjNode*& ifslow) { 298: LoopNode* head = loop->_head->as_Loop();
>
> Extra whitespace after `head`

I'll remove the white space before I integrate the change.

-------------

PR: https://git.openjdk.java.net/jdk/pull/317

From roland at openjdk.java.net Thu Sep 24 06:58:17 2020
From: roland at openjdk.java.net (Roland Westrelin)
Date: Thu, 24 Sep 2020 06:58:17 GMT
Subject: RFR: 8223051: support loops with long (64b) trip counts
Message-ID:

Last webrev was: https://cr.openjdk.java.net/~roland/8223051/webrev.03/

This PR includes a few minor changes:
- The change in callnode.cpp that Vladimir requested in: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039551.html
- Extra comments that John requested in: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039621.html
- A couple extra counters to collect more detailed statistics

8252696 (Loop unswitching may cause out of bound array load to be executed) was the only bug that was uncovered by extended testing and it's fixed now.

This was previously reviewed by Tobias, Vladimir and John. Given the last changes were either requested by reviewers or a straightforward improvement to statistics, and unless someone objects, I intend to push this in the next few days with the reviewer list I just mentioned.
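
For readers unfamiliar with the term, a loop with a long (64-bit) trip count is one whose induction variable and bounds are `long` rather than `int`. The following is only an illustrative sketch of that loop shape: the class and method names are made up, and whether C2 treats a given loop as counted depends on the JDK build and flags in use.

```java
// Illustrative only: a loop whose induction variable and bound are 64-bit.
final class LongCountedLoopSketch {

    // Sums every value i in [start, limit) reached by stepping `stride` at a time.
    // Assumes stride > 0 and that limit + stride does not overflow a long.
    static long sum(long start, long limit, long stride) {
        long total = 0;
        for (long i = start; i < limit; i += stride) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        // The range below starts at Integer.MAX_VALUE, so an int index could not cover it.
        long start = Integer.MAX_VALUE;
        System.out.println(sum(start, start + 4L, 1L));
    }
}
```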
------------- Commit messages: - trailing whitespaces - long counted loops Changes: https://git.openjdk.java.net/jdk/pull/318/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=318&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8223051 Stats: 902 lines in 11 files changed: 823 ins; 63 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/318.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/318/head:pull/318 PR: https://git.openjdk.java.net/jdk/pull/318 From tobias.hartmann at oracle.com Thu Sep 24 06:59:11 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 24 Sep 2020 08:59:11 +0200 Subject: C2 does not elide the zeroing of the array in String.repeat() In-Reply-To: <121011600695117@mail.yandex.ru> References: <121011600695117@mail.yandex.ru> Message-ID: Hi Sergey, thanks for the report. Adding some background: Array fills are supposed to be optimized by C2's -XX:+OptimizeFill optimization, which was fixed recently by JDK-8247307 [1]. But as it turned out, that optimization does not generate code as efficient as normal loop unrolling + vectorization (Superword). It is therefore currently disabled by default. Unfortunately, neither Superword nor OptimizeFill currently supports zeroing elimination of a newly allocated array. I've filed JDK-8253577 [2] to keep track of this but it might take a while until someone has time to fix it. Best regards, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8247307 [2] https://bugs.openjdk.java.net/browse/JDK-8253577 On 21.09.20 15:57, Sergey Tsypanov wrote: > Hello, > > as it appears from https://shipilev.net/blog/2016/arrays-wisdom-ancients/ C2 sometimes can eliminate > zeroing of newly allocated array (particularly in ArrayList.toArray(T[])).
> > However in case of String.repeat() the VM does not elide the zeroing of the array even when the > repeated String is represented with 1 byte: > > if (len == 1) { > final byte[] single = new byte[count]; > Arrays.fill(single, value[0]); > return new String(single, coder); > } > > Here we are sure that the array is localized and for sure will be completely filled, however zeroing is present. > > When I run the benchmark [1] with fresh-built JDK it gives > > (length) Mode Cnt Score Error Units > repeatOneByteString 8 avgt 50 14.020 ± 1.928 ns/op > repeatOneByteString 64 avgt 50 24.618 ± 2.712 ns/op > repeatOneByteString 128 avgt 50 36.555 ± 1.394 ns/op > repeatOneByteString 1024 avgt 50 134.731 ± 7.022 ns/op > > then if in String.repeat() I replace > > final byte[] single = new byte[count]; > > with > > final byte[] single = StringConcatHelper.newArray(count); > > where StringConcatHelper.newArray(int) delegates directly to UNSAFE.allocateUninitializedArray(Class, int), > the same benchmark demonstrates good improvement: > > (length) Mode Cnt Score Error Units > repeatOneByteString 8 avgt 50 12.545 ± 0.164 ns/op > repeatOneByteString 64 avgt 50 18.393 ± 0.686 ns/op > repeatOneByteString 128 avgt 50 25.550 ± 0.378 ns/op > repeatOneByteString 1024 avgt 50 90.454 ± 1.015 ns/op > > So the question is whether there's an issue in C2 (and whether it is fixable) or not? > > Originally the question appeared in https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-September/068641.html > > Cheers, > Sergey Tsypanov > > 1.
> > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"}) > public class MiscStringBenchmark { > > @Benchmark > public String repeatOneByteString(Data data) { > return data.oneByteString.repeat(data.length); > } > > @State(Scope.Thread) > public static class Data { > @Param({"8", "64", "128", "1024"}) > private int length; > private final String oneByteString = "a"; > > } > } > From github.com+8792647+robcasloz at openjdk.java.net Thu Sep 24 07:29:10 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Thu, 24 Sep 2020 07:29:10 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v5] In-Reply-To: References: Message-ID: > Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each > main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the > seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as > production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options. 
Roberto Castaneda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Generate random seed if 'StressSeed' is unset - Use generic swap() for shuffling ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/242/files - new: https://git.openjdk.java.net/jdk/pull/242/files/829b1a1a..91812e58 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=03-04 Stats: 25 lines in 7 files changed: 2 ins; 14 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/242.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/242/head:pull/242 PR: https://git.openjdk.java.net/jdk/pull/242 From github.com+8792647+robcasloz at openjdk.java.net Thu Sep 24 07:29:10 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Thu, 24 Sep 2020 07:29:10 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v3] In-Reply-To: <_0q3ksgDD2Wpu-a_aCt1zVOn6s7057srj3pPtc-Ro0I=.f3cafd74-a522-40b9-bb80-7006695ae750@github.com> References: <_0q3ksgDD2Wpu-a_aCt1zVOn6s7057srj3pPtc-Ro0I=.f3cafd74-a522-40b9-bb80-7006695ae750@github.com> Message-ID: <6JkhzWnQBnn0GqYehzZPloB1f-jY1__q3deefS8TKsI=.15a00027-ddd9-47e6-878c-d0ef317640c3@github.com> On Wed, 23 Sep 2020 15:18:34 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/opto/c2_globals.hpp line 55: >> >>> 53: "Randomize worklist traversal in IGVN") \ >>> 54: \ >>> 55: product(bool, GenerateStressSeed, false, DIAGNOSTIC, \ >> >> Is this flag really required? We could simply generate the seed if StressSeed has not been specified on the command >> line (see `FLAG_IS_DEFAULT` macro). > > Agree. Done. I didn't know about `FLAG_IS_DEFAULT`, thanks! 
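To make the resulting behaviour concrete, here is a small Java sketch of the pattern discussed above: use the user-supplied seed when one was given, otherwise generate one, then shuffle the worklist with a seeded generator so a run can be reproduced from the logged seed. This is only an analogy, not the HotSpot code: the class and method names are hypothetical, `OptionalLong` stands in for the `FLAG_IS_DEFAULT` check, and the Fisher-Yates loop is my assumption of a reasonable shuffle rather than a copy of what the patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;
import java.util.Random;

final class StressShuffleSketch {
    // Stand-in for "StressSeed was not set on the command line":
    // an empty OptionalLong means a fresh seed has to be generated.
    static long resolveSeed(OptionalLong userSeed) {
        return userSeed.isPresent() ? userSeed.getAsLong() : new Random().nextLong();
    }

    // Returns a shuffled copy of the worklist; the same seed always yields the same order.
    static <T> List<T> shuffled(List<T> worklist, long seed) {
        List<T> copy = new ArrayList<>(worklist);
        Random rng = new Random(seed);
        // Fisher-Yates: swap each position with a random position at or below it.
        for (int i = copy.size() - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            T tmp = copy.get(i);
            copy.set(i, copy.get(j));
            copy.set(j, tmp);
        }
        return copy;
    }
}
```

Whichever way the seed was obtained, logging it is what lets a failing stress run be replayed with the same worklist order.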
------------- PR: https://git.openjdk.java.net/jdk/pull/242 From github.com+8792647+robcasloz at openjdk.java.net Thu Sep 24 07:29:11 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Thu, 24 Sep 2020 07:29:11 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v3] In-Reply-To: References: Message-ID: <3d2t_mjGq-CxIDJFUH04qpWfL4C3vG04SUrnOQj31nY=.f0ec9f3a-6cc2-437a-80ee-2b21e7dfb740@github.com> On Wed, 23 Sep 2020 11:39:52 GMT, Tobias Hartmann wrote: >> Roberto Castaneda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Define 'StressSeed' option as 'uint' rather than 'uintx' > > src/hotspot/share/opto/node.cpp line 2333: > >> 2331: >> 2332: //----------------------------------------------------------------------------- >> 2333: void Node_Array::swap(uint i, uint j) { > > You can use the swap method from globalDefinitions.hpp. Done, thanks! I also moved swapping to `PhaseIterGVN` to simplify the PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/242 From github.com+8792647+robcasloz at openjdk.java.net Thu Sep 24 07:38:06 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Thu, 24 Sep 2020 07:38:06 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v6] In-Reply-To: References: Message-ID: > Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each > main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the > seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as > production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options. 
Roberto Castaneda Lozano has updated the pull request incrementally with one additional commit since the last revision: Empty commit to trigger jcheck after updating GitHub user name ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/242/files - new: https://git.openjdk.java.net/jdk/pull/242/files/91812e58..4ae51be7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=04-05 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/242.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/242/head:pull/242 PR: https://git.openjdk.java.net/jdk/pull/242 From github.com+8792647+robcasloz at openjdk.java.net Thu Sep 24 07:42:44 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Thu, 24 Sep 2020 07:42:44 GMT Subject: RFR: 8252583: Clean up unused phi-to-copy degradation mechanism [v5] In-Reply-To: References: Message-ID: > Remove unused notion of "PhiNode-to-copy degradation", where PhiNodes can be degraded to copies by setting their > RegionNode to NULL. Remove corresponding `PhiNode::is_copy()` test, which always returned NULL (false). Assert that > PhiNodes have an associated RegionNode in `PhiNode::Ideal()`. 
Roberto Castaneda Lozano has updated the pull request incrementally with one additional commit since the last revision: Empty commit to trigger jcheck after updating GitHub user name ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/275/files - new: https://git.openjdk.java.net/jdk/pull/275/files/91f30bae..b7656e1f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=275&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=275&range=03-04 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/275.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/275/head:pull/275 PR: https://git.openjdk.java.net/jdk/pull/275 From github.com+8792647+robcasloz at openjdk.java.net Thu Sep 24 07:44:14 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Thu, 24 Sep 2020 07:44:14 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v3] In-Reply-To: <9Tp1dmoKbJKhlAjMhxiBujaWUU0GyrazvEXFAGXLAFY=.e0f02e68-72a0-4c1b-ba32-4c359c678138@github.com> References: <0SnXiN756cLrgglHnxd7OuYOWtgljtB8yCEEsVI6YzA=.7da4aeca-af80-4ee7-a642-3606a0f4557a@github.com> <9Tp1dmoKbJKhlAjMhxiBujaWUU0GyrazvEXFAGXLAFY=.e0f02e68-72a0-4c1b-ba32-4c359c678138@github.com> Message-ID: On Wed, 23 Sep 2020 14:14:53 GMT, Roberto Castaneda Lozano wrote: >> Maybe you could add an additional HelloWorld test which only runs with your new flags to sanity check them without any >> other flags. > >> Maybe you could add an additional HelloWorld test which only runs with your new flags to sanity check them without any >> other flags. > > Good idea, I just did that. Add 'StressIGVN' option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each main run of the IGVN loop. Also add 'StressSeed=N' option to specify the seed. If the seed is not specified, a random one is generated. In either case, the seed is logged if 'LogCompilation' is enabled. 
The new options are declared as production+diagnostic for consistency with the existing 'StressLCM' and 'StressGCM' options. ------------- PR: https://git.openjdk.java.net/jdk/pull/242 From github.com+4676506+javeleon at openjdk.java.net Thu Sep 24 08:04:52 2020 From: github.com+4676506+javeleon at openjdk.java.net (Allan Gregersen) Date: Thu, 24 Sep 2020 08:04:52 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8X Gxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com> <9tn1QTeKWk7DB7fWgyw7O_PSuN5Y7nTUmvvZSuuWqHM=.e9a7dbbe-c8ec-40eb-aee4-74e08892db08@github.com> Message-ID: On Thu, 24 Sep 2020 06:38:10 GMT, Erik Österlund wrote: >> The special case we support in the `StackWalker` API is intentionally limited, because a thread examining its own stack >> is the least risky and most performant scenario. The `StackWalker::walk` API point, in particular, is carefully >> designed so that its internal implementation can internally use unsafe "dangling" pointers from the thread into its own >> stack. This reduces copying and buffering, which is obviously the least expensive way to "take a quick peek" at what's >> on the stack. It is reasonable to ask to extend such functionality to a second, uncooperative thread, but this brings >> in lots of extra baggage: >> - How does the requesting thread get permission to look inside the target thread? (New security analysis.) >> - At what point does the target thread get its state taken as a snapshot? Any random moment? >> - How is the target thread "held still" while it is being sampled? (And then, "Where is this term 'safepoint' defined in >> the JVM specifications?") >> - Can a target thread refuse or defer the request, to defend some particular encapsulation?
>> - How is that state stored, and what are the time and space costs for such storage? >> - What happens if the requesting thread just wants to look at a few bits? Do we still buffer up a whole backtrace? >> - Or, is the target thread required to execute callbacks provided by the requesting thread, with a temporary view, and if >> so, that limits are there on such callbacks? >> - Can the observation process ever cause the target thread to fail, or will any and all failures (OOME, SOE, etc.) be >> attributed to the requesting thread? >> - What happens if the requesting thread makes two requests in a row: Are there any guarantees about relations between >> the two sets of results? (To be fair, this is also an issue with the self-walking case.) >> - What happens if the requesting thread asks to change a value in a frame or pop or re-invoke or replace a frame? (Not >> allowed in the self-walking case either, but a plausible extension.) >> >> If only "just adding a thread parameter" were a straightforward extension? Instead, we have serious user model issues >> (see above), and serious implementation issues (see the PR). >> I think we could perhaps add cross-thread access to the current `StackWalker` API, if we came up with answers to the >> above. I think, in order to engineer it correctly, we would want to factor it as the composition of a self-walking >> request, *plus* a cross-call mechanism which would allow one thread to ask another thread to run a function. Jumbling >> these complex operations together into a big pile of new code would be the wrong way to do it. The self-walking API is >> pretty well understood, and there is a good literature on cross-call mechanisms too. Let's break the problem up. >> BTW, the current `StackWalker` API could certainly accept minor extensions to inspect locals, and/or to perform frame >> replacement, as hinted above. 
The JVM currently benefits from performing on-stack replacement when it can tell that a >> slow loop is worth (re-)optimizing as a fast loop. There's no reason the JDK libraries (say, the streams runtime, in >> particular) shouldn't have a shot at doing something similar. That would require internal JDK hooks self-inspect and >> replace loops with improved "customizations", on the fly. >> >> All of the above comments apply only to what might be called the self-inspecting, self-reflective, or "introspective" >> modes of stack walking. Debuggers usually don't do this (except in one-world environments like Lisp and SmallTalk), >> but rather operate from the side, through a privileged channel "under the virtual metal" like JVMTI. I suppose for >> those use cases, JVMTI is plenty good. If there is some trick for self-attachment (either direct or through a >> conspirator process), then some introspection is also possible, via JVMTI. For best performance, a more "one world" >> implementation is desirable, but this implies that we create a whole category of "debugging/monitoring code". Such >> debugging/monitoring code would (like today's runtime internals like those that use `Unsafe`) have privileges beyond >> regular application code. It might also have eBPF-like limitations on resource usage, so that its executions could be >> hidden "under the metal" of regular executions. IMO these are promising ideas. They might help us define a better, >> more cooperative debugging/monitoring primitives. I raise the ideas here because I think there may be a root issue >> here: How can we use the JDK's on-line introspection APIs for more purposes? How can we inject privileged monitoring >> code into Java executions? Adding yet another stack walking mechanism to the JVM seems to me like an inefficient way >> to move, a little bit, in the direction of cooperative debugging/monitoring facilities in the JDK. 
Conversely, if we >> can create a way to do (privileged) cross-calls, then we won't need yet another stack walking mechanism. I guess this >> is where I end up: Please consider refactoring this into an extension (if any is needed) to the self-inspection API >> (`StackWalker`) and something a cross-call API. Then we should consider hooking it up to JVMCI. > > John there is a lot to be said here in the solution domain. But before we get there, I want to get answers about the > problem domain, so I know if we are solving a real or imaginary problem. The crucial question it boils down to is: "is > remote thread stack sampling with locals needed in the non-debugger case"? If so, we can start discussing the solution > domain of that. But I suspect we already have all the APIs in place that are needed. A lot of good comments here. Thanks! I agree that we should look at the problem domain first. Hence, let's look at the use-cases that was brought up once more. 1. The Truffle debugger (not to be confused with a Java host debugger) in general needs to access all live stack traces for all guest-language threads and read/write access to local variables. 2. Some Truffle guest languages need to access all live objects (e.g. Ruby). Espresso also needs this for implementing e.g. getReferringObjects (through the debugger though) etc. Onto discussing potential solutions: After discussing the capabilities of JVMTI internally, it seems that the current implementation of getting locals might not be able to return anything for escape-analyzed objects. This obviously poses a serious limitation for Truffle given the fact that this is a corner-stone in GraalVM. I take it this played a big role in introducing iterateFrames (current thread only) in JVMCI a while back. Even for the debugger case we would need JVMTI to guarantee the following: 1. A safe suspension mechanism for all target threads (what if two debuggers are connected at the same time?) 
that would span the entirety of getAllStackTraces + fetching all locals for all frames. I don't see how something like suspendThreadList would provide the safe guards that we would need here. 2. A bulk getAllLocals to avoid fetching all stack traces + retrieving all locals in bulk for all frames. Given that Truffle/GraalVM will continue to support Java 8 for quite some time, this new API should probably be exposed through JVMCI and backported like it was done for iterateFrames regardless of the underlying implementation. Note: In the current PR we need to refactor materializeVirtualObjects into using VM operations to guarantee that we run stack walking and sanity checks for locating the frame in question at a safe point. I'll hold my horses a bit on that until there is a consensus on where this is going. ------------- PR: https://git.openjdk.java.net/jdk/pull/110 From github.com+8792647+robcasloz at openjdk.java.net Thu Sep 24 08:14:26 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Thu, 24 Sep 2020 08:14:26 GMT Subject: Integrated: 8252583: Clean up unused phi-to-copy degradation mechanism In-Reply-To: References: Message-ID: On Mon, 21 Sep 2020 06:27:41 GMT, Roberto Castaneda Lozano wrote: > Remove unused notion of "PhiNode-to-copy degradation", where PhiNodes can be degraded to copies by setting their > RegionNode to NULL. Remove corresponding `PhiNode::is_copy()` test, which always returned NULL (false). Assert that > PhiNodes have an associated RegionNode in `PhiNode::Ideal()`. This pull request has now been integrated. Changeset: f3ea0d36 Author: Roberto Castaneda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/f3ea0d36 Stats: 30 lines in 7 files changed: 1 ins; 20 del; 9 mod 8252583: Clean up unused phi-to-copy degradation mechanism Remove unused notion of 'PhiNode-to-copy degradation', where PhiNodes can be degraded to copies by setting their RegionNode to NULL. 
Remove corresponding PhiNode::is_copy() test, which always returned NULL (false). Assert that PhiNodes have an associated RegionNode in PhiNode::Ideal(). Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/275 From vlivanov at openjdk.java.net Thu Sep 24 08:33:15 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 24 Sep 2020 08:33:15 GMT Subject: RFR: 8223051: support loops with long (64b) trip counts In-Reply-To: References: Message-ID: On Wed, 23 Sep 2020 09:08:59 GMT, Roland Westrelin wrote: > Last webrev was: > > https://cr.openjdk.java.net/~roland/8223051/webrev.03/ > > This PR includes a few minor changes: > > - The change in callnode.cpp that Vladimir requested in: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039551.html > > - Extra comments that John requested in: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039621.html > > - A couple extra counters to collect more detailed statistics > > 8252696 (Loop unswitching may cause out of bound array load to be > executed) was the only bug that was uncovered by extended testing and > it's fixed now. > > This was previously reviewed by Tobias, Vladimir and John. Given the > last changes were either requested by reviewers or a straighforward > improvement to statistics, and unless someone objects, I intend to > push this in the next few days with the reviewer list I just > mentioned. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/318 From roland at openjdk.java.net Thu Sep 24 08:35:42 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 24 Sep 2020 08:35:42 GMT Subject: RFR: 8253524: C2: Refactor code that clones predicates during loop unswitching [v4] In-Reply-To: References: Message-ID: <0_69sUIzecztKagp1SCJAOuC-2CHaBR7NlajMu17HcE=.8ea28142-8fa1-48c0-8db8-a675e23d62a6@github.com> > During unswitching, PhaseIdealLoop::create_slow_version_of_loop() > calls PhaseIdealLoop::clone_predicates_to_unswitched_loop() twice, one > for each loops, to clone some predicates above each loop. That code is > fragile as it (implicitly) requires the fast loop to be processed > first. I propose calling > PhaseIdealLoop::clone_predicates_to_unswitched_loop() a single time > and have it handle both loops in a single pass. Roland Westrelin has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: extra space ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/317/files - new: https://git.openjdk.java.net/jdk/pull/317/files/93f33e9b..33fb2fe2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=317&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=317&range=02-03 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/317.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/317/head:pull/317 PR: https://git.openjdk.java.net/jdk/pull/317 From thartmann at openjdk.java.net Thu Sep 24 08:40:30 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 24 Sep 2020 08:40:30 GMT Subject: RFR: 8223051: support loops with long (64b) trip counts In-Reply-To: References: Message-ID: <_jbO4YMbCC5NzOwyuWkNwJ9h0gNzo7l4cgLR9AspaAg=.06097f6c-7368-4788-9cd4-4b29f4399aea@github.com> On Wed, 23 Sep 2020 09:08:59 GMT, Roland Westrelin wrote: > Last webrev was: > > https://cr.openjdk.java.net/~roland/8223051/webrev.03/ > > This PR includes a few minor changes: > > - The change in callnode.cpp that Vladimir requested in: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039551.html > > - Extra comments that John requested in: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039621.html > > - A couple extra counters to collect more detailed statistics > > 8252696 (Loop unswitching may cause out of bound array load to be > executed) was the only bug that was uncovered by extended testing and > it's fixed now. > > This was previously reviewed by Tobias, Vladimir and John. Given the > last changes were either requested by reviewers or a straighforward > improvement to statistics, and unless someone objects, I intend to > push this in the next few days with the reviewer list I just > mentioned. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/318 From chagedorn at openjdk.java.net Thu Sep 24 09:13:58 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 24 Sep 2020 09:13:58 GMT Subject: RFR: 8253524: C2: Refactor code that clones predicates during loop unswitching [v4] In-Reply-To: <0_69sUIzecztKagp1SCJAOuC-2CHaBR7NlajMu17HcE=.8ea28142-8fa1-48c0-8db8-a675e23d62a6@github.com> References: <0_69sUIzecztKagp1SCJAOuC-2CHaBR7NlajMu17HcE=.8ea28142-8fa1-48c0-8db8-a675e23d62a6@github.com> Message-ID: On Thu, 24 Sep 2020 08:35:42 GMT, Roland Westrelin wrote: >> During unswitching, PhaseIdealLoop::create_slow_version_of_loop() >> calls PhaseIdealLoop::clone_predicates_to_unswitched_loop() twice, one >> for each loops, to clone some predicates above each loop. That code is >> fragile as it (implicitly) requires the fast loop to be processed >> first. I propose calling >> PhaseIdealLoop::clone_predicates_to_unswitched_loop() a single time >> and have it handle both loops in a single pass. > > Roland Westrelin has refreshed the contents of this pull request, and previous commits have been removed. The > incremental views will show differences compared to the previous content of the PR. Marked as reviewed by chagedorn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/317 From roland at openjdk.java.net Thu Sep 24 10:02:57 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 24 Sep 2020 10:02:57 GMT Subject: Integrated: 8253524: C2: Refactor code that clones predicates during loop unswitching In-Reply-To: References: Message-ID: On Wed, 23 Sep 2020 08:30:45 GMT, Roland Westrelin wrote: > During unswitching, PhaseIdealLoop::create_slow_version_of_loop() > calls PhaseIdealLoop::clone_predicates_to_unswitched_loop() twice, one > for each loops, to clone some predicates above each loop. That code is > fragile as it (implicitly) requires the fast loop to be processed > first. 
I propose calling > PhaseIdealLoop::clone_predicates_to_unswitched_loop() a single time > and have it handle both loops in a single pass. This pull request has now been integrated. Changeset: b1e2f026 Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/b1e2f026 Stats: 113 lines in 3 files changed: 20 ins; 49 del; 44 mod 8253524: C2: Refactor code that clones predicates during loop unswitching Reviewed-by: chagedorn, kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/317 From github.com+4676506+javeleon at openjdk.java.net Thu Sep 24 10:42:17 2020 From: github.com+4676506+javeleon at openjdk.java.net (Allan Gregersen) Date: Thu, 24 Sep 2020 10:42:17 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8X Gxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com> <9tn1QTeKWk7DB7fWgyw7O_PSuN5Y7nTUmvvZSuuWqHM=.e9a7dbbe-c8ec-40eb-aee4-74e08892db08@github.com> Message-ID: On Thu, 24 Sep 2020 08:02:13 GMT, Allan Gregersen wrote: >> John there is a lot to be said here in the solution domain. But before we get there, I want to get answers about the >> problem domain, so I know if we are solving a real or imaginary problem. The crucial question it boils down to is: "is >> remote thread stack sampling with locals needed in the non-debugger case"? If so, we can start discussing the solution >> domain of that. But I suspect we already have all the APIs in place that are needed. > > A lot of good comments here. Thanks! > > I agree that we should look at the problem domain first. Hence, let's look at the use-cases that was brought up once > more. > 1. The Truffle debugger (not to be confused with a Java host debugger) in general needs to access all live stack traces > for all guest-language threads and read/write access to local variables. 
> 2. Some Truffle guest languages need to access all live objects (e.g. Ruby). Espresso also needs this for implementing > e.g. getReferringObjects (through the debugger though) etc. > Onto discussing potential solutions: > > After discussing the capabilities of JVMTI internally, it seems that the current implementation of getting locals might > not be able to return anything for escape-analyzed objects. This obviously poses a serious limitation for Truffle given > the fact that this is a corner-stone in GraalVM. I take it this played a big role in introducing iterateFrames (current > thread only) in JVMCI a while back. Even for the debugger case we would need JVMTI to guarantee the following: 1. A > safe suspension mechanism for all target threads (what if two debuggers are connected at the same time?) that would > span the entirety of getAllStackTraces + fetching all locals for all frames. I don't see how something like > suspendThreadList would provide the safe guards that we would need here. 2. A bulk getAllLocals to avoid fetching all > stack traces + retrieving all locals in bulk for all frames. Given that Truffle/GraalVM will continue to support Java > 8 for quite some time, this new API should probably be exposed through JVMCI and backported like it was done for > iterateFrames regardless of the underlying implementation. Note: In the current PR we need to refactor > materializeVirtualObjects into using VM operations to guarantee that we run stack walking and sanity checks for > locating the frame in question at a safe point. I'll hold my horses a bit on that until there is a consensus on where > this is going. It's important to note that this PR is about making changes to JVMCI, which is an internal API and not part of the JavaSE API. With that framing let's address the questions that came up: - Is remote thread stack sampling with locals needed in the non-debugger case? Yes (see previous comment). What existing APIs are in place for this?
- How does the requesting thread get permission to look inside the target thread? (New security analysis.) No permission needed - JVMCI is a privileged API that has "boot loader" access. - At what point does the target thread get its state taken as a snapshot? Any random moment? The snapshot of threads is taken at a safepoint (much like Thread.getAllStackTraces(), which does not mention safepoints either). - How is the target thread "held still" while it is being sampled? (And then, "Where is this term 'safepoint' defined in the JVM specifications?") See above. - Can a target thread refuse or defer the request, to defend some particular encapsulation? No - see above. - How is that state stored, and what are the time and space costs for such storage? The state is stored within [HotSpotStackFrameReference](https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotStackFrameReference.java) instances. It's about the same time and space cost as java.lang.LiveStackFrameInfo. - What happens if the requesting thread just wants to look at a few bits? Do we still buffer up a whole backtrace? In the suggested API you can 1) limit the number of frames to capture, 2) skip the first N top frames, 3) pass an array of methods such that any frames seen on the stack before those are not included and 4) pass an array of methods controlling the exact methods to include in the result. - Or, is the target thread required to execute callbacks provided by the requesting thread, with a temporary view, and if so, what limits are there on such callbacks? No callbacks. - Can the observation process ever cause the target thread to fail, or will any and all failures (OOME, SOE, etc.) be attributed to the requesting thread? No. All failures will be attributed to the requesting thread.
- What happens if the requesting thread makes two requests in a row: Are there any guarantees about relations between the two sets of results? (To be fair, this is also an issue with the self-walking case.) No guarantees. - What happens if the requesting thread asks to change a value in a frame or pop or re-invoke or replace a frame? (Not allowed in the self-walking case either, but a plausible extension.) No support for this. The snapshot is read-only. ------------- PR: https://git.openjdk.java.net/jdk/pull/110 From github.com+8792647+robcasloz at openjdk.java.net Thu Sep 24 11:15:14 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Thu, 24 Sep 2020 11:15:14 GMT Subject: RFR: 8253586: C2: Clean up unused PhaseIterGVN::init_worklist() Message-ID: Remove unused method `PhaseIterGVN::init_worklist(Node *)`. ------------- Commit messages: - 8253586: C2: Clean up unused PhaseIterGVN::init_worklist() Changes: https://git.openjdk.java.net/jdk/pull/334/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=334&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253586 Stats: 26 lines in 2 files changed: 0 ins; 26 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/334/head:pull/334 PR: https://git.openjdk.java.net/jdk/pull/334 From github.com+8792647+robcasloz at openjdk.java.net Thu Sep 24 11:15:14 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Thu, 24 Sep 2020 11:15:14 GMT Subject: RFR: 8253586: C2: Clean up unused PhaseIterGVN::init_worklist() In-Reply-To: References: Message-ID: On Thu, 24 Sep 2020 11:04:24 GMT, Roberto Castaneda Lozano wrote: > Remove unused method `PhaseIterGVN::init_worklist(Node *)`. Tested on `hs-tier1`. 
------------- PR: https://git.openjdk.java.net/jdk/pull/334 From github.com+8792647+robcasloz at openjdk.java.net Thu Sep 24 11:15:14 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Thu, 24 Sep 2020 11:15:14 GMT Subject: RFR: 8253586: C2: Clean up unused PhaseIterGVN::init_worklist() In-Reply-To: References: Message-ID: On Thu, 24 Sep 2020 11:07:40 GMT, Roberto Castaneda Lozano wrote: >> Remove unused method `PhaseIterGVN::init_worklist(Node *)`. > > Tested on `hs-tier1`. Remove unused method PhaseIterGVN::init_worklist(Node *). ------------- PR: https://git.openjdk.java.net/jdk/pull/334 From eosterlund at openjdk.java.net Thu Sep 24 11:18:55 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 24 Sep 2020 11:18:55 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8X Gxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com> <9tn1QTeKWk7DB7fWgyw7O_PSuN5Y7nTUmvvZSuuWqHM=.e9a7dbbe-c8ec-40eb-aee4-74e08892db08@github.com> Message-ID: On Thu, 24 Sep 2020 08:02:13 GMT, Allan Gregersen wrote: > A lot of good comments here. Thanks! > > I agree that we should look at the problem domain first. Hence, let's look at the use-cases that was brought up once > more. > 1. The Truffle debugger (not to be confused with a Java host debugger) in general needs to access all live stack traces > for all guest-language threads and read/write access to local variables. Right, and these needs should be satisfied by the JVMTI APIs that were designed to do exactly this. Right? > 2. Some Truffle guest languages need to access all live objects (e.g. Ruby). Espresso also needs this for implementing > e.g. getReferringObjects (through the debugger though) etc. This sounds like a completely different use case.
You want to get the stack locals to traverse all objects. But stacks are only one root set out of many. Keeping the HotSpot root sets in sync outside the VM, across multiple JDK versions, does not sound like something you want to do unless you really have to. Fortunately, this is yet another wheel that has already been invented. For example, you can use JVMTI FollowReferences to perform a traversal through the heap from roots (that we maintain in the VM), and get callbacks for each visited object. You can also use JVMTI IterateThroughHeap to iterate over the heap and report the objects found with callbacks. Over the years, there have been many fun interactions with especially (concurrent) reference processing (weak/phantom references, finalizers, class unloading, which root sets should and should not be included, that constantly changes) that I really don't think you want to re-discover, by performing yet another heap walk, outside of the VM. I think just plugging in to this wheel sounds more desirable for this use case. I also don't get how you can even get a snapshot of live objects, if you allow threads to run concurrently. You mention it is so important to get a snapshot of all thread locals for this to work the way you want it to. But it's seemingly not just the roots you need a snapshot of, if you allow the object graph to concurrently mutate. The JVMTI APIs solve all of this. You should be able to just plug in to that. Right? > > Onto discussing potential solutions: > > After discussing the capabilities of JVMTI internally, it seems that the current implementation of getting locals might > not be able to return anything for escape-analyzed objects. This obviously poses a serious limitation for Truffle given > the fact that this is a corner-stone in GraalVM. I take it this played a big role in introducing iterateFrames (current > thread only) in JVMCI a while back. The problem you describe is what _jvmti_can_access_local_variables capability is for. 
A debugger will have this set, and hence while debugging, we just don't scalarize stuff, so all objects are simply materialized. JVMCI already exposes this variable and presumably plays by the same rules. > Even for the debugger case we would need JVMTI to guarantee the following: > > 1. A safe suspension mechanism for all target threads (what if two debuggers are connected at the same time?) that > would span the entirety of getAllStackTraces + fetching all locals for all frames. I don't see how something like > suspendThreadList would provide the safe guards that we would need here. > > 2. A bulk getAllLocals to avoid fetching all stack traces + retrieving all locals in bulk for all frames. Let me guess... these requirements you just stated are for getting a stable set of roots for the heap walk you want to perform outside of the VM? Or is there any other client that needs the snapshotness of the thread dumps? If I'm right, then there are many more roots that can mutate other than stacks... and the edges in the object graph can also mutate. How are you going to ensure proper traversal of all objects without injecting GC barriers into mutating accesses? I think what you really want is an API that can give you a snapshot heap iteration from roots (including stacks but also a whole bunch of other roots that you really don't want to know about). And JVMTI provides that API, as discussed above. Let's see if my guessing was right... > Given that Truffle/GraalVM will continue to support Java 8 for quite some time, this new API should probably be exposed > through JVMCI and backported like it was done for iterateFrames regardless of the underlying implementation. Let's continue discussing the problem domain first. Obviously, JVMTI has been around for ages. So let's see what problems we really have to solve, once our understanding of the problems you try to solve converge. 
> Note: In the current PR we need to refactor materializeVirtualObjects into using VM operations to guarantee that we run > stack walking and sanity checks for locating the frame in question at a safe point. I'll hold my horses a bit on that > until there is a consensus on where this is going. I also can't find the definition of some functions and wonder if all files are really in the PR. Seems like they are not, unless the GUI is bugged out. But let's leave the whole discussion of the code in this PR to a point where we know what code if any is actually going to change. Hope this helps. I hope we can start to converge a bit in the problem domain. ------------- PR: https://git.openjdk.java.net/jdk/pull/110 From github.com+4676506+javeleon at openjdk.java.net Thu Sep 24 12:03:09 2020 From: github.com+4676506+javeleon at openjdk.java.net (Allan Gregersen) Date: Thu, 24 Sep 2020 12:03:09 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8X Gxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com> <9tn1QTeKWk7DB7fWgyw7O_PSuN5Y7nTUmvvZSuuWqHM=.e9a7dbbe-c8ec-40eb-aee4-74e08892db08@github.com> Message-ID: On Thu, 24 Sep 2020 11:16:17 GMT, Erik Österlund wrote: >The problem you describe is what _jvmti_can_access_local_variables capability is for. A debugger will have this set, >and hence while debugging, we just don't scalarize stuff, so all objects are simply materialized. JVMCI already exposes >this variable and presumably plays by the same rules. A few questions pop into my mind: 1. What are the performance implications of setting the _jvmti_can_access_local_variables capability? More specifically, does this capability kill off escape analysis for a GraalVM? 2.
When we have multiple Truffle contexts running, including but not limited to different guest languages, can we make sure that escape analysis is only switched off on very specific parts of the host system with this set and can we enable/disable this capability on the fly? ------------- PR: https://git.openjdk.java.net/jdk/pull/110 From eosterlund at openjdk.java.net Thu Sep 24 12:22:22 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 24 Sep 2020 12:22:22 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8X Gxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com> <9tn1QTeKWk7DB7fWgyw7O_PSuN5Y7nTUmvvZSuuWqHM=.e9a7dbbe-c8ec-40eb-aee4-74e08892db08@github.com> Message-ID: On Thu, 24 Sep 2020 12:00:22 GMT, Allan Gregersen wrote: > >The problem you describe is what _jvmti_can_access_local_variables capability is for. A debugger will have this set, > >and hence while debugging, we just don't scalarize stuff, so all objects are simply materialized. JVMCI already exposes > >this variable and presumably plays by the same rules. > > > > A few questions pops into my mind: > > > > 1. What is the performance implications on setting _jvmti_can_access_local_variables capability? More specifically, > does this capability kill off escape analysis for a GraalVM? Yes, it will disable escape analysis... when you are running in the debugger. > 2. When we have multiple Truffle contexts running, including but not limited to different guest languages, can we make > sure that escape analysis is only switched off on very specific parts of the host system with this set and can we > enable/disable this capability on the fly? Why do you care to isolate the performance cost of debugging so tightly?
Are you trying to enable some kind of use case where you are mixing development and production execution in the same JVM, so some code can run at optimal performance in production, driving a heavy production workload, while someone is single stepping other buggy development code in the debugger? ------------- PR: https://git.openjdk.java.net/jdk/pull/110 From thartmann at openjdk.java.net Thu Sep 24 12:25:55 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 24 Sep 2020 12:25:55 GMT Subject: RFR: 8253586: C2: Clean up unused PhaseIterGVN::init_worklist() In-Reply-To: References: Message-ID: <7ql9TL3pdU7j9EflfUAgcjMK5odFK-UmWdC2P0wE01I=.01960615-9c6c-4645-9192-4237a4c7ff13@github.com> On Thu, 24 Sep 2020 11:04:24 GMT, Roberto Castaneda Lozano wrote: > Remove unused method `PhaseIterGVN::init_worklist(Node *)`. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/334 From thartmann at openjdk.java.net Thu Sep 24 12:28:31 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 24 Sep 2020 12:28:31 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v6] In-Reply-To: References: Message-ID: <3zDQE_kSd3vjEf0NuPixbcU8Q5Yzfcc_VwPC47IaFBE=.e08ad948-16f4-4ffa-a612-d0c79a9ba5d4@github.com> On Thu, 24 Sep 2020 07:38:06 GMT, Roberto Castaneda Lozano wrote: >> Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each >> main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the >> seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as >> production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options. 
> > Roberto Castaneda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Empty commit to trigger jcheck after updating GitHub user name Thanks for making these changes. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/242 From chagedorn at openjdk.java.net Thu Sep 24 13:36:17 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 24 Sep 2020 13:36:17 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v6] In-Reply-To: References: Message-ID: On Thu, 24 Sep 2020 07:38:06 GMT, Roberto Castaneda Lozano wrote: >> Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each >> main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the >> seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as >> production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options. > > Roberto Castaneda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Empty commit to trigger jcheck after updating GitHub user name Looks good to me! Thanks for adding the additional test. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/242 From chagedorn at openjdk.java.net Thu Sep 24 13:39:11 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 24 Sep 2020 13:39:11 GMT Subject: RFR: 8253586: C2: Clean up unused PhaseIterGVN::init_worklist() In-Reply-To: References: Message-ID: <9orVvCLRM3qmafhPdZ0W0tq62CyaLjwhCYEHlgoRxEs=.a6b73b3f-aace-4038-a047-f1daae6e1a52@github.com> On Thu, 24 Sep 2020 11:04:24 GMT, Roberto Castaneda Lozano wrote: > Remove unused method `PhaseIterGVN::init_worklist(Node *)`. Looks good. 
------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/334 From neliasso at openjdk.java.net Thu Sep 24 15:37:21 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 24 Sep 2020 15:37:21 GMT Subject: RFR: 8253586: C2: Clean up unused PhaseIterGVN::init_worklist() In-Reply-To: References: Message-ID: On Thu, 24 Sep 2020 11:04:24 GMT, Roberto Castaneda Lozano wrote: > Remove unused method `PhaseIterGVN::init_worklist(Node *)`. Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/334 From github.com+4676506+javeleon at openjdk.java.net Thu Sep 24 16:41:28 2020 From: github.com+4676506+javeleon at openjdk.java.net (Allan Gregersen) Date: Thu, 24 Sep 2020 16:41:28 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8X Gxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com> <9tn1QTeKWk7DB7fWgyw7O_PSuN5Y7nTUmvvZSuuWqHM=.e9a7dbbe-c8ec-40eb-aee4-74e08892db08@github.com> Message-ID: On Thu, 24 Sep 2020 12:19:54 GMT, Erik Österlund wrote: >>>The problem you describe is what _jvmti_can_access_local_variables capability is for. A debugger will have this set, >>>and hence while debugging, we just don't scalarize stuff, so all objects are simply materialized. JVMCI already exposes >>>this variable and presumably plays by the same rules. >> >> A few questions pops into my mind: >> >> 1. What is the performance implications on setting _jvmti_can_access_local_variables capability? More specifically, >> does this capability kill off escape analysis for a GraalVM? >> 2.
When we have multiple Truffle contexts running, including but not limited to different guest languages, can we make >> sure that escape analysis is only switched off on very specific parts of the host system with this set and can we >> enable/disable this capability on the fly? > >> >The problem you describe is what _jvmti_can_access_local_variables capability is for. A debugger will have this set, >> >and hence while debugging, we just don't scalarize stuff, so all objects are simply materialized. JVMCI already exposes >> >this variable and presumably plays by the same rules. >> >> >> >> A few questions pops into my mind: >> >> >> >> 1. What is the performance implications on setting _jvmti_can_access_local_variables capability? More specifically, >> does this capability kill off escape analysis for a GraalVM? > > Yes, it will disable escape analysis... when you are running in the debugger. > >> 2. When we have multiple Truffle contexts running, including but not limited to different guest languages, can we make >> sure that escape analysis is only switched off on very specific parts of the host system with this set and can we >> enable/disable this capability on the fly? > > Why do you care to isolate the performance cost of debugging so tightly? Are you trying to enable some kind of use case > where you are mixing development and production execution in the same JVM, so some code can run at optimal performance > in production, driving a heavy production workload, while someone is single stepping other buggy development code in > the debugger? As a result of a useful offline discussion with @fisk, we've decided to pause effort on this PR and investigate what it would take to enhance the StackWalker API to serve all the Truffle use cases. Stay tuned for updates...
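For readers unfamiliar with it, here is a minimal sketch of what the existing public `StackWalker` API offers today, assuming JDK 9 or later. It only walks the stack of the calling thread and exposes no locals, which is roughly the gap discussed in this thread. The class and method names (`StackWalkerSketch`, `sampleCurrentThread`, `outer`, `inner`) are hypothetical and not part of any proposal.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of JVMCI or of any proposed API.
class StackWalkerSketch {
    // Collects method names from the *current* thread's stack, skipping the
    // first `skip` frames and keeping at most `limit` frames after that.
    static List<String> sampleCurrentThread(int skip, int limit) {
        return StackWalker.getInstance().walk(frames -> frames
                .skip(skip)
                .limit(limit)
                .map(StackWalker.StackFrame::getMethodName)
                .collect(Collectors.toList()));
    }

    // Two small helper frames so that there is something to observe.
    static List<String> outer() {
        return inner();
    }

    static List<String> inner() {
        // Skip the frame of sampleCurrentThread itself, keep the next two.
        return sampleCurrentThread(1, 2);
    }

    public static void main(String[] args) {
        System.out.println(outer());
    }
}
```

The skip and limit steps loosely mirror options 1) and 2) of the suggested JVMCI API, but only for the current thread; walking other threads or reading locals would need the kind of extension mentioned above.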
------------- PR: https://git.openjdk.java.net/jdk/pull/110 From kvn at openjdk.java.net Thu Sep 24 18:53:38 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 24 Sep 2020 18:53:38 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v6] In-Reply-To: References: Message-ID: On Thu, 24 Sep 2020 07:38:06 GMT, Roberto Castaneda Lozano wrote: >> Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each >> main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the >> seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as >> production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options. > > Roberto Castaneda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Empty commit to trigger jcheck after updating GitHub user name test/hotspot/jtreg/compiler/arguments/TestStressIGVNOptions.java line 32: > 30: * compiler.arguments.TestStressIGVNOptions > 31: * @run main/othervm -XX:+StressIGVN -XX:StressSeed=42 > 32: * compiler.arguments.TestStressIGVNOptions Please, add next to run test when C2 is enabled as you did for other 2 tests: @requires vm.compiler2.enabled ------------- PR: https://git.openjdk.java.net/jdk/pull/242 From enikitin at openjdk.java.net Thu Sep 24 19:35:38 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Thu, 24 Sep 2020 19:35:38 GMT Subject: RFR: 8208257: [mlvm] Add randomness keyword to vm/mlvm/meth/func/jdi/breakpointOtherStratum In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 21:17:04 GMT, Igor Ignatyev wrote: >> Pre-Scara thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039557.html). >> >> I tried to reproduce the test multiple times with different VM parameters, but it always passes. 
I suggest removing it >> from ProblemList.txt. >> Second change is marking the test with randomness keyword from the >> [JDK-8243427](https://bugs.openjdk.java.net/browse/JDK-8243427) (using reproducible random for mlvm tests). >> Tested using mach5 on the 4 platforms, 50 runs each. > > Marked as reviewed by iignatyev (Reviewer). Closed in favour of a new bug (the [JDK-8253607](https://bugs.openjdk.java.net/browse/JDK-8253607)) and it's PR #345. ------------- PR: https://git.openjdk.java.net/jdk/pull/309 From enikitin at openjdk.java.net Thu Sep 24 19:35:38 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Thu, 24 Sep 2020 19:35:38 GMT Subject: Withdrawn: 8208257: [mlvm] Add randomness keyword to vm/mlvm/meth/func/jdi/breakpointOtherStratum In-Reply-To: References: Message-ID: <30qfJceI795LAfYlHSFW4mjXDxwBk8MCx7CNoPCPfGg=.7f6ea09e-acb9-4ef8-bd4d-8e157e2ba47e@github.com> On Tue, 22 Sep 2020 20:13:13 GMT, Evgeny Nikitin wrote: > Pre-Scara thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039557.html). > > I tried to reproduce the test multiple times with different VM parameters, but it always passes. I suggest removing it > from ProblemList.txt. > Second change is marking the test with randomness keyword from the > [JDK-8243427](https://bugs.openjdk.java.net/browse/JDK-8243427) (using reproducible random for mlvm tests). > Tested using mach5 on the 4 platforms, 50 runs each. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/309 From enikitin at openjdk.java.net Thu Sep 24 19:36:56 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Thu, 24 Sep 2020 19:36:56 GMT Subject: RFR: 8253607: [mlvm] meth/func/jdi/breakpointOtherStratum: un-problemlist and add randomness keyword Message-ID: Created as a replacement for the #309 (a new issue has been opened). 
Pre-Scara thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039557.html). I tried to reproduce the test failure multiple times with different VM parameters, but it always passes. I suggest removing it from ProblemList.txt. Second change is marking the test with randomness keyword from the JDK-8243427 (using reproducible random for mlvm tests). Tested using mach5 on the 4 platforms, 50 runs each. ------------- Commit messages: - 8253607: [mlvm] meth/func/jdi/breakpointOtherStratum: un-problemlist and add randomness keyword Changes: https://git.openjdk.java.net/jdk/pull/345/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=345&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253607 Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/345.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/345/head:pull/345 PR: https://git.openjdk.java.net/jdk/pull/345 From mchung at openjdk.java.net Thu Sep 24 23:45:05 2020 From: mchung at openjdk.java.net (Mandy Chung) Date: Thu, 24 Sep 2020 23:45:05 GMT Subject: RFR: JDK-8253001: [JVMCI] Add API for getting stacktraces independently of current thread In-Reply-To: References: <0lND5NvCLWbPFWv_9eP-V_q_I4uXGanWg5ubT5ZMiZg=.86357e65-c1e8-41cb-9645-624870a46436@github.com> <_luUft9Qs0GYU6GwUs1PLMDGel1salhJg8X Gxsc3h0k=.c175a93b-0736-4549-8316-ac01692f6b9e@github.com> <9tn1QTeKWk7DB7fWgyw7O_PSuN5Y7nTUmvvZSuuWqHM=.e9a7dbbe-c8ec-40eb-aee4-74e08892db08@github.com> Message-ID: <5VH0MfSnKMvVfYzBAxb-OrdHeU6HAt1-wM7uEXYPn-o=.227e3d57-d63f-47bd-8007-dbf1ad376ba6@github.com> On Thu, 24 Sep 2020 16:39:10 GMT, Allan Gregersen wrote: >>> >The problem you describe is what _jvmti_can_access_local_variables capability is for. A debugger will have this set, >>> >and hence while debugging, we just don't scalarize stuff, so all objects are simply materialized. JVMCI already exposes >>> >this variable and presumably plays by the same rules. 
>>> >>> >>> >>> A few questions pops into my mind: >>> >>> >>> >>> 1. What is the performance implications on setting _jvmti_can_access_local_variables capability? More specifically, >>> does this capability kill off escape analysis for a GraalVM? >> >> Yes, it will disable escape analysis... when you are running in the debugger. >> >>> 2. When we have multiple Truffle contexts running, including but not limited to different guest languages, can we make >>> sure that escape analysis is only switched off on very specific parts of the host system with this set and can we >>> enable/disable this capability on the fly? >> >> Why do you care to isolate the performance cost of debugging so tightly? Are you trying to enable some kind of use case >> where you are mixing development and production execution in the same JVM, so some code can run at optimal performance >> in production, driving a heavy production workload, while someone is single stepping other buggy development code in >> the debugger? > > As a result of a useful offline discussion with @fisk, we've decided to pause effort on this PR and investigate what it > would take to enhance the StackWalker API to serve all the Truffle use cases. Stay tuned for updates... I suggest sending out the proposed extension to the `StackWalker` API for discussion after the initial investigation. A discussion on the proposed API change would be useful before sending out a PR for code review. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/110 From iignatyev at openjdk.java.net Fri Sep 25 03:05:37 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 25 Sep 2020 03:05:37 GMT Subject: RFR: 8253607: [mlvm] meth/func/jdi/breakpointOtherStratum: un-problemlist and add randomness keyword In-Reply-To: References: Message-ID: On Thu, 24 Sep 2020 19:30:35 GMT, Evgeny Nikitin wrote: > Created as a replacement for the #309 (a new issue has been opened).
> Pre-Scara thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039557.html). > > I tried to reproduce the test failure multiple times with different VM parameters, but it always passes. I suggest > removing it from ProblemList.txt. > Second change is marking the test with randomness keyword from the JDK-8243427 (using reproducible random for mlvm > tests). > Tested using mach5 on the 4 platforms, 50 runs each. Marked as reviewed by iignatyev (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/345 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 25 06:53:09 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Fri, 25 Sep 2020 06:53:09 GMT Subject: RFR: 8253586: C2: Clean up unused PhaseIterGVN::init_worklist() In-Reply-To: References: Message-ID: <_viqZJySTUGOL4G4-XG1ej0iVxHgYqF4_noBTuee100=.5e06b144-e38d-423d-8eb2-7f0d6724c2cc@github.com> On Thu, 24 Sep 2020 15:34:06 GMT, Nils Eliasson wrote: >> Remove unused method `PhaseIterGVN::init_worklist(Node *)`. > > Looks good! Thanks Tobias, Christian, and Nils! ------------- PR: https://git.openjdk.java.net/jdk/pull/334 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 25 06:57:17 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Fri, 25 Sep 2020 06:57:17 GMT Subject: Integrated: 8253586: C2: Clean up unused PhaseIterGVN::init_worklist() In-Reply-To: References: Message-ID: On Thu, 24 Sep 2020 11:04:24 GMT, Roberto Castaneda Lozano wrote: > Remove unused method `PhaseIterGVN::init_worklist(Node *)`. This pull request has now been integrated. Changeset: dcde95ba Author: Roberto Castaneda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/dcde95ba Stats: 26 lines in 2 files changed: 0 ins; 26 del; 0 mod 8253586: C2: Clean up unused PhaseIterGVN::init_worklist() Remove unused method PhaseIterGVN::init_worklist(Node *). 
Reviewed-by: thartmann, chagedorn, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/334 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 25 07:37:39 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Fri, 25 Sep 2020 07:37:39 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v7] In-Reply-To: References: Message-ID: <8rhPxCcwRs2FRbAcxNqblmzVAaiMehhqSyY4WAjh6G0=.77e2d0c0-432b-4a40-bd72-79a40ac288e1@github.com> > Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each > main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the > seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as > production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options. Roberto Castaneda Lozano has updated the pull request incrementally with one additional commit since the last revision: Add missing @requires annotation to test case ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/242/files - new: https://git.openjdk.java.net/jdk/pull/242/files/4ae51be7..e8aa59ef Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=05-06 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/242.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/242/head:pull/242 PR: https://git.openjdk.java.net/jdk/pull/242 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 25 07:37:39 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Fri, 25 Sep 2020 07:37:39 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v6] In-Reply-To: References: Message-ID: On Thu, 24 Sep 2020 18:48:39 GMT, Vladimir Kozlov 
wrote: >> Roberto Castaneda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Empty commit to trigger jcheck after updating GitHub user name > > test/hotspot/jtreg/compiler/arguments/TestStressIGVNOptions.java line 32: > >> 30: * compiler.arguments.TestStressIGVNOptions >> 31: * @run main/othervm -XX:+StressIGVN -XX:StressSeed=42 >> 32: * compiler.arguments.TestStressIGVNOptions > > Please, add next to run test when C2 is enabled as you did for other 2 tests: > @requires vm.compiler2.enabled Done, thanks for catching this! ------------- PR: https://git.openjdk.java.net/jdk/pull/242 From shade at openjdk.java.net Fri Sep 25 07:43:05 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 25 Sep 2020 07:43:05 GMT Subject: RFR: 8253631: Remove unimplemented CompileBroker methods after JEP-165 Message-ID: It would seem that the JEP-165 implementation task (JDK-8137167) introduced declarations without any implementations in CompileBroker: static DirectivesStack* dirstack(); static void set_dirstack(DirectivesStack* stack); static void print_directives(outputStream* st); This can be cleaned up.
Testing: - [x] Linux x86_64 fastdebug build - [x] Text searches for `dirstack` and `print_directives` in `src/hotspot` ------------- Commit messages: - 8253631: Remove unimplemented CompileBroker methods after JEP-165 Changes: https://git.openjdk.java.net/jdk/pull/353/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=353&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253631 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/353.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/353/head:pull/353 PR: https://git.openjdk.java.net/jdk/pull/353 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 25 07:54:06 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Fri, 25 Sep 2020 07:54:06 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v6] In-Reply-To: References: Message-ID: On Thu, 24 Sep 2020 13:33:36 GMT, Christian Hagedorn wrote: >> Roberto Castaneda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Empty commit to trigger jcheck after updating GitHub user name > > Looks good to me! Thanks for adding the additional test. > > Nice! > > Did you try to run mach5 testing with IGV stress enabled by using --jvm-args "" mach5 option? > > Thanks Vladimir! I tried something similar (but hackier) on tier1 and all tests passed, will try a more systematic run > with multiple seeds, etc. I ran `tier1_compiler` with `-XX:+StressIGVN` for 100 repetitions and did not trigger any failure. 
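As an aside on why logging the seed matters for runs like this: the following is a minimal sketch in Java (not the C2 implementation, which is C++ inside `PhaseIterGVN`) of shuffling a worklist with a seeded generator, so that a failure seen with one seed can be replayed with the same order. The class and method names are hypothetical, and the integers merely stand in for IGVN nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; HotSpot does not use this code.
class SeededWorklistShuffle {
    // Returns a shuffled copy of the worklist using a Fisher-Yates shuffle
    // driven by a generator seeded with `seed`. Same seed, same order.
    static <T> List<T> shuffled(List<T> worklist, long seed) {
        List<T> copy = new ArrayList<>(worklist);
        Random rng = new Random(seed);
        for (int i = copy.size() - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // uniform index in [0, i]
            T tmp = copy.get(i);        // plain swap of positions i and j
            copy.set(i, copy.get(j));
            copy.set(j, tmp);
        }
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> nodes = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        // Both lines print the same permutation because the seed is the same.
        System.out.println(shuffled(nodes, 42L));
        System.out.println(shuffled(nodes, 42L));
    }
}
```

Repeating a test many times with freshly generated seeds explores different orders, while a logged seed passed back via something like `-XX:StressSeed=N` would pin one order down for debugging.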
------------- PR: https://git.openjdk.java.net/jdk/pull/242 From github.com+8792647+robcasloz at openjdk.java.net Fri Sep 25 07:54:06 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Fri, 25 Sep 2020 07:54:06 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v8] In-Reply-To: References: Message-ID: > Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each > main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the > seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as > production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options. Roberto Castaneda Lozano has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - Merge master - Add missing @requires annotation to test case - Empty commit to trigger jcheck after updating GitHub user name - Generate random seed if 'StressSeed' is unset - Use generic swap() for shuffling - Add basic sanity test for stress IGVN options - Fix typo - Move shuffle() to PhaseIterGVN - Define 'StressSeed' option as 'uint' rather than 'uintx' - Apply minor rearrangements to simplify the patch - ... 
and 4 more: https://git.openjdk.java.net/jdk/compare/37b70282...c2c31c3e ------------- Changes: https://git.openjdk.java.net/jdk/pull/242/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=07 Stats: 229 lines in 10 files changed: 224 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/242.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/242/head:pull/242 PR: https://git.openjdk.java.net/jdk/pull/242 From thartmann at openjdk.java.net Fri Sep 25 07:56:26 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 25 Sep 2020 07:56:26 GMT Subject: RFR: 8253631: Remove unimplemented CompileBroker methods after JEP-165 In-Reply-To: References: Message-ID: <3YE2WttCGEeFiYFqTolQw8bwGh121H-PQJIlMlPn-Y8=.291b846f-f38e-41bc-8b62-bb036e57c93f@github.com> On Fri, 25 Sep 2020 07:37:01 GMT, Aleksey Shipilev wrote: > It would seem that JEP-175 implementation task (JDK-8137167) introduced declarations without any implementations in > CompilerBroker: > static DirectivesStack* dirstack(); > static void set_dirstack(DirectivesStack* stack); > static void print_directives(outputStream* st); > > This can be cleaned up. > > Testing: > - [x] Linux x86_64 fastdebug build > - [x] Text searches for `dirstack` and `print_directives` in `src/hotspot` Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/353 From shade at openjdk.java.net Fri Sep 25 07:57:32 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 25 Sep 2020 07:57:32 GMT Subject: RFR: 8253633: Remove unimplemented TieredThresholdPolicy::set_carry_if_neccessary Message-ID: The definition seems to be removed with JDK-8203883, but the declaration was left behind. 
Testing: - [x] Linux x86_64 fastdebug build - [x] Text search for `set_carry_if_necessary` in `src/hotspot` ------------- Commit messages: - 8253633: Remove unimplemented TieredThresholdPolicy::set_carry_if_necessary Changes: https://git.openjdk.java.net/jdk/pull/355/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=355&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253633 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/355.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/355/head:pull/355 PR: https://git.openjdk.java.net/jdk/pull/355 From roberto.castaneda.lozano at oracle.com Fri Sep 25 08:02:44 2020 From: roberto.castaneda.lozano at oracle.com (Roberto Castaneda Lozano) Date: Fri, 25 Sep 2020 10:02:44 +0200 Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v8] In-Reply-To: References: Message-ID: "Merge master" (c2c31c3) addresses a trivial conflict caused by the integration of JDK-8253586 (https://github.com/openjdk/jdk/pull/334). On 2020-09-25 09:54, Roberto Castaneda Lozano wrote: >> Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each >> main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the >> seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as >> production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options. > > Roberto Castaneda Lozano has updated the pull request with a new target base due to a merge or a rebase. 
The pull > request now contains 14 commits: > > - Merge master > - Add missing @requires annotation to test case > - Empty commit to trigger jcheck after updating GitHub user name > - Generate random seed if 'StressSeed' is unset > - Use generic swap() for shuffling > - Add basic sanity test for stress IGVN options > - Fix typo > - Move shuffle() to PhaseIterGVN > - Define 'StressSeed' option as 'uint' rather than 'uintx' > - Apply minor rearrangements to simplify the patch > - ... and 4 more: https://git.openjdk.java.net/jdk/compare/37b70282...c2c31c3e > > ------------- > > Changes: https://git.openjdk.java.net/jdk/pull/242/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=242&range=07 > Stats: 229 lines in 10 files changed: 224 ins; 0 del; 5 mod > Patch: https://git.openjdk.java.net/jdk/pull/242.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/242/head:pull/242 > > PR: https://git.openjdk.java.net/jdk/pull/242 > From thartmann at openjdk.java.net Fri Sep 25 08:12:32 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 25 Sep 2020 08:12:32 GMT Subject: RFR: 8253633: Remove unimplemented TieredThresholdPolicy::set_carry_if_neccessary In-Reply-To: References: Message-ID: On Fri, 25 Sep 2020 07:49:53 GMT, Aleksey Shipilev wrote: > The definition seems to be removed with JDK-8203883, but the declaration was left behind. > > Testing: > - [x] Linux x86_64 fastdebug build > - [x] Text search for `set_carry_if_necessary` in `src/hotspot` Marked as reviewed by thartmann (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/355 From cjashfor at linux.ibm.com Fri Sep 25 09:06:17 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Fri, 25 Sep 2020 02:06:17 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <5ac1ac19-af1c-4ee6-b478-873031710081@linux.ibm.com> Message-ID: <7eec7c4d-2402-3fc6-55ad-aec2980ff77d@linux.ibm.com> Due to the switchover to Git from Mercurial, this patch set got a new thread on the mailing lists on 9/24, but it has the same title. as this thread. - Corey On 9/7/20 3:08 AM, Doerr, Martin wrote: > Hi Corey, > > thanks for investigating. > > Note that we use xlclang++ on AIX. It may possibly understand the directives as gcc on linux. > Gcc 7.3.1 is the minimum for BE linux. > But if you protect your code by #ifdef VM_LITTLE_ENDIAN no compiler except gcc >= 7.4.0 should ever look at it. > > Best regards, > Martin > > >> -----Original Message----- >> From: Corey Ashford >> Sent: Dienstag, 1. September 2020 02:17 >> To: Doerr, Martin >> Cc: Michihiro Horie ; hotspot-compiler- >> dev at openjdk.java.net; core-libs-dev ; >> Kazunori Ogata ; joserz at br.ibm.com >> Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and >> API for Base64 decoding >> >> On 8/27/20 8:07 AM, Doerr, Martin wrote: >>>>> I will use __attribute__ ((align(16))) instead of __vector, and make >>>> them arrays of 16 unsigned char. >>> Maybe __vectors works as expected, too, now. Whatever we use, I'd >> appreciate to double-check the alignment e.g. by using gdb. >>> I don't remember what we had tried and why it didn't work as desired. >> >> >> I just now tried on gcc-7.5.0, declaring a __vector at 1, 2, 3, 8, 9, >> and 15 byte offsets in a struct, trying to force a misalignment, but the >> compiler realigned all of them on 16-byte boundaries. 
>> >> If someone decides to make the intrinsic work on AIX (big endian), and >> compiles with 7.3.1, I don't know what will happen w.r.t. alignment, so >> to be on the safe side, I will make the declarations 16-byte unsigned >> char arrays with an align attribute. >> >> Looking a bit deeper, I see that the __vector type comes out of the C >> preprocessor as: __attribute__((altivec(vector__))). It's part of the >> compiler's basic set of predefined macros, and can be seen using this >> command: >> >> % gcc -dM -E - < /dev/null | grep __vector >> >> #define __vector __attribute__((altivec(vector__))) >> >> Some information here: >> https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Type-Attributes.html >> >> I don't know if this is helpful or not, but it might answer part of your >> question about the meaning of __vector. >> >> Regards, >> >> - Corey From cjashfor at linux.ibm.com Fri Sep 25 09:07:10 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Fri, 25 Sep 2020 02:07:10 -0700 Subject: [EXTERNAL] Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: <91b1717e-f9f4-0b5c-d410-e25507206812@linux.ibm.com> References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> <65ed7919-86fc-adfa-3cd5-58dd96a3487f@linux.ibm.com> <4bc83479-1ed9-8cd8-22a0-1f19f315df7e@oracle.com> <196a4e58-0710-2f3e-6d1b-e78ab03a185d@oracle.com> <91b1717e-f9f4-0b5c-d410-e25507206812@linux.ibm.com> Message-ID: <3cdec280-a4aa-1791-5eb5-b9102a642256@linux.ibm.com> Note, this patch set is now on a new thread in the mailing list, due to the switchover from Mercurial to Git. Regards, - Corey On 9/9/20 4:32 PM, Corey Ashford wrote: > On 9/9/20 2:04 PM, Roger Riggs wrote: >> Hi Corey, >> >> Right,? the continue was so it would go back and check if the >> conversion was >> complete.? 
An alternative would be to repeat the check and return if >> there were >> no bytes left to process. > > Another issue I just discovered is that the way the loop is structured, > decodeBlock could be called multiple times in the event that isMIME is > true, and in that case, decodeBlock will try to write into dst[] > starting at offset 0 again. > > My original intention was for the intrinsic to be called a single time > because it never attempted to process bytes in the isMIME==true case, and > because of that, the offset into the destination buffer would always be > zero. With this loop, on the second and later calls, the offset into > dst[] should be non-zero. This means that I also need to pass dp into > decodeBlock. That necessitates a change in the parameter passing down > to the intrinsic. Not a big deal, but it is a ripple. > > I'll get working on it. > > The upside of this change is that it makes the decode and encode > intrinsics closely mirror each other, and handles the isMIME==true case > as a happy side-effect. With the overhead of the call to the intrinsic, > it's not clear there will be a performance gain when isMIME==true, but a > benchmark should make that clear. I'm guessing maybe 1.5X to 2X is > about the best that could be expected when linemax is the default 76. > > - Corey > >> >> Thanks, Roger >> >> On 9/9/20 3:13 PM, Corey Ashford wrote: >>> On 9/4/20 8:07 AM, Roger Riggs wrote: >>>> Hi Corey, >>>> >>>> The idea I had in mind is refactoring the fast path into the method >>>> you call decodeBlock. >>>> Base64: lines 751-768. >>>> >>>> It leaves all the unknown/illegal character handling to the Java code. >>>> And yes, it does not need to handle MIME, except to return on >>>> illegal characters. >>>> >>>> The patch is attached. >>> >>> Ah, I see what you mean now, and thanks for the patch!
The patch as >>> presented doesn't work, however, because the intrinsic processes >>> fewer bytes than are in the src buffer, and then executes a >>> "continue;", which then proceeds to loop infinitely because the >>> intrinsic won't process any more bytes after that. >>> >>> I tried dropping the continue, but that doesn't work because the Java >>> (non-intrinsic) code processes all of the bytes, and the line of code >>> following the loop accesses one byte after the end of the src buffer >>> causing an array bounds error. >>> >>> So this needs to be re-thought a little, but it shouldn't be too >>> difficult. I will work on it. >>> >>> Regards, >>> >>> - Corey >>> >>>> >>>> Regards, Roger >>>> >>>> >>>> >>>> On 8/31/20 6:22 PM, Corey Ashford wrote: >>>>> On 8/29/20 1:19 PM, Corey Ashford wrote: >>>>>> Hi Roger, >>>>>> >>>>>> Thanks for your reply and thoughts! Comments interspersed below: >>>>>> >>>>>> On 8/28/20 10:54 AM, Roger Riggs wrote: >>>>> ... >>>>>>> Comparing with the way that the Base64 encoder was intrinsified, the >>>>>>> method that is intrinsified should have a method body that does >>>>>>> the same function, so it is interchangeable. That likely will >>>>>>> just shift >>>>>>> the "fast path" code into the decodeBlock method. >>>>>>> Keeping the symmetry between encoder and decoder will >>>>>>> make it easier to maintain the code. >>>>>> >>>>>> Good point. I'll investigate what this looks like in terms of the >>>>>> actual code, and will report back (perhaps in a new webrev). >>>>>> >>>>> >>>>> Having looked at this again, I don't think it makes sense. One >>>>> thing that differs significantly from the encodeBlock intrinsic is >>>>> that the decodeBlock intrinsic only needs to process a prefix of >>>>> the data, and so it can leave virtually any amount of data at the >>>>> end of the src buffer unprocessed, whereas with the encodeBlock >>>>> intrinsic, if it exists, it must process the entire buffer.
>>>>> >>>>> In the (common) case where the decodeBlock intrinsic returns not >>>>> having processed everything, it still needs to call the Java code, >>>>> and if that Java code is "replaced" by the intrinsic, it's >>>>> inaccessible. >>>>> >>>>> Is there something I'm overlooking here?? Basically I want the >>>>> decode API to behave differently than the encode API, mostly to >>>>> make the arch-specific intrinsic easier to implement. If that's not >>>>> acceptable, then I need to rethink the API, and also figure out how >>>>> to deal with the illegal character case. The latter could perhaps >>>>> be done by throwing an exception from the intrinsic, or maybe by >>>>> returning a negative length that specifies the index of the illegal >>>>> src byte, and then have the Java code throw the exception). >>>>> >>>>> Regards, >>>>> >>>>> - Corey >>>>> >>>> >>> >> > From shade at openjdk.java.net Fri Sep 25 10:13:14 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 25 Sep 2020 10:13:14 GMT Subject: Integrated: 8253633: Remove unimplemented TieredThresholdPolicy::set_carry_if_neccessary In-Reply-To: References: Message-ID: On Fri, 25 Sep 2020 07:49:53 GMT, Aleksey Shipilev wrote: > The definition seems to be removed with JDK-8203883, but the declaration was left behind. > > Testing: > - [x] Linux x86_64 fastdebug build > - [x] Text search for `set_carry_if_necessary` in `src/hotspot` This pull request has now been integrated. 
Changeset: 27d0a70b Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/27d0a70b Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8253633: Remove unimplemented TieredThresholdPolicy::set_carry_if_neccessary Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/355 From shade at openjdk.java.net Fri Sep 25 10:13:20 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 25 Sep 2020 10:13:20 GMT Subject: Integrated: 8253631: Remove unimplemented CompileBroker methods after JEP-165 In-Reply-To: References: Message-ID: On Fri, 25 Sep 2020 07:37:01 GMT, Aleksey Shipilev wrote: > It would seem that JEP-175 implementation task (JDK-8137167) introduced declarations without any implementations in > CompilerBroker: > static DirectivesStack* dirstack(); > static void set_dirstack(DirectivesStack* stack); > static void print_directives(outputStream* st); > > This can be cleaned up. > > Testing: > - [x] Linux x86_64 fastdebug build > - [x] Text searches for `dirstack` and `print_directives` in `src/hotspot` This pull request has now been integrated. 
Changeset: dc1ef583 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/dc1ef583 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod 8253631: Remove unimplemented CompileBroker methods after JEP-165 Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/353 From gdub at openjdk.java.net Fri Sep 25 10:13:23 2020 From: gdub at openjdk.java.net (Gilles Duboscq) Date: Fri, 25 Sep 2020 10:13:23 GMT Subject: Integrated: 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode In-Reply-To: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> References: <5y5FB4GGYWpMVxx5L_eysMLAFKvTc8JKhGA8BAjJSqs=.b99cd031-9b5c-4fff-be6a-4765b16358da@github.com> Message-ID: On Wed, 9 Sep 2020 08:18:11 GMT, Gilles Duboscq wrote: > [JDK-8232806](https://bugs.openjdk.java.net/browse/JDK-8232806) introduced the > jdk.internal.lambda.disableEagerInitialization system property to be able to disable eager initialization of lambda > classes. This was necessary to prevent side effects of class initializers triggered by such initialization in the > context of the GraalVM native image tool. However, the change as it is implemented means that the behaviour of > non-capturing lambdas depends on the value of `disableEagerInitialization`: when it is false (the default) such lambdas > are actually a singleton while when it is true, a fresh instance is returned every time. Programs should definitely > _not_ rely on reference equality since the Java spec does not guarantee it. However, in order to separate concern and > ease debugging such bad programs, `disableEagerInitialization` shouldn't influence the singleton vs. fresh instance > behaviour of lambdas in either direction. This pull request has now been integrated. 
Changeset: 1b79326c Author: Gilles Duboscq URL: https://git.openjdk.java.net/jdk/commit/1b79326c Stats: 194 lines in 4 files changed: 137 ins; 29 del; 28 mod 8242451: ensure semantics of non-capturing lambdas are preserved independent of execution mode Reviewed-by: mchung ------------- PR: https://git.openjdk.java.net/jdk/pull/93 From chagedorn at openjdk.java.net Fri Sep 25 10:14:51 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 25 Sep 2020 10:14:51 GMT Subject: RFR: 8253588: C1: assert(false) failed: unknown register compiler on x86_32 only with -XX:+TraceLinearScanLevel=4 Message-ID: [JDK-8251093](https://bugs.openjdk.java.net/browse/JDK-8251093) introduced some additional logging of intervals and its registers in various places. On 32-bit only, we could have two registers for an interval. A hi-register is only used when the interval has `_num_phys_regs` set to 2. In one such place ([L5448](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5448)), we log the hi-register `hint_regHi`. On [L5441](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5441), however, we can assign it an invalid register number when `_num_phys_regs` is 1. That was not a problem before JDK-8251093 as we only used `hint_regHi` later after a `_num_phys_regs == 2` check on [L5484](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5484). But the additional logging is performed earlier resulting in this assertion failure when trying to log the invalid `hint_regHi` register. 
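The guard described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: `Interval`, `any_reg`, and `describe_hint` are hypothetical stand-ins, not the real C1 `LinearScan` types, and the code is not the actual one-line patch. It shows the idea of reading the hi-register only when the interval really occupies two physical registers.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sentinel meaning "no valid register assigned".
constexpr int any_reg = -1;

// Hypothetical stand-in for an interval: it occupies 1 physical register,
// or 2 when a 64-bit value needs a register pair on a 32-bit platform.
struct Interval {
  int num_phys_regs;
};

// Build a log message for the register hint. The hi-register is only
// consulted when the interval uses two registers, so an invalid
// hint_reg_hi is never printed for a single-register interval.
std::string describe_hint(const Interval& cur, int hint_reg, int hint_reg_hi) {
  std::ostringstream out;
  out << "hint register " << hint_reg;
  if (cur.num_phys_regs == 2) {
    out << " (hi: " << hint_reg_hi << ")";
  }
  return out.str();
}
```

With a single-register interval, the invalid hi value is simply ignored instead of being passed to the logging code.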
Thanks, Christian ------------- Commit messages: - 8253588: C1: assert(false) failed: unknown register compiler on x86_32 only with -XX:+TraceLinearScanLevel=4 Changes: https://git.openjdk.java.net/jdk/pull/356/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=356&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253588 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/356.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/356/head:pull/356 PR: https://git.openjdk.java.net/jdk/pull/356 From jbhateja at openjdk.java.net Fri Sep 25 13:10:32 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 25 Sep 2020 13:10:32 GMT Subject: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v3] In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 16:39:15 GMT, Jatin Bhateja wrote: > @jatin-bhateja Can you put summary of performance improvement into JBS? Hi @vnkozlov , @neliasso Kindly let me know your feedback, If there are no more comments is it ok to integrate this patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/61 From neliasso at openjdk.java.net Fri Sep 25 15:42:13 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 25 Sep 2020 15:42:13 GMT Subject: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v5] In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 15:55:11 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy. >> 2) Special instruction sequence blocks for copy sizes b/w 32-192 bytes. >> 3) Block copy operation above 192 bytes is performed using destination address aligned PRE-MAIN-POST loop. Main loop >> copies 192 byte in one iteration and tail part fall over special instruction sequence blocks. 
4) Both small copy block >> and aligned loop use 32 byte vector register to prevent and frequency penalty for copy sizes less than AVX3Threshold. >> 5) For block size above AVX3Theshold both special blocks and loop operate using 64 byte register. 6) In case user >> sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient REP MOVS for copy >> sizes above 4096 bytes. JMH Results: >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java >> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt]() >> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt]() > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8252847 : Modifying file permission to resolve jcheck failure. Looks good to me! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/61 From iveresov at openjdk.java.net Fri Sep 25 16:12:22 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 25 Sep 2020 16:12:22 GMT Subject: RFR: 8253118: Avoid unnecessary deopts when OSR nmethods of the same level are present. Message-ID: When running with ```-XX:TieredStopAtLevel={2|3}``` the policy tried to switch to OSR method of the same level if those are present, which caused constant deopting. The fix is to consider only higher levels for OSR switches. ------------- Commit messages: - Prevent switching to same level OSR. 
Changes: https://git.openjdk.java.net/jdk/pull/360/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=360&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253118 Stats: 9 lines in 1 file changed: 3 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/360.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/360/head:pull/360 PR: https://git.openjdk.java.net/jdk/pull/360 From kvn at openjdk.java.net Fri Sep 25 19:07:17 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 25 Sep 2020 19:07:17 GMT Subject: RFR: 8252219: C2: Randomize IGVN worklist for stress testing [v8] In-Reply-To: References: Message-ID: On Fri, 25 Sep 2020 07:54:06 GMT, Roberto Castaneda Lozano wrote: >> Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each >> main run of the IGVN loop. Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the >> seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as >> production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options. > > Roberto Castaneda Lozano has updated the pull request with a new target base due to a merge or a rebase. The pull > request now contains 14 commits: > - Merge master > - Add missing @requires annotation to test case > - Empty commit to trigger jcheck after updating GitHub user name > - Generate random seed if 'StressSeed' is unset > - Use generic swap() for shuffling > - Add basic sanity test for stress IGVN options > - Fix typo > - Move shuffle() to PhaseIterGVN > - Define 'StressSeed' option as 'uint' rather than 'uintx' > - Apply minor rearrangements to simplify the patch > - ... and 4 more: https://git.openjdk.java.net/jdk/compare/37b70282...c2c31c3e Marked as reviewed by kvn (Reviewer). 
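For readers unfamiliar with the technique, a seeded worklist shuffle of the kind the quoted description mentions can be sketched as below. This is a minimal sketch, not the code from the pull request: `SeededRandom` and `shuffle_worklist` are hypothetical names, the worklist is modelled as a plain `std::vector<int>`, and the generator is a simple linear congruential one chosen only so that a given seed reproduces the same order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical deterministic generator: the same seed always yields the
// same sequence, which is what makes a stress run reproducible.
class SeededRandom {
 public:
  explicit SeededRandom(uint32_t seed) : _state(seed) {}
  uint32_t next() {
    _state = _state * 1664525u + 1013904223u;  // common LCG constants
    return _state >> 16;                       // drop the weak low bits
  }
 private:
  uint32_t _state;
};

// Fisher-Yates shuffle: walk from the back, swapping each element with a
// randomly chosen element at or before it.
void shuffle_worklist(std::vector<int>& worklist, uint32_t seed) {
  SeededRandom rng(seed);
  for (std::size_t i = worklist.size(); i > 1; i--) {
    std::size_t j = rng.next() % i;  // index in [0, i)
    std::swap(worklist[i - 1], worklist[j]);
  }
}
```

Because the order depends only on the seed, logging the seed (as the description says happens with `LogCompilation`) is enough to replay a failing order.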
------------- PR: https://git.openjdk.java.net/jdk/pull/242 From kvn at openjdk.java.net Fri Sep 25 21:08:50 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 25 Sep 2020 21:08:50 GMT Subject: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v5] In-Reply-To: References: Message-ID: On Thu, 17 Sep 2020 05:17:38 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 7971: >> >>> 7969: BasicType type, int offset, bool use64byteVector) { >>> 7970: assert(MaxVectorSize >= 32, "vector length < 32"); >>> 7971: use64byteVector |= MaxVectorSize > 32 && AVX3Threshold == 0; >> >> When do you expect AVX3Threshold to be 0? > > As of now when user explicitly pass -XX:AVX3Threshold=0 , default value of AVX3Threshold is 4096. I don't like that you put special meaning on AVX3Threshold=0 and then have to add additional checks for it in places where you check its power of 2. And you don't check such setting in new tests. Actually checking for 0 and power of 2 should be done by flag's constraint. See CodeEntryAlignmentConstraintFunc as example. There is also this strange relation with MaxVectorSize. Also we should consider power level switch for 64 bytes AVX3 vectors. Does it make sense to use it if array length is small (< 4096 default)? ------------- PR: https://git.openjdk.java.net/jdk/pull/61 From kvn at openjdk.java.net Fri Sep 25 21:08:49 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 25 Sep 2020 21:08:49 GMT Subject: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v5] In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 15:55:11 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy. >> 2) Special instruction sequence blocks for copy sizes b/w 32-192 bytes. >> 3) Block copy operation above 192 bytes is performed using destination address aligned PRE-MAIN-POST loop. 
Main loop >> copies 192 byte in one iteration and tail part fall over special instruction sequence blocks. 4) Both small copy block >> and aligned loop use 32 byte vector register to prevent and frequency penalty for copy sizes less than AVX3Threshold. >> 5) For block size above AVX3Theshold both special blocks and loop operate using 64 byte register. 6) In case user >> sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient REP MOVS for copy >> sizes above 4096 bytes. JMH Results: >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java >> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt]() >> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt]() > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8252847 : Modifying file permission to resolve jcheck failure. The main concern which is not clear in these changes is ZMM usage which will lower frequency and case performance regression for small arrays. That is why AVX3Threshold is set to 4096 bytes by default. Allowing and checking for 0 AVX3Threshold value contradicts that. Would be nice to have clear comment/explanation about that. I also propose to use Flag constraint() functionality for checking AVX3Threshold value instead of runtime checks everywhere. Separate RFE, please. src/hotspot/cpu/x86/assembler_x86.cpp line 2593: > 2591: > 2592: void Assembler::evmovdqu(XMMRegister dst, KRegister mask, Address src, int vector_len, int type) { > 2593: assert(VM_Version::supports_avx512vlbw(), ""); I suggest to add assert to these 2 new instruction to check 'type' value to make sure only expected types are passed. 
src/hotspot/cpu/x86/assembler_x86.cpp line 2596: > 2594: InstructionMark im(this); > 2595: bool wide = type == T_SHORT || type == T_LONG || type == T_CHAR; > 2596: bool bwinstr = type == T_BYTE || type == T_SHORT || type == T_CHAR; 'bwinstr' is used only once. You may as well directly set 'prefix' here. (Same in second instruction). src/hotspot/cpu/x86/assembler_x86.cpp line 2595: > 2593: assert(VM_Version::supports_avx512vlbw(), ""); > 2594: InstructionMark im(this); > 2595: bool wide = type == T_SHORT || type == T_LONG || type == T_CHAR; It looks strange but it is correct (I looked on existing evmovdqu* instructions). May be reorder - T_LONG last. Do you consider replacing existing evmovdqu* instructions with these two? src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 1264: > 1262: } > 1263: > 1264: #ifndef PRODUCT macroAssembler_x86.hpp become big. May be we should start thing about splitting arraycopy stubs into separate file. src/hotspot/cpu/x86/stubRoutines_x86.hpp line 36: > 34: enum platform_dependent_constants { > 35: code_size1 = 20000 LP64_ONLY(+10000), // simply increase if too small (assembler will crash if too small) > 36: code_size2 = 35300 LP64_ONLY(+21400) // simply increase if too small (assembler will crash if too small) This is big increase in size! src/hotspot/cpu/x86/vm_version_x86.cpp line 1167: > 1165: > 1166: if (!FLAG_IS_DEFAULT(AVX3Threshold)) { > 1167: if (AVX3Threshold != 0 && !is_power_of_2(AVX3Threshold)) { Consider flag's constraint() instead of runtime these checks. Separate RFE, please. src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.lir.amd64/src/org/graalvm/compiler/lir/amd64/AMD64ArrayCompareToOp.java line 87: > 85: super(TYPE); > 86: > 87: assert useAVX3Threshold == 0 || CodeUtil.isPowerOf2(useAVX3Threshold) : "AVX3Threshold must be power of 2"; You would need to upstream Graal changes. 
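The condition under discussion (a threshold that must be either 0 or a power of two) can be written as one small predicate. The sketch below is an assumption-laden illustration: `is_power_of_two` and `is_valid_avx3_threshold` are hypothetical helper names, and the code deliberately does not reproduce HotSpot's flag constraint function signature, only the check such a constraint would perform.

```cpp
#include <cassert>
#include <cstdint>

// True if exactly one bit is set in value.
constexpr bool is_power_of_two(uint64_t value) {
  return value != 0 && (value & (value - 1)) == 0;
}

// Hypothetical validity check for a threshold value: 0 is accepted as a
// special case, otherwise the value must be a power of two.
constexpr bool is_valid_avx3_threshold(uint64_t value) {
  return value == 0 || is_power_of_two(value);
}
```

Centralizing the check in one place is the point of the review comment: callers would then not need to repeat the `!= 0 && !is_power_of_2(...)` test at every use.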
test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyConjoint.java line 33: > 31: * > 32: * @run main/othervm/timeout=600 -XX:-TieredCompilation -Xbatch -XX:+IgnoreUnrecognizedVMOptions > 33: * -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:ArrayCopyPartialInlineSize=0 -XX:MaxVectorSize=32 -XX:+UnlockDiagnosticVMOptions ArrayCopyPartialInlineSize flag is not defiled in these changes. It seems they need to be included in 8252848 changes. ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/61 From xliu at openjdk.java.net Sat Sep 26 06:29:46 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 26 Sep 2020 06:29:46 GMT Subject: RFR: 8251464: make Node::dump(int depth) support indent Message-ID: Node::dump(depth) indents 2 whitespaces for each level. The function isnot on until the depth of the ideal graph isnot greater than PrintIdealIndentThreshold (0 by default). ------------- Commit messages: - 8251464: make Node::dump(int depth) support indent Changes: https://git.openjdk.java.net/jdk/pull/371/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=371&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8251464 Stats: 49 lines in 5 files changed: 26 ins; 0 del; 23 mod Patch: https://git.openjdk.java.net/jdk/pull/371.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/371/head:pull/371 PR: https://git.openjdk.java.net/jdk/pull/371 From xxinliu at amazon.com Sat Sep 26 07:08:25 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Sat, 26 Sep 2020 07:08:25 +0000 Subject: RFR: 8251464: make Node::dump(int depth) support indent In-Reply-To: <83692f83-9de3-5b8c-7399-7da272690d6e@redhat.com> References: <1598731717217.87517@amazon.com> <44f14e58-06d8-b1d9-baa8-88edfed6dd78@oracle.com> <1599193987757.63967@amazon.com> <5fa23374-dfe0-d9ea-a6b9-d3cf432bab64@oracle.com>, <83692f83-9de3-5b8c-7399-7da272690d6e@redhat.com> Message-ID: <1601104104594.5136@amazon.com> hi, Andrew, I remake this patch and publish it on 
GitHub. https://github.com/openjdk/jdk/pull/371

Indeed, the previous output was a little messy. The problem was that I calculated _indent incorrectly in the breadth-first traversal. This is fixed in this revision; the output is now strictly in BFS order.
https://bugs.openjdk.java.net/secure/attachment/90395/root.log

This feature is intended for dumping a small portion of the ideal graph in the debugger. In that scenario, I think indentation does help my eyes.

Thank you for sharing your experience. Yes, I tried it (sort -n file) and this tip is incredibly effective! The ordered nodes help people quickly find what they are looking for.

I don't want to break anybody's established workflow, so I reset the flag PrintIdealIndentThreshold to 0. That means node.dump won't use any indentation until we set it.
https://openjdk.github.io/cr/?repo=jdk&pr=371&range=00#udiff-0

You guys must have some fancy gdb scripts, Emacs Lisp plugins, handy shell scripts, etc. The only downside is that they are a personal arsenal and may not always be easy to maintain. At the other end of the spectrum, starters like me have a long way to go to bootstrap. Sometimes we end up reinventing the wheel. E.g., I spent a lot of time developing a function to query a node by idx. I finished it, but eventually came across this handy function in node.cpp that does exactly what I did:

  // Call this from debugger with root node as default:
  Node* find_node(const int idx) {
    return Compile::current()->root()->find(idx);
  }

That's why I'd like to add some debugging functionality to the C2 codebase. I think we could collect those handy functions in a separate file. What do you think? I have another two candidates I plan to work on:
1. Dump all nodes, sorted by index.
2. Dump a path from node a to node b. We can do a depth-first search along the du or ud chains.
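The second candidate could look roughly like the sketch below. It is written in Java over a plain adjacency map purely for illustration; C2 itself is C++, and the class NodePathSketch, the method findPath and the map of node indices are all assumptions, not existing HotSpot code. It returns the first path that a depth-first search finds, or an empty list when the two nodes are not connected.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class NodePathSketch {
    private NodePathSketch() {
    }

    // Hypothetical helper: 'edges' maps a node index to the indices of the
    // nodes reachable in one step (for example along def-use edges).
    static List<Integer> findPath(Map<Integer, List<Integer>> edges, int from, int to) {
        Deque<Integer> path = new ArrayDeque<>();
        Set<Integer> visited = new HashSet<>();
        if (dfs(edges, from, to, visited, path)) {
            return new ArrayList<>(path); // ordered from 'from' to 'to'
        }
        return List.of(); // no path between the two nodes
    }

    private static boolean dfs(Map<Integer, List<Integer>> edges, int current, int target,
                               Set<Integer> visited, Deque<Integer> path) {
        if (!visited.add(current)) {
            return false; // already explored, avoids looping on cyclic graphs
        }
        path.addLast(current);
        if (current == target) {
            return true;
        }
        for (int next : edges.getOrDefault(current, List.of())) {
            if (dfs(edges, next, target, visited, path)) {
                return true;
            }
        }
        path.removeLast(); // dead end, backtrack
        return false;
    }
}
```

Searching along the opposite chain direction would only need a different edge map, and the visited set keeps the search from looping on cyclic graphs.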
thanks, --lx ________________________________________ From: Andrew Dinn Sent: Friday, September 4, 2020 2:27 AM To: Tobias Hartmann; Liu, Xin; 'hotspot-compiler-dev at openjdk.java.net' Subject: RE: [EXTERNAL] RFR: 8251464: make Node::dump(int depth) support indent CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi Tobias/Xin On 04/09/2020 10:09, Tobias Hartmann wrote: > thanks for making these changes. That looks good to me. > But lets wait for some more opinions from other reviewers. I'll start by noting that this is not a review, merely user feedback. I use node dumps a lot when debugging C2 and, looking at the supplied example, I don't find the indentation helpful -- in fact, I actually find it slightly disrupts my reading the graph. Also, most of the time I feed graph dumps through a sort process so that nodes end up listed in id order, making it easier to track chains of links in both directions. That re-ordering makes the indentation much less useful. Of course, this is only a report of my way of working. I'm not against the patch per se, so long as it is easy to disable indentation (or have emacs remove it). regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From github.com+8792647+robcasloz at openjdk.java.net Mon Sep 28 06:48:25 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castaneda Lozano) Date: Mon, 28 Sep 2020 06:48:25 GMT Subject: Integrated: 8252219: C2: Randomize IGVN worklist for stress testing In-Reply-To: References: Message-ID: On Fri, 18 Sep 2020 10:28:28 GMT, Roberto Castaneda Lozano wrote: > Add `StressIGVN` option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each > main run of the IGVN loop. 
Also add `GenerateStressSeed` and `StressSeed=N` options to randomly generate or specify the > seed. In either case, the seed is logged if `LogCompilation` is enabled. The new options are declared as > production+diagnostic for consistency with the existing `StressLCM` and `StressGCM` options. This pull request has now been integrated. Changeset: fed3636f Author: Roberto Castaneda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/fed3636f Stats: 229 lines in 10 files changed: 224 ins; 0 del; 5 mod 8252219: C2: Randomize IGVN worklist for stress testing Add 'StressIGVN' option to let C2 randomize IGVN worklist order. When enabled, the worklist is shuffled before each main run of the IGVN loop. Also add 'StressSeed=N' option to specify the seed. If the seed is not specified, a random one is generated. In either case, the seed is logged if 'LogCompilation' is enabled. The new options are declared as production+diagnostic for consistency with the existing 'StressLCM' and 'StressGCM' options. Reviewed-by: kvn, chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/242 From adinn at redhat.com Mon Sep 28 09:07:49 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 28 Sep 2020 10:07:49 +0100 Subject: RFR: 8251464: make Node::dump(int depth) support indent In-Reply-To: <1601104104594.5136@amazon.com> References: <1598731717217.87517@amazon.com> <44f14e58-06d8-b1d9-baa8-88edfed6dd78@oracle.com> <1599193987757.63967@amazon.com> <5fa23374-dfe0-d9ea-a6b9-d3cf432bab64@oracle.com> <83692f83-9de3-5b8c-7399-7da272690d6e@redhat.com> <1601104104594.5136@amazon.com> Message-ID: Hi Xin Liu, On 26/09/2020 08:08, Liu, Xin wrote: > This feature is intended to dump a small portion of ideal graph in > the debugger. In that scenario, I think indentation does help my > eyes. Yes, I think it is most useful when dumping subgraphs at depths between 2 and, say, 5. > Thank you to share your experiences. 
yes, I tried it(sort -n file) > and this tip is incredibly effective! The ordered nodes can help > people to catch what they are looking for quickly. Good. I'm glad sharing my working practice was useful to at least one person :-) > I don't want to break anybody's established workflow, so I reset the > flag PrintIdealIndentThreshold to 0. It means node.dump won't use any > indentation until we set it. > https://openjdk.github.io/cr/?repo=jdk&pr=371&range=00#udiff-0 That's very considerate of you. Thank you. > You guys must have some fancy gdb scripts, emacs lisp plugin or handy > shell scripts etc. The only downside is they are personal arsenal and > it may be not easy to maintain them sometime. On the other side of > spectrum, starters like me need to bootstrap in long way. Sometimes, I'm afraid I don't have a lot more to share. Mostly, I don't write scripts. I usually just write the dump output to a file and then process it with bash, sed and awk code written on the command line. I have written extensive elisp search and formatting functions in the past but not for parsing ideal graphs. > we need to reinvent the wheel. eg. I spent a lot of time to develop a > function to query a node by idx. I finished it but eventually came > across this handy function in node.cpp. that's what I did! > > // Call this from debugger with root node as default: Node* > find_node(const int idx) { return > Compile::current()->root()->find(idx); } Well, now you have taught me something I didn't know in return. Thanks for sharing ;-) > That's why I'd like to put some debug functionalities to c2 codebase. > I think we can collect those handy functions in an individual file. > What do you think? I have another 2 candidates I plan to work on. 1. > dump all node and sorted them by indices 2. dump a path from node a > to node b. We can have a depth-first search along du or ud chains. 
Option 1 is easily achieved by writing the graph dump to a file and passing it through sort -n, so it's not a great step forward.

Option 2 sounds like it would be more useful. Initially I was wondering what you would do when there are multiple paths. Then I thought perhaps the command ought to list all paths in some well-defined order? That would make the case where the nodes are not connected uniform with the cases where there are one or more paths, i.e. print 0 paths, 1 path, 2 paths etc.

regards,

Andrew Dinn
-----------

From jbhateja at openjdk.java.net Mon Sep 28 12:21:01 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Mon, 28 Sep 2020 12:21:01 GMT
Subject: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v6]
In-Reply-To:
References:
Message-ID:

> Summary:
>
> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy.
> 2) Special instruction sequence blocks for copy sizes between 32 and 192 bytes.
> 3) Block copy operations above 192 bytes are performed using a destination-address-aligned PRE-MAIN-POST loop. The main loop copies 192 bytes per iteration, and the tail part falls over to the special instruction sequence blocks.
> 4) Both the small copy blocks and the aligned loop use 32-byte vector registers to prevent any frequency penalty for copy sizes less than AVX3Threshold.
> 5) For block sizes above AVX3Threshold, both the special blocks and the loop operate using 64-byte registers.
> 6) In case the user sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient REP MOVS for copy sizes above 4096 bytes.
JMH Results: > System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz > Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java > Baseline : [http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt]() > WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt]() Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8252847 : Review comments resolution ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/61/files - new: https://git.openjdk.java.net/jdk/pull/61/files/78c4fe73..2a606276 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=61&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=61&range=04-05 Stats: 493 lines in 9 files changed: 264 ins; 200 del; 29 mod Patch: https://git.openjdk.java.net/jdk/pull/61.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/61/head:pull/61 PR: https://git.openjdk.java.net/jdk/pull/61 From joserz at linux.ibm.com Mon Sep 28 12:35:27 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Mon, 28 Sep 2020 09:35:27 -0300 Subject: 8230664: Fix TestInstanceKlassSize for PowerPC Message-ID: <20200928123527.GB6445@pacoca> Hello team! This is an attempt to fix bug JDK-8230664. Please, could you review it? PR: https://github.com/openjdk/jdk/pull/358 Bug: https://bugs.openjdk.java.net/browse/JDK-8230664 Thank you very much, Jose R Ziviani From martin.doerr at sap.com Mon Sep 28 18:11:16 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 28 Sep 2020 18:11:16 +0000 Subject: 8230664: Fix TestInstanceKlassSize for PowerPC In-Reply-To: <20200928123527.GB6445@pacoca> References: <20200928123527.GB6445@pacoca> Message-ID: Hi, @Jose: Thanks for finding this issue. 
@all: The problem is that the implementations in the following files don't fit together: src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java: public static boolean shouldStoreFingerprint() public boolean hasStoredFingerprint() src/hotspot/share/oops/instanceKlass.cpp: bool InstanceKlass::should_store_fingerprint(bool is_hidden_or_anonymous) bool InstanceKlass::has_stored_fingerprint() That breaks it on PPC64. Why do we require a fingerprint when built with "#if INCLUDE_AOT" and Arguments::is_dumping_archive(), but UseAOT is off? I could live with Jose's proposal, but a consistent implementation would be much better. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev retn at openjdk.java.net> On Behalf Of joserz at linux.ibm.com > Sent: Montag, 28. September 2020 14:35 > To: hotspot-compiler-dev at openjdk.java.net > Cc: Langer, Christoph > Subject: 8230664: Fix TestInstanceKlassSize for PowerPC > > Hello team! > > This is an attempt to fix bug JDK-8230664. Please, could you review it? 
> > PR: https://github.com/openjdk/jdk/pull/358 > Bug: https://bugs.openjdk.java.net/browse/JDK-8230664 > > Thank you very much, > > Jose R Ziviani From iignatyev at openjdk.java.net Mon Sep 28 19:53:14 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 28 Sep 2020 19:53:14 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures In-Reply-To: <8n8KvhWjyaRXASTq32-P5KXojlOFB2-JIvPTu88oJRg=.b32b5bd1-5ddc-43a6-bcf0-18b1bc24de26@github.com> References: <8n8KvhWjyaRXASTq32-P5KXojlOFB2-JIvPTu88oJRg=.b32b5bd1-5ddc-43a6-bcf0-18b1bc24de26@github.com> Message-ID: <4IaWMleIYgfWTF-WSCnPHVkUDs8EOInmn8zQlMdOkDM=.0f227f79-7419-4b29-a70a-778e11726968@github.com> On Thu, 10 Sep 2020 12:29:05 GMT, Evgeny Nikitin wrote: > Responding to the comments from pre-Skara thread: > > > test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java: > > I'd prefer invokeAndCompareArrays and invokeAndCheck to be as close as possible: have both of them to accept either > > boolean or Object as 2nd arg; print/throw the same error message > > the invokeAndCheck is very generic, it can be called with different objects and expect any kind of result, not only > boolean. Therefore its output format radically differs from what an array-comparator should show. I am not sure I understand what you mean... 1. granted you can't change `invokeAndCheck`'s 2nd argument to bool as there are other values being passed, but you can change `invokeAndCompareArrays` to accept an `Object` and compare expected and actual values by `Object::equals`. 2. even if you can't change output of these two methods to be the same (which I so far failed to see why), you still can change `invokeAndCheck`'s `message` var to include actual and expected values in the same way as `invokeAndCompareArrays` does. 
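To make the suggestion above concrete, here is a minimal sketch of a shared check that compares expected and actual values with Objects.equals and always reports both. The class ResultCheckSketch and the method describeMismatch are hypothetical and do not exist in TestStringIntrinsics.java; the message format is copied from the sample output posted in this thread.

```java
import java.util.Objects;
import java.util.Optional;

public final class ResultCheckSketch {
    private ResultCheckSketch() {
    }

    // Hypothetical helper: returns an error message when the values differ,
    // or an empty Optional when they are equal. Accepting Object lets both
    // boolean results and other result types share one code path.
    static Optional<String> describeMismatch(String methodName, Object expected, Object actual) {
        if (Objects.equals(expected, actual)) {
            return Optional.empty();
        }
        return Optional.of(String.format(
                "Result: (%s) of '%s' is not equal to expected (%s)",
                actual, methodName, expected));
    }
}
```

A caller could throw a RuntimeException with the returned message, so that both check methods fail with identically formatted output.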
> > maybe I'm missing smth, but I don't understand why ArrayCodec supports only char and byte arrays; and hence I don't
> > understand why you need ArrayCodec::of methods, as you can simply do new
> > ArrayCodec(Arrays.stream(a).collect(Collectors.toList())) where a is an array of any type
>
> for Object arrays, one can use that.
> for integer primitives one needs Arrays.stream(a).boxed().collect(Collectors.toList()), please note 'boxed' - it is
> required and not generic. for bytes or chars, there is none (no overloaded methods in Arrays.stream(a));
> To sum up, I can't see how with the given type system and utilities set I can make it in a better, less wordy way. I've
> added int and long overloads, support for String and Object arrays to make it more complete.

you don't need `ArrayCodec::of(Object array)` anymore, do you?

> > it seems that ArrayCodec should be an inner static class of ArrayDiff
>
> I would argue that - I find it useful for printing arrays (and this usage has been utilised in the
> TestStringIntrinsics.java). In addition, I don't like the practice of making such huge classes inner classes as this
> reduces readability and modularity.

oki.

> Other issues have been fixed. I added support for int, long, Object and String arrays.

-------------

PR: https://git.openjdk.java.net/jdk/pull/112

From iignatyev at openjdk.java.net Mon Sep 28 20:03:18 2020
From: iignatyev at openjdk.java.net (Igor Ignatyev)
Date: Mon, 28 Sep 2020 20:03:18 GMT
Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures
In-Reply-To:
References:
Message-ID:

On Thu, 10 Sep 2020 12:20:05 GMT, Evgeny Nikitin wrote:

> pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html)
>
> Error reporting was improved by writing C-style escaped string representations for the variables passed to the
> methods being tested. For array comparisons, a dedicated diff-formatter was implemented.
> Sample output for comparing byte arrays (with artificial failure): > ----------System.err:(21/1553)---------- > Result: (false) of 'arrayEqualsB' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) > at ... stack trace continues - E.N. > Sample output for comparing char arrays: > ----------System.err:(21/1579)*---------- > Result: (false) of 'arrayEqualsC' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... > ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... > ^^^^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) > at > ... and so on - E.N. > > testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. Changes requested by iignatyev (Reviewer). test/lib-test/jdk/test/lib/format/ArrayDiffTest.java line 106: > 104: "[122]%n" + > 105: "[ 7, 8, 9, 10, 125, 12, 13]%n" + > 106: " ^^^"); as far as I remember, the commonly used practice is to align all these lines. (there are other places w/ the same "problem") test/lib/jdk/test/lib/format/ArrayCodec.java line 260: > 258: if (delta > 0) { > 259: element = Format.paddingForWidth(delta) + element; > 260: } wrong indent test/lib/jdk/test/lib/format/Format.java line 109: > 107: */ > 108: public static String paddingForWidth(int width) { > 109: return new String(" ").repeat(width); why not just? 
Suggestion: return " ".repeat(width); test/lib/jdk/test/lib/format/ArrayDiff.java line 83: > 81: public static class Defaults { > 82: final static int WIDTH = 80; > 83: final static int CONTEXT_BEFORE = 2; either these constants should be `public`, or `Defaults` class should be `package-private`, otherwise, you get a public class w/ no public fields. ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From jrose at openjdk.java.net Mon Sep 28 20:50:08 2020 From: jrose at openjdk.java.net (John R Rose) Date: Mon, 28 Sep 2020 20:50:08 GMT Subject: RFR: 8223051: support loops with long (64b) trip counts In-Reply-To: References: Message-ID: On Wed, 23 Sep 2020 09:08:59 GMT, Roland Westrelin wrote: > Last webrev was: > > https://cr.openjdk.java.net/~roland/8223051/webrev.03/ > > This PR includes a few minor changes: > > - The change in callnode.cpp that Vladimir requested in: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039551.html > > - Extra comments that John requested in: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039621.html > > - A couple extra counters to collect more detailed statistics > > 8252696 (Loop unswitching may cause out of bound array load to be > executed) was the only bug that was uncovered by extended testing and > it's fixed now. > > This was previously reviewed by Tobias, Vladimir and John. Given the > last changes were either requested by reviewers or a straighforward > improvement to statistics, and unless someone objects, I intend to > push this in the next few days with the reviewer list I just > mentioned. Good to go. Thanks for patiently working all the issues. ------------- Marked as reviewed by jrose (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/318

From jiefu at openjdk.java.net Mon Sep 28 23:55:35 2020
From: jiefu at openjdk.java.net (Jie Fu)
Date: Mon, 28 Sep 2020 23:55:35 GMT
Subject: RFR: 8253748: StressIGV tests fail with release VMs
Message-ID:

The following StressIGV tests fail with release VMs:
- compiler/arguments/TestStressIGVNOptions.java
- compiler/debug/TestGenerateStressSeed.java
- compiler/debug/TestStressIGVN.java

The reason is that the VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions.

The fix adds the '-XX:+UnlockDiagnosticVMOptions' option.
Also, compiler/debug/TestStressIGVN.java should only be available for debug VMs, since it depends on -XX:+TraceIterativeGVN.

-------------

Commit messages:
- 8253748: StressIGV tests fail with release VMs

Changes: https://git.openjdk.java.net/jdk/pull/390/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=390&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8253748
Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod
Patch: https://git.openjdk.java.net/jdk/pull/390.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/390/head:pull/390

PR: https://git.openjdk.java.net/jdk/pull/390

From luhenry at openjdk.java.net Tue Sep 29 04:52:29 2020
From: luhenry at openjdk.java.net (Ludovic Henry)
Date: Tue, 29 Sep 2020 04:52:29 GMT
Subject: RFR: 8253757: Add LLVM-based backend for hsdis
Message-ID: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com>

When bringing up Hotspot on new platforms, it is not always possible to compile hsdis because gcc is not yet available, for example on Windows-AArch64 and macOS-AArch64. For some such platforms, it is possible to use LLVM as an alternative backend, as it also supports a disassembler feature.
------------- Commit messages: - 8253757: Add LLVM-based backend for hsdis Changes: https://git.openjdk.java.net/jdk/pull/392/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=392&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253757 Stats: 534 lines in 3 files changed: 534 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/392.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/392/head:pull/392 PR: https://git.openjdk.java.net/jdk/pull/392 From thartmann at openjdk.java.net Tue Sep 29 06:03:10 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 29 Sep 2020 06:03:10 GMT Subject: RFR: 8253748: StressIGV tests fail with release VMs In-Reply-To: References: Message-ID: On Mon, 28 Sep 2020 23:47:48 GMT, Jie Fu wrote: > The following StressIGV tests fails with release VMs > - compiler/arguments/TestStressIGVNOptions.java > - compiler/debug/TestGenerateStressSeed.java > - compiler/debug/TestStressIGVN.java > > The reason is that VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions. > > The fix adds '-XX:+UnlockDiagnosticVMOptions' option. > And compiler/debug/TestStressIGVN.java should be only available for debug VMs since it depends > on -XX:+TraceIterativeGVN. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/390 From jbhateja at openjdk.java.net Tue Sep 29 06:04:17 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 29 Sep 2020 06:04:17 GMT Subject: RFR: 8253721: Flag -XX:AVX3Threshold does not accept Zero value Message-ID: AVX3Threshold has been used in various places to enable emitting AVX3 specific instructions in case data size being worked over is greater than 4096 bytes. However, user is free to set the threshold value to Zero based on his workload. In such a case a compile time check is enough to trigger generation of AVX3 instructions. 
In other cases the comparison is done at runtime through a JITed comparison instruction. The patch allows setting AVX3Threshold to a zero value.

-------------

Commit messages:
- 8253721: Flag -XX:AVX3Threshold does not accept Zero value.

Changes: https://git.openjdk.java.net/jdk/pull/394/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=394&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8253721
Stats: 23 lines in 7 files changed: 13 ins; 10 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/394.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/394/head:pull/394

PR: https://git.openjdk.java.net/jdk/pull/394

From jbhateja at openjdk.java.net Tue Sep 29 06:05:53 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 29 Sep 2020 06:05:53 GMT
Subject: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v5]
In-Reply-To:
References:
Message-ID:

On Fri, 25 Sep 2020 20:52:28 GMT, Vladimir Kozlov wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> 8252847 : Modifying file permission to resolve jcheck failure.
>
> test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyConjoint.java line 33:
>
>> 31: *
>> 32: * @run main/othervm/timeout=600 -XX:-TieredCompilation -Xbatch -XX:+IgnoreUnrecognizedVMOptions
>> 33: * -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:ArrayCopyPartialInlineSize=0 -XX:MaxVectorSize=32 -XX:+UnlockDiagnosticVMOptions
>
> ArrayCopyPartialInlineSize flag is not defined in these changes.
> It seems they need to be included in 8252848 changes.

Hi @vnkozlov, I have updated the pull request to cover your comments. Kindly review.
New RFE JDK-8253721 has been created for AVX3Threshold flag related changes (PR-394).
------------- PR: https://git.openjdk.java.net/jdk/pull/61 From jiefu at openjdk.java.net Tue Sep 29 06:41:12 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 29 Sep 2020 06:41:12 GMT Subject: RFR: 8253748: StressIGV tests fail with release VMs In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 06:00:42 GMT, Tobias Hartmann wrote: >> The following StressIGV tests fails with release VMs >> - compiler/arguments/TestStressIGVNOptions.java >> - compiler/debug/TestGenerateStressSeed.java >> - compiler/debug/TestStressIGVN.java >> >> The reason is that VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions. >> >> The fix adds '-XX:+UnlockDiagnosticVMOptions' option. >> And compiler/debug/TestStressIGVN.java should be only available for debug VMs since it depends >> on -XX:+TraceIterativeGVN. > > Looks good and trivial. Thanks @TobiHartmann for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/390 From jiefu at openjdk.java.net Tue Sep 29 06:41:13 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 29 Sep 2020 06:41:13 GMT Subject: Integrated: 8253748: StressIGV tests fail with release VMs In-Reply-To: References: Message-ID: On Mon, 28 Sep 2020 23:47:48 GMT, Jie Fu wrote: > The following StressIGV tests fails with release VMs > - compiler/arguments/TestStressIGVNOptions.java > - compiler/debug/TestGenerateStressSeed.java > - compiler/debug/TestStressIGVN.java > > The reason is that VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions. > > The fix adds '-XX:+UnlockDiagnosticVMOptions' option. > And compiler/debug/TestStressIGVN.java should be only available for debug VMs since it depends > on -XX:+TraceIterativeGVN. This pull request has now been integrated. 
Changeset: 9c17a35e Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/9c17a35e Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod 8253748: StressIGV tests fail with release VMs Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/390 From xliu at openjdk.java.net Tue Sep 29 07:05:01 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 29 Sep 2020 07:05:01 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> Message-ID: On Tue, 29 Sep 2020 04:36:16 GMT, Ludovic Henry wrote: > When bringing up Hotspot onto new platforms, it is not always possible to compile hsdis because gcc is not yet > available. For example, for Windows-AArch64 and macOS-AArch64. > For some such platforms, it is possible to use LLVM as an alternative backend as it also supports a disassembler > feature. src/utils/hsdis-llvm/README line 88: > 86: 2. change Makefile so that `LLVM_PRE` points to `llvm-project/llvm` from the previous step > 87: > 88: 3. `make` should build `hsdis-aarch64.dylib` Shall we have a Makefile in this patch? ------------- PR: https://git.openjdk.java.net/jdk/pull/392 From aph at openjdk.java.net Tue Sep 29 09:24:46 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 29 Sep 2020 09:24:46 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic In-Reply-To: References: Message-ID: On Thu, 17 Sep 2020 06:03:57 GMT, Ard Biesheuvel wrote: >> @ardbiesheuvel : Ard, could you please ack this patch? Thanks. 
> > Acked-by: Ard Biesheuvel > If this feature is not auto-enabled when the SHA3 hardware feature is there, we will have one failure for the following > test: test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java > 15 #-----testresult----- > 16 > description=file:/home/yangfei/github/jdk/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java > 17 elapsed=31546 0:00:31.546 18 end=Mon Sep 21 10:27:58 CST 2020 > 19 environment=regtest > 20 execStatus=Failed. Execution failed: `main' threw exception: java.lang.AssertionError: Option 'UseSHA3Intrinsics' is > expected to have 'true' value Option 'UseSHA3Intrinsics' should be enabled by default > Any suggestions for this? I don't understand your question. There should be two acceptable results, either "Pass" or "Not supported". What else is possible? ------------- PR: https://git.openjdk.java.net/jdk/pull/207 From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 29 10:15:11 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 29 Sep 2020 10:15:11 GMT Subject: RFR: 8253636: C2: Adjust NodeClasses::_max_classes Message-ID: Update `NodeClasses::_max_classes` to the max class id within the enumeration. Update comment and assertion to reflect that `NodeClasses` now uses 32 bits after the addition of `Opaque1` in JDK-8229495.
------------- Commit messages: - 8253636: C2: Adjust NodeClasses::_max_classes Changes: https://git.openjdk.java.net/jdk/pull/397/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=397&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253636 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/397.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/397/head:pull/397 PR: https://git.openjdk.java.net/jdk/pull/397 From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 29 10:15:11 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 29 Sep 2020 10:15:11 GMT Subject: RFR: 8253636: C2: Adjust NodeClasses::_max_classes In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 09:57:38 GMT, Roberto Castañeda Lozano wrote: > Update `NodeClasses::_max_classes` to the max class id within the enumeration. Update comment and assertion to reflect > that `NodeClasses` uses now 32 bits after the addition of `Opaque1` in JDK-8229495. Update NodeClasses::_max_classes to the max class id within the enumeration. Update comment and assertion to reflect that NodeClasses uses now 32 bits after the addition of Opaque1 in JDK-8229495. ------------- PR: https://git.openjdk.java.net/jdk/pull/397 From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 29 10:15:11 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 29 Sep 2020 10:15:11 GMT Subject: RFR: 8253636: C2: Adjust NodeClasses::_max_classes In-Reply-To: References: Message-ID: <_uU2t38hZhRUsfJUdK0EkWxGikZ9V_inROSA3p1d3ng=.192716ae-56bf-4897-a4f9-4d9fdc0fd7b6@github.com> On Tue, 29 Sep 2020 10:08:34 GMT, Roberto Castañeda Lozano wrote: >> Update `NodeClasses::_max_classes` to the max class id within the enumeration.
Update comment and assertion to reflect >> that `NodeClasses` uses now 32 bits after the addition of `Opaque1` in JDK-8229495. > > Update NodeClasses::_max_classes to the max class id within the > enumeration. Update comment and assertion to reflect that NodeClasses uses now > 32 bits after the addition of Opaque1 in JDK-8229495. Tested on hs-tier1-3, both debug and release. ------------- PR: https://git.openjdk.java.net/jdk/pull/397 From phedlin at openjdk.java.net Tue Sep 29 10:20:25 2020 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 29 Sep 2020 10:20:25 GMT Subject: RFR: 8253768: Deleting unused pipe_class definitions in adl-file (x86_64.ad). Message-ID: <8RH5KVjFcGjSjBiFPXHnUH0SnBgLI3aG1w0P0clumL4=.8ba3057a-e35f-42f9-83a0-c577a4103e1b@github.com> This is just removing some dead/unused code. More importantly, this is a test of the new "issue create" support in Skara. ------------- Commit messages: - Deleting unused pipe_class definitions in adl-file (x86_64.ad). Changes: https://git.openjdk.java.net/jdk/pull/74/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=74&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253768 Stats: 30 lines in 1 file changed: 0 ins; 30 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/74.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/74/head:pull/74 PR: https://git.openjdk.java.net/jdk/pull/74 From phedlin at openjdk.java.net Tue Sep 29 10:20:26 2020 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 29 Sep 2020 10:20:26 GMT Subject: RFR: 8253768: Deleting unused pipe_class definitions in adl-file (x86_64.ad). In-Reply-To: <8RH5KVjFcGjSjBiFPXHnUH0SnBgLI3aG1w0P0clumL4=.8ba3057a-e35f-42f9-83a0-c577a4103e1b@github.com> References: <8RH5KVjFcGjSjBiFPXHnUH0SnBgLI3aG1w0P0clumL4=.8ba3057a-e35f-42f9-83a0-c577a4103e1b@github.com> Message-ID: On Tue, 8 Sep 2020 13:05:54 GMT, Patric Hedlin wrote: > This is just removing some dead/unused code. 
More importantly, this is a test of the new "issue create" support in > Skara. Deleting unused pipe_class definitions in adl-file (x86_64.ad). ------------- PR: https://git.openjdk.java.net/jdk/pull/74 From neliasso at openjdk.java.net Tue Sep 29 11:16:38 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 29 Sep 2020 11:16:38 GMT Subject: RFR: 8253768: Deleting unused pipe_class definitions in adl-file (x86_64.ad). In-Reply-To: <8RH5KVjFcGjSjBiFPXHnUH0SnBgLI3aG1w0P0clumL4=.8ba3057a-e35f-42f9-83a0-c577a4103e1b@github.com> References: <8RH5KVjFcGjSjBiFPXHnUH0SnBgLI3aG1w0P0clumL4=.8ba3057a-e35f-42f9-83a0-c577a4103e1b@github.com> Message-ID: <-2asVmhWS_z1NXuuSyc3ZlPMtdHETF6a1M177dkwyBs=.50a805a0-51cc-4e1d-bc27-6231e03caf80@github.com> On Tue, 8 Sep 2020 13:05:54 GMT, Patric Hedlin wrote: > This is just removing some dead/unused code. More importantly, this is a test of the new "issue create" support in > Skara. Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/74 From neliasso at openjdk.java.net Tue Sep 29 11:28:12 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 29 Sep 2020 11:28:12 GMT Subject: RFR: 8253636: C2: Adjust NodeClasses::_max_classes In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 09:57:38 GMT, Roberto Casta?eda Lozano wrote: > Update `NodeClasses::_max_classes` to the max class id within the enumeration. Update comment and assertion to reflect > that `NodeClasses` uses now 32 bits after the addition of `Opaque1` in JDK-8229495. Looks good! ------------- Marked as reviewed by neliasso (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/397 From github.com+8792647+robcasloz at openjdk.java.net Tue Sep 29 11:32:37 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 29 Sep 2020 11:32:37 GMT Subject: RFR: 8253636: C2: Adjust NodeClasses::_max_classes In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 11:25:34 GMT, Nils Eliasson wrote: > Looks good! Thanks for reviewing, Nils. ------------- PR: https://git.openjdk.java.net/jdk/pull/397 From phedlin at openjdk.java.net Tue Sep 29 11:56:32 2020 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 29 Sep 2020 11:56:32 GMT Subject: RFR: 8253768: Deleting unused pipe_class definitions in adl-file (x86_64.ad). In-Reply-To: <-2asVmhWS_z1NXuuSyc3ZlPMtdHETF6a1M177dkwyBs=.50a805a0-51cc-4e1d-bc27-6231e03caf80@github.com> References: <8RH5KVjFcGjSjBiFPXHnUH0SnBgLI3aG1w0P0clumL4=.8ba3057a-e35f-42f9-83a0-c577a4103e1b@github.com> <-2asVmhWS_z1NXuuSyc3ZlPMtdHETF6a1M177dkwyBs=.50a805a0-51cc-4e1d-bc27-6231e03caf80@github.com> Message-ID: On Tue, 29 Sep 2020 11:08:30 GMT, Nils Eliasson wrote: >> This is just removing some dead/unused code. More importantly, this is a test of the new "issue create" support in >> Skara. > > Looks good. Thanks for reviewing, Nils. ------------- PR: https://git.openjdk.java.net/jdk/pull/74 From enikitin at openjdk.java.net Tue Sep 29 12:45:09 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Tue, 29 Sep 2020 12:45:09 GMT Subject: Integrated: 8253607: [mlvm] meth/func/jdi/breakpointOtherStratum: un-problemlist and add randomness keyword In-Reply-To: References: Message-ID: On Thu, 24 Sep 2020 19:30:35 GMT, Evgeny Nikitin wrote: > Created as a replacement for #309 (a new issue has been opened). > Pre-Skara thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039557.html).
> > I tried to reproduce the test failure multiple times with different VM parameters, but it always passes. I suggest > removing it from ProblemList.txt. > The second change marks the test with the randomness keyword from JDK-8243427 (using reproducible random for mlvm > tests). > Tested using mach5 on the 4 platforms, 50 runs each. This pull request has now been integrated. Changeset: 6e5d4f33 Author: Evgeny Nikitin Committer: Igor Ignatyev URL: https://git.openjdk.java.net/jdk/commit/6e5d4f33 Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod 8253607: [mlvm] meth/func/jdi/breakpointOtherStratum: un-problemlist and add randomness keyword Reviewed-by: iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/345 From kvn at openjdk.java.net Tue Sep 29 16:51:34 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 29 Sep 2020 16:51:34 GMT Subject: RFR: 8253721: Flag -XX:AVX3Threshold does not accept Zero value In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 05:53:39 GMT, Jatin Bhateja wrote: > AVX3Threshold has been used in various places to enable emitting AVX3 specific instructions in case the data size being > worked on is greater than 4096 bytes. > However, the user is free to set the threshold value to zero based on their workload. > In such a case a compile time check is enough to trigger generation of AVX3 instructions. In other cases the comparison is > done at runtime through a JITed comparison instruction. > Patch allows setting AVX3Threshold to a zero value. Looks good. Thank you for doing this. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/394 From thartmann at openjdk.java.net Tue Sep 29 16:58:00 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 29 Sep 2020 16:58:00 GMT Subject: RFR: 8253721: Flag -XX:AVX3Threshold does not accept Zero value In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 05:53:39 GMT, Jatin Bhateja wrote: > AVX3Threshold has been used in various places to enable emitting AVX3 specific instructions in case the data size being > worked on is greater than 4096 bytes. > However, the user is free to set the threshold value to zero based on their workload. > In such a case a compile time check is enough to trigger generation of AVX3 instructions. In other cases the comparison is > done at runtime through a JITed comparison instruction. > Patch allows setting AVX3Threshold to a zero value. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/394 From dnsimon at openjdk.java.net Tue Sep 29 17:28:24 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 29 Sep 2020 17:28:24 GMT Subject: RFR: 8252881: [JVMCI] ResolvedJavaType.resolveMethod fails in fastdebug when invoked with a constructor Message-ID: <94mG00rdarnuSrsjlJ2cYFFOkn8pN8edfOylC3TqqTY=.dbf46b0c-1de4-4a2e-a8c2-ecafd680cd03@github.com> This change prevents a call to `CompilerToVM.resolveMethod` with an argument representing a constructor. Such a call triggers an assertion in a fastdebug VM.
------------- Commit messages: - 8252881: [JVMCI] ResolvedJavaType.resolveMethod fails in fastdebug when invoked with a constructor Changes: https://git.openjdk.java.net/jdk/pull/407/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=407&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252881 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/407.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/407/head:pull/407 PR: https://git.openjdk.java.net/jdk/pull/407 From neliasso at openjdk.java.net Tue Sep 29 20:27:59 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 29 Sep 2020 20:27:59 GMT Subject: RFR: 8253822: Remove unused exception_address_is_unpack_entry Message-ID: I have searched the code base without finding any use. Please review, Best regards, Nils ------------- Commit messages: - Remove unused exception_address_is_unpack_entry Changes: https://git.openjdk.java.net/jdk/pull/410/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=410&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253822 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/410.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/410/head:pull/410 PR: https://git.openjdk.java.net/jdk/pull/410 From mdoerr at openjdk.java.net Tue Sep 29 20:57:24 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 29 Sep 2020 20:57:24 GMT Subject: RFR: 8253690: [PPC64] Use flag kind diagnostic for platform specific flags Message-ID: Current platform implementation (globals_ppc.hpp) uses regular product flags for almost everything. Most platform specific flags were never intended for official support. They are only there to diagnose issues and find workarounds. So flag kind "diagnostic" fits better for them. Note that I rearranged a couple of lines when looking at the diff. 
My actual change is what is described here: https://bugs.openjdk.java.net/browse/JDK-8253692 ------------- Commit messages: - 8253690: [PPC64] Use flag kind diagnostic for platform specific flags Changes: https://git.openjdk.java.net/jdk/pull/413/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=413&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253690 Stats: 36 lines in 1 file changed: 11 ins; 10 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/413.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/413/head:pull/413 PR: https://git.openjdk.java.net/jdk/pull/413 From mdoerr at openjdk.java.net Tue Sep 29 20:59:53 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 29 Sep 2020 20:59:53 GMT Subject: RFR: 8253689: [s390] Use flag kind diagnostic for platform specific flags Message-ID: Current platform implementation (globals_s390.hpp) uses regular product flags for everything. These platform specific flags were never intended for official support. They are only there to diagnose issues and find workarounds. So flag kind "diagnostic" fits better. 
CSR: https://bugs.openjdk.java.net/browse/JDK-8253691 ------------- Commit messages: - 8253689: [s390] Use flag kind diagnostic for platform specific flags Changes: https://git.openjdk.java.net/jdk/pull/414/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=414&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253689 Stats: 18 lines in 1 file changed: 0 ins; 0 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/414.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/414/head:pull/414 PR: https://git.openjdk.java.net/jdk/pull/414 From valeriep at openjdk.java.net Tue Sep 29 22:25:24 2020 From: valeriep at openjdk.java.net (Valerie Peng) Date: Tue, 29 Sep 2020 22:25:24 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 09:22:25 GMT, Andrew Haley wrote: >> Acked-by: Ard Biesheuvel > >> If this feature is not auto-enabled when the SHA3 hardware feature is there, we will have one failure for the following >> test: test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java >> 15 #-----testresult----- >> 16 >> description=file:/home/yangfei/github/jdk/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java >> 17 elapsed=31546 0:00:31.546 18 end=Mon Sep 21 10:27:58 CST 2020 >> 19 environment=regtest >> 20 execStatus=Failed. Execution failed: `main' threw exception: java.lang.AssertionError: Option 'UseSHA3Intrinsics' is >> expected to have 'true' value Option 'UseSHA3Intrinsics' should be enabled by default >> Any suggestions for this? > > I don't understand your question. There should be two acceptable results, either "Pass" or "Not supported". What else > is possible? I have looked at the java security changes, i.e. src/java.base/share/classes/sun/security/provider/SHA3.java. It looks fine. 
------------- PR: https://git.openjdk.java.net/jdk/pull/207 From iignatyev at openjdk.java.net Wed Sep 30 00:00:36 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 30 Sep 2020 00:00:36 GMT Subject: RFR: 8238737: remove DeoptimizeAllClassesRate from CTW library Message-ID: Hi all, after [8238247 ](https://bugs.openjdk.java.net/browse/JDK-8238247), `DeoptimizeAllClassesRate` property became completely useless, and as I mentioned in the [review](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-February/036927.html), I didn't (and still don't) have any idea why we added this at the first place. I have looked into the old implementation of CTW (one in `/src/share/vm/classfile/classLoader.cpp`) both at the time Java impl. was added by [8012447](https://bugs.openjdk.java.net/browse/JDK-8012447) and the old impl. was removed by [8214917](https://bugs.openjdk.java.net/browse/JDK-8214917), and there is nothing that would suggest that we had/needed `DeoptimizeAllClassesRate`, it's not and has never been used by any of our tests, so I'm going to remove it. 
------------- Commit messages: - 8238737: remove DeoptimizeAllClassesRate from CTW library Changes: https://git.openjdk.java.net/jdk/pull/418/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=418&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8238737 Stats: 14 lines in 2 files changed: 0 ins; 12 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/418.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/418/head:pull/418 PR: https://git.openjdk.java.net/jdk/pull/418 From luhenry at openjdk.java.net Wed Sep 30 00:58:41 2020 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Wed, 30 Sep 2020 00:58:41 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis [v2] In-Reply-To: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> Message-ID: <7kE7rOEXkO61vLoj_GsgJVjPUe5DqluRkNyJKDUVi0o=.e55d2fd2-ed90-4be1-8e8f-540a86996d50@github.com> > When bringing up Hotspot onto new platforms, it is not always possible to compile hsdis because gcc is not yet > available. For example, for Windows-AArch64 and macOS-AArch64. > For some such platforms, it is possible to use LLVM as an alternative backend as it also supports a disassembler > feature. 
Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: Merge LLVM backend into hsdis and add Makefile support ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/392/files - new: https://git.openjdk.java.net/jdk/pull/392/files/4787a545..9127eb20 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=392&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=392&range=00-01 Stats: 1881 lines in 7 files changed: 757 ins; 1117 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/392.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/392/head:pull/392 PR: https://git.openjdk.java.net/jdk/pull/392 From luhenry at openjdk.java.net Wed Sep 30 00:58:43 2020 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Wed, 30 Sep 2020 00:58:43 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> Message-ID: On Tue, 29 Sep 2020 04:36:16 GMT, Ludovic Henry wrote: > When bringing up Hotspot onto new platforms, it is not always possible to compile hsdis because gcc is not yet > available. For example, for Windows-AArch64 and macOS-AArch64. > For some such platforms, it is possible to use LLVM as an alternative backend as it also supports a disassembler > feature. @navyxliu I've merged the sources into `src/utils/hsdis` and added support to build it in the Makefile. 
------------- PR: https://git.openjdk.java.net/jdk/pull/392 From jiefu at openjdk.java.net Wed Sep 30 03:36:11 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 30 Sep 2020 03:36:11 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) In-Reply-To: References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> Message-ID: <-JIanIoqecCK7WfRPElGiS9Gora2wP3NfpGZ3hNL_Hg=.2ccfa0e9-33c6-430b-9303-66829e97e6ff@github.com> On Tue, 29 Sep 2020 22:00:04 GMT, Erik Joelsson wrote: >> This pull request is for integration of the Vector API. It was previously reviewed under conditions when mercurial was >> used for the source code control system. Review threads can be found here (searching for issue number 8223347 in the >> title): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-April/thread.html >> https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/thread.html >> https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/thread.html >> >> If mercurial was still being used the code would be pushed directly, once the CSR is approved. However, in this case a >> pull request is required and needs explicit reviewer approval. Between the final review and this pull request no code >> has changed, except for that related to merging. > > Build changes look ok. Hi @PaulSandoz , This integration seems to miss https://github.com/openjdk/panama-vector/pull/1, which had fixed crashes on AVX512 machines. Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/367 From jbhateja at openjdk.java.net Wed Sep 30 05:12:01 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 30 Sep 2020 05:12:01 GMT Subject: Integrated: 8253721: Flag -XX:AVX3Threshold does not accept Zero value In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 05:53:39 GMT, Jatin Bhateja wrote: > AVX3Threshold has been used in various places to enable emitting AVX3 specific instructions in case data size being > worked over is greater than 4096 bytes. > However, user is free to set the threshold value to Zero based on his workload. > In such a case a compile time check is enough to trigger generation of AVX3 instructions. In other cases comparison is > done at the runtime though JITed comparison instruction. > Patch allows setting AVX3Threshold to a zero value. This pull request has now been integrated. Changeset: ac02afe9 Author: Jatin Bhateja URL: https://git.openjdk.java.net/jdk/commit/ac02afe9 Stats: 23 lines in 7 files changed: 13 ins; 10 del; 0 mod 8253721: Flag -XX:AVX3Threshold does not accept Zero value Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/394 From roland at openjdk.java.net Wed Sep 30 06:46:29 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 30 Sep 2020 06:46:29 GMT Subject: RFR: 8253566: clazz.isAssignableFrom will return false for interface implementors Message-ID: The code pattern in the test case is optimized as a trichotomy which is wrong given SubTypeCheckNode is a special kind of CmpNode that's not commutative. 
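A minimal Java-level sketch of why the operands of a subtype check cannot be swapped the way the operands of an ordinary comparison can. This is not the test case from the PR; the names `Shape`, `Circle` and `isSubtype` are hypothetical and only illustrate that `isAssignableFrom` is an asymmetric relation:

```java
public class SubtypeCheckAsymmetry {
    // Hypothetical interface and implementor, for illustration only.
    interface Shape {}
    static final class Circle implements Shape {}

    // True if 'sub' is a subtype of 'sup'. Argument order matters:
    // the relation is not symmetric.
    static boolean isSubtype(Class<?> sub, Class<?> sup) {
        return sup.isAssignableFrom(sub);
    }

    public static void main(String[] args) {
        // An implementor is a subtype of its interface...
        System.out.println(isSubtype(Circle.class, Shape.class));
        // ...but the interface is not a subtype of its implementor.
        System.out.println(isSubtype(Shape.class, Circle.class));
    }
}
```

Swapping the two arguments changes the answer, which is the property that makes a rewrite assuming commutative operands unsafe for this kind of check.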
------------- Commit messages: - comment - test - trichotomy opt should not be applied to subtype check Changes: https://git.openjdk.java.net/jdk/pull/422/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=422&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253566 Stats: 71 lines in 2 files changed: 70 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/422.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/422/head:pull/422 PR: https://git.openjdk.java.net/jdk/pull/422 From stuefe at openjdk.java.net Wed Sep 30 06:54:58 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 30 Sep 2020 06:54:58 GMT Subject: RFR: 8253689: [s390] Use flag kind diagnostic for platform specific flags In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 20:52:17 GMT, Martin Doerr wrote: > Current platform implementation (globals_s390.hpp) uses regular product flags for everything. > These platform specific flags were never intended for official support. They are only there to diagnose issues and find > workarounds. So flag kind "diagnostic" fits better. > > CSR: https://bugs.openjdk.java.net/browse/JDK-8253691 Hi Martin, looks simple enough. Make sure that all scripts and tests (also our internal? eg benchmarks) now pass -XX:+UnlockDiagnosticVMOptions. Cheers, Thomas ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/414 From stuefe at openjdk.java.net Wed Sep 30 07:01:49 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 30 Sep 2020 07:01:49 GMT Subject: RFR: 8253690: [PPC64] Use flag kind diagnostic for platform specific flags In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 20:49:01 GMT, Martin Doerr wrote: > Current platform implementation (globals_ppc.hpp) uses regular product flags for almost everything. > Most platform specific flags were never intended for official support. They are only there to diagnose issues and find > workarounds. 
So flag kind "diagnostic" fits better for them. > > Note that I rearranged a couple of lines when looking at the diff. > My actual change is what is described here: https://bugs.openjdk.java.net/browse/JDK-8253692 Hi Martin, make sure you check up on the places the switches are used and pass UnlockDiagnosticVMOptions. Cheers, Thomas src/hotspot/cpu/ppc/globals_ppc.hpp line 95: > 93: \ > 94: /* Power 8: Configure Data Stream Control Register. */ \ > 95: product(uint64_t, DSCR_PPC64, (uint64_t)-1, \ Has nothing to do with this issue. But I leave it up to you. src/hotspot/cpu/ppc/globals_ppc.hpp line 116: > 114: \ > 115: /* special instructions */ \ > 116: product(bool, SuperwordUseVSX, false, \ Why leave this one out? src/hotspot/cpu/ppc/globals_ppc.hpp line 150: > 148: \ > 149: product(bool, ZapMemory, false, "Write 0x0101... to empty memory." \ > 150: " Use this to ease debugging.") \ Future cleanup: this feels like it should be in shared code. The usual way to do this is to zap in DEBUG. ------------- PR: https://git.openjdk.java.net/jdk/pull/413 From xliu at openjdk.java.net Wed Sep 30 07:12:31 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 30 Sep 2020 07:12:31 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis [v2] In-Reply-To: <7kE7rOEXkO61vLoj_GsgJVjPUe5DqluRkNyJKDUVi0o=.e55d2fd2-ed90-4be1-8e8f-540a86996d50@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <7kE7rOEXkO61vLoj_GsgJVjPUe5DqluRkNyJKDUVi0o=.e55d2fd2-ed90-4be1-8e8f-540a86996d50@github.com> Message-ID: On Wed, 30 Sep 2020 00:58:41 GMT, Ludovic Henry wrote: >> When bringing up Hotspot onto new platforms, it is not always possible to compile hsdis because gcc is not yet >> available. For example, for Windows-AArch64 and macOS-AArch64. >> For some such platforms, it is possible to use LLVM as an alternative backend as it also supports a disassembler >> feature. 
> > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > Merge LLVM backend into hsdis and add Makefile support src/utils/hsdis/hsdis.cpp line 79: > 77: > 78: #ifndef bool > 79: #define bool int If we switch to cpp, do you still need this? ------------- PR: https://git.openjdk.java.net/jdk/pull/392 From chagedorn at openjdk.java.net Wed Sep 30 07:25:15 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 30 Sep 2020 07:25:15 GMT Subject: RFR: 8253822: Remove unused exception_address_is_unpack_entry In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 20:14:25 GMT, Nils Eliasson wrote: > I have searched the code base without finding any use. > > Please review, > > Best regards, > Nils Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/410 From xliu at openjdk.java.net Wed Sep 30 07:29:52 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 30 Sep 2020 07:29:52 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis [v2] In-Reply-To: <7kE7rOEXkO61vLoj_GsgJVjPUe5DqluRkNyJKDUVi0o=.e55d2fd2-ed90-4be1-8e8f-540a86996d50@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <7kE7rOEXkO61vLoj_GsgJVjPUe5DqluRkNyJKDUVi0o=.e55d2fd2-ed90-4be1-8e8f-540a86996d50@github.com> Message-ID: <76F67wutfIYeeKdNWfpWUd0EsZSbDDVJm-3bqTixFRE=.c4e31f33-272e-43d2-8bf2-af510634489a@github.com> On Wed, 30 Sep 2020 00:58:41 GMT, Ludovic Henry wrote: >> When bringing up Hotspot onto new platforms, it is not always possible to compile hsdis because gcc is not yet >> available. For example, for Windows-AArch64 and macOS-AArch64. >> For some such platforms, it is possible to use LLVM as an alternative backend as it also supports a disassembler >> feature.
> > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > Merge LLVM backend into hsdis and add Makefile support src/utils/hsdis/Makefile line 198: > 196: $(TARGET_DIR)/libiberty/libiberty.a > 197: else > 198: LIBRARIES/amd64 = LLVMX86Disassembler LLVMX86AsmParser LLVMX86CodeGen LLVMCFGuard LLVMGlobalISel LLVMSelectionDAG \ To disassemble code, I don't think we have to link so many libraries. It looks like code only explicitly depends LLVMMCDisassembler and LLVMTarget here. If we do need to link those libraries, how about we just use `llvm-config --libs`. If we declare so many names here, the Makefile is subject to LLVM. In history, LLVM refactored a lot. ------------- PR: https://git.openjdk.java.net/jdk/pull/392 From chagedorn at openjdk.java.net Wed Sep 30 07:52:12 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 30 Sep 2020 07:52:12 GMT Subject: RFR: 8251544: CTW: C2 fails with assert(no_dead_loop) failed: dead loop detected Message-ID: <8EdEziBkTtIKsise_9EP7uwgBoOfO-9IBAe3UQ3ydsc=.d01f5f9d-f05c-4a4b-b715-6ea8135ed1a3@github.com> In the testcase, we hit a dead data loop while dead nodes are being removed on a dead control path. A region node, lets say `r`, that represents a loop (has three inputs: self, a loop entry and backedge) but is not a loop node, yet, becomes dead when its entry control is replaced by top in the first IGVN run after parsing. All its phi nodes also become dead by replacing the corresponding entry control input by top. The problem is now that some phi nodes of `r` are processed by IGVN before the corresponding (dead) region node `r`. In `PhiNode::Ideal`, we actually check if there is a dead loop. But after some of the phis of `r` were already removed, `is_unsafe_data_reference()` on [L1939](https://github.com/openjdk/jdk/compare/master...chhagedorn:JDK-8251544#diff-efe6b3bde157b833249cd9a8d8b6645bL1939) returns false. 
As a result, we do not realize in `PhiNode::Ideal` for one of the remaining phis that it is actually dead and we apply the normal propagation in `PhiNode::Identity` which replaces the phi by its only non-top input. We later apply an additional optimization for a `LoadNode` input of an `AddINode` in which we replace the `LoadNode` by the `AddINode` itself (because of the data loop) and we end up with a dead data loop and fail with the dead loop assertion. The order in which the nodes are processed in IGVN is crucial. I could only reproduce this bug with a very specific CTW-like testcase which makes this quite an edge case. I can think of two ways to fix this: 1. Delay phi nodes which have only one non-top input left and whose region node is not a loop node, has three inputs of which the entry control is top, and has not been processed by IGVN yet. 2. Extend the dead data loop check in `PhiNode::Ideal()` to already do an unreachable region check as done in `RegionNode::Ideal()`. The result can be cached as a region should not become reachable anymore once we have figured out it is dead. I chose the second approach because I think it is preferable as we are not delaying IGVN and all other phi nodes can already use the information of a dead region before it is processed. This avoids any further unwanted optimizations on dead nodes.
Thanks, Christian ------------- Commit messages: - 8251544: CTW: C2 fails with assert(no_dead_loop) failed: dead loop detected Changes: https://git.openjdk.java.net/jdk/pull/425/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=425&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8251544 Stats: 264 lines in 3 files changed: 233 ins; 11 del; 20 mod Patch: https://git.openjdk.java.net/jdk/pull/425.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/425/head:pull/425 PR: https://git.openjdk.java.net/jdk/pull/425 From shade at openjdk.java.net Wed Sep 30 07:52:24 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 30 Sep 2020 07:52:24 GMT Subject: RFR: 8238737: remove DeoptimizeAllClassesRate from CTW library In-Reply-To: References: Message-ID: <12cBl0SadW-mneAnsJ8Yetk5UeUr436IQunnrsGseP8=.24406ae2-3b40-465e-b1bd-8e48efed4d48@github.com> On Tue, 29 Sep 2020 23:53:38 GMT, Igor Ignatyev wrote: > Hi all, > > after [8238247 ](https://bugs.openjdk.java.net/browse/JDK-8238247), `DeoptimizeAllClassesRate` property became > completely useless, and as I mentioned in the > [review](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-February/036927.html), I didn't (and still > don't) have any idea why we added this at the first place. I have looked into the old implementation of CTW (one in > `/src/share/vm/classfile/classLoader.cpp`) both at the time Java impl. was added by > [8012447](https://bugs.openjdk.java.net/browse/JDK-8012447) and the old impl. was removed by > [8214917](https://bugs.openjdk.java.net/browse/JDK-8214917), and there is nothing that would suggest that we had/needed > `DeoptimizeAllClassesRate`, it's not and has never been used by any of our tests, so I'm going to remove it. Marked as reviewed by shade (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/418 From felix.yang at huawei.com Wed Sep 30 08:25:54 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 30 Sep 2020 08:25:54 +0000 Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic In-Reply-To: References: Message-ID: Hi, > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > retn at openjdk.java.net] On Behalf Of Andrew Haley > Sent: Tuesday, September 29, 2020 5:25 PM > To: core-libs-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; > hotspot-compiler-dev at openjdk.java.net; security-dev at openjdk.java.net > Subject: Re: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic > > On Thu, 17 Sep 2020 06:03:57 GMT, Ard Biesheuvel > wrote: > > >> @ardbiesheuvel : Ard, could you please ack this patch? Thanks. > > > > Acked-by: Ard Biesheuvel > > > If this feature is not auto-enabled when the SHA3 hardware feature is > > there, we will have one failure for the following > > test: > > test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOp > > tionOnSupportedCPU.java > > 15 #-----testresult----- > > 16 > > description=file:/home/yangfei/github/jdk/test/hotspot/jtreg/compiler/ > > intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java > > 17 elapsed=31546 0:00:31.546 18 end=Mon Sep 21 10:27:58 CST 2020 > > 19 environment=regtest > > 20 execStatus=Failed. Execution failed: `main' threw exception: > > java.lang.AssertionError: Option 'UseSHA3Intrinsics' is expected to > > have 'true' value Option 'UseSHA3Intrinsics' should be enabled by default > Any suggestions for this? > > I don't understand your question. There should be two acceptable results, > either "Pass" or "Not supported". What else is possible? 
This new test is similar to existing test cases in the same directory like: TestUseSHA256IntrinsicsOptionOnSupportedCPU.java Currently, we run this new test using QEMU which supports the aarch64 SHA3 feature. This new test is expecting option 'UseSHA3Intrinsics' to be auto-enabled by the JVM when it detects the availability of the SHA3 feature. But that was not satisfied since we explicitly disable this option even when the SHA3 feature is available. + if (UseSHA && (_features & CPU_SHA3)) { + // Do not auto-enable UseSHA3Intrinsics until it has been fully tested on hardware + // if (FLAG_IS_DEFAULT(UseSHA3Intrinsics)) { + // FLAG_SET_DEFAULT(UseSHA3Intrinsics, true); + // } So I can think of several choices: 1. Do not add this new test for now; 2. Keep this new test and add one extra requirement for it: @requires os.arch!="aarch64"; (We could remove this when UseSHA3Intrinsics has been fully tested on real hardware and thus could be auto-enabled.) 3. Auto-enable UseSHA3Intrinsics when the SHA3 feature is available for now; Is there a better way?
Thanks, Felix From felix.yang at huawei.com Wed Sep 30 08:28:13 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 30 Sep 2020 08:28:13 +0000 Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic In-Reply-To: References: Message-ID: Hi, > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-retn at openjdk.java.net] On Behalf > Of Valerie Peng > Sent: Wednesday, September 30, 2020 6:25 AM > To: core-libs-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; > hotspot-compiler-dev at openjdk.java.net; security-dev at openjdk.java.net > Subject: Re: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic > > On Tue, 29 Sep 2020 09:22:25 GMT, Andrew Haley > wrote: > > >> Acked-by: Ard Biesheuvel > > > >> If this feature is not auto-enabled when the SHA3 hardware feature is > >> there, we will have one failure for the following > >> test: > >> test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsO > >> ptionOnSupportedCPU.java > >> 15 #-----testresult----- > >> 16 > >> description=file:/home/yangfei/github/jdk/test/hotspot/jtreg/compiler > >> /intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java > >> 17 elapsed=31546 0:00:31.546 18 end=Mon Sep 21 10:27:58 CST 2020 > >> 19 environment=regtest > >> 20 execStatus=Failed. Execution failed: `main' threw exception: > >> java.lang.AssertionError: Option 'UseSHA3Intrinsics' is expected to > >> have 'true' value Option 'UseSHA3Intrinsics' should be enabled by default > Any suggestions for this? > > > > I don't understand your question. There should be two acceptable > > results, either "Pass" or "Not supported". What else is possible? > > I have looked at the java security changes, i.e. > src/java.base/share/classes/sun/security/provider/SHA3.java. It looks fine. Thanks for reviewing this part. 
Best Regards, Felix From mdoerr at openjdk.java.net Wed Sep 30 08:44:37 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 30 Sep 2020 08:44:37 GMT Subject: RFR: 8253690: [PPC64] Use flag kind diagnostic for platform specific flags In-Reply-To: References: Message-ID: <9hyUqSgPDlfExqPsXJ6ouwKqf0MDuOMGU-u2Hyn3__c=.22fc3e80-5d2e-4882-9a6f-0967b51f0ed1@github.com> On Wed, 30 Sep 2020 06:53:38 GMT, Thomas Stuefe wrote: >> Current platform implementation (globals_ppc.hpp) uses regular product flags for almost everything. >> Most platform specific flags were never intended for official support. They are only there to diagnose issues and find >> workarounds. So flag kind "diagnostic" fits better for them. >> >> Note that I rearranged a couple of lines when looking at the diff. >> My actual change is what is described here: https://bugs.openjdk.java.net/browse/JDK-8253692 > > src/hotspot/cpu/ppc/globals_ppc.hpp line 95: > >> 93: \ >> 94: /* Power 8: Configure Data Stream Control Register. */ \ >> 95: product(uint64_t, DSCR_PPC64, (uint64_t)-1, \ > > Has nothing to do with this issue. But I leave it up to you. Just cleanup, no real change. ------------- PR: https://git.openjdk.java.net/jdk/pull/413 From mdoerr at openjdk.java.net Wed Sep 30 08:49:13 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 30 Sep 2020 08:49:13 GMT Subject: RFR: 8253690: [PPC64] Use flag kind diagnostic for platform specific flags In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 06:54:51 GMT, Thomas Stuefe wrote: >> Current platform implementation (globals_ppc.hpp) uses regular product flags for almost everything. >> Most platform specific flags were never intended for official support. They are only there to diagnose issues and find >> workarounds. So flag kind "diagnostic" fits better for them. >> >> Note that I rearranged a couple of lines when looking at the diff. 
>> My actual change is what is described here: https://bugs.openjdk.java.net/browse/JDK-8253692
>
> src/hotspot/cpu/ppc/globals_ppc.hpp line 116:
>
>> 114: \
>> 115: /* special instructions */ \
>> 116: product(bool, SuperwordUseVSX, false, \
>
> Why leave this one out?

SuperwordUseVSX switches usage of vector registers in C2. It's more fundamental than simple instruction usage switches.
It's comparable to UseAVX on x86, which is a product flag, too. So I prefer keeping it product for now. Are you ok with it?

> src/hotspot/cpu/ppc/globals_ppc.hpp line 150:
>
>> 148: \
>> 149: product(bool, ZapMemory, false, "Write 0x0101... to empty memory." \
>> 150: " Use this to ease debugging.") \
>
> Future cleanup: this feels like it should be in shared code. The usual way to do this is to zap in DEBUG.

Yeah, sounds like this functionality deserves some rework. But I only want to make it diagnostic with this change.

-------------

PR: https://git.openjdk.java.net/jdk/pull/413

From mdoerr at openjdk.java.net Wed Sep 30 08:51:31 2020
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Wed, 30 Sep 2020 08:51:31 GMT
Subject: RFR: 8253689: [s390] Use flag kind diagnostic for platform specific flags
In-Reply-To:
References:
Message-ID:

On Wed, 30 Sep 2020 06:52:15 GMT, Thomas Stuefe wrote:

>> Current platform implementation (globals_s390.hpp) uses regular product flags for everything.
>> These platform specific flags were never intended for official support. They are only there to diagnose issues and find
>> workarounds. So flag kind "diagnostic" fits better.
>>
>> CSR: https://bugs.openjdk.java.net/browse/JDK-8253691
>
> Hi Martin,
>
> looks simple enough. Make sure that all scripts and tests (also our internal, e.g. benchmarks) now
> pass -XX:+UnlockDiagnosticVMOptions.
> Cheers, Thomas

Hi Thomas,
thanks for the review. I haven't found any tests which use one of these switches. Our tests have passed.
Best regards, Martin ------------- PR: https://git.openjdk.java.net/jdk/pull/414 From stuefe at openjdk.java.net Wed Sep 30 08:57:49 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 30 Sep 2020 08:57:49 GMT Subject: RFR: 8253690: [PPC64] Use flag kind diagnostic for platform specific flags In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 20:49:01 GMT, Martin Doerr wrote: > Current platform implementation (globals_ppc.hpp) uses regular product flags for almost everything. > Most platform specific flags were never intended for official support. They are only there to diagnose issues and find > workarounds. So flag kind "diagnostic" fits better for them. > > Note that I rearranged a couple of lines when looking at the diff. > My actual change is what is described here: https://bugs.openjdk.java.net/browse/JDK-8253692 Looks good Martin. ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/413 From stuefe at openjdk.java.net Wed Sep 30 08:57:50 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 30 Sep 2020 08:57:50 GMT Subject: RFR: 8253690: [PPC64] Use flag kind diagnostic for platform specific flags In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 08:44:58 GMT, Martin Doerr wrote: >> src/hotspot/cpu/ppc/globals_ppc.hpp line 116: >> >>> 114: \ >>> 115: /* special instructions */ \ >>> 116: product(bool, SuperwordUseVSX, false, \ >> >> Why leave this one out? > > SuperwordUseVSX switches usage of vector registers in C2. It's more fundamental than simple instruction usage switches. > It's comparable to UseAVX on x86 which is product, too. So I prefer keeping it product for now. Are you ok with it? Sure, just curious. Leave it up to you. 
------------- PR: https://git.openjdk.java.net/jdk/pull/413 From phedlin at openjdk.java.net Wed Sep 30 09:09:37 2020 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Wed, 30 Sep 2020 09:09:37 GMT Subject: Integrated: 8253768: Deleting unused pipe_class definitions in adl-file (x86_64.ad). In-Reply-To: <8RH5KVjFcGjSjBiFPXHnUH0SnBgLI3aG1w0P0clumL4=.8ba3057a-e35f-42f9-83a0-c577a4103e1b@github.com> References: <8RH5KVjFcGjSjBiFPXHnUH0SnBgLI3aG1w0P0clumL4=.8ba3057a-e35f-42f9-83a0-c577a4103e1b@github.com> Message-ID: <9wJcxBSf6v72K9bfVHQoICOXT-ATp2D33TOSjk4pkxI=.059fb149-aa19-4c08-bd42-19c127b1cb53@github.com> On Tue, 8 Sep 2020 13:05:54 GMT, Patric Hedlin wrote: > This is just removing some dead/unused code. More importantly, this is a test of the new "issue create" support in > Skara. This pull request has now been integrated. Changeset: 04775f11 Author: Patric Hedlin URL: https://git.openjdk.java.net/jdk/commit/04775f11 Stats: 30 lines in 1 file changed: 0 ins; 30 del; 0 mod 8253768: Deleting unused pipe_class definitions in adl-file (x86_64.ad). Reviewed-by: neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/74 From vlivanov at openjdk.java.net Wed Sep 30 09:35:09 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 30 Sep 2020 09:35:09 GMT Subject: RFR: 8238737: remove DeoptimizeAllClassesRate from CTW library In-Reply-To: References: Message-ID: <-soZie9MLR6uAeIIMpZsurARK6rY2JvhHgWtSOWiHE8=.91a51388-b2c1-4eba-b849-df75435f5cb2@github.com> On Tue, 29 Sep 2020 23:53:38 GMT, Igor Ignatyev wrote: > Hi all, > > after [8238247 ](https://bugs.openjdk.java.net/browse/JDK-8238247), `DeoptimizeAllClassesRate` property became > completely useless, and as I mentioned in the > [review](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-February/036927.html), I didn't (and still > don't) have any idea why we added this at the first place. 
I have looked into the old implementation of CTW (one in > `/src/share/vm/classfile/classLoader.cpp`) both at the time Java impl. was added by > [8012447](https://bugs.openjdk.java.net/browse/JDK-8012447) and the old impl. was removed by > [8214917](https://bugs.openjdk.java.net/browse/JDK-8214917), and there is nothing that would suggest that we had/needed > `DeoptimizeAllClassesRate`, it's not and has never been used by any of our tests, so I'm going to remove it. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/418 From enikitin at openjdk.java.net Wed Sep 30 11:52:02 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 30 Sep 2020 11:52:02 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v2] In-Reply-To: References: Message-ID: On Mon, 28 Sep 2020 19:55:44 GMT, Igor Ignatyev wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indent in ArrayCodec > > test/lib/jdk/test/lib/format/ArrayCodec.java line 260: > >> 258: if (delta > 0) { >> 259: element = Format.paddingForWidth(delta) + element; >> 260: } > > wrong indent Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Wed Sep 30 11:51:58 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 30 Sep 2020 11:51:58 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v2] In-Reply-To: References: Message-ID: > pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) > > Error reporting was improved by writing a C-style escaped string representations for the variables passed to the > methods being tested. For array comparisons, a dedicated diff-formatter was implemented. 
> Sample output for comparing byte arrays (with artificial failure): > ----------System.err:(21/1553)---------- > Result: (false) of 'arrayEqualsB' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) > at ... stack trace continues - E.N. > Sample output for comparing char arrays: > ----------System.err:(21/1579)*---------- > Result: (false) of 'arrayEqualsC' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... > ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... > ^^^^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) > at > ... and so on - E.N. > > testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. 
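The sample output quoted above can be approximated with a few lines of Java. The following is a rough sketch only, limited to int arrays; the class `ArrayDiffSketch` and its methods are hypothetical and are not the `ArrayDiff` implementation from the pull request, which may align and truncate its output differently.

```java
// Hypothetical sketch of a caret-marked array diff; not the ArrayDiff class from the PR.
final class ArrayDiffSketch {
    /** Index of the first differing element, or -1 if the arrays are equal. */
    static int firstMismatch(int[] a, int[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : common;
    }

    /** Left-pads s with spaces up to the given width. */
    static String pad(String s, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) {
            sb.append(' ');
        }
        return sb.append(s).toString();
    }

    /** Builds a four-line report around the first difference, or "" if the arrays are equal. */
    static String diff(int[] expected, int[] actual, int context) {
        int idx = firstMismatch(expected, actual);
        if (idx < 0) {
            return "";
        }
        int from = Math.max(0, idx - context);
        int to = Math.min(Math.max(expected.length, actual.length) - 1, idx + context);
        StringBuilder first = new StringBuilder();
        StringBuilder second = new StringBuilder();
        StringBuilder marker = new StringBuilder();
        for (int i = from; i <= to; i++) {
            String a = i < expected.length ? Integer.toString(expected[i]) : "";
            String b = i < actual.length ? Integer.toString(actual[i]) : "";
            int width = Math.max(a.length(), b.length());
            String sep = i < to ? ", " : "";
            // Pad both elements to a common width so that the columns line up.
            first.append(pad(a, width)).append(sep);
            second.append(pad(b, width)).append(sep);
            if (i < idx) {
                marker.append(pad("", width + 2));   // spaces under element and separator
            } else if (i == idx) {
                for (int k = 0; k < width; k++) {
                    marker.append('^');              // carets under the differing column
                }
            }
        }
        return "Arrays differ starting from [index: " + idx + "]:\n"
                + first + "\n" + second + "\n" + marker;
    }

    public static void main(String[] args) {
        System.out.println(diff(new int[]{5, 6, 7, 8, 9}, new int[]{5, 6, 125, 8, 9}, 2));
    }
}
```

The point of the slice is the same as in the description above: only the elements within `context` positions of the first mismatch are printed, so the report stays short even for very large arrays.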
Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Fix indent in ArrayCodec ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/112/files - new: https://git.openjdk.java.net/jdk/pull/112/files/ed213112..7e6c0a69 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/112.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/112/head:pull/112 PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Wed Sep 30 11:57:17 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 30 Sep 2020 11:57:17 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v3] In-Reply-To: References: Message-ID: > pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) > > Error reporting was improved by writing a C-style escaped string representations for the variables passed to the > methods being tested. For array comparisons, a dedicated diff-formatter was implemented. > Sample output for comparing byte arrays (with artificial failure): > ----------System.err:(21/1553)---------- > Result: (false) of 'arrayEqualsB' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) > at ... stack trace continues - E.N. 
> Sample output for comparing char arrays: > ----------System.err:(21/1579)*---------- > Result: (false) of 'arrayEqualsC' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... > ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... > ^^^^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) > at > ... and so on - E.N. > > testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Replace new space string with constant ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/112/files - new: https://git.openjdk.java.net/jdk/pull/112/files/7e6c0a69..402fb529 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/112.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/112/head:pull/112 PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Wed Sep 30 11:57:18 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 30 Sep 2020 11:57:18 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v3] In-Reply-To: References: Message-ID: <1jTQLX0cQ3wAa3TkVMb7RwpNAt-jeQ7ZTGvwRQyw-LU=.078e33e2-d72f-47e4-ab10-9283c3155bac@github.com> On Mon, 28 Sep 2020 20:00:12 GMT, Igor Ignatyev wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace new space string with constant > > test/lib/jdk/test/lib/format/ArrayDiff.java line 83: > >> 81: public 
static class Defaults { >> 82: final static int WIDTH = 80; >> 83: final static int CONTEXT_BEFORE = 2; > > either these constants should be `public`, or `Defaults` class should be `package-private`, otherwise, you get a public > class w/ no public fields. Old cpp sins... fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Wed Sep 30 13:33:26 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 30 Sep 2020 13:33:26 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v3] In-Reply-To: References: Message-ID: <6uP_GLWutOyBqeow_RfFYi_hoZfvyVCM44KiCUi0YUA=.15bc3210-1c1d-4de4-be79-84c3d13e1b68@github.com> On Mon, 28 Sep 2020 19:51:50 GMT, Igor Ignatyev wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace new space string with constant > > test/lib-test/jdk/test/lib/format/ArrayDiffTest.java line 106: > >> 104: "[122]%n" + >> 105: "[ 7, 8, 9, 10, 125, 12, 13]%n" + >> 106: " ^^^"); > > as far as I remember, the commonly used practice is to align all these lines. > > (there are other places w/ the same "problem") ... in almost every test :). Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Wed Sep 30 14:30:29 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 30 Sep 2020 14:30:29 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v3] In-Reply-To: References: Message-ID: On Mon, 28 Sep 2020 20:00:35 GMT, Igor Ignatyev wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace new space string with constant > > Changes requested by iignatyev (Reviewer). > 1. 
granted you can't change `invokeAndCheck`'s 2nd argument to bool as there are other values being passed, but you can
> change `invokeAndCompareArrays` to accept an `Object` and compare expected and actual values by `Object::equals`.

Well, I think that would make `invokeAndCompareArrays` look less specific and would confuse a reader by creating a false impression that it could be called with an Object as an expected result. In reality, the method is only called with `equals` of different signatures, and `equals` always returns boolean. It could've been named `invokeEqualsOnArraysAndCompareThem` :)

The other method, `invokeAndCheck`, is different. It can call, for example, `String.concat("abc", "def")` and expect the String "abcdef" as the result. It really does need to be generic.

> 2. even if you can't change output of these two methods to be the same (which I so far failed to see why), you still
> can change `invokeAndCheck`'s `message` var to include actual and expected values in the same way as
> `invokeAndCompareArrays` does.

`invokeAndCompareArrays` / `ArrayDiff` compares two huge arrays and presents a nice short *slice* in the difference area. The short slice is possible because we do a limited task: comparing arrays. `invokeAndCheck`, on the other hand, can have as parameters the `utf16`, a string of 10K symbols. Do we really need a 10K string as output in our log?

> you don't need `ArrayCodec::of(Object array)` anymore, do you?

Unfortunately, it is used in ArrayCodec.format (which is used in TestStringIntrinsics.java) to make it possible to call it with any kind of array without swamping the code with overloads.
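As an illustration of the last point, a single entry point that takes a plain `Object` can handle every array type without one overload per component type. The sketch below uses `java.lang.reflect.Array` for that; this is a swapped-in technique for illustration, since the code under review appears to dispatch on the component type with casts instead. The class name `AnyArrayFormatter` and its `format` method are hypothetical.

```java
import java.lang.reflect.Array;

// Hypothetical sketch; the ArrayCodec.of(Object) in the PR is implemented differently.
final class AnyArrayFormatter {
    /** Formats any array (primitive or reference) that is passed in as a plain Object. */
    static String format(Object array) {
        if (array == null || !array.getClass().isArray()) {
            throw new IllegalArgumentException("not an array: " + array);
        }
        StringBuilder sb = new StringBuilder("[");
        int length = Array.getLength(array);
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            // Array.get boxes primitive elements, so one loop covers every component type.
            sb.append(Array.get(array, i));
        }
        return sb.append(']').toString();
    }
}
```

A caller can then write `AnyArrayFormatter.format(new int[]{1, 2, 3})` or `AnyArrayFormatter.format(new String[]{"a", "b"})` against the same method. The cost is boxing on every element access, which is usually acceptable in test-library code.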
------------- PR: https://git.openjdk.java.net/jdk/pull/112 From iignatyev at openjdk.java.net Wed Sep 30 14:53:22 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 30 Sep 2020 14:53:22 GMT Subject: RFR: 8238737: remove DeoptimizeAllClassesRate from CTW library In-Reply-To: <-soZie9MLR6uAeIIMpZsurARK6rY2JvhHgWtSOWiHE8=.91a51388-b2c1-4eba-b849-df75435f5cb2@github.com> References: <-soZie9MLR6uAeIIMpZsurARK6rY2JvhHgWtSOWiHE8=.91a51388-b2c1-4eba-b849-df75435f5cb2@github.com> Message-ID: <5l9KRtPyWmILx1ycwDPvgVAs3KzPkoHB4GoT-hrSu6Y=.46f70f87-f959-42d4-8aaf-4e0eeb214c93@github.com> On Wed, 30 Sep 2020 09:32:47 GMT, Vladimir Ivanov wrote: >> Hi all, >> >> after [8238247 ](https://bugs.openjdk.java.net/browse/JDK-8238247), `DeoptimizeAllClassesRate` property became >> completely useless, and as I mentioned in the >> [review](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-February/036927.html), I didn't (and still >> don't) have any idea why we added this at the first place. I have looked into the old implementation of CTW (one in >> `/src/share/vm/classfile/classLoader.cpp`) both at the time Java impl. was added by >> [8012447](https://bugs.openjdk.java.net/browse/JDK-8012447) and the old impl. was removed by >> [8214917](https://bugs.openjdk.java.net/browse/JDK-8214917), and there is nothing that would suggest that we had/needed >> `DeoptimizeAllClassesRate`, it's not and has never been used by any of our tests, so I'm going to remove it. > > Looks good. Aleksey, Vladimir, thank you for your reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/418 From iignatyev at openjdk.java.net Wed Sep 30 14:53:23 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 30 Sep 2020 14:53:23 GMT Subject: Integrated: 8238737: remove DeoptimizeAllClassesRate from CTW library In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 23:53:38 GMT, Igor Ignatyev wrote: > Hi all, > > after [8238247 ](https://bugs.openjdk.java.net/browse/JDK-8238247), `DeoptimizeAllClassesRate` property became > completely useless, and as I mentioned in the > [review](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-February/036927.html), I didn't (and still > don't) have any idea why we added this at the first place. I have looked into the old implementation of CTW (one in > `/src/share/vm/classfile/classLoader.cpp`) both at the time Java impl. was added by > [8012447](https://bugs.openjdk.java.net/browse/JDK-8012447) and the old impl. was removed by > [8214917](https://bugs.openjdk.java.net/browse/JDK-8214917), and there is nothing that would suggest that we had/needed > `DeoptimizeAllClassesRate`, it's not and has never been used by any of our tests, so I'm going to remove it. This pull request has now been integrated. 
Changeset: 8b3d6768 Author: Igor Ignatyev URL: https://git.openjdk.java.net/jdk/commit/8b3d6768 Stats: 14 lines in 2 files changed: 0 ins; 12 del; 2 mod 8238737: remove DeoptimizeAllClassesRate from CTW library Reviewed-by: shade, vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/418 From iignatyev at openjdk.java.net Wed Sep 30 16:37:28 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 30 Sep 2020 16:37:28 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v3] In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 11:57:17 GMT, Evgeny Nikitin wrote: >> pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) >> >> Error reporting was improved by writing a C-style escaped string representations for the variables passed to the >> methods being tested. For array comparisons, a dedicated diff-formatter was implemented. >> Sample output for comparing byte arrays (with artificial failure): >> ----------System.err:(21/1553)---------- >> Result: (false) of 'arrayEqualsB' is not equal to expected (true) >> Arrays differ starting from [index: 7]: >> ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... >> ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... >> ^^^^ >> java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not >> equal to expected (true) >> at >> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) >> at ... stack trace continues - E.N. >> Sample output for comparing char arrays: >> ----------System.err:(21/1579)*---------- >> Result: (false) of 'arrayEqualsC' is not equal to expected (true) >> Arrays differ starting from [index: 7]: >> ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... >> ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... 
>> ^^^^^^^ >> java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not >> equal to expected (true) >> at >> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) >> at >> ... and so on - E.N. >> >> testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Replace new space string with constant test/lib/jdk/test/lib/format/ArrayCodec.java line 157: > 155: return ArrayCodec.of((String[])array); > 156: } else if (!type.isPrimitive() && !type.isArray()) { > 157: return ArrayCodec.of((String[])array); Suggestion: return ArrayCodec.of((Object[])array); copy-paste typo? ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From iignatyev at openjdk.java.net Wed Sep 30 16:37:28 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 30 Sep 2020 16:37:28 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v3] In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 14:27:52 GMT, Evgeny Nikitin wrote: > The other method, `invokeAndCheck` is different. It can call, for example, `String.concat("abc", "def")` and expect the > String "abcdef" as result. It does really need to be generic. although I still don't see them as really being that much different, I'm fine w/ keeping them as-is. > `invokeAndCompareArrays` / `ArrayDiff` compares two huge arrays and present a nice short _slice_ in the difference > area. The short slice is possible because we do a limited task - compare arrays. `invokeAndCheck` , on the other hand, > can have as parameters the `utf16`, a string of 10K symbols. Do we really need a 10K string as output in our log? 
I think we do, as people tend to first look at exception messages and only then look through other logs, so having relevant information in the exception is always a good thing. Should 10K symbols become a problem (this though would also mean that you can't print the compared values w/ `Format.asLiteral`), we can revisit this.

> > > you don't need `ArrayCodec::of(Object array)` anymore, do you?
> > Unfortunately, it is used in the ArrayCodec.format (which is used in TestStringIntrinsics.java) - to make it possible
> to call it with everything and not swamp the code with overloads.

hm, I somehow missed that usage, but you don't need to repeat the same switch over a component type in `ArrayDiff::of`, do you?

-------------

PR: https://git.openjdk.java.net/jdk/pull/112

From kvn at openjdk.java.net Wed Sep 30 18:05:38 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 30 Sep 2020 18:05:38 GMT
Subject: RFR: 8253566: clazz.isAssignableFrom will return false for interface implementors
In-Reply-To:
References:
Message-ID:

On Wed, 30 Sep 2020 06:39:02 GMT, Roland Westrelin wrote:

> The code pattern in the test case is optimized as a trichotomy which
> is wrong given SubTypeCheckNode is a special kind of CmpNode that's
> not commutative.

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/422

From psandoz at openjdk.java.net Wed Sep 30 18:22:31 2020
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Wed, 30 Sep 2020 18:22:31 GMT
Subject: RFR: 8223347: Integration of Vector API (Incubator)
In-Reply-To: <-JIanIoqecCK7WfRPElGiS9Gora2wP3NfpGZ3hNL_Hg=.2ccfa0e9-33c6-430b-9303-66829e97e6ff@github.com>
References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> <-JIanIoqecCK7WfRPElGiS9Gora2wP3NfpGZ3hNL_Hg=.2ccfa0e9-33c6-430b-9303-66829e97e6ff@github.com>
Message-ID:

On Wed, 30 Sep 2020 03:33:38 GMT, Jie Fu wrote:

>> Build changes look ok.
> > Hi @PaulSandoz , > > This integration seems to miss https://github.com/openjdk/panama-vector/pull/1, which had fixed crashes on AVX512 > machines. > Thanks. @DamonFool we can follow up later for that fix (and others in `vectorIntrinsics`), after this PR integrates. I don't want to perturb the code that has already been reviewed, which requires yet more additional review. ------------- PR: https://git.openjdk.java.net/jdk/pull/367 From enikitin at openjdk.java.net Wed Sep 30 19:49:18 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 30 Sep 2020 19:49:18 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v4] In-Reply-To: References: Message-ID: > pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) > > Error reporting was improved by writing a C-style escaped string representations for the variables passed to the > methods being tested. For array comparisons, a dedicated diff-formatter was implemented. > Sample output for comparing byte arrays (with artificial failure): > ----------System.err:(21/1553)---------- > Result: (false) of 'arrayEqualsB' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) > at ... stack trace continues - E.N. > Sample output for comparing char arrays: > ----------System.err:(21/1579)*---------- > Result: (false) of 'arrayEqualsC' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... > ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... 
> ^^^^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) > at > ... and so on - E.N. > > testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. Evgeny Nikitin has updated the pull request incrementally with three additional commits since the last revision: - Fix Object component type for ArrayCodec.of - Align strings in ArrayDiffTest - Make ArrayDif defaults really public ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/112/files - new: https://git.openjdk.java.net/jdk/pull/112/files/402fb529..803cd355 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=02-03 Stats: 45 lines in 3 files changed: 0 ins; 0 del; 45 mod Patch: https://git.openjdk.java.net/jdk/pull/112.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/112/head:pull/112 PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Wed Sep 30 19:49:19 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 30 Sep 2020 19:49:19 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v3] In-Reply-To: References: Message-ID: <5_ehofREVqI1X6QzRkAZT1OoUEe9gd7g3EpS2tBhgJg=.d56ba641-667c-4ffa-a68c-55f9b6374b9b@github.com> On Wed, 30 Sep 2020 16:35:02 GMT, Igor Ignatyev wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace new space string with constant > > test/lib/jdk/test/lib/format/ArrayCodec.java line 157: > >> 155: return ArrayCodec.of((String[])array); >> 156: } else if (!type.isPrimitive() && !type.isArray()) { >> 157: return ArrayCodec.of((String[])array); > > Suggestion: > > return ArrayCodec.of((Object[])array); > > copy-paste typo? 
A good one, thanks. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From never at openjdk.java.net Wed Sep 30 19:58:17 2020 From: never at openjdk.java.net (Tom Rodriguez) Date: Wed, 30 Sep 2020 19:58:17 GMT Subject: RFR: 8252881: [JVMCI] ResolvedJavaType.resolveMethod fails in fastdebug when invoked with a constructor In-Reply-To: <94mG00rdarnuSrsjlJ2cYFFOkn8pN8edfOylC3TqqTY=.dbf46b0c-1de4-4a2e-a8c2-ecafd680cd03@github.com> References: <94mG00rdarnuSrsjlJ2cYFFOkn8pN8edfOylC3TqqTY=.dbf46b0c-1de4-4a2e-a8c2-ecafd680cd03@github.com> Message-ID: On Tue, 29 Sep 2020 17:20:34 GMT, Doug Simon wrote: > This change prevents a call to `CompilerToVM.resolveMethod` with an argument representing a constructor. Such a call > triggers an assertion in a fastdebug VM. Marked as reviewed by never (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/407 From dnsimon at openjdk.java.net Wed Sep 30 20:09:36 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 30 Sep 2020 20:09:36 GMT Subject: RFR: 8252881: [JVMCI] ResolvedJavaType.resolveMethod fails in fastdebug when invoked with a constructor [v2] In-Reply-To: <94mG00rdarnuSrsjlJ2cYFFOkn8pN8edfOylC3TqqTY=.dbf46b0c-1de4-4a2e-a8c2-ecafd680cd03@github.com> References: <94mG00rdarnuSrsjlJ2cYFFOkn8pN8edfOylC3TqqTY=.dbf46b0c-1de4-4a2e-a8c2-ecafd680cd03@github.com> Message-ID: > This change prevents a call to `CompilerToVM.resolveMethod` with an argument representing a constructor. Such a call > triggers an assertion in a fastdebug VM. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: improved readability of comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/407/files - new: https://git.openjdk.java.net/jdk/pull/407/files/7726f71c..918c0e96 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=407&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=407&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/407.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/407/head:pull/407 PR: https://git.openjdk.java.net/jdk/pull/407 From dnsimon at openjdk.java.net Wed Sep 30 20:09:38 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 30 Sep 2020 20:09:38 GMT Subject: Integrated: 8252881: [JVMCI] ResolvedJavaType.resolveMethod fails in fastdebug when invoked with a constructor In-Reply-To: <94mG00rdarnuSrsjlJ2cYFFOkn8pN8edfOylC3TqqTY=.dbf46b0c-1de4-4a2e-a8c2-ecafd680cd03@github.com> References: <94mG00rdarnuSrsjlJ2cYFFOkn8pN8edfOylC3TqqTY=.dbf46b0c-1de4-4a2e-a8c2-ecafd680cd03@github.com> Message-ID: On Tue, 29 Sep 2020 17:20:34 GMT, Doug Simon wrote: > This change prevents a call to `CompilerToVM.resolveMethod` with an argument representing a constructor. Such a call > triggers an assertion in a fastdebug VM. This pull request has now been integrated. 
Changeset: 424d7d64 Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/424d7d64 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8252881: [JVMCI] ResolvedJavaType.resolveMethod fails in fastdebug when invoked with a constructor Reviewed-by: never ------------- PR: https://git.openjdk.java.net/jdk/pull/407 From enikitin at openjdk.java.net Wed Sep 30 20:23:01 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 30 Sep 2020 20:23:01 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v5] In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 16:34:10 GMT, Igor Ignatyev wrote: > I think we do, as people tend to first look at exception's messages and only then look thru other logs, so having > relative information in the exception is always a good thing. should 10K symbols become a problem (this though would > also mean that you can't print the compared values w/ `Format.asLiteral`), we can revisit this. Ok, I got it. I was blind and stupid, sorry. Fixed, please check e6fb6d04cad > hm, I somehow missed that usage, but you don't need to repeat to the same switch over a component type in > `ArrayDiff::of`, do you? <sigh>... I can see no better solution here. `ArrayDiff::of` additionally checks that both component types are the same, but I probably could have avoided duplicating the switch (we can ignore double guessing the type). The issue is that the type parameter of ArrayDiff is forwarded to first and second, so I would need to declare them as generic (ArrayCodec). The type system does not seem to allow me to fix that nicely.
------------- PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Wed Sep 30 20:23:01 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 30 Sep 2020 20:23:01 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v5] In-Reply-To: References: Message-ID: <-tlozxBhWYLfWzNnau-oENPWtFOywM8uZiDUmeVFCuc=.ce97c644-b5f8-4a62-b140-48e8d3aed0ac@github.com> > pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) > > Error reporting was improved by writing a C-style escaped string representations for the variables passed to the > methods being tested. For array comparisons, a dedicated diff-formatter was implemented. > Sample output for comparing byte arrays (with artificial failure): > ----------System.err:(21/1553)---------- > Result: (false) of 'arrayEqualsB' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) > at ... stack trace continues - E.N. > Sample output for comparing char arrays: > ----------System.err:(21/1579)*---------- > Result: (false) of 'arrayEqualsC' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... > ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... > ^^^^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) > at > ... and so on - E.N. 
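As a rough illustration of the "Arrays differ starting from [index: N]" line in the sample output above, the following sketch locates the first differing index with the standard `java.util.Arrays.mismatch` method (available since Java 9). The class `FirstMismatchDemo` and its `report` method are hypothetical and are not the `ArrayDiff` formatter from the PR, which additionally prints a context slice and a marker line.

```java
import java.util.Arrays;

public class FirstMismatchDemo {
    // Hypothetical helper: reports the first index at which two byte arrays differ.
    // Arrays.mismatch returns -1 when the arrays are equal; if one array is a
    // proper prefix of the other, it returns the length of the shorter array.
    static String report(byte[] expected, byte[] actual) {
        int index = Arrays.mismatch(expected, actual);
        if (index < 0) {
            return "Arrays are equal";
        }
        return "Arrays differ starting from [index: " + index + "]";
    }

    public static void main(String[] args) {
        byte[] expected = new byte[20];
        for (int i = 0; i < expected.length; i++) {
            expected[i] = (byte) i;
        }
        byte[] actual = expected.clone();
        actual[7] = 125; // artificial failure, as in the sample output

        System.out.println(report(expected, actual));
    }
}
```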
> > testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Print all arguments in the invokeAndCheck's exception ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/112/files - new: https://git.openjdk.java.net/jdk/pull/112/files/803cd355..e6fb6d04 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=03-04 Stats: 11 lines in 1 file changed: 4 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/112.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/112/head:pull/112 PR: https://git.openjdk.java.net/jdk/pull/112 From lucy at openjdk.java.net Wed Sep 30 20:25:00 2020 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Wed, 30 Sep 2020 20:25:00 GMT Subject: RFR: 8253689: [s390] Use flag kind diagnostic for platform specific flags In-Reply-To: References: Message-ID: On Tue, 29 Sep 2020 20:52:17 GMT, Martin Doerr wrote: > Current platform implementation (globals_s390.hpp) uses regular product flags for everything. > These platform specific flags were never intended for official support. They are only there to diagnose issues and find > workarounds. So flag kind "diagnostic" fits better. > > CSR: https://bugs.openjdk.java.net/browse/JDK-8253691 This is an overdue cleanup. Thanks for taking care. The changes look good to me. Reviewed. ------------- Marked as reviewed by lucy (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/414 From kvn at openjdk.java.net Wed Sep 30 20:30:11 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 30 Sep 2020 20:30:11 GMT Subject: RFR: 8252881: [JVMCI] ResolvedJavaType.resolveMethod fails in fastdebug when invoked with a constructor [v2] In-Reply-To: References: <94mG00rdarnuSrsjlJ2cYFFOkn8pN8edfOylC3TqqTY=.dbf46b0c-1de4-4a2e-a8c2-ecafd680cd03@github.com> Message-ID: <11o8hWh7RCPUxMQzkbE7ohKykMetsnjIKJosLtMjoAo=.16f031bc-0edc-4439-a391-9ba168039c9d@github.com> On Wed, 30 Sep 2020 19:55:39 GMT, Tom Rodriguez wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> improved readability of comment > > Marked as reviewed by never (Reviewer). @dougxc Can you explain why changes in TestResolvedJavaType.java test were not ported? ------------- PR: https://git.openjdk.java.net/jdk/pull/407 From lucy at openjdk.java.net Wed Sep 30 20:32:57 2020 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Wed, 30 Sep 2020 20:32:57 GMT Subject: RFR: 8253690: [PPC64] Use flag kind diagnostic for platform specific flags In-Reply-To: References: Message-ID: <5mj0DioQ7iOD_jgIFFOGT3c49HZuU0GPA5LabTDBMtw=.71f675c1-b748-422a-8d34-0f19bbe075e2@github.com> On Tue, 29 Sep 2020 20:49:01 GMT, Martin Doerr wrote: > Current platform implementation (globals_ppc.hpp) uses regular product flags for almost everything. > Most platform specific flags were never intended for official support. They are only there to diagnose issues and find > workarounds. So flag kind "diagnostic" fits better for them. > > Note that I rearranged a couple of lines when looking at the diff. > My actual change is what is described here: https://bugs.openjdk.java.net/browse/JDK-8253692 This is an overdue cleanup. Thanks for taking care. The changes look good to me. Reviewed. ------------- Marked as reviewed by lucy (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/413 From dnsimon at openjdk.java.net Wed Sep 30 20:40:47 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 30 Sep 2020 20:40:47 GMT Subject: RFR: 8252881: [JVMCI] ResolvedJavaType.resolveMethod fails in fastdebug when invoked with a constructor [v2] In-Reply-To: <11o8hWh7RCPUxMQzkbE7ohKykMetsnjIKJosLtMjoAo=.16f031bc-0edc-4439-a391-9ba168039c9d@github.com> References: <94mG00rdarnuSrsjlJ2cYFFOkn8pN8edfOylC3TqqTY=.dbf46b0c-1de4-4a2e-a8c2-ecafd680cd03@github.com> <11o8hWh7RCPUxMQzkbE7ohKykMetsnjIKJosLtMjoAo=.16f031bc-0edc-4439-a391-9ba168039c9d@github.com> Message-ID: On Wed, 30 Sep 2020 20:27:38 GMT, Vladimir Kozlov wrote: >> Marked as reviewed by never (Reviewer). > > @dougxc Can you explain why changes in TestResolvedJavaType.java test were not ported? Oversight. I'll fix this now. ------------- PR: https://git.openjdk.java.net/jdk/pull/407 From iignatyev at openjdk.java.net Wed Sep 30 21:21:26 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 30 Sep 2020 21:21:26 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v5] In-Reply-To: References: Message-ID: <6vRhHgd3e0gnEqpA5d_R7PMM_QK2OQicLzjRM5tQ69k=.c7d9e131-533f-4418-a101-1890ecc9c4c9@github.com> On Wed, 30 Sep 2020 20:20:08 GMT, Evgeny Nikitin wrote: >>> The other method, `invokeAndCheck` is different. It can call, for example, `String.concat("abc", "def")` and expect the >>> String "abcdef" as result. It does really need to be generic. >> although I still don't see them as really being that much different, I'm fine w/ keeping them as-is. >> >> >>> `invokeAndCompareArrays` / `ArrayDiff` compares two huge arrays and present a nice short _slice_ in the difference >>> area. The short slice is possible because we do a limited task - compare arrays. `invokeAndCheck` , on the other hand, >>> can have as parameters the `utf16`, a string of 10K symbols. Do we really need a 10K string as output in our log? 
>> >> I think we do, as people tend to first look at exception's messages and only then look thru other logs, so having >> relative information in the exception is always a good thing. should 10K symbols become a problem (this though would >> also mean that you can't print the compared values w/ `Format.asLiteral`), we can revisit this. >>> >>> > you don't need `ArrayCodec::of(Object array)` anymore, do you? >>> >>> Unfortunately, it is used in the ArrayCodec.format (which is used in TestStringIntrinsics.java) - to make it possible >>> to call it with everything and not swamp the code with overloads. >> >> hm, I somehow missed that usage, but you don't need to repeat to the same switch over a component type in >> `ArrayDiff::of`, do you? > >> I think we do, as people tend to first look at exception's messages and only then look thru other logs, so having >> relative information in the exception is always a good thing. should 10K symbols become a problem (this though would >> also mean that you can't print the compared values w/ `Format.asLiteral`), we can revisit this. > > Ok, I got it. I was blind and stupid, sorry. Fixed, please check the e6fb6d04cad > >> hm, I somehow missed that usage, but you don't need to repeat to the same switch over a component type in >> `ArrayDiff::of`, do you? > > <sigh>... I can see no better solution here. The `ArrayDiff::of` checks that both component types are the same in > addition, but I probably could have avoided duplicating the switch (we can ignore double guessing the type). The issue > is that in ArrayDiff is forwarded to first and second, I would need to declare them as generic > (ArrayCodec). The type system seem to not allow me to fix that nicely. > > I think we do, as people tend to first look at exception's messages and only then look thru other logs, so having > > relative information in the exception is always a good thing. 
should 10K symbols become a problem (this though would > > also mean that you can't print the compared values w/ `Format.asLiteral`), we can revisit this. > > Ok, I got it. I was blind and stupid, sorry. Fixed, please check the > [e6fb6d0](https://github.com/openjdk/jdk/commit/e6fb6d04caddc4b410a594574f570cfcb9445e4a) I'd use `StringBuilder` to construct the message. > > hm, I somehow missed that usage, but you don't need to repeat to the same switch over a component type in > > `ArrayDiff::of`, do you? > > ... I can see no better solution here. The `ArrayDiff::of` checks that both component types are the same in > addition, but I probably could have avoided duplicating the switch (we can ignore double guessing the type). The issue > is that in ArrayDiff is forwarded to first and second, I would need to declare them as generic (ArrayCodec). The type > system seem to not allow me to fix that nicely. wouldn't the following patch do? --git a/test/lib/jdk/test/lib/format/ArrayDiff.java b/test/lib/jdk/test/lib/format/ArrayDiff.java --- a/test/lib/jdk/test/lib/format/ArrayDiff.java +++ b/test/lib/jdk/test/lib/format/ArrayDiff.java @@ -123,41 +123,10 @@ public class ArrayDiff implements Diff { if (!bothAreArrays || !componentTypesAreSame) { throw new IllegalArgumentException("Both arguments should be arrays of the same type"); } - - var type = first.getClass().getComponentType(); - if (type == byte.class) { - return new ArrayDiff<>( - ArrayCodec.of((byte[])first), - ArrayCodec.of((byte[])second), - width, contextBefore); - } else if (type == int.class) { - return new ArrayDiff<>( - ArrayCodec.of((int[])first), - ArrayCodec.of((int[])second), - width, contextBefore); - } else if (type == long.class) { - return new ArrayDiff<>( - ArrayCodec.of((long[])first), - ArrayCodec.of((long[])second), - width, contextBefore); - } else if (type == char.class) { - return new ArrayDiff<>( - ArrayCodec.of((char[])first), - ArrayCodec.of((char[])second), - width, contextBefore); - } else 
if (type == String.class) { - return new ArrayDiff<>( - ArrayCodec.of((String[])first), - ArrayCodec.of((String[])second), - width, contextBefore); - } else if (!type.isPrimitive() && !type.isArray()) { - return new ArrayDiff<>( - ArrayCodec.of((Object[])first), - ArrayCodec.of((Object[])second), - width, contextBefore); - } - - throw new IllegalArgumentException("Unsupported array component type: " + type); + return new ArrayDiff( + ArrayCodec.of(first), + ArrayCodec.of(second), + width, contextBefore); } /** ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From iveresov at openjdk.java.net Wed Sep 30 21:40:15 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 30 Sep 2020 21:40:15 GMT Subject: RFR: 8253869: sun/hotspot/whitebox/CPUInfoTest.java fails after JDK-8239090 Message-ID: The fix for JDK-8239090 added printing of some additional CPU flags that make CPUInfoTest.java fail. ------------- Commit messages: - Fix the test to support all currently possible x64 CPU flags. Changes: https://git.openjdk.java.net/jdk/pull/442/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=442&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253869 Stats: 18 lines in 1 file changed: 13 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/442.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/442/head:pull/442 PR: https://git.openjdk.java.net/jdk/pull/442 From hohensee at amazon.com Wed Sep 30 21:47:41 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Wed, 30 Sep 2020 21:47:41 +0000 Subject: RFR: 8253869: sun/hotspot/whitebox/CPUInfoTest.java fails after JDK-8239090 Message-ID: I'd add a corresponding comment (to the one in CPUInfoTest.java) in vm_version_x86.hpp about having to keep CPUInfoTest.java in sync with FEATURES_LIST. Otherwise lgtm. 
Thanks, Paul On 9/30/20, 2:41 PM, "hotspot-compiler-dev on behalf of Igor Veresov" wrote: The fix for JDK-8239090 added printing of some additional CPU flags that make CPUInfoTest.java fail. ------------- Commit messages: - Fix the test to support all currently possible x64 CPU flags. Changes: https://git.openjdk.java.net/jdk/pull/442/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=442&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253869 Stats: 18 lines in 1 file changed: 13 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/442.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/442/head:pull/442 PR: https://git.openjdk.java.net/jdk/pull/442 From luhenry at openjdk.java.net Wed Sep 30 21:48:38 2020 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Wed, 30 Sep 2020 21:48:38 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis [v2] In-Reply-To: <76F67wutfIYeeKdNWfpWUd0EsZSbDDVJm-3bqTixFRE=.c4e31f33-272e-43d2-8bf2-af510634489a@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <7kE7rOEXkO61vLoj_GsgJVjPUe5DqluRkNyJKDUVi0o=.e55d2fd2-ed90-4be1-8e8f-540a86996d50@github.com> <76F67wutfIYeeKdNWfpWUd0EsZSbDDVJm-3bqTixFRE=.c4e31f33-272e-43d2-8bf2-af510634489a@github.com> Message-ID: On Wed, 30 Sep 2020 07:26:56 GMT, Xin Liu wrote: >To disassemble code, I don't think we have to link so many libraries. It looks like code only explicitly depends >LLVMMCDisassembler and LLVMTarget here. This is the result of `llvm-config --libs x86 x86disassembler` and `llvm-config --libs aarch64 aarch64disassembler`. > If we do need to link those libraries, how about we just use llvm-config --libs. If we declare so many names here, the > Makefile is subject to LLVM. In history, LLVM refactored a lot. That should work in the general case. I did it this way originally because of cases where we need to use a cross-compiled LLVM.
Then `llvm-config` and the related libraries would be compiled for the host platform. Then, the user can just specify the `LIBRARIES/*` variable by hand on the command line. > src/utils/hsdis/hsdis.cpp line 79: > >> 77: >> 78: #ifndef bool >> 79: #define bool int > > if we switch to cpp, do we still need this? Let me remove these defines. ------------- PR: https://git.openjdk.java.net/jdk/pull/392 From mikael at openjdk.java.net Wed Sep 30 21:53:19 2020 From: mikael at openjdk.java.net (Mikael Vidstedt) Date: Wed, 30 Sep 2020 21:53:19 GMT Subject: RFR: 8253869: sun/hotspot/whitebox/CPUInfoTest.java fails after JDK-8239090 In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 21:32:28 GMT, Igor Veresov wrote: > The fix for JDK-8239090 added printing of some additional CPU flags that make CPUInfoTest.java fail. Marked as reviewed by mikael (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/442 From dcubed at openjdk.java.net Wed Sep 30 21:53:19 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 30 Sep 2020 21:53:19 GMT Subject: RFR: 8253869: sun/hotspot/whitebox/CPUInfoTest.java fails after JDK-8239090 In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 21:32:28 GMT, Igor Veresov wrote: > The fix for JDK-8239090 added printing of some additional CPU flags that make CPUInfoTest.java fail. I manually compared the two lists: $ diff /tmp/f[01] 7c7,8 < "avx512_vbmi2" --- > "avx512_vmbi" > "avx512_vmbi2" 29a31 > "hv" 32d33 < "mmxext" The old list had "avx512_vbmi2" and that's replaced by "avx512_vmbi" and "avx512_vmbi2" in the new list. The new list has "hv" as a new entry. The old list had "mmxext" as a deleted entry. I don't know if all that is intentional or not, but the test job passed so I'm approving this change. ------------- Marked as reviewed by dcubed (Reviewer). 
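The manual list comparison above can also be expressed as a set difference. The sketch below is only an illustration: the class `FeatureListDiff` and its `minus` helper are hypothetical, and the two sets contain only the entries named in the message above, not the full feature lists from CPUInfoTest.java or vm_version_x86.

```java
import java.util.Set;
import java.util.TreeSet;

public class FeatureListDiff {
    // Hypothetical helper: returns the elements of a that are not in b, sorted.
    static Set<String> minus(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Assumption: only the differing entries quoted in the review are listed;
        // the real lists share many more flags that are omitted here.
        Set<String> oldList = Set.of("avx512_vbmi2", "mmxext");
        Set<String> newList = Set.of("avx512_vmbi", "avx512_vmbi2", "hv");

        System.out.println("only in old list: " + minus(oldList, newList));
        System.out.println("only in new list: " + minus(newList, oldList));
    }
}
```

A check of this kind run against both sources could flag the two lists drifting apart, which is the concern behind the request to keep them in sync.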
PR: https://git.openjdk.java.net/jdk/pull/442 From iveresov at openjdk.java.net Wed Sep 30 21:53:20 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 30 Sep 2020 21:53:20 GMT Subject: Integrated: 8253869: sun/hotspot/whitebox/CPUInfoTest.java fails after JDK-8239090 In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 21:32:28 GMT, Igor Veresov wrote: > The fix for JDK-8239090 added printing of some additional CPU flags that make CPUInfoTest.java fail. This pull request has now been integrated. Changeset: 79d70f6b Author: Igor Veresov URL: https://git.openjdk.java.net/jdk/commit/79d70f6b Stats: 18 lines in 1 file changed: 13 ins; 0 del; 5 mod 8253869: sun/hotspot/whitebox/CPUInfoTest.java fails after JDK-8239090 Reviewed-by: mikael ------------- PR: https://git.openjdk.java.net/jdk/pull/442 From phh at openjdk.java.net Wed Sep 30 21:59:58 2020 From: phh at openjdk.java.net (Paul Hohensee) Date: Wed, 30 Sep 2020 21:59:58 GMT Subject: RFR: 8253869: sun/hotspot/whitebox/CPUInfoTest.java fails after JDK-8239090 In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 21:50:47 GMT, Daniel D. Daugherty wrote: >> The fix for JDK-8239090 added printing of some additional CPU flags that make CPUInfoTest.java fail. > > I manually compared the two lists: > > $ diff /tmp/f[01] > 7c7,8 > < "avx512_vbmi2" > --- >> "avx512_vmbi" >> "avx512_vmbi2" > 29a31 >> "hv" > 32d33 > < "mmxext" > > The old list had "avx512_vbmi2" and that's replaced by > "avx512_vmbi" and "avx512_vmbi2" in the new list. > The new list has "hv" as a new entry. > The old list had "mmxext" as a deleted entry. > > I don't know if all that is intentional or not, but the > test job passed so I'm approving this change. Retroactive review: I'd add a corresponding comment (to the one in CPUInfoTest.java) in vm_version_x86.hpp about having to keep CPUInfoTest.java in sync with FEATURES_LIST. 
The old version of vm_version_x86.cpp included both "avx512_vmbi" and "avx512_vmbi2", but did not include "hv". ------------- PR: https://git.openjdk.java.net/jdk/pull/442 From iignatyev at openjdk.java.net Wed Sep 30 22:30:57 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 30 Sep 2020 22:30:57 GMT Subject: RFR: 8253880: clean up sun/hotspot/tools/ctw/Utils class Message-ID: Hi all, could you please review this trivial clean up in sun/hotspot/tools/ctw/Utils which removes unused `PATH_SEPARATOR` field and unused imports? ------------- Commit messages: - 8253880: clean up sun/hotspot/tools/ctw/Utils class Changes: https://git.openjdk.java.net/jdk/pull/444/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=444&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253880 Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/444.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/444/head:pull/444 PR: https://git.openjdk.java.net/jdk/pull/444 From kvn at openjdk.java.net Wed Sep 30 22:36:13 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 30 Sep 2020 22:36:13 GMT Subject: RFR: 8253880: clean up sun/hotspot/tools/ctw/Utils class In-Reply-To: References: Message-ID: <5yi7OIpTMnqKXDQ6DbzjGSxF3xn-DrrEaPp_770pJco=.f7485415-52be-444c-8011-ffa859f67907@github.com> On Wed, 30 Sep 2020 22:23:53 GMT, Igor Ignatyev wrote: > Hi all, > > could you please review this trivial clean up in sun/hotspot/tools/ctw/Utils which removes unused `PATH_SEPARATOR` > field and unused imports? Trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/444 From luhenry at openjdk.java.net Wed Sep 30 22:42:39 2020 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Wed, 30 Sep 2020 22:42:39 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis [v3] In-Reply-To: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> Message-ID: > When bringing up Hotspot onto new platforms, it is not always possible to compile hsdis because gcc is not yet > available. For example, for Windows-AArch64 and macOS-AArch64. > For some such platforms, it is possible to use LLVM as an alternative backend as it also supports a disassembler > feature. Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: - Remove unecessary defines - Use llvm-config to list LLVM libraries ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/392/files - new: https://git.openjdk.java.net/jdk/pull/392/files/9127eb20..a551e247 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=392&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=392&range=01-02 Stats: 22 lines in 2 files changed: 4 ins; 15 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/392.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/392/head:pull/392 PR: https://git.openjdk.java.net/jdk/pull/392 From iignatyev at openjdk.java.net Wed Sep 30 22:43:08 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 30 Sep 2020 22:43:08 GMT Subject: RFR: 8253880: clean up sun/hotspot/tools/ctw/Utils class In-Reply-To: <5yi7OIpTMnqKXDQ6DbzjGSxF3xn-DrrEaPp_770pJco=.f7485415-52be-444c-8011-ffa859f67907@github.com> References: <5yi7OIpTMnqKXDQ6DbzjGSxF3xn-DrrEaPp_770pJco=.f7485415-52be-444c-8011-ffa859f67907@github.com> Message-ID: On Wed, 30 Sep 2020 22:33:29 GMT, Vladimir Kozlov wrote: >> Hi all, 
>> >> could you please review this trivial clean up in sun/hotspot/tools/ctw/Utils which removes unused `PATH_SEPARATOR` >> field and unused imports? > > Trivial. thanks, Vladimir. ------------- PR: https://git.openjdk.java.net/jdk/pull/444 From iignatyev at openjdk.java.net Wed Sep 30 22:43:10 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 30 Sep 2020 22:43:10 GMT Subject: Integrated: 8253880: clean up sun/hotspot/tools/ctw/Utils class In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 22:23:53 GMT, Igor Ignatyev wrote: > Hi all, > > could you please review this trivial clean up in sun/hotspot/tools/ctw/Utils which removes unused `PATH_SEPARATOR` > field and unused imports? This pull request has now been integrated. Changeset: 776acfd8 Author: Igor Ignatyev URL: https://git.openjdk.java.net/jdk/commit/776acfd8 Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod 8253880: clean up sun/hotspot/tools/ctw/Utils class Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/444 From luhenry at openjdk.java.net Wed Sep 30 22:53:04 2020 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Wed, 30 Sep 2020 22:53:04 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis [v4] In-Reply-To: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> Message-ID: > When bringing up Hotspot onto new platforms, it is not always possible to compile hsdis because gcc is not yet > available. For example, for Windows-AArch64 and macOS-AArch64. > For some such platforms, it is possible to use LLVM as an alternative backend as it also supports a disassembler > feature. 
Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: - Fix compiler warnings - Reduce amount of error messages when failing to initialize LLVM ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/392/files - new: https://git.openjdk.java.net/jdk/pull/392/files/a551e247..c497b30f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=392&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=392&range=02-03 Stats: 29 lines in 1 file changed: 19 ins; 2 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/392.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/392/head:pull/392 PR: https://git.openjdk.java.net/jdk/pull/392