From vladimir.kozlov at oracle.com Tue Dec 1 00:42:16 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 30 Nov 2015 16:42:16 -0800 Subject: RFR: 6869327: Add new C2 flag to keep safepoints in counted loops. In-Reply-To: <56585413.5060501@oracle.com> References: <56585413.5060501@oracle.com> Message-ID: <565CECE8.1040205@oracle.com> Looks good. Did you run it in JPRT? Add -XX:+IgnoreUnrecognizedVMOptions flags to the list of flags (put it first) in the test because UseCountedLoopSafepoints is only C2 flag and TieredCompilation is Server VM flag. Thanks, Vladimir On 11/27/15 5:01 AM, Andreas Eriksson wrote: > Hi, > > Please review this change that adds a flag to keep a safepoint in counted loops. > > Currently C2 removes safepoints in counted loops. > This can force other safepointing threads to wait for the counted loop thread for long periods of time. > This change adds a flag, UseCountedLoopSafepoints, which keeps a safepoint in the loop. Its default value is false. > > Bug: 6869327: Add new C2 flag to keep safepoints in counted loops. > https://bugs.openjdk.java.net/browse/JDK-6869327 > > Webrev: http://cr.openjdk.java.net/~aeriksso/6869327/webrev.00/ > > Thanks, > Andreas From vladimir.kozlov at oracle.com Tue Dec 1 00:57:26 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 30 Nov 2015 16:57:26 -0800 Subject: RFR(S): 8140667: CompilerControl: tests incorrectly set states for excluded methods In-Reply-To: <93AEE95F-3BDB-4A20-9547-C276019E54A2@oracle.com> References: <93AEE95F-3BDB-4A20-9547-C276019E54A2@oracle.com> Message-ID: <565CF076.7020704@oracle.com> Good. Thanks, Vladimir On 11/25/15 1:58 PM, Pavel Punegov wrote: > Please review this fix for tests. > > Issue: CompilerOracle checks CompileCommands for being in the excluded list > or the compileonly list when deciding whether to compile a method. > The CompilerControl test framework assumes that these commands override each > other.
> > This fix makes the tests behave in the same manner as CompilerOracle > does. The fix adds an internal singleton class to AbstractCommandBuilder > used for creating test commands (-XX:CompileCommand and > CompileCommandFile) > > webrev: http://cr.openjdk.java.net/~ppunegov/8140667/webrev.00/ > bug: https://bugs.openjdk.java.net/browse/JDK-8140667 > > Thanks, > Pavel Punegov > From gilles.m.duboscq at oracle.com Tue Dec 1 10:14:30 2015 From: gilles.m.duboscq at oracle.com (Gilles Duboscq) Date: Tue, 1 Dec 2015 11:14:30 +0100 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <37AFDAAE-ADE8-471E-857D-0FE9A667E579@oracle.com> <565C952B.2030804@redhat.com> Message-ID: <565D7306.6050405@oracle.com> I had a look and my comments were addressed. Thank you, Gilles On 30/11/15 19:30, Christian Thalinger wrote: > >> On Nov 30, 2015, at 8:27 AM, Andrew Haley wrote: >> >> On 11/30/2015 06:18 PM, Christian Thalinger wrote: >>> >>>> On Nov 30, 2015, at 8:12 AM, Andrew Haley wrote: >>>> >>>> On 11/30/2015 06:08 PM, Christian Thalinger wrote: >>>>>> So what should I do? Drop this for now? >>>>> As you like. I'm sure we cannot get it completely right but everything that we can integrate will help someone else to get something up and running. >>>> >>>> OK. http://cr.openjdk.java.net/~aph/8143072-2/ is waiting for approval. >>> >>> I did not see an email with that link before. Maybe you replied >>> to Gilles directly? >> >> Strangely enough, no. I thought I had, but... >> >>> Did you address all his comments? >> >> I believe so. >> >> BTW, it is (of course) 8143072. I can't change the Subject: without >> it breaking threading for some people. :-( > > I figured :-) Let's wait for Gilles to take another look and then we are good to go. > >> >> Andrew.
> From paul.sandoz at oracle.com Tue Dec 1 10:28:28 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 1 Dec 2015 11:28:28 +0100 Subject: RFR 8143628: Fork sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables In-Reply-To: References: <7e5e2f21-a462-4fb8-8cb2-52f4c9e303fb@default> Message-ID: <17CDB8FA-3B1E-465A-8FB6-121113BE66CA@oracle.com> > On 30 Nov 2015, at 23:33, Paul Sandoz wrote: > > >> On 30 Nov 2015, at 23:05, Christian Tornqvist wrote: >> >> Because jtreg is the test framework that we use, we've been working hard to reduce the number of test frameworks in use. >> > > jtreg comes bundled with testng so what is there to reduce? > Here is an analogy: jtreg is to testng as launcher is to library. They are complementary to each other, i.e. think of testng as an implicit @library that helps one better organize tests and report errors. Would I be correct in stating that the HotSpot runtime team is taking a conservative position and does not want to deal with such a library, contrary to other areas of the JDK? Sorry to push back, but I don't agree with that position (if correct). I am reluctant to change the tests. Please don't think that is complete pigheadedness on my part :-) I just don't think it's the right thing to do. If the HotSpot runtime team will not accept the use of TestNG then I suppose I could unblock by proposing to move the tests to the JDK repo, which I would also be reluctant to do since they caught an issue lying dormant for at least 8 years on certain platforms (not covered by the core testset) that existing hotspot tests never caught. Paul.
From chris.hegarty at oracle.com Tue Dec 1 10:41:10 2015 From: chris.hegarty at oracle.com (Chris Hegarty) Date: Tue, 1 Dec 2015 10:41:10 +0000 Subject: RFR 8143628: Fork sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables In-Reply-To: <5656F463.9080804@oracle.com> References: <5656F463.9080804@oracle.com> Message-ID: On 26 Nov 2015, at 12:00, Aleksey Shipilev wrote: > On 11/26/2015 12:55 PM, Paul Sandoz wrote: >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8143628-unsafe-native-jdk/ >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8143628-unsafe-native-hotspot/ > > Both JDK and Hotspot changes look good to me. +1, both webrevs look good to me. Thanks for fixing my shortcut when moving Unsafe. -Chris. From andreas.eriksson at oracle.com Tue Dec 1 11:10:36 2015 From: andreas.eriksson at oracle.com (Andreas Eriksson) Date: Tue, 1 Dec 2015 12:10:36 +0100 Subject: RFR: 6869327: Add new C2 flag to keep safepoints in counted loops. In-Reply-To: <565CECE8.1040205@oracle.com> References: <56585413.5060501@oracle.com> <565CECE8.1040205@oracle.com> Message-ID: <565D802C.8040600@oracle.com> On 2015-12-01 01:42, Vladimir Kozlov wrote: > Looks good. > > Did you run it in JPRT? Yes, a JPRT run with tests from hotspot/test/compiler/loopopts added looks good. (Except for TestCastIINoLoopLimitCheck.java failing because the fix for JDK-8141706 wasn't in jdk9-hs-comp.) > > Add -XX:+IgnoreUnrecognizedVMOptions flags to the list of flags (put it > first) in the test because UseCountedLoopSafepoints is only C2 flag > and TieredCompilation is Server VM flag. Alright, will do. I'll go ahead and push this later this week (with that change), unless someone objects. Thanks, Andreas > > Thanks, > Vladimir > > On 11/27/15 5:01 AM, Andreas Eriksson wrote: >> Hi, >> >> Please review this change that adds a flag to keep a safepoint in >> counted loops.
> >> >> Currently C2 removes safepoints in counted loops. >> This can force other safepointing threads to wait for the counted >> loop thread for long periods of time. >> This change adds a flag, UseCountedLoopSafepoints, which keeps a >> safepoint in the loop. Its default value is false. >> >> Bug: 6869327: Add new C2 flag to keep safepoints in counted loops. >> https://bugs.openjdk.java.net/browse/JDK-6869327 >> >> Webrev: http://cr.openjdk.java.net/~aeriksso/6869327/webrev.00/ >> >> Thanks, >> Andreas From paul.sandoz at oracle.com Tue Dec 1 11:13:27 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 1 Dec 2015 12:13:27 +0100 Subject: 8144223: Move j.l.invoke.{ForceInline, DontInline, Stable} to jdk.internal.vm.annotation package In-Reply-To: <8978FD58-910C-478D-A842-7B68CDF79D9E@oracle.com> References: <56A7FCB0-5AE6-438C-8A7C-FED9D39878E4@oracle.com> <1498072382.420789.1448918966462.JavaMail.zimbra@u-pem.fr> <8978FD58-910C-478D-A842-7B68CDF79D9E@oracle.com> Message-ID: Hi John, Remi, AFAICT it's implicitly supported but as John reminded me, it comes loaded with other semantics for array contents (which is currently also compiler-dependent as C2 only supports that at the moment): http://hg.openjdk.java.net/jdk9/dev/hotspot/file/tip/src/share/vm/ci/ciField.cpp#l204
void ciField::initialize_from(fieldDescriptor* fd) {
  // Get the flags, offset, and canonical holder of the field.
  _flags = ciFlags(fd->access_flags());
  _offset = fd->offset();
  _holder = CURRENT_ENV->get_instance_klass(fd->field_holder());
  // Check to see if the field is constant.
  bool is_final = this->is_final();
  bool is_stable = FoldStableValues && this->is_stable();
  if (_holder->is_initialized() && (is_final || is_stable)) {
    if (!this->is_static()) {
      // A field can be constant if it's a final static field or if
      // it's a final non-static field of a trusted class (classes in
      // java.lang.invoke and sun.invoke packages and subpackages).
      if (is_stable || trust_final_non_static_fields(_holder)) {
        _is_constant = true;
        return;
      }
      _is_constant = false;
      return;
    }
For "final @Stable" on non-arrays the documentation seems to imply the annotation is ignored: * Fields which are declared {@code final} may also be annotated as stable. * Since final fields already behave as stable values, such an annotation * indicates no additional information, unless the type of the field is * an array type. *

* It is (currently) undefined what happens if a field annotated as stable * is given a third value. In practice, if the JVM relies on this annotation * to promote a field reference to a constant, it may be that the Java memory * model would appear to be broken, if such a constant (the second value of the field) * is used as the value of the field even after the field value has changed. I think we can get away with tweaking the above two paragraphs. Paul. > On 30 Nov 2015, at 22:34, John Rose wrote: > > On Nov 30, 2015, at 1:29 PM, Remi Forax wrote: >> >> One annotation is missing, @TrueFinal, >> currently only the fields of the classes of java.lang.invoke are considered as true final. >> >> This annotation is also needed to improve the performance of the field updaters of j.u.c.a. >> One can use @Stable instead but it seems awkward. > > Good idea! Let's use "final @Stable" for that. (It's gonna be awkward in any case, till we sort out final finally.) > > Paul, would you please file a followup bug for this? It may let us decouple the JVM from the TrustFinalFields special cases. > > John From roland.westrelin at oracle.com Tue Dec 1 11:35:04 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 1 Dec 2015 12:35:04 +0100 Subject: RFR(XS): 8143930: C1 LinearScan asserts when compiling two back-to-back CompareAndSwapLongs Message-ID: http://cr.openjdk.java.net/~roland/8143930/webrev.00/ The problem is that loading values in pinned registers and then computing the address for the CAS puts too much pressure on registers. The fix consists in loading values in pinned registers right before the CAS. Roland.
From aleksey.shipilev at oracle.com Tue Dec 1 11:47:37 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 1 Dec 2015 14:47:37 +0300 Subject: RFR(XS): 8143930: C1 LinearScan asserts when compiling two back-to-back CompareAndSwapLongs In-Reply-To: References: Message-ID: <565D88D9.1010509@oracle.com> On 12/01/2015 02:35 PM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8143930/webrev.00/ > > The problem is that loading values in pinned registers and then > computing the address for the CAS puts too much pressure on > registers. The fix consists in loading values in pinned registers > right before the CAS. I have tested this fix before on our new Unsafe tests. FTR, this patch looks good to me. Thanks, -Aleksey From paul.sandoz at oracle.com Tue Dec 1 11:52:11 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 1 Dec 2015 12:52:11 +0100 Subject: RFR(XS): 8143930: C1 LinearScan asserts when compiling two back-to-back CompareAndSwapLongs In-Reply-To: References: Message-ID: <3A7403D7-AE1E-46F0-84CB-541717563816@oracle.com> Hi Roland, Many thanks for fixing this. Can you update the test so it uses jdk.internal.misc.Unsafe rather than sun.misc.Unsafe? Paul. > On 1 Dec 2015, at 12:35, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8143930/webrev.00/ > > The problem is that loading values in pinned registers and then computing the address for the CAS puts too much pressure on registers. The fix consists in loading values in pinned registers right before the CAS. > > Roland.
From roland.schatz at oracle.com Tue Dec 1 12:53:13 2015 From: roland.schatz at oracle.com (Roland Schatz) Date: Tue, 1 Dec 2015 13:53:13 +0100 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: <565C9196.4060702@redhat.com> References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> Message-ID: <565D9839.50705@oracle.com> Hi,
> public enum AArch64Kind implements PlatformKind {
>
>     // scalar
>     BYTE(1),
>     WORD(2),
>     DWORD(4),
>     QWORD(8),
>     SINGLE(4),
>     DOUBLE(8),
>
>     // SIMD
>     V32_BYTE(4, BYTE),
>     V32_WORD(4, WORD),
>     V64_BYTE(8, BYTE),
>     V64_WORD(8, WORD),
>     V64_DWORD(8, DWORD),
>     V128_BYTE(16, BYTE),
>     V128_WORD(16, WORD),
>     V128_DWORD(16, DWORD),
>     V128_QWORD(16, QWORD),
>     V128_SINGLE(16, SINGLE),
>     V128_DOUBLE(16, DOUBLE),
>
>     MASK8(1),
>     MASK16(2),
>     MASK32(4),
>     MASK64(8);
Regarding the MASK* kinds: Does AArch64 really have a dedicated mask datatype, like AVX512? There seem to be no registers where the MASK kinds can be stored. AArch64.canStoreValue(...) returns false for all registers for the MASK kinds, so they can't really be used. - Roland On 11/30/2015 07:12 PM, Andrew Haley wrote: > On 11/30/2015 06:08 PM, Christian Thalinger wrote: >>> So what should I do? Drop this for now? >> As you like. I'm sure we cannot get it completely right but everything that we can integrate will help someone else to get something up and running. > OK. http://cr.openjdk.java.net/~aph/8143072-2/ is waiting for approval. > > Andrew.
> From paul.sandoz at oracle.com Tue Dec 1 14:23:43 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 1 Dec 2015 15:23:43 +0100 Subject: 8144223: Move j.l.invoke.{ForceInline, DontInline, Stable} to jdk.internal.vm.annotation package In-Reply-To: <56A7FCB0-5AE6-438C-8A7C-FED9D39878E4@oracle.com> References: <56A7FCB0-5AE6-438C-8A7C-FED9D39878E4@oracle.com> Message-ID: > On 30 Nov 2015, at 20:20, John Rose wrote: >> >>> The annotations j.l.invoke.{ForceInline, DontInline, Stable} are currently package private but could have use outside that package if used carefully and sparingly. For example: >>> >>> 1) Methods on Atomic*FieldUpdater classes may benefit from @ForceInline, as would array support methods of JDK-8136924 to preserve existing inlining characteristics. >>> 2) Reference.reachabilityFence (JDK-8133348) could be annotated with @DontInline rather than explicitly making the method signature known to the VM. >>> 3) The alias jdk.vm.ci.hotspot.Stable could potentially be removed. >> >> I am especially interested if 3) is possible, it seems so from my limited knowledge. > > It's a good move, now that we have a module system which can limit the accidental use of such things. > > The javadoc for @ForceInline and @DontInline should both carry strong disclaimers about sparing usage. Suggest: > >> This annotation must be used sparingly. It is useful when the only reasonable alternative is to bind the name of a specific method into the HotSpot JVM for special handling by the inlining policy. This annotation must not be relied on as an alternative to avoid tuning the JVM's inlining policy. In a few cases, it may act as a temporary workaround until the profiling and inlining performed by the JVM is sufficiently improved. > Done. > (It is an odd thing for javadoc to refer the user to JVM internals, but reasonable in this case, since the user base of the annotations is maintainers of the JDK itself.) 
> > A grammar upgrade: s/Annotated methods.constructors of classes/Annotations on methods or constructors of classes/ > s/Annotated fields of classes/Annotations on fields of classes/ > Done. Thanks. I also took a stab at tweaking @Stable: http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-jdk/webrev/ *

* Fields which are declared {@code final} may also be annotated as stable. * Since final fields already behave as stable values, such an annotation * conveys no additional information regarding change of the field's value, but * still conveys information regarding change of additional components values if * the type of the field is an array type (as described above). *

* The HotSpot VM relies on this annotation to promote a non-null (resp., * non-zero) component value to a constant, thereby enabling superior * optimizations of code depending on such a value (such as constant folding). * More specifically, the HotSpot VM will process non-null stable fields (final * or otherwise) in a similar manner to static final fields with respect to * promoting the field's value to a constant. Thus, placing aside the * differences for null/non-null values and arrays, a final stable field is * treated as if it is really final from both the Java language and the HotSpot * VM. *

* It is (currently) undefined what happens if a field annotated as stable * is given a third value (by explicitly updating a stable field, a component of * a stable array, or a final stable field via reflection or other means). * Since the HotSpot VM promotes a non-null component value to constant, it may * be that the Java memory model would appear to be broken, if such a constant * (the second value of the field) is used as the value of the field even after * the field value has changed. Paul. > John > > P.S. Some day, I suppose, we can get lazy evaluation worked into the JVM semantics, and available to all Java programmers. (If it is not hardwired as in [1], it may require a property mechanism in order to control the writes.) Until then, we have an unenforced "gentleman's agreement", as described by the javadoc of @Stable. > > [1]: http://cr.openjdk.java.net/~jrose/draft/lazy-final.html From aph at redhat.com Tue Dec 1 14:40:32 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 1 Dec 2015 14:40:32 +0000 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: <565D9839.50705@oracle.com> References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> Message-ID: <565DB160.7000505@redhat.com> On 12/01/2015 12:53 PM, Roland Schatz wrote: > Regarding the MASK* kinds: Does it AArch64 really have a dedicated mask > datatype, like AVX512? > There seem to be no registers where the MASK kinds can be stored. > AArch64.canStoreValue(...) returns false for all registers for the MASK > kinds, so they can't be really used. No, that's just a hangover from the x86 version. Andrew.
From vladimir.kozlov at oracle.com Tue Dec 1 16:26:45 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 1 Dec 2015 08:26:45 -0800 Subject: RFR: 6869327: Add new C2 flag to keep safepoints in counted loops. In-Reply-To: <565D802C.8040600@oracle.com> References: <56585413.5060501@oracle.com> <565CECE8.1040205@oracle.com> <565D802C.8040600@oracle.com> Message-ID: <565DCA45.5020808@oracle.com> On 12/1/15 3:10 AM, Andreas Eriksson wrote: > > > On 2015-12-01 01:42, Vladimir Kozlov wrote: >> Looks good. >> >> Did you run it in JPRT? > > Yes, a JPRT run with tests from hotspot/test/compiler/loopopts added looks good. Thanks > (Except for TestCastIINoLoopLimitCheck.java failing because the fix for JDK-8141706 wasn't in jdk9-hs-comp.) > >> >> Add -XX:+IgnoreUnrecognizedVMOptions flags to the list of flags (put it first) in the test because >> UseCountedLoopSafepoints is only C2 flag and TieredCompilation is Server VM flag. > > Alright, will do. > I'll go ahead and push this later this week (with that change), unless someone objects. Sounds good. Thanks, Vladimir > > Thanks, > Andreas > >> >> Thanks, >> Vladimir >> >> On 11/27/15 5:01 AM, Andreas Eriksson wrote: >>> Hi, >>> >>> Please review this change that adds a flag to keep a safepoint in counted loops. >>> >>> Currently C2 removes safepoints in counted loops. >>> This can force other safepointing threads to wait for the counted loop thread for long periods of time. >>> This change adds a flag, UseCountedLoopSafepoints, which keeps a safepoint in the loop. Its default value is false. >>> >>> Bug: 6869327: Add new C2 flag to keep safepoints in counted loops.
>>> https://bugs.openjdk.java.net/browse/JDK-6869327 >>> >>> Webrev: http://cr.openjdk.java.net/~aeriksso/6869327/webrev.00/ >>> >>> Thanks, >>> Andreas > From aph at redhat.com Tue Dec 1 16:56:34 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 1 Dec 2015 16:56:34 +0000 Subject: Undefined behaviour in HotSpot Message-ID: <565DD142.7050002@redhat.com> I've been kicking the tyres of the undefined behaviour sanitizer in GCC. It picks up a few spurious errors in HotSpot but some serious ones too. In particular, there are many integer overflows in C2, and these can lead to incorrect code generation. I don't know that they actually cause any problems, but I do know that GCC's optimizations "know" that signed integer overflows never occur and generate code accordingly. Some of the code in C2 which checks for overflow (e.g. AddLNode::add_ring) looks very wrong to me. I am not comfortable that an aggressive C++ optimizing compiler will generate the expected code for this function. Would it be useful at this stage in JDK9 to fix these? If so, I can create some bug reports and webrevs. Andrew. From christian.thalinger at oracle.com Tue Dec 1 17:55:23 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 1 Dec 2015 07:55:23 -1000 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: <565DB160.7000505@redhat.com> References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> Message-ID: > On Dec 1, 2015, at 4:40 AM, Andrew Haley wrote: > > On 12/01/2015 12:53 PM, Roland Schatz wrote: >> Regarding the MASK* kinds: Does it AArch64 really have a dedicated mask >> datatype, like AVX512? >> There seem to be no registers where the MASK kinds can be stored. >> AArch64.canStoreValue(...) returns false for all registers for the MASK >> kinds, so they can't be really used. 
> > No, that's just a hangover from the x86 version. Are you sending a new webrev? > > Andrew. > From vladimir.kozlov at oracle.com Tue Dec 1 17:57:07 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 1 Dec 2015 09:57:07 -0800 Subject: Undefined behaviour in HotSpot In-Reply-To: <565DD142.7050002@redhat.com> References: <565DD142.7050002@redhat.com> Message-ID: <565DDF73.5050502@oracle.com> On 12/1/15 8:56 AM, Andrew Haley wrote: > I've been kicking the tyres of the undefined behaviour sanitizer in > GCC. It picks up a few spurious errors in HotSpot but some serious > ones too. In particular, there are many integer overflows in C2, and > these can lead to incorrect code generation. I don't know that they > actually cause any problems, but I do know that GCC's optimizations > "know" that signed integer overflows never occur and generate code > accordingly. > > Some of the code in C2 which checks for overflow (e.g. > AddLNode::add_ring) looks very wrong to me. I am not comfortable that > an aggressive C++ optimizing compiler will generate the expected code > for this function. > > Would it be useful at this stage in JDK9 to fix these? If so, I can > create some bug reports and webrevs. Yes, please. Any enhancements to code quality are welcome. But beware of false positive finding. Thanks, Vladimir > > Andrew. 
> From aph at redhat.com Tue Dec 1 17:58:25 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 1 Dec 2015 17:58:25 +0000 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> Message-ID: <565DDFC1.7020006@redhat.com> On 12/01/2015 05:55 PM, Christian Thalinger wrote: > >> On Dec 1, 2015, at 4:40 AM, Andrew Haley wrote: >> >> On 12/01/2015 12:53 PM, Roland Schatz wrote: >>> Regarding the MASK* kinds: Does it AArch64 really have a dedicated mask >>> datatype, like AVX512? >>> There seem to be no registers where the MASK kinds can be stored. >>> AArch64.canStoreValue(...) returns false for all registers for the MASK >>> kinds, so they can't be really used. >> >> No, that's just a hangover from the x86 version. > > Are you sending a new webrev? I was waiting to see if there were any more problems. Andrew. From vladimir.kozlov at oracle.com Tue Dec 1 18:16:34 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 1 Dec 2015 10:16:34 -0800 Subject: RFR(XS): 8143930: C1 LinearScan asserts when compiling two back-to-back CompareAndSwapLongs In-Reply-To: References: Message-ID: <565DE402.3060101@oracle.com> Looks good. Thanks, Vladimir On 12/1/15 3:35 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8143930/webrev.00/ > > The problem is that loading values in pinned registers and then computing the address for the CAS puts too much pressure on registers. The fix consists in loading values in pinned registers right before the CAS. > > Roland. 
> From john.r.rose at oracle.com Tue Dec 1 18:43:40 2015 From: john.r.rose at oracle.com (John Rose) Date: Tue, 1 Dec 2015 10:43:40 -0800 Subject: Undefined behaviour in HotSpot In-Reply-To: <565DD142.7050002@redhat.com> References: <565DD142.7050002@redhat.com> Message-ID: On Dec 1, 2015, at 8:56 AM, Andrew Haley wrote: > > Some of the code in C2 which checks for overflow (e.g. > AddLNode::add_ring) looks very wrong to me. I am not comfortable that > an aggressive C++ optimizing compiler will generate the expected code > for this function. Good catch. We may need to replace some normal-looking C++ expressions on jint (or even "int") with more formal-looking expressions, when the JIT is performing static evaluation of Java expressions. A straw man:
// globalDefinitions.hpp
#define jint_add(x, y) jint(juint(x) + juint(y))
#define jint_mul(x, y) jint(jlong(x) * jlong(y))
In some cases, it will be enough just to use an unsigned type instead of a signed one. (IIRC, C unsigneds have well-defined overflow.) But that won't always work, e.g., with evaluation of Java multiplication. Maybe we need an internal JVM API specifically for computing Java bytecode results? (As opposed to generating code that computes them.) Thank you very much for poking at this. Static evaluation of expressions by the JIT is a long-term source of bugs, and there must be other "clever" uses of overflow in our source base. (I'm sure I've coded some.) John P.S. In case anybody is thinking it: IMO, it would *not* be good to use C++ operator overloading to make the static evaluation code look normal. The special code should have a special look, even if it is uglier than the normal look.
From aph at redhat.com Tue Dec 1 19:00:32 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 1 Dec 2015 19:00:32 +0000 Subject: Undefined behaviour in HotSpot In-Reply-To: References: <565DD142.7050002@redhat.com> Message-ID: <565DEE50.304@redhat.com> On 12/01/2015 06:43 PM, John Rose wrote: > On Dec 1, 2015, at 8:56 AM, Andrew Haley wrote: >> >> Some of the code in C2 which checks for overflow (e.g. >> AddLNode::add_ring) looks very wrong to me. I am not comfortable that >> an aggressive C++ optimizing compiler will generate the expected code >> for this function. > > Good catch. We may need to replace some normal-looking C++ expressions > on jint (or even "int") with more formal-looking expressions, when the JIT > is performing static evaluation of Java expressions. > > A straw man: > > // globalDefinitions.hpp > #define jint_add(x, y) jint(juint(x) + juint(y)) > #define jint_mul(x, y) jint(jlong(x) * jlong(y)) Yes, I'm trying it now. I'm not going to use exactly this because the cast from jlong to jint falls into the same trap: it is undefined in the same way. Strictly speaking there is no portable way to do this. However, all real compilers of which I'm aware allow the "union trick" to alias unsigned and signed varieties of the same integer type. I'm looking at that now. > Thank you very much for poking at this. Static evaluation of > expressions by the JIT is a long-term source of bugs, and there must > be other "clever" uses of overflow in our source base. There are. GCC now has built-in arithmetic with overflow checking so perhaps I should use that. I can do something with the union trick for non-GCC and older GCC. > P.S. In case anybody is thinking it: IMO, it would *not* be good to > use C++ operator overloading to make the static evaluation code look > normal. The special code should have a special look, even if it is > uglier than the normal look. Yes, I was thinking that. Wilco. :-) Andrew.
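John's straw-man macros and Andrew's caveat about narrowing conversions can be illustrated with a minimal sketch, assuming only standard C++ and `<cstdint>`; the names `java_iadd`, `java_imul` and `java_idiv` are hypothetical and not HotSpot's. The arithmetic is done on unsigned types, whose wraparound is well defined, and `std::memcpy` reinterprets the resulting bits as a signed 32-bit value, which is one standard-conforming alternative to the union trick (GCC accepts union type punning as an extension). Division is included because Java defines `Integer.MIN_VALUE / -1`, while the same expression is undefined in C++:

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a two's-complement int32_t without an
// out-of-range unsigned-to-signed conversion.
static int32_t bits_to_jint(uint32_t bits) {
  int32_t result;
  std::memcpy(&result, &bits, sizeof result);
  return result;
}

// Java iadd: the sum wraps modulo 2^32 (unsigned arithmetic is well defined).
int32_t java_iadd(int32_t x, int32_t y) {
  return bits_to_jint(static_cast<uint32_t>(x) + static_cast<uint32_t>(y));
}

// Java imul: keep the low 32 bits of the product. Multiplying as uint64_t
// avoids signed overflow and any promotion of uint32_t operands to int.
int32_t java_imul(int32_t x, int32_t y) {
  uint64_t product = static_cast<uint64_t>(static_cast<uint32_t>(x)) *
                     static_cast<uint64_t>(static_cast<uint32_t>(y));
  return bits_to_jint(static_cast<uint32_t>(product));
}

// Java idiv: MIN_VALUE / -1 yields MIN_VALUE in Java but is undefined in C++.
// The caller must handle y == 0 (Java throws ArithmeticException there).
int32_t java_idiv(int32_t x, int32_t y) {
  if (x == INT32_MIN && y == -1) {
    return INT32_MIN;
  }
  return x / y;  // C++11 and Java both truncate toward zero
}
```

The thread leaves open which mechanism HotSpot will adopt (John's macros, the union trick, or GCC's overflow built-ins); the sketch only shows that Java's wraparound results can be computed without relying on signed overflow.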
From john.r.rose at oracle.com Tue Dec 1 19:02:06 2015 From: john.r.rose at oracle.com (John Rose) Date: Tue, 1 Dec 2015 11:02:06 -0800 Subject: 8144223: Move j.l.invoke.{ForceInline, DontInline, Stable} to jdk.internal.vm.annotation package In-Reply-To: References: <56A7FCB0-5AE6-438C-8A7C-FED9D39878E4@oracle.com> Message-ID: <68DD38AB-7469-4C82-95B1-D3140C9C3A92@oracle.com> I like it! On Dec 1, 2015, at 6:23 AM, Paul Sandoz wrote: > > * Since the HotSpot VM promotes a non-null component value to constant, it may > * be that the Java memory model would appear to be broken, if such a constant > * (the second value of the field) is used as the value of the field even after > * the field value has changed. s/has changed/has changed (to a third value)/ From christian.tornqvist at oracle.com Tue Dec 1 19:19:17 2015 From: christian.tornqvist at oracle.com (Christian Tornqvist) Date: Tue, 1 Dec 2015 14:19:17 -0500 Subject: RFR 8143628: Fork sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables In-Reply-To: <17CDB8FA-3B1E-465A-8FB6-121113BE66CA@oracle.com> References: <7e5e2f21-a462-4fb8-8cb2-52f4c9e303fb@default> <17CDB8FA-3B1E-465A-8FB6-121113BE66CA@oracle.com> Message-ID: <701701d12c6d$2cfa8cd0$86efa670$@oracle.com> Hi Paul, Tests in hotspot/test/runtime need to be jtreg tests. Looking at your tests, I can't see a reason why they can't easily be modified to be jtreg tests instead?
(adding the hotspot-dev mail alias) Thanks, Christian -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Paul Sandoz Sent: Tuesday, December 1, 2015 5:28 AM Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net Subject: Re: RFR 8143628: Fork sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables > On 30 Nov 2015, at 23:33, Paul Sandoz wrote: > > >> On 30 Nov 2015, at 23:05, Christian Tornqvist wrote: >> >> Because jtreg is the test framework that we use, we've been working hard to reduce the number of test frameworks in use. >> > > jtreg comes bundled with testng so what is there to reduce? > Here is an analogy: jtreg is to testng as launcher is to library. They are complementary to each other, i.e. think of testng as an implicit @library that helps one better organize tests and report errors. Would I be correct in stating that the HotSpot runtime team is taking a conservative position and does not want to deal with such a library, contrary to other areas of the JDK? Sorry to push back, but I don't agree with that position (if correct). I am reluctant to change the tests. Please don't think that's complete pigheadedness on my part :-) I just don't think it's the right thing to do. If the HotSpot runtime team will not accept the use of TestNG then I suppose I could unblock by proposing to move the tests to the JDK repo, which I would also be reluctant to do since they caught an issue lying dormant for at least 8 years on certain platforms (not covered by the core testset) that existing hotspot tests never caught. Paul.
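For comparison, a jtreg test that avoids TestNG is usually launched with `@run main` and reports failure by throwing from `main`. The sketch below is hypothetical (the class name, the checked method and the iteration count are illustrative, not taken from the webrev); its `@run` lines mirror the `-Xint` / `-XX:TieredStopAtLevel=1` / `-XX:-TieredCompilation` variants used by the Unsafe tests under review, and a small helper stands in for TestNG's `assertEquals`.

```java
/*
 * @test
 * @summary Hypothetical example of a jtreg test written without TestNG
 * @run main/othervm -Xint PlainJtregExample
 * @run main/othervm -XX:TieredStopAtLevel=1 PlainJtregExample
 * @run main/othervm -XX:-TieredCompilation PlainJtregExample
 */
public class PlainJtregExample {
    // Minimal stand-in for TestNG's assertEquals; any exception escaping main fails the jtreg run.
    static void assertEquals(long actual, long expected, String what) {
        if (actual != expected) {
            throw new AssertionError(what + ": expected " + expected + " but was " + actual);
        }
    }

    // Hypothetical code under test.
    static long roundTrip(long v) {
        return Long.reverseBytes(Long.reverseBytes(v));
    }

    public static void main(String[] args) {
        long[] values = {0L, 1L, -1L, Long.MIN_VALUE, Long.MAX_VALUE, 0x0123456789ABCDEFL};
        // Arbitrary iteration count, chosen so the method is likely to be JIT-compiled.
        for (int i = 0; i < 20_000; i++) {
            for (long v : values) {
                assertEquals(roundTrip(v), v, "roundTrip(" + v + ")");
            }
        }
        System.out.println("PASSED");
    }
}
```

Outside jtreg the class runs as an ordinary program (`java PlainJtregExample`); the `@` tags in the leading comment are read only by jtreg.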
From vladimir.kozlov at oracle.com Tue Dec 1 19:27:57 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 1 Dec 2015 11:27:57 -0800 Subject: RFR: 8144028: Use AArch64 bit-test instructions in C2 In-Reply-To: <565C6943.7050200@redhat.com> References: <5655D888.8030701@redhat.com> <5655EBDD.5090800@oracle.com> <5655ECA5.2070203@redhat.com> <5655FB4F.6060103@redhat.com> <565C6943.7050200@redhat.com> Message-ID: <565DF4BD.10907@oracle.com> Since FC date was moved I will push it today through JPRT. Do you mind if I remove "@requires os.arch == "aarch64" and reduce 10000000 loop limit to run the test on all platforms? Thanks, Vladimir On 11/30/15 7:20 AM, Andrew Haley wrote: > On 11/25/2015 06:17 PM, Andrew Haley wrote: >> New webrev at http://cr.openjdk.java.net/~aph/8144028-2/ > > Is this OK to push? It looks pretty low-risk for all non-AArch64 > targets because the test case is AArch64 only. > > Thanks, > > Andrew. > From christian.thalinger at oracle.com Tue Dec 1 19:30:51 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 1 Dec 2015 09:30:51 -1000 Subject: RFR(XXL): 8144019: PPC64 C1: Introduce Client Compiler In-Reply-To: <4295855A5C1DE049A61835A1887419CC41ED4F07@DEWDFEMB12A.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB41811656722779EB@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41ED4F07@DEWDFEMB12A.global.corp.sap> Message-ID: <018BFB19-5628-4484-87C6-F345A2CFBC3F@oracle.com> > On Nov 27, 2015, at 6:01 AM, Lindenmaier, Goetz wrote: > > Hi, > > could please someone from out of SAP review this? > This being a quite big change I would like to have someone > else to look over it to assure everything is formally correct. > > Also, it contains the few lines needed to enable the C1 build on ppc64 > in the shared linux makefiles. May I push this despite this? > Else we would need a sponsor please. The rules say we have to use JPRT. 
We could also split it into two changes and you can push the C1 port directly. I can piggyback the Makefile change with something else. > > Thanks, > Goetz. > > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin > Sent: Wednesday, 25 November 2015 15:06 > To: hotspot-compiler-dev at openjdk.java.net > Subject: RFR(XXL): 8144019: PPC64 C1: Introduce Client Compiler > > Hi, > > we would like to contribute our PPC64 port of the Client Compiler to support Tiered Compilation. > The change includes refactoring of some functionality which is shared between C1 and C2 and some updates. > > The webrev is here: > http://cr.openjdk.java.net/~mdoerr/8144019_ppc64_c1/webrev.00 > > It only changes PPC64 files, with one minor exception: make/linux/Makefile > > Please review. I will also need a sponsor, please. > > Best regards, > Martin From christian.thalinger at oracle.com Tue Dec 1 19:37:36 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 1 Dec 2015 09:37:36 -1000 Subject: RFR : 8043467 : JEP 233: Generate Run-Time Compiler Tests Automatically In-Reply-To: <5D7991CE-548D-4701-AA0F-207089B14B1F@oracle.com> References: <5D7991CE-548D-4701-AA0F-207089B14B1F@oracle.com> Message-ID: <968B1AD3-79B7-49F8-B79B-BCE6948182B3@oracle.com> > On Nov 26, 2015, at 3:25 PM, Igor Veresov wrote: > > >> On Nov 26, 2015, at 9:35 AM, Volker Simonis > wrote: >> >> Hi Tatiana, >> >> really a very impressive piece of work! >> >> Even more impressive how Igor and Vladimir could review this within a few hours (I always knew you're wizards, guys :) > > That's because I wrote the initial version of it ten years ago. Although it is substantially refactored and extended with some new functionality now, I have a feel of the context.
:) > >> >> But more seriously: the code doesn't seem to contain any documentation at all and only a very homeopathic amount of comments. Would it be possible to add at least some kind of high-level documentation? Otherwise I think it will be very hard to grasp what's going on and/or maintain/extend this library for people not already involved in the project. >> > > That wouldn't hurt. There is this high-level description: http://cr.openjdk.java.net/~iveresov/SUN060582-505001.pdf >> It would also be interesting to know if there are any bugs you've found so far with these tests. If yes, which ones, and if no, why do you expect to find bugs in the future? > > There were a bunch of problems in C2 that it found back in the day. I ran an old version once with Graal and found two problems: https://bugs.openjdk.java.net/browse/GRAAL-10 https://bugs.openjdk.java.net/browse/GRAAL-11 They are both still open and I'm not sure they ever got fixed. > > igor > >> >> Finally, the JEP mentions the generation of tests with bytecode sequences not generated by Javac to stress the VM with legal but uncommon bytecode patterns. I think this is a very good idea, but at a first glance I couldn't see any bytecode generation (at least not with jdk.internal.org.objectweb.asm). Will this feature be implemented later? >> >> Thanks, >> Volker >> >> >> On Wed, Nov 25, 2015 at 12:42 AM, Tatiana Pivovarova > wrote: >> Hi! >> >> Could you please review the code changes for JEP 233: Generate Run-Time Compiler Tests Automatically? >> >> The intent of this project is to develop a tool (JIT-tester) which randomly generates syntactically and semantically correct code and verifies that the results from the interpreter are the same as the results from the compiler. Please find more details in the JEP.
>> >> JEP : https://bugs.openjdk.java.net/browse/JDK-8043467 >> webrev: http://cr.openjdk.java.net/~iignatyev/tpivovarova/jep-233/webrev.00/ >> >> Thanks, >> Tatiana >> > From aph at redhat.com Tue Dec 1 20:57:42 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 1 Dec 2015 20:57:42 +0000 Subject: RFR: 8144028: Use AArch64 bit-test instructions in C2 In-Reply-To: <565DF4BD.10907@oracle.com> References: <5655D888.8030701@redhat.com> <5655EBDD.5090800@oracle.com> <5655ECA5.2070203@redhat.com> <5655FB4F.6060103@redhat.com> <565C6943.7050200@redhat.com> <565DF4BD.10907@oracle.com> Message-ID: <565E09C6.1010700@redhat.com> On 12/01/2015 07:27 PM, Vladimir Kozlov wrote: > Do you mind if I remove "@requires os.arch == "aarch64" and reduce 10000000 loop limit to run the test on all platforms? No, but please make sure the loop limit is low enough to trigger C2 with tiered compilation. Andrew. From vladimir.kozlov at oracle.com Tue Dec 1 21:18:46 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 1 Dec 2015 13:18:46 -0800 Subject: RFR: 8144028: Use AArch64 bit-test instructions in C2 In-Reply-To: <565E09C6.1010700@redhat.com> References: <5655D888.8030701@redhat.com> <5655EBDD.5090800@oracle.com> <5655ECA5.2070203@redhat.com> <5655FB4F.6060103@redhat.com> <565C6943.7050200@redhat.com> <565DF4BD.10907@oracle.com> <565E09C6.1010700@redhat.com> Message-ID: <565E0EB6.7000109@oracle.com> Thanks. I will add -Xbatch flag. It will make sure to trigger compilation when threshold is reached. And I will verify. Thanks, Vladimir On 12/1/15 12:57 PM, Andrew Haley wrote: > On 12/01/2015 07:27 PM, Vladimir Kozlov wrote: >> Do you mind if I remove "@requires os.arch == "aarch64" and reduce 10000000 loop limit to run the test on all platforms? > > No, but please make sure the loop limit is low enough to trigger C2 > with tiered compilation. > > Andrew.
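A platform-independent version of the bit-test regression test mainly needs to exercise single-bit conditions often enough for C2 to compile them, and to check every result against an independent oracle. The sketch below is hypothetical: the class and method names, the masks and the iteration count are assumptions, not the test from the webrev. The count of 200,000 is only assumed to exceed the C2 thresholds of a default tiered configuration, and whether C2 actually emits the AArch64 bit-test instructions still has to be confirmed separately, for example by inspecting the generated code.

```java
/** Hypothetical single-bit tests exercised until the JIT compiles them. */
public class BitTestSketch {
    // Conditions of the form (x & (1L << n)) != 0 are the shape a bit-test instruction can cover.
    static int bit37(long x) {
        return (x & (1L << 37)) != 0 ? 1 : 0;
    }

    static int lowBit(int x) {
        return (x & 1) == 0 ? 0 : 1;
    }

    // Independent oracle using an unsigned shift instead of a mask.
    static int expectedBit(long x, int n) {
        return (int) ((x >>> n) & 1L);
    }

    public static void main(String[] args) {
        // Assumed to be large enough to reach C2 under tiered compilation; adjust per VM configuration.
        final int iterations = 200_000;
        long seed = 0x9E3779B97F4A7C15L;
        for (int i = 0; i < iterations; i++) {
            seed = seed * 6364136223846793005L + 1442695040888963407L; // simple LCG for varied inputs
            if (bit37(seed) != expectedBit(seed, 37)) {
                throw new AssertionError("bit37 failed for " + seed);
            }
            if (lowBit((int) seed) != expectedBit(seed, 0)) {
                throw new AssertionError("lowBit failed for " + seed);
            }
        }
        System.out.println("PASSED");
    }
}
```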
> From vladimir.kozlov at oracle.com Wed Dec 2 01:32:24 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 1 Dec 2015 17:32:24 -0800 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: References: Message-ID: <565E4A28.5010008@oracle.com> HotSpot changes seem fine. But the JDK changes should have an additional method for range checks; this is a new requirement for intrinsics which access arrays. See, for example, cryptBlockCheck() in AESCrypt.java. Thanks, Vladimir On 11/24/15 2:33 PM, Kharbas, Kishor wrote: > Hello all, > > I request the community to review a patch for enhancing > CounterMode.crypt() for AES. This patch defines an intrinsic for > CounterMode.crypt() to leverage the parallel nature of AES in Counter > (CTR) Mode. > > This is achieved by operating on 6 blocks in parallel to issue > independent x86 AES-NI instructions and keep the CPU pipeline full. > > Testing on a micro-benchmark has shown a speedup of 4x-6x. > > Bug id: > > https://bugs.openjdk.java.net/browse/JDK-8143925 > > Webrev: > > hotspot: http://cr.openjdk.java.net/~mcberg/8143925/hotspot/webrev.02/ > > jdk: http://cr.openjdk.java.net/~mcberg/8143925/jdk/webrev.01/ > > Much appreciated! > > Kishor Kharbas > From vladimir.kozlov at oracle.com Wed Dec 2 01:48:02 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 1 Dec 2015 17:48:02 -0800 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> Message-ID: <565E4DD2.1030200@oracle.com> This seems fine. 2x is for AVX implementation? Thanks, Vladimir On 11/24/15 4:00 PM, Deshpande, Vivek R wrote: > Hi all > > We would like to contribute a patch from Intel which optimizes > vectorizedMismatch() method in java.util.ArraysSupport.java for X86 > architecture using AVX instructions.
> > The improvement gives more than 2x gain over Unsafe implementation for > long arrays. > > > The bug is blocked by bug: vectorized support for array > equals/compare/mismatch using Unsafe > (https://bugs.openjdk.java.net/browse/JDK-8136924.) > > Could you please review and sponsor this patch. > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8143355 > webrev: > > http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ > > Thanks and regards, > > Vivek > From vladimir.kozlov at oracle.com Wed Dec 2 01:55:06 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 1 Dec 2015 17:55:06 -0800 Subject: RFR 8136924 Vectorized support for array equals/compare/mismatch using Unsafe In-Reply-To: <7F18EFC4-0E53-4FD2-BF56-2219CFEC597E@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <7F18EFC4-0E53-4FD2-BF56-2219CFEC597E@oracle.com> Message-ID: <565E4F7A.2050607@oracle.com> I reviewed 8143355 today and my main question is where are range checks? Thanks, Vladimir On 11/25/15 1:53 AM, Paul Sandoz wrote: > Hi, > > And this is the review for the Java part: > > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8136924-arrays-mismatch-vectorized-unsafe/webrev/ > > Which will be updated to add @HotSpotIntrinsicCandidate when JDK-8143355 is pushed. [1] > > The plan is all reviewed changes will be pushed to hs-comp and then we follow up: > > 1) adding the intrinsic to other platforms > > 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. > > 3) take a swing at consolidating other equal/compare intrinsics, such as those for char[]/String-based equal/compare > > 4) adding methods to String such as mismatch method. > > I can help by pushing all reviewed patches. I will kick off a JPRT run with all patches applied. 
> > I did evaluate/test the HotSpot patch (stared at the patch and generated code for UseAVX < 2, and measured) and reviewed with my limited knowledge of HotSpot. > > Paul. > > [1] > diff -r 01b49c2960fd src/java.base/share/classes/java/util/ArraysSupport.java > --- a/src/java.base/share/classes/java/util/ArraysSupport.java Tue Nov 17 15:42:53 2015 +0100 > +++ b/src/java.base/share/classes/java/util/ArraysSupport.java Tue Nov 17 17:05:09 2015 +0100 > @@ -24,7 +24,7 @@ > */ > package java.util; > > -//import jdk.internal.HotSpotIntrinsicCandidate; > +import jdk.internal.HotSpotIntrinsicCandidate; > import jdk.internal.misc.Unsafe; > > class ArraysSupport { > @@ -72,7 +72,7 @@ > * If a mismatch is not found the negation of one plus the number of > * remaining pairs of elements to be checked in the tail of the two arrays. > */ > -// @HotSpotIntrinsicCandidate > + @HotSpotIntrinsicCandidate > static int vectorizedMismatch(Object a, long aOffset, > Object b, long bOffset, > int length, > >> On 25 Nov 2015, at 01:00, Deshpande, Vivek R wrote: >> >> Hi all >> >> We would like to contribute a patch from Intel which optimizes vectorizedMismatch() method in java.util.ArraysSupport.java for X86 architecture using AVX instructions. >> The improvement gives more than 2x gain over Unsafe implementation for long arrays. >> The bug is blocked by bug: vectorized support for array equals/compare/mismatch using Unsafe (https://bugs.openjdk.java.net/browse/JDK-8136924.) >> Could you please review and sponsor this patch. 
>> >> Bug-id: >> https://bugs.openjdk.java.net/browse/JDK-8143355 >> webrev: >> http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ >> >> Thanks and regards, >> Vivek > From vladimir.kozlov at oracle.com Wed Dec 2 02:02:06 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 1 Dec 2015 18:02:06 -0800 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 In-Reply-To: <565E4DD2.1030200@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <565E4DD2.1030200@oracle.com> Message-ID: <565E511E.9020503@oracle.com> 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. If that is the case the flag should be global. Thanks, Vladimir On 12/1/15 5:48 PM, Vladimir Kozlov wrote: > This seems fine. 2x is for AVX implementation? > > Thanks, > Vladimir > > On 11/24/15 4:00 PM, Deshpande, Vivek R wrote: >> Hi all >> >> We would like to contribute a patch from Intel which optimizes >> vectorizedMismatch() method in java.util.ArraysSupport.java for X86 >> architecture using AVX instructions. >> >> The improvement gives more than 2x gain over Unsafe implementation for >> long arrays. >> >> >> The bug is blocked by bug: vectorized support for array >> equals/compare/mismatch using Unsafe >> (https://bugs.openjdk.java.net/browse/JDK-8136924.) >> >> Could you please review and sponsor this patch. 
>> >> Bug-id: >> >> https://bugs.openjdk.java.net/browse/JDK-8143355 >> webrev: >> >> http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ >> >> Thanks and regards, >> >> Vivek >> From vladimir.kozlov at oracle.com Wed Dec 2 02:06:03 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 1 Dec 2015 18:06:03 -0800 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> Message-ID: <565E520B.8060801@oracle.com> Please send a link to the new webrev on the cr server. Thanks, Vladimir On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: > Hi Vladimir > > Please find the webrev with your suggested updates attached to this mail. > We will update it in the JBS entry soon. > Please let me know if it needs further changes. > > Regards, > Vivek > > -----Original Message----- > From: Deshpande, Vivek R > Sent: Tuesday, November 24, 2015 10:22 AM > To: 'joe darcy'; Vladimir Kozlov > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math lib > > Hi Vladimir, Joe > > I have run the jtreg tests in hotspot and the tests from jdk you mentioned. The patch passed those tests. > The ~4x gain is with -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos over running without that option. > The performance gain is 3.2x over base jdk, that is, over the current fsin/fcos intrinsic. This gain is more realistic. > > Could I get those tests around the boundary values? Would the WorstCaseTests.java jtreg test in jdk test those? > If yes, then it has passed those boundary cases.
> > I will work on adding either a diagnostic flag or just one flag for libm and send out the webrev soon. > > Regards, > Vivek > > > -----Original Message----- > From: joe darcy [mailto:joe.darcy at oracle.com] > Sent: Monday, November 23, 2015 6:28 PM > To: Vladimir Kozlov; Deshpande, Vivek R > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib > > Hello, > > Just getting added to the thread. > > On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >> Thank you for the explanation, Vivek. >> >> Please run jdk/test/java/lang/Math/ jtreg tests in addition to >> Hotspot tests. >> >> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>> Hi Vladimir >>> >>> The results we obtain with LIBM are within +/- 1 ulp of the StrictMath >>> result and not the exact result. So I added the flag to switch between >>> FDLIBM and LIBM. >>> >>> Quick explanation: >>> This is what we observed in comparison with the HPA Library >>> (http://www.nongnu.org/hpalib/), explained with an example. >>> LIBM Observed Math result=0.19457293629570213 (4596178249117717083L) >>> (StrictMath - 1ulp) Required result should be = 0.19457293629570216 >>> (4596178249117717084L) (StrictMath result) or 0.1945729362957022 >>> (4596178249117717085L) (StrictMath + 1ulp.) This means the HPA library >>> result is between the above two values and the exact result would be >>> pretty close to it. >>> So here the StrictMath result is less than the quad-precision result; the Math >>> result should be StrictMath or StrictMath + 1ulp and not StrictMath - >>> 1ulp, according to our test. >> >> Note, java.lang.Math allows results to be 1 ulp off (in both directions, I >> think) and it should be consistent for the interpreter and code generated >> by JIT compilers: >> >> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28double%29 >> >
For the Math methods with a 1/2 ulp error bound, the floating-point result closest to the exact result must be returned. For the methods with a 1 ulp error bound, either of the floating-point result bracketing the true result can be returned, subject to the monotonicity constraints of the specification of the particular method. > >> >>> >>> I have done the experiments with XX:+UnlockDiagnosticVMOptions >>> -XX:DisableIntrinsic=_dsin and XX:+UnlockDiagnosticVMOptions >>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter would >>> go through LIBM and C1 and c2 through FDLIBM. >>> If we want to disable LIBM completely, we need the flags >>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >> >> I was thinking about using existing >> DirectiveSet::is_intrinsic_disabled() and >> vmIntrinsics::is_disabled_by_flags(). You need to add additional >> versions of functions which accept intrinsic ID instead of methodHandle. >> >> If you still want to use flags make them diagnostic. >> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >> >>> >>> Also the performance gain ~4x is with XX:+UnlockDiagnosticVMOptions >>> -XX:DisableIntrinsic=_dsin/_dcos. >> >> You confused me here. So you get 4x when only Interpreter use LIBM >> code and compilers use FDLIB? > > Just to be clear, are you comparing the new code to FDLIBM (StrictMath) or to the existing fsin/fcos instrinsics (Math)? > > I'm part way through porting the FDLIBM code to Java (JDK-8134780: Port fdlibm to Java), which is providing a significant speed boost to the StrictMath methods that have been ported. > > I find the current patch *insufficient* as-is in terms of its testing. > For example, part of patch says > > # For sin > > +// This means that the main path is actually only taken for > +// 2^-252 <= |X| < 90112. > > # For cos > > +// This means that the main path is actually only taken for > +// 2^-252 <= |X| < 90112. 
> > If nothing else, there are no tests at or around those boundary values, which is unacceptable. There should also be some tests of values of interest to the algorithm in question. > > Cheers, > > -Joe > > >> >> Thanks, >> Vladimir >> >>> >>> Let me know your thoughts on this. I would answer more questions and >>> give more data if needed. >>> >>> Regards, >>> Vivek >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Monday, November 23, 2015 10:37 AM >>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>> Cc: Viswanathan, Sandhya >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >>> lib >>> >>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>> What is the reason you decided to add new flags? exp() and log() >>>> changes did not have flags. >>>> >>>> It would be interesting to see what happens if you disable >>>> intrinsics using existing flag, for example: >>>> >>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dexp >>> >>> Hi Vivek, >>> >>> I want to point out that you can do this experiment later. We can file >>> bugs and fix them after FC. >>> >>> For now, please, answer my question about flags only. This is the >>> only thing holding it from push. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>> Hi all >>>>> >>>>> I would like to contribute a patch which optimizes Math.sin() and >>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>> implementation. >>>>> >>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>> >>>>> The options to use the optimizations are -XX:+UseLibmSinIntrinsic >>>>> and -XX:+UseLibmCosIntrinsic. >>>>> >>>>> Could you please review and sponsor this patch.
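Joe's request for tests at the boundary values quoted from the patch comments (2^-252 and 90112) could be approached along the lines below. This is a hedged sketch, not WorstCaseTests.java or any test from the webrev; the class name and the number of neighbouring values probed are arbitrary. Following Joe's reading of the spec, a 1-ulp method may return either floating-point value bracketing the exact result. Assuming fdlibm's StrictMath result is also one of those two values, staying within one ulp of StrictMath is a necessary but not sufficient check. The sketch does not control whether the new intrinsic is enabled or whether the calls are compiled.

```java
/** Hypothetical boundary probe comparing Math.sin/cos against StrictMath. */
public class SinCosBoundarySketch {
    // Boundaries quoted from the patch comments: main path for 2^-252 <= |x| < 90112.
    static final double[] BOUNDARIES = {Math.scalb(1.0, -252), 90112.0};

    /** True if a and b are at most one ulp (of b) apart; NaN matches only NaN. */
    static boolean withinOneUlp(double a, double b) {
        if (Double.isNaN(a) || Double.isNaN(b)) {
            return Double.isNaN(a) && Double.isNaN(b);
        }
        return Math.abs(a - b) <= Math.ulp(b);
    }

    /** The boundary, its neighbours up to `steps` ulps away on each side, and all negations. */
    static double[] probes(double boundary, int steps) {
        double[] out = new double[2 * (2 * steps + 1)];
        int k = 0;
        out[k++] = boundary;
        out[k++] = -boundary;
        double down = boundary;
        double up = boundary;
        for (int i = 0; i < steps; i++) {
            down = Math.nextDown(down);
            up = Math.nextUp(up);
            out[k++] = down;
            out[k++] = -down;
            out[k++] = up;
            out[k++] = -up;
        }
        return out;
    }

    public static void main(String[] args) {
        int failures = 0;
        for (double b : BOUNDARIES) {
            for (double x : probes(b, 8)) {
                if (!withinOneUlp(Math.sin(x), StrictMath.sin(x))
                        || !withinOneUlp(Math.cos(x), StrictMath.cos(x))) {
                    System.out.println("outside 1 ulp of StrictMath at x = " + x);
                    failures++;
                }
            }
        }
        if (failures != 0) {
            throw new AssertionError(failures + " failures");
        }
        System.out.println("PASSED");
    }
}
```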
>>>>> >>>>> Bug-id: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>> webrev: >>>>> >>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>> >>>>> Thanks and regards, >>>>> >>>>> Vivek >>>>> From john.r.rose at oracle.com Wed Dec 2 02:29:40 2015 From: john.r.rose at oracle.com (John Rose) Date: Tue, 1 Dec 2015 18:29:40 -0800 Subject: RFR 8136924 Vectorized support for array equals/compare/mismatch using Unsafe In-Reply-To: <565599BB.2000109@gmail.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <7F18EFC4-0E53-4FD2-BF56-2219CFEC597E@oracle.com> <565599BB.2000109@gmail.com> Message-ID: <75C90F26-9AED-4351-A0BB-DD50E8022083@oracle.com> On Nov 25, 2015, at 3:21 AM, Peter Levart wrote: > > The mentioning of "reference component types" in javadoc for vectorizedMismatch: > > /** > * Find the relative index of the first mismatching pair of elements in two > * arrays of the same component type. For reference component types the > * reference values (as opposed to what they reference) will be matched. > * Pairs of elements will be tested in order relative to given offsets into > * both arrays. > > ... is probably a left-over, since there is no caller of the method using reference arrays (yet?). You should probably mention that doing so is not safe. As you might know, object pointers are a "movable target". GC can rewrite them as it moves objects around and if you "cast" an object pointer (or two of them) into a long value and store it in a long variable, GC can't track that and update the value, so you might be comparing an old address of an object with new address of the same object and get a mismatch. > > I don't know much about intrinsified code. Whether it can be interrupted by GC or not, so it might be able to compare object references directly, but then the bytecode version of the method would have to have a special-case for reference arrays if it is executed in this form anytime.
Here's the scoop on reading references as ints or longs: Just don't. Calling Unsafe.getLong or Unsafe.getInt on a reference variable is never a good idea. The reason is that the GC is allowed to run between the moment the reference is picked up (as a bit image in a primitive value) and the moment something is done with it. This is rare but if it happens, it will cause the bit image to be stale, with unpredictable results. Can the GC execute if the live interval of the bit image is very, very short? Probably not if the code was compiled, but remember that the code might sometimes be interpreted also, even after it is compiled. In the case of array-mismatch, comparing a broken bit-image of a pointer against another non-broken one might produce a false equality result. Very, very rarely. And won't that be a nice surprise for someone? John From paul.sandoz at oracle.com Wed Dec 2 08:52:12 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 2 Dec 2015 09:52:12 +0100 Subject: RFR 8143628: Fork sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables In-Reply-To: <701701d12c6d$2cfa8cd0$86efa670$@oracle.com> References: <7e5e2f21-a462-4fb8-8cb2-52f4c9e303fb@default> <17CDB8FA-3B1E-465A-8FB6-121113BE66CA@oracle.com> <701701d12c6d$2cfa8cd0$86efa670$@oracle.com> Message-ID: <9C417D00-F022-4CF6-87D3-0AF74CCD7441@oracle.com> Hi Christian, > On 1 Dec 2015, at 20:19, Christian Tornqvist wrote: > > Hi Paul, > > Tests in hotspot/test/runtime needs to be jtreg tests. They are jtreg tests.
They are required to be run (re: "launched") with jtreg, see: http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8143628-unsafe-native-hotspot/webrev/test/runtime/Unsafe/JdkInternalMiscUnsafeAccessTestBoolean.java.html

/*
 * @test
 * @bug 8143628
 * @summary Test unsafe access for boolean
 * @modules java.base/jdk.internal.misc
 * @run testng/othervm -Diters=100 -Xint JdkInternalMiscUnsafeAccessTestBoolean
 * @run testng/othervm -Diters=20000 -XX:TieredStopAtLevel=1 JdkInternalMiscUnsafeAccessTestBoolean
 * @run testng/othervm -Diters=20000 -XX:-TieredCompilation JdkInternalMiscUnsafeAccessTestBoolean
 * @run testng/othervm -Diters=20000 JdkInternalMiscUnsafeAccessTestBoolean
 */

That's the point I was making with: jtreg is to testng as launcher is to library. Note the use of "@modules java.base/jdk.internal.misc". That's going to be important later on. > Looking at your tests, I can't see a reason why they can't easily be modified to be jtreg tests instead? That's not the point. There is a principle here about what test libraries one can or cannot use with the tests in a particular area of a particular repo. At the moment I am not hearing any consistent and solid technical argument as to why testng cannot be used for HotSpot runtime tests. Paul. From paul.sandoz at oracle.com Wed Dec 2 09:10:37 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 2 Dec 2015 10:10:37 +0100 Subject: RFR 8136924 Vectorized support for array equals/compare/mismatch using Unsafe In-Reply-To: <565E4F7A.2050607@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <7F18EFC4-0E53-4FD2-BF56-2219CFEC597E@oracle.com> <565E4F7A.2050607@oracle.com> Message-ID: <938E28F6-2FB9-46E5-B13F-D4535112F81E@oracle.com> > On 2 Dec 2015, at 02:55, Vladimir Kozlov wrote: > > I reviewed 8143355 today and my main question is where are range checks? > In this case the range checks are performed by the methods in Arrays, which call non-checking type-specific methods in ArraysSupport that in turn call vectorizedMismatch, e.g.: http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8136924-arrays-mismatch-vectorized-unsafe/webrev/src/java.base/share/classes/java/util/Arrays.java.sdiff.html

    @HotSpotIntrinsicCandidate
    public static boolean equals(byte[] a, byte[] a2) {
        if (a==a2)
            return true;
        if (a==null || a2==null)
            return false;

        int length = a.length;
        if (a2.length != length)
            return false;

        return ArraysSupport.mismatch(a, a2, length) < 0;
    }

    public static boolean equals(byte[] a, int aFromIndex, int aToIndex,
                                 byte[] b, int bFromIndex, int bToIndex) {
        rangeCheck(a.length, aFromIndex, aToIndex);
        rangeCheck(b.length, bFromIndex, bToIndex);

        int aLength = aToIndex - aFromIndex;
        int bLength = bToIndex - bFromIndex;
        if (aLength != bLength)
            return false;

        return ArraysSupport.mismatch(a, aFromIndex,
                                      b, bFromIndex,
                                      aLength) < 0;
    }

    public static int compare(byte[] a, byte[] b) {
        if (a == b)
            return 0;
        if (a == null || b == null)
            return a == null ? -1 : 1;

        int i = ArraysSupport.mismatch(a, b,
                                       Math.min(a.length, b.length));
        if (i >= 0) {
            return Byte.compare(a[i], b[i]);
        }

        return a.length - b.length;
    }

    public static int compare(byte[] a, int aFromIndex, int aToIndex,
                              byte[] b, int bFromIndex, int bToIndex) {
        rangeCheck(a.length, aFromIndex, aToIndex);
        rangeCheck(b.length, bFromIndex, bToIndex);

        int aLength = aToIndex - aFromIndex;
        int bLength = bToIndex - bFromIndex;
        int i = ArraysSupport.mismatch(a, aFromIndex,
                                       b, bFromIndex,
                                       Math.min(aLength, bLength));
        if (i >= 0) {
            return Byte.compare(a[aFromIndex + i], b[bFromIndex + i]);
        }

        return aLength - bLength;
    }

There are existing tests in place verifying that exceptions are thrown for out-of-bounds conditions. Paul. From paul.sandoz at oracle.com Wed Dec 2 09:15:28 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 2 Dec 2015 10:15:28 +0100 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 In-Reply-To: <565E511E.9020503@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <565E4DD2.1030200@oracle.com> <565E511E.9020503@oracle.com> Message-ID: <68450F0D-3E91-4767-A995-F254C1EEEC52@oracle.com> > On 2 Dec 2015, at 03:02, Vladimir Kozlov wrote: > > 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. > > If that is the case the flag should be global.
> Ah, so IIUC this:

+++ new/src/share/vm/opto/c2_globals.hpp 2015-11-23 14:52:06.559779600 -0800
@@ -727,6 +727,9 @@
   product(bool, UseMontgomerySquareIntrinsic, false,                   \
           "Enables intrinsification of BigInteger.montgomerySquare()") \
                                                                        \
+  product(bool, UseVectorizedMismatchIntrinsic, false,                 \
+          "Enables intrinsification of ArraysSupport.vectorizedMismatch()") \

Should be moved (for later use) to src/share/vm/runtime/globals.hpp. Thanks, Paul. From aph at redhat.com Wed Dec 2 09:45:48 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 2 Dec 2015 09:45:48 +0000 Subject: RFR: 8144028: Use AArch64 bit-test instructions in C2 In-Reply-To: <565E0EB6.7000109@oracle.com> References: <5655D888.8030701@redhat.com> <5655EBDD.5090800@oracle.com> <5655ECA5.2070203@redhat.com> <5655FB4F.6060103@redhat.com> <565C6943.7050200@redhat.com> <565DF4BD.10907@oracle.com> <565E09C6.1010700@redhat.com> <565E0EB6.7000109@oracle.com> Message-ID: <565EBDCC.6010509@redhat.com> On 01/12/15 21:18, Vladimir Kozlov wrote: > Thanks. I will add -Xbatch flag. It will make sure to trigger compilation when threshold is reached. And I will verify. That does not work, I'm afraid. When -Xbatch is used, C2 does not generate the instructions I'm trying to test. The problem is that it generates conditional branches and moves instead of CMove. Why should it do this? Maybe the profile counts are different, but I don't think they are. Andrew.
From paul.sandoz at oracle.com Wed Dec 2 10:16:05 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 2 Dec 2015 11:16:05 +0100 Subject: 8144223: Move j.l.invoke.{ForceInline, DontInline, Stable} to jdk.internal.vm.annotation package In-Reply-To: <68DD38AB-7469-4C82-95B1-D3140C9C3A92@oracle.com> References: <56A7FCB0-5AE6-438C-8A7C-FED9D39878E4@oracle.com> <68DD38AB-7469-4C82-95B1-D3140C9C3A92@oracle.com> Message-ID: <3CC31D70-D544-4F87-B389-11BD82E22294@oracle.com> > On 1 Dec 2015, at 20:02, John Rose wrote: > > I like it! > > On Dec 1, 2015, at 6:23 AM, Paul Sandoz > wrote: >> >> * Since the HotSpot VM promotes a non-null component value to constant, it may >> * be that the Java memory model would appear to be broken, if such a constant >> * (the second value of the field) is used as the value of the field even after >> * the field value has changed. > > s/has changed/has changed (to a third value)/ > Thanks, webrev updated. Paul. From goetz.lindenmaier at sap.com Wed Dec 2 10:21:26 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 2 Dec 2015 10:21:26 +0000 Subject: RFR(XXL): 8144019: PPC64 C1: Introduce Client Compiler In-Reply-To: <018BFB19-5628-4484-87C6-F345A2CFBC3F@oracle.com> References: <7C9B87B351A4BA4AA9EC95BB41811656722779EB@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41ED4F07@DEWDFEMB12A.global.corp.sap> <018BFB19-5628-4484-87C6-F345A2CFBC3F@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41ED9D8F@DEWDFEMB12A.global.corp.sap> Hi Christian, I understand we need to follow the rules. But then I would like to keep it as one change. That's just cleaner. Best regards, Goetz.
From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Tuesday, December 1, 2015 20:31 To: Lindenmaier, Goetz Cc: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(XXL): 8144019: PPC64 C1: Introduce Client Compiler On Nov 27, 2015, at 6:01 AM, Lindenmaier, Goetz > wrote: Hi, could someone from outside SAP please review this? As this is quite a big change, I would like someone else to look over it to make sure everything is formally correct. Also, it contains the few lines needed to enable the C1 build on ppc64 in the shared linux makefiles. May I push this despite that? Otherwise we would need a sponsor, please. The rules say we have to use JPRT. We could also split it into two changes and you can push the C1 port directly. I can piggyback the Makefile change with something else. Thanks, Goetz. From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin Sent: Wednesday, November 25, 2015 15:06 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(XXL): 8144019: PPC64 C1: Introduce Client Compiler Hi, we would like to contribute our PPC64 port of the Client Compiler to support Tiered Compilation. The change includes refactoring of some functionality which is shared between C1 and C2 and some updates. The webrev is here: http://cr.openjdk.java.net/~mdoerr/8144019_ppc64_c1/webrev.00 It only changes PPC64 files, with one minor exception: make/linux/Makefile Please review. I will also need a sponsor, please. Best regards, Martin
From vladimir.x.ivanov at oracle.com Wed Dec 2 10:32:23 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 2 Dec 2015 13:32:23 +0300 Subject: 8144223: Move j.l.invoke.{ForceInline, DontInline, Stable} to jdk.internal.vm.annotation package In-Reply-To: <3CC31D70-D544-4F87-B389-11BD82E22294@oracle.com> References: <56A7FCB0-5AE6-438C-8A7C-FED9D39878E4@oracle.com> <68DD38AB-7469-4C82-95B1-D3140C9C3A92@oracle.com> <3CC31D70-D544-4F87-B389-11BD82E22294@oracle.com> Message-ID: <565EC8B7.9010100@oracle.com> Looks good. Best regards, Vladimir Ivanov On 12/2/15 1:16 PM, Paul Sandoz wrote: > >> On 1 Dec 2015, at 20:02, John Rose > > wrote: >> >> I like it! >> >> On Dec 1, 2015, at 6:23 AM, Paul Sandoz > > wrote: >>> >>> * Since the HotSpot VM promotes a non-null component value to >>> constant, it may >>> * be that the Java memory model would appear to be broken, if such a >>> constant >>> * (the second value of the field) is used as the value of the field >>> even after >>> * the field value has changed. >> >> s/has changed/has changed (to a third value)/ >> > > Thanks, webrev updated. > > Paul. From aph at redhat.com Wed Dec 2 11:13:39 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 2 Dec 2015 11:13:39 +0000 Subject: RFR: 8144028: Use AArch64 bit-test instructions in C2 In-Reply-To: <565EBDCC.6010509@redhat.com> References: <5655D888.8030701@redhat.com> <5655EBDD.5090800@oracle.com> <5655ECA5.2070203@redhat.com> <5655FB4F.6060103@redhat.com> <565C6943.7050200@redhat.com> <565DF4BD.10907@oracle.com> <565E09C6.1010700@redhat.com> <565E0EB6.7000109@oracle.com> <565EBDCC.6010509@redhat.com> Message-ID: <565ED263.3030409@redhat.com> On 02/12/15 09:45, Andrew Haley wrote: > On 01/12/15 21:18, Vladimir Kozlov wrote: >> Thanks. I will add -Xbatch flag. It will make sure to trigger compilation when threshold is reached. And I will verify. > > That does not work, I'm afraid.
When -Xbatch is used, C2 does not > generate the instructions I'm trying to test. The problem is that > it generates conditional branches and moves instead of CMove. > > Why should it do this? Maybe the profile counts are different, > but I don't think they are. Here is the good code, without -Xbatch:

  ;; B2: # N40 <- B1  Freq: 0.999999
  0x0000007fa8daf4c0: eor x11, x11, x11, lsl #13   ;*lxor {reexecute=0 rethrow=0 return_oop=0}
                                                   ; - XorShift::nextLong@12 (line 159)
                                                   ; - BitTests::testLongMaskBranch@4 (line 101)
  0x0000007fa8daf4c4: eor x11, x11, x11, lsr #17   ;*lxor {reexecute=0 rethrow=0 return_oop=0}
                                                   ; - XorShift::nextLong@28 (line 160)
                                                   ; - BitTests::testLongMaskBranch@4 (line 101)
  0x0000007fa8daf4c8: eor x11, x11, x11, lsl #5    ;*lxor {reexecute=0 rethrow=0 return_oop=0}
                                                   ; - XorShift::nextLong@43 (line 161)
                                                   ; - BitTests::testLongMaskBranch@4 (line 101)
  0x0000007fa8daf4cc: tst x3, x11
  0x0000007fa8daf4d0: str x11, [x10,#16]           ;*invokevirtual nextLong {reexecute=0 rethrow=0 return_oop=0}
                                                   ; - BitTests::testLongMaskBranch@4 (line 101)
  0x0000007fa8daf4d4: add x10, x2, #0x1
  0x0000007fa8daf4d8: csel x0, x10, x2, ne         ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}
                                                   ; - BitTests::testLongMaskBranch@18 (line 104)

Note that the call to XorShift::nextLong has been nicely inlined, and

  if ((((int)r.nextLong() & mask) != 0)) {
    counter++;
  }

generates simply a TST, an ADD, and a CSEL. (It's the TST instruction that I need to be executed for this testcase.)
Here is the bad code, with -Xbatch:

  ;; B2: # B8 B3 <- B1  Freq: 0.999999
  0x0000007f851d92c8: bl 0x0000007f85148300        ; ImmutableOopMap{}
                                                   ;*invokevirtual nextLong {reexecute=0 rethrow=0 return_oop=0}
                                                   ; - BitTests::testLongMaskBranch@4 (line 101)
                                                   ; {optimized virtual_call}
  ;; B3: # B6 B4 <- B2  Freq: 0.999979
  0x0000007f851d92cc: ldr x10, [sp]
  0x0000007f851d92d0: and x10, x0, x10
  0x0000007f851d92d4: cbz x10, 0x0000007f851d92f0  ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
                                                   ; - BitTests::testLongMaskBranch@11 (line 101)
  ;; B4: # B5 <- B3  Freq: 0.899981
  0x0000007f851d92d8: add x0, xfp, #0x1            ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}
                                                   ; - BitTests::testLongMaskBranch@18 (line 104)
  ;; B5: # N62 <- B4 B6  Freq: 0.999979
  0x0000007f851d92dc: ldp xfp, xlr, [sp,#32]
  0x0000007f851d92e0: add sp, sp, #0x30
  0x0000007f851d92e4: adrp xscratch1, 0x0000007f8f099000   ; {poll_return}
  0x0000007f851d92e8: ldr wzr, [xscratch1]         ; {poll_return}
  0x0000007f851d92ec: ret
  ;; B6: # B5 <- B3  Freq: 0.0999979
  0x0000007f851d92f0: mov x0, xfp
  0x0000007f851d92f4: b 0x0000007f851d92dc

where it doesn't inline a trivial function and it doesn't CSEL either. This looks like C1 code, but it's not. Is C2 trying to do a quick-and-dirty compilation or something? This pattern (great code without -Xbatch, poor code with -Xbatch) is repeated throughout these tests. Andrew.
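The -Xbatch listing above profiles the increment path (block B4) at Freq 0.899981. One way to sanity-check such a number outside the VM is to run the same kind of kernel in plain Java and count how often the branch is actually taken. The sketch below is a hypothetical reconstruction, not the actual BitTests source (which is not in the thread): the XorShift shift amounts (13, 17, 5) are read off the three eor instructions in the disassembly, and the class, method, and parameter shapes are assumptions.

```java
public class BranchFrequency {

    // Hypothetical xorshift generator; the shift amounts mirror the
    // "eor ... lsl #13", "lsr #17", "lsl #5" instructions in the listing.
    static final class XorShift {
        private long x;

        XorShift(long seed) {
            x = seed; // must be non-zero, otherwise the sequence stays at zero
        }

        long nextLong() {
            x ^= x << 13;
            x ^= x >>> 17;
            x ^= x << 5;
            return x;
        }
    }

    // Same condition as the quoted source line: the (int) cast applies to
    // nextLong(), and the int is then sign-extended to long for "& mask".
    static long testLongMaskBranch(XorShift r, long mask, long counter) {
        if ((((int) r.nextLong() & mask) != 0)) {
            counter++;
        }
        return counter;
    }

    // Fraction of calls that take the counter++ path.
    public static double takenFraction(long seed, long mask, int iterations) {
        XorShift r = new XorShift(seed);
        long counter = 0;
        for (int i = 0; i < iterations; i++) {
            counter = testLongMaskBranch(r, mask, counter);
        }
        return (double) counter / iterations;
    }

    public static void main(String[] args) {
        double f = takenFraction(0x2545F4914F6CDD1DL, 1L, 1_000_000);
        System.out.printf("taken fraction with mask=1: %.4f%n", f);
    }
}
```

With a one-bit mask the measured fraction should come out close to one half; if the profiled block frequency is far from what such a measurement shows, that points at the profile data rather than at the code generator.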
From roland.schatz at oracle.com Wed Dec 2 11:23:37 2015 From: roland.schatz at oracle.com (Roland Schatz) Date: Wed, 2 Dec 2015 12:23:37 +0100 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: <565DDFC1.7020006@redhat.com> References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> <565DDFC1.7020006@redhat.com> Message-ID: <565ED4B9.3020003@oracle.com> Sorry for noticing it that late, but there seems to be no implementation for src/cpu/aarch64/vm/jvmciCodeInstaller_aarch64.cpp... Of course the stuff in there could be implemented in a separate patch. It's rather hard to test that without an actual compiler. Perhaps we should have a series of test analogous to the test/compiler/jvmci/errors tests, but for "working" instead of "broken" code installation. For that we would need a platform dependent "fake" compiler (e.g. handwritten assembly for well-known test methods). - Roland On 12/01/2015 06:58 PM, Andrew Haley wrote: > On 12/01/2015 05:55 PM, Christian Thalinger wrote: >>> On Dec 1, 2015, at 4:40 AM, Andrew Haley wrote: >>> >>> On 12/01/2015 12:53 PM, Roland Schatz wrote: >>>> Regarding the MASK* kinds: Does it AArch64 really have a dedicated mask >>>> datatype, like AVX512? >>>> There seem to be no registers where the MASK kinds can be stored. >>>> AArch64.canStoreValue(...) returns false for all registers for the MASK >>>> kinds, so they can't be really used. >>> No, that's just a hangover from the x86 version. >> Are you sending a new webrev? > I was waiting to see if there were any more problems. > > Andrew. 
> > From aph at redhat.com Wed Dec 2 11:33:30 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 2 Dec 2015 11:33:30 +0000 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: <565ED4B9.3020003@oracle.com> References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> <565DDFC1.7020006@redhat.com> <565ED4B9.3020003@oracle.com> Message-ID: <565ED70A.9060509@redhat.com> On 02/12/15 11:23, Roland Schatz wrote: > Sorry for noticing it that late, but there seems to be no implementation > for src/cpu/aarch64/vm/jvmciCodeInstaller_aarch64.cpp... > Of course the stuff in there could be implemented in a separate patch. > It's rather hard to test that without an actual compiler. > > Perhaps we should have a series of test analogous to the > test/compiler/jvmci/errors tests, but for "working" instead of "broken" > code installation. For that we would need a platform dependent "fake" > compiler (e.g. handwritten assembly for well-known test methods). Maybe. But if there is no way to actually exercise the code which is in HotSpot, why is it there? Andrew. 
From roland.schatz at oracle.com Wed Dec 2 12:09:08 2015 From: roland.schatz at oracle.com (Roland Schatz) Date: Wed, 2 Dec 2015 13:09:08 +0100 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: <565ED70A.9060509@redhat.com> References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> <565DDFC1.7020006@redhat.com> <565ED4B9.3020003@oracle.com> <565ED70A.9060509@redhat.com> Message-ID: <565EDF64.7080504@oracle.com> On 12/02/2015 12:33 PM, Andrew Haley wrote: > On 02/12/15 11:23, Roland Schatz wrote: >> Perhaps we should have a series of test analogous to the >> test/compiler/jvmci/errors tests, but for "working" instead of "broken" >> code installation. For that we would need a platform dependent "fake" >> compiler (e.g. handwritten assembly for well-known test methods). > Maybe. But if there is no way to actually exercise the code which > is in HotSpot, why is it there? It's an interface for compilers. You can exercise the code, you just have to write a compiler ;) For example, the Graal compiler is using that interface. But there is no (working) aarch64 backend for Graal. For the tests, I can only answer for the tests that I wrote, that is the test/compiler/jvmci/error/* tests. They are explicitly disabled on aarch64. I agree we should have more tests. - Roland > > Andrew. 
> From aph at redhat.com Wed Dec 2 12:22:15 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 2 Dec 2015 12:22:15 +0000 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: <565EDF64.7080504@oracle.com> References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> <565DDFC1.7020006@redhat.com> <565ED4B9.3020003@oracle.com> <565ED70A.9060509@redhat.com> <565EDF64.7080504@oracle.com> Message-ID: <565EE277.7030907@redhat.com> On 02/12/15 12:09, Roland Schatz wrote: > On 12/02/2015 12:33 PM, Andrew Haley wrote: >> On 02/12/15 11:23, Roland Schatz wrote: >>> Perhaps we should have a series of test analogous to the >>> test/compiler/jvmci/errors tests, but for "working" instead of "broken" >>> code installation. For that we would need a platform dependent "fake" >>> compiler (e.g. handwritten assembly for well-known test methods). >> Maybe. But if there is no way to actually exercise the code which >> is in HotSpot, why is it there? > It's an interface for compilers. You can exercise the code, you just > have to write a compiler ;) Sure, but I don't see why we can't have a tiny compiler in the test suite. Andrew. 
From aph at redhat.com Wed Dec 2 13:41:40 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 2 Dec 2015 13:41:40 +0000 Subject: RFR: 8144028: Use AArch64 bit-test instructions in C2 In-Reply-To: <565ED263.3030409@redhat.com> References: <5655D888.8030701@redhat.com> <5655EBDD.5090800@oracle.com> <5655ECA5.2070203@redhat.com> <5655FB4F.6060103@redhat.com> <565C6943.7050200@redhat.com> <565DF4BD.10907@oracle.com> <565E09C6.1010700@redhat.com> <565E0EB6.7000109@oracle.com> <565EBDCC.6010509@redhat.com> <565ED263.3030409@redhat.com> Message-ID: <565EF514.6090904@redhat.com> Please ignore that: I think that there might be something wrong with the profile counters on AArch64. The probability of the branch in the conditional move is 0.899981, which is very close to the threshold. But I know that the real probability is close to 0.5. I'm guessing that the interpreter profiling is wrong. Andrew. From roland.westrelin at oracle.com Wed Dec 2 14:36:39 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 2 Dec 2015 15:36:39 +0100 Subject: RFR(XS): 8143930: C1 LinearScan asserts when compiling two back-to-back CompareAndSwapLongs In-Reply-To: <3A7403D7-AE1E-46F0-84CB-541717563816@oracle.com> References: <3A7403D7-AE1E-46F0-84CB-541717563816@oracle.com> Message-ID: <09D53123-A1E4-478E-B186-3F77C166FA7A@oracle.com> Hi Paul, > Can you update the test so it uses jdk.internal.misc.Unsafe rather than sun.misc.Unsafe? I will do that. Thanks for looking at this. Roland. From roland.westrelin at oracle.com Wed Dec 2 14:37:34 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 2 Dec 2015 15:37:34 +0100 Subject: RFR(XS): 8143930: C1 LinearScan asserts when compiling two back-to-back CompareAndSwapLongs In-Reply-To: <565DE402.3060101@oracle.com> References: <565DE402.3060101@oracle.com> Message-ID: <7A64A474-049F-4DC5-9731-B1FE4FF18E28@oracle.com> Thanks for the reviews, Vladimir and Aleksey. Roland. 
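The layering Paul describes in the 8136924 thread (the public Arrays methods validate indices once, then hand off to non-checking helpers in ArraysSupport) can be illustrated with a small self-contained sketch. The helper names and the scalar loop are hypothetical stand-ins for ArraysSupport.mismatch and vectorizedMismatch; the exceptions follow what the Java 9 Arrays range methods document: IllegalArgumentException when fromIndex > toIndex, ArrayIndexOutOfBoundsException when a range falls outside the array.

```java
public class CheckedMismatch {

    // Index validation modeled on the documented behaviour of the Java 9
    // Arrays range methods (helper name and messages are assumptions).
    static void rangeCheck(int arrayLength, int fromIndex, int toIndex) {
        if (fromIndex > toIndex) {
            throw new IllegalArgumentException(
                    "fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
        }
        if (fromIndex < 0) {
            throw new ArrayIndexOutOfBoundsException(fromIndex);
        }
        if (toIndex > arrayLength) {
            throw new ArrayIndexOutOfBoundsException(toIndex);
        }
    }

    // Non-checking helper: the caller has already validated both ranges.
    // Returns the relative index of the first differing element, or -1.
    static int mismatchUnchecked(byte[] a, int aFrom, byte[] b, int bFrom, int length) {
        for (int i = 0; i < length; i++) {
            if (a[aFrom + i] != b[bFrom + i]) {
                return i;
            }
        }
        return -1;
    }

    // Public entry point: check once, then delegate to the helper.
    public static boolean rangeEquals(byte[] a, int aFromIndex, int aToIndex,
                                      byte[] b, int bFromIndex, int bToIndex) {
        rangeCheck(a.length, aFromIndex, aToIndex);
        rangeCheck(b.length, bFromIndex, bToIndex);
        int aLength = aToIndex - aFromIndex;
        int bLength = bToIndex - bFromIndex;
        if (aLength != bLength) {
            return false;
        }
        return mismatchUnchecked(a, aFromIndex, b, bFromIndex, aLength) < 0;
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4, 5};
        byte[] y = {9, 2, 3, 4, 7};
        System.out.println(rangeEquals(x, 1, 4, y, 1, 4)); // true: {2, 3, 4} on both sides
        try {
            rangeEquals(x, 0, 6, y, 0, 5);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("rejected toIndex 6 for an array of length 5");
        }
    }
}
```

Keeping the checks in the public entry points means the hot helper (and any intrinsic that replaces it) can assume valid ranges, while tests only need to exercise the exceptions at the public API.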
From vladimir.kozlov at oracle.com Wed Dec 2 16:02:14 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 2 Dec 2015 08:02:14 -0800 Subject: RFR 8136924 Vectorized support for array equals/compare/mismatch using Unsafe In-Reply-To: <938E28F6-2FB9-46E5-B13F-D4535112F81E@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <7F18EFC4-0E53-4FD2-BF56-2219CFEC597E@oracle.com> <565E4F7A.2050607@oracle.com> <938E28F6-2FB9-46E5-B13F-D4535112F81E@oracle.com> Message-ID: <565F1606.70606@oracle.com> Thank you, Paul. This looks good. Vladimir On 12/2/15 1:10 AM, Paul Sandoz wrote: > >> On 2 Dec 2015, at 02:55, Vladimir Kozlov wrote: >> >> I reviewed 8143355 today and my main question is where are range checks? >> > > In this case the range checks are performed by the methods in Arrays, which call non-checking type-specific methods in ArraysSupport that in turn call vectorizedMismatch e.g: > > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8136924-arrays-mismatch-vectorized-unsafe/webrev/src/java.base/share/classes/java/util/Arrays.java.sdiff.html > > 2861 @HotSpotIntrinsicCandidate > 2862 public static boolean equals(byte[] a, byte[] a2) { > 2863 if (a==a2) > 2864 return true; > 2865 if (a==null || a2==null) > 2866 return false; > 2867 > 2868 int length = a.length; > 2869 if (a2.length != length) > 2870 return false; > 2871 > 2872 return ArraysSupport.mismatch(a, a2, length) < 0; > 2873 } > > > 2907 public static boolean equals(byte[] a, int aFromIndex, int aToIndex, > 2908 byte[] b, int bFromIndex, int bToIndex) { > 2909 rangeCheck(a.length, aFromIndex, aToIndex); > 2910 rangeCheck(b.length, bFromIndex, bToIndex); > 2911 > 2912 int aLength = aToIndex - aFromIndex; > 2913 int bLength = bToIndex - bFromIndex; > 2914 if (aLength != bLength) > 2915 return false; > 2916 > 2917 return ArraysSupport.mismatch(a, aFromIndex, > 2918 b, bFromIndex, > 2919 aLength) < 0; > 2920 } > > > 5875 public static int compare(byte[] a, 
byte[] b) { > 5876 if (a == b) > 5877 return 0; > 5878 if (a == null || b == null) > 5879 return a == null ? -1 : 1; > 5880 > 5881 int i = ArraysSupport.mismatch(a, b, > 5882 Math.min(a.length, b.length)); > 5883 if (i >= 0) { > 5884 return Byte.compare(a[i], b[i]); > 5885 } > 5886 > 5887 return a.length - b.length; > 5888 } > > > 5950 public static int compare(byte[] a, int aFromIndex, int aToIndex, > 5951 byte[] b, int bFromIndex, int bToIndex) { > 5952 rangeCheck(a.length, aFromIndex, aToIndex); > 5953 rangeCheck(b.length, bFromIndex, bToIndex); > 5954 > 5955 int aLength = aToIndex - aFromIndex; > 5956 int bLength = bToIndex - bFromIndex; > 5957 int i = ArraysSupport.mismatch(a, aFromIndex, > 5958 b, bFromIndex, > 5959 Math.min(aLength, bLength)); > 5960 if (i >= 0) { > 5961 return Byte.compare(a[aFromIndex + i], b[bFromIndex + i]); > 5962 } > 5963 > 5964 return aLength - bLength; > 5965 } > > > There are existing tests in place verifying that exceptions are thrown for out of bounds conditions. > > Paul. > From peter.levart at gmail.com Wed Dec 2 16:16:41 2015 From: peter.levart at gmail.com (Peter Levart) Date: Wed, 2 Dec 2015 17:16:41 +0100 Subject: RFR 8136924 Vectorized support for array equals/compare/mismatch using Unsafe In-Reply-To: <3F081061-A73D-43D3-B8BC-EB7DD4891D4F@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <7F18EFC4-0E53-4FD2-BF56-2219CFEC597E@oracle.com> <3F081061-A73D-43D3-B8BC-EB7DD4891D4F@oracle.com> Message-ID: <565F1969.5060004@gmail.com> Hi Paul, Just a nit more: 120 int valuesPerWidth = LOG2_ARRAY_LONG_INDEX_SCALE - log2ArrayIndexScale; Would it be more correct to call that variable log2ValuesPerWidth? 
Regards, Peter On 11/30/2015 04:21 PM, Paul Sandoz wrote: >> On 25 Nov 2015, at 10:53, Paul Sandoz wrote: >> >> Hi, >> >> And this is the review for the Java part: >> >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8136924-arrays-mismatch-vectorized-unsafe/webrev/ >> >> Which will be updated to add @HotSpotIntrinsicCandidate when JDK-8143355 is pushed. [1] >> >> The plan is all reviewed changes will be pushed to hs-comp and then we follow up: >> >> 1) adding the intrinsic to other platforms >> >> 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. >> >> 3) take a swing at consolidating other equal/compare intrinsics, such as those for char[]/String-based equal/compare >> >> 4) adding methods to String such as mismatch method. >> >> I can help by pushing all reviewed patches. I will kick off a JPRT run with all patches applied. >> > JPRT runs for both core and hotspot tests report no issues. > > Paul. > >> I did evaluate/test the HotSpot patch (stared at the patch and generated code for UseAVX < 2, and measured) and reviewed with my limited knowledge of HotSpot. >> >> Paul. >> From stefan.sarne at oracle.com Wed Dec 2 16:20:42 2015 From: stefan.sarne at oracle.com (Stefan Särne) Date: Wed, 2 Dec 2015 17:20:42 +0100 Subject: RFR 8143628: Fork sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables In-Reply-To: <9C417D00-F022-4CF6-87D3-0AF74CCD7441@oracle.com> References: <7e5e2f21-a462-4fb8-8cb2-52f4c9e303fb@default> <17CDB8FA-3B1E-465A-8FB6-121113BE66CA@oracle.com> <701701d12c6d$2cfa8cd0$86efa670$@oracle.com> <9C417D00-F022-4CF6-87D3-0AF74CCD7441@oracle.com> Message-ID: <565F1A5A.4040509@oracle.com> Hi Paul, The reason we stick to standard jtreg tests is that it is simpler. For us, a java test is not a unit test, it is an application. :) I agree with you that when writing and debugging java code, I would choose testng over jtreg and run and debug it inside my java IDE.
But debugging the VM is instead done with a native debugger, and what the framework gives you for java development becomes a level of indirection in VM land. Just adding the test class as an argument to the java launcher, where a main method exists, is preferred. Cheers, Stefan On 2015-12-02 at 09:52, Paul Sandoz wrote: > Hi Christian, > >> On 1 Dec 2015, at 20:19, Christian Tornqvist >> > > wrote: >> >> Hi Paul, >> >> Tests in hotspot/test/runtime need to be jtreg tests. > > They are jtreg tests. They are required to be run (re: "launched") with > jtreg, see: > > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8143628-unsafe-native-hotspot/webrev/test/runtime/Unsafe/JdkInternalMiscUnsafeAccessTestBoolean.java.html >
> /*
>  * @test
>  * @bug 8143628
>  * @summary Test unsafe access for boolean
>  * @modules java.base/jdk.internal.misc
>  * @run testng/othervm -Diters=100 -Xint JdkInternalMiscUnsafeAccessTestBoolean
>  * @run testng/othervm -Diters=20000 -XX:TieredStopAtLevel=1 JdkInternalMiscUnsafeAccessTestBoolean
>  * @run testng/othervm -Diters=20000 -XX:-TieredCompilation JdkInternalMiscUnsafeAccessTestBoolean
>  * @run testng/othervm -Diters=20000 JdkInternalMiscUnsafeAccessTestBoolean
>  */
> That's the point I was making with: > > jtreg is to testng as launcher is to library > > Note the use of the "@modules java.base/jdk.internal.misc". That's > gonna be important later on. > > >> Looking at your tests, I can't see a reason why they can't easily be >> modified to be jtreg tests instead? > > That's not the point. There is a principle here about what test > libraries one can or cannot use with the test in a particular area of > a particular repo. At the moment I am not hearing any consistent and > solid technical argument as to why testng cannot be used for HotSpot > runtime tests. > > Paul.
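Stefan's preference (a test class with a main method, launched by jtreg through the java launcher) looks roughly like this when applied to the header Paul quoted: the jtreg tags mirror it with testng/othervm replaced by main/othervm, and failure is signalled by an ordinary exception. The class name and body are hypothetical; a VarHandle stands in for jdk.internal.misc.Unsafe so the sketch runs without --add-exports, which is also why the @modules line is dropped.

```java
/*
 * @test
 * @summary Hypothetical main-method variant of a boolean field access test
 * @run main/othervm -Diters=100 -Xint BooleanAccessMain
 * @run main/othervm -Diters=20000 -XX:TieredStopAtLevel=1 BooleanAccessMain
 * @run main/othervm -Diters=20000 -XX:-TieredCompilation BooleanAccessMain
 * @run main/othervm -Diters=20000 BooleanAccessMain
 */
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class BooleanAccessMain {
    // Field under test; the VarHandle below stands in for Unsafe put/get.
    static boolean field;

    static final VarHandle FIELD;

    static {
        try {
            FIELD = MethodHandles.lookup()
                    .findStaticVarHandle(BooleanAccessMain.class, "field", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Alternate true/false writes and check each read-back; a plain
    // exception is the failure signal, so no test framework is needed.
    public static void testAccess(int iters) {
        for (int i = 0; i < iters; i++) {
            boolean expected = (i & 1) == 0;
            FIELD.set(expected);
            boolean actual = (boolean) FIELD.get();
            if (actual != expected) {
                throw new RuntimeException(
                        "iteration " + i + ": expected " + expected + ", got " + actual);
            }
        }
    }

    public static void main(String[] args) {
        // -Diters=... comes from the @run line; defaults to 10 when run by hand.
        int iters = Integer.getInteger("iters", 10);
        testAccess(iters);
        System.out.println("passed " + iters + " iterations");
    }
}
```

Because the entry point is an ordinary main method, the same class can be started directly under a native debugger with `java -Diters=100 -Xint BooleanAccessMain`, which is the property Stefan is asking for.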
From alexander.vorobyev at oracle.com Wed Dec 2 16:27:15 2015 From: alexander.vorobyev at oracle.com (alexander vorobyev) Date: Wed, 2 Dec 2015 19:27:15 +0300 Subject: RFR 8079667: port vm/compiler/AESIntrinsics/CheckIntrinsics into jtreg In-Reply-To: <562FAFB6.7080305@oracle.com> References: <562FAFB6.7080305@oracle.com> Message-ID: <565F1BE3.2030806@oracle.com> Hello Please review a bug fix. JBS entry: https://bugs.openjdk.java.net/browse/JDK-8079667 Webrev: http://cr.openjdk.java.net/~kshefov/8079667 This fix has been already approved - http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-September/018846.html But it was blocked at that time by https://bugs.openjdk.java.net/browse/JDK-8131778. JDK-8131778 is fixed now so please take a look at it again. Webrev is slightly different because of changes caused by JDK-8131778 - messages (which this test checks) returned by Java have changed a bit. Testing done: see information for original fix, also tested locally Thanks Alexander Vorobyev From paul.sandoz at oracle.com Wed Dec 2 16:59:04 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 2 Dec 2015 17:59:04 +0100 Subject: RFR 8136924 Vectorized support for array equals/compare/mismatch using Unsafe In-Reply-To: References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <7F18EFC4-0E53-4FD2-BF56-2219CFEC597E@oracle.com> <3F081061-A73D-43D3-B8BC-EB7DD4891D4F@oracle.com> <565F1969.5060004@gmail.com> Message-ID: > On 2 Dec 2015, at 17:16, Peter Levart wrote: > > Hi Paul, > > Just a nit more: > > 120 int valuesPerWidth = LOG2_ARRAY_LONG_INDEX_SCALE - log2ArrayIndexScale; > > Would it be more correct to call that variable log2ValuesPerWidth? > Yes, good point. Updated. I also used the constant LOG2_ARRAY_INT_INDEX_SCALE instead of 2, here: 138 if (log2ArrayIndexScale < LOG2_ARRAY_INT_INDEX_SCALE) { Thanks, Paul.
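Peter's rename reflects what the expression computes: LOG2_ARRAY_LONG_INDEX_SCALE - log2ArrayIndexScale is the base-2 logarithm of how many array elements fit in one 8-byte word (3 for byte[], 1 for int[]), so it can be used directly as a shift. The sketch below shows a word-at-a-time mismatch over byte[] built on that idea. It is an illustration only, not the webrev's code: it reads words through a little-endian ByteBuffer instead of Unsafe, and the constant values 3 and 0 are assumed (log2 of the 8-byte and 1-byte element sizes) rather than copied from the webrev.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class WordMismatch {
    // Assumed values: log2 of the element size in bytes.
    static final int LOG2_ARRAY_LONG_INDEX_SCALE = 3; // long: 8 bytes
    static final int LOG2_ARRAY_BYTE_INDEX_SCALE = 0; // byte: 1 byte

    // log2 of how many elements of the given size fit in one 8-byte word:
    // 3 for byte (8 per word), 2 for short/char, 1 for int/float, 0 for long/double.
    public static int log2ValuesPerWidth(int log2ArrayIndexScale) {
        return LOG2_ARRAY_LONG_INDEX_SCALE - log2ArrayIndexScale;
    }

    // First index i < length with a[i] != b[i], or -1.
    // Both arrays must hold at least length elements.
    public static int mismatch(byte[] a, byte[] b, int length) {
        int log2ValuesPerWidth = log2ValuesPerWidth(LOG2_ARRAY_BYTE_INDEX_SCALE);
        int wordCount = length >> log2ValuesPerWidth; // number of whole 8-byte words
        ByteBuffer wa = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer wb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        for (int w = 0; w < wordCount; w++) {
            int offset = w << log2ValuesPerWidth; // byte offset of word w
            long diff = wa.getLong(offset) ^ wb.getLong(offset);
            if (diff != 0) {
                // Little-endian: the lowest set bit lies in the first differing
                // byte, so trailing zeros / 8 is that byte's position in the word.
                return offset + (Long.numberOfTrailingZeros(diff) >> 3);
            }
        }
        // Compare the tail that does not fill a whole word element by element.
        for (int i = wordCount << log2ValuesPerWidth; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] a = "hello, vectorized world".getBytes(StandardCharsets.US_ASCII);
        byte[] b = "hello, vectorised world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(mismatch(a, b, a.length)); // 14: 'z' vs 's'
    }
}
```

The tail loop is where the element size matters again; the check Paul mentions (log2ArrayIndexScale < LOG2_ARRAY_INT_INDEX_SCALE) plausibly guards an extra sub-word step for byte/short/char arrays, but that is an inference, not something the thread spells out.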
> Regards, Peter > From vladimir.kozlov at oracle.com Wed Dec 2 17:13:26 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 2 Dec 2015 09:13:26 -0800 Subject: RFR 8079667: port vm/compiler/AESIntrinsics/CheckIntrinsics into jtreg In-Reply-To: <565F1BE3.2030806@oracle.com> References: <562FAFB6.7080305@oracle.com> <565F1BE3.2030806@oracle.com> Message-ID: <565F26B6.6060703@oracle.com> Good. Thanks, Vladimir On 12/2/15 8:27 AM, alexander vorobyev wrote: > Hello > Please review a bug fix. > JBS entry: https://bugs.openjdk.java.net/browse/JDK-8079667 > Webrev: http://cr.openjdk.java.net/~kshefov/8079667 > > This fix has been already approved - > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-September/018846.html > But it was blocked at that time by > https://bugs.openjdk.java.net/browse/JDK-8131778. > JDK-8131778 is fixed now so please take a look at it again. Webrev is slightly different because of changes caused by > JDK-8131778 - messages (which this test checks) returned by Java have changed a bit. > > Testing done: see information for original fix, also tested locally > > Thanks > Alexander Vorobyev > From christian.thalinger at oracle.com Wed Dec 2 17:58:25 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 2 Dec 2015 07:58:25 -1000 Subject: RFR (XXS): 8144521: [JVMCI] JVMCI is built on 32-bit Windows compiler2 and tiered builds Message-ID: https://bugs.openjdk.java.net/browse/JDK-8144521 Currently we build JVMCI for 32-bit Windows targets but we shouldn't.
diff -r 8578909eeef4 make/windows/makefiles/vm.make
--- a/make/windows/makefiles/vm.make Thu Nov 26 10:38:33 2015 +0000
+++ b/make/windows/makefiles/vm.make Wed Dec 02 07:41:29 2015 -1000
@@ -45,10 +45,16 @@ CXX_FLAGS=$(CXX_FLAGS) /D "COMPILER1" /D
 
 !if "$(Variant)" == "compiler2"
 CXX_FLAGS=$(CXX_FLAGS) /D "COMPILER2"
+!if "$(BUILDARCH)" == "i486"
+CXX_FLAGS=$(CXX_FLAGS) /D INCLUDE_JVMCI=0
+!endif
 !endif
 
 !if "$(Variant)" == "tiered"
 CXX_FLAGS=$(CXX_FLAGS) /D "COMPILER1" /D "COMPILER2"
+!if "$(BUILDARCH)" == "i486"
+CXX_FLAGS=$(CXX_FLAGS) /D INCLUDE_JVMCI=0
+!endif
 !endif
 
 !if "$(BUILDARCH)" == "i486"

From vladimir.kozlov at oracle.com Wed Dec 2 18:43:47 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 2 Dec 2015 10:43:47 -0800 Subject: RFR (XXS): 8144521: [JVMCI] JVMCI is built on 32-bit Windows compiler2 and tiered builds In-Reply-To: References: Message-ID: <565F3BE3.4030303@oracle.com> Looks good. Vladimir On 12/2/15 9:58 AM, Christian Thalinger wrote: > https://bugs.openjdk.java.net/browse/JDK-8144521 > > Currently we build JVMCI for 32-bit Windows targets but we shouldn't.
>
> diff -r 8578909eeef4 make/windows/makefiles/vm.make
> --- a/make/windows/makefiles/vm.make Thu Nov 26 10:38:33 2015 +0000
> +++ b/make/windows/makefiles/vm.make Wed Dec 02 07:41:29 2015 -1000
> @@ -45,10 +45,16 @@ CXX_FLAGS=$(CXX_FLAGS) /D "COMPILER1" /D
>
>  !if "$(Variant)" == "compiler2"
>  CXX_FLAGS=$(CXX_FLAGS) /D "COMPILER2"
> +!if "$(BUILDARCH)" == "i486"
> +CXX_FLAGS=$(CXX_FLAGS) /D INCLUDE_JVMCI=0
> +!endif
>  !endif
>
>  !if "$(Variant)" == "tiered"
>  CXX_FLAGS=$(CXX_FLAGS) /D "COMPILER1" /D "COMPILER2"
> +!if "$(BUILDARCH)" == "i486"
> +CXX_FLAGS=$(CXX_FLAGS) /D INCLUDE_JVMCI=0
> +!endif
>  !endif
>
>  !if "$(BUILDARCH)" == "i486"
>
From christian.thalinger at oracle.com Wed Dec 2 18:49:59 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 2 Dec 2015 08:49:59 -1000 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: <565EE277.7030907@redhat.com> References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> <565DDFC1.7020006@redhat.com> <565ED4B9.3020003@oracle.com> <565ED70A.9060509@redhat.com> <565EDF64.7080504@oracle.com> <565EE277.7030907@redhat.com> Message-ID: <97A31572-DD8D-45E0-AAF5-E47B251CE633@oracle.com> > On Dec 2, 2015, at 2:22 AM, Andrew Haley wrote: > > On 02/12/15 12:09, Roland Schatz wrote: >> On 12/02/2015 12:33 PM, Andrew Haley wrote: >>> On 02/12/15 11:23, Roland Schatz wrote: >>>> Perhaps we should have a series of test analogous to the >>>> test/compiler/jvmci/errors tests, but for "working" instead of "broken" >>>> code installation. For that we would need a platform dependent "fake" >>>> compiler (e.g. handwritten assembly for well-known test methods). >>> Maybe. But if there is no way to actually exercise the code which >>> is in HotSpot, why is it there? >> It's an interface for compilers.
You can exercise the code, you just >> have to write a compiler ;) > > Sure, but I don't see why we can't have a tiny compiler in the test > suite. Wow. This is getting crazy now :-) Anyway, let's push what we have now and wait for the AArch64 backend to be functional. Then we can fix the CodeInstaller methods. > > Andrew. > From vivek.r.deshpande at intel.com Wed Dec 2 19:21:29 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Wed, 2 Dec 2015 19:21:29 +0000 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 In-Reply-To: <565E511E.9020503@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <565E4DD2.1030200@oracle.com> <565E511E.9020503@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569CDB35@ORSMSX106.amr.corp.intel.com> Hi Vladimir, Yes, the 2x performance gain is with AVX2 instructions for big arrays (~1k). We will update the patch and the JBS entry with a global flag and let you know soon. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, December 01, 2015 6:02 PM To: Deshpande, Vivek R; hotspot compiler Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric Subject: Re: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. If that is the case the flag should be global. Thanks, Vladimir On 12/1/15 5:48 PM, Vladimir Kozlov wrote: > This seems fine. 2x is for AVX implementation? > > Thanks, > Vladimir > > On 11/24/15 4:00 PM, Deshpande, Vivek R wrote: >> Hi all >> >> We would like to contribute a patch from Intel which optimizes >> vectorizedMismatch() method in java.util.ArraysSupport.java for X86 >> architecture using AVX instructions. >> >> The improvement gives more than 2x gain over Unsafe implementation >> for long arrays.
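The intrinsic under review speeds up a mismatch search. Such a search compares two arrays a machine word at a time and only drops to byte granularity once it finds a differing word. The Java sketch below is a hypothetical illustration of that technique, not the ArraysSupport code under review, and the class name is invented. It XORs 8-byte little-endian words and uses Long.numberOfTrailingZeros to locate the first differing byte. An AVX2 stub presumably applies the same idea to wider vector registers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical word-at-a-time mismatch, for illustration only.
final class WordMismatch {
    // Returns the index of the first differing byte, the shorter length if one
    // array is a proper prefix of the other, or -1 if the arrays are equal
    // (the same convention as java.util.Arrays.mismatch in JDK 9 and later).
    static int mismatch(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        // Little-endian order puts byte i of each word in its lowest 8 bits.
        ByteBuffer wa = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer wb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        for (; i + Long.BYTES <= len; i += Long.BYTES) {
            long diff = wa.getLong(i) ^ wb.getLong(i);
            if (diff != 0) {
                // The lowest set bit belongs to the first differing byte.
                return i + Long.numberOfTrailingZeros(diff) / Byte.SIZE;
            }
        }
        // Compare the tail that does not fill a whole word.
        for (; i < len; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : len;
    }
}
```

The word loop is bounded by the shorter length, so a prefix relationship is only detected after all common bytes have been compared.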
>> >> >> The bug is blocked by bug: vectorized support for array >> equals/compare/mismatch using Unsafe >> (https://bugs.openjdk.java.net/browse/JDK-8136924). >> >> Could you please review and sponsor this patch. >> >> Bug-id: >> >> https://bugs.openjdk.java.net/browse/JDK-8143355 >> webrev: >> >> http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ >> >> Thanks and regards, >> >> Vivek >> From christian.thalinger at oracle.com Wed Dec 2 19:43:29 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 2 Dec 2015 09:43:29 -1000 Subject: RFR (XXS): 8144521: [JVMCI] JVMCI is built on 32-bit Windows compiler2 and tiered builds In-Reply-To: <565F3BE3.4030303@oracle.com> References: <565F3BE3.4030303@oracle.com> Message-ID: <549F9FED-3962-4FB4-8706-F27EEDFB59CD@oracle.com> Hmm. Looks like we also need a change like this:
diff -r 8578909eeef4 make/windows/create_obj_files.sh
--- a/make/windows/create_obj_files.sh Thu Nov 26 10:38:33 2015 +0000
+++ b/make/windows/create_obj_files.sh Wed Dec 02 09:42:23 2015 -1000
@@ -129,7 +129,7 @@ esac
 
 # Special handling of arch model.
 case "${Platform_arch_model}" in
-  "x86_32") Src_Files_EXCLUDE="${Src_Files_EXCLUDE} *x86_64*" ;;
+  "x86_32") Src_Files_EXCLUDE="${Src_Files_EXCLUDE} ${JVMCI_SPECIFIC_FILES} *x86_64*" ;;
   "x86_64") Src_Files_EXCLUDE="${Src_Files_EXCLUDE} *x86_32*" ;;
 esac
 
> On Dec 2, 2015, at 8:43 AM, Vladimir Kozlov wrote: > > Looks good. > > Vladimir > > On 12/2/15 9:58 AM, Christian Thalinger wrote: >> https://bugs.openjdk.java.net/browse/JDK-8144521 >> >> Currently we build JVMCI for 32-bit Windows targets but we shouldn't.
>> >> diff -r 8578909eeef4 make/windows/makefiles/vm.make >> --- a/make/windows/makefiles/vm.make Thu Nov 26 10:38:33 2015 +0000 >> +++ b/make/windows/makefiles/vm.make Wed Dec 02 07:41:29 2015 -1000 >> @@ -45,10 +45,16 @@ CXX_FLAGS=$(CXX_FLAGS) /D "COMPILER1" /D >> >> >> !if "$(Variant)" == "compiler2" >> CXX_FLAGS=$(CXX_FLAGS) /D "COMPILER2" >> +!if "$(BUILDARCH)" == "i486" >> +CXX_FLAGS=$(CXX_FLAGS) /D INCLUDE_JVMCI=0 >> +!endif >> !endif >> >> >> !if "$(Variant)" == "tiered" >> CXX_FLAGS=$(CXX_FLAGS) /D "COMPILER1" /D "COMPILER2" >> +!if "$(BUILDARCH)" == "i486" >> +CXX_FLAGS=$(CXX_FLAGS) /D INCLUDE_JVMCI=0 >> +!endif >> !endif >> >> >> !if "$(BUILDARCH)" == "i486" >> From vladimir.kozlov at oracle.com Wed Dec 2 20:08:32 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 2 Dec 2015 12:08:32 -0800 Subject: RFR (XXS): 8144521: [JVMCI] JVMCI is built on 32-bit Windows compiler2 and tiered builds In-Reply-To: <549F9FED-3962-4FB4-8706-F27EEDFB59CD@oracle.com> References: <565F3BE3.4030303@oracle.com> <549F9FED-3962-4FB4-8706-F27EEDFB59CD@oracle.com> Message-ID: <565F4FC0.1050306@oracle.com> Okay. Vladimir On 12/2/15 11:43 AM, Christian Thalinger wrote: > Hmm. Looks like we also need a change like this: > > diff -r 8578909eeef4 make/windows/create_obj_files.sh > --- a/make/windows/create_obj_files.sh Thu Nov 26 10:38:33 2015 +0000 > +++ b/make/windows/create_obj_files.sh Wed Dec 02 09:42:23 2015 -1000 > @@ -129,7 +129,7 @@ esac > > > # Special handling of arch model. > case "${Platform_arch_model}" in > -"x86_32") Src_Files_EXCLUDE="${Src_Files_EXCLUDE} *x86_64*" ;; > +"x86_32") Src_Files_EXCLUDE="${Src_Files_EXCLUDE} ${JVMCI_SPECIFIC_FILES} *x86_64*" ;; > "x86_64") Src_Files_EXCLUDE="${Src_Files_EXCLUDE} *x86_32*" ;; > esac > > > >> On Dec 2, 2015, at 8:43 AM, Vladimir Kozlov wrote: >> >> Looks good.
>> >> Vladimir >> >> On 12/2/15 9:58 AM, Christian Thalinger wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8144521 >>> >>> Currently we build JVMCI for 32-bit Windows targets but we shouldn't. >>> >>> diff -r 8578909eeef4 make/windows/makefiles/vm.make >>> --- a/make/windows/makefiles/vm.make Thu Nov 26 10:38:33 2015 +0000 >>> +++ b/make/windows/makefiles/vm.make Wed Dec 02 07:41:29 2015 -1000 >>> @@ -45,10 +45,16 @@ CXX_FLAGS=$(CXX_FLAGS) /D "COMPILER1" /D >>> >>> >>> !if "$(Variant)" == "compiler2" >>> CXX_FLAGS=$(CXX_FLAGS) /D "COMPILER2" >>> +!if "$(BUILDARCH)" == "i486" >>> +CXX_FLAGS=$(CXX_FLAGS) /D INCLUDE_JVMCI=0 >>> +!endif >>> !endif >>> >>> >>> !if "$(Variant)" == "tiered" >>> CXX_FLAGS=$(CXX_FLAGS) /D "COMPILER1" /D "COMPILER2" >>> +!if "$(BUILDARCH)" == "i486" >>> +CXX_FLAGS=$(CXX_FLAGS) /D INCLUDE_JVMCI=0 >>> +!endif >>> !endif >>> >>> >>> !if "$(BUILDARCH)" == "i486" >>> > From mandy.chung at oracle.com Wed Dec 2 20:58:51 2015 From: mandy.chung at oracle.com (Mandy Chung) Date: Wed, 2 Dec 2015 12:58:51 -0800 Subject: 8144223: Move j.l.invoke.{ForceInline, DontInline, Stable} to jdk.internal.vm.annotation package In-Reply-To: References: Message-ID: <22C54219-695B-486F-AEAA-7B96473DEDF4@oracle.com> > On Nov 30, 2015, at 9:40 AM, Paul Sandoz wrote: > > Please review: > > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-jdk/webrev/ > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-hotspot/webrev/ http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-hotspot/webrev/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/Stable.java.cdiff.html 32 * This annotation functions as an alias for the jdk.internal.Stable annotation within JVMCI s/jdk.internal.Stable/jdk.internal.vm.annotation.Stable
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-jdk/webrev/src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java.frames.html 1327 mv.visitAnnotation("Ljdk/internal/DontInline;", true); need fixing. Otherwise, looks good. Mandy From paul.sandoz at oracle.com Wed Dec 2 21:11:40 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 2 Dec 2015 22:11:40 +0100 Subject: 8144223: Move j.l.invoke.{ForceInline, DontInline, Stable} to jdk.internal.vm.annotation package In-Reply-To: <22C54219-695B-486F-AEAA-7B96473DEDF4@oracle.com> References: <22C54219-695B-486F-AEAA-7B96473DEDF4@oracle.com> Message-ID: > On 2 Dec 2015, at 21:58, Mandy Chung wrote: > > >> On Nov 30, 2015, at 9:40 AM, Paul Sandoz wrote: >> >> Please review: >> >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-jdk/webrev/ >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-hotspot/webrev/ > > > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-hotspot/webrev/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/Stable.java.cdiff.html > > 32 * This annotation functions as an alias for the jdk.internal.Stable annotation within JVMCI > > s/jdk.internal.Stable/jdk.internal.vm.annotation.Stable > > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-jdk/webrev/src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java.frames.html > > 1327 mv.visitAnnotation("Ljdk/internal/DontInline;", true); > > need fixing. > Oops, that's embarrassing; I fat-fingered the search/replace. Our tests don't catch such cases of non-existent annotations. Updated, thanks, Paul. > Otherwise, looks good. > > Mandy >
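The DontInline problem above is a stale JVM type descriptor string, which nothing checks at compile time. A class descriptor is an `L`, the binary class name with dots replaced by slashes, and a closing semicolon. The minimal sketch below uses a hypothetical helper name and shows the old and new spellings of the string passed to visitAnnotation side by side.

```java
// Hypothetical helper: builds a JVM class type descriptor from a binary class name.
final class Descriptors {
    static String descriptorOf(String binaryName) {
        // e.g. "java.lang.String" -> "Ljava/lang/String;"
        return "L" + binaryName.replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        // Old (no longer existing) location, then the new package.
        System.out.println(descriptorOf("jdk.internal.DontInline"));
        System.out.println(descriptorOf("jdk.internal.vm.annotation.DontInline"));
    }
}
```

Where the annotation class is accessible to the generator, one option is to derive the name from its Class object. A future package move would then break compilation instead of silently producing a reference to a non-existent annotation.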
From mandy.chung at oracle.com Wed Dec 2 21:24:22 2015 From: mandy.chung at oracle.com (Mandy Chung) Date: Wed, 2 Dec 2015 13:24:22 -0800 Subject: RFR 8143628: Fork sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables In-Reply-To: References: Message-ID: <28639752-BD0A-4B91-BE3F-61DEA4B997FA@oracle.com> > On Nov 26, 2015, at 1:55 AM, Paul Sandoz wrote: > > Hi, > > This is a request for an optimistic review to fork the sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables so that we can evolve the latter e.g. for VarHandles unsafe work > > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8143628-unsafe-native-jdk/ > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8143628-unsafe-native-hotspot/ Looks good. About the tests, would it be better for these tests to be placed in the jdk repo? Nit: extra lines in the testAccess method in several new tests can be removed.
144     static void testAccess(Object base, long offset) {
145         // Plain
146         {
147             UNSAFE.putDouble(base, offset, 1.0d);
148             double x = UNSAFE.getDouble(base, offset);
149             assertEquals(x, 1.0d, "set double value");
150         }
151 
152         // Volatile
153         {
154             UNSAFE.putDoubleVolatile(base, offset, 2.0d);
155             double x = UNSAFE.getDoubleVolatile(base, offset);
156             assertEquals(x, 2.0d, "putVolatile double value");
157         }
158 
159 
160 
161 
162     }
From christian.thalinger at oracle.com Wed Dec 2 21:53:54 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 2 Dec 2015 11:53:54 -1000 Subject: RFR (S): 8144529: [JVMCI] compiler/jvmci/errors/TestInvalidCompilationResult.java fails to compile after JDK-8143730 Message-ID: <1CE506C1-E3E8-4A83-84A1-2D86C9C7C0D4@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8144529 I forgot to fix some tests.
diff -r 8578909eeef4 test/compiler/jvmci/errors/CodeInstallerTest.java
--- a/test/compiler/jvmci/errors/CodeInstallerTest.java Thu Nov 26 10:38:33 2015 +0000
+++ b/test/compiler/jvmci/errors/CodeInstallerTest.java Wed Dec 02 10:59:39 2015 -1000
@@ -68,6 +68,7 @@ public class CodeInstallerTest {
     }
 
     protected void installCode(CompilationResult result) {
+        result.close();
         codeCache.addCode(dummyMethod, result, null, null);
     }
 
diff -r 8578909eeef4 test/compiler/jvmci/errors/TestInvalidCompilationResult.java
--- a/test/compiler/jvmci/errors/TestInvalidCompilationResult.java Thu Nov 26 10:38:33 2015 +0000
+++ b/test/compiler/jvmci/errors/TestInvalidCompilationResult.java Wed Dec 02 10:59:39 2015 -1000
@@ -219,13 +219,6 @@ public class TestInvalidCompilationResul
     }
 
     @Test(expected = JVMCIError.class)
-    public void testUnknownInfopointReason() {
-        CompilationResult result = createEmptyCompilationResult();
-        result.addInfopoint(new Infopoint(0, null, InfopointReason.UNKNOWN));
-        installCode(result);
-    }
-
-    @Test(expected = JVMCIError.class)
     public void testInfopointMissingDebugInfo() {
         CompilationResult result = createEmptyCompilationResult();
         result.addInfopoint(new Infopoint(0, null, InfopointReason.METHOD_START));
diff -r 8578909eeef4 test/compiler/jvmci/events/JvmciNotifyInstallEventTest.java
--- a/test/compiler/jvmci/events/JvmciNotifyInstallEventTest.java Thu Nov 26 10:38:33 2015 +0000
+++ b/test/compiler/jvmci/events/JvmciNotifyInstallEventTest.java Wed Dec 02 10:59:39 2015 -1000
@@ -106,13 +106,12 @@ public class JvmciNotifyInstallEventTest
         HotSpotCompilationRequest compRequest = new HotSpotCompilationRequest(method, -1, 0L);
         // to pass sanity check of default -1
         compResult.setTotalFrameSize(0);
+        compResult.close();
         codeCache.installCode(compRequest, compResult, /* installedCode = */ null, /* speculationLog = */ null,
                 /* isDefault = */ false);
         Asserts.assertEQ(gotInstallNotification, 1,
                 "Got unexpected event count after 1st install attempt");
         // since "empty" compilation result is ok, a second attempt should be ok
-        compResult = new CompilationResult(METHOD_NAME); // create another instance with fresh state
-        compResult.setTotalFrameSize(0);
         codeCache.installCode(compRequest, compResult, /* installedCode = */ null, /* speculationLog = */ null,
                 /* isDefault = */ false);
         Asserts.assertEQ(gotInstallNotification, 2,
From paul.sandoz at oracle.com Wed Dec 2 22:36:05 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 2 Dec 2015 23:36:05 +0100 Subject: RFR 8143628: Fork sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables In-Reply-To: <28639752-BD0A-4B91-BE3F-61DEA4B997FA@oracle.com> References: <28639752-BD0A-4B91-BE3F-61DEA4B997FA@oracle.com> Message-ID: <2EDB2846-96A2-4C36-A44F-DC3FAD48BF94@oracle.com> > On 2 Dec 2015, at 22:24, Mandy Chung wrote: > > >> On Nov 26, 2015, at 1:55 AM, Paul Sandoz wrote: >> >> Hi, >> >> This is a request for an optimistic review to fork the sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables so that we can evolve the latter e.g. for VarHandles unsafe work >> >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8143628-unsafe-native-jdk/ >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8143628-unsafe-native-hotspot/ > > Looks good. About the tests, would it be better for these tests to be placed in the jdk repo? > That's my preferred back up plan :-) Something to ponder while waiting for 8143930 (C1 LinearScan...) to be integrated. > Nit: extra lines in the testAccess method in several new tests can be removed.
> > 144 static void testAccess(Object base, long offset) { > 145 // Plain > 146 { > 147 UNSAFE.putDouble(base, offset, 1.0d); > 148 double x = UNSAFE.getDouble(base, offset); > 149 assertEquals(x, 1.0d, "set double value"); > 150 } > 151 > 152 // Volatile > 153 { > 154 UNSAFE.putDoubleVolatile(base, offset, 2.0d); > 155 double x = UNSAFE.getDoubleVolatile(base, offset); > 156 assertEquals(x, 2.0d, "putVolatile double value"); > 157 } > 158 > 159 > 160 > 161 > 162 } I could remove empty lines in the template between #if/#end blocks but I find it harder to read. So I would prefer to leave this as is. Thanks, Paul. From paul.sandoz at oracle.com Wed Dec 2 23:00:22 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Thu, 3 Dec 2015 00:00:22 +0100 Subject: RFR 8143628: Fork sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables In-Reply-To: <565F1A5A.4040509@oracle.com> References: <7e5e2f21-a462-4fb8-8cb2-52f4c9e303fb@default> <17CDB8FA-3B1E-465A-8FB6-121113BE66CA@oracle.com> <701701d12c6d$2cfa8cd0$86efa670$@oracle.com> <9C417D00-F022-4CF6-87D3-0AF74CCD7441@oracle.com> <565F1A5A.4040509@oracle.com> Message-ID: <8D34B054-1832-428A-8A00-C7C5E1056061@oracle.com> Hi Stefan, > On 2 Dec 2015, at 17:20, Stefan Särne wrote: > > > Hi Paul, > > The reason we stick to standard jtreg tests is because it is simpler. > For us, a Java test is not a unit test, it is an application. :) > I tend to think of that as an artificial distinction since such Java test classes often contain a logical grouping of tests (and perhaps data over which to test) and make test assertions.
Let's call it duck unit testing: it looks and quacks like a unit test :-) > I agree with you that when writing and debugging Java code, I would choose TestNG over jtreg and run and debug it inside my Java IDE. In the case of the JDK it's not jtreg over TestNG, it is jtreg + TestNG. > But debugging the VM is instead done with a native debugger, and what the framework gives you for Java development becomes a level of indirection in VM land. Just adding the test class as argument to the java launcher where a main method exists is preferred. > How do HotSpot engineers debug the VM with a jtreg test that uses @library (or @module once Jigsaw gets integrated), or uses WhiteBox, or uses ProcessTools? Paul. From mandy.chung at oracle.com Thu Dec 3 00:06:30 2015 From: mandy.chung at oracle.com (Mandy Chung) Date: Wed, 2 Dec 2015 16:06:30 -0800 Subject: RFR 8143628: Fork sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables In-Reply-To: <2EDB2846-96A2-4C36-A44F-DC3FAD48BF94@oracle.com> References: <28639752-BD0A-4B91-BE3F-61DEA4B997FA@oracle.com> <2EDB2846-96A2-4C36-A44F-DC3FAD48BF94@oracle.com> Message-ID: > On Dec 2, 2015, at 2:36 PM, Paul Sandoz wrote: > > >> On 2 Dec 2015, at 22:24, Mandy Chung wrote: >> >> >>> On Nov 26, 2015, at 1:55 AM, Paul Sandoz wrote: >>> >>> Hi, >>> >>> This is a request for an optimistic review to fork the sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables so that we can evolve the latter e.g. for VarHandles unsafe work >>> >>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8143628-unsafe-native-jdk/ >>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8143628-unsafe-native-hotspot/ >> >> Looks good. About the tests, would it be better for these tests to be placed in the jdk repo?
>> > > That's my preferred back up plan :-) Something to ponder while waiting for 8143930 (C1 LinearScan...) to be integrated. > My reason for suggesting that is that they should be part of the core test group that is run by core-libs and SQE nightly and hs-rt nightly (the majority of the core tests, if not all). > >> Nit: extra lines in the testAccess method in several new tests can be removed. >> >> 144 static void testAccess(Object base, long offset) { >> 145 // Plain >> 146 { >> 147 UNSAFE.putDouble(base, offset, 1.0d); >> 148 double x = UNSAFE.getDouble(base, offset); >> 149 assertEquals(x, 1.0d, "set double value"); >> 150 } >> 151 >> 152 // Volatile >> 153 { >> 154 UNSAFE.putDoubleVolatile(base, offset, 2.0d); >> 155 double x = UNSAFE.getDoubleVolatile(base, offset); >> 156 assertEquals(x, 2.0d, "putVolatile double value"); >> 157 } >> 158 >> 159 >> 160 >> 161 >> 162 } > > I could remove empty lines in the template between #if/#end blocks but I find it harder to read. So I would prefer to leave this as is. That's fine with me. Mandy From vladimir.kozlov at oracle.com Thu Dec 3 00:44:12 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 2 Dec 2015 16:44:12 -0800 Subject: RFR (S): 8144529: [JVMCI] compiler/jvmci/errors/TestInvalidCompilationResult.java fails to compile after JDK-8143730 In-Reply-To: <1CE506C1-E3E8-4A83-84A1-2D86C9C7C0D4@oracle.com> References: <1CE506C1-E3E8-4A83-84A1-2D86C9C7C0D4@oracle.com> Message-ID: <565F905C.3090408@oracle.com> Good. Thanks, Vladimir On 12/2/15 1:53 PM, Christian Thalinger wrote: > https://bugs.openjdk.java.net/browse/JDK-8144529 > > I forgot to fix some tests.
> > diff -r 8578909eeef4 test/compiler/jvmci/errors/CodeInstallerTest.java > --- a/test/compiler/jvmci/errors/CodeInstallerTest.java Thu Nov 26 > 10:38:33 2015 +0000 > +++ b/test/compiler/jvmci/errors/CodeInstallerTest.java Wed Dec 02 > 10:59:39 2015 -1000 > @@ -68,6 +68,7 @@ public class CodeInstallerTest { > } > > > protected void installCode(CompilationResult result) { > + result.close(); > codeCache.addCode(dummyMethod, result, null, null); > } > > > diff -r 8578909eeef4 > test/compiler/jvmci/errors/TestInvalidCompilationResult.java > --- a/test/compiler/jvmci/errors/TestInvalidCompilationResult.java Thu > Nov 26 10:38:33 2015 +0000 > +++ b/test/compiler/jvmci/errors/TestInvalidCompilationResult.java Wed > Dec 02 10:59:39 2015 -1000 > @@ -219,13 +219,6 @@ public class TestInvalidCompilationResul > } > > > @Test(expected = JVMCIError.class) > - public void testUnknownInfopointReason() { > - CompilationResult result = createEmptyCompilationResult(); > - result.addInfopoint(new Infopoint(0, null, > InfopointReason.UNKNOWN)); > - installCode(result); > - } > - > - @Test(expected = JVMCIError.class) > public void testInfopointMissingDebugInfo() { > CompilationResult result = createEmptyCompilationResult(); > result.addInfopoint(new Infopoint(0, null, > InfopointReason.METHOD_START)); > diff -r 8578909eeef4 > test/compiler/jvmci/events/JvmciNotifyInstallEventTest.java > --- a/test/compiler/jvmci/events/JvmciNotifyInstallEventTest.java Thu Nov > 26 10:38:33 2015 +0000 > +++ b/test/compiler/jvmci/events/JvmciNotifyInstallEventTest.java Wed Dec > 02 10:59:39 2015 -1000 > @@ -106,13 +106,12 @@ public class JvmciNotifyInstallEventTest > HotSpotCompilationRequest compRequest = new > HotSpotCompilationRequest(method, -1, 0L); > // to pass sanity check of default -1 > compResult.setTotalFrameSize(0); > + compResult.close(); > codeCache.installCode(compRequest, compResult, /* > installedCode = */ null, /* speculationLog = */ null, > /* isDefault = */ false); >
Asserts.assertEQ(gotInstallNotification, 1, > "Got unexpected event count after 1st install attempt"); > // since "empty" compilation result is ok, a second attempt > should be ok > - compResult = new CompilationResult(METHOD_NAME); // create > another instance with fresh state > - compResult.setTotalFrameSize(0); > codeCache.installCode(compRequest, compResult, /* > installedCode = */ null, /* speculationLog = */ null, > /* isDefault = */ false); > Asserts.assertEQ(gotInstallNotification, 2, > From vivek.r.deshpande at intel.com Thu Dec 3 06:32:44 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 3 Dec 2015 06:32:44 +0000 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <565E520B.8060801@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569CE3D4@ORSMSX106.amr.corp.intel.com> Hi Vladimir This is the link for the updated webrev for your review. http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ Thank you. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, December 01, 2015 6:06 PM To: Deshpande, Vivek R; joe darcy Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib Please send link to new webrev on cr server. Thanks, Vladimir On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: > Hi Vladimir > > Please find the webrev with your suggested updates attached with the mail. > We will update it in the jbs entry soon. > Please let me know if it needs further changes. 
> > Regards, > Vivek > > -----Original Message----- > From: Deshpande, Vivek R > Sent: Tuesday, November 24, 2015 10:22 AM > To: 'joe darcy'; Vladimir Kozlov > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math > lib > > Hi Vladimir, Joe > > I have run the jtreg tests in hotspot and the tests from jdk you mentioned. It passed those tests. > The ~4x gain is with -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos compared to running without that option. > The performance gain is 3.2x over base jdk, that is, over the current fsin/fcos intrinsic. This gain is more realistic. > > Could I get those tests around the boundary values? Would the WorstCaseTests.java jtreg test in jdk test those? > If yes, then it has passed those boundary cases. > > I will work on adding either a diagnostic flag or just one flag for libm and send out the webrev soon. > > Regards, > Vivek > > > -----Original Message----- > From: joe darcy [mailto:joe.darcy at oracle.com] > Sent: Monday, November 23, 2015 6:28 PM > To: Vladimir Kozlov; Deshpande, Vivek R > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math > lib > > Hello, > > Just getting added to the thread... > > On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >> Thank you for the explanation, Vivek. >> >> Please run the jdk/test/java/lang/Math/ jtreg tests in addition to >> Hotspot tests. >> >> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>> Hi Vladimir >>> >>> The results we obtain with LIBM are within +/- 1ulp of the StrictMath >>> result, not the exact result. So I added the flag to switch between >>> FDLIBM and LIBM. >>> >>> Quick explanation: >>> This is what we observed in a comparison with the HPA Library >>> (http://www.nongnu.org/hpalib/), explained with an example.
>>> LIBM Observed Math result=0.19457293629570213 (4596178249117717083L) >>> (StrictMath - 1ulp) Required result should be = 0.19457293629570216 >>> (4596178249117717084L) (StrictMath result) or 0.1945729362957022 >>> (4596178249117717085L) (StrictMath + 1ulp). This means the HPA library >>> result is between the above two values and the exact result would be >>> pretty close to it. >>> So here the StrictMath result is less than the quad-precision result; the Math >>> result should be StrictMath or StrictMath + 1ulp and not StrictMath >>> - 1ulp, according to our test. >> >> Note, java.lang.Math allows results to be 1ulp off (in both directions, I >> think) and it should be consistent for the interpreter and code generated >> by JIT compilers: >> >> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28double%29 >> > > That interpretation of the spec is not quite right. For the Math methods with a 1/2 ulp error bound, the floating-point result closest to the exact result must be returned. For the methods with a 1 ulp error bound, either of the floating-point results bracketing the true result can be returned, subject to the monotonicity constraints of the specification of the particular method. > >> >>> >>> I have done the experiments with -XX:+UnlockDiagnosticVMOptions >>> -XX:DisableIntrinsic=_dsin and -XX:+UnlockDiagnosticVMOptions >>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter would >>> go through LIBM and C1 and C2 through FDLIBM. >>> If we want to disable LIBM completely, we need the flags >>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >> >> I was thinking about using the existing >> DirectiveSet::is_intrinsic_disabled() and >> vmIntrinsics::is_disabled_by_flags(). You need to add additional >> versions of these functions which accept an intrinsic ID instead of a methodHandle. >> >> If you still want to use flags, make them diagnostic. >> Or have one flag for all LIBM intrinsics: -XX:+UseLibmIntrinsic.
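Joe's reading of the 1-ulp bound can be made concrete with a small check. The sketch below uses hypothetical class and method names and rests on two assumptions: inputs are finite, well inside the double range, and of the same sign when ulp distances are compared. It counts the ulp distance between two doubles from their raw bit patterns, which works because adjacent doubles of one sign have adjacent long encodings. It also computes the double or pair of doubles bracketing an exact value given as a BigDecimal. Under the reading above, a result is acceptable only if it is one of those bracketing doubles.

```java
import java.math.BigDecimal;

// Hypothetical helpers for reasoning about 1-ulp results; assumes finite values
// well inside the double range.
final class UlpCheck {
    // Number of doubles between a and b; valid when both are finite and of the same sign.
    static long ulpDistance(double a, double b) {
        return Math.abs(Double.doubleToLongBits(a) - Double.doubleToLongBits(b));
    }

    // The double equal to exact, or the two adjacent doubles just below and above it.
    static double[] bracket(BigDecimal exact) {
        double d = exact.doubleValue();
        // Step down until d <= exact, then up while the next double is still <= exact.
        while (new BigDecimal(d).compareTo(exact) > 0) {
            d = Math.nextDown(d);
        }
        while (new BigDecimal(Math.nextUp(d)).compareTo(exact) <= 0) {
            d = Math.nextUp(d);
        }
        return new BigDecimal(d).compareTo(exact) == 0
                ? new double[] { d }
                : new double[] { d, Math.nextUp(d) };
    }

    // Under the 1-ulp reading above: the result must be one of the bracketing doubles.
    static boolean acceptableWithinOneUlp(double result, BigDecimal exact) {
        for (double candidate : bracket(exact)) {
            if (candidate == result) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double low = Double.longBitsToDouble(4596178249117717083L);
        double high = Double.longBitsToDouble(4596178249117717085L);
        System.out.println(ulpDistance(low, high)); // 2
    }
}
```

For example, the two encodings quoted for StrictMath - 1ulp and StrictMath + 1ulp are 2 ulps apart. The exact result therefore cannot lie within 1 ulp of both, so at most one of them can be acceptable.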
>> >>> >>> Also the performance gain ~4x is with -XX:+UnlockDiagnosticVMOptions >>> -XX:DisableIntrinsic=_dsin/_dcos. >> >> You confused me here. So you get 4x when only the interpreter uses LIBM >> code and the compilers use FDLIBM? > > Just to be clear, are you comparing the new code to FDLIBM (StrictMath) or to the existing fsin/fcos intrinsics (Math)? > > I'm partway through porting the FDLIBM code to Java (JDK-8134780: Port fdlibm to Java), which is providing a significant speed boost to the StrictMath methods that have been ported. > > I find the current patch *insufficient* as-is in terms of its testing. > For example, part of the patch says > > # For sin > > +// This means that the main path is actually only taken for > +// 2^-252 <= |X| < 90112. > > # For cos > > +// This means that the main path is actually only taken for > +// 2^-252 <= |X| < 90112. > > If nothing else, there are no tests around those boundary values, which is unacceptable. There should also be some tests of values of interest to the algorithm in question. > > Cheers, > > -Joe > > >> >> Thanks, >> Vladimir >> >>> >>> Let me know your thoughts on this. I would answer more questions and >>> give more data if needed. >>> >>> Regards, >>> Vivek >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Monday, November 23, 2015 10:37 AM >>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>> Cc: Viswanathan, Sandhya >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>> math lib >>> >>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>> What is the reason you decided to add new flags? exp() and log() >>>> changes did not have flags. >>>> >>>> It would be interesting to see what happens if you disable >>>> intrinsics using the existing flag, for example: >>>> >>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dexp >>> >>> Hi Vivek, >>> >>> I want to point out that you can do this experiment later.
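Joe's point about missing boundary tests could start with inputs on both sides of each edge of the quoted range 2^-252 <= |X| < 90112. The minimal sketch below has hypothetical names throughout. It builds those inputs with Math.scalb, Math.nextDown and Math.nextUp, and prints the ulp difference between the Math and StrictMath results for sin and cos. It only compares the two implementations against each other. A real test would check against a higher-precision reference and include other values of interest to the algorithm, as Joe notes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator of boundary inputs for sin/cos testing.
final class SinCosBoundaryInputs {
    // Edges of the main-path range quoted from the patch comment: 2^-252 <= |X| < 90112.
    static final double LOW_EDGE = Math.scalb(1.0, -252);
    static final double HIGH_EDGE = 90112.0;

    // Each edge plus its immediate neighbours, for both signs.
    static List<Double> boundaryInputs() {
        List<Double> inputs = new ArrayList<>();
        for (double edge : new double[] { LOW_EDGE, HIGH_EDGE }) {
            for (double v : new double[] { Math.nextDown(edge), edge, Math.nextUp(edge) }) {
                inputs.add(v);
                inputs.add(-v);
            }
        }
        return inputs;
    }

    // Ulp distance between two finite doubles of the same sign.
    static long ulpDiff(double a, double b) {
        return Math.abs(Double.doubleToLongBits(a) - Double.doubleToLongBits(b));
    }

    public static void main(String[] args) {
        for (double x : boundaryInputs()) {
            System.out.printf("x=%s sin ulps=%d cos ulps=%d%n", x,
                    ulpDiff(Math.sin(x), StrictMath.sin(x)),
                    ulpDiff(Math.cos(x), StrictMath.cos(x)));
        }
    }
}
```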
We can file >>> bugs and fix them after FC. >>> >>> For now, please answer my question about flags only. This is the >>> only thing holding it back from being pushed. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>> Hi all >>>>> >>>>> I would like to contribute a patch which optimizes Math.sin() and >>>>> Math.cos() for 64- and 32-bit x86 architectures using the Intel LIBM >>>>> implementation. >>>>> >>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>> >>>>> The options to enable the optimizations are -XX:+UseLibmSinIntrinsic >>>>> and -XX:+UseLibmCosIntrinsic. >>>>> >>>>> Could you please review and sponsor this patch. >>>>> >>>>> Bug-id: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>> webrev: >>>>> >>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>> >>>>> Thanks and regards, >>>>> >>>>> Vivek >>>>> From martin.doerr at sap.com Thu Dec 3 12:02:02 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 3 Dec 2015 12:02:02 +0000 Subject: RFR(XXL): 8144019: PPC64 C1: Introduce Client Compiler In-Reply-To: <4295855A5C1DE049A61835A1887419CC41ED9D8F@DEWDFEMB12A.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB41811656722779EB@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41ED4F07@DEWDFEMB12A.global.corp.sap> <018BFB19-5628-4484-87C6-F345A2CFBC3F@oracle.com> <4295855A5C1DE049A61835A1887419CC41ED9D8F@DEWDFEMB12A.global.corp.sap> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228650B@DEWDFEMB19C.global.corp.sap> Hi, I also prefer to push it in one change. However, we need a sponsor from Oracle. Can anybody volunteer, please? The only non-PPC64 file is make/linux/Makefile. I just removed a small PPC64 part from it which prevented the C1 build. The remaining files are maintained by us, anyway.
The current webrev is: http://cr.openjdk.java.net/~mdoerr/8144019_ppc64_c1/webrev.01/ Thanks and best regards, Martin From: Lindenmaier, Goetz Sent: Wednesday, December 2, 2015 11:21 To: Christian Thalinger Cc: Doerr, Martin; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(XXL): 8144019: PPC64 C1: Introduce Client Compiler Hi Christian, I understand we need to follow the rules. But then I would like to keep it as one change. That's just cleaner. Best regards, Goetz. From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Tuesday, December 1, 2015 20:31 To: Lindenmaier, Goetz Cc: Doerr, Martin; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(XXL): 8144019: PPC64 C1: Introduce Client Compiler On Nov 27, 2015, at 6:01 AM, Lindenmaier, Goetz wrote: Hi, could someone outside SAP please review this? As this is quite a big change, I would like someone else to look over it to ensure everything is formally correct. Also, it contains the few lines needed to enable the C1 build on ppc64 in the shared linux makefiles. May I push this despite this? Otherwise we would need a sponsor, please. The rules say we have to use JPRT. We could also split it into two changes and you can push the C1 port directly. I can piggyback the Makefile change with something else. Thanks, Goetz. From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin Sent: Wednesday, November 25, 2015 15:06 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(XXL): 8144019: PPC64 C1: Introduce Client Compiler Hi, we would like to contribute our PPC64 port of the Client Compiler to support Tiered Compilation. The change includes refactoring of some functionality which is shared between C1 and C2 and some updates. The webrev is here: http://cr.openjdk.java.net/~mdoerr/8144019_ppc64_c1/webrev.00 It only changes PPC64 files, with one minor exception: make/linux/Makefile Please review.
I will also need a sponsor, please. Best regards, Martin From fei.yang0953 at yahoo.com Thu Dec 3 14:22:26 2015 From: fei.yang0953 at yahoo.com (felix yang) Date: Thu, 3 Dec 2015 14:22:26 +0000 (UTC) Subject: [RFR] aarch64: C2 generate vectorized MLA/MLS instructions References: <537574996.11929209.1449152546033.JavaMail.yahoo.ref@mail.yahoo.com> Message-ID: <537574996.11929209.1449152546033.JavaMail.yahoo@mail.yahoo.com> Hi, Can someone help review and sponsor this code generation improvement for the aarch64 port? Bug: https://bugs.openjdk.java.net/browse/JDK-8144587 Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.00/ The test hotspot/test/compiler/loopopts/superword/SumRed_Int.java can serve as a test case. With this patch, the following code snippet generated by C2:
  0x0000007f6cec12cc: mul v19.4s, v16.4s, v17.4s
  0x0000007f6cec12d0: mul v16.4s, v16.4s, v18.4s
  0x0000007f6cec12d4: mul v17.4s, v18.4s, v17.4s
  0x0000007f6cec12d8: add v16.4s, v19.4s, v16.4s
  0x0000007f6cec12dc: add v16.4s, v16.4s, v17.4s
will be further optimized into:
  0x0000007f9cdb86dc: mul v19.4s, v16.4s, v17.4s
  0x0000007f9cdb86e0: mla v19.4s, v16.4s, v18.4s
  0x0000007f9cdb86e4: mla v19.4s, v17.4s, v18.4s
This gives about a 13% performance gain for the test case on my aarch64 server. Tested with jtreg hotspot & langtools; results are the same before and after. Is it OK to push? Felix, Thanks for your help.
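For context, here is a minimal Java sketch of the kind of int loop that yields the three vector multiplies and two adds listed above. It is modeled on, not copied from, SumRed_Int.java, and the class and method names are hypothetical. Each iteration computes a[i]*b[i] + a[i]*c[i] + b[i]*c[i], so with the patch C2 may fold the second and third products into mla instructions that accumulate into the first; whether that happens depends on C2 actually vectorizing the loop on the target.

```java
// Hypothetical sketch: a loop shaped like the one exercised by
// hotspot/test/compiler/loopopts/superword/SumRed_Int.java (not a copy of it).
public class MlaLoopSketch {
    // Sums a[i]*b[i] + a[i]*c[i] + b[i]*c[i] over all i. The inner expression is
    // three multiplies combined by two adds, which a multiply-accumulate (mla)
    // instruction can fuse once the loop is vectorized.
    static int sumProducts(int[] a, int[] b, int[] c) {
        int total = 0;
        for (int i = 0; i < a.length; i++) {
            total += (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1024;
        int[] a = new int[n], b = new int[n], c = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = i + 1;
            c[i] = 2;
        }
        // Repeated calls give C2 a chance to compile (and possibly vectorize) the loop.
        int result = 0;
        for (int iter = 0; iter < 20_000; iter++) {
            result = sumProducts(a, b, c);
        }
        System.out.println(result);
    }
}
```

Running it with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (which requires the hsdis disassembler plugin) is one way to look for the mul/mla sequence in the compiled code.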
From aph at redhat.com Thu Dec 3 14:40:07 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 3 Dec 2015 14:40:07 +0000 Subject: [RFR] aarch64: C2 generate vectorized MLA/MLS instructions In-Reply-To: <537574996.11929209.1449152546033.JavaMail.yahoo@mail.yahoo.com> References: <537574996.11929209.1449152546033.JavaMail.yahoo.ref@mail.yahoo.com> <537574996.11929209.1449152546033.JavaMail.yahoo@mail.yahoo.com> Message-ID: <56605447.9070103@redhat.com> It would help everybody if you did "hg commit" with an appropriate changeset comment before generating the webrev. Andrew. From martin.doerr at sap.com Thu Dec 3 17:17:30 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 3 Dec 2015 17:17:30 +0000 Subject: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion Message-ID: <7C9B87B351A4BA4AA9EC95BB41811656722865FE@DEWDFEMB19C.global.corp.sap> Hi, I have implemented a change which makes Node_Backward_Iterator more efficient for large graphs. The purpose is to fix the performance problem we observe in the Octane benchmarks. It lowers compile time dramatically when JvmtiExport::_can_access_local_variables is on. The webrev is here: http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.00/ Please review. The previous version uses an initial node stack size of (C->unique() >> 1) + 16, which can become pretty large. My webrev changes it to (C->unique() >> 2) + 16, which is still large. I didn't observe any resizing caused by it being too small. I guess the stack depth typically stays far below this value, but it may be ok to spend e.g. 0.5 MB in extreme cases. How was that previous value determined? Should I implement it differently? Best regards, Martin
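To put the two pre-grow formulas in perspective, here is a small Java sketch of the arithmetic. It assumes each Node_Stack entry is a (Node*, uint index) pair padded to 16 bytes on a 64-bit VM; that entry size is an assumption, not something stated in this thread, and the node counts are illustrative only.

```java
// Back-of-the-envelope sizing for the initial Node_Stack capacity in GCM.
// ASSUMPTION: one stack entry = (Node*, uint index) padded to 16 bytes on 64-bit.
public class NodeStackSizing {
    static final long ENTRY_BYTES = 16; // assumed entry size, see lead-in

    // Previous formula quoted in the thread: (C->unique() >> 1) + 16
    static long oldCapacity(long uniqueNodes) {
        return (uniqueNodes >> 1) + 16;
    }

    // Formula proposed in webrev.00: (C->unique() >> 2) + 16
    static long newCapacity(long uniqueNodes) {
        return (uniqueNodes >> 2) + 16;
    }

    // Memory reserved up front for a given capacity, under the assumed entry size.
    static long bytes(long capacity) {
        return capacity * ENTRY_BYTES;
    }

    public static void main(String[] args) {
        long[] nodeCounts = {10_000, 100_000, 131_008};
        for (long n : nodeCounts) {
            System.out.printf("unique=%d old=%d entries (%d bytes) new=%d entries (%d bytes)%n",
                    n, oldCapacity(n), bytes(oldCapacity(n)),
                    newCapacity(n), bytes(newCapacity(n)));
        }
    }
}
```

Under that assumption, the new formula reserves exactly 0.5 MiB (32768 entries) at 131,008 unique nodes, while the old formula already reaches that size at roughly half as many nodes.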
From vivek.r.deshpande at intel.com Thu Dec 3 19:06:38 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 3 Dec 2015 19:06:38 +0000 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> Hi Vladimir This is the link for the updated webrev with latest hotspot source as base for your review. http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ Thank you. Regards, Vivek -----Original Message----- From: Deshpande, Vivek R Sent: Wednesday, December 02, 2015 10:33 PM To: 'Vladimir Kozlov'; joe darcy Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math lib Hi Vladimir This is the link for the updated webrev for your review. http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ Thank you. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, December 01, 2015 6:06 PM To: Deshpande, Vivek R; joe darcy Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib Please send link to new webrev on cr server. Thanks, Vladimir On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: > Hi Vladimir > > Please find the webrev with your suggested updates attached with the mail. > We will update it in the jbs entry soon. > Please let me know if it needs further changes.
> > Regards, > Vivek > > -----Original Message----- > From: Deshpande, Vivek R > Sent: Tuesday, November 24, 2015 10:22 AM > To: 'joe darcy'; Vladimir Kozlov > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math > lib > > HI Vladimir, Joe > > I have done the jtreg tests in hotspot and tests from jdk you have mentioned. It passed those tests. > The ~4x gain is with XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos over without that option. > The performance gain is 3.2x over base jdk, that is over current fsin/fcos intrinsic. This gain is more realistic. > > Could I get those tests around the boundary values. Would WorstCaseTests.java jtreg test in jdk test those ? > If yes, then it has passed those boundary cases. > > I would work on adding either diagnostic flag or just one flag for libm and send out the webrev soon. > > Regards, > Vivek > > > -----Original Message----- > From: joe darcy [mailto:joe.darcy at oracle.com] > Sent: Monday, November 23, 2015 6:28 PM > To: Vladimir Kozlov; Deshpande, Vivek R > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math > lib > > Hello, > > Just getting added to the thread.. > > On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >> Thank you, for explanation, Vivek. >> >> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >> Hotspot tests. >> >> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>> Hi Vladimir >>> >>> The result we obtain with LIBM are within +/- 1ulp from StrictMath >>> result and not exact result. So I added the flag to switch between >>> FDLIBM and LIBM. >>> >>> Quick explanation: >>> This is what we observed with comparison to HPA Library >>> (http://www.nongnu.org/hpalib/) explained with an example. 
>>> LIBM Observed Math result=0.19457293629570213 (4596178249117717083L) >>> (StrictMath - 1ulp) Required result should be = 0.19457293629570216 >>> (4596178249117717084L) (StrictMath result) or 0.1945729362957022 >>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA library >>> result is between the above two values and Exact result would be >>> pretty close to it. >>> So here StrictMath result is less than quad-precision result, Math >>> result should be StrictMath or StrictMath + 1ulp and not StrictMath >>> - 1ulp, according to our test. >> >> Note, java.lang.Math allows to have 1ulp off (in both direction, I >> think) and it should be consistent for Interpreter and code generated >> by JIT compilers: >> >> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28do >> u >> ble%29 >> > > That interpretation of the spec is not quite right. For the Math methods with a 1/2 ulp error bound, the floating-point result closest to the exact result must be returned. For the methods with a 1 ulp error bound, either of the floating-point result bracketing the true result can be returned, subject to the monotonicity constraints of the specification of the particular method. > >> >>> >>> I have done the experiments with XX:+UnlockDiagnosticVMOptions >>> -XX:DisableIntrinsic=_dsin and XX:+UnlockDiagnosticVMOptions >>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter would >>> go through LIBM and C1 and c2 through FDLIBM. >>> If we want to disable LIBM completely, we need the flags >>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >> >> I was thinking about using existing >> DirectiveSet::is_intrinsic_disabled() and >> vmIntrinsics::is_disabled_by_flags(). You need to add additional >> versions of functions which accept intrinsic ID instead of methodHandle. >> >> If you still want to use flags make them diagnostic. >> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. 
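The three results in Vivek's example are adjacent doubles. As a quick check, this Java sketch decodes the raw bit patterns quoted above with Double.longBitsToDouble and confirms that each is Math.nextUp of the previous one, which is what "StrictMath - 1ulp", "StrictMath" and "StrictMath + 1ulp" mean here. The class and constant names are hypothetical. Under the 1-ulp rule Joe describes, only the two doubles bracketing the exact result are acceptable.

```java
// Hypothetical helper: decodes the bit patterns from the example in this thread
// and checks that they are consecutive doubles.
public class UlpNeighbors {
    static final long MATH_BITS = 4596178249117717083L;        // observed LIBM result
    static final long STRICT_BITS = 4596178249117717084L;      // StrictMath result
    static final long STRICT_PLUS_BITS = 4596178249117717085L; // StrictMath + 1 ulp

    // For finite doubles, Math.nextUp returns the next larger representable value.
    static boolean adjacent(double lo, double hi) {
        return Math.nextUp(lo) == hi;
    }

    public static void main(String[] args) {
        double math = Double.longBitsToDouble(MATH_BITS);
        double strict = Double.longBitsToDouble(STRICT_BITS);
        double strictPlus = Double.longBitsToDouble(STRICT_PLUS_BITS);

        System.out.println("math        = " + math);
        System.out.println("strict      = " + strict);
        System.out.println("strict+ulp  = " + strictPlus);
        System.out.println("ulp(strict) = " + Math.ulp(strict));
        System.out.println("adjacent: " + adjacent(math, strict) + ", " + adjacent(strict, strictPlus));

        // If the exact sine lies strictly between strict and strictPlus (as the HPA
        // comparison in the thread indicates), a 1-ulp method may return either of
        // those two bracketing values, but not math, which lies one ulp below them.
    }
}
```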
>> >>> >>> Also the performance gain ~4x is with XX:+UnlockDiagnosticVMOptions >>> -XX:DisableIntrinsic=_dsin/_dcos. >> >> You confused me here. So you get 4x when only Interpreter use LIBM >> code and compilers use FDLIB? > > Just to be clear, are you comparing the new code to FDLIBM (StrictMath) or to the existing fsin/fcos instrinsics (Math)? > > I'm part way through porting the FDLIBM code to Java (JDK-8134780: Port fdlibm to Java), which is providing a significant speed boost to the StrictMath methods that have been ported. > > I find the current patch *insufficient* as-is in terms of its testing. > For example, part of patch says > > # For sin > > +// This means that the main path is actually only taken for > +// 2^-252 <= |X| < 90112. > > # For cos > > +// This means that the main path is actually only taken for > +// 2^-252 <= |X| < 90112. > > If nothing else, there are no tests at around those boundary values, which is unacceptable. There should also be some tests of values of interest to the algorithm in question. > > Cheers, > > -Joe > > >> >> Thanks, >> Vladimir >> >>> >>> Let me know your thoughts on this. I would answer more questions and >>> give more data if needed. >>> >>> Regards, >>> Vivek >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Monday, November 23, 2015 10:37 AM >>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>> Cc: Viswanathan, Sandhya >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>> math lib >>> >>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>> What is the reason you decided to add new flags? exp() and log() >>>> changes did not have flags. >>>> >>>> It would be interesting to see what happens if you disable >>>> intrinsics using existing flag, for example: >>>> >>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dexp >>> >>> Hi Vivek, >>> >>> I want to point that you can do this experiment later. 
We can file >>> bugs and fixed them after FC. >>> >>> For now, please, answer my question about flags only. This is the >>> only thing holding it from push. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>> Hi all >>>>> >>>>> I would like to contribute a patch which optimizes Math.sin() and >>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>> implementation. >>>>> >>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>> >>>>> The option to use the optimizations are -XX:+UseLibmSinIntrinsic >>>>> and -XX:+UseLibmCosIntrinsic. >>>>> >>>>> Could you please review and sponsor this patch. >>>>> >>>>> Bug-id: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>> webrev: >>>>> >>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>> >>>>> Thanks and regards, >>>>> >>>>> Vivek >>>>> From vladimir.kozlov at oracle.com Thu Dec 3 21:05:58 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 3 Dec 2015 13:05:58 -0800 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> Message-ID: <5660AEB6.8060007@oracle.com> Okay, looks reasonable to me. Thanks, Vladimir On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: > Hi Vladimir > > This is the link for the updated webrev with latest hotspot source as base for your review. > http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ > Thank you. 
> > Regards, > Vivek > > -----Original Message----- > From: Deshpande, Vivek R > Sent: Wednesday, December 02, 2015 10:33 PM > To: 'Vladimir Kozlov'; joe darcy > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math lib > > Hi Vladimir > > This is the link for the updated webrev for your review. > http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ > Thank you. > > Regards, > Vivek > > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, December 01, 2015 6:06 PM > To: Deshpande, Vivek R; joe darcy > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib > > Please send link to new webrev on cr server. > > Thanks, > Vladimir > > On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >> Hi Vladimir >> >> Please find the webrev with your suggested updates attached with the mail. >> We will update it in the jbs entry soon. >> Please let me know if it needs further changes. >> >> Regards, >> Vivek >> >> -----Original Message----- >> From: Deshpande, Vivek R >> Sent: Tuesday, November 24, 2015 10:22 AM >> To: 'joe darcy'; Vladimir Kozlov >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> HI Vladimir, Joe >> >> I have done the jtreg tests in hotspot and tests from jdk you have mentioned. It passed those tests. >> The ~4x gain is with XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos over without that option. >> The performance gain is 3.2x over base jdk, that is over current fsin/fcos intrinsic. This gain is more realistic. >> >> Could I get those tests around the boundary values. Would WorstCaseTests.java jtreg test in jdk test those ? >> If yes, then it has passed those boundary cases. 
>> >> I would work on adding either diagnostic flag or just one flag for libm and send out the webrev soon. >> >> Regards, >> Vivek >> >> >> -----Original Message----- >> From: joe darcy [mailto:joe.darcy at oracle.com] >> Sent: Monday, November 23, 2015 6:28 PM >> To: Vladimir Kozlov; Deshpande, Vivek R >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> Hello, >> >> Just getting added to the thread.. >> >> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>> Thank you, for explanation, Vivek. >>> >>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >>> Hotspot tests. >>> >>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>> Hi Vladimir >>>> >>>> The result we obtain with LIBM are within +/- 1ulp from StrictMath >>>> result and not exact result. So I added the flag to switch between >>>> FDLIBM and LIBM. >>>> >>>> Quick explanation: >>>> This is what we observed with comparison to HPA Library >>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>> LIBM Observed Math result=0.19457293629570213 (4596178249117717083L) >>>> (StrictMath - 1ulp) Required result should be = 0.19457293629570216 >>>> (4596178249117717084L) (StrictMath result) or 0.1945729362957022 >>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA library >>>> result is between the above two values and Exact result would be >>>> pretty close to it. >>>> So here StrictMath result is less than quad-precision result, Math >>>> result should be StrictMath or StrictMath + 1ulp and not StrictMath >>>> - 1ulp, according to our test. >>> >>> Note, java.lang.Math allows to have 1ulp off (in both direction, I >>> think) and it should be consistent for Interpreter and code generated >>> by JIT compilers: >>> >>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28do >>> u >>> ble%29 >>> >> >> That interpretation of the spec is not quite right. 
For the Math methods with a 1/2 ulp error bound, the floating-point result closest to the exact result must be returned. For the methods with a 1 ulp error bound, either of the floating-point result bracketing the true result can be returned, subject to the monotonicity constraints of the specification of the particular method. >> >>> >>>> >>>> I have done the experiments with XX:+UnlockDiagnosticVMOptions >>>> -XX:DisableIntrinsic=_dsin and XX:+UnlockDiagnosticVMOptions >>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter would >>>> go through LIBM and C1 and c2 through FDLIBM. >>>> If we want to disable LIBM completely, we need the flags >>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>> >>> I was thinking about using existing >>> DirectiveSet::is_intrinsic_disabled() and >>> vmIntrinsics::is_disabled_by_flags(). You need to add additional >>> versions of functions which accept intrinsic ID instead of methodHandle. >>> >>> If you still want to use flags make them diagnostic. >>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>> >>>> >>>> Also the performance gain ~4x is with XX:+UnlockDiagnosticVMOptions >>>> -XX:DisableIntrinsic=_dsin/_dcos. >>> >>> You confused me here. So you get 4x when only Interpreter use LIBM >>> code and compilers use FDLIB? >> >> Just to be clear, are you comparing the new code to FDLIBM (StrictMath) or to the existing fsin/fcos instrinsics (Math)? >> >> I'm part way through porting the FDLIBM code to Java (JDK-8134780: Port fdlibm to Java), which is providing a significant speed boost to the StrictMath methods that have been ported. >> >> I find the current patch *insufficient* as-is in terms of its testing. >> For example, part of patch says >> >> # For sin >> >> +// This means that the main path is actually only taken for >> +// 2^-252 <= |X| < 90112. >> >> # For cos >> >> +// This means that the main path is actually only taken for >> +// 2^-252 <= |X| < 90112. 
>> >> If nothing else, there are no tests at around those boundary values, which is unacceptable. There should also be some tests of values of interest to the algorithm in question. >> >> Cheers, >> >> -Joe >> >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Let me know your thoughts on this. I would answer more questions and >>>> give more data if needed. >>>> >>>> Regards, >>>> Vivek >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Monday, November 23, 2015 10:37 AM >>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>> Cc: Viswanathan, Sandhya >>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>> math lib >>>> >>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>> What is the reason you decided to add new flags? exp() and log() >>>>> changes did not have flags. >>>>> >>>>> It would be interesting to see what happens if you disable >>>>> intrinsics using existing flag, for example: >>>>> >>>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dexp >>>> >>>> Hi Vivek, >>>> >>>> I want to point that you can do this experiment later. We can file >>>> bugs and fixed them after FC. >>>> >>>> For now, please, answer my question about flags only. This is the >>>> only thing holding it from push. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>> Hi all >>>>>> >>>>>> I would like to contribute a patch which optimizes Math.sin() and >>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>> implementation. >>>>>> >>>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>>> >>>>>> The option to use the optimizations are -XX:+UseLibmSinIntrinsic >>>>>> and -XX:+UseLibmCosIntrinsic. >>>>>> >>>>>> Could you please review and sponsor this patch. 
>>>>>> >>>>>> Bug-id: >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>> webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>> >>>>>> Thanks and regards, >>>>>> >>>>>> Vivek >>>>>> From vladimir.kozlov at oracle.com Thu Dec 3 21:16:46 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 3 Dec 2015 13:16:46 -0800 Subject: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion In-Reply-To: <7C9B87B351A4BA4AA9EC95BB41811656722865FE@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB41811656722865FE@DEWDFEMB19C.global.corp.sap> Message-ID: <5660B13E.7080509@oracle.com> You reversed the 8011858 changes, which made the stack much smaller - live_nodes was usually 1/10 of unique nodes:
- stack.map((C->live_nodes() >> 1) + 16, NULL);
+ Node_Stack stack(arena, (C->unique() >> 2) + 16); // pre-grow
Please, use live_nodes with your >> 2 change:
Node_Stack stack(arena, (C->live_nodes() >> 2) + 16); // pre-grow
Iterator changes seem fine to me. Thanks, Vladimir On 12/3/15 9:17 AM, Doerr, Martin wrote: > Hi, > > I have implemented a change which makes Node_Backward_Iterator more efficient for large graphs. The purpose is to fix > the performance problem we observe in Octane benchmarks. > > It lowers compile time dramatically in case JvmtiExport::_can_access_local_variables is on. > > The webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.00/ > > Please review. > > The previous version uses an initial node stack size of (C->unique() >> 1) + 16 which can become pretty large. > > My webrev changes it to (C->unique() >> 2) + 16 which is still large. I didn't observe resizing because it was too small. > > I guess the stack depth typically stays far below this value, but it may be ok to spend e.g. 0.5 MB in extreme cases. > > How was that previous value determined? Should I implement it differently?
> > Best regards, > > Martin > From joe.darcy at oracle.com Thu Dec 3 21:16:43 2015 From: joe.darcy at oracle.com (joe darcy) Date: Thu, 3 Dec 2015 13:16:43 -0800 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <5660AEB6.8060007@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> Message-ID: <5660B13B.1020907@oracle.com> I think it is unwise for an implementation change this large to be pushed with no tests targeting the specifics of the new implementation. The worst-case tests in the jdk repo are the mathematical worst cases for floating-point approximations, in other words the cases where the exact mathematical answer is closest to half-way between two representable floating-point numbers. Passing such tests is a necessary but not sufficient condition for a new implementation. Cheers, -Joe On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: > Okay, looks reasonable to me. > > Thanks, > Vladimir > > On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >> Hi Vladimir >> >> This is the link for the updated webrev with latest hotspot source as >> base for your review. >> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >> Thank you.
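Joe's point about the 2^-252 <= |X| < 90112 main path can be illustrated with a rough Java sketch of a boundary-value check. The class name is hypothetical, and this is not WorstCaseTests.java or any test from the webrev. It samples the doubles immediately around each quoted boundary, in both signs, and compares Math.sin/Math.cos with StrictMath. Because each method is only specified to be within 1 ulp of the exact result, the two may legitimately differ by up to 2 ulps, so that is the tolerance used; a real test would also compare against higher-precision reference values.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical boundary-value check; not taken from the JDK test suite.
public class SinCosBoundarySketch {
    // Boundaries of the main path quoted from the patch comments: 2^-252 and 90112.
    static final double[] BOUNDARIES = { Math.scalb(1.0, -252), 90112.0 };

    // Returns how many sampled arguments give results differing by more than
    // maxUlps ulps (measured in ulps of the reference result).
    static int countMismatches(DoubleUnaryOperator fast, DoubleUnaryOperator reference,
                               int neighbors, double maxUlps) {
        int mismatches = 0;
        for (double b : BOUNDARIES) {
            for (double sign : new double[] { 1.0, -1.0 }) {
                // Walk down to the start of the neighborhood, then up through it.
                double x = sign * b;
                for (int i = 0; i < neighbors; i++) {
                    x = Math.nextDown(x);
                }
                for (int i = 0; i <= 2 * neighbors; i++) {
                    double expected = reference.applyAsDouble(x);
                    double actual = fast.applyAsDouble(x);
                    if (Math.abs(actual - expected) > maxUlps * Math.ulp(expected)) {
                        mismatches++;
                    }
                    x = Math.nextUp(x);
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int sinBad = countMismatches(Math::sin, StrictMath::sin, 1000, 2.0);
        int cosBad = countMismatches(Math::cos, StrictMath::cos, 1000, 2.0);
        System.out.println("sin mismatches: " + sinBad + ", cos mismatches: " + cosBad);
    }
}
```

As Joe notes for the worst-case tests, agreement at these points would be necessary but not sufficient; values of interest inside the algorithm's reduction steps would need their own cases.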
>> >> Regards, >> Vivek >> >> -----Original Message----- >> From: Deshpande, Vivek R >> Sent: Wednesday, December 02, 2015 10:33 PM >> To: 'Vladimir Kozlov'; joe darcy >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> Hi Vladimir >> >> This is the link for the updated webrev for your review. >> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >> Thank you. >> >> Regards, >> Vivek >> >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, December 01, 2015 6:06 PM >> To: Deshpande, Vivek R; joe darcy >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> Please send link to new webrev on cr server. >> >> Thanks, >> Vladimir >> >> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>> Hi Vladimir >>> >>> Please find the webrev with your suggested updates attached with the >>> mail. >>> We will update it in the jbs entry soon. >>> Please let me know if it needs further changes. >>> >>> Regards, >>> Vivek >>> >>> -----Original Message----- >>> From: Deshpande, Vivek R >>> Sent: Tuesday, November 24, 2015 10:22 AM >>> To: 'joe darcy'; Vladimir Kozlov >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math >>> lib >>> >>> HI Vladimir, Joe >>> >>> I have done the jtreg tests in hotspot and tests from jdk you have >>> mentioned. It passed those tests. >>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>> The performance gain is 3.2x over base jdk, that is over current >>> fsin/fcos intrinsic. This gain is more realistic. >>> >>> Could I get those tests around the boundary values. Would >>> WorstCaseTests.java jtreg test in jdk test those ? 
>>> If yes, then it has passed those boundary cases. >>> >>> I would work on adding either diagnostic flag or just one flag for >>> libm and send out the webrev soon. >>> >>> Regards, >>> Vivek >>> >>> >>> -----Original Message----- >>> From: joe darcy [mailto:joe.darcy at oracle.com] >>> Sent: Monday, November 23, 2015 6:28 PM >>> To: Vladimir Kozlov; Deshpande, Vivek R >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >>> lib >>> >>> Hello, >>> >>> Just getting added to the thread.. >>> >>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>> Thank you, for explanation, Vivek. >>>> >>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >>>> Hotspot tests. >>>> >>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>> Hi Vladimir >>>>> >>>>> The result we obtain with LIBM are within +/- 1ulp from StrictMath >>>>> result and not exact result. So I added the flag to switch between >>>>> FDLIBM and LIBM. >>>>> >>>>> Quick explanation: >>>>> This is what we observed with comparison to HPA Library >>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>> LIBM Observed Math result=0.19457293629570213 (4596178249117717083L) >>>>> (StrictMath - 1ulp) Required result should be = 0.19457293629570216 >>>>> (4596178249117717084L) (StrictMath result) or 0.1945729362957022 >>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA library >>>>> result is between the above two values and Exact result would be >>>>> pretty close to it. >>>>> So here StrictMath result is less than quad-precision result, Math >>>>> result should be StrictMath or StrictMath + 1ulp and not StrictMath >>>>> - 1ulp, according to our test. 
>>>> >>>> Note, java.lang.Math allows to have 1ulp off (in both direction, I >>>> think) and it should be consistent for Interpreter and code generated >>>> by JIT compilers: >>>> >>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28do >>>> u >>>> ble%29 >>>> >>> >>> That interpretation of the spec is not quite right. For the Math >>> methods with a 1/2 ulp error bound, the floating-point result >>> closest to the exact result must be returned. For the methods with a >>> 1 ulp error bound, either of the floating-point result bracketing >>> the true result can be returned, subject to the monotonicity >>> constraints of the specification of the particular method. >>> >>>> >>>>> >>>>> I have done the experiments with XX:+UnlockDiagnosticVMOptions >>>>> -XX:DisableIntrinsic=_dsin and XX:+UnlockDiagnosticVMOptions >>>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter would >>>>> go through LIBM and C1 and c2 through FDLIBM. >>>>> If we want to disable LIBM completely, we need the flags >>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>> >>>> I was thinking about using existing >>>> DirectiveSet::is_intrinsic_disabled() and >>>> vmIntrinsics::is_disabled_by_flags(). You need to add additional >>>> versions of functions which accept intrinsic ID instead of >>>> methodHandle. >>>> >>>> If you still want to use flags make them diagnostic. >>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>> >>>>> >>>>> Also the performance gain ~4x is with XX:+UnlockDiagnosticVMOptions >>>>> -XX:DisableIntrinsic=_dsin/_dcos. >>>> >>>> You confused me here. So you get 4x when only Interpreter use LIBM >>>> code and compilers use FDLIB? >>> >>> Just to be clear, are you comparing the new code to FDLIBM >>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)? 
>>> >>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>> Port fdlibm to Java), which is providing a significant speed boost >>> to the StrictMath methods that have been ported. >>> >>> I find the current patch *insufficient* as-is in terms of its testing. >>> For example, part of patch says >>> >>> # For sin >>> >>> +// This means that the main path is actually only taken for >>> +// 2^-252 <= |X| < 90112. >>> >>> # For cos >>> >>> +// This means that the main path is actually only taken for >>> +// 2^-252 <= |X| < 90112. >>> >>> If nothing else, there are no tests at around those boundary values, >>> which is unacceptable. There should also be some tests of values of >>> interest to the algorithm in question. >>> >>> Cheers, >>> >>> -Joe >>> >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> Let me know your thoughts on this. I would answer more questions and >>>>> give more data if needed. >>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>> Cc: Viswanathan, Sandhya >>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>> What is the reason you decided to add new flags? exp() and log() >>>>>> changes did not have flags. >>>>>> >>>>>> It would be interesting to see what happens if you disable >>>>>> intrinsics using existing flag, for example: >>>>>> >>>>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dexp >>>>> >>>>> Hi Vivek, >>>>> >>>>> I want to point that you can do this experiment later. We can file >>>>> bugs and fixed them after FC. >>>>> >>>>> For now, please, answer my question about flags only. This is the >>>>> only thing holding it from push. 
>>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>> Hi all >>>>>>> >>>>>>> I would like to contribute a patch which optimizes Math.sin() and >>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>>> implementation. >>>>>>> >>>>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>>>> >>>>>>> The option to use the optimizations are -XX:+UseLibmSinIntrinsic >>>>>>> and -XX:+UseLibmCosIntrinsic. >>>>>>> >>>>>>> Could you please review and sponsor this patch. >>>>>>> >>>>>>> Bug-id: >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>> webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>> >>>>>>> Thanks and regards, >>>>>>> >>>>>>> Vivek >>>>>>> From vivek.r.deshpande at intel.com Thu Dec 3 21:22:52 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 3 Dec 2015 21:22:52 +0000 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <5660B13B.1020907@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> Hi Joe It would be great if you would please share the additional tests with us. 
Regards, Vivek -----Original Message----- From: joe darcy [mailto:joe.darcy at oracle.com] Sent: Thursday, December 03, 2015 1:17 PM To: Vladimir Kozlov; Deshpande, Vivek R Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib I think it is unwise for this large of an implementation change to be pushed with no tests targeting the specifics of the new implementation. The worst-case tests in the jdk repo are the mathematical worst cases for floating-point approximations, in other words, the cases where the exact mathematical answer is closest to half-way between two representable floating-point numbers. Passing such tests is a necessary but not sufficient condition for a new implementation. Cheers, -Joe On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: > Okay, looks reasonable to me. > > Thanks, > Vladimir > > On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >> Hi Vladimir >> >> This is the link for the updated webrev with latest hotspot source as >> base for your review. >> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >> Thank you. >> >> Regards, >> Vivek >> >> -----Original Message----- >> From: Deshpande, Vivek R >> Sent: Wednesday, December 02, 2015 10:33 PM >> To: 'Vladimir Kozlov'; joe darcy >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> Hi Vladimir >> >> This is the link for the updated webrev for your review. >> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >> Thank you. >> >> Regards, >> Vivek >> >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, December 01, 2015 6:06 PM >> To: Deshpande, Vivek R; joe darcy >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> Please send link to new webrev on cr server.
>> >> Thanks, >> Vladimir >> >> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>> Hi Vladimir >>> >>> Please find the webrev with your suggested updates attached with the >>> mail. >>> We will update it in the jbs entry soon. >>> Please let me know if it needs further changes. >>> >>> Regards, >>> Vivek >>> >>> -----Original Message----- >>> From: Deshpande, Vivek R >>> Sent: Tuesday, November 24, 2015 10:22 AM >>> To: 'joe darcy'; Vladimir Kozlov >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>> math lib >>> >>> HI Vladimir, Joe >>> >>> I have done the jtreg tests in hotspot and tests from jdk you have >>> mentioned. It passed those tests. >>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>> The performance gain is 3.2x over base jdk, that is over current >>> fsin/fcos intrinsic. This gain is more realistic. >>> >>> Could I get those tests around the boundary values. Would >>> WorstCaseTests.java jtreg test in jdk test those ? >>> If yes, then it has passed those boundary cases. >>> >>> I would work on adding either diagnostic flag or just one flag for >>> libm and send out the webrev soon. >>> >>> Regards, >>> Vivek >>> >>> >>> -----Original Message----- >>> From: joe darcy [mailto:joe.darcy at oracle.com] >>> Sent: Monday, November 23, 2015 6:28 PM >>> To: Vladimir Kozlov; Deshpande, Vivek R >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>> math lib >>> >>> Hello, >>> >>> Just getting added to the thread.. >>> >>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>> Thank you, for explanation, Vivek. >>>> >>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >>>> Hotspot tests. 
>>>> >>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>> Hi Vladimir >>>>> >>>>> The result we obtain with LIBM are within +/- 1ulp from StrictMath >>>>> result and not exact result. So I added the flag to switch between >>>>> FDLIBM and LIBM. >>>>> >>>>> Quick explanation: >>>>> This is what we observed with comparison to HPA Library >>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>> LIBM Observed Math result=0.19457293629570213 >>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result should >>>>> be = 0.19457293629570216 >>>>> (4596178249117717084L) (StrictMath result) or 0.1945729362957022 >>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA library >>>>> result is between the above two values and Exact result would be >>>>> pretty close to it. >>>>> So here StrictMath result is less than quad-precision result, Math >>>>> result should be StrictMath or StrictMath + 1ulp and not >>>>> StrictMath >>>>> - 1ulp, according to our test. >>>> >>>> Note, java.lang.Math allows to have 1ulp off (in both direction, I >>>> think) and it should be consistent for Interpreter and code >>>> generated by JIT compilers: >>>> >>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28 >>>> do >>>> u >>>> ble%29 >>>> >>> >>> That interpretation of the spec is not quite right. For the Math >>> methods with a 1/2 ulp error bound, the floating-point result >>> closest to the exact result must be returned. For the methods with a >>> 1 ulp error bound, either of the floating-point result bracketing >>> the true result can be returned, subject to the monotonicity >>> constraints of the specification of the particular method. >>> >>>> >>>>> >>>>> I have done the experiments with XX:+UnlockDiagnosticVMOptions >>>>> -XX:DisableIntrinsic=_dsin and XX:+UnlockDiagnosticVMOptions >>>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter >>>>> would go through LIBM and C1 and c2 through FDLIBM. 
>>>>> If we want to disable LIBM completely, we need the flags >>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>> >>>> I was thinking about using existing >>>> DirectiveSet::is_intrinsic_disabled() and >>>> vmIntrinsics::is_disabled_by_flags(). You need to add additional >>>> versions of functions which accept intrinsic ID instead of >>>> methodHandle. >>>> >>>> If you still want to use flags make them diagnostic. >>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>> >>>>> >>>>> Also the performance gain ~4x is with >>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>> >>>> You confused me here. So you get 4x when only Interpreter use LIBM >>>> code and compilers use FDLIB? >>> >>> Just to be clear, are you comparing the new code to FDLIBM >>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)? >>> >>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>> Port fdlibm to Java), which is providing a significant speed boost >>> to the StrictMath methods that have been ported. >>> >>> I find the current patch *insufficient* as-is in terms of its testing. >>> For example, part of patch says >>> >>> # For sin >>> >>> +// This means that the main path is actually only taken for >>> +// 2^-252 <= |X| < 90112. >>> >>> # For cos >>> >>> +// This means that the main path is actually only taken for >>> +// 2^-252 <= |X| < 90112. >>> >>> If nothing else, there are no tests at around those boundary values, >>> which is unacceptable. There should also be some tests of values of >>> interest to the algorithm in question. >>> >>> Cheers, >>> >>> -Joe >>> >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> Let me know your thoughts on this. I would answer more questions >>>>> and give more data if needed. 
>>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>> Cc: Viswanathan, Sandhya >>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>> What is the reason you decided to add new flags? exp() and log() >>>>>> changes did not have flags. >>>>>> >>>>>> It would be interesting to see what happens if you disable >>>>>> intrinsics using existing flag, for example: >>>>>> >>>>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dexp >>>>> >>>>> Hi Vivek, >>>>> >>>>> I want to point that you can do this experiment later. We can file >>>>> bugs and fixed them after FC. >>>>> >>>>> For now, please, answer my question about flags only. This is the >>>>> only thing holding it from push. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>> Hi all >>>>>>> >>>>>>> I would like to contribute a patch which optimizes Math.sin() >>>>>>> and >>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>>> implementation. >>>>>>> >>>>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>>>> >>>>>>> The option to use the optimizations are -XX:+UseLibmSinIntrinsic >>>>>>> and -XX:+UseLibmCosIntrinsic. >>>>>>> >>>>>>> Could you please review and sponsor this patch. 
>>>>>>> >>>>>>> Bug-id: >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>> webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>> >>>>>>> Thanks and regards, >>>>>>> >>>>>>> Vivek >>>>>>> From vladimir.kozlov at oracle.com Thu Dec 3 21:25:25 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 3 Dec 2015 13:25:25 -0800 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> Message-ID: <5660B345.8010905@oracle.com> Vivek, I think Joe is asking you to write these tests as hotspot regression test in hotspot/test/compiler. Vladimir On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: > Hi Joe > > It would be great if you would please share the additional tests with us. > > Regards, > Vivek > > -----Original Message----- > From: joe darcy [mailto:joe.darcy at oracle.com] > Sent: Thursday, December 03, 2015 1:17 PM > To: Vladimir Kozlov; Deshpande, Vivek R > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib > > I think it is unwise for this large of an implementation change to be pushed with no tests targeting the specifics of the new implementation. 
> > The worst-case tests in the jdk repo are the mathematical worst cases for floating-point approximations, in other words the cases were the exact mathematical answer is closes to half-way between two representation floating-point numbers. Passing such tests is necessary but not sufficient condition for a new implementation. > > Chers, > > -Joe > > On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >> Okay, looks reasonable to me. >> >> Thanks, >> Vladimir >> >> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>> Hi Vladimir >>> >>> This is the link for the updated webrev with latest hotspot source as >>> base for your review. >>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>> Thank you. >>> >>> Regards, >>> Vivek >>> >>> -----Original Message----- >>> From: Deshpande, Vivek R >>> Sent: Wednesday, December 02, 2015 10:33 PM >>> To: 'Vladimir Kozlov'; joe darcy >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math >>> lib >>> >>> Hi Vladimir >>> >>> This is the link for the updated webrev for your review. >>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>> Thank you. >>> >>> Regards, >>> Vivek >>> >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Tuesday, December 01, 2015 6:06 PM >>> To: Deshpande, Vivek R; joe darcy >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >>> lib >>> >>> Please send link to new webrev on cr server. >>> >>> Thanks, >>> Vladimir >>> >>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>> Hi Vladimir >>>> >>>> Please find the webrev with your suggested updates attached with the >>>> mail. >>>> We will update it in the jbs entry soon. >>>> Please let me know if it needs further changes. 
>>>> >>>> Regards, >>>> Vivek >>>> >>>> -----Original Message----- >>>> From: Deshpande, Vivek R >>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>> To: 'joe darcy'; Vladimir Kozlov >>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>> math lib >>>> >>>> HI Vladimir, Joe >>>> >>>> I have done the jtreg tests in hotspot and tests from jdk you have >>>> mentioned. It passed those tests. >>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>>> The performance gain is 3.2x over base jdk, that is over current >>>> fsin/fcos intrinsic. This gain is more realistic. >>>> >>>> Could I get those tests around the boundary values. Would >>>> WorstCaseTests.java jtreg test in jdk test those ? >>>> If yes, then it has passed those boundary cases. >>>> >>>> I would work on adding either diagnostic flag or just one flag for >>>> libm and send out the webrev soon. >>>> >>>> Regards, >>>> Vivek >>>> >>>> >>>> -----Original Message----- >>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>> Sent: Monday, November 23, 2015 6:28 PM >>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>> math lib >>>> >>>> Hello, >>>> >>>> Just getting added to the thread.. >>>> >>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>> Thank you, for explanation, Vivek. >>>>> >>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >>>>> Hotspot tests. >>>>> >>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>> Hi Vladimir >>>>>> >>>>>> The result we obtain with LIBM are within +/- 1ulp from StrictMath >>>>>> result and not exact result. So I added the flag to switch between >>>>>> FDLIBM and LIBM. 
>>>>>> >>>>>> Quick explanation: >>>>>> This is what we observed with comparison to HPA Library >>>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result should >>>>>> be = 0.19457293629570216 >>>>>> (4596178249117717084L) (StrictMath result) or 0.1945729362957022 >>>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA library >>>>>> result is between the above two values and Exact result would be >>>>>> pretty close to it. >>>>>> So here StrictMath result is less than quad-precision result, Math >>>>>> result should be StrictMath or StrictMath + 1ulp and not >>>>>> StrictMath >>>>>> - 1ulp, according to our test. >>>>> >>>>> Note, java.lang.Math allows to have 1ulp off (in both direction, I >>>>> think) and it should be consistent for Interpreter and code >>>>> generated by JIT compilers: >>>>> >>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28 >>>>> do >>>>> u >>>>> ble%29 >>>>> >>>> >>>> That interpretation of the spec is not quite right. For the Math >>>> methods with a 1/2 ulp error bound, the floating-point result >>>> closest to the exact result must be returned. For the methods with a >>>> 1 ulp error bound, either of the floating-point result bracketing >>>> the true result can be returned, subject to the monotonicity >>>> constraints of the specification of the particular method. >>>> >>>>> >>>>>> >>>>>> I have done the experiments with XX:+UnlockDiagnosticVMOptions >>>>>> -XX:DisableIntrinsic=_dsin and XX:+UnlockDiagnosticVMOptions >>>>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter >>>>>> would go through LIBM and C1 and c2 through FDLIBM. >>>>>> If we want to disable LIBM completely, we need the flags >>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. 
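Joe's point is that, for a 1 ulp method, either of the two doubles bracketing the true result is acceptable, so the natural reference for a test is a higher-precision value rather than StrictMath alone (StrictMath may itself sit on either side of the exact result). The following is a minimal sketch under stated assumptions, not the test that was added for this bug. It assumes that a BigDecimal Taylor series with about 60 significant digits is an adequate reference for |x| <= 2. It also assumes the exact sin(x) is never exactly representable for these nonzero inputs. The class and method names (`SinBracketSketch`, `sinReference`, `bracket`, `isFaithful`) are hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class SinBracketSketch {

    // ~60 significant digits per Taylor term: an assumed, untuned precision.
    private static final MathContext MC = new MathContext(60);

    /** High-precision sin(x) from its Taylor series; this sketch only handles |x| <= 2. */
    static BigDecimal sinReference(double x) {
        if (!(Math.abs(x) <= 2.0)) {
            throw new IllegalArgumentException("sketch only handles |x| <= 2");
        }
        BigDecimal bx = new BigDecimal(x);        // exact binary value of x
        BigDecimal x2 = bx.multiply(bx, MC);
        BigDecimal term = bx;
        BigDecimal sum = bx;
        for (int k = 1; k < 100; k++) {
            // term_k = -term_(k-1) * x^2 / ((2k) * (2k + 1))
            long denom = (2L * k) * (2L * k + 1);
            term = term.multiply(x2, MC).negate().divide(BigDecimal.valueOf(denom), MC);
            sum = sum.add(term);                  // exact addition, no rounding
            if (term.abs().compareTo(sum.abs().movePointLeft(50)) < 0) {
                break;                            // remaining terms are negligible
            }
        }
        return sum;
    }

    /** The adjacent doubles lo <= exact < hi that bracket the exact value. */
    static double[] bracket(BigDecimal exact) {
        double lo = exact.doubleValue();
        while (new BigDecimal(lo).compareTo(exact) > 0) lo = Math.nextDown(lo);
        while (new BigDecimal(Math.nextUp(lo)).compareTo(exact) <= 0) lo = Math.nextUp(lo);
        return new double[] {lo, Math.nextUp(lo)};
    }

    /** True if result is one of the two doubles bracketing the exact sin(x). */
    static boolean isFaithful(double x, double result) {
        double[] b = bracket(sinReference(x));
        return result == b[0] || result == b[1];
    }

    public static void main(String[] args) {
        for (double x : new double[] {0x1.0p-252, 0.1, 0.5, 1.0, -1.0}) {
            System.out.printf("x=%s Math ok=%b StrictMath ok=%b%n",
                    x, isFaithful(x, Math.sin(x)), isFaithful(x, StrictMath.sin(x)));
        }
    }
}
```

Large arguments such as the 90112 boundary would need argument reduction or much higher precision, so they are deliberately out of scope for this sketch.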
>>>>> >>>>> I was thinking about using existing >>>>> DirectiveSet::is_intrinsic_disabled() and >>>>> vmIntrinsics::is_disabled_by_flags(). You need to add additional >>>>> versions of functions which accept intrinsic ID instead of >>>>> methodHandle. >>>>> >>>>> If you still want to use flags make them diagnostic. >>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>>> >>>>>> >>>>>> Also the performance gain ~4x is with >>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>> >>>>> You confused me here. So you get 4x when only Interpreter use LIBM >>>>> code and compilers use FDLIB? >>>> >>>> Just to be clear, are you comparing the new code to FDLIBM >>>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)? >>>> >>>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>>> Port fdlibm to Java), which is providing a significant speed boost >>>> to the StrictMath methods that have been ported. >>>> >>>> I find the current patch *insufficient* as-is in terms of its testing. >>>> For example, part of patch says >>>> >>>> # For sin >>>> >>>> +// This means that the main path is actually only taken for >>>> +// 2^-252 <= |X| < 90112. >>>> >>>> # For cos >>>> >>>> +// This means that the main path is actually only taken for >>>> +// 2^-252 <= |X| < 90112. >>>> >>>> If nothing else, there are no tests at around those boundary values, >>>> which is unacceptable. There should also be some tests of values of >>>> interest to the algorithm in question. >>>> >>>> Cheers, >>>> >>>> -Joe >>>> >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>>> >>>>>> Let me know your thoughts on this. I would answer more questions >>>>>> and give more data if needed. 
>>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>>> Cc: Viswanathan, Sandhya >>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>> math lib >>>>>> >>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>> What is the reason you decided to add new flags? exp() and log() >>>>>>> changes did not have flags. >>>>>>> >>>>>>> It would be interesting to see what happens if you disable >>>>>>> intrinsics using existing flag, for example: >>>>>>> >>>>>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dexp >>>>>> >>>>>> Hi Vivek, >>>>>> >>>>>> I want to point that you can do this experiment later. We can file >>>>>> bugs and fixed them after FC. >>>>>> >>>>>> For now, please, answer my question about flags only. This is the >>>>>> only thing holding it from push. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>> Hi all >>>>>>>> >>>>>>>> I would like to contribute a patch which optimizes Math.sin() >>>>>>>> and >>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>>>> implementation. >>>>>>>> >>>>>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>>>>> >>>>>>>> The option to use the optimizations are -XX:+UseLibmSinIntrinsic >>>>>>>> and -XX:+UseLibmCosIntrinsic. >>>>>>>> >>>>>>>> Could you please review and sponsor this patch. 
>>>>>>>> >>>>>>>> Bug-id: >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>> webrev: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>> >>>>>>>> Thanks and regards, >>>>>>>> >>>>>>>> Vivek >>>>>>>> > From joe.darcy at oracle.com Thu Dec 3 21:28:45 2015 From: joe.darcy at oracle.com (joe darcy) Date: Thu, 3 Dec 2015 13:28:45 -0800 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <5660B345.8010905@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> Message-ID: <5660B40D.4050800@oracle.com> Hello, On 12/3/2015 1:25 PM, Vladimir Kozlov wrote: > Vivek, > > I think Joe is asking you to write these tests as hotspot regression > test in hotspot/test/compiler. Exactly; if not generally applicable sin/cos tests that could be hosted in the jdk repo (alongside the regression and unit tests for java.lang.Math), then test of intrinsics in the HotSpot repo alongside other tests targeting intrinsics. Thanks, -Joe > > Vladimir > > On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: >> Hi Joe >> >> It would be great if you would please share the additional tests with >> us. 
>> >> Regards, >> Vivek >> >> -----Original Message----- >> From: joe darcy [mailto:joe.darcy at oracle.com] >> Sent: Thursday, December 03, 2015 1:17 PM >> To: Vladimir Kozlov; Deshpande, Vivek R >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> I think it is unwise for this large of an implementation change to be >> pushed with no tests targeting the specifics of the new implementation. >> >> The worst-case tests in the jdk repo are the mathematical worst cases >> for floating-point approximations, in other words the cases were the >> exact mathematical answer is closes to half-way between two >> representation floating-point numbers. Passing such tests is >> necessary but not sufficient condition for a new implementation. >> >> Chers, >> >> -Joe >> >> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >>> Okay, looks reasonable to me. >>> >>> Thanks, >>> Vladimir >>> >>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>>> Hi Vladimir >>>> >>>> This is the link for the updated webrev with latest hotspot source as >>>> base for your review. >>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>>> Thank you. >>>> >>>> Regards, >>>> Vivek >>>> >>>> -----Original Message----- >>>> From: Deshpande, Vivek R >>>> Sent: Wednesday, December 02, 2015 10:33 PM >>>> To: 'Vladimir Kozlov'; joe darcy >>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math >>>> lib >>>> >>>> Hi Vladimir >>>> >>>> This is the link for the updated webrev for your review. >>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>>> Thank you. 
>>>> >>>> Regards, >>>> Vivek >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, December 01, 2015 6:06 PM >>>> To: Deshpande, Vivek R; joe darcy >>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >>>> lib >>>> >>>> Please send link to new webrev on cr server. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>>> Hi Vladimir >>>>> >>>>> Please find the webrev with your suggested updates attached with the >>>>> mail. >>>>> We will update it in the jbs entry soon. >>>>> Please let me know if it needs further changes. >>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> -----Original Message----- >>>>> From: Deshpande, Vivek R >>>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>>> To: 'joe darcy'; Vladimir Kozlov >>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> HI Vladimir, Joe >>>>> >>>>> I have done the jtreg tests in hotspot and tests from jdk you have >>>>> mentioned. It passed those tests. >>>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>>>> The performance gain is 3.2x over base jdk, that is over current >>>>> fsin/fcos intrinsic. This gain is more realistic. >>>>> >>>>> Could I get those tests around the boundary values. Would >>>>> WorstCaseTests.java jtreg test in jdk test those ? >>>>> If yes, then it has passed those boundary cases. >>>>> >>>>> I would work on adding either diagnostic flag or just one flag for >>>>> libm and send out the webrev soon. 
>>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>> Sent: Monday, November 23, 2015 6:28 PM >>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> Hello, >>>>> >>>>> Just getting added to the thread.. >>>>> >>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>>> Thank you, for explanation, Vivek. >>>>>> >>>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >>>>>> Hotspot tests. >>>>>> >>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>>> Hi Vladimir >>>>>>> >>>>>>> The result we obtain with LIBM are within +/- 1ulp from StrictMath >>>>>>> result and not exact result. So I added the flag to switch between >>>>>>> FDLIBM and LIBM. >>>>>>> >>>>>>> Quick explanation: >>>>>>> This is what we observed with comparison to HPA Library >>>>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result should >>>>>>> be = 0.19457293629570216 >>>>>>> (4596178249117717084L) (StrictMath result) or 0.1945729362957022 >>>>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA library >>>>>>> result is between the above two values and Exact result would be >>>>>>> pretty close to it. >>>>>>> So here StrictMath result is less than quad-precision result, Math >>>>>>> result should be StrictMath or StrictMath + 1ulp and not >>>>>>> StrictMath >>>>>>> - 1ulp, according to our test. 
>>>>>> >>>>>> Note, java.lang.Math allows to have 1ulp off (in both direction, I >>>>>> think) and it should be consistent for Interpreter and code >>>>>> generated by JIT compilers: >>>>>> >>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28 >>>>>> do >>>>>> u >>>>>> ble%29 >>>>>> >>>>> >>>>> That interpretation of the spec is not quite right. For the Math >>>>> methods with a 1/2 ulp error bound, the floating-point result >>>>> closest to the exact result must be returned. For the methods with a >>>>> 1 ulp error bound, either of the floating-point result bracketing >>>>> the true result can be returned, subject to the monotonicity >>>>> constraints of the specification of the particular method. >>>>> >>>>>> >>>>>>> >>>>>>> I have done the experiments with XX:+UnlockDiagnosticVMOptions >>>>>>> -XX:DisableIntrinsic=_dsin and XX:+UnlockDiagnosticVMOptions >>>>>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter >>>>>>> would go through LIBM and C1 and c2 through FDLIBM. >>>>>>> If we want to disable LIBM completely, we need the flags >>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>> >>>>>> I was thinking about using existing >>>>>> DirectiveSet::is_intrinsic_disabled() and >>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add additional >>>>>> versions of functions which accept intrinsic ID instead of >>>>>> methodHandle. >>>>>> >>>>>> If you still want to use flags make them diagnostic. >>>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>>>> >>>>>>> >>>>>>> Also the performance gain ~4x is with >>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>>> >>>>>> You confused me here. So you get 4x when only Interpreter use LIBM >>>>>> code and compilers use FDLIB? >>>>> >>>>> Just to be clear, are you comparing the new code to FDLIBM >>>>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)? 
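Joe's reply also refers to the monotonicity constraints in the Math specification, which requires sin and cos results to be semi-monotonic. A rough sketch of how that constraint could be spot-checked is shown below. It walks a run of adjacent doubles on an interval where the exact function is non-decreasing and reports the first step at which the computed value goes down. The helper name, start points, and step count are illustrative assumptions. For cos, which decreases on (0, pi), the check is applied to `-cos`.

```java
import java.util.function.DoubleUnaryOperator;

final class SemiMonotonicSketch {

    /**
     * Walks `steps` consecutive doubles upward from `start` and returns the first x
     * where f(nextUp(x)) < f(x), or NaN if there is none. Only meaningful on an
     * interval where the exact function is non-decreasing.
     */
    static double firstDecrease(DoubleUnaryOperator f, double start, int steps) {
        double x = start;
        double fx = f.applyAsDouble(x);
        for (int i = 0; i < steps; i++) {
            double next = Math.nextUp(x);
            double fNext = f.applyAsDouble(next);
            if (fNext < fx) {
                return x;
            }
            x = next;
            fx = fNext;
        }
        return Double.NaN;
    }

    public static void main(String[] args) {
        // sin is increasing on (-pi/2, pi/2); -cos is increasing on (0, pi).
        for (double start : new double[] {0.5, 1.0}) {
            System.out.println("sin  from " + start + ": " + firstDecrease(Math::sin, start, 100_000));
            System.out.println("-cos from " + start + ": " + firstDecrease(x -> -Math.cos(x), start, 100_000));
        }
    }
}
```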
>>>>> >>>>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>>>> Port fdlibm to Java), which is providing a significant speed boost >>>>> to the StrictMath methods that have been ported. >>>>> >>>>> I find the current patch *insufficient* as-is in terms of its >>>>> testing. >>>>> For example, part of patch says >>>>> >>>>> # For sin >>>>> >>>>> +// This means that the main path is actually only taken for >>>>> +// 2^-252 <= |X| < 90112. >>>>> >>>>> # For cos >>>>> >>>>> +// This means that the main path is actually only taken for >>>>> +// 2^-252 <= |X| < 90112. >>>>> >>>>> If nothing else, there are no tests at around those boundary values, >>>>> which is unacceptable. There should also be some tests of values of >>>>> interest to the algorithm in question. >>>>> >>>>> Cheers, >>>>> >>>>> -Joe >>>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>>> >>>>>>> Let me know your thoughts on this. I would answer more questions >>>>>>> and give more data if needed. >>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>>>> Cc: Viswanathan, Sandhya >>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>> math lib >>>>>>> >>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>>> What is the reason you decided to add new flags? exp() and log() >>>>>>>> changes did not have flags. >>>>>>>> >>>>>>>> It would be interesting to see what happens if you disable >>>>>>>> intrinsics using existing flag, for example: >>>>>>>> >>>>>>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dexp >>>>>>> >>>>>>> Hi Vivek, >>>>>>> >>>>>>> I want to point that you can do this experiment later. We can file >>>>>>> bugs and fixed them after FC. 
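The patch comment quoted above gives 2^-252 and 90112 as the edges of the main path, so a minimal boundary spot check would probe each edge, its adjacent doubles, and their negations. This sketch assumes StrictMath (FDLIBM) is acceptable as a comparison point. If both results are within 1 ulp of an exact value that is not representable, both must be one of the two bracketing doubles, so Math and StrictMath can differ by at most one adjacent double. The class name and probe set are hypothetical. This is not the regression test that was eventually written for the bug.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

final class SinCosBoundarySketch {

    // Boundaries quoted from the patch comment: 2^-252 <= |X| < 90112.
    static final double LOW = 0x1.0p-252;
    static final double HIGH = 90112.0;

    /** Each boundary, its immediate neighbours, and their negations (12 points). */
    static double[] probePoints() {
        List<Double> pts = new ArrayList<>();
        for (double b : new double[] {LOW, HIGH}) {
            for (double x : new double[] {Math.nextDown(b), b, Math.nextUp(b)}) {
                pts.add(x);
                pts.add(-x);
            }
        }
        double[] out = new double[pts.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = pts.get(i);
        }
        return out;
    }

    /** True if fast equals ref or is one of ref's two adjacent doubles. */
    static boolean withinOneStep(double fast, double ref) {
        return fast == ref || fast == Math.nextUp(ref) || fast == Math.nextDown(ref);
    }

    /** Probe points where fast and ref differ by more than one adjacent double. */
    static List<Double> failures(DoubleUnaryOperator fast, DoubleUnaryOperator ref) {
        List<Double> bad = new ArrayList<>();
        for (double x : probePoints()) {
            if (!withinOneStep(fast.applyAsDouble(x), ref.applyAsDouble(x))) {
                bad.add(x);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("sin failures: " + failures(Math::sin, StrictMath::sin));
        System.out.println("cos failures: " + failures(Math::cos, StrictMath::cos));
    }
}
```

Agreement with StrictMath is only a necessary check. Accuracy against the exact result still needs a higher-precision reference.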
>>>>>>> >>>>>>> For now, please, answer my question about flags only. This is the >>>>>>> only thing holding it from push. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>>> Hi all >>>>>>>>> >>>>>>>>> I would like to contribute a patch which optimizes Math.sin() >>>>>>>>> and >>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>>>>> implementation. >>>>>>>>> >>>>>>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>>>>>> >>>>>>>>> The option to use the optimizations are -XX:+UseLibmSinIntrinsic >>>>>>>>> and -XX:+UseLibmCosIntrinsic. >>>>>>>>> >>>>>>>>> Could you please review and sponsor this patch. >>>>>>>>> >>>>>>>>> Bug-id: >>>>>>>>> >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>>> webrev: >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>>> >>>>>>>>> Thanks and regards, >>>>>>>>> >>>>>>>>> Vivek >>>>>>>>> >> From mandy.chung at oracle.com Thu Dec 3 21:33:51 2015 From: mandy.chung at oracle.com (Mandy Chung) Date: Thu, 3 Dec 2015 13:33:51 -0800 Subject: Reference.reachabilityFence In-Reply-To: References: <2D27BCFB-77ED-4C83-985E-102DC4B41C97@oracle.com> Message-ID: <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com> > On Nov 26, 2015, at 8:22 AM, Paul Sandoz wrote: > > Hi, > > I have updated the patches: > > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/ > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-hotspot/webrev/ > > There is now more documentation on Reference (copied and suitable rearranged from 166 Fences.java). The method name remains the same. > I think the addition to the Reference class specification should belong to the reachabilityFence method specification. Any reason why not? 
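For context on the method under review, here is a sketch of the try/finally usage pattern that the Reference documentation describes. It assumes JDK 9 or later, where the method eventually shipped as `java.lang.ref.Reference.reachabilityFence(Object)`. In this thread the API was still a proposal. In the released JDK 9, the parameter documentation says that passing null has no effect. The resource table and class name are hypothetical stand-ins for native state that a cleanup action would release.

```java
import java.lang.ref.Reference;

final class FenceSketch {
    // Hypothetical external resource table, indexed by handle.
    private static final long[] EXTERNAL = new long[16];

    private final int handle;

    FenceSketch(int handle) {
        this.handle = handle;
    }

    /**
     * Reads `handle`, then uses the external slot. Without the fence, `this` could
     * become unreachable right after the field read, so a cleanup action registered
     * for it (for example via a Cleaner, not shown) might run while the slot is in use.
     */
    long action() {
        try {
            int h = handle;
            EXTERNAL[h] += 1;
            return EXTERNAL[h];
        } finally {
            Reference.reachabilityFence(this); // keeps `this` strongly reachable until here
        }
    }

    public static void main(String[] args) {
        FenceSketch r = new FenceSketch(3);
        System.out.println(r.action()); // 1
        System.out.println(r.action()); // 2
    }
}
```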
I suggest to change this (occurs in the class spec and the method spec): strongly reachable (as defined in the {@link java.lang.ref} package documentation), to strongly reachable Should the reachabilityFence method throw NPE if ref is null? Mandy From vivek.r.deshpande at intel.com Thu Dec 3 22:01:39 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 3 Dec 2015 22:01:39 +0000 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <5660B40D.4050800@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> <5660B40D.4050800@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> Hi Sure I will add the tests. Shall I use StrictMath result as a reference for exact result. Let me know your thoughts. Regards, Vivek -----Original Message----- From: joe darcy [mailto:joe.darcy at oracle.com] Sent: Thursday, December 03, 2015 1:29 PM To: Vladimir Kozlov; Deshpande, Vivek R Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib Hello, On 12/3/2015 1:25 PM, Vladimir Kozlov wrote: > Vivek, > > I think Joe is asking you to write these tests as hotspot regression > test in hotspot/test/compiler. 
Exactly; if not generally applicable sin/cos tests that could be hosted in the jdk repo (alongside the regression and unit tests for java.lang.Math), then tests of intrinsics in the HotSpot repo alongside other tests targeting intrinsics. Thanks, -Joe > > Vladimir > > On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: >> Hi Joe >> >> It would be great if you would please share the additional tests with >> us. >> >> Regards, >> Vivek >> >> -----Original Message----- >> From: joe darcy [mailto:joe.darcy at oracle.com] >> Sent: Thursday, December 03, 2015 1:17 PM >> To: Vladimir Kozlov; Deshpande, Vivek R >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> I think it is unwise for this large of an implementation change to be >> pushed with no tests targeting the specifics of the new implementation. >> >> The worst-case tests in the jdk repo are the mathematical worst cases >> for floating-point approximations, in other words the cases where the >> exact mathematical answer is closest to half-way between two >> representable floating-point numbers. Passing such tests is a >> necessary but not sufficient condition for a new implementation. >> >> Cheers, >> >> -Joe >> >> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >>> Okay, looks reasonable to me. >>> >>> Thanks, >>> Vladimir >>> >>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>>> Hi Vladimir >>>> >>>> This is the link for the updated webrev with latest hotspot source >>>> as base for your review. >>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>>> Thank you.
>>>> >>>> Regards, >>>> Vivek >>>> >>>> -----Original Message----- >>>> From: Deshpande, Vivek R >>>> Sent: Wednesday, December 02, 2015 10:33 PM >>>> To: 'Vladimir Kozlov'; joe darcy >>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>> math lib >>>> >>>> Hi Vladimir >>>> >>>> This is the link for the updated webrev for your review. >>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>>> Thank you. >>>> >>>> Regards, >>>> Vivek >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, December 01, 2015 6:06 PM >>>> To: Deshpande, Vivek R; joe darcy >>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>> math lib >>>> >>>> Please send link to new webrev on cr server. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>>> Hi Vladimir >>>>> >>>>> Please find the webrev with your suggested updates attached with >>>>> the mail. >>>>> We will update it in the jbs entry soon. >>>>> Please let me know if it needs further changes. >>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> -----Original Message----- >>>>> From: Deshpande, Vivek R >>>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>>> To: 'joe darcy'; Vladimir Kozlov >>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> HI Vladimir, Joe >>>>> >>>>> I have done the jtreg tests in hotspot and tests from jdk you have >>>>> mentioned. It passed those tests. >>>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>>>> The performance gain is 3.2x over base jdk, that is over current >>>>> fsin/fcos intrinsic. This gain is more realistic. 
>>>>> >>>>> Could I get those tests around the boundary values. Would >>>>> WorstCaseTests.java jtreg test in jdk test those ? >>>>> If yes, then it has passed those boundary cases. >>>>> >>>>> I would work on adding either diagnostic flag or just one flag for >>>>> libm and send out the webrev soon. >>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>> Sent: Monday, November 23, 2015 6:28 PM >>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> Hello, >>>>> >>>>> Just getting added to the thread.. >>>>> >>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>>> Thank you, for explanation, Vivek. >>>>>> >>>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >>>>>> Hotspot tests. >>>>>> >>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>>> Hi Vladimir >>>>>>> >>>>>>> The result we obtain with LIBM are within +/- 1ulp from >>>>>>> StrictMath result and not exact result. So I added the flag to >>>>>>> switch between FDLIBM and LIBM. >>>>>>> >>>>>>> Quick explanation: >>>>>>> This is what we observed with comparison to HPA Library >>>>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result >>>>>>> should be = 0.19457293629570216 >>>>>>> (4596178249117717084L) (StrictMath result) or 0.1945729362957022 >>>>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA >>>>>>> library result is between the above two values and Exact result >>>>>>> would be pretty close to it. >>>>>>> So here StrictMath result is less than quad-precision result, >>>>>>> Math result should be StrictMath or StrictMath + 1ulp and not >>>>>>> StrictMath >>>>>>> - 1ulp, according to our test. 
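The three bit patterns Vivek quotes (4596178249117717083L, 4596178249117717084L and 4596178249117717085L) are consecutive, so the values are adjacent doubles. The sketch below makes such ulp distances explicit from the raw IEEE 754 bits. It assumes both inputs are finite and share a sign, and the class and method names are illustrative only.

```java
// Sketch: distance in ulps between two finite doubles of the same sign,
// computed from their raw IEEE 754 bit patterns. Adjacent doubles of the
// same sign have consecutive bit patterns.
final class UlpDistance {

    static long ulpDistance(double a, double b) {
        if (!Double.isFinite(a) || !Double.isFinite(b)) {
            throw new IllegalArgumentException("finite values only");
        }
        long ba = Double.doubleToRawLongBits(a);
        long bb = Double.doubleToRawLongBits(b);
        if ((ba ^ bb) < 0) { // sign bits differ (this also rejects +0.0 vs -0.0)
            throw new IllegalArgumentException("values must have the same sign");
        }
        return Math.abs(ba - bb);
    }

    public static void main(String[] args) {
        double libm = Double.longBitsToDouble(4596178249117717083L);   // observed LIBM result
        double strict = Double.longBitsToDouble(4596178249117717084L); // StrictMath result
        System.out.println(libm + " vs " + strict + ": " + ulpDistance(libm, strict) + " ulp");
    }
}
```

With this helper, the disagreement described above reduces to a one-ulp distance in the downward direction. Whether that distance is acceptable depends on which side of the exact result the value falls, as Joe explains in his reply.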
>>>>>> >>>>>> Note, java.lang.Math allows to have 1ulp off (in both direction, >>>>>> I >>>>>> think) and it should be consistent for Interpreter and code >>>>>> generated by JIT compilers: >>>>>> >>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28double%29 >>>>>> >>>>> >>>>> That interpretation of the spec is not quite right. For the Math >>>>> methods with a 1/2 ulp error bound, the floating-point result >>>>> closest to the exact result must be returned. For the methods with >>>>> a >>>>> 1 ulp error bound, either of the floating-point results bracketing >>>>> the true result can be returned, subject to the monotonicity >>>>> constraints of the specification of the particular method. >>>>> >>>>>> >>>>>>> >>>>>>> I have done the experiments with XX:+UnlockDiagnosticVMOptions >>>>>>> -XX:DisableIntrinsic=_dsin and XX:+UnlockDiagnosticVMOptions >>>>>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter >>>>>>> would go through LIBM and C1 and c2 through FDLIBM. >>>>>>> If we want to disable LIBM completely, we need the flags >>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>> >>>>>> I was thinking about using existing >>>>>> DirectiveSet::is_intrinsic_disabled() and >>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add additional >>>>>> versions of functions which accept intrinsic ID instead of >>>>>> methodHandle. >>>>>> >>>>>> If you still want to use flags make them diagnostic. >>>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>>>> >>>>>>> >>>>>>> Also the performance gain ~4x is with >>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>>> >>>>>> You confused me here. So you get 4x when only Interpreter use >>>>>> LIBM code and compilers use FDLIB? >>>>> >>>>> Just to be clear, are you comparing the new code to FDLIBM >>>>> (StrictMath) or to the existing fsin/fcos intrinsics (Math)?
>>>>> >>>>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>>>> Port fdlibm to Java), which is providing a significant speed boost >>>>> to the StrictMath methods that have been ported. >>>>> >>>>> I find the current patch *insufficient* as-is in terms of its >>>>> testing. >>>>> For example, part of patch says >>>>> >>>>> # For sin >>>>> >>>>> +// This means that the main path is actually only taken for >>>>> +// 2^-252 <= |X| < 90112. >>>>> >>>>> # For cos >>>>> >>>>> +// This means that the main path is actually only taken for >>>>> +// 2^-252 <= |X| < 90112. >>>>> >>>>> If nothing else, there are no tests at around those boundary >>>>> values, which is unacceptable. There should also be some tests of >>>>> values of interest to the algorithm in question. >>>>> >>>>> Cheers, >>>>> >>>>> -Joe >>>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>>> >>>>>>> Let me know your thoughts on this. I would answer more questions >>>>>>> and give more data if needed. >>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>>>> Cc: Viswanathan, Sandhya >>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>> math lib >>>>>>> >>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>>> What is the reason you decided to add new flags? exp() and >>>>>>>> log() changes did not have flags. >>>>>>>> >>>>>>>> It would be interesting to see what happens if you disable >>>>>>>> intrinsics using existing flag, for example: >>>>>>>> >>>>>>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dexp >>>>>>> >>>>>>> Hi Vivek, >>>>>>> >>>>>>> I want to point that you can do this experiment later. We can >>>>>>> file bugs and fixed them after FC. 
>>>>>>> >>>>>>> For now, please, answer my question about flags only. This is >>>>>>> the only thing holding it from push. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>>> Hi all >>>>>>>>> >>>>>>>>> I would like to contribute a patch which optimizes Math.sin() >>>>>>>>> and >>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>>>>> implementation. >>>>>>>>> >>>>>>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>>>>>> >>>>>>>>> The option to use the optimizations are >>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>> >>>>>>>>> Could you please review and sponsor this patch. >>>>>>>>> >>>>>>>>> Bug-id: >>>>>>>>> >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>>> webrev: >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>>> >>>>>>>>> Thanks and regards, >>>>>>>>> >>>>>>>>> Vivek >>>>>>>>> >> From mikael.vidstedt at oracle.com Thu Dec 3 23:24:25 2015 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Thu, 3 Dec 2015 15:24:25 -0800 Subject: RFR (XS): 8144657: Invalid format specifiers in jvmci trace messages Message-ID: <5660CF29.3060703@oracle.com> Please review this simple change which updates the format specifiers used in a couple of JVMCI_trace_3 calls: Bug: https://bugs.openjdk.java.net/browse/JDK-8144657 Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8144657/webrev.00/webrev/ narrowOop is a juint, so %p doesn't match, and for the rest of the cases PTR_FORMAT+p2i is more in line with the "standard" pattern for printing pointers. 
Cheers, Mikael From edward.nevill at gmail.com Fri Dec 4 09:59:46 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Fri, 04 Dec 2015 09:59:46 +0000 Subject: RFR: 8144498: aarch64: large code cache generates SEGV Message-ID: <1449223186.15424.42.camel@mint> Hi, Please review the following webrev http://cr.openjdk.java.net/~enevill/8144498/webrev.0/ JIRA issue: https://bugs.openjdk.java.net/browse/JDK-8144498 This fixes an issue where random SEGVs were generated with -XX:ReservedCodeCacheSize > 128m. The problem was that pd_call_destination was using addr() rather than orig_addr, i.e. it was using the address in the copied but not relocated code. It was then following a call destination to determine whether or not this was a call to a trampoline (in order that it could substitute the final trampoline address). Usually this worked OK because it ended up just referencing a random address in the code buffer. However, very occasionally it would point to a trampoline somewhere in the code buffer and get a false positive. In this case it would substitute the final address of that trampoline. The result was that it would very occasionally relocate the address of some call to a random trampoline stub. I have tested this with jtreg hotspot and langtools with -XX:ReservedCodeCacheSize=256m and without specifying any ReservedCodeCacheSize (so it defaults to 128m).
With ReservedCodeCacheSize == default:

Hotspot (original): Test results: passed: 935; failed: 22; error: 12
Hotspot (patched): Test results: passed: 942; failed: 15; error: 12
Langtools (original): Test results: passed: 3,313; failed: 33
Langtools (patched): Test results: passed: 3,316; failed: 33

With -XX:ReservedCodeCacheSize=256m:

Hotspot (original): Test results: passed: 865; failed: 19; error: 85
Hotspot (patched): Test results: passed: 946; failed: 10; error: 13
Langtools (original): Test results: passed: 3,049; failed: 77; error: 223
Langtools (patched): Test results: passed: 3,314; failed: 33

So in all cases it generates results as good as, or better than, the original. In the case of langtools with a 256m buffer it goes from 300 failures+errors to just 33. I have also tested this with EEMBC GrinderBench, which also showed the problem every few hundred runs. I have run this over 5000 times with no occurrence of the problem. Thanks for your review, Ed. From adinn at redhat.com Fri Dec 4 10:11:27 2015 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 4 Dec 2015 10:11:27 +0000 Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV In-Reply-To: <1449223186.15424.42.camel@mint> References: <1449223186.15424.42.camel@mint> Message-ID: <566166CF.5000006@redhat.com> On 04/12/15 09:59, Edward Nevill wrote: > Hi, > > Please review the following webrev . . . Reviewed by me as an AArch64-only patch. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No.
3798903 Directors: Michael Cunningham (US), Michael O'Neill (Ireland), Paul Argiry (US) From roland.westrelin at oracle.com Fri Dec 4 13:08:55 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 4 Dec 2015 14:08:55 +0100 Subject: RFR(S): 8139771: Eliminating CastPP nodes at Phis when they all come from a unique input may cause crash Message-ID: http://cr.openjdk.java.net/~roland/8139771/webrev.00/ In PhiNode::Ideal(), when all uncasted inputs are identical, the Phi is removed, which can cause a control dependency to be lost. See the test case (which includes a step-by-step description of how this can lead to a bad graph) for an example. The test case crashes on sparc with -XX:+StressGCM. To fix that, I propose that rather than simply replacing the Phi by its uncasted input, we replace it by a CastPP that is specially marked as carrying a dependency (similar to what we have for CastII). That fixes this issue and should protect us from other similar issues:
- I moved the _carry_dependency from CastII to ConstraintCast and CheckCastPP so it applies to CastPP, CastII and CheckCastPP
- I added code to remove the casts that carry a dependency if there's a dominating cast with identical inputs and a more restrictive type. I made is_dominator() a virtual method of PhaseTransform so we can have an implementation in GVN and use the same code during GVN and loop opts
- We can now have a chain of CastPP with a control, so the code in final_graph_reshaping should follow through a CastPP only if its control is null
- I changed the code in PhaseCFG::schedule_pinned_nodes() to handle controls that are in the same block
I performed extensive perf testing on x86 and sparc and found no statistically significant regressions. Roland.
From paul.sandoz at oracle.com Fri Dec 4 13:47:55 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 4 Dec 2015 14:47:55 +0100 Subject: Reference.reachabilityFence In-Reply-To: <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com> References: <2D27BCFB-77ED-4C83-985E-102DC4B41C97@oracle.com> <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com> Message-ID: <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> > On 3 Dec 2015, at 22:33, Mandy Chung wrote: > > >> On Nov 26, 2015, at 8:22 AM, Paul Sandoz wrote: >> >> Hi, >> >> I have updated the patches: >> >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/ >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-hotspot/webrev/ >> >> There is now more documentation on Reference (copied and suitable rearranged from 166 Fences.java). The method name remains the same. >> > > I think the addition to the Reference class specification should belong to the reachabilityFence method specification. Any reason why not? I thought it would be more visible in the JavaDoc, as it's there upfront. The API note may get larger if we include some additional real-world examples. I don't have a strong opinion on this; if yours is stronger I will move it :-) > > I suggest to change this (occurs in the class spec and the method spec): > strongly reachable (as defined in the {@link java.lang.ref} package documentation), > > to > strongly reachable > Good point. I also linked to the referred section in the JLS. > Should the reachabilityFence method throw NPE if ref is null? > I am OK with it doing nothing; it's a performance-sensitive method. It means no null checks/de-opts are required and (hand-waving here...) might make it more amenable to optimization (see https://bugs.openjdk.java.net/browse/JDK-8130398).
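For readers new to the method under review, here is a minimal sketch of the usage pattern it targets. The pattern keeps `this` strongly reachable until a method has finished using state that a cleanup action could otherwise release. The sketch assumes JDK 9 or later, where `java.lang.ref.Reference.reachabilityFence(Object)` and `java.lang.ref.Cleaner` are available. The `TableHandle` class and its external table are hypothetical. In line with Paul's answer, passing `null` is simply a no-op.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;

// Hypothetical example class; the "external table" stands in for native
// state that a cleanup action releases once the handle becomes unreachable.
final class TableHandle {
    private static final Cleaner CLEANER = Cleaner.create();
    static final int[] EXTERNAL_TABLE = new int[16];

    private final int index;

    TableHandle(int index) {
        this.index = index;
        EXTERNAL_TABLE[index] = 1;
        // The cleanup action must not capture 'this', or the handle would
        // never become unreachable.
        final int slot = index;
        CLEANER.register(this, () -> EXTERNAL_TABLE[slot] = 0);
    }

    int read() {
        try {
            // Without the fence, 'this' may become unreachable as soon as
            // 'index' has been loaded, letting the cleaner clear the slot
            // before the read below completes.
            return EXTERNAL_TABLE[index];
        } finally {
            Reference.reachabilityFence(this);
        }
    }
}
```

Putting the fence in a `finally` block keeps `this` reachable until the method exits, even when the body throws.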
Updated: http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/ reachabilityFence is now annotated with @DontInline (to be pushed real soon now) and the HotSpot changes are no longer needed. Paul. From martin.doerr at sap.com Fri Dec 4 14:02:36 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 4 Dec 2015 14:02:36 +0000 Subject: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion In-Reply-To: <5660B13E.7080509@oracle.com> References: <7C9B87B351A4BA4AA9EC95BB41811656722865FE@DEWDFEMB19C.global.corp.sap> <5660B13E.7080509@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB41811656722867C6@DEWDFEMB19C.global.corp.sap> Hi Vladimir, thank you very much for your review. I have changed the Node_Stack initialization as you suggested. In addition, I changed the iterator to really support removal of nodes:

uint idx = MIN2(_stack.index(), self->outcnt()); // Support removal of nodes.

I believe it's currently not needed in openjdk, but it may be needed in the future. Hope this is ok. The new webrev is here: http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.01/ Can anybody volunteer to sponsor, please? Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Donnerstag, 3.
Dezember 2015 22:17 To: Doerr, Martin ; Roland Westrelin (roland.westrelin at oracle.com) ; hotspot-compiler-dev at openjdk.java.net Cc: Lindenmaier, Goetz Subject: Re: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion You reversed 8011858 changes which made stack much smaller - live_nodes was usually 1/10 of unique nodes: - stack.map((C->live_nodes() >> 1) + 16, NULL); + Node_Stack stack(arena, (C->unique() >> 2) + 16); // pre-grow Please, use live_nodes with your >> 2 change: Node_Stack stack(arena, (C->live_nodes() >> 2) + 16); // pre-grow Iterator changes seems fine to me. Thanks, Vladimir On 12/3/15 9:17 AM, Doerr, Martin wrote: > Hi, > > I have implemented a change which makes Node_Backward_Iterator more efficient for large graphs. The purpose is to fix > the performance problem we observe in Octane benchmarks. > > It lowers compile time dramatically in case JvmtiExport::_can_access_local_variables is on. > > The webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.00/ > > Please review. > > The previous version uses an initial node stack size of (C->unique() >> 1) + 16 which can become pretty large. > > My webrev changes it to (C->unique() >> 2) + 16 which is still large. I didn't observe resizing because it was too small. > > I guess the stack depth typically stays far below this value, but it may be ok to spend e.g. 0.5 MB in extreme cases. > > How was that previous value determined? Should I implement it differently? 
> > Best regards, > > Martin > From martin.doerr at sap.com Fri Dec 4 16:01:42 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 4 Dec 2015 16:01:42 +0000 Subject: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion References: <7C9B87B351A4BA4AA9EC95BB41811656722865FE@DEWDFEMB19C.global.corp.sap> <5660B13E.7080509@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228683A@DEWDFEMB19C.global.corp.sap> Please use this one: http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.02/ It prevents a build warning on windows. I have added an explicit type cast. Best regards, Martin -----Original Message----- From: Doerr, Martin Sent: Freitag, 4. Dezember 2015 15:03 To: 'Vladimir Kozlov' ; Roland Westrelin (roland.westrelin at oracle.com) ; hotspot-compiler-dev at openjdk.java.net Cc: Lindenmaier, Goetz Subject: RE: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion Hi Vladimir, thank you very much for your review. I have changed the Node_Stack initialization as you suggested. In addition, I changed the iterator to really support removal of nodes: uint idx = MIN2(_stack.index(), self->outcnt()); // Support removal of nodes. I believe it's currently not needed in openjdk, but it may be needed in the future. Hope this is ok. The new webrev Is here: http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.01/ Can anybody volunteer to sponsor, please? Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Donnerstag, 3. 
Dezember 2015 22:17 To: Doerr, Martin ; Roland Westrelin (roland.westrelin at oracle.com) ; hotspot-compiler-dev at openjdk.java.net Cc: Lindenmaier, Goetz Subject: Re: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion You reversed 8011858 changes which made stack much smaller - live_nodes was usually 1/10 of unique nodes: - stack.map((C->live_nodes() >> 1) + 16, NULL); + Node_Stack stack(arena, (C->unique() >> 2) + 16); // pre-grow Please, use live_nodes with your >> 2 change: Node_Stack stack(arena, (C->live_nodes() >> 2) + 16); // pre-grow Iterator changes seems fine to me. Thanks, Vladimir On 12/3/15 9:17 AM, Doerr, Martin wrote: > Hi, > > I have implemented a change which makes Node_Backward_Iterator more efficient for large graphs. The purpose is to fix > the performance problem we observe in Octane benchmarks. > > It lowers compile time dramatically in case JvmtiExport::_can_access_local_variables is on. > > The webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.00/ > > Please review. > > The previous version uses an initial node stack size of (C->unique() >> 1) + 16 which can become pretty large. > > My webrev changes it to (C->unique() >> 2) + 16 which is still large. I didn't observe resizing because it was too small. > > I guess the stack depth typically stays far below this value, but it may be ok to spend e.g. 0.5 MB in extreme cases. > > How was that previous value determined? Should I implement it differently? > > Best regards, > > Martin > From aph at redhat.com Fri Dec 4 16:14:03 2015 From: aph at redhat.com (Andrew Haley) Date: Fri, 4 Dec 2015 16:14:03 +0000 Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV In-Reply-To: <1449223186.15424.42.camel@mint> References: <1449223186.15424.42.camel@mint> Message-ID: <5661BBCB.5000307@redhat.com> Your fix looks OK. However, there is one other fix which would be nice. 
We use call relocs for things other than bl instructions. This is because some things (e.g. MachUEPNode::emit) do this:

  __ far_jump(RuntimeAddress(SharedRuntime::get_ic_miss_stub()));

Only bl immediate instructions are ever used to jump to trampolines. This is essential because they must be patchable. Because of this, in here:

  if (is_call()) {
    address trampoline = nativeCall_at(orig_addr)->get_trampoline();
    if (trampoline) {
      return nativeCallTrampolineStub_at(trampoline)->destination();
    }
  }

the is_call() could be replaced by NativeCall::is_call_at(). Otherwise we're pointlessly decoding instructions and chasing nonexistent trampolines. Could you try that? Thanks, Andrew. From roland.westrelin at oracle.com Fri Dec 4 16:40:00 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 4 Dec 2015 17:40:00 +0100 Subject: RFR(S): 8134883: C1 hard crash in range check elimination in Nashorn test262parallel Message-ID: <02C0AC97-22B8-415F-93D2-6F5D73036493@oracle.com> http://cr.openjdk.java.net/~roland/8134883/webrev.00/ The problem happens when c1's loop optimizations encounter a non-natural loop for which one entry is an exception handler. C1's loop optimizations usually ignore non-natural loops, but in that case it misses that there's a non-natural loop and half-builds the loop data structures (it marks the loop header but doesn't keep track of the blocks that belong to the loop). AFAICT, C1 also bails out when it encounters a natural loop with an exception handler as header. The fix I propose simply doesn't mark the loop header and loop end for any loop with an exception handler header, so the loop is ignored by loop optimizations. Roland.
From aph at redhat.com Fri Dec 4 17:38:19 2015 From: aph at redhat.com (Andrew Haley) Date: Fri, 4 Dec 2015 17:38:19 +0000 Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV In-Reply-To: <5661BBCB.5000307@redhat.com> References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com> Message-ID: <5661CF8B.6040405@redhat.com> On 12/04/2015 04:14 PM, Andrew Haley wrote: > Your fix looks OK. Scratch that, I'm seeing NetBeans failures with your patch. I think it's because you're missing a trampoline destination when the initial relocation is being done. This is because get_trampoline() looks for a trampoline_stub reloc based on orig_addr, and this can never work. (When a trampoline call is first created it is a call to self; the reloc is the only way to find the trampoline. For this reason, you must use nativeCall_at(addr())->get_trampoline().) I'm going to suggest this as a simpler fix:

address Relocation::pd_call_destination(address orig_addr) {
  assert(is_call(), "should be a call here");
  if (NativeCall::is_call_at(addr())) { // is a BL instruction
    address trampoline = nativeCall_at(addr())->get_trampoline();
    if (trampoline) {
      return nativeCallTrampolineStub_at(trampoline)->destination();
    }
  }
  if (orig_addr != NULL) {
    return MacroAssembler::pd_call_destination(orig_addr);
  }
  return MacroAssembler::pd_call_destination(addr());
}

I think it's right because this way we only follow real BL instructions, and if these point to trampolines they must be within the blob which is being relocated. I think this will fix your problem because such BL instructions cannot point to anywhere wild. Thanks, Andrew.
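To show what "only follow real BL instructions" means at the bit level, here is a small Java sketch. It is illustrative only, not HotSpot code, and the class name is hypothetical. It recognises an AArch64 `BL imm26` word and computes its target. The encoding facts are from the A64 instruction set: bit 31 = 1, bits 30..26 = 00101, and a signed 26-bit word offset. Because of that encoding, a direct BL reaches roughly ±128 MB, which is consistent with trampolines mattering once the code cache exceeds 128m. A freshly emitted call to self has imm26 = 0.

```java
// Hypothetical helper, not HotSpot code: recognises an AArch64 "BL imm26"
// word and computes its target. BL encodes bit 31 = 1, bits 30..26 = 00101,
// and a signed 26-bit word offset in bits 25..0.
final class A64Bl {
    static final int OPCODE_MASK = 0xFC000000;
    static final int BL_OPCODE = 0x94000000;

    // Reach of a direct BL: -2^27 .. 2^27 - 4 bytes, i.e. about +/-128 MB.
    static final long MIN_OFFSET = -(1L << 27);
    static final long MAX_OFFSET = (1L << 27) - 4;

    static boolean isBl(int insn) {
        return (insn & OPCODE_MASK) == BL_OPCODE;
    }

    // Target of the BL located at address pc.
    static long target(long pc, int insn) {
        if (!isBl(insn)) {
            throw new IllegalArgumentException("not a BL instruction");
        }
        int imm26 = insn & 0x03FFFFFF;
        int words = (imm26 << 6) >> 6; // sign-extend the 26-bit field
        return pc + (long) words * 4;
    }

    // Encodes a BL from pc to target, rejecting unreachable or misaligned targets.
    static int encode(long pc, long target) {
        long offset = target - pc;
        if ((offset & 3) != 0 || offset < MIN_OFFSET || offset > MAX_OFFSET) {
            throw new IllegalArgumentException("target not reachable by a direct BL");
        }
        return BL_OPCODE | (int) ((offset >> 2) & 0x03FFFFFF);
    }
}
```

A check like `isBl` is cheap and rejects words such as `RET` (0xD65F03C0), so branch decoding is never attempted on instructions that cannot lead to a trampoline.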
From edward.nevill at gmail.com Fri Dec 4 17:43:37 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Fri, 04 Dec 2015 17:43:37 +0000 Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV In-Reply-To: <5661BBCB.5000307@redhat.com> References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com> Message-ID: <1449251017.4670.3.camel@mint> On Fri, 2015-12-04 at 16:14 +0000, Andrew Haley wrote:
> Your fix looks OK.
>
> However, there is one other fix which would be nice.
>   if (is_call()) {
>     address trampoline = nativeCall_at(orig_addr)->get_trampoline();
>     if (trampoline) {
>       return nativeCallTrampolineStub_at(trampoline)->destination();
>     }
>   }
>
> the is_call() could be replaced by NativeCall::is_call_at().
> Otherwise we're pointlessly decoding instructions and chasing
> nonexistent trampolines. Could you try that?

Done. New webrev at http://cr.openjdk.java.net/~enevill/8144498/webrev.1

jtreg results with ReservedCodeCacheSize=256m

Hotspot (original): Test results: passed: 865; failed: 19; error: 85
Hotspot (patched): Test results: passed: 947; failed: 10; error: 12
Langtools (original): Test results: passed: 3,049; failed: 77; error: 223
Langtools (patched): Test results: passed: 3,316; failed: 33

Many thanks, Ed. From igor.veresov at oracle.com Fri Dec 4 18:23:01 2015 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 4 Dec 2015 10:23:01 -0800 Subject: RFR(S): 8134883: C1 hard crash in range check elimination in Nashorn test262parallel In-Reply-To: <02C0AC97-22B8-415F-93D2-6F5D73036493@oracle.com> References: <02C0AC97-22B8-415F-93D2-6F5D73036493@oracle.com> Message-ID: <6A256077-5431-4842-8EA5-E1AA6582E5F2@oracle.com> Looks good. igor > On Dec 4, 2015, at 8:40 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8134883/webrev.00/ > > The problem happens when c1's loop optimizations encounter a non-natural loop for which one entry is an exception handler.
C1's loop optimizations usually ignore non-natural loops, but in that case it misses that there's a non-natural loop and half-builds the loop data structures (it marks the loop header but doesn't keep track of the blocks that belong to the loop). AFAICT, C1 also bails out when it encounters a natural loop with an exception handler as header. The fix I propose simply doesn't mark the loop header and loop end for any loop with an exception handler header, so the loop is ignored by loop optimizations. > > Roland. From vivek.r.deshpande at intel.com Fri Dec 4 19:26:15 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Fri, 4 Dec 2015 19:26:15 +0000 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <565E4DD2.1030200@oracle.com> <565E511E.9020503@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569CF892@ORSMSX106.amr.corp.intel.com> Hi Vladimir We have updated the webrev at the jbs entry with the global flag. This is the link for your review. http://cr.openjdk.java.net/~mcberg/8143355/webrev.02/ Regards Vivek -----Original Message----- From: Deshpande, Vivek R Sent: Wednesday, December 02, 2015 11:21 AM To: 'Vladimir Kozlov'; hotspot compiler Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; 'Paul Sandoz' Subject: RE: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 Hi Vladimir Yes the 2x performance gain is using AVX2 instructions for big arrays (~1k). We will update the patch and jbs entry with global flag and let you know soon. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, December 01, 2015 6:02 PM To: Deshpande, Vivek R; hotspot compiler Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric Subject: Re: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 2) improving C1 (perhaps even the interpreter?)
since the intrinsic is a stub which IIUC makes it easier to plug in. If that is the case the flag should be global. Thanks, Vladimir On 12/1/15 5:48 PM, Vladimir Kozlov wrote: > This seems fine. 2x is for AVX implementation? > > Thanks, > Vladimir > > On 11/24/15 4:00 PM, Deshpande, Vivek R wrote: >> Hi all >> >> We would like to contribute a patch from Intel which optimizes >> vectorizedMismatch() method in java.util.ArraysSupport.java for X86 >> architecture using AVX instructions. >> >> The improvement gives more than 2x gain over Unsafe implementation >> for long arrays. >> >> >> The bug is blocked by bug: vectorized support for array >> equals/compare/mismatch using Unsafe >> (https://bugs.openjdk.java.net/browse/JDK-8136924.) >> >> Could you please review and sponsor this patch. >> >> Bug-id: >> >> https://bugs.openjdk.java.net/browse/JDK-8143355 >> webrev: >> >> http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ >> >> Thanks and regards, >> >> Vivek >> From mikael.vidstedt at oracle.com Fri Dec 4 20:23:01 2015 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Fri, 4 Dec 2015 12:23:01 -0800 Subject: RFR (M): 8144748: Move assembler/macroAssembler inline function definitions to corresponding inline.hpp files Message-ID: <5661F625.9000204@oracle.com> Please review this change which moves a large-ish number of function definitions/bodies from assembler_sparc.hpp and macroAssembler_sparc.hpp to the corresponding assembler_sparc.inline.hpp and macroAssembler_sparc.inline.hpp files. Bug: https://bugs.openjdk.java.net/browse/JDK-8144748 Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8144748/webrev.00/webrev/ * Background The specific problem which triggered this change was the following pattern in assembler_sparc.hpp: class Assembler : ... { ... inline void emit_int32(int); inline void emit_data(int x) { emit_data(x); } ... 
} If assembler_sparc.hpp is ever included directly without including assembler_sparc.inline.hpp this will lead to a use without definition, since emit_int32 is only defined in assembler_sparc.inline.hpp. The same pattern is true for almost all of the inline functions in assembler_sparc/macroAssembler_sparc. In general, inline functions (apart from trivial ones) should be defined in the inline.hpp file and any .cpp file actually making use of them should include the inline.hpp file instead. In this specific case, for whatever reason, it seems to be working well with Solaris Studio, but GCC is generating an error. * About the change The change here is very mechanical: for every relevant function F: * add "inline" keyword if needed * copy function to inline.hpp - trying to place it in the Right Place(tm) * add class prefix (Assembler:: or MacroAssembler::) * remove potential default parameter values * remove function body from .hpp file In a few cases I took the liberty of updating the indentation where it seemed off. Btw, does anybody know why ret & retl are only inlined in PRODUCT builds? Cheers, Mikael From vladimir.kozlov at oracle.com Fri Dec 4 20:41:32 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 4 Dec 2015 12:41:32 -0800 Subject: RFR (XS): 8144657: Invalid format specifiers in jvmci trace messages In-Reply-To: <5660CF29.3060703@oracle.com> References: <5660CF29.3060703@oracle.com> Message-ID: <5661FA7C.5020204@oracle.com> Good. Thanks, Vladimir On 12/3/15 3:24 PM, Mikael Vidstedt wrote: > > Please review this simple change which updates the format specifiers > used in a couple of JVMCI_trace_3 calls: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8144657 > Webrev: > http://cr.openjdk.java.net/~mikael/webrevs/8144657/webrev.00/webrev/ > > narrowOop is a juint, so %p doesn't match, and for the rest of the cases > PTR_FORMAT+p2i is more in line with the "standard" pattern for printing > pointers. 
> > Cheers, > Mikael > From vladimir.kozlov at oracle.com Fri Dec 4 21:02:41 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 4 Dec 2015 13:02:41 -0800 Subject: RFR (M): 8144748: Move assembler/macroAssembler inline function definitions to corresponding inline.hpp files In-Reply-To: <5661F625.9000204@oracle.com> References: <5661F625.9000204@oracle.com> Message-ID: <5661FF71.2080904@oracle.com> > Btw, does anybody know why ret & retl are only inlined in PRODUCT builds? TraceJumps is develop flag which is const 'false' in product (and in optimized too). As result the code of ret() is only one instruction. I think they put code in .cpp because wanted to set breakpoints in these methods in debug VM but debug VM does not inline anyway. I think it should be only in .hpp. Please, remove ifdef and code's copy in .cpp. Also use code stile with {} and separate lines for: + inline void MacroAssembler::mov( Register s, Register d) { + if ( s != d ) or3( G0, s, d); + else assert_not_delayed(); // Put something useful in the delay slot! + } + + inline void MacroAssembler::mov_or_nop( Register s, Register d) { + if ( s != d ) or3( G0, s, d); + else nop(); + } Thanks, Vladimir On 12/4/15 12:23 PM, Mikael Vidstedt wrote: > > Please review this change which moves a large-ish number of function > definitions/bodies from assembler_sparc.hpp and macroAssembler_sparc.hpp > to the corresponding assembler_sparc.inline.hpp and > macroAssembler_sparc.inline.hpp files. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8144748 > Webrev: > http://cr.openjdk.java.net/~mikael/webrevs/8144748/webrev.00/webrev/ > > > * Background > > The specific problem which triggered this change was the following > pattern in assembler_sparc.hpp: > > class Assembler : ... { > ... > inline void emit_int32(int); > inline void emit_data(int x) { emit_data(x); } > ... 
> } > > If assembler_sparc.hpp is ever included directly without including > assembler_sparc.inline.hpp this will lead to a use without definition, > since emit_int32 is only defined in assembler_sparc.inline.hpp. The same > pattern is true for almost all of the inline functions in > assembler_sparc/macroAssembler_sparc. > > In general, inline functions (apart from trivial ones) should be defined > in the inline.hpp file and any .cpp file actually making use of them > should include the inline.hpp file instead. In this specific case, for > whatever reason, it seems to be working well with Solaris Studio, but > GCC is generating an error. > > > * About the change > > The change here is very mechanical: > > for every relevant function F: > * add "inline" keyword if needed > * copy function to inline.hpp - trying to place it in the Right Place(tm) > * add class prefix (Assembler:: or MacroAssembler::) > * remove potential default parameter values > * remove function body from .hpp file > > In a few cases I took the liberty of updating the indentation where it > seemed off. > > Btw, does anybody know why ret & retl are only inlined in PRODUCT builds? > > Cheers, > Mikael > From tom.rodriguez at oracle.com Fri Dec 4 21:02:26 2015 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 4 Dec 2015 13:02:26 -0800 Subject: RFR (S): 8143571: [JVMCI] Double unregistering of nmethod during unloading Message-ID: http://cr.openjdk.java.net/~never/8143571/webrev.00/ This is a follow on fix for 8142436 which introduced some assertion failures when running with G1. They were benign in practice but it indicated that more careful updating of those field was required. In the end the idea is that we never clear the _installed_code field using the barrier logic except during the do_unloading methods. The other places we want to clear the installed code is when becoming unloaded or zombie and those are explicitly unregistered anyway. 
We explicitly clear _installed_code after the unregister for clarity instead. It also became clear that we needed to ensure that updates to the installed code were performed under a lock to make sure it had a consistent state. Since part of the updates were already being done under the Patching_lock because they were part of make_not_entrant_or_zombie, I used the Patching_lock in other places to protect updates. This has been in our graal repo with nightly fastdebug testing for the last few weeks without issue. tom From igor.veresov at oracle.com Fri Dec 4 21:09:59 2015 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 4 Dec 2015 13:09:59 -0800 Subject: RFR (S): 8143571: [JVMCI] Double unregistering of nmethod during unloading In-Reply-To: References: Message-ID: <85707D7B-B60C-4015-9A92-80D6686F5AA2@oracle.com> Looks good! Thanks! igor > On Dec 4, 2015, at 1:02 PM, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8143571/webrev.00/ > > This is a follow on fix for 8142436 which introduced some assertion failures when running with G1. They were benign in practice but it indicated that more careful updating of those field was required. In the end the idea is that we never clear the _installed_code field using the barrier logic except during the do_unloading methods. The other places we want to clear the installed code is when becoming unloaded or zombie and those are explicitly unregistered anyway. We explicitly clear _installed_code after the unregister for clarity instead. > > It also became clear that we needed to ensure that updates to the installed code were performed under a lock to make sure it had a consistent state. Since part of the updates were already being done under the Patching_lock because they were part of make_not_entrant_or_zombie, I used the Patching_lock in other places to protect updates.
This has been in our graal repo with nightly fastdebug testing for the last few weeks without issue. > > tom From vladimir.kozlov at oracle.com Fri Dec 4 21:11:46 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 4 Dec 2015 13:11:46 -0800 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A569CF892@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <565E4DD2.1030200@oracle.com> <565E511E.9020503@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CF892@ORSMSX106.amr.corp.intel.com> Message-ID: <56620192.5050808@oracle.com> You don't need now #ifdef COMPILER2 (in vm_version_x86.cpp). + #ifdef COMPILER2 + #ifdef _LP64 + if (UseSSE42Intrinsics) { Also you need to add to all other platforms vm_version_.cpp setting flag to false. See UseAdler32Intrinsics settings as example. Thanks, Vladimir On 12/4/15 11:26 AM, Deshpande, Vivek R wrote: > Hi Vladimir > > We have updated the webrev at the jbs entry with the global flag. > This is the link for your review. > http://cr.openjdk.java.net/~mcberg/8143355/webrev.02/ > > Regards > Vivek > -----Original Message----- > From: Deshpande, Vivek R > Sent: Wednesday, December 02, 2015 11:21 AM > To: 'Vladimir Kozlov'; hotspot compiler > Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; 'Paul Sandoz' > Subject: RE: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 > > Hi Vladimir > > Yes the 2x performance gain is using AVX2 instructions for big arrays(~1k). > We will update the patch and jbs entry with global flag and let you know soon.
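For readers unfamiliar with what vectorizedMismatch computes, here is a minimal, portable Java sketch of the word-at-a-time idea: compare 8 bytes per step and, on a difference, locate the first mismatching byte from the XOR of the two words. It is an illustration under stated assumptions, not the JDK's Unsafe-based code or the AVX2 stub from this patch. The class and method names are hypothetical, words are assembled little-endian explicitly through ByteBuffer (so the result does not depend on host byte order), and only the common prefix of the two arrays is inspected.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of the JDK.
final class MismatchSketch {

    // Returns the index of the first differing byte within the first
    // min(a.length, b.length) bytes, or -1 if that common prefix matches.
    static int mismatch(byte[] a, byte[] b) {
        int length = Math.min(a.length, b.length);
        // Little-endian views: byte k of a word is its k-th least significant byte.
        ByteBuffer wa = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer wb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        // Main loop: compare 8 bytes per step while a whole long fits.
        for (; i + Long.BYTES <= length; i += Long.BYTES) {
            long diff = wa.getLong(i) ^ wb.getLong(i);
            if (diff != 0) {
                // The lowest set bit of the XOR lies in the first differing byte.
                return i + (Long.numberOfTrailingZeros(diff) >>> 3);
            }
        }
        // Tail: the remaining 0..7 bytes, one at a time.
        for (; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

A vector implementation would presumably widen the main-loop compare beyond 8 bytes per step, while the tail handling and the "find the lowest set bit" step stay conceptually the same.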
> > Regards, > Vivek > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, December 01, 2015 6:02 PM > To: Deshpande, Vivek R; hotspot compiler > Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric > Subject: Re: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 > > 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. > > If that is the case the flag should be global. > > Thanks, > Vladimir > > On 12/1/15 5:48 PM, Vladimir Kozlov wrote: >> This seems fine. 2x is for AVX implementation? >> >> Thanks, >> Vladimir >> >> On 11/24/15 4:00 PM, Deshpande, Vivek R wrote: >>> Hi all >>> >>> We would like to contribute a patch from Intel which optimizes >>> vectorizedMismatch() method in java.util.ArraysSupport.java for X86 >>> architecture using AVX instructions. >>> >>> The improvement gives more than 2x gain over Unsafe implementation >>> for long arrays. >>> >>> >>> The bug is blocked by bug: vectorized support for array >>> equals/compare/mismatch using Unsafe >>> (https://bugs.openjdk.java.net/browse/JDK-8136924.) >>> >>> Could you please review and sponsor this patch. >>> >>> Bug-id: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8143355 >>> webrev: >>> >>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ >>> >>> Thanks and regards, >>> >>> Vivek >>> From vladimir.kozlov at oracle.com Fri Dec 4 21:26:34 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 4 Dec 2015 13:26:34 -0800 Subject: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228683A@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB41811656722865FE@DEWDFEMB19C.global.corp.sap> <5660B13E.7080509@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116567228683A@DEWDFEMB19C.global.corp.sap> Message-ID: <5662050A.5060609@oracle.com> This looks good. 
I will push it through JPRT. Thanks, Vladimir On 12/4/15 8:01 AM, Doerr, Martin wrote: > Please use this one: > http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.02/ > It prevents a build warning on windows. I have added an explicit type cast. > > Best regards, > Martin > > -----Original Message----- > From: Doerr, Martin > Sent: Friday, December 4, 2015 15:03 > To: 'Vladimir Kozlov' ; Roland Westrelin (roland.westrelin at oracle.com) ; hotspot-compiler-dev at openjdk.java.net > Cc: Lindenmaier, Goetz > Subject: RE: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion > > Hi Vladimir, > > thank you very much for your review. > > I have changed the Node_Stack initialization as you suggested. > > In addition, I changed the iterator to really support removal of nodes: > uint idx = MIN2(_stack.index(), self->outcnt()); // Support removal of nodes. > I believe it's currently not needed in openjdk, but it may be needed in the future. > Hope this is ok. > > The new webrev is here: > http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.01/ > > Can anybody volunteer to sponsor, please? > > Best regards, > Martin > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, December 3, 2015 22:17 > To: Doerr, Martin ; Roland Westrelin (roland.westrelin at oracle.com) ; hotspot-compiler-dev at openjdk.java.net > Cc: Lindenmaier, Goetz > Subject: Re: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion > > You reversed 8011858 changes which made stack much smaller - live_nodes was usually 1/10 of unique nodes: > > - stack.map((C->live_nodes() >> 1) + 16, NULL); > + Node_Stack stack(arena, (C->unique() >> 2) + 16); // pre-grow > > Please, use live_nodes with your >> 2 change: > > Node_Stack stack(arena, (C->live_nodes() >> 2) + 16); // pre-grow > > Iterator changes seems fine to me.
> > Thanks, > Vladimir > > On 12/3/15 9:17 AM, Doerr, Martin wrote: >> Hi, >> >> I have implemented a change which makes Node_Backward_Iterator more efficient for large graphs. The purpose is to fix >> the performance problem we observe in Octane benchmarks. >> >> It lowers compile time dramatically in case JvmtiExport::_can_access_local_variables is on. >> >> The webrev is here: >> >> http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.00/ >> >> Please review. >> >> The previous version uses an initial node stack size of (C->unique() >> 1) + 16 which can become pretty large. >> >> My webrev changes it to (C->unique() >> 2) + 16 which is still large. I didn't observe resizing because it was too small. >> >> I guess the stack depth typically stays far below this value, but it may be ok to spend e.g. 0.5 MB in extreme cases. >> >> How was that previous value determined? Should I implement it differently? >> >> Best regards, >> >> Martin >> From kishor.kharbas at intel.com Fri Dec 4 21:40:03 2015 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Fri, 4 Dec 2015 21:40:03 +0000 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <565E4A28.5010008@oracle.com> References: <565E4A28.5010008@oracle.com> Message-ID: Thanks Vladimir for the feedback! I have updated the jbs entry with the new patch. JDK changes : added range checks in the JDK using additional methods. Hotspot changes : renamed the UseCTRAESIntrinsics flag to UseAESCTRIntrinsics Further review and feedback is appreciated! - Kishor -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, December 01, 2015 5:32 PM To: Kharbas, Kishor; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES Hotspot changes seems fine. But JDK changes should have additional method for range checks - this is new requirement for intrinsics which access arrays. 
See, for example, cryptBlockCheck() in AESCrypt.java. Thanks, Vladimir On 11/24/15 2:33 PM, Kharbas, Kishor wrote: > Hello all, > > I request the community to review a patch for enhancing > CounterMode.crypt() for AES. This patch defines intrinsic for > CounterMode.crypt() to leverage the parallel nature of AES in Counter > (CTR) Mode. > > This is achieved by operating on 6 blocks in parallel to issue > independent x86 AES-NI instructions and keep the CPU pipeline full. > > Testing on micro-benchmark has shown a speedup of 4x-6x. > > Bug id: > > https://bugs.openjdk.java.net/browse/JDK-8143925 > > Webrev: > > hotspot: > http://cr.openjdk.java.net/~mcberg/8143925/hotspot/webrev.02/ > > jdk: http://cr.openjdk.java.net/~mcberg/8143925/jdk/webrev.01/ > > Much appreciated! > > Kishor Kharbas > From mikael.vidstedt at oracle.com Fri Dec 4 22:27:53 2015 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Fri, 4 Dec 2015 14:27:53 -0800 Subject: RFR (M): 8144748: Move assembler/macroAssembler inline function definitions to corresponding inline.hpp files In-Reply-To: <5661FF71.2080904@oracle.com> References: <5661F625.9000204@oracle.com> <5661FF71.2080904@oracle.com> Message-ID: <56621369.4070809@oracle.com> New webrev: http://cr.openjdk.java.net/~mikael/webrevs/8144748/webrev.01/webrev/ Changes since webrev.00: * ret/retl implementation is now always inline (no #ifdefs) and defined in macroAssembler_sparc.inline.hpp, removed from macroAssembler_sparc.cpp * Formatting of mov and mov_or_nop updated Cheers, Mikael On 2015-12-04 13:02, Vladimir Kozlov wrote: > > Btw, does anybody know why ret & retl are only inlined in PRODUCT > builds? > > TraceJumps is develop flag which is const 'false' in product (and in > optimized too). As result the code of ret() is only one instruction. > > I think they put code in .cpp because wanted to set breakpoints in > these methods in debug VM but debug VM does not inline anyway. I think > it should be only in .hpp. 
Please, remove ifdef and code's copy in .cpp. > > Also use code stile with {} and separate lines for: > > + inline void MacroAssembler::mov( Register s, Register d) { > + if ( s != d ) or3( G0, s, d); > + else assert_not_delayed(); // Put something useful in > the delay slot! > + } > + > + inline void MacroAssembler::mov_or_nop( Register s, Register d) { > + if ( s != d ) or3( G0, s, d); > + else nop(); > + } > > Thanks, > Vladimir > > On 12/4/15 12:23 PM, Mikael Vidstedt wrote: >> >> Please review this change which moves a large-ish number of function >> definitions/bodies from assembler_sparc.hpp and macroAssembler_sparc.hpp >> to the corresponding assembler_sparc.inline.hpp and >> macroAssembler_sparc.inline.hpp files. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8144748 >> Webrev: >> http://cr.openjdk.java.net/~mikael/webrevs/8144748/webrev.00/webrev/ >> >> >> * Background >> >> The specific problem which triggered this change was the following >> pattern in assembler_sparc.hpp: >> >> class Assembler : ... { >> ... >> inline void emit_int32(int); >> inline void emit_data(int x) { emit_data(x); } >> ... >> } >> >> If assembler_sparc.hpp is ever included directly without including >> assembler_sparc.inline.hpp this will lead to a use without definition, >> since emit_int32 is only defined in assembler_sparc.inline.hpp. The same >> pattern is true for almost all of the inline functions in >> assembler_sparc/macroAssembler_sparc. >> >> In general, inline functions (apart from trivial ones) should be defined >> in the inline.hpp file and any .cpp file actually making use of them >> should include the inline.hpp file instead. In this specific case, for >> whatever reason, it seems to be working well with Solaris Studio, but >> GCC is generating an error. 
>> >> >> * About the change >> >> The change here is very mechanical: >> >> for every relevant function F: >> * add "inline" keyword if needed >> * copy function to inline.hpp - trying to place it in the Right >> Place(tm) >> * add class prefix (Assembler:: or MacroAssembler::) >> * remove potential default parameter values >> * remove function body from .hpp file >> >> In a few cases I took the liberty of updating the indentation where it >> seemed off. >> >> Btw, does anybody know why ret & retl are only inlined in PRODUCT >> builds? >> >> Cheers, >> Mikael >> From vladimir.kozlov at oracle.com Fri Dec 4 22:44:26 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 4 Dec 2015 14:44:26 -0800 Subject: RFR (M): 8144748: Move assembler/macroAssembler inline function definitions to corresponding inline.hpp files In-Reply-To: <56621369.4070809@oracle.com> References: <5661F625.9000204@oracle.com> <5661FF71.2080904@oracle.com> <56621369.4070809@oracle.com> Message-ID: <5662174A.6080108@oracle.com> Good. Thanks, Vladimir On 12/4/15 2:27 PM, Mikael Vidstedt wrote: > > New webrev: > http://cr.openjdk.java.net/~mikael/webrevs/8144748/webrev.01/webrev/ > > Changes since webrev.00: > > * ret/retl implementation is now always inline (no #ifdefs) and defined > in macroAssembler_sparc.inline.hpp, removed from macroAssembler_sparc.cpp > * Formatting of mov and mov_or_nop updated > > Cheers, > Mikael > > On 2015-12-04 13:02, Vladimir Kozlov wrote: >> > Btw, does anybody know why ret & retl are only inlined in PRODUCT >> builds? >> >> TraceJumps is develop flag which is const 'false' in product (and in >> optimized too). As result the code of ret() is only one instruction. >> >> I think they put code in .cpp because wanted to set breakpoints in >> these methods in debug VM but debug VM does not inline anyway. I think >> it should be only in .hpp. Please, remove ifdef and code's copy in .cpp. 
>> >> Also use code stile with {} and separate lines for: >> >> + inline void MacroAssembler::mov( Register s, Register d) { >> + if ( s != d ) or3( G0, s, d); >> + else assert_not_delayed(); // Put something useful in >> the delay slot! >> + } >> + >> + inline void MacroAssembler::mov_or_nop( Register s, Register d) { >> + if ( s != d ) or3( G0, s, d); >> + else nop(); >> + } >> >> Thanks, >> Vladimir >> >> On 12/4/15 12:23 PM, Mikael Vidstedt wrote: >>> >>> Please review this change which moves a large-ish number of function >>> definitions/bodies from assembler_sparc.hpp and macroAssembler_sparc.hpp >>> to the corresponding assembler_sparc.inline.hpp and >>> macroAssembler_sparc.inline.hpp files. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144748 >>> Webrev: >>> http://cr.openjdk.java.net/~mikael/webrevs/8144748/webrev.00/webrev/ >>> >>> >>> * Background >>> >>> The specific problem which triggered this change was the following >>> pattern in assembler_sparc.hpp: >>> >>> class Assembler : ... { >>> ... >>> inline void emit_int32(int); >>> inline void emit_data(int x) { emit_data(x); } >>> ... >>> } >>> >>> If assembler_sparc.hpp is ever included directly without including >>> assembler_sparc.inline.hpp this will lead to a use without definition, >>> since emit_int32 is only defined in assembler_sparc.inline.hpp. The same >>> pattern is true for almost all of the inline functions in >>> assembler_sparc/macroAssembler_sparc. >>> >>> In general, inline functions (apart from trivial ones) should be defined >>> in the inline.hpp file and any .cpp file actually making use of them >>> should include the inline.hpp file instead. In this specific case, for >>> whatever reason, it seems to be working well with Solaris Studio, but >>> GCC is generating an error. 
>>> >>> >>> * About the change >>> >>> The change here is very mechanical: >>> >>> for every relevant function F: >>> * add "inline" keyword if needed >>> * copy function to inline.hpp - trying to place it in the Right >>> Place(tm) >>> * add class prefix (Assembler:: or MacroAssembler::) >>> * remove potential default parameter values >>> * remove function body from .hpp file >>> >>> In a few cases I took the liberty of updating the indentation where it >>> seemed off. >>> >>> Btw, does anybody know why ret & retl are only inlined in PRODUCT >>> builds? >>> >>> Cheers, >>> Mikael >>> > From coleen.phillimore at oracle.com Fri Dec 4 22:48:52 2015 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Fri, 4 Dec 2015 17:48:52 -0500 Subject: RFR (M): 8144748: Move assembler/macroAssembler inline function definitions to corresponding inline.hpp files In-Reply-To: <56621369.4070809@oracle.com> References: <5661F625.9000204@oracle.com> <5661FF71.2080904@oracle.com> <56621369.4070809@oracle.com> Message-ID: <56621854.8050004@oracle.com> Seems good to me too. Coleen On 12/4/15 5:27 PM, Mikael Vidstedt wrote: > > New webrev: > http://cr.openjdk.java.net/~mikael/webrevs/8144748/webrev.01/webrev/ > > Changes since webrev.00: > > * ret/retl implementation is now always inline (no #ifdefs) and > defined in macroAssembler_sparc.inline.hpp, removed from > macroAssembler_sparc.cpp > * Formatting of mov and mov_or_nop updated > > Cheers, > Mikael > > On 2015-12-04 13:02, Vladimir Kozlov wrote: >> > Btw, does anybody know why ret & retl are only inlined in PRODUCT >> builds? >> >> TraceJumps is develop flag which is const 'false' in product (and in >> optimized too). As result the code of ret() is only one instruction. >> >> I think they put code in .cpp because wanted to set breakpoints in >> these methods in debug VM but debug VM does not inline anyway. I >> think it should be only in .hpp. Please, remove ifdef and code's copy >> in .cpp. 
>> >> Also use code stile with {} and separate lines for: >> >> + inline void MacroAssembler::mov( Register s, Register d) { >> + if ( s != d ) or3( G0, s, d); >> + else assert_not_delayed(); // Put something useful >> in the delay slot! >> + } >> + >> + inline void MacroAssembler::mov_or_nop( Register s, Register d) { >> + if ( s != d ) or3( G0, s, d); >> + else nop(); >> + } >> >> Thanks, >> Vladimir >> >> On 12/4/15 12:23 PM, Mikael Vidstedt wrote: >>> >>> Please review this change which moves a large-ish number of function >>> definitions/bodies from assembler_sparc.hpp and >>> macroAssembler_sparc.hpp >>> to the corresponding assembler_sparc.inline.hpp and >>> macroAssembler_sparc.inline.hpp files. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144748 >>> Webrev: >>> http://cr.openjdk.java.net/~mikael/webrevs/8144748/webrev.00/webrev/ >>> >>> >>> * Background >>> >>> The specific problem which triggered this change was the following >>> pattern in assembler_sparc.hpp: >>> >>> class Assembler : ... { >>> ... >>> inline void emit_int32(int); >>> inline void emit_data(int x) { emit_data(x); } >>> ... >>> } >>> >>> If assembler_sparc.hpp is ever included directly without including >>> assembler_sparc.inline.hpp this will lead to a use without definition, >>> since emit_int32 is only defined in assembler_sparc.inline.hpp. The >>> same >>> pattern is true for almost all of the inline functions in >>> assembler_sparc/macroAssembler_sparc. >>> >>> In general, inline functions (apart from trivial ones) should be >>> defined >>> in the inline.hpp file and any .cpp file actually making use of them >>> should include the inline.hpp file instead. In this specific case, for >>> whatever reason, it seems to be working well with Solaris Studio, but >>> GCC is generating an error. 
>>> >>> >>> * About the change >>> >>> The change here is very mechanical: >>> >>> for every relevant function F: >>> * add "inline" keyword if needed >>> * copy function to inline.hpp - trying to place it in the Right >>> Place(tm) >>> * add class prefix (Assembler:: or MacroAssembler::) >>> * remove potential default parameter values >>> * remove function body from .hpp file >>> >>> In a few cases I took the liberty of updating the indentation where it >>> seemed off. >>> >>> Btw, does anybody know why ret & retl are only inlined in PRODUCT >>> builds? >>> >>> Cheers, >>> Mikael >>> > From vladimir.kozlov at oracle.com Fri Dec 4 23:58:37 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 4 Dec 2015 15:58:37 -0800 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: References: <565E4A28.5010008@oracle.com> Message-ID: <566228AD.6060704@oracle.com> jdk: http://cr.openjdk.java.net/~mcberg/8143925/jdk/webrev.02/ JDK changes looks good to me. hotspot: http://cr.openjdk.java.net/~mcberg/8143925/hotspot/webrev.04/ Please, set flag to 'false' on platforms which does not support this intrinsic: if (UseAESCTRIntrinsics) { warning("AES/CTR intrinsics are not available on this CPU"); FLAG_SET_DEFAULT(UseAESCTRIntrinsics, false); } Also Anthony asked to add test for this intrinsic. Please do it: "2) It would be good to add CTR to the TestAES tests. It's in hotspot/test/compiler/codegen/7184394/. The test currently has CBC, ECB, and GCM in it, so it should be easy. It's also the only test I know of that tests the intrinsic. None of the tests in the jdk repo that I know of loop enough to trigger the intrinsic." Thanks, Vladimir On 12/4/15 1:40 PM, Kharbas, Kishor wrote: > Thanks Vladimir for the feedback! > > I have updated the jbs entry with the new patch. > > JDK changes : added range checks in the JDK using additional methods. 
> Hotspot changes : renamed the UseCTRAESIntrinsics flag to UseAESCTRIntrinsics > > Further review and feedback is appreciated! > > - Kishor > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, December 01, 2015 5:32 PM > To: Kharbas, Kishor; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES > > Hotspot changes seems fine. But JDK changes should have additional method for range checks - this is new requirement for intrinsics which access arrays. See, for example, cryptBlockCheck() in AESCrypt.java. > > Thanks, > Vladimir > > On 11/24/15 2:33 PM, Kharbas, Kishor wrote: >> Hello all, >> >> I request the community to review a patch for enhancing >> CounterMode.crypt() for AES. This patch defines intrinsic for >> CounterMode.crypt() to leverage the parallel nature of AES in Counter >> (CTR) Mode. >> >> This is achieved by operating on 6 blocks in parallel to issue >> independent x86 AES-NI instructions and keep the CPU pipeline full. >> >> Testing on micro-benchmark has shown a speedup of 4x-6x. >> >> Bug id: >> >> https://bugs.openjdk.java.net/browse/JDK-8143925 >> >> Webrev: >> >> hotspot: >> http://cr.openjdk.java.net/~mcberg/8143925/hotspot/webrev.02/ >> >> jdk: http://cr.openjdk.java.net/~mcberg/8143925/jdk/webrev.01/ >> >> Much appreciated! 
>> >> Kishor Kharbas >> From mandy.chung at oracle.com Sat Dec 5 00:05:42 2015 From: mandy.chung at oracle.com (Mandy Chung) Date: Fri, 4 Dec 2015 16:05:42 -0800 Subject: Reference.reachabilityFence In-Reply-To: <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> References: <2D27BCFB-77ED-4C83-985E-102DC4B41C97@oracle.com> <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com> <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> Message-ID: <5C9099DC-0B11-4765-9AA4-6D81C2CCD682@oracle.com> > On Dec 4, 2015, at 5:47 AM, Paul Sandoz wrote: > >> >> On 3 Dec 2015, at 22:33, Mandy Chung wrote: >> >> >>> On Nov 26, 2015, at 8:22 AM, Paul Sandoz wrote: >>> >>> Hi, >>> >>> I have updated the patches: >>> >>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/ >>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-hotspot/webrev/ >>> >>> There is now more documentation on Reference (copied and suitable rearranged from 166 Fences.java). The method name remains the same. >>> >> >> I think the addition to the Reference class specification should belong to the reachabilityFence method specification. Any reason why not? > > I thought it would be more visible in the JavaDoc, as it's there upfront. The api note may get larger if we include some additional real world examples. I don't have a strong opinion on this, if yours is stronger i will move it :-) > Reference is the best class among other choices to place this reachabilityFence method and not directly tied with the Reference class spec. That's how I read it. I think no issue for a method spec contains a long api note. I prefer it to move to the method spec. > > >> Should the reachabilityFence method throw NPE if ref is null? >> > > I am ok with it doing nothing, it's a performance sensitive method. It means no null checks/de-opts are required and (hand-waving here...) might make it more amenable to optimization see https://bugs.openjdk.java.net/browse/JDK-8130398).
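As a usage sketch, assuming JDK 9 or later where java.lang.ref.Reference.reachabilityFence is available, the usual pattern places the fence in a finally block so that `this` stays strongly reachable until the code that depends on it has finished. The Holder class and its SLOTS table are hypothetical stand-ins for external state that a finalizer or Cleaner might release once the object becomes unreachable. Passing null is simply a no-op, consistent with the "do nothing, no null check" behaviour discussed here.

```java
import java.lang.ref.Reference;

// Hypothetical example class; SLOTS stands in for external state that a
// finalizer or Cleaner might clear once a Holder becomes unreachable.
final class Holder {
    static final int[] SLOTS = new int[16];

    private final int index;

    Holder(int index, int value) {
        this.index = index;
        SLOTS[index] = value;
    }

    int read() {
        try {
            // Once 'index' has been loaded, the JIT may treat 'this' as dead,
            // which would let cleanup run before SLOTS[index] is read.
            return SLOTS[index];
        } finally {
            // Keeps 'this' strongly reachable until this point.
            Reference.reachabilityFence(this);
        }
    }
}
```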
> I later also thought that performance sensitivity explains the reason to accept null. It's fine with me. > > Updated: > > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/ > > reachabilityFence is now annotated with @DontInline (to be pushed real soon now) and the HotSpot changes are no longer needed. Looks fine with me. No need to see any new webrev. Mandy From christian.thalinger at oracle.com Sat Dec 5 00:09:42 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 4 Dec 2015 14:09:42 -1000 Subject: RFR (S): 8143571: [JVMCI] Double unregistering of nmethod during unloading In-Reply-To: References: Message-ID: Yes, seems good. > On Dec 4, 2015, at 11:02 AM, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8143571/webrev.00/ > > This is a follow-on fix for 8142436, which introduced some assertion failures when running with G1. They were benign in practice, but they indicated that more careful updating of those fields was required. In the end the idea is that we never clear the _installed_code field using the barrier logic except during the do_unloading methods. The other places where we want to clear the installed code are when becoming unloaded or zombie, and those are explicitly unregistered anyway. We explicitly clear _installed_code after the unregister for clarity instead. > > It also became clear that we needed to ensure that updates to the installed code were performed under a lock to make sure it had a consistent state. Since part of the updates were already being done under the Patching_lock because they were part of make_not_entrant_or_zombie, I used the Patching_lock in other places to protect updates. This has been in our graal repo with nightly fastdebug testing for the last few weeks without issue. > > tom From joe.darcy at oracle.com Sat Dec 5 00:50:14 2015 From: joe.darcy at oracle.com (Joseph D.
Darcy) Date: Fri, 04 Dec 2015 16:50:14 -0800 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> <5660B40D.4050800@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> Message-ID: <566234C6.8010806@oracle.com> Hi Vivek, On 12/3/2015 2:01 PM, Deshpande, Vivek R wrote: > Hi > > Sure, I will add the tests. Shall I use the StrictMath result as a reference for the exact result? > Let me know your thoughts. As a rough test of another sin/cos implementation, StrictMath.{sin, cos} can be used as a reference with the following caveat: there isn't an indication of which way the error is in a StrictMath result. Let me give an example: if StrictMath.sin(x) => y, then one of the following should be true: Math.sin(x) => y, Math.sin(x) => Math.nextUp(y), Math.sin(x) => Math.nextDown(y). That is, Math.sin(x) should either be the same as StrictMath.sin(x) OR equal to one of the floating-point numbers adjacent to that result. Of these three options, only two are allowed by the accuracy requirements of the StrictMath.sin specification.
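Joe's acceptance rule can be written down directly. The following is a minimal sketch; the names `SinCheck`, `withinOneUlpOf` and `firstSinMismatch` are illustrative. It accepts `Math.sin(x)` when the result equals `StrictMath.sin(x)` or one of that value's two floating-point neighbors. As Joe goes on to note, StrictMath alone cannot tell which neighbor is the permitted one. This check can therefore accept a result that lies on the wrong side. It is a necessary condition, not a sufficient one.

```java
final class SinCheck {
    // True if actual is the reference y, Math.nextUp(y) or Math.nextDown(y).
    // Note: == treats 0.0 and -0.0 as equal.
    static boolean withinOneUlpOf(double actual, double reference) {
        if (Double.isNaN(reference)) {
            return Double.isNaN(actual);
        }
        return actual == reference
            || actual == Math.nextUp(reference)
            || actual == Math.nextDown(reference);
    }

    // Returns the first input where Math.sin is not the same as or adjacent to
    // StrictMath.sin, or NaN if every input passes.
    static double firstSinMismatch(double[] inputs) {
        for (double x : inputs) {
            if (!withinOneUlpOf(Math.sin(x), StrictMath.sin(x))) {
                return x;
            }
        }
        return Double.NaN;
    }
}
```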
However, since StrictMath.sin doesn't give an indication of which way its error went (if it rounded up or down), there is no indication without additional work which of nextUp(y) and nextDown(y) is allowable (assuming StrictMath.sin isn't buggy). HTH, -Joe > > Regards, > Vivek > > -----Original Message----- > From: joe darcy [mailto:joe.darcy at oracle.com] > Sent: Thursday, December 03, 2015 1:29 PM > To: Vladimir Kozlov; Deshpande, Vivek R > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib > > Hello, > > On 12/3/2015 1:25 PM, Vladimir Kozlov wrote: >> Vivek, >> >> I think Joe is asking you to write these tests as hotspot regression >> tests in hotspot/test/compiler. > Exactly; if not generally applicable sin/cos tests that could be hosted in the jdk repo (alongside the regression and unit tests for java.lang.Math), then tests of intrinsics in the HotSpot repo alongside other tests targeting intrinsics. > > Thanks, > > -Joe > >> Vladimir >> >> On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: >>> Hi Joe >>> >>> It would be great if you would please share the additional tests with >>> us. >>> >>> Regards, >>> Vivek >>> >>> -----Original Message----- >>> From: joe darcy [mailto:joe.darcy at oracle.com] >>> Sent: Thursday, December 03, 2015 1:17 PM >>> To: Vladimir Kozlov; Deshpande, Vivek R >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >>> lib >>> >>> I think it is unwise for this large an implementation change to be >>> pushed with no tests targeting the specifics of the new implementation. >>> >>> The worst-case tests in the jdk repo are the mathematical worst cases >>> for floating-point approximations, in other words the cases where the >>> exact mathematical answer is closest to half-way between two >>> representable floating-point numbers.
Passing such tests is >>> a necessary but not sufficient condition for a new implementation. >>> >>> Cheers, >>> >>> -Joe >>> >>> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >>>> Okay, looks reasonable to me. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>>>> Hi Vladimir >>>>> >>>>> This is the link for the updated webrev with the latest hotspot source >>>>> as base for your review. >>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>>>> Thank you. >>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> -----Original Message----- >>>>> From: Deshpande, Vivek R >>>>> Sent: Wednesday, December 02, 2015 10:33 PM >>>>> To: 'Vladimir Kozlov'; joe darcy >>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> Hi Vladimir >>>>> >>>>> This is the link for the updated webrev for your review. >>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>>>> Thank you. >>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, December 01, 2015 6:06 PM >>>>> To: Deshpande, Vivek R; joe darcy >>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> Please send a link to the new webrev on the cr server. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>>>> Hi Vladimir >>>>>> >>>>>> Please find the webrev with your suggested updates attached to >>>>>> the mail. >>>>>> We will update it in the jbs entry soon. >>>>>> Please let me know if it needs further changes.
>>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> -----Original Message----- >>>>>> From: Deshpande, Vivek R >>>>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>>>> To: 'joe darcy'; Vladimir Kozlov >>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>> math lib >>>>>> >>>>>> Hi Vladimir, Joe >>>>>> >>>>>> I have run the jtreg tests in hotspot and the tests from jdk you have >>>>>> mentioned. It passed those tests. >>>>>> The ~4x gain is with -XX:+UnlockDiagnosticVMOptions >>>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>>>>> The performance gain is 3.2x over base jdk, that is, over the current >>>>>> fsin/fcos intrinsic. This gain is more realistic. >>>>>> >>>>>> Could I get those tests around the boundary values? Would the >>>>>> WorstCaseTests.java jtreg test in jdk test those? >>>>>> If yes, then it has passed those boundary cases. >>>>>> >>>>>> I will work on adding either a diagnostic flag or just one flag for >>>>>> libm and send out the webrev soon. >>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>> Sent: Monday, November 23, 2015 6:28 PM >>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>> math lib >>>>>> >>>>>> Hello, >>>>>> >>>>>> Just getting added to the thread... >>>>>> >>>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>>>> Thank you for the explanation, Vivek. >>>>>>> >>>>>>> Please run the jdk/test/java/lang/Math/ jtreg tests in addition to >>>>>>> the Hotspot tests. >>>>>>> >>>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>>>> Hi Vladimir >>>>>>>> >>>>>>>> The results we obtain with LIBM are within +/- 1ulp of the >>>>>>>> StrictMath result and not the exact result.
So I added the flag to >>>>>>>> switch between FDLIBM and LIBM. >>>>>>>> >>>>>>>> Quick explanation: >>>>>>>> This is what we observed in comparison with the HPA Library >>>>>>>> (http://www.nongnu.org/hpalib/), explained with an example. >>>>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result >>>>>>>> should be = 0.19457293629570216 >>>>>>>> (4596178249117717084L) (StrictMath result) or 0.1945729362957022 >>>>>>>> (4596178249117717085L) (StrictMath + 1ulp). This means the HPA >>>>>>>> library result is between the above two values and the exact result >>>>>>>> would be pretty close to it. >>>>>>>> So here the StrictMath result is less than the quad-precision result, so the >>>>>>>> Math result should be StrictMath or StrictMath + 1ulp and not >>>>>>>> StrictMath - 1ulp, according to our test. >>>>>>> Note, java.lang.Math allows results to be 1ulp off (in both directions, >>>>>>> I think) and it should be consistent for the Interpreter and code >>>>>>> generated by JIT compilers: >>>>>>> >>>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28double%29 >>>>>>> >>>>>> That interpretation of the spec is not quite right. For the Math >>>>>> methods with a 1/2 ulp error bound, the floating-point result >>>>>> closest to the exact result must be returned. For the methods with >>>>>> a 1 ulp error bound, either of the floating-point results bracketing >>>>>> the true result can be returned, subject to the monotonicity >>>>>> constraints of the specification of the particular method. >>>>>> >>>>>>>> I have done the experiments with -XX:+UnlockDiagnosticVMOptions >>>>>>>> -XX:DisableIntrinsic=_dsin and -XX:+UnlockDiagnosticVMOptions >>>>>>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter >>>>>>>> would go through LIBM, and C1 and C2 through FDLIBM.
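The HPA example in the message above quotes results as raw `long` bit patterns. The following is a small sketch that decodes them with `Double.longBitsToDouble` and measures their distance in ulps. `UlpDistance` and `ulpDistance` are illustrative names. The sketch confirms that the three quoted values are consecutive doubles. The distance calculation relies on a property of IEEE 754 doubles: for two finite doubles of the same sign, the absolute difference of their bit patterns equals the number of ulp steps between them.

```java
final class UlpDistance {
    // Number of representable-double steps between a and b; both must be
    // finite and have the same sign bit.
    static long ulpDistance(double a, double b) {
        if (!Double.isFinite(a) || !Double.isFinite(b)
                || Math.copySign(1.0, a) != Math.copySign(1.0, b)) {
            throw new IllegalArgumentException("need finite values of the same sign");
        }
        return Math.abs(Double.doubleToRawLongBits(a) - Double.doubleToRawLongBits(b));
    }

    public static void main(String[] args) {
        double observed = Double.longBitsToDouble(4596178249117717083L); // LIBM result
        double strict = Double.longBitsToDouble(4596178249117717084L);   // StrictMath result
        double above = Double.longBitsToDouble(4596178249117717085L);    // StrictMath + 1 ulp
        System.out.println(observed + " " + strict + " " + above);
        System.out.println(ulpDistance(observed, strict) + " " + ulpDistance(strict, above));
        System.out.println(Math.nextUp(observed) == strict);
    }
}
```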
>>>>>>>> If we want to disable LIBM completely, we need the flags >>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>> I was thinking about using the existing >>>>>>> DirectiveSet::is_intrinsic_disabled() and >>>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add additional >>>>>>> versions of functions which accept an intrinsic ID instead of a >>>>>>> methodHandle. >>>>>>> >>>>>>> If you still want to use flags, make them diagnostic. >>>>>>> Or have one flag for all LIBM intrinsics, -XX:+UseLibmIntrinsic. >>>>>>> >>>>>>>> Also the performance gain ~4x is with >>>>>>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>>>> You confused me here. So you get 4x when only the Interpreter uses >>>>>>> LIBM code and the compilers use FDLIBM? >>>>>> Just to be clear, are you comparing the new code to FDLIBM >>>>>> (StrictMath) or to the existing fsin/fcos intrinsics (Math)? >>>>>> >>>>>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>>>>> Port fdlibm to Java), which is providing a significant speed boost >>>>>> to the StrictMath methods that have been ported. >>>>>> >>>>>> I find the current patch *insufficient* as-is in terms of its >>>>>> testing. >>>>>> For example, part of the patch says >>>>>> >>>>>> # For sin >>>>>> >>>>>> +// This means that the main path is actually only taken for >>>>>> +// 2^-252 <= |X| < 90112. >>>>>> >>>>>> # For cos >>>>>> >>>>>> +// This means that the main path is actually only taken for >>>>>> +// 2^-252 <= |X| < 90112. >>>>>> >>>>>> If nothing else, there are no tests around those boundary >>>>>> values, which is unacceptable. There should also be some tests of >>>>>> values of interest to the algorithm in question. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> -Joe >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>>> Let me know your thoughts on this. I would answer more questions >>>>>>>> and give more data if needed.
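Joe's point about missing boundary coverage can be made concrete. The following sketch builds inputs at and just around the main-path limits quoted from the patch, 2^-252 and 90112, for both signs, and checks sin and cos against the same-or-adjacent-to-StrictMath rule. `SinCosBoundaryInputs` and its methods are illustrative names, not part of the patch or of any existing jtreg test. The neighbor count of three ulps on each side is an arbitrary choice. As the thread stresses, passing such a check is necessary but not sufficient; a real test would also target values of interest to the LIBM algorithm itself.

```java
import java.util.ArrayList;
import java.util.List;

final class SinCosBoundaryInputs {
    // Limits quoted from the patch comment: 2^-252 <= |X| < 90112.
    static final double LOW = Math.scalb(1.0, -252);
    static final double HIGH = 90112.0;

    // Each limit, its three nearest neighbors on each side, and all negations.
    static List<Double> inputs() {
        List<Double> xs = new ArrayList<>();
        for (double limit : new double[] {LOW, HIGH}) {
            double below = limit;
            double above = limit;
            xs.add(limit);
            for (int i = 0; i < 3; i++) {
                below = Math.nextDown(below);
                above = Math.nextUp(above);
                xs.add(below);
                xs.add(above);
            }
        }
        int n = xs.size();
        for (int i = 0; i < n; i++) {
            xs.add(-xs.get(i));
        }
        return xs;
    }

    static boolean sameOrAdjacent(double actual, double reference) {
        return actual == reference
            || actual == Math.nextUp(reference)
            || actual == Math.nextDown(reference);
    }

    // Inputs whose Math.sin or Math.cos is more than one ulp away from StrictMath.
    static List<Double> failures() {
        List<Double> bad = new ArrayList<>();
        for (double x : inputs()) {
            if (!sameOrAdjacent(Math.sin(x), StrictMath.sin(x))
                    || !sameOrAdjacent(Math.cos(x), StrictMath.cos(x))) {
                bad.add(x);
            }
        }
        return bad;
    }
}
```

Running the same inputs with and without `-XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin` (and `_dcos`) would compare the interpreter path with the compiled path, along the lines of the experiments described in the thread.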
>>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>>> math lib >>>>>>>> >>>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>>>> What is the reason you decided to add new flags? The exp() and >>>>>>>>> log() changes did not have flags. >>>>>>>>> >>>>>>>>> It would be interesting to see what happens if you disable >>>>>>>>> intrinsics using the existing flag, for example: >>>>>>>>> >>>>>>>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dexp >>>>>>>> Hi Vivek, >>>>>>>> >>>>>>>> I want to point out that you can do this experiment later. We can >>>>>>>> file bugs and fix them after FC. >>>>>>>> >>>>>>>> For now, please answer my question about flags only. This is >>>>>>>> the only thing holding it back from being pushed. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>>>> Hi all >>>>>>>>>> >>>>>>>>>> I would like to contribute a patch which optimizes Math.sin() >>>>>>>>>> and >>>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using the Intel LIBM >>>>>>>>>> implementation. >>>>>>>>>> >>>>>>>>>> The improvement gives a ~4.25x gain over base for both sin and cos. >>>>>>>>>> >>>>>>>>>> The options to use the optimizations are >>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>> >>>>>>>>>> Could you please review and sponsor this patch.
>>>>>>>>>> >>>>>>>>>> Bug-id: >>>>>>>>>> >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>>>> webrev: >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>>>> >>>>>>>>>> Thanks and regards, >>>>>>>>>> >>>>>>>>>> Vivek >>>>>>>>>> From vladimir.kozlov at oracle.com Sat Dec 5 01:30:50 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 4 Dec 2015 17:30:50 -0800 Subject: RFR(S): 8139771: Eliminating CastPP nodes at Phis when they all come from a unique input may cause crash In-Reply-To: References: Message-ID: <56623E4A.9040504@oracle.com> Thank you, Roland, for sending performance numbers. They are good - no regression. I was confused by naming first and by placement of code. And mixing code for casts and code in gcm. In reality the cast code tries to find only "immediate"/near dominating cast. So why (i >= 100) and not, lets say, >= 10? I am not sure that code should be in Phase classes. It is only work for cast nodes. I think it should be in ConstraintCast. The only thing you use is to set linear_only parameter. I know that in Identity we don't have a way to find is it GVN or IGVN. It is old problem we should fix. For example instead of passing can_reshape to Ideal it could be field in PhaseTransform set by constructor. Then you could access it from Identity and Ideal. An other thing is CheckCastPP. It should be subclass of ConstraintCastNode I think. Then make_cast() and dominating_cast() could be methods of ConstraintCastNode. Gcm changes are fine to me. I did not get how final_graph_reshaping change related to these changes. We try to assign control to memory which is dominated by current CastPP with control. So if you have chain of CastPP with control you will assign control of first (most dominating) and not the last which is near mem node. I don't think it is correct. I am worry a little about irreducible loops which have several entries. 
So you may get into a trouble by checking ordinary loop in IfNode::up_one_dom(). May be only counted loops? Or check the presence of irreducible loops. Remove -XX:+UseSerialGC flag from test. Otherwise we will get error when testing will try to use other GC. Thanks, Vladimir On 12/4/15 5:08 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8139771/webrev.00/ > > In PhiNode::Ideal(), when all uncasted inputs are identical, the Phi is removed, which can cause a control dependency to be lost. See the test case (which includes a step by step description of how this can lead to a bad graph) for an example. The test case crashes on sparc with -XX:+StressGCM. > > To fix that I propose that rather than simply replacing the Phi by its uncasted input, we replace it by a CastPP that is specially marked as carrying a dependency (similar to what we have for CastII). That fixes this issue and should protect us from other similar issues: > > - I moved the _carry_dependency from CastII to ConstraintCast and CheckCastPP so it applies to CastPP, CastII and CheckCastPP > - I added code to remove the casts that carry a dependency if there's a dominating cast with identical inputs and a more restrictive type. I made is_dominator() a virtual method of PhaseTransform so we can have an implementation in GVN and use the same code during GVN and loop opts > - We can now have a chain of CastPP with a control, so the code in final_graph_reshaping shouldn't follow through a CastPP only if its control is null > - I changed the code in PhaseCFG::schedule_pinned_nodes() to handle controls that are in the same block > > I performed extensive perf testing on x86 and sparc and found no statistically significant regressions. > > Roland.
> From jan.civlin at intel.com Sat Dec 5 04:07:52 2015 From: jan.civlin at intel.com (Civlin, Jan) Date: Sat, 5 Dec 2015 04:07:52 +0000 Subject: RFR:8144771: AVX3 patch for MacroAssembler::string_compare Message-ID: <39F83597C33E5F408096702907E6C4500F108D6C@ORSMSX104.amr.corp.intel.com> We would like to contribute an AVX3 patch for MacroAssembler::string_compare. This utilizes 512-bit registers on the AVX3 architecture and delivers a performance gain (speed-up) of about 1.33x on long strings and about 1.22x on random strings. This was measured vs AVX2 (256-bit registers). Contributors: MacroAssembler::string_compare - Jan Civlin. Rest of code, including all x86 AVX3 extensions - Michael Berg Bug-id: https://bugs.openjdk.java.net/browse/JDK-8144771 Webrev: http://cr.openjdk.java.net/~kvn/8144771/webrev/ From fei.yang0953 at yahoo.com Sun Dec 6 14:33:46 2015 From: fei.yang0953 at yahoo.com (felix yang) Date: Sun, 6 Dec 2015 14:33:46 +0000 (UTC) Subject: [RFR] aarch64: C2 generate vectorized MLA/MLS instructions In-Reply-To: <56605447.9070103@redhat.com> References: <56605447.9070103@redhat.com> Message-ID: <792430786.13018984.1449412426251.JavaMail.yahoo@mail.yahoo.com> Done. Currently, I have two webrevs which are under review. I have recreated both of them: Bug: https://bugs.openjdk.java.net/browse/JDK-8144201 Webrev: http://cr.openjdk.java.net/~fyang/8144201/webrev.01 Bug: https://bugs.openjdk.java.net/browse/JDK-8144587 Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.01 Is that OK? Thanks. On Thursday, December 3, 2015 10:40 PM, Andrew Haley wrote: It would help everybody if you did "hg commit" with an appropriate changeset comment before generating the webrev. Andrew.
From martin.doerr at sap.com Mon Dec 7 09:42:13 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 7 Dec 2015 09:42:13 +0000 Subject: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion In-Reply-To: <5662050A.5060609@oracle.com> References: <7C9B87B351A4BA4AA9EC95BB41811656722865FE@DEWDFEMB19C.global.corp.sap> <5660B13E.7080509@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116567228683A@DEWDFEMB19C.global.corp.sap> <5662050A.5060609@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672286ACC@DEWDFEMB19C.global.corp.sap> Hi Vladimir, thank you very much for reviewing and sponsoring. GCM is now far from being one of the most compile-time-consuming functions. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, December 4, 2015 22:27 To: Doerr, Martin ; Roland Westrelin (roland.westrelin at oracle.com) ; hotspot-compiler-dev at openjdk.java.net Cc: Lindenmaier, Goetz Subject: Re: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion This looks good. I will push it through JPRT. Thanks, Vladimir On 12/4/15 8:01 AM, Doerr, Martin wrote: > Please use this one: > http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.02/ > It prevents a build warning on Windows. I have added an explicit type cast. > > Best regards, > Martin > > -----Original Message----- > From: Doerr, Martin > Sent: Friday, December 4, 2015 15:03 > To: 'Vladimir Kozlov' ; Roland Westrelin (roland.westrelin at oracle.com) ; hotspot-compiler-dev at openjdk.java.net > Cc: Lindenmaier, Goetz > Subject: RE: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion > > Hi Vladimir, > > thank you very much for your review. > > I have changed the Node_Stack initialization as you suggested. > > In addition, I changed the iterator to really support removal of nodes: > uint idx = MIN2(_stack.index(), self->outcnt()); // Support removal of nodes.
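The clamp in that line matters because the walk saves, for each node on its explicit stack, the position of the next output to visit, and outputs may be removed while the node is still on the stack. The following is a loosely analogous Java sketch. It is not HotSpot's Node_Backward_Iterator, and `GraphNode` and `BackwardWalk` are hypothetical names. It walks outputs from the highest index downward. Each time a frame resumes, it re-clamps the saved index against the current output count, so an index that now lies past the end of the list is never dereferenced.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class GraphNode {
    final int id;
    final List<GraphNode> outs = new ArrayList<>();

    GraphNode(int id) {
        this.id = id;
    }
}

final class BackwardWalk {
    // Explicit stack frame: a node and the saved position just past the next output to visit.
    private static final class Frame {
        final GraphNode node;
        int idx;

        Frame(GraphNode node) {
            this.node = node;
            this.idx = node.outs.size();
        }
    }

    // Post-order over outputs: a node is emitted after every node first reached through it.
    static List<Integer> postOrder(GraphNode root) {
        List<Integer> order = new ArrayList<>();
        Set<GraphNode> visited = new HashSet<>();
        Deque<Frame> stack = new ArrayDeque<>();
        visited.add(root);
        stack.push(new Frame(root));
        while (!stack.isEmpty()) {
            Frame top = stack.peek();
            // Re-clamp: outputs may have been removed since this frame last ran.
            int idx = Math.min(top.idx, top.node.outs.size());
            if (idx > 0) {
                top.idx = --idx;
                GraphNode succ = top.node.outs.get(idx);
                if (visited.add(succ)) {
                    stack.push(new Frame(succ));
                }
            } else {
                stack.pop();
                order.add(top.node.id);
            }
        }
        return order;
    }
}
```

Keeping the stack on the heap avoids deep native recursion on large graphs. The initial capacity of such a stack is the sizing question discussed in this thread.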
> I believe it's currently not needed in openjdk, but it may be needed in the future. > Hope this is ok. > > The new webrev is here: > http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.01/ > > Can anybody volunteer to sponsor, please? > > Best regards, > Martin > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, December 3, 2015 22:17 > To: Doerr, Martin ; Roland Westrelin (roland.westrelin at oracle.com) ; hotspot-compiler-dev at openjdk.java.net > Cc: Lindenmaier, Goetz > Subject: Re: RFR(M): 8136445: Performance issue with Nashorn and C2's global code motion > > You reverted the 8011858 changes, which made the stack much smaller - live_nodes was usually 1/10 of unique nodes: > > - stack.map((C->live_nodes() >> 1) + 16, NULL); > + Node_Stack stack(arena, (C->unique() >> 2) + 16); // pre-grow > > Please use live_nodes with your >> 2 change: > > Node_Stack stack(arena, (C->live_nodes() >> 2) + 16); // pre-grow > > Iterator changes seem fine to me. > > Thanks, > Vladimir > > On 12/3/15 9:17 AM, Doerr, Martin wrote: >> Hi, >> >> I have implemented a change which makes Node_Backward_Iterator more efficient for large graphs. The purpose is to fix >> the performance problem we observe in Octane benchmarks. >> >> It lowers compile time dramatically in case JvmtiExport::_can_access_local_variables is on. >> >> The webrev is here: >> >> http://cr.openjdk.java.net/~mdoerr/8136445_c2_gcm/webrev.00/ >> >> Please review. >> >> The previous version uses an initial node stack size of (C->unique() >> 1) + 16 which can become pretty large. >> >> My webrev changes it to (C->unique() >> 2) + 16 which is still large. I didn't observe any resizing caused by it being too small. >> >> I guess the stack depth typically stays far below this value, but it may be ok to spend e.g. 0.5 MB in extreme cases. >> >> How was that previous value determined? Should I implement it differently?
>> >> Best regards, >> >> Martin >> From aph at redhat.com Mon Dec 7 09:48:36 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 7 Dec 2015 09:48:36 +0000 Subject: [RFR] aarch64: C2 generate vectorized MLA/MLS instructions In-Reply-To: <792430786.13018984.1449412426251.JavaMail.yahoo@mail.yahoo.com> References: <56605447.9070103@redhat.com> <792430786.13018984.1449412426251.JavaMail.yahoo@mail.yahoo.com> Message-ID: <566555F4.6090202@redhat.com> On 06/12/15 14:33, felix yang wrote: > Done. Currently, I have two webrevs which are under review. I have recreated both of them: Bug: https://bugs.openjdk.java.net/browse/JDK-8144201 > Webrev: http://cr.openjdk.java.net/~fyang/8144201/webrev.01 Bug: https://bugs.openjdk.java.net/browse/JDK-8144587 > Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.01 Is that OK? No, the comment is not complete. Please make sure that you have Jcheck installed in your Mercurial. http://openjdk.java.net/projects/code-tools/jcheck/ Andrew. From roland.westrelin at oracle.com Mon Dec 7 09:56:55 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 7 Dec 2015 10:56:55 +0100 Subject: RFR(S): 8139771: Eliminating CastPP nodes at Phis when they all come from a unique input may cause crash In-Reply-To: <56623E4A.9040504@oracle.com> References: <56623E4A.9040504@oracle.com> Message-ID: Hi Vladimir, Thanks for looking at this. > I was confused by naming first and by placement of code. > And mixing code for casts and code in gcm. In reality the cast code tries to find only "immediate"/near dominating cast. So why (i >= 100) and not, lets say, >= 10? It's arbitrary, so if you think 100 is too much, sure, we can go with 10. > I am not sure that code should be in Phase classes. It is only work for cast nodes. I think it should be in ConstraintCast.
The reason I'd like PhaseTransform::is_dominator() is that for 2 other prototypes I'm working on, I also had 2 copies of the same logic, one that I want to apply during IGVN and one that I want to apply during loop opts. I'd like to be able to write that logic only once in a clean way and be able to call it both from IGVN and loop opts. > The only thing you use is to set linear_only parameter. I know that in Identity we don't have a way to find is it GVN or IGVN. It is old problem we should fix. For example instead of passing can_reshape to Ideal it could be field in PhaseTransform set by constructor. Then you could access it from Identity and Ideal. > > An other thing is CheckCastPP. It should be subclass of ConstraintCastNode I think. Then make_cast() and dominating_cast() could be methods of ConstraintCastNode. Ok. > Gcm changes are fine to me. > > I did not get how final_graph_reshaping change related to these changes. We try to assign control to memory which is dominated by current CastPP with control. So if you have chain of CastPP with control you will assign control of first (most dominating) and not the last which is near mem node. I don't think it is correct. In final_graph_reshaping we collect all the controls the memory node depends on, and the one we keep is picked in gcm, so can it really be incorrect to collect more controls? The reason I made that change is that in the test case, the graph before the Phi is removed is: LoadI -> CastPP 1 (to non NULL) -> Phi. After the Phi is removed, we create the new CastPP: LoadI -> CastPP 1 (to non NULL) -> CastPP 2 (carries dependency). CastPP 1 has a control that dominates CastPP 2, so in graph_final_reshaping we want to collect the control of both CastPPs. (Actually, with the new Ideal transforms, the CastPPs will simplify, but it's not guaranteed in every case.) > I am worry a little about irreducible loops which have several entries.
May be only counted loops? Or check the presence of irreducible loops. In IdealLoopTree::beautify_loops(), as I understand: if (_head->req() > 3 && !_irreducible) { split_outer_loop( phase ); result = true; } else if (!_head->is_Loop() && !_irreducible) { // Make a new LoopNode to replace the old loop head Node *l = new LoopNode( _head->in(1), _head->in(2) ); don?t we skip irreducible loops when we create LoopNodes? > Remove -XX:+UseSerialGC flag from test. Otherwise we will get error when testing will try to use other GC. The reason I added that option is because the test doesn?t fail with the default GC (G1). The reason I think is that the G1 post barrier has a wide barrier that prevents the load of saved_not_null to be optimized out (that?s https://bugs.openjdk.java.net/browse/JDK-8087341). Roland. > > Thanks, > Vladimir > > On 12/4/15 5:08 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8139771/webrev.00/ >> >> In PhiNode::Ideal(), when all uncasted inputs are identical, the Phi is removed which can cause a control dependency to be lost. See the test case (which includes a step by step description of how this can lead to a bad graph) for an example. The test case crashes on sparc with -XX:+StressGCM. >> >> To fix that I propose that rather than simply replacing the Phi by its uncasted input, we replace it by a CastPP that is specially marked as carrying a dependency (similar to what we have for CastII). That fixes this issue and should protect us from other similar issues: >> >> - I moved the _carry_dependency from CastII to ConstraintCast and CheckCastPP so it applies to CastPP, CastII and CheckCastPP >> - I added code to remove the casts that carry a dependency if there?s a dominating cast with identical inputs and a more restrictive type. 
I made is_dominator() a virtual method of PhaseTransform so we can have an implementation in GVN and use the same code during GVN and loop opts >> - We can now have a chain of CastPP with a control, so the code in final_graph_reshaping shouldn't follow through a CastPP only if its control is null >> - I changed the code in PhaseCFG::schedule_pinned_nodes() to handle controls that are in the same block >> >> I performed extensive perf testing on x86 and sparc and found no statistically significant regressions. >> >> Roland. >> From aph at redhat.com Mon Dec 7 10:06:29 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 7 Dec 2015 10:06:29 +0000 Subject: RFR (M): 8144748: Move assembler/macroAssembler inline function definitions to corresponding inline.hpp files In-Reply-To: <5661F625.9000204@oracle.com> References: <5661F625.9000204@oracle.com> Message-ID: <56655A25.5020506@redhat.com> On 04/12/15 20:23, Mikael Vidstedt wrote: > In general, inline functions (apart from trivial ones) should be defined > in the inline.hpp file and any .cpp file actually making use of them > should include the inline.hpp file instead. Why does this rule exist? I never did see any reason for it, and I've never seen it in any other project. I guess it can break circular header dependency problems in some cases, but that's no reason it should be a general rule. Just curious, Andrew.
From volker.simonis at gmail.com Mon Dec 7 10:38:42 2015 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 7 Dec 2015 11:38:42 +0100 Subject: RFR (M): 8144748: Move assembler/macroAssembler inline function definitions to corresponding inline.hpp files In-Reply-To: <56655A25.5020506@redhat.com> References: <5661F625.9000204@oracle.com> <56655A25.5020506@redhat.com> Message-ID: On Mon, Dec 7, 2015 at 11:06 AM, Andrew Haley wrote: > On 04/12/15 20:23, Mikael Vidstedt wrote: >> In general, inline functions (apart from trivial ones) should be defined >> in the inline.hpp file and any .cpp file actually making use of them >> should include the inline.hpp file instead. > > Why does this rule exist? I never did see any reason for it, and I've > never seen it in any other project. I guess it can break circular > header dependency problems in some cases, but that's no reason it > should be a general rule. > Breaking dependencies is exactly the reason. The implementations of inline methods usually have a lot more dependencies compared to the plain class definition. So if you only need the class definition (e.g. because you need to declare a field of corresponding type in another class in another .hpp file) you just have to include the .hpp file. If you call methods of the corresponding class (usually in .cpp files) you include the .inline.hpp file. Unfortunately, as with templates, C++ has no standard mechanism for finding definitions of inline (or template) functions. They just have to be available during the compilation of a compilation unit. So the only alternative to the current schema would be to put all the inline function definitions into the .hpp files. But this would be ugly and as you correctly mentioned it would introduce a lot of unnecessary circular dependencies. Following the current rules correctly may seem unnecessarily inconvenient, but I think it is the best we can do. 
Notice that the widespread use of precompiled headers often hides problems which are introduced by not following these rules (because the definition of a required inline function is present in the PCH database although it is not explicitly included into a compilation unit). That's why I, as somebody who has to work with some of the compilers which don't support precompiled headers, often request a test make without precompiled headers in JPRT. Regards, Volker > Just curious, > > Andrew. From martin.doerr at sap.com Mon Dec 7 12:08:19 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 7 Dec 2015 12:08:19 +0000 Subject: RFR(XS): 8144822: Fix build after 8072008 Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672286B3D@DEWDFEMB19C.global.corp.sap> Hi, 8072008 contained a small typo in the ppc.ad file. Fix is here: http://cr.openjdk.java.net/~mdoerr/8144822_fix_ppc_build/webrev.00/ Please review. Best regards, Martin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From edward.nevill at gmail.com Mon Dec 7 12:22:14 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Mon, 07 Dec 2015 12:22:14 +0000 Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV In-Reply-To: <5661CF8B.6040405@redhat.com> References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com> <5661CF8B.6040405@redhat.com> Message-ID: <1449490934.12382.49.camel@mint> On Fri, 2015-12-04 at 17:38 +0000, Andrew Haley wrote: > On 12/04/2015 04:14 PM, Andrew Haley wrote: > I'm going to suggest this as a simpler fix: > > address Relocation::pd_call_destination(address orig_addr) { > assert(is_call(), "should be a call here"); > if (NativeCall::is_call_at(addr())) { // is a BL instruction > address trampoline = nativeCall_at(addr())->get_trampoline(); > if (trampoline) { > return nativeCallTrampolineStub_at(trampoline)->destination(); > } > } > if (orig_addr != NULL) { > return MacroAssembler::pd_call_destination(orig_addr); > } > return MacroAssembler::pd_call_destination(addr()); > } > > I think it's right because this way we only follow real BL > instructions, and if these point to trampolines they must be within > the blob which is being relocated. I think this will fix your problem > because such BL instructions cannot point to anywhere wild. I am not sure this works. Firstly, in the case that far_branches are not enabled (IE the code cache is <= 128m), then there could be BL instructions to other addresses outside the current code blob. These are generated by far_call as follows. if (far_branches()) { unsigned long offset; // We can use ADRP here because we know that the total size of // the code cache cannot exceed 2Gb. 
adrp(tmp, entry, offset); add(tmp, tmp, offset); if (cbuf) cbuf->set_insts_mark(); blr(tmp); } else { if (cbuf) cbuf->set_insts_mark(); bl(entry); } I cannot see what prevents one of these BLs from being followed, and since they may have been copied but not relocated, they may end up pointing somewhere random in the code buffer which just happens to look like a trampoline. Admittedly, the probability of failure is vastly reduced because there are no genuine trampolines for it to latch on to. This case can be avoided by adding a far_branches() predicate to pd_call_destination as follows. if (far_branches() && NativeCall::is_call_at(addr())) { // is a BL instruction Second, I am not sure that your assertion > (When a trampoline call is first created it is a call to self; the > reloc is the only way to find the trampoline. For this reason, you > must use nativeCall_at(addr())->get_trampoline().) is correct. In MacroAssembler::trampoline_call I see if (Assembler::reachable_from_branch_at(pc(), entry.target())) { bl(entry.target()); } else { bl(pc()); } so it only creates a call to self if the branch does not reach, and as before you could have a dangling BL when this is copied. I believe it would be possible to replace the above code section with simply bl(pc()); since it will always be relocated and therefore you can always generate the call to self. All of this seems very fragile and I am wondering about the value of trampolines. The alternative to using trampolines would be to always generate adrp Xn, target & ~0xfff add Xn, Xn, target & 0xfff blr Xn On most modern, out-of-order, dual-issue implementations the ADRP and ADD will be folded into a single micro-op, which will then be dual issued with the BLR, so it doesn't end up costing us anything. I did some experiments on 2 different implementations comparing the following 3 code fragments (where 'tramp_dest' is the final destination to be called).
1) Straight BL tramp_test: mov x2, x30 tramp1: bl tramp_dest subs x0, x0, #1 bne tramp1 ret x2 2) Straight ADRP/ADD tramp_test: mov x2, x30 tramp1: adr x3, tramp_dest add x3, x3, #0x0 blr x3 subs x0, x0, #1 bne tramp1 ret x2 3) Trampoline tramp_test: mov x2, x30 tramp1: bl tramp subs x0, x0, #1 bne tramp1 ret x2 tramp: ldr x1, tramp_adcon br x1 tramp_adcon: .dword tramp_dest I ran the above tests on 2 different implementations for 1E9 iterations. The results were: Imp 1: Straight BL = 4.50157 sec, ADRP/ADD = 4.50157 sec, trampoline = 6.00209 sec Imp 2: Straight BL = 3.00107 sec, ADRP/ADD = 3.00106 sec, trampoline = 4.16815 sec Maybe we could just get rid of trampolines? All the best, Ed. From pavel.punegov at oracle.com Mon Dec 7 12:20:50 2015 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Mon, 7 Dec 2015 15:20:50 +0300 Subject: RFR(S): 8140667: CompilerControl: tests incorrectly set states for excluded methods In-Reply-To: <565CF076.7020704@oracle.com> References: <93AEE95F-3BDB-4A20-9547-C276019E54A2@oracle.com> <565CF076.7020704@oracle.com> Message-ID: <59A5ABB7-9717-4235-9ACE-92EFB2C745B2@oracle.com> Thanks for the review, Vladimir. Pavel. > On 01 Dec 2015, at 03:57, Vladimir Kozlov wrote: > > Good. > > Thanks, > Vladimir > > On 11/25/15 1:58 PM, Pavel Punegov wrote: >> Please review this fix for tests. >> >> Issue: CompilerOracle checks CompileCommands for being in excluded list >> or compileonly list while deciding to compile method or not. >> CompilerControl test framework assumes that these commands override each >> other. >> >> This fix makes tests to behave in the same manner as CompilerOracle >> does. Fix adds an internal singleton class to AbstractCommandBuilder >> used for creating a test commands (-XX:CompileCommand and >> CompileCommandFile) >> >> webrev: http://cr.openjdk.java.net/~ppunegov/8140667/webrev.00/ >> bug: https://bugs.openjdk.java.net/browse/JDK-8140667 >> >>
Thanks, >> Pavel Punegov >> From goetz.lindenmaier at sap.com Mon Dec 7 13:28:54 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 7 Dec 2015 13:28:54 +0000 Subject: RFR(XS): 8144822: Fix build after 8072008 In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672286B3D@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672286B3D@DEWDFEMB19C.global.corp.sap> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDB087@DEWDFEMB12A.global.corp.sap> Hi Martin, thanks for fixing this, looks good. Could you please prefix the bug with 'ppc64'? (I don't need a webrev for this). Thanks, Goetz. From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin Sent: Montag, 7. Dezember 2015 13:08 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(XS): 8144822: Fix build after 8072008 Hi, 8072008 contained a small typo in the ppc.ad file. Fix is here: http://cr.openjdk.java.net/~mdoerr/8144822_fix_ppc_build/webrev.00/ Please review. Best regards, Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From aph at redhat.com Mon Dec 7 14:20:37 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 7 Dec 2015 14:20:37 +0000 Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV In-Reply-To: <1449490934.12382.49.camel@mint> References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com> <5661CF8B.6040405@redhat.com> <1449490934.12382.49.camel@mint> Message-ID: <566595B5.9060400@redhat.com> On 12/07/2015 12:22 PM, Edward Nevill wrote: > I cannot see what prevents one of these BLs from being followed and > since they may have been copied but not relocated then they may end > up pointing somewhere random in the code buffer which just happens > to look like a trampoline. Admittedly, the probability of failure is > vastly reduced because there are no genuine trampolines for it to > latch on to. You must look inside get_trampoline(). 
It checks for this. > Second, I am not sure that your assertion > >> (When a trampoline call is first created it is a call to self; the >> reloc is the only way to find the trampoline. For this reason, you >> must use nativeCall_at(addr())->get_trampoline().) > > is correct. In MacroAssembler::trampoline_call I see > > if (Assembler::reachable_from_branch_at(pc(), entry.target())) { > bl(entry.target()); > } else { > bl(pc()); > } > > so it only creates a call to self if the branch does not reach and > as before you could have a dangling BL when this is copied. It doesn't matter, because get_trampoline() checks for BLs outside the current method. > I believe it would be possible to replace the above code section > with simply > > bl(pc()); > since it will always be relocated and therefore you can always > generate the call to self. True. There are some other tidy-ups which could also be made in this area, but none of it is terribly important as far as I can see. > Maybe we could just get rid of trampolines? There's no need. In the commonest case we BL directly to the destination, which is optimal. Your ADRP/ADD examples aren't patchable; if you are going to compare trampolines with something else, whatever else you choose must be patchable, and it will be slower and/or larger than BL. Andrew. From martin.doerr at sap.com Mon Dec 7 14:31:52 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 7 Dec 2015 14:31:52 +0000 Subject: RFR(XS): 8144822: Fix build after 8072008 In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDB087@DEWDFEMB12A.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672286B3D@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41EDB087@DEWDFEMB12A.global.corp.sap> Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672286BA6@DEWDFEMB19C.global.corp.sap> Hi Götz, thanks for reviewing and sponsoring. Best regards, Martin From: Lindenmaier, Goetz Sent: Montag, 7.
Dezember 2015 14:29 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(XS): 8144822: Fix build after 8072008 Hi Martin, thanks for fixing this, looks good. Could you please prefix the bug with 'ppc64'? (I don't need a webrev for this). Thanks, Goetz. From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin Sent: Montag, 7. Dezember 2015 13:08 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(XS): 8144822: Fix build after 8072008 Hi, 8072008 contained a small typo in the ppc.ad file. Fix is here: http://cr.openjdk.java.net/~mdoerr/8144822_fix_ppc_build/webrev.00/ Please review. Best regards, Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.hartmann at oracle.com Mon Dec 7 14:39:19 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 7 Dec 2015 15:39:19 +0100 Subject: RFR:8144771: AVX3 patch for MacroAssembler::string_compare In-Reply-To: <39F83597C33E5F408096702907E6C4500F108D6C@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F108D6C@ORSMSX104.amr.corp.intel.com> Message-ID: <56659A17.6010300@oracle.com> Hi Jan, the intrinsic looks good to me (not a reviewer). Here are two minor suggestions: - The following comments are wrong: 8355 } else { //ae == StrIntrinsicNode::UL 8356 load_unsigned_short(cnt1, Address(str2, result, scale2)); // L string 8357 load_unsigned_byte(result, Address(str1, result, scale1)); // U string The first line then loads a UTF16 (two-byte) String and the second line loads a Latin1 (one-byte) String. Maybe you should also exchange the lines to first load str1 and then load str2. I would omit the comment after "else" because ae could either be UL or LU (both have the Latin1 string in str1). - Missing whitespace after comma: 8143 cmpl(cnt2,stride2x2); I assume you executed the hotspot JTREG tests (including /compiler/intrinsics/string/TestStringIntrinsics.java). 
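The sketch below shows the scalar contract that these loads feed. It assumes Java `String.compareTo` semantics: the result is the difference of the first mismatching chars, or else the difference of the lengths. `compare_latin1_utf16` is a hypothetical helper, not the HotSpot intrinsic. It illustrates why the Latin1 side needs a zero-extending byte load (`load_unsigned_byte`) and the UTF-16 side a zero-extending 16-bit load (`load_unsigned_short`).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: compare a Latin1 string (one byte per char) with a
// UTF-16 string (one 16-bit unit per char) using String.compareTo semantics.
int compare_latin1_utf16(const uint8_t* l, size_t llen,
                         const char16_t* u, size_t ulen) {
  size_t n = llen < ulen ? llen : ulen;
  for (size_t i = 0; i < n; i++) {
    int c1 = l[i];  // zero-extended byte (Latin1 side)
    int c2 = u[i];  // zero-extended 16-bit unit (UTF-16 side)
    if (c1 != c2) return c1 - c2;  // first mismatch decides the result
  }
  // Common prefix is equal: the shorter string orders first.
  return static_cast<int>(llen) - static_cast<int>(ulen);
}
```

A Latin1 byte such as 0xE9 must compare equal to the UTF-16 unit 0x00E9. A sign-extending byte load would turn it into a negative value and break that equality.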
Best, Tobias On 05.12.2015 05:07, Civlin, Jan wrote: > We would like to contribute AVX3 patch for MacroAssembler::string_compare. > > This utilizes 512 bits registers on AVX3 architecture and delivers performance gain (speed-up) on long strings at about x 1.33 and on random string about x 1.22. This was measured vs AVX2 (256 bits registers). > > > Contributors: > MacroAssembler::string_compare - Jan Civlin. > Rest of code, including all x86 AVX3 extensions - Michael Berg > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8144771 > Webrev: http://cr.openjdk.java.net/~kvn/8144771/webrev/ > From goetz.lindenmaier at sap.com Mon Dec 7 15:12:19 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 7 Dec 2015 15:12:19 +0000 Subject: RFR(M): 8144466: ppc64: fix argument passing through opto stubs. Message-ID: <4295855A5C1DE049A61835A1887419CC41EDB0FA@DEWDFEMB12A.global.corp.sap> Hi, I need to fix the calls to runtime for ppc because it expects int being properly sign extended to long. In 8086069, we tried to push this to the platform code, but for opto stubs this is not possible. Please review this change. I please need a sponsor. http://cr.openjdk.java.net/~goetz/webrevs/8144466-ppcOptoStubs/webrev.00/ The change comes with a corresponding test. I also added a test for a problem we fixed before, where floating point args were passed wrong. Best regards, Goetz. -------------- next part -------------- An HTML attachment was scrubbed... URL: From felix.yang at linaro.org Mon Dec 7 15:26:06 2015 From: felix.yang at linaro.org (Felix Yang) Date: Mon, 7 Dec 2015 23:26:06 +0800 Subject: RFR: 8144587: aarch64: generate vectorized MLA/MLS instructions Message-ID: Hi, I have corrected the webrev issues in my previous mail. Thanks Edward for providing the help. Now I am resending this mail: Can someone help review and sponsor this code generation improvement for aarch64 port? 
Bug: https://bugs.openjdk.java.net/browse/JDK-8144587 Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.02 The hotspot/test/compiler/loopopts/superword/SumRed_Int.java can serve as a test case. With this patch, the following code snippet generated by C2: 0x0000007f6cec12cc: mul v19.4s, v16.4s, v17.4s 0x0000007f6cec12d0: mul v16.4s, v16.4s, v18.4s 0x0000007f6cec12d4: mul v17.4s, v18.4s, v17.4s 0x0000007f6cec12d8: add v16.4s, v19.4s, v16.4s 0x0000007f6cec12dc: add v16.4s, v16.4s, v17.4s will be further optimized into: 0x0000007f9cdb86dc: mul v19.4s, v16.4s, v17.4s 0x0000007f9cdb86e0: mla v19.4s, v16.4s, v18.4s 0x0000007f9cdb86e4: mla v19.4s, v17.4s, v18.4s About a 13% performance gain was achieved for the test case on my aarch64 server. Tested with jtreg hotspot & langtools. Results are the same before and after. Is it OK to push? Felix, Thanks for your help. From roland.westrelin at oracle.com Mon Dec 7 15:32:43 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 7 Dec 2015 16:32:43 +0100 Subject: RFR(S): 8134883: C1 hard crash in range check elimination in Nashorn test262parallel In-Reply-To: <6A256077-5431-4842-8EA5-E1AA6582E5F2@oracle.com> References: <02C0AC97-22B8-415F-93D2-6F5D73036493@oracle.com> <6A256077-5431-4842-8EA5-E1AA6582E5F2@oracle.com> Message-ID: <96061182-9D07-4D7C-9636-D581BE6F2CCD@oracle.com> Thanks for the review, Igor. Roland. From edward.nevill at gmail.com Mon Dec 7 16:21:17 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Mon, 07 Dec 2015 16:21:17 +0000 Subject: [aarch64-port-dev ] RFR: 8144587: aarch64: generate vectorized MLA/MLS instructions In-Reply-To: References: Message-ID: <1449505277.12382.69.camel@mint> Hi Felix, Thanks for this. This optimisation looks good to me. Could we have an official reviewer please. Thanks, Ed.
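A scalar model of Felix's example shows why the rewrite preserves values. Each `.4s` register is modelled as four `uint32_t` lanes, which wrap modulo 2^32 like Java `int`. `mul`, `add` and `mla` are hypothetical helpers that mirror the instruction semantics (MLA accumulates `a[i] * b[i]` into the destination). They are not the C2 matcher rules from the webrev.

```cpp
#include <array>
#include <cstdint>

// A .4s NEON register modelled as four 32-bit lanes; uint32_t arithmetic
// wraps modulo 2^32, matching the low 32 bits of Java int arithmetic.
using V4 = std::array<uint32_t, 4>;

V4 mul(const V4& a, const V4& b) {  // MUL Vd.4S, Vn.4S, Vm.4S
  V4 r{};
  for (int i = 0; i < 4; i++) r[i] = a[i] * b[i];
  return r;
}

V4 add(const V4& a, const V4& b) {  // ADD Vd.4S, Vn.4S, Vm.4S
  V4 r{};
  for (int i = 0; i < 4; i++) r[i] = a[i] + b[i];
  return r;
}

V4 mla(V4 acc, const V4& a, const V4& b) {  // MLA: acc += a * b, per lane
  for (int i = 0; i < 4; i++) acc[i] += a[i] * b[i];
  return acc;
}

// The five-instruction sequence emitted before the patch.
V4 before(V4 v16, V4 v17, V4 v18) {
  V4 v19 = mul(v16, v17);
  v16 = mul(v16, v18);
  v17 = mul(v18, v17);
  v16 = add(v19, v16);
  v16 = add(v16, v17);
  return v16;
}

// The three-instruction sequence after the patch (result ends up in v19).
V4 after(V4 v16, V4 v17, V4 v18) {
  V4 v19 = mul(v16, v17);
  v19 = mla(v19, v16, v18);
  v19 = mla(v19, v17, v18);
  return v19;
}
```

Both sequences compute `v16*v17 + v16*v18 + v17*v18` per lane. The fused form saves two instructions and one temporary register.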
On Mon, 2015-12-07 at 23:26 +0800, Felix Yang wrote: > Hi, > > I have corrected the webrev issues in my previous mail. Thanks Edward for > providing the help. > Now I am resending this mail: > > Can someone help review and sponsor this code generation improvement for > aarch64 port? > Bug: https://bugs.openjdk.java.net/browse/JDK-8144587 > Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.02 > > The hotspot/test/compiler/loopopts/superword/SumRed_Int.java can server > as a test case. > With this patch, the following code snippet by C2: > 0x0000007f6cec12cc: mul v19.4s, v16.4s, v17.4s > 0x0000007f6cec12d0: mul v16.4s, v16.4s, v18.4s > 0x0000007f6cec12d4: mul v17.4s, v18.4s, v17.4s > 0x0000007f6cec12d8: add v16.4s, v19.4s, v16.4s > 0x0000007f6cec12dc: add v16.4s, v16.4s, v17.4s > will be further optimized into: > 0x0000007f9cdb86dc: mul v19.4s, v16.4s, v17.4s > 0x0000007f9cdb86e0: mla v19.4s, v16.4s, v18.4s > 0x0000007f9cdb86e4: mla v19.4s, v17.4s, v18.4s > > About 13% performance gain achieved for the test case on my aarch64 > server. > Tested with jtreg hotspot & langtools. Results are the same before and > after. > Is it OK to push? > > Felix, > Thanks for your help. From roland.westrelin at oracle.com Mon Dec 7 16:31:06 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 7 Dec 2015 17:31:06 +0100 Subject: RFR: 8144587: aarch64: generate vectorized MLA/MLS instructions In-Reply-To: References: Message-ID: > Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.02 That looks good to me. Roland. From martin.doerr at sap.com Mon Dec 7 17:10:06 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 7 Dec 2015 17:10:06 +0000 Subject: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672286C26@DEWDFEMB19C.global.corp.sap> Hi, I have created a webrev for further PPC64 updates: AIX supports Transactional Memory with a certain kernel patch level. 
Add a detection for it and make UseRTMLocking usable on AIX. In addition, implement Atomic::cmpxchg for jbyte. The webrev is here: http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.00/ Please review. Best regards, Martin From tom.rodriguez at oracle.com Mon Dec 7 17:21:09 2015 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 7 Dec 2015 09:21:09 -0800 Subject: RFR (S): 8143571: [JVMCI] Double unregistering of nmethod during unloading In-Reply-To: References: Message-ID: <5DB81A6C-FFA9-4121-9347-B3163CE81BE8@oracle.com> Thanks for the reviews. tom > On Dec 4, 2015, at 4:09 PM, Christian Thalinger wrote: > > Yes, seems good. > >> On Dec 4, 2015, at 11:02 AM, Tom Rodriguez > wrote: >> >> http://cr.openjdk.java.net/~never/8143571/webrev.00/ >> >> This is a follow-on fix for 8142436, which introduced some assertion failures when running with G1. They were benign in practice, but they indicated that more careful updating of those fields was required. In the end the idea is that we never clear the _installed_code field using the barrier logic except during the do_unloading methods. The other places where we want to clear the installed code are when it becomes unloaded or zombie, and those are explicitly unregistered anyway. We explicitly clear _installed_code after the unregister for clarity instead. >> >> It also became clear that we needed to ensure that updates to the installed code were performed under a lock to make sure it had a consistent state. Since part of the updates were already being done under the Patching_lock because they were part of make_not_entrant_or_zombie, I used the Patching_lock in other places to protect updates. This has been in our graal repo with nightly fastdebug testing for the last few weeks without issue. >> >> tom >
From mark.reinhold at oracle.com Mon Dec 7 17:58:25 2015 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Mon, 07 Dec 2015 09:58:25 -0800 Subject: Reference.reachabilityFence In-Reply-To: <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> References: <2D27BCFB-77ED-4C83-985E-102DC4B41C97@oracle.com>, , <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com>, <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> Message-ID: <20151207095825.952677@eggemoggin.niobe.net> 2015/12/4 5:47 -0800, paul.sandoz at oracle.com: >> On 3 Dec 2015, at 22:33, Mandy Chung wrote: >>> On Nov 26, 2015, at 8:22 AM, Paul Sandoz wrote: >>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/ >>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-hotspot/webrev/ >>> >>> There is now more documentation on Reference (copied and suitably >>> rearranged from 166 Fences.java). The method name remains the same. >> >> I think the addition to the Reference class specification should >> belong to the reachabilityFence method specification. Any reason why >> not? > > I thought it would be more visible in the JavaDoc, as it's there > upfront. The api note may get larger if we include some additional > real world examples. I don't have a strong opinion on this; if yours > is stronger I will move it :-) I agree with Mandy -- the new text about fences belongs in the method doc, not the class doc. Further comments, mostly minor: - In the opening sentence, s/strongly reachability/strong reachability/. - I'd remove the phrase "As illustrated in the sample usages of the api note below" from the normative text. The API note follows immediately; there's no need to point to it. - s/a Java Virtual Machine/the virtual machine/ - s/A garbage collector/The garbage collector/ - s/call to/invocation of/ - s/ for example /, for example,/ - s/if it were OK/if it were acceptable/ ("OK" is a bit too informal) - s!in general!, in general,!
- s/Fences.reachabilityFence/Reference.reachabilityFence/ in the examples - I now agree with you and Doug about calling this a "fence". Can we just name it "fence" rather than the wordier "reachabilityFence"? Looking at a typical invocation, Reference.reachabilityFence(); seems a bit redundant while Reference.fence(); reads quite nicely. Is there, or will there ever be, any other kind of reference-related fence? - Mark From vivek.r.deshpande at intel.com Mon Dec 7 19:29:02 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Mon, 7 Dec 2015 19:29:02 +0000 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 In-Reply-To: <56620192.5050808@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <565E4DD2.1030200@oracle.com> <565E511E.9020503@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CF892@ORSMSX106.amr.corp.intel.com> <56620192.5050808@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569D1AC3@ORSMSX106.amr.corp.intel.com> Hi Vladimir We have updated the jbs entry with your suggested changes for the flag. Would you please review it. jbs entry: https://bugs.openjdk.java.net/browse/JDK-8143355 webrev is at: http://cr.openjdk.java.net/~mcberg/8143355/webrev.03/ Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, December 04, 2015 1:12 PM To: Deshpande, Vivek R; hotspot compiler Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; Paul Sandoz Subject: Re: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 You don't need now #ifdef COMPILER2 (in vm_version_x86.cpp). + #ifdef COMPILER2 + #ifdef _LP64 + if (UseSSE42Intrinsics) { Also you need to add to all other platforms vm_version_.cpp setting flag to false. See UseAdler32Intrinsics settings as example. 
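The word-at-a-time idea behind a vectorized mismatch search can be sketched in scalar C++. This is an illustration under stated assumptions. `mismatch_bytes` is hypothetical and does not reproduce the signature or return contract of `ArraysSupport.vectorizedMismatch`. It compares 8 bytes per step, where an AVX2 version would compare 32 bytes (one 256-bit register) per step.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: index of the first differing byte in a[0..n) and
// b[0..n), or -1 if the ranges are equal.
long mismatch_bytes(const uint8_t* a, const uint8_t* b, size_t n) {
  size_t i = 0;
  // Fast path: compare one 64-bit word per iteration.
  for (; i + 8 <= n; i += 8) {
    uint64_t wa, wb;
    std::memcpy(&wa, a + i, 8);  // memcpy avoids unaligned-access UB
    std::memcpy(&wb, b + i, 8);
    if (wa != wb) break;  // the mismatch lies somewhere in this word
  }
  // Slow path: pinpoint the byte inside the differing word, or scan the tail.
  for (; i < n; i++) {
    if (a[i] != b[i]) return static_cast<long>(i);
  }
  return -1;
}
```

Locating the byte with a short scan keeps the sketch independent of byte order. An optimized version would typically use a trailing-zero count on the XOR of the two words instead, which does depend on endianness.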
Thanks, Vladimir On 12/4/15 11:26 AM, Deshpande, Vivek R wrote: > Hi Vladimir > > We have updated the webrev at the jbs entry with the global flag. > This is the link for your review. > http://cr.openjdk.java.net/~mcberg/8143355/webrev.02/ > > Regards > Vivek > -----Original Message----- > From: Deshpande, Vivek R > Sent: Wednesday, December 02, 2015 11:21 AM > To: 'Vladimir Kozlov'; hotspot compiler > Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; 'Paul Sandoz' > Subject: RE: RFR (M): 8143355: Update for addition of > vectorizedMismatch intrinsic for x86 > > Hi Vladimir > > Yes the 2x performance gain is using AVX2 instructions for big arrays(~1k). > We will update the patch and jbs entry with global flag and let you know soon. > > Regards, > Vivek > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, December 01, 2015 6:02 PM > To: Deshpande, Vivek R; hotspot compiler > Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric > Subject: Re: RFR (M): 8143355: Update for addition of > vectorizedMismatch intrinsic for x86 > > 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. > > If that is the case the flag should be global. > > Thanks, > Vladimir > > On 12/1/15 5:48 PM, Vladimir Kozlov wrote: >> This seems fine. 2x is for AVX implementation? >> >> Thanks, >> Vladimir >> >> On 11/24/15 4:00 PM, Deshpande, Vivek R wrote: >>> Hi all >>> >>> We would like to contribute a patch from Intel which optimizes >>> vectorizedMismatch() method in java.util.ArraysSupport.java for X86 >>> architecture using AVX instructions. >>> >>> The improvement gives more than 2x gain over Unsafe implementation >>> for long arrays. >>> >>> >>> The bug is blocked by bug: vectorized support for array >>> equals/compare/mismatch using Unsafe >>> (https://bugs.openjdk.java.net/browse/JDK-8136924.) >>> >>> Could you please review and sponsor this patch. 
>>> >>> Bug-id: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8143355 >>> webrev: >>> >>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ >>> >>> Thanks and regards, >>> >>> Vivek >>> From christian.thalinger at oracle.com Mon Dec 7 21:10:15 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 7 Dec 2015 11:10:15 -1000 Subject: 8144223: Move j.l.invoke.{ForceInline, DontInline, Stable} to jdk.internal.vm.annotation package In-Reply-To: References: <22C54219-695B-486F-AEAA-7B96473DEDF4@oracle.com> Message-ID: <45717074-E9E8-4CA5-9C62-71FCB013CBDE@oracle.com> > On Dec 2, 2015, at 11:11 AM, Paul Sandoz wrote: > >> >> On 2 Dec 2015, at 21:58, Mandy Chung wrote: >> >> >>> On Nov 30, 2015, at 9:40 AM, Paul Sandoz wrote: >>> >>> Please review: >>> >>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-jdk/webrev/ >>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-hotspot/webrev/ >> >> >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-hotspot/webrev/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/Stable.java.cdiff.html >> >> 32 * This annotation functions as an alias for the jdk.internal.Stable annotation within JVMCI >> >> s/jdk.internal.Stable/jdk.internal.vm.annotation.Stable >> >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-jdk/webrev/src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java.frames.html >> >> 1327 mv.visitAnnotation("Ljdk/internal/DontInline;", true); >> >> need fixing. >> > > Oops, that's embarrassing, I fat-fingered the search/replace. Our tests don't catch such cases of non-existent annotations. I never liked the fact that we are using hardcoded strings here. Getting the name from the class would be better. > > Updated, thanks, > Paul. > > >> Otherwise, looks good.
>> >> Mandy -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Tue Dec 8 00:26:21 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 7 Dec 2015 16:26:21 -0800 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A569D1AC3@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <565E4DD2.1030200@oracle.com> <565E511E.9020503@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CF892@ORSMSX106.amr.corp.intel.com> <56620192.5050808@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569D1AC3@ORSMSX106.amr.corp.intel.com> Message-ID: <566623AD.8060709@oracle.com> Looks good. I will push it when closed part (flag = false) reviewed. I will modify vm_version_x86.cpp to move setting to false in 32-bit VM code to be #else part of flag's setting. Thanks, Vladimir On 12/7/15 11:29 AM, Deshpande, Vivek R wrote: > Hi Vladimir > > We have updated the jbs entry with your suggested changes for the flag. > Would you please review it. > jbs entry: https://bugs.openjdk.java.net/browse/JDK-8143355 > webrev is at: http://cr.openjdk.java.net/~mcberg/8143355/webrev.03/ > > Regards, > Vivek > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Friday, December 04, 2015 1:12 PM > To: Deshpande, Vivek R; hotspot compiler > Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; Paul Sandoz > Subject: Re: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 > > You don't need now #ifdef COMPILER2 (in vm_version_x86.cpp). > > + #ifdef COMPILER2 > + #ifdef _LP64 > + if (UseSSE42Intrinsics) { > > Also you need to add to all other platforms vm_version_.cpp setting flag to false. See UseAdler32Intrinsics settings as example. 
> > Thanks, > Vladimir > > On 12/4/15 11:26 AM, Deshpande, Vivek R wrote: >> Hi Vladimir >> >> We have updated the webrev at the jbs entry with the global flag. >> This is the link for your review. >> http://cr.openjdk.java.net/~mcberg/8143355/webrev.02/ >> >> Regards >> Vivek >> -----Original Message----- >> From: Deshpande, Vivek R >> Sent: Wednesday, December 02, 2015 11:21 AM >> To: 'Vladimir Kozlov'; hotspot compiler >> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; 'Paul Sandoz' >> Subject: RE: RFR (M): 8143355: Update for addition of >> vectorizedMismatch intrinsic for x86 >> >> Hi Vladimir >> >> Yes the 2x performance gain is using AVX2 instructions for big arrays(~1k). >> We will update the patch and jbs entry with global flag and let you know soon. >> >> Regards, >> Vivek >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, December 01, 2015 6:02 PM >> To: Deshpande, Vivek R; hotspot compiler >> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric >> Subject: Re: RFR (M): 8143355: Update for addition of >> vectorizedMismatch intrinsic for x86 >> >> 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. >> >> If that is the case the flag should be global. >> >> Thanks, >> Vladimir >> >> On 12/1/15 5:48 PM, Vladimir Kozlov wrote: >>> This seems fine. 2x is for AVX implementation? >>> >>> Thanks, >>> Vladimir >>> >>> On 11/24/15 4:00 PM, Deshpande, Vivek R wrote: >>>> Hi all >>>> >>>> We would like to contribute a patch from Intel which optimizes >>>> vectorizedMismatch() method in java.util.ArraysSupport.java for X86 >>>> architecture using AVX instructions. >>>> >>>> The improvement gives more than 2x gain over Unsafe implementation >>>> for long arrays. >>>> >>>> >>>> The bug is blocked by bug: vectorized support for array >>>> equals/compare/mismatch using Unsafe >>>> (https://bugs.openjdk.java.net/browse/JDK-8136924.) 
>>>> >>>> Could you please review and sponsor this patch. >>>> >>>> Bug-id: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8143355 >>>> webrev: >>>> >>>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ >>>> >>>> Thanks and regards, >>>> >>>> Vivek >>>> From vladimir.kozlov at oracle.com Tue Dec 8 01:17:47 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 7 Dec 2015 17:17:47 -0800 Subject: RFR(M): 8144466: ppc64: fix argument passing through opto stubs. In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDB0FA@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC41EDB0FA@DEWDFEMB12A.global.corp.sap> Message-ID: <56662FBB.3070006@oracle.com> Looks fine. Do you need to do anything for C1 which you have now? Thanks, Vladimir On 12/7/15 7:12 AM, Lindenmaier, Goetz wrote: > Hi, > > I need to fix the calls to runtime for ppc because it expects int > > being properly sign extended to long. > > In 8086069, we tried to push this to the platform code, but for > > opto stubs this is not possible. > > Please review this change. I please need a sponsor. > > http://cr.openjdk.java.net/~goetz/webrevs/8144466-ppcOptoStubs/webrev.00/ > > The change comes with a corresponding test. > > I also added a test for a problem we fixed before, where > > floating point args were passed wrong. > > Best regards, > > Goetz. > From vladimir.kozlov at oracle.com Tue Dec 8 02:06:33 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 7 Dec 2015 18:06:33 -0800 Subject: RFR(S): 8139771: Eliminating CastPP nodes at Phis when they all come from a unique input may cause crash In-Reply-To: References: <56623E4A.9040504@oracle.com> Message-ID: <56663B29.7050508@oracle.com> On 12/7/15 1:56 AM, Roland Westrelin wrote: > Hi Vladimir, > > Thanks for looking at this. > >> I was confused by naming first and by placement of code. >> And mixing code for casts and code in gcm. 
In reality the cast code tries to find only an "immediate"/near dominating cast. So why (i >= 100) and not, let's say, >= 10? > > It's arbitrary, so if you think 100 is too much, sure, we can go with 10. What do you think is a reasonable number, based on the tests you have? I think 100 is a waste of time but 10 may not be enough. > >> I am not sure that code should be in Phase classes. It only works for cast nodes. I think it should be in ConstraintCast. > > The reason I'd like PhaseTransform::is_dominator() is that for 2 other prototypes I'm working on, I also had 2 copies of the same logic, one that I want to apply during IGVN and one that I want to apply during loop opts. I'd like to be able to write that logic only once in a clean way and be able to call it both from IGVN and loop opts. Understood. Do all these cases check only Cast nodes, or others too? It may not be safe in the general case. Also, since you are not looking at the whole graph, the method name should be something like is_near_dominator(). > >> The only thing you use is to set the linear_only parameter. I know that in Identity we don't have a way to find out whether it is GVN or IGVN. It is an old problem we should fix. For example, instead of passing can_reshape to Ideal it could be a field in PhaseTransform set by the constructor. Then you could access it from Identity and Ideal. >> >> Another thing is CheckCastPP. It should be a subclass of ConstraintCastNode, I think. Then make_cast() and dominating_cast() could be methods of ConstraintCastNode. > > Ok. > >> Gcm changes are fine to me. >> >> I did not get how the final_graph_reshaping change is related to these changes. We try to assign control to memory which is dominated by the current CastPP with control. So if you have a chain of CastPPs with control, you will assign the control of the first (most dominating) one and not the last one, which is near the mem node. I don't think it is correct.
> > In final_graph_reshaping we collect all controls the memory node depends on, and the one we keep is picked in gcm, so can it really be incorrect to collect more controls? > The reason I made that change is that in the test case, the graph before the Phi is removed is: > > LoadI -> CastPP 1 (to non NULL) -> Phi > > after the Phi is removed, we create the new CastPP: > > LoadI -> CastPP 1 (to non NULL) -> CastPP 2 (carries dependency) > > CastPP 1 has a control that dominates CastPP 2, so in final_graph_reshaping we want to collect the control of both CastPPs. I see. So it is the case where the control sequence is the opposite of the CastPP sequence. You are right - we need to collect all controls. > > (Actually, with the new Ideal transforms, the CastPPs will simplify, but it's not guaranteed in every case) > >> I am a little worried about irreducible loops which have several entries. So you may get into trouble by checking an ordinary loop in IfNode::up_one_dom(). Maybe only counted loops? Or check for the presence of irreducible loops. > > In IdealLoopTree::beautify_loops(), as I understand: > > if (_head->req() > 3 && !_irreducible) { > split_outer_loop( phase ); > result = true; > > } else if (!_head->is_Loop() && !_irreducible) { > // Make a new LoopNode to replace the old loop head > Node *l = new LoopNode( _head->in(1), _head->in(2) ); > > don't we skip irreducible loops when we create LoopNodes? I thought on some path we may create it, but it looks like that is not the case. Okay then. > >> Remove the -XX:+UseSerialGC flag from the test. Otherwise we will get an error when testing tries to use another GC. > > The reason I added that option is that the test doesn't fail with the default GC (G1). The reason, I think, is that the G1 post barrier has a wide barrier that prevents the load of saved_not_null from being optimized out (that's https://bugs.openjdk.java.net/browse/JDK-8087341).
Add @requires vm.gc=="Serial" (see https://bugs.openjdk.java.net/browse/JDK-8062537). Thanks, Vladimir > > Roland. > >> >> Thanks, >> Vladimir >> >> On 12/4/15 5:08 AM, Roland Westrelin wrote: >>> http://cr.openjdk.java.net/~roland/8139771/webrev.00/ >>> >>> In PhiNode::Ideal(), when all uncasted inputs are identical, the Phi is removed, which can cause a control dependency to be lost. See the test case (which includes a step-by-step description of how this can lead to a bad graph) for an example. The test case crashes on sparc with -XX:+StressGCM. >>> >>> To fix that I propose that rather than simply replacing the Phi by its uncasted input, we replace it by a CastPP that is specially marked as carrying a dependency (similar to what we have for CastII). That fixes this issue and should protect us from other similar issues: >>> >>> - I moved the _carry_dependency from CastII to ConstraintCast and CheckCastPP so it applies to CastPP, CastII and CheckCastPP >>> - I added code to remove the casts that carry a dependency if there's a dominating cast with identical inputs and a more restrictive type. I made is_dominator() a virtual method of PhaseTransform so we can have an implementation in GVN and use the same code during GVN and loop opts >>> - We can now have a chain of CastPPs with a control, so the code in final_graph_reshaping shouldn't follow through a CastPP only if its control is null >>> - I changed the code in PhaseCFG::schedule_pinned_nodes() to handle controls that are in the same block >>> >>> I performed extensive perf testing on x86 and sparc and found no statistically significant regressions. >>> >>> Roland.
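Vladimir notes that the cast code only looks for a near dominating cast and caps the search at an arbitrary number of steps (100 vs. 10). That amounts to a bounded walk up the immediate-dominator chain. The C++ sketch below is a minimal illustration under stated assumptions. `Node` and `is_near_dominator` are hypothetical names; the latter borrows Vladimir's naming suggestion. Neither is C2's class or the code in the webrev.

```cpp
#include <cassert>

// Hypothetical, simplified CFG node: it only records its immediate
// dominator (nullptr for the root). This is not C2's Node class.
struct Node {
  const Node* idom;
};

// Returns true if `d` is found on the immediate-dominator chain of `n`
// within at most `limit` steps (a node dominates itself at step 0).
// A false result means "not found within the budget", so a caller should
// treat it conservatively rather than as proof that `d` does not dominate `n`.
bool is_near_dominator(const Node* d, const Node* n, int limit) {
  assert(limit >= 0);
  for (int step = 0; n != nullptr && step <= limit; ++step) {
    if (n == d) {
      return true;
    }
    n = n->idom;
  }
  return false;
}
```

With this shape, a false result only means that no dominator was found within the budget. If the caller then simply keeps the cast, the cap trades compile time against missed simplifications and does not affect correctness.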
>>> > From vladimir.kozlov at oracle.com Tue Dec 8 02:10:31 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 7 Dec 2015 18:10:31 -0800 Subject: RFR:8144771: AVX3 patch for MacroAssembler::string_compare In-Reply-To: <39F83597C33E5F408096702907E6C4500F108F7F@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F108D6C@ORSMSX104.amr.corp.intel.com> <56659A17.6010300@oracle.com> <39F83597C33E5F408096702907E6C4500F108F7F@ORSMSX104.amr.corp.intel.com> Message-ID: <56663C17.9060408@oracle.com> http://cr.openjdk.java.net/~kvn/8144771/webrev.01/ Vladimir On 12/7/15 5:50 PM, Civlin, Jan wrote: > Tobias, > > Thank you for spotting this. > These comments were from the design and reflected the original order str1/str2. I'm removing them since the function calls say enough. > The order should remain str2/str1 since the "result" is modified in the "str1" line. > > Vladimir, > could you please upload the updated patch (I still do not have an access). > > > Yes, the test has been run: > > [jcivlin at SKY71 test]$ date; echo $JAVA_HOME; ls -l $JAVA_HOME/lib/amd64/server/libjvm.so; time /home/jcivlin/Tools/jtreg/bin/jtreg compiler/intrinsics/string/TestStringIntrinsics.java > Mon Dec 7 11:00:07 PST 2015 > /home/jcivlin/Java/mberg-100915-11K/build/linux-x86_64-normal-server-release/jdk > -rwxrwxr-x 1 jcivlin jcivlin 17999532 Dec 2 22:13 /home/jcivlin/Java/mberg-100915-11K/build/linux-x86_64-normal-server-release/jdk/lib/amd64/server/libjvm.so > Test results: passed: 1 > Report written to /home/jcivlin/Java/mberg-100915-11K/hotspot/test/JTreport/html/report.html > Results written to /home/jcivlin/Java/mberg-100915-11K/hotspot/test/JTwork > > Thank you, > > Jan > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Monday, December 07, 2015 6:39 AM > To: Civlin, Jan; hotspot compiler > Cc: Vladimir Kozlov > Subject: Re: RFR:8144771: AVX3 patch for MacroAssembler::string_compare > > Hi Jan, > > 
the intrinsic looks good to me (not a reviewer). Here are two minor suggestions: > - The following comments are wrong: > 8355 } else { //ae == StrIntrinsicNode::UL > 8356 load_unsigned_short(cnt1, Address(str2, result, scale2)); // L string > 8357 load_unsigned_byte(result, Address(str1, result, scale1)); // U string > The first line then loads a UTF16 (two-byte) String and the second line loads a Latin1 (one-byte) String. Maybe you should also exchange the lines to first load str1 and then load str2. I would omit the comment after "else" because ae could either be UL or LU (both have the Latin1 string in str1). > - Missing whitespace after comma: > 8143 cmpl(cnt2,stride2x2); > > I assume you executed the hotspot JTREG tests (including /compiler/intrinsics/string/TestStringIntrinsics.java). > > Best, > Tobias > > On 05.12.2015 05:07, Civlin, Jan wrote: >> We would like to contribute AVX3 patch for MacroAssembler::string_compare. >> >> This utilizes 512 bits registers on AVX3 architecture and delivers performance gain (speed-up) on long strings at about x 1.33 and on random string about x 1.22. This was measured vs AVX2 (256 bits registers). >> >> >> Contributors: >> MacroAssembler::string_compare - Jan Civlin. 
>> Rest of code, including all x86 AVX3 extensions - Michael Berg >> >> >> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8144771 >> Webrev: http://cr.openjdk.java.net/~kvn/8144771/webrev/ From tobias.hartmann at oracle.com Tue Dec 8 07:20:11 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 8 Dec 2015 08:20:11 +0100 Subject: RFR:8144771: AVX3 patch for MacroAssembler::string_compare In-Reply-To: <56663C17.9060408@oracle.com> References: <39F83597C33E5F408096702907E6C4500F108D6C@ORSMSX104.amr.corp.intel.com> <56659A17.6010300@oracle.com> <39F83597C33E5F408096702907E6C4500F108F7F@ORSMSX104.amr.corp.intel.com> <56663C17.9060408@oracle.com> Message-ID: <566684AB.90600@oracle.com> Hi Jan, On 08.12.2015 03:10, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8144771/webrev.01/ > > Vladimir > > On 12/7/15 5:50 PM, Civlin, Jan wrote: >> Tobias, >> >> Thank you for spotting this. >> These comments were from the design and reflected the original order str1/str2. I'm removing them since the function calls say enough. >> The order should remain str2/str1 since the "result" is modified in the "str1" line. Right, looks good to me! Best, Tobias >> >> Vladimir, >> could you please upload the updated patch (I still do not have an access). 
>> >> >> Yes, the test has been run: >> >> [jcivlin at SKY71 test]$ date; echo $JAVA_HOME; ls -l $JAVA_HOME/lib/amd64/server/libjvm.so; time /home/jcivlin/Tools/jtreg/bin/jtreg compiler/intrinsics/string/TestStringIntrinsics.java >> Mon Dec 7 11:00:07 PST 2015 >> /home/jcivlin/Java/mberg-100915-11K/build/linux-x86_64-normal-server-release/jdk >> -rwxrwxr-x 1 jcivlin jcivlin 17999532 Dec 2 22:13 /home/jcivlin/Java/mberg-100915-11K/build/linux-x86_64-normal-server-release/jdk/lib/amd64/server/libjvm.so >> Test results: passed: 1 >> Report written to /home/jcivlin/Java/mberg-100915-11K/hotspot/test/JTreport/html/report.html >> Results written to /home/jcivlin/Java/mberg-100915-11K/hotspot/test/JTwork >> >> Thank you, >> >> Jan >> >> -----Original Message----- >> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] >> Sent: Monday, December 07, 2015 6:39 AM >> To: Civlin, Jan; hotspot compiler >> Cc: Vladimir Kozlov >> Subject: Re: RFR:8144771: AVX3 patch for MacroAssembler::string_compare >> >> Hi Jan, >> >> the intrinsic looks good to me (not a reviewer). Here are two minor suggestions: >> - The following comments are wrong: >> 8355 } else { //ae == StrIntrinsicNode::UL >> 8356 load_unsigned_short(cnt1, Address(str2, result, scale2)); // L string >> 8357 load_unsigned_byte(result, Address(str1, result, scale1)); // U string >> The first line then loads a UTF16 (two-byte) String and the second line loads a Latin1 (one-byte) String. Maybe you should also exchange the lines to first load str1 and then load str2. I would omit the comment after "else" because ae could either be UL or LU (both have the Latin1 string in str1). >> - Missing whitespace after comma: >> 8143 cmpl(cnt2,stride2x2); >> >> I assume you executed the hotspot JTREG tests (including /compiler/intrinsics/string/TestStringIntrinsics.java). >> >> Best, >> Tobias >> >> On 05.12.2015 05:07, Civlin, Jan wrote: >>> We would like to contribute AVX3 patch for MacroAssembler::string_compare. 
>>> >>> This utilizes 512 bits registers on AVX3 architecture and delivers performance gain (speed-up) on long strings at about x 1.33 and on random string about x 1.22. This was measured vs AVX2 (256 bits registers). >>> >>> >>> Contributors: >>> MacroAssembler::string_compare - Jan Civlin. >>> Rest of code, including all x86 AVX3 extensions - Michael Berg >>> >>> >>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8144771 >>> Webrev: http://cr.openjdk.java.net/~kvn/8144771/webrev/ From jan.civlin at intel.com Tue Dec 8 08:16:22 2015 From: jan.civlin at intel.com (Civlin, Jan) Date: Tue, 8 Dec 2015 08:16:22 +0000 Subject: RFR:8144771: AVX3 patch for MacroAssembler::string_compare In-Reply-To: <566684AB.90600@oracle.com> References: <39F83597C33E5F408096702907E6C4500F108D6C@ORSMSX104.amr.corp.intel.com> <56659A17.6010300@oracle.com> <39F83597C33E5F408096702907E6C4500F108F7F@ORSMSX104.amr.corp.intel.com> <56663C17.9060408@oracle.com> <566684AB.90600@oracle.com> Message-ID: <39F83597C33E5F408096702907E6C4500F10905C@ORSMSX104.amr.corp.intel.com> Thank you, Tobias. Best, Jan -----Original Message----- From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] Sent: Monday, December 7, 2015 11:20 PM To: Vladimir Kozlov ; Civlin, Jan ; hotspot compiler Subject: Re: RFR:8144771: AVX3 patch for MacroAssembler::string_compare Hi Jan, On 08.12.2015 03:10, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8144771/webrev.01/ > > Vladimir > > On 12/7/15 5:50 PM, Civlin, Jan wrote: >> Tobias, >> >> Thank you for spotting this. >> These comments were from the design and reflected the original order str1/str2. I'm removing them since the function calls say enough. >> The order should remain str2/str1 since the "result" is modified in the "str1" line. Right, looks good to me! Best, Tobias >> >> Vladimir, >> could you please upload the updated patch (I still do not have an access). 
>> >> >> Yes, the test has been run: >> >> [jcivlin at SKY71 test]$ date; echo $JAVA_HOME; ls -l >> $JAVA_HOME/lib/amd64/server/libjvm.so; time >> /home/jcivlin/Tools/jtreg/bin/jtreg >> compiler/intrinsics/string/TestStringIntrinsics.java >> Mon Dec 7 11:00:07 PST 2015 >> /home/jcivlin/Java/mberg-100915-11K/build/linux-x86_64-normal-server- >> release/jdk -rwxrwxr-x 1 jcivlin jcivlin 17999532 Dec 2 22:13 >> /home/jcivlin/Java/mberg-100915-11K/build/linux-x86_64-normal-server- >> release/jdk/lib/amd64/server/libjvm.so >> Test results: passed: 1 >> Report written to >> /home/jcivlin/Java/mberg-100915-11K/hotspot/test/JTreport/html/report >> .html Results written to >> /home/jcivlin/Java/mberg-100915-11K/hotspot/test/JTwork >> >> Thank you, >> >> Jan >> >> -----Original Message----- >> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] >> Sent: Monday, December 07, 2015 6:39 AM >> To: Civlin, Jan; hotspot compiler >> Cc: Vladimir Kozlov >> Subject: Re: RFR:8144771: AVX3 patch for >> MacroAssembler::string_compare >> >> Hi Jan, >> >> the intrinsic looks good to me (not a reviewer). Here are two minor suggestions: >> - The following comments are wrong: >> 8355 } else { //ae == StrIntrinsicNode::UL >> 8356 load_unsigned_short(cnt1, Address(str2, result, scale2)); // L string >> 8357 load_unsigned_byte(result, Address(str1, result, scale1)); // U string >> The first line then loads a UTF16 (two-byte) String and the second line loads a Latin1 (one-byte) String. Maybe you should also exchange the lines to first load str1 and then load str2. I would omit the comment after "else" because ae could either be UL or LU (both have the Latin1 string in str1). >> - Missing whitespace after comma: >> 8143 cmpl(cnt2,stride2x2); >> >> I assume you executed the hotspot JTREG tests (including /compiler/intrinsics/string/TestStringIntrinsics.java). 
>> >> Best, >> Tobias >> >> On 05.12.2015 05:07, Civlin, Jan wrote: >>> We would like to contribute AVX3 patch for MacroAssembler::string_compare. >>> >>> This utilizes 512 bits registers on AVX3 architecture and delivers performance gain (speed-up) on long strings at about x 1.33 and on random string about x 1.22. This was measured vs AVX2 (256 bits registers). >>> >>> >>> Contributors: >>> MacroAssembler::string_compare - Jan Civlin. >>> Rest of code, including all x86 AVX3 extensions - Michael Berg >>> >>> >>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8144771 >>> Webrev: http://cr.openjdk.java.net/~kvn/8144771/webrev/ From thomas.stuefe at gmail.com Tue Dec 8 08:22:03 2015 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 8 Dec 2015 09:22:03 +0100 Subject: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672286C26@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672286C26@DEWDFEMB19C.global.corp.sap> Message-ID: Hi Martin, thanks for this addition :) It may make a lot of sense to rebase this change to hs-rt, because os_aix.cpp is quite different there after http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/ce87b1141c12. Otherwise we may have problems later applying your change on top of my change. ------------- About the AIX kernel version recognition: I know we talked about this, but I have second thoughts now. I guess I did not really think it through before, sorry. So, now I have a change request: Instead of introducing os::Aix::os_kernel_version (version,release,techlevel,sp) beside the already existing os::Aix::os_version (version,release), I would prefer just one parameter, os_version, and enriching this by techlevel and sp. So, exactly what you did for os_kernel_version.
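The single os_version value requested here can be illustrated with a small sketch. It packs and unpacks the 0xVVRRTTSS layout Thomas proposes: major version, minor version, tech level and service pack, one byte each. The helper names are hypothetical and this is not the os_aix code. The AIX version numbers in the checks are only examples.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not HotSpot code) for the proposed 0xVVRRTTSS layout:
// VV = major version, RR = minor version, TT = tech level, SS = service pack.
// An unknown tech level or service pack is encoded as 0, as in the proposal.
uint32_t pack_os_version(uint32_t major, uint32_t minor,
                         uint32_t tech_level, uint32_t service_pack) {
  // Each field must fit in one byte.
  assert(major <= 0xFF && minor <= 0xFF && tech_level <= 0xFF && service_pack <= 0xFF);
  return (major << 24) | (minor << 16) | (tech_level << 8) | service_pack;
}

uint32_t os_major(uint32_t v)        { return (v >> 24) & 0xFF; }
uint32_t os_minor(uint32_t v)        { return (v >> 16) & 0xFF; }
uint32_t os_tech_level(uint32_t v)   { return (v >> 8) & 0xFF; }
uint32_t os_service_pack(uint32_t v) { return v & 0xFF; }
```

The most significant field occupies the top byte, so two packed values compare correctly with a plain unsigned comparison. For example, 7.1 TL3 packs to a larger value than 7.1 with an unknown tech level. The proposed -1 "uninitialized" value (0xFFFFFFFF as a uint32_t) would compare greater than any real version, so callers would need to check for it first.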
Basically, as a prototype: // -1 = uninitialized, otherwise 32 bit number: // 0xVVRRTTSS // VV - major version // RR - minor version // TT - tech level, if known, 0 otherwise // SS - service pack, if known, 0 otherwise static uint32_t os_version (); Then please change the few users of os::Aix::os_version() to now expect a 32-bit unsigned number. As far as I see there are only 3 callsites. ------------------- Other small nitpicks: - in libodm_aix.cpp, please use trcVerbose() instead of if (Verbose) tty->.. . Please include misc_aix.hpp for trcVerbose(). We will change all those tracecalls to Unified Logging in the near future and this would help me find all trace occurrences. - please move ~dynamicOdm() and odmWrapper::clean_wrapper() from libodm_aix.hpp to libodm_aix.cpp and accordingly remove the includes dlfcn.h and stdlib.h from libodm_aix.hpp. - I probably would change "static unsigned int determine_os_kernel_version(int major_aix_version, int minor_aix_version);" to "static bool fill_in_os_kernel_version(unsigned int* p_os_version);", but that is just a matter of taste. Kind Regards, Thomas On Mon, Dec 7, 2015 at 6:10 PM, Doerr, Martin wrote: > Hi, > > > > I have created a webrev for further PPC64 updates: > > AIX supports Transactional Memory with a certain kernel patch level. Add a detection for it and make UseRTMLocking usable on AIX. > In addition, implement Atomic::cmpxchg for jbyte. > > > > The webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.00/ > > > > Please review. > > > > Best regards, > > Martin > > > From martin.doerr at sap.com Tue Dec 8 09:41:05 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 8 Dec 2015 09:41:05 +0000 Subject: RFR(M): 8144466: ppc64: fix argument passing through opto stubs.
In-Reply-To: <56662FBB.3070006@oracle.com> References: <4295855A5C1DE049A61835A1887419CC41EDB0FA@DEWDFEMB12A.global.corp.sap> <56662FBB.3070006@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672286DCC@DEWDFEMB19C.global.corp.sap> Hi Vladimir, C1 does the int to long conversions in the platform code. I didn't find any missing conversion in the C1 code. The only C1 issue with respect to runtime calls is 8143817 for which I'll send out another email. Best regards, Martin -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov Sent: Tuesday, 8 December 2015 02:18 To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8144466: ppc64: fix argument passing through opto stubs. Looks fine. Do you need to do anything for C1 which you have now? Thanks, Vladimir On 12/7/15 7:12 AM, Lindenmaier, Goetz wrote: > Hi, > > I need to fix the calls to runtime for ppc because it expects int > > being properly sign extended to long. > > In 8086069, we tried to push this to the platform code, but for > > opto stubs this is not possible. > > Please review this change. I please need a sponsor. > > http://cr.openjdk.java.net/~goetz/webrevs/8144466-ppcOptoStubs/webrev.00/ > > The change comes with a corresponding test. > > I also added a test for a problem we fixed before, where > > floating point args were passed wrong. > > Best regards, > > Goetz.
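Goetz says the PPC64 runtime expects int arguments to arrive sign-extended to 64 bits. A small model of a register whose upper half still holds unrelated bits illustrates why this matters. This is a hedged sketch, not HotSpot code, and `read_as_long` and `sign_extend_int_arg` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only (not HotSpot code): a 64-bit register whose low 32 bits
// hold an int argument while the upper 32 bits still contain unrelated data.

// What a callee that consumes the whole register as a 64-bit long sees.
int64_t read_as_long(uint64_t reg) {
  return static_cast<int64_t>(reg);
}

// Caller-side fix: replace the upper half with copies of bit 31 of the int.
// On PPC64 this is what the extsw (extend sign word) instruction does.
uint64_t sign_extend_int_arg(uint64_t reg) {
  int32_t low = static_cast<int32_t>(static_cast<uint32_t>(reg));
  return static_cast<uint64_t>(static_cast<int64_t>(low));
}
```

With stale bits in the upper half, a callee reading the register as a long sees a large positive value instead of the int -1. After sign extension it sees -1, the intended value.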
> From martin.doerr at sap.com Tue Dec 8 09:45:55 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 8 Dec 2015 09:45:55 +0000 Subject: RFR(S): 8143817: C1: Platform dependent stack space not preserved for all runtime calls In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672284CE0@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672284CE0@DEWDFEMB19C.global.corp.sap> Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672286DF1@DEWDFEMB19C.global.corp.sap> Hi, I think this is a real bug even though it does not lead to errors on many platforms. The c_calling_convention should be consulted when we call C. Can anybody take a look, please? Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin Sent: Monday, 30 November 2015 15:22 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8143817: C1: Platform dependent stack space not preserved for all runtime calls Hi, Runtime calls usually use the c_calling_convention to compute the layout of the stack frames. Currently, C1 does not use c_calling_convention for runtime calls without arguments, e.g. for calling os::javaTimeNanos. Hence, the platform dependent out_preserve_stack_slots is not accounted for. A possible way to fix this is implemented in this webrev: http://cr.openjdk.java.net/~mdoerr/8143817_c1_runtime_call/webrev.00 These ones are the 3 callers of the respective function which don't call c_calling_convention: do_RuntimeCall(CAST_FROM_FN_PTR(address, TRACE_TIME_METHOD), 0, x); do_RuntimeCall(CAST_FROM_FN_PTR(address, os::javaTimeMillis), 0, x); do_RuntimeCall(CAST_FROM_FN_PTR(address, os::javaTimeNanos), 0, x); I think it would be more error prone to call it in all of them. The parameter "expected_arguments" is always 0. Would it be better to assert "expected_arguments == 0"? SPARC also uses out_preserve_stack_slots. Is this problem also relevant for SPARC?
Best regards, Martin From roland.westrelin at oracle.com Tue Dec 8 12:38:23 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 8 Dec 2015 13:38:23 +0100 Subject: RFR(S): 8143817: C1: Platform dependent stack space not preserved for all runtime calls In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672284CE0@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672284CE0@DEWDFEMB19C.global.corp.sap> Message-ID: <7F71F2CA-B520-43EC-91C7-63CA2AA76194@oracle.com> Hi Martin, > Runtime calls usually use the c_calling_convention to compute the layout of the stack frames. > > Currently, C1 does not use c_calling_convention for runtime calls without arguments, e.g. for calling os::javaTimeNanos. Hence, the platform dependent out_preserve_stack_slots is not accounted for. > A possible way to fix this is implemented in this webrev: > http://cr.openjdk.java.net/~mdoerr/8143817_c1_runtime_call/webrev.00 I think this is good. > These ones are the 3 callers of the respective function which don't call c_calling_convention: > do_RuntimeCall(CAST_FROM_FN_PTR(address, TRACE_TIME_METHOD), 0, x); > do_RuntimeCall(CAST_FROM_FN_PTR(address, os::javaTimeMillis), 0, x); > do_RuntimeCall(CAST_FROM_FN_PTR(address, os::javaTimeNanos), 0, x); > > I think it would be more error prone to call it in all of them. The parameter "expected_arguments" is always 0. > Would it be better to assert "expected_arguments == 0"? Why not remove the expected_arguments argument and assert(x->number_of_arguments() == 0, "") ? Roland. > > SPARC also uses out_preserve_stack_slots. Is this problem also relevant for SPARC?
> > Best regards, > Martin From martin.doerr at sap.com Tue Dec 8 13:56:12 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 8 Dec 2015 13:56:12 +0000 Subject: RFR(S): 8143817: C1: Platform dependent stack space not preserved for all runtime calls In-Reply-To: <7F71F2CA-B520-43EC-91C7-63CA2AA76194@oracle.com> References: <7C9B87B351A4BA4AA9EC95BB4181165672284CE0@DEWDFEMB19C.global.corp.sap> <7F71F2CA-B520-43EC-91C7-63CA2AA76194@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672286E8D@DEWDFEMB19C.global.corp.sap> Hi Roland, thank you very much for reviewing. I have removed the "expected_arguments" argument as you suggested and created a new webrev: http://cr.openjdk.java.net/~mdoerr/8143817_c1_runtime_call/webrev.01/ Can anybody sponsor this fix, please? Best regards, Martin -----Original Message----- From: Roland Westrelin [mailto:roland.westrelin at oracle.com] Sent: Tuesday, 8 December 2015 13:38 To: Doerr, Martin Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8143817: C1: Platform dependent stack space not preserved for all runtime calls Hi Martin, > Runtime calls usually use the c_calling_convention to compute the layout of the stack frames. > > Currently, C1 does not use c_calling_convention for runtime calls without arguments, e.g. for calling os::javaTimeNanos. Hence, the platform dependent out_preserve_stack_slots is not accounted for. > A possible way to fix this is implemented in this webrev: > http://cr.openjdk.java.net/~mdoerr/8143817_c1_runtime_call/webrev.00 I think this is good. > These ones are the 3 callers of the respective function which don't call c_calling_convention: > do_RuntimeCall(CAST_FROM_FN_PTR(address, TRACE_TIME_METHOD), 0, x); > do_RuntimeCall(CAST_FROM_FN_PTR(address, os::javaTimeMillis), 0, x); > do_RuntimeCall(CAST_FROM_FN_PTR(address, os::javaTimeNanos), 0, x); > > I think it would be more error prone to call it in all of them. The parameter "expected_arguments" is always 0.
> Would it be better to assert "expected_arguments == 0"? Why not remove the expected_arguments argument and assert(x->number_of_arguments() == 0, "") ? Roland. > > SPARC also uses out_preserve_stack_slots. Is this problem also relevant for SPARC? > > Best regards, > Martin From andreas.eriksson at oracle.com Tue Dec 8 14:04:51 2015 From: andreas.eriksson at oracle.com (Andreas Eriksson) Date: Tue, 8 Dec 2015 15:04:51 +0100 Subject: [8u-dev] backport RFR: 6869327: Add new C2 flag to keep safepoints in counted loops Message-ID: <5666E383.3010102@oracle.com> Hi, Please review this backport of JDK-6869327: Add new C2 flag to keep safepoints in counted loops. The only change in this backport is to the test, where the testlibrary imports needed to be changed, and I also removed the @module tag. JDK 9 review: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-November/020110.html Webrev for changes between 9 and 8: http://cr.openjdk.java.net/~aeriksso/6869327/webrev.9_to_8/ Full 8u webrev: http://cr.openjdk.java.net/~aeriksso/6869327/webrev.jdk8u/ Bug: 6869327: Add new C2 flag to keep safepoints in counted loops. https://bugs.openjdk.java.net/browse/JDK-6869327 Thanks, Andreas From martin.doerr at sap.com Tue Dec 8 14:08:46 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 8 Dec 2015 14:08:46 +0000 Subject: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code In-Reply-To: References: <7C9B87B351A4BA4AA9EC95BB4181165672286C26@DEWDFEMB19C.global.corp.sap> Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672286EB4@DEWDFEMB19C.global.corp.sap> Hi Thomas, thanks for the hint. There are changes in hs-comp and hs-rt which would cause trouble with my change at the moment. I'll wait until they get merged and create a new webrev which hopefully applies to both repositories. Best regards, Martin From: Thomas Stüfe [mailto:thomas.stuefe at gmail.com] Sent: Tuesday, 8 December 2015 09:22
To: Doerr, Martin Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code Hi Martin, thanks for this addition :) It may make a lot of sense to rebase this change to hs-rt, because os_aix.cpp is quite different there after http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/ce87b1141c12. Otherwise we may have problems later applying your change on top of my change. ------------- About the AIX kernel version recognition: I know we talked about this, but I have second thoughts now. I guess I did not really think it through before, sorry. So, now I have a change request: Instead of introducing os::Aix::os_kernel_version (version,release,techlevel,sp) beside the already existing os::Aix::os_version (version,release), I would prefer just one parameter, os_version, and enriching this by techlevel and sp. So, exactly what you did for os_kernel_version. Basically, as a prototype: // -1 = uninitialized, otherwise 32 bit number: // 0xVVRRTTSS // VV - major version // RR - minor version // TT - tech level, if known, 0 otherwise // SS - service pack, if known, 0 otherwise static uint32_t os_version (); Then please change the few users of os::Aix::os_version() to now expect a 32-bit unsigned number. As far as I see there are only 3 callsites. ------------------- Other small nitpicks: - in libodm_aix.cpp, please use trcVerbose() instead of if (Verbose) tty->.. . Please include misc_aix.hpp for trcVerbose(). We will change all those tracecalls to Unified Logging in the near future and this would help me find all trace occurrences. - please move ~dynamicOdm() and odmWrapper::clean_wrapper() from libodm_aix.hpp to libodm_aix.cpp and accordingly remove the includes dlfcn.h and stdlib.h from libodm_aix.hpp.
- I probably would change "static unsigned int determine_os_kernel_version(int major_aix_version, int minor_aix_version);" to " "static bool fill_in_os_kernel_version(unsigned int* p_os_version);", but that is just a matter of taste. Kind Regards, Thomas On Mon, Dec 7, 2015 at 6:10 PM, Doerr, Martin > wrote: Hi, I have created a webrev for further PPC64 updates: AIX supports Transactional Memory with a certain kernel patch level. Add a detection for it and make UseRTMLocking usable on AIX. In addition, implement Atomic::cmpxchg for jbyte. The webrev is here: http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.00/ Please review. Best regards, Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.sandoz at oracle.com Tue Dec 8 14:28:06 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 8 Dec 2015 15:28:06 +0100 Subject: 8144223: Move j.l.invoke.{ForceInline, DontInline, Stable} to jdk.internal.vm.annotation package In-Reply-To: <45717074-E9E8-4CA5-9C62-71FCB013CBDE@oracle.com> References: <22C54219-695B-486F-AEAA-7B96473DEDF4@oracle.com> <45717074-E9E8-4CA5-9C62-71FCB013CBDE@oracle.com> Message-ID: <6E6E8A0F-D617-4A2A-B9DA-AFE2D183D872@oracle.com> > On 7 Dec 2015, at 22:10, Christian Thalinger wrote: >>> >>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8144223-move-stable-force-dont-inline-jdk/webrev/src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java.frames.html >>> >>> 1327 mv.visitAnnotation("Ljdk/internal/DontInline;", true); >>> >>> need fixing. >>> >> >> Oops that?s embarrassing, i fat fingered the search/replace. Our tests don?t catch such cases of non-existent annotations. > > I never liked the fact that we are using hardcoded strings here. Getting the name from the class would be better. > Agreed, issue logged: https://bugs.openjdk.java.net/browse/JDK-8144931 Paul. -------------- next part -------------- An HTML attachment was scrubbed... 
From edward.nevill at gmail.com Tue Dec 8 15:32:30 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Tue, 08 Dec 2015 15:32:30 +0000 Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV In-Reply-To: <566595B5.9060400@redhat.com> References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com> <5661CF8B.6040405@redhat.com> <1449490934.12382.49.camel@mint> <566595B5.9060400@redhat.com> Message-ID: <1449588750.5880.28.camel@mylittlepony.linaroharston> On Mon, 2015-12-07 at 14:20 +0000, Andrew Haley wrote: > On 12/07/2015 12:22 PM, Edward Nevill wrote: > > > I cannot see what prevents one of these BLs from being followed and > > since they may have been copied but not relocated then they may end > > up pointing somewhere random in the code buffer which just happens > > to look like a trampoline. Admittedly, the probability of failure is > > vastly reduced because there are no genuine trampolines for it to > > latch on to. > > You must look inside get_trampoline(). It checks for this. OK. Thanks, I have satisfied myself that this is correct. New webrev @ http://cr.openjdk.java.net/~enevill/8144498/webrev.2 I was having difficulty understanding why the check inside get_trampoline() did not exclude the adrp/add relocation. However, when I trap it doing the relocation in gdb I see:

Original:
0x3ff54170b50: adrp x8, 0x3ff54170000   <<< Not in code blob
0x3ff54170b54: add  x8, x8, #0x400
0x3ff54170b58: blr  x8

Copied but not relocated:
0x3ff5481d250: adrp x8, 0x3ff5481d000   <<< Within code blob
0x3ff5481d254: add  x8, x8, #0x400
0x3ff5481d258: blr  x8

So the destination offset in the original is 0x3ff54170400 - 0x3ff54170b50 = 0xfffffffffffff8b0 (that is, -0x750), whereas in the copied but not relocated version it is 0x3ff5481d400 - 0x3ff5481d250 = 0x1b0, which is within the current code blob. This happens because of the half PC-relative, half absolute nature of the adrp/add relocation: the bottom 12 bits are always absolute, whereas the adrp instruction is PC-relative. I have retested this with JTreg hotspot & langtools with ReservedCodeCacheSize=256m:

Hotspot original:   Test results: passed: 865; failed: 19; error: 85
Hotspot revised:    Test results: passed: 953; failed: 9; error: 12
Langtools original: Test results: passed: 3,049; failed: 77; error: 223
Langtools revised:  Test results: passed: 3,316; failed: 33

Thanks for the review, Ed.

From aph at redhat.com Tue Dec 8 15:49:40 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 8 Dec 2015 15:49:40 +0000 Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV In-Reply-To: <1449588750.5880.28.camel@mylittlepony.linaroharston> References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com> <5661CF8B.6040405@redhat.com> <1449490934.12382.49.camel@mint> <566595B5.9060400@redhat.com> <1449588750.5880.28.camel@mylittlepony.linaroharston> Message-ID: <5666FC14.6020001@redhat.com> On 12/08/2015 03:32 PM, Edward Nevill wrote: > OK. Thanks, I have satisfied myself that this is correct. > > New webrev @ http://cr.openjdk.java.net/~enevill/8144498/webrev.2 That looks good to me. Thanks, Andrew.
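Ed's offset arithmetic can be checked with a small model of how an ADRP/ADD pair forms an address. ADRP takes the 4 KiB page of its own PC and adds a signed page delta; ADD then supplies the low 12 bits as an absolute offset within that page. The sketch below is a simplified model that ignores instruction encoding, and its class and method names are hypothetical. Both copies in the gdb session encode a page delta of 0 and `#0x400`.

```java
// Simplified model of the AArch64 ADRP + ADD address computation
// (instruction encoding is ignored; only the arithmetic is modelled).
final class AdrpAdd {
    private AdrpAdd() {}

    static final long PAGE_MASK = ~0xFFFL; // 4 KiB pages

    // ADRP: page of the instruction's own PC plus a signed page delta (PC-relative part).
    static long adrp(long pc, long pageDelta) {
        return (pc & PAGE_MASK) + (pageDelta << 12);
    }

    // ADD #lo12: the low 12 bits are an absolute offset within that page.
    static long target(long adrpPc, long pageDelta, int lo12) {
        return adrp(adrpPc, pageDelta) + lo12;
    }

    public static void main(String[] args) {
        long original = target(0x3ff54170b50L, 0, 0x400);
        long copied = target(0x3ff5481d250L, 0, 0x400);
        // %#x prints negative longs as 64-bit two's complement.
        System.out.printf("original: %#x, offset from pc %#x%n", original, original - 0x3ff54170b50L);
        System.out.printf("copied:   %#x, offset from pc %#x%n", copied, copied - 0x3ff5481d250L);
    }
}
```

Copying the pair without relocation moves the page base with the copy, but the low 12 bits stay fixed while the PC's low bits change (0xb50 vs 0x250). The target's distance from the PC therefore changes from -0x750 to +0x1b0, which is why the unrelocated copy happened to land inside the code blob.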
From pavel.punegov at oracle.com Tue Dec 8 16:35:18 2015 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Tue, 8 Dec 2015 19:35:18 +0300 Subject: RFR (XXS): 8144933: CompilerControl: commandfile/ExcludeTest has incorrect jtreg run innotation Message-ID: <507299AD-9F7C-4ADC-AC61-C13DB921C879@oracle.com> Hi, please review this small fix to the test. Issue: test has incorrect @run annotation and hence runs another test instead. webrev: http://cr.openjdk.java.net/~ppunegov/8144933/webrev.00/ bug : https://bugs.openjdk.java.net/browse/JDK-8144933 Thanks, Pavel Punegov

From vladimir.kozlov at oracle.com Tue Dec 8 17:31:48 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 8 Dec 2015 09:31:48 -0800 Subject: RFR (XXS): 8144933: CompilerControl: commandfile/ExcludeTest has incorrect jtreg run innotation In-Reply-To: <507299AD-9F7C-4ADC-AC61-C13DB921C879@oracle.com> References: <507299AD-9F7C-4ADC-AC61-C13DB921C879@oracle.com> Message-ID: <56671404.5030703@oracle.com> Good. Thanks, Vladimir On 12/8/15 8:35 AM, Pavel Punegov wrote: > Hi, > > please review this small fix to the test. > Issue: test has incorrect @run annotation and hence runs another test instead. > > webrev: http://cr.openjdk.java.net/~ppunegov/8144933/webrev.00/ > bug : https://bugs.openjdk.java.net/browse/JDK-8144933 > > Thanks, > Pavel Punegov >

From vladimir.kozlov at oracle.com Tue Dec 8 17:54:42 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 8 Dec 2015 09:54:42 -0800 Subject: RFR(M): 8144466: ppc64: fix argument passing through opto stubs. In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672286DCC@DEWDFEMB19C.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC41EDB0FA@DEWDFEMB12A.global.corp.sap> <56662FBB.3070006@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672286DCC@DEWDFEMB19C.global.corp.sap> Message-ID: <56671962.6090509@oracle.com> Okay.
I need to get closed changes reviewed before push. Thanks, Vladimir On 12/8/15 1:41 AM, Doerr, Martin wrote: > Hi Vladimir, > > C1 does the int to long conversions in the platform code. I didn't find any missing conversion in the C1 code. > The only C1 issue with respect to runtime calls is 8143817 for which I'll send out another email. > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Dienstag, 8. Dezember 2015 02:18 > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M): 8144466: ppc64: fix argument passing through opto stubs. > > Looks fine. Do you need to do anything for C1 which you have now? > > Thanks, > Vladimir > > On 12/7/15 7:12 AM, Lindenmaier, Goetz wrote: >> Hi, >> >> I need to fix the calls to runtime for ppc because it expects int >> >> being properly sign extended to long. >> >> In 8086069, we tried to push this to the platform code, but for >> >> opto stubs this is not possible. >> >> Please review this change. I please need a sponsor. >> >> http://cr.openjdk.java.net/~goetz/webrevs/8144466-ppcOptoStubs/webrev.00/ >> >> The change comes with a corresponding test. >> >> I also added a test for a problem we fixed before, where >> >> floating point args were passed wrong. >> >> Best regards, >> >> Goetz. 
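Goetz's point is that the PPC64 runtime entries expect int arguments to arrive properly sign-extended to 64 bits. The arithmetic consequence can be illustrated in plain Java: if the upper 32 bits are filled with zeros instead of copies of the sign bit, a negative int turns into a large positive long. This is a sketch of the value arithmetic only, not of the opto stub code or the PPC64 calling convention, and the class name is hypothetical.

```java
// Models what a callee reading a full 64-bit register would see
// for a 32-bit int argument, depending on how it was widened.
final class IntWidening {
    private IntWidening() {}

    // Correct widening for a signed int: the sign bit is replicated into bits 32..63.
    static long signExtend(int v) {
        return (long) v;
    }

    // Wrong widening for a signed int: bits 32..63 are zero.
    static long zeroExtend(int v) {
        return Integer.toUnsignedLong(v);
    }

    public static void main(String[] args) {
        System.out.println(signExtend(-1)); // -1
        System.out.println(zeroExtend(-1)); // 4294967295
        // Non-negative values widen identically, which is why such bugs can stay hidden.
        System.out.println(signExtend(7) == zeroExtend(7)); // true
    }
}
```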
>> From christian.thalinger at oracle.com Tue Dec 8 18:57:57 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 8 Dec 2015 08:57:57 -1000 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 In-Reply-To: <566623AD.8060709@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <565E4DD2.1030200@oracle.com> <565E511E.9020503@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CF892@ORSMSX106.amr.corp.intel.com> <56620192.5050808@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569D1AC3@ORSMSX106.amr.corp.intel.com> <566623AD.8060709@oracle.com> Message-ID: + product(bool, UseVectorizedMismatchIntrinsic, false, \ + "Enables intrinsification of ArraysSupport.vectorizedMismatch()") \ Do all these really need to be product flags? > On Dec 7, 2015, at 2:26 PM, Vladimir Kozlov wrote: > > Looks good. I will push it when closed part (flag = false) reviewed. > I will modify vm_version_x86.cpp to move setting to false in 32-bit VM code to be #else part of flag's setting. > > Thanks, > Vladimir > > On 12/7/15 11:29 AM, Deshpande, Vivek R wrote: >> Hi Vladimir >> >> We have updated the jbs entry with your suggested changes for the flag. >> Would you please review it. >> jbs entry: https://bugs.openjdk.java.net/browse/JDK-8143355 >> webrev is at: http://cr.openjdk.java.net/~mcberg/8143355/webrev.03/ >> >> Regards, >> Vivek >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Friday, December 04, 2015 1:12 PM >> To: Deshpande, Vivek R; hotspot compiler >> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; Paul Sandoz >> Subject: Re: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 >> >> You don't need now #ifdef COMPILER2 (in vm_version_x86.cpp). 
>> >> + #ifdef COMPILER2 >> + #ifdef _LP64 >> + if (UseSSE42Intrinsics) { >> >> Also you need to add to all other platforms vm_version_.cpp setting flag to false. See UseAdler32Intrinsics settings as example. >> >> Thanks, >> Vladimir >> >> On 12/4/15 11:26 AM, Deshpande, Vivek R wrote: >>> Hi Vladimir >>> >>> We have updated the webrev at the jbs entry with the global flag. >>> This is the link for your review. >>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.02/ >>> >>> Regards >>> Vivek >>> -----Original Message----- >>> From: Deshpande, Vivek R >>> Sent: Wednesday, December 02, 2015 11:21 AM >>> To: 'Vladimir Kozlov'; hotspot compiler >>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; 'Paul Sandoz' >>> Subject: RE: RFR (M): 8143355: Update for addition of >>> vectorizedMismatch intrinsic for x86 >>> >>> Hi Vladimir >>> >>> Yes the 2x performance gain is using AVX2 instructions for big arrays(~1k). >>> We will update the patch and jbs entry with global flag and let you know soon. >>> >>> Regards, >>> Vivek >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Tuesday, December 01, 2015 6:02 PM >>> To: Deshpande, Vivek R; hotspot compiler >>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric >>> Subject: Re: RFR (M): 8143355: Update for addition of >>> vectorizedMismatch intrinsic for x86 >>> >>> 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. >>> >>> If that is the case the flag should be global. >>> >>> Thanks, >>> Vladimir >>> >>> On 12/1/15 5:48 PM, Vladimir Kozlov wrote: >>>> This seems fine. 2x is for AVX implementation? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 11/24/15 4:00 PM, Deshpande, Vivek R wrote: >>>>> Hi all >>>>> >>>>> We would like to contribute a patch from Intel which optimizes >>>>> vectorizedMismatch() method in java.util.ArraysSupport.java for X86 >>>>> architecture using AVX instructions. 
>>>>> >>>>> The improvement gives more than 2x gain over Unsafe implementation >>>>> for long arrays. >>>>> >>>>> >>>>> The bug is blocked by bug: vectorized support for array >>>>> equals/compare/mismatch using Unsafe >>>>> (https://bugs.openjdk.java.net/browse/JDK-8136924.) >>>>> >>>>> Could you please review and sponsor this patch. >>>>> >>>>> Bug-id: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8143355 >>>>> webrev: >>>>> >>>>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ >>>>> >>>>> Thanks and regards, >>>>> >>>>> Vivek >>>>> From vladimir.kozlov at oracle.com Tue Dec 8 19:04:09 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 8 Dec 2015 11:04:09 -0800 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 In-Reply-To: References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <565E4DD2.1030200@oracle.com> <565E511E.9020503@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CF892@ORSMSX106.amr.corp.intel.com> <56620192.5050808@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569D1AC3@ORSMSX106.amr.corp.intel.com> <566623AD.8060709@oracle.com> Message-ID: <566729A9.2020609@oracle.com> Historically we have intrinsics flag as product. But in reality to have them diagnostic, for example, is also fine. But it would be different change. Thanks, Vladimir On 12/8/15 10:57 AM, Christian Thalinger wrote: > + product(bool, UseVectorizedMismatchIntrinsic, false, \ > + "Enables intrinsification of ArraysSupport.vectorizedMismatch()") \ > > Do all these really need to be product flags? > >> On Dec 7, 2015, at 2:26 PM, Vladimir Kozlov wrote: >> >> Looks good. I will push it when closed part (flag = false) reviewed. >> I will modify vm_version_x86.cpp to move setting to false in 32-bit VM code to be #else part of flag's setting. 
>> >> Thanks, >> Vladimir >> >> On 12/7/15 11:29 AM, Deshpande, Vivek R wrote: >>> Hi Vladimir >>> >>> We have updated the jbs entry with your suggested changes for the flag. >>> Would you please review it. >>> jbs entry: https://bugs.openjdk.java.net/browse/JDK-8143355 >>> webrev is at: http://cr.openjdk.java.net/~mcberg/8143355/webrev.03/ >>> >>> Regards, >>> Vivek >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Friday, December 04, 2015 1:12 PM >>> To: Deshpande, Vivek R; hotspot compiler >>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; Paul Sandoz >>> Subject: Re: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 >>> >>> You don't need now #ifdef COMPILER2 (in vm_version_x86.cpp). >>> >>> + #ifdef COMPILER2 >>> + #ifdef _LP64 >>> + if (UseSSE42Intrinsics) { >>> >>> Also you need to add to all other platforms vm_version_.cpp setting flag to false. See UseAdler32Intrinsics settings as example. >>> >>> Thanks, >>> Vladimir >>> >>> On 12/4/15 11:26 AM, Deshpande, Vivek R wrote: >>>> Hi Vladimir >>>> >>>> We have updated the webrev at the jbs entry with the global flag. >>>> This is the link for your review. >>>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.02/ >>>> >>>> Regards >>>> Vivek >>>> -----Original Message----- >>>> From: Deshpande, Vivek R >>>> Sent: Wednesday, December 02, 2015 11:21 AM >>>> To: 'Vladimir Kozlov'; hotspot compiler >>>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; 'Paul Sandoz' >>>> Subject: RE: RFR (M): 8143355: Update for addition of >>>> vectorizedMismatch intrinsic for x86 >>>> >>>> Hi Vladimir >>>> >>>> Yes the 2x performance gain is using AVX2 instructions for big arrays(~1k). >>>> We will update the patch and jbs entry with global flag and let you know soon. 
>>>> >>>> Regards, >>>> Vivek >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, December 01, 2015 6:02 PM >>>> To: Deshpande, Vivek R; hotspot compiler >>>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric >>>> Subject: Re: RFR (M): 8143355: Update for addition of >>>> vectorizedMismatch intrinsic for x86 >>>> >>>> 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. >>>> >>>> If that is the case the flag should be global. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/1/15 5:48 PM, Vladimir Kozlov wrote: >>>>> This seems fine. 2x is for AVX implementation? >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 11/24/15 4:00 PM, Deshpande, Vivek R wrote: >>>>>> Hi all >>>>>> >>>>>> We would like to contribute a patch from Intel which optimizes >>>>>> vectorizedMismatch() method in java.util.ArraysSupport.java for X86 >>>>>> architecture using AVX instructions. >>>>>> >>>>>> The improvement gives more than 2x gain over Unsafe implementation >>>>>> for long arrays. >>>>>> >>>>>> >>>>>> The bug is blocked by bug: vectorized support for array >>>>>> equals/compare/mismatch using Unsafe >>>>>> (https://bugs.openjdk.java.net/browse/JDK-8136924.) >>>>>> >>>>>> Could you please review and sponsor this patch. 
>>>>>> >>>>>> Bug-id: >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8143355 >>>>>> webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ >>>>>> >>>>>> Thanks and regards, >>>>>> >>>>>> Vivek >>>>>> > From christian.thalinger at oracle.com Tue Dec 8 19:05:23 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 8 Dec 2015 09:05:23 -1000 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 In-Reply-To: <566729A9.2020609@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <565E4DD2.1030200@oracle.com> <565E511E.9020503@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CF892@ORSMSX106.amr.corp.intel.com> <56620192.5050808@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569D1AC3@ORSMSX106.amr.corp.intel.com> <566623AD.8060709@oracle.com> <566729A9.2020609@oracle.com> Message-ID: <76440B2F-C4A6-447B-A202-8882D579573E@oracle.com> > On Dec 8, 2015, at 9:04 AM, Vladimir Kozlov wrote: > > Historically we have intrinsics flag as product. But in reality to have them diagnostic, for example, is also fine. But it would be different change. Maybe we should file an enhancement and change all of them to diagnostic. > > Thanks, > Vladimir > > On 12/8/15 10:57 AM, Christian Thalinger wrote: >> + product(bool, UseVectorizedMismatchIntrinsic, false, \ >> + "Enables intrinsification of ArraysSupport.vectorizedMismatch()") \ >> >> Do all these really need to be product flags? >> >>> On Dec 7, 2015, at 2:26 PM, Vladimir Kozlov wrote: >>> >>> Looks good. I will push it when closed part (flag = false) reviewed. >>> I will modify vm_version_x86.cpp to move setting to false in 32-bit VM code to be #else part of flag's setting. >>> >>> Thanks, >>> Vladimir >>> >>> On 12/7/15 11:29 AM, Deshpande, Vivek R wrote: >>>> Hi Vladimir >>>> >>>> We have updated the jbs entry with your suggested changes for the flag. >>>> Would you please review it. 
>>>> jbs entry: https://bugs.openjdk.java.net/browse/JDK-8143355 >>>> webrev is at: http://cr.openjdk.java.net/~mcberg/8143355/webrev.03/ >>>> >>>> Regards, >>>> Vivek >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Friday, December 04, 2015 1:12 PM >>>> To: Deshpande, Vivek R; hotspot compiler >>>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; Paul Sandoz >>>> Subject: Re: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 >>>> >>>> You don't need now #ifdef COMPILER2 (in vm_version_x86.cpp). >>>> >>>> + #ifdef COMPILER2 >>>> + #ifdef _LP64 >>>> + if (UseSSE42Intrinsics) { >>>> >>>> Also you need to add to all other platforms vm_version_.cpp setting flag to false. See UseAdler32Intrinsics settings as example. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/4/15 11:26 AM, Deshpande, Vivek R wrote: >>>>> Hi Vladimir >>>>> >>>>> We have updated the webrev at the jbs entry with the global flag. >>>>> This is the link for your review. >>>>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.02/ >>>>> >>>>> Regards >>>>> Vivek >>>>> -----Original Message----- >>>>> From: Deshpande, Vivek R >>>>> Sent: Wednesday, December 02, 2015 11:21 AM >>>>> To: 'Vladimir Kozlov'; hotspot compiler >>>>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; 'Paul Sandoz' >>>>> Subject: RE: RFR (M): 8143355: Update for addition of >>>>> vectorizedMismatch intrinsic for x86 >>>>> >>>>> Hi Vladimir >>>>> >>>>> Yes the 2x performance gain is using AVX2 instructions for big arrays(~1k). >>>>> We will update the patch and jbs entry with global flag and let you know soon. 
>>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, December 01, 2015 6:02 PM >>>>> To: Deshpande, Vivek R; hotspot compiler >>>>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric >>>>> Subject: Re: RFR (M): 8143355: Update for addition of >>>>> vectorizedMismatch intrinsic for x86 >>>>> >>>>> 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. >>>>> >>>>> If that is the case the flag should be global. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/1/15 5:48 PM, Vladimir Kozlov wrote: >>>>>> This seems fine. 2x is for AVX implementation? >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 11/24/15 4:00 PM, Deshpande, Vivek R wrote: >>>>>>> Hi all >>>>>>> >>>>>>> We would like to contribute a patch from Intel which optimizes >>>>>>> vectorizedMismatch() method in java.util.ArraysSupport.java for X86 >>>>>>> architecture using AVX instructions. >>>>>>> >>>>>>> The improvement gives more than 2x gain over Unsafe implementation >>>>>>> for long arrays. >>>>>>> >>>>>>> >>>>>>> The bug is blocked by bug: vectorized support for array >>>>>>> equals/compare/mismatch using Unsafe >>>>>>> (https://bugs.openjdk.java.net/browse/JDK-8136924.) >>>>>>> >>>>>>> Could you please review and sponsor this patch. 
>>>>>>> >>>>>>> Bug-id: >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143355 >>>>>>> webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ >>>>>>> >>>>>>> Thanks and regards, >>>>>>> >>>>>>> Vivek >>>>>>> >>

From paul.sandoz at oracle.com Tue Dec 8 21:01:57 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 8 Dec 2015 22:01:57 +0100 Subject: Reference.reachabilityFence In-Reply-To: <20151207095825.952677@eggemoggin.niobe.net> References: <2D27BCFB-77ED-4C83-985E-102DC4B41C97@oracle.com> <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com> <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> <20151207095825.952677@eggemoggin.niobe.net> Message-ID: <430729B7-AA2B-499A-8660-C0BBFFC69E5E@oracle.com> > On 7 Dec 2015, at 18:58, mark.reinhold at oracle.com wrote: > > 2015/12/4 5:47 -0800, paul.sandoz at oracle.com: >>> On 3 Dec 2015, at 22:33, Mandy Chung wrote: >>>> On Nov 26, 2015, at 8:22 AM, Paul Sandoz wrote: >>>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/ >>>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-hotspot/webrev/ >>>> >>>> There is now more documentation on Reference (copied and suitably >>>> rearranged from 166 Fences.java). The method name remains the same. >>> >>> I think the addition to the Reference class specification should >>> belong to the reachabilityFence method specification. Any reason why >>> not? >> >> I thought it would be more visible in the JavaDoc, as it's there >> upfront. The api note may get larger if we include some additional >> real world examples. I don't have a strong opinion on this, if yours >> is stronger I will move it :-) > > I agree with Mandy -- the new text about fences belongs in the method > doc, not the class doc. Thanks, moved. > > Further comments, mostly minor: > > - In the opening sentence, s/strongly reachability/strong reachability/.
> > - I'd remove the phrase "As illustrated in the sample usages of the > api note below" from the normative text. The API note follows > immediately; there's no need to point to it. > > - s/a Java Virtual Machine/the virtual machine/ > > - s/A garbage collector/The garbage collector/ > > - s/call to/invocation of/ > > - s/ for example /, for example,/ > > - s/if it were OK/if it were acceptable/ ("OK" is a bit too informal) > > - s!in general!, in general,! > > - s/Fences.reachabilityFence/Reference.reachabilityFence/ in the examples > Updated: http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/src/java.base/share/classes/java/lang/ref/Reference.java.sdiff.html I think there is an opportunity to add further examples, but I would like to take a swing at that later on. > - I now agree with you and Doug about calling this a "fence". Can we > just name it "fence" rather than the wordier "reachabilityFence"? > Looking at a typical invocation, > > Reference.reachabilityFence(); > > seems a bit redundant while > > Reference.fence(); > > reads quite nicely. Is there, or will there ever be, any other kind > of reference-related fence? > I doubt there will be another kind of reference fence, but it could be used in conjunction with other memory fences (currently on VarHandles) and if static imports are used it might look rather out of place as to what fence "fence" actually refers to. That is why I prefer the longer more descriptive name. Paul.
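For readers unfamiliar with the method under discussion, a typical use looks like the following. It is a sketch that assumes the API as it eventually shipped in JDK 9, `java.lang.ref.Reference.reachabilityFence(Object)`. The class, the `TABLE` map standing in for native memory, and the `Cleaner`-based cleanup are hypothetical illustrations and are not taken from the webrev.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical resource whose state lives outside the Java object, in TABLE,
// and is released by a Cleaner once the object becomes unreachable.
final class FencedResource {
    private static final Cleaner CLEANER = Cleaner.create();
    // Stand-in for native memory: slot id -> value.
    static final ConcurrentHashMap<Integer, Integer> TABLE = new ConcurrentHashMap<>();
    private static final AtomicInteger NEXT_SLOT = new AtomicInteger();

    private final int slot;

    FencedResource(int value) {
        int s = NEXT_SLOT.getAndIncrement();
        slot = s;
        TABLE.put(s, value);
        // The cleanup action captures only the slot number, never 'this'.
        CLEANER.register(this, () -> TABLE.remove(s));
    }

    int read() {
        try {
            int s = slot; // last use of 'this' as far as the compiler can tell
            // Without the fence, 'this' may already be unreachable here, the Cleaner
            // could remove the entry, and get(s) would return null.
            return TABLE.get(s);
        } finally {
            // Keeps 'this' strongly reachable until this point.
            Reference.reachabilityFence(this);
        }
    }

    public static void main(String[] args) {
        System.out.println(new FencedResource(42).read()); // 42
    }
}
```

The fence sits in a `finally` block so that `this` stays reachable on every exit path from `read()`, including exceptional ones.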
From pavel.punegov at oracle.com Tue Dec 8 21:28:25 2015 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Wed, 9 Dec 2015 00:28:25 +0300 Subject: RFR (XXS): 8144933: CompilerControl: commandfile/ExcludeTest has incorrect jtreg run innotation In-Reply-To: <56671404.5030703@oracle.com> References: <507299AD-9F7C-4ADC-AC61-C13DB921C879@oracle.com> <56671404.5030703@oracle.com> Message-ID: <3EB82051-BED1-41AC-9B90-F2780D070F39@oracle.com> Thanks, Vladimir Pavel. > On 08 Dec 2015, at 20:31, Vladimir Kozlov wrote: > > Good. > > Thanks, > Vladimir > > On 12/8/15 8:35 AM, Pavel Punegov wrote: >> Hi, >> >> please review this small fix to the test. >> Issue: test has incorrect @run annotation and hence runs another test instead. >> >> webrev: http://cr.openjdk.java.net/~ppunegov/8144933/webrev.00/ >> bug : https://bugs.openjdk.java.net/browse/JDK-8144933 >> >> Thanks, >> Pavel Punegov >>

From doug.simon at oracle.com Tue Dec 8 22:07:02 2015 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 8 Dec 2015 23:07:02 +0100 Subject: RFR: 8144944: JVMCI compiler initialization can happen on different thread than JVMCI initialization Message-ID: The timer used for JVMCI related initialization (enabled by -Djvmci.inittimer=true) currently assumes that all JVMCI related initialization happens on the same thread. This is wrong. While all related initialization should happen on the same thread, core JVMCI initialization may be performed by one compiler thread and JVMCI compiler (e.g., Graal) initialization may be performed on a different compiler thread. For example, one compiler thread may be the first to execute:
For example, one compiler thread may be the first to execute: http://hg.openjdk.java.net/graal/graal-jvmci-9/hotspot/file/7a570929c5e5/src/share/vm/jvmci/jvmciCompiler.cpp#l129 which can trigger core JVMCI initialization while a different compiler thread is the first to reach: http://hg.openjdk.java.net/graal/graal-jvmci-9/hotspot/file/7a570929c5e5/src/share/vm/jvmci/jvmciCompiler.cpp#l145 which triggers JVMCI compiler initialization. https://bugs.openjdk.java.net/browse/JDK-8144944 http://cr.openjdk.java.net/~dnsimon/8144944/ -Doug From thomas.stuefe at gmail.com Wed Dec 9 07:28:19 2015 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 9 Dec 2015 08:28:19 +0100 Subject: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672286EB4@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672286C26@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB4181165672286EB4@DEWDFEMB19C.global.corp.sap> Message-ID: Hi Martin, You could split the os kernel detection from the RTM change and submit the former to hs-rt now. Kind regards, Thomas On Tue, Dec 8, 2015 at 3:08 PM, Doerr, Martin wrote: > Hi Thomas, > > > > thanks for the hint. There are changes in hs-comp and hs-rt which would > cause trouble with my change at the moment. I?ll wait until they get merged > and create a new webrev which hopefully applies to both repositories. > > > > Best regards, > > Martin > > > > *From:* Thomas St?fe [mailto:thomas.stuefe at gmail.com] > *Sent:* Dienstag, 8. 
Dezember 2015 09:22 > *To:* Doerr, Martin > *Cc:* hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: RFR(M): 8144847: PPC64: Update Transactional Memory and > Atomic::cmpxchg code > > > > Hi Martin, > > > > thanks for this addition :) > > > > It may make a lot of sense to rebase this change to hs-rt, because > os_aix.cpp is quite different there after > http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/ce87b1141c12. Otherwise > we may have problems later applying your change atop of my change. > > > > ------------- > > > > About the AIX kernel version recognition: I know we talked about this, but > I have second thoughts now. I guess I did not think it really through > before, sorry. > > So, now I have a change request: > > > > Instead of introducing os::Aix::os_kernel_version > (version,release,techlevel,sp) beside the already existing > os::Aix::os_version (version,release) I would prefer just one parameter, > os_version, and enriching it with techlevel and sp. So, exactly what you > did for os_kernel_version. > > > > Basically, as a prototype: > > > > // -1 = uninitialized, otherwise 32 bit number: > > // 0xVVRRTTSS > > // VV - major version > > // RR - minor version > > // TT - tech level, if known, 0 otherwise > > // SS - service pack, if known, 0 otherwise > > static uint32_t os_version (); > > > > Then please change the few users of os::Aix::os_version() to now expect a > 32bit unsigned number. As far as I see there are only 3 callsites. > > > > ------------------- > > > > Other small nitpicks: > > > > - in libodm_aix.cpp, please use trcVerbose() instead of if (Verbose) > tty->.. . Please include misc_aix.hpp for trcVerbose(). We will change all > those tracecalls to Unified logging in the near future and this would help > me finding all trace occurrences. > > > > - please move ~dynamicOdm() and odmWrapper::clean_wrapper() from > libodm_aix.hpp to libodm_aix.cpp and accordingly remove the includes > dlfcn.h and stdlib.h from libodm_aix.hpp.
> > > > - I probably would change "static unsigned int > determine_os_kernel_version(int major_aix_version, int minor_aix_version);" > to "static bool fill_in_os_kernel_version(unsigned int* p_os_version);", > but that is just a matter of taste. > > > > Kind Regards, Thomas > > > > > > On Mon, Dec 7, 2015 at 6:10 PM, Doerr, Martin > wrote: > > Hi, > > > > I have created a webrev for further PPC64 updates: > > AIX supports Transactional Memory with a certain kernel patch level. Add a > detection for it and make UseRTMLocking usable on AIX. > In addition, implement Atomic::cmpxchg for jbyte. > > > > The webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.00/ > > > > Please review. > > > > Best regards, > > Martin > > > > >

From goetz.lindenmaier at sap.com Wed Dec 9 07:35:01 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 9 Dec 2015 07:35:01 +0000 Subject: RFR(M): 8144466: ppc64: fix argument passing through opto stubs. In-Reply-To: <56671962.6090509@oracle.com> References: <4295855A5C1DE049A61835A1887419CC41EDB0FA@DEWDFEMB12A.global.corp.sap> <56662FBB.3070006@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672286DCC@DEWDFEMB19C.global.corp.sap> <56671962.6090509@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDB94E@DEWDFEMB12A.global.corp.sap> Hi Vladimir, thanks a lot for review and sponsoring! Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Dienstag, 8. Dezember 2015 18:55 > To: Doerr, Martin ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(M): 8144466: ppc64: fix argument passing through opto > stubs. > > Okay. I need to get closed changes reviewed before push.
> > Thanks, > Vladimir > > On 12/8/15 1:41 AM, Doerr, Martin wrote: > > Hi Vladimir, > > > > C1 does the int to long conversions in the platform code. I didn't find any > missing conversion in the C1 code. > > The only C1 issue with respect to runtime calls is 8143817 for which I'll send > out another email. > > > > Best regards, > > Martin > > > > > > -----Original Message----- > > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > > Sent: Dienstag, 8. Dezember 2015 02:18 > > To: hotspot-compiler-dev at openjdk.java.net > > Subject: Re: RFR(M): 8144466: ppc64: fix argument passing through opto > stubs. > > > > Looks fine. Do you need to do anything for C1 which you have now? > > > > Thanks, > > Vladimir > > > > On 12/7/15 7:12 AM, Lindenmaier, Goetz wrote: > >> Hi, > >> > >> I need to fix the calls to runtime for ppc because it expects int > >> > >> being properly sign extended to long. > >> > >> In 8086069, we tried to push this to the platform code, but for > >> > >> opto stubs this is not possible. > >> > >> Please review this change. I please need a sponsor. > >> > >> http://cr.openjdk.java.net/~goetz/webrevs/8144466- > ppcOptoStubs/webrev.00/ > >> > >> The change comes with a corresponding test. > >> > >> I also added a test for a problem we fixed before, where > >> > >> floating point args were passed wrong. > >> > >> Best regards, > >> > >> Goetz. > >> From rahul.v.raghavan at oracle.com Wed Dec 9 09:12:45 2015 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Wed, 9 Dec 2015 01:12:45 -0800 (PST) Subject: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler Message-ID: Hello, Please review the following patch for JDK-6378256. webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 . 
Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times slower).
Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options).

sample unit test:

    public class Jdk6378256Test
    {
        public static void main(String[] args)
        {
            Object obj = new Object();
            long time = System.nanoTime();
            for (int i = 0; i < 1000000; i++)
                System.identityHashCode(obj); // compare to obj.hashCode();
            System.out.println("Result = " + (System.nanoTime() - time));
        }
    }

Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also.
(looks in the header for the hashCode before calling into the VM).
Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver.
So also added required additional null check for System.identityHashCode case.

Testing:
- successful JPRT run (-testset hotspot).
- JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System).
  (with -client / -XX:TieredStopAtLevel=1 etc. options).
- Added 'noreg-perf' label for this performance bug.
  Manual testing done and confirmed expected performance values for unit tests with fix.

Thanks,
Rahul

From tobias.hartmann at oracle.com Wed Dec 9 11:20:52 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 9 Dec 2015 12:20:52 +0100 Subject: Re: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: References: Message-ID: <56680E94.40306@oracle.com> Hi Rahul, looks good to me (not a reviewer). Best, Tobias On 09.12.2015 10:12, Rahul Raghavan wrote:
> Hello,
>
> Please review the following patch for JDK-6378256.
>
> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ .
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 .
> Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times slower).
> Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options).
>
> sample unit test:
> public class Jdk6378256Test
> {
>     public static void main(String[] args)
>     {
>         Object obj = new Object();
>         long time = System.nanoTime();
>         for(int i = 0 ; i < 1000000 ; i++)
>             System.identityHashCode(obj); //compare to obj.hashCode();
>         System.out.println ("Result = " + (System.nanoTime() - time));
>     }
> }
>
> Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also.
> (looks in the header for the hashCode before calling into the VM).
> Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver.
> So also added required additional null check for System.identityHashCode case.
>
> Testing:
> - successful JPRT run (-testset hotspot).
> - JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System).
> (with -client / -XX:TieredStopAtLevel=1 etc. options).
> - Added 'noreg-perf' label for this performance bug.
> Manual testing done and confirmed expected performance values for unit tests with fix.
>
> Thanks,
> Rahul
>

From roland.westrelin at oracle.com Wed Dec 9 13:27:28 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 9 Dec 2015 14:27:28 +0100 Subject: Re: RFR 8143628: Fork sun.misc.Unsafe and jdk.internal.misc.Unsafe native method tables In-Reply-To: References: Message-ID: > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8143628-unsafe-native-hotspot/ The hotspot changes look good to me. Roland.

From nils.eliasson at oracle.com Wed Dec 9 13:32:05 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 9 Dec 2015 14:32:05 +0100 Subject: RFR(S): 8144091: CompilerControl: directive file doesn't override inlining rules Message-ID: <56682D55.3040403@oracle.com> Hi, Please review this small change. During testing it was discovered that directives didn't override compile commands correctly.
This change corrects two things: 1) A typo in matches_inline, where the inline_action wasn't used as intended. 2) Making sure any inline directive overrides any CompileCommand, even when none of the directive's inline rules matches. This is required because of the different way inlines are matched in C1 and C2. The compile commands are unordered and have been given different priorities: "-XX:CompileCommand=inline,*.* -XX:CompileCommand=dontinline,*.*" would inline everything in C2 but nothing in C1. In compiler directives the inline rules are ordered (they are part of the same list), and the first rule that matches applies: inline:"+*.*,-*.*" always matches the first rule, whose action (+) is force inline. Testing: Running all inline tests, covering both commands, directives and overrides between them. Webrev: http://cr.openjdk.java.net/~neliasso/8144091/webrev.02/ Bug: https://bugs.openjdk.java.net/browse/JDK-8144091 Regards, Nils Eliasson

From nils.eliasson at oracle.com Wed Dec 9 13:32:09 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 9 Dec 2015 14:32:09 +0100 Subject: RFR(S): 8144601: Premature assert in directive inline parsing Message-ID: <56682D59.6040307@oracle.com> Hi, Please review this small change. It fixes a case where the directives parser can trigger an assert when parsing malformed inline commands. This assert triggers right before the error handling, so it has always run fine in product builds. The fix makes sure we bail out directly.
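The first-match rule described in the 8144091 review, together with the "bail out directly on malformed input" behavior of this fix, can be modeled in a few lines of Java. This is a hypothetical sketch, not the HotSpot directives parser: the class and method names are invented, the comma-separated "+pattern,-pattern" syntax only mirrors the inline:"+*.*,-*.*" example rather than the real JSON directive format, and the nested record needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

public final class InlineRulesSketch {
    enum Action { FORCE_INLINE, DONT_INLINE }

    // One "+pattern" or "-pattern" entry, matched against "Class.method".
    record Rule(Action action, Pattern pattern) {}

    // Parses "+a,-b,...". On the first malformed entry it returns empty right away
    // instead of continuing with a half-built rule list.
    static Optional<List<Rule>> parse(String spec) {
        List<Rule> rules = new ArrayList<>();
        for (String entry : spec.split(",")) {
            if (entry.length() < 2) {
                return Optional.empty(); // sign without a pattern
            }
            char sign = entry.charAt(0);
            Action action;
            if (sign == '+') {
                action = Action.FORCE_INLINE;
            } else if (sign == '-') {
                action = Action.DONT_INLINE;
            } else {
                return Optional.empty(); // missing '+' or '-' prefix
            }
            rules.add(new Rule(action, globToRegex(entry.substring(1))));
        }
        return Optional.of(rules);
    }

    // '*' matches any run of characters; everything else is matched literally.
    static Pattern globToRegex(String glob) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Rules are ordered: the first matching rule decides and later rules are ignored.
    static Optional<Action> firstMatch(List<Rule> rules, String method) {
        for (Rule rule : rules) {
            if (rule.pattern().matcher(method).matches()) {
                return Optional.of(rule.action());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Rule> rules = parse("+*.*,-*.*").orElseThrow();
        System.out.println(firstMatch(rules, "java/lang/String.indexOf")); // Optional[FORCE_INLINE]
        System.out.println(parse("*.*").isPresent());                     // false
    }
}
```

Because the rules live in one ordered list, "+*.*,-*.*" can never reach its second entry, which is exactly the behavior the unordered -XX:CompileCommand options could not express consistently across C1 and C2.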
Testing: The test case that caught the bug has been verified. Webrev: http://cr.openjdk.java.net/~neliasso/8144601/webrev.02/ Bug: https://bugs.openjdk.java.net/browse/JDK-8144601 Regards, Nils Eliasson

From roland.westrelin at oracle.com Wed Dec 9 14:19:37 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 9 Dec 2015 15:19:37 +0100 Subject: Re: RFR(S): 8143817: C1: Platform dependent stack space not preserved for all runtime calls In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672286E8D@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672284CE0@DEWDFEMB19C.global.corp.sap> <7F71F2CA-B520-43EC-91C7-63CA2AA76194@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672286E8D@DEWDFEMB19C.global.corp.sap> Message-ID: <6732578E-8791-4754-A3BE-0A3BFFBAB1F4@oracle.com> Hi Martin, > I have removed the "expected_arguments" argument as you suggested and created a new webrev: > http://cr.openjdk.java.net/~mdoerr/8143817_c1_runtime_call/webrev.01/ > > Can anybody sponsor this fix, please? I'll sponsor it. We're trying to push less important fixes early in the week to take no risk of disrupting the hs-comp -> hs-main integration we do at the end of the week so I'll push this early next week. Roland.

From roland.westrelin at oracle.com Wed Dec 9 14:32:30 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 9 Dec 2015 15:32:30 +0100 Subject: Re: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: References: Message-ID: <8544A13B-B408-4387-912F-C418202E1508@oracle.com> > webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . Justifying the comment lines 2019-2022 in sharedRuntime_sparc.cpp (lines 1743-1746 in sharedRuntime_x86_32.cpp) again would be nice. Shouldn't we use this as an opportunity to add the same optimization to sharedRuntime_x86_64.cpp? Roland.
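The Java-level contract that the new C1 fast path for System.identityHashCode has to preserve can be checked directly. In this small sketch (the class names are made up for illustration), System.identityHashCode returns the same value as the default Object.hashCode, ignores any override, and returns 0 for null; that last point is why the static variant needs the extra null check mentioned in the 6378256 review.

```java
public class IdentityHashSketch {
    // Keeps Object's default hashCode().
    static final class Plain {}

    // Overrides hashCode(); System.identityHashCode ignores the override.
    static final class Custom {
        @Override
        public int hashCode() {
            return 42;
        }
    }

    public static void main(String[] args) {
        Object plain = new Plain();
        // Same identity hash whether reached through the receiver or the static method.
        System.out.println(plain.hashCode() == System.identityHashCode(plain)); // true
        // The static method receives the object as an argument, so null must be handled: it yields 0.
        System.out.println(System.identityHashCode(null)); // 0
        // The override is only visible through hashCode().
        Object custom = new Custom();
        System.out.println(custom.hashCode()); // 42
        // An object's identity hash does not change between calls.
        System.out.println(System.identityHashCode(custom) == System.identityHashCode(custom)); // true
    }
}
```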
From roland.westrelin at oracle.com Wed Dec 9 14:36:17 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 9 Dec 2015 15:36:17 +0100 Subject: Re: RFR(S): 8144091: CompilerControl: directive file doesn't override inlining rules In-Reply-To: <56682D55.3040403@oracle.com> References: <56682D55.3040403@oracle.com> Message-ID: > Webrev: http://cr.openjdk.java.net/~neliasso/8144091/webrev.02/ That looks good to me. Roland.

From roland.westrelin at oracle.com Wed Dec 9 14:37:33 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 9 Dec 2015 15:37:33 +0100 Subject: Re: RFR(S): 8144601: Premature assert in directive inline parsing In-Reply-To: <56682D59.6040307@oracle.com> References: <56682D59.6040307@oracle.com> Message-ID: <68A06BA5-520C-43B6-80BD-7F0EA3F4F1EE@oracle.com> > Webrev: http://cr.openjdk.java.net/~neliasso/8144601/webrev.02/ That looks good to me. Roland.

From martin.doerr at sap.com Wed Dec 9 15:36:54 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 9 Dec 2015 15:36:54 +0000 Subject: RE: RFR(S): 8143817: C1: Platform dependent stack space not preserved for all runtime calls In-Reply-To: <6732578E-8791-4754-A3BE-0A3BFFBAB1F4@oracle.com> References: <7C9B87B351A4BA4AA9EC95BB4181165672284CE0@DEWDFEMB19C.global.corp.sap> <7F71F2CA-B520-43EC-91C7-63CA2AA76194@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672286E8D@DEWDFEMB19C.global.corp.sap> <6732578E-8791-4754-A3BE-0A3BFFBAB1F4@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228720E@DEWDFEMB19C.global.corp.sap> Ok, thank you very much. Best regards, Martin -----Original Message----- From: Roland Westrelin [mailto:roland.westrelin at oracle.com] Sent: Wednesday, 9 December 2015 15:20 To: Doerr, Martin Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8143817: C1: Platform dependent stack space not preserved for all runtime calls Hi Martin, > I have removed the "expected_arguments" argument as you suggested and created a new webrev: > http://cr.openjdk.java.net/~mdoerr/8143817_c1_runtime_call/webrev.01/ > > Can anybody sponsor this fix, please? I'll sponsor it. We're trying to push less important fixes early in the week to take no risk of disrupting the hs-comp -> hs-main integration we do at the end of the week so I'll push this early next week. Roland.

From martin.doerr at sap.com Wed Dec 9 15:49:28 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 9 Dec 2015 15:49:28 +0000 Subject: RFR(S): 8144850: C1: operator delete needs an implementation Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672287230@DEWDFEMB19C.global.corp.sap> Hi, unfortunately, I didn't test the slow debug build when I reworked JDK-8138890. The product and fastdebug builds are working fine. However, we need another fix to support the slow debug build with xlC on AIX. A webrev is here: http://cr.openjdk.java.net/~mdoerr/8144850_c1_delete/webrev.00/ It would be great if somebody could review and sponsor. Thanks and best regards, Martin

From christian.thalinger at oracle.com Wed Dec 9 20:18:22 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 9 Dec 2015 10:18:22 -1000 Subject: Re: RFR: 8144944: JVMCI compiler initialization can happen on different thread than JVMCI initialization In-Reply-To: References: Message-ID: <87316B3A-E5BD-4604-85D3-94942E6187D6@oracle.com> > On Dec 8, 2015, at 12:07 PM, Doug Simon wrote: > > The timer used for JVMCI related initialization (enabled by -Djvmci.inittimer=true) currently assumes that all JVMCI related initialization happens on the same thread. This is wrong.
While all related initialization should happen on the same thread, core JVMCI initialization may be performed by one compiler thread and JVMCI compiler (e.g., Graal) initialization may be performed on a different compiler thread. > > For example, one compiler thread may be the first to execute: > > http://hg.openjdk.java.net/graal/graal-jvmci-9/hotspot/file/7a570929c5e5/src/share/vm/jvmci/jvmciCompiler.cpp#l129 > > which can trigger core JVMCI initialization while a different compiler thread is the first to reach: > > http://hg.openjdk.java.net/graal/graal-jvmci-9/hotspot/file/7a570929c5e5/src/share/vm/jvmci/jvmciCompiler.cpp#l145 > > which triggers JVMCI compiler initialization. How and when did the bug manifest? I guess writing a test is impossible. If that's the case please add the noreg-hard label. > > https://bugs.openjdk.java.net/browse/JDK-8144944 > http://cr.openjdk.java.net/~dnsimon/8144944/ @SuppressFBWarnings(value = "ST_WRITE_TO_STATIC_FROM_INSTANCE_METHOD", justification = "only the initializing thread accesses this field") Is that still required? Which field was it? > > -Doug

From doug.simon at oracle.com Wed Dec 9 21:33:40 2015 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 9 Dec 2015 22:33:40 +0100 Subject: Re: RFR: 8144944: JVMCI compiler initialization can happen on different thread than JVMCI initialization In-Reply-To: <87316B3A-E5BD-4604-85D3-94942E6187D6@oracle.com> References: <87316B3A-E5BD-4604-85D3-94942E6187D6@oracle.com> Message-ID: > On 09 Dec 2015, at 21:18, Christian Thalinger wrote: > >> >> On Dec 8, 2015, at 12:07 PM, Doug Simon wrote: >> >> The timer used for JVMCI related initialization (enabled by -Djvmci.inittimer=true) currently assumes that all JVMCI related initialization happens on the same thread. This is wrong.
While all related initialization should happen on the same thread, core JVMCI initialization may be performed by one compiler thread and JVMCI compiler (e.g., Graal) initialization may be performed on a different compiler thread. >> >> For example, one compiler thread may be the first to execute: >> >> http://hg.openjdk.java.net/graal/graal-jvmci-9/hotspot/file/7a570929c5e5/src/share/vm/jvmci/jvmciCompiler.cpp#l129 >> >> which can trigger core JVMCI initialization while a different compiler thread is the first to reach: >> >> http://hg.openjdk.java.net/graal/graal-jvmci-9/hotspot/file/7a570929c5e5/src/share/vm/jvmci/jvmciCompiler.cpp#l145 >> >> which triggers JVMCI compiler initialization. > > How and when did the bug manifest? It appears sporadically with an assertions-enabled bootstrap (i.e., -esa -XX:+BootstrapJVMCI). > I guess writing a test is impossible. If that's the case please add the noreg-hard label. Done. > >> >> https://bugs.openjdk.java.net/browse/JDK-8144944 >> http://cr.openjdk.java.net/~dnsimon/8144944/ > > @SuppressFBWarnings(value = "ST_WRITE_TO_STATIC_FROM_INSTANCE_METHOD", justification = "only the initializing thread accesses this field") > > Is that still required? Which field was it? It used to be timerDepth but now it's initializingThread. So yes, it is still required.
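Why per-thread bookkeeping matters here can be shown with a small timer sketch. It does not reproduce the JVMCI InitTimer class: all names are hypothetical, and it deliberately uses a ThreadLocal nesting depth instead of the initializingThread field mentioned in the reply, so that a timer opened on a second compiler thread starts at depth 0 rather than inheriting (and later corrupting) the depth of the thread that started first. String.repeat needs Java 11 or later.

```java
public final class InitTimerSketch implements AutoCloseable {
    // Nesting depth is kept per thread, so a timer opened on another compiler
    // thread neither sees nor disturbs the depth of the thread that started first.
    private static final ThreadLocal<int[]> DEPTH = ThreadLocal.withInitial(() -> new int[1]);

    final String name;
    final int level;          // nesting depth on the opening thread
    private final long start;

    private InitTimerSketch(String name) {
        this.name = name;
        this.level = DEPTH.get()[0]++;
        this.start = System.nanoTime();
    }

    static InitTimerSketch timer(String name) {
        return new InitTimerSketch(name);
    }

    static int currentDepth() {
        return DEPTH.get()[0];
    }

    @Override
    public void close() {
        DEPTH.get()[0]--;
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("  ".repeat(level) + name + " took " + micros + " us on "
                + Thread.currentThread().getName());
    }

    // Opens a timer on a fresh thread and reports the depth observed there.
    static int levelOnFreshThread(String name) {
        int[] seen = {-1};
        Thread t = new Thread(() -> {
            try (InitTimerSketch timer = timer(name)) {
                seen[0] = timer.level;
            }
        }, "CompilerThread-sketch");
        t.start();
        try {
            t.join(); // join() makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        try (InitTimerSketch runtime = timer("JVMCI runtime")) {
            try (InitTimerSketch config = timer("config")) {
                System.out.println("nested level: " + config.level);
            }
            System.out.println("other thread level: " + levelOnFreshThread("JVMCI compiler"));
        }
        System.out.println("depth after close: " + currentDepth());
    }
}
```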
-Doug

From christian.thalinger at oracle.com Wed Dec 9 21:46:41 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 9 Dec 2015 11:46:41 -1000 Subject: Re: RFR: 8144944: JVMCI compiler initialization can happen on different thread than JVMCI initialization In-Reply-To: References: <87316B3A-E5BD-4604-85D3-94942E6187D6@oracle.com> Message-ID: <4F4BD980-36E4-437C-AEAA-D6C214110DEF@oracle.com> > On Dec 9, 2015, at 11:33 AM, Doug Simon wrote: > >> >> On 09 Dec 2015, at 21:18, Christian Thalinger wrote: >> >>> >>> On Dec 8, 2015, at 12:07 PM, Doug Simon wrote: >>> >>> The timer used for JVMCI related initialization (enabled by -Djvmci.inittimer=true) currently assumes that all JVMCI related initialization happens on the same thread. This is wrong.
> >> >>> >>> https://bugs.openjdk.java.net/browse/JDK-8144944 >>> http://cr.openjdk.java.net/~dnsimon/8144944/ >> >> @SuppressFBWarnings(value = "ST_WRITE_TO_STATIC_FROM_INSTANCE_METHOD", justification = "only the initializing thread accesses this field") >> >> Is that still required? Which field was it? > > It used to be timerDepth but now it?s initializingThread. So yes, it is still required. That?s what I figured but thanks for confirming. Looks good. > > -Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug.simon at oracle.com Wed Dec 9 22:01:06 2015 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 9 Dec 2015 23:01:06 +0100 Subject: RFR: 8144944: JVMCI compiler initialization can happen on different thread than JVMCI initialization In-Reply-To: <4F4BD980-36E4-437C-AEAA-D6C214110DEF@oracle.com> References: <87316B3A-E5BD-4604-85D3-94942E6187D6@oracle.com> <4F4BD980-36E4-437C-AEAA-D6C214110DEF@oracle.com> Message-ID: <382134AD-B8B4-4DD2-B29E-5A25F69FF1D1@oracle.com> Thanks for the review. -Doug > On 09 Dec 2015, at 22:46, Christian Thalinger wrote: > >> >> On Dec 9, 2015, at 11:33 AM, Doug Simon wrote: >> >>> >>> On 09 Dec 2015, at 21:18, Christian Thalinger wrote: >>> >>>> >>>> On Dec 8, 2015, at 12:07 PM, Doug Simon wrote: >>>> >>>> The timer used for JVMCI related initialization (enabled by -Djvmci.inittimer=true) currently assumes that all JVMCI related initialization happens on the same thread. This is wrong. While all related initialization should happen on the same thread, core JVMCI initialization may be performed by one compiler thread and JVMCI compiler (e.g., Graal) initialization may be performed on a different compiler thread. 
>>>> >>>> For example, one compiler thread may be the first to execute: >>>> >>>> http://hg.openjdk.java.net/graal/graal-jvmci-9/hotspot/file/7a570929c5e5/src/share/vm/jvmci/jvmciCompiler.cpp#l129 >>>> >>>> which can trigger core JVMCI initialization while a different compiler thread is the first to reach: >>>> >>>> http://hg.openjdk.java.net/graal/graal-jvmci-9/hotspot/file/7a570929c5e5/src/share/vm/jvmci/jvmciCompiler.cpp#l145 >>>> >>>> which triggers JVMCI compiler initialization. >>> >>> How and when did the bug manifest? >> >> It appears sporadically with an assertions-enabled bootstrap (i.e., -esa -XX:+BootstrapJVMCI). >> >>> I guess writing a test is impossible. If that's the case please add the noreg-hard label. >> >> Done. >> >>> >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8144944 >>>> http://cr.openjdk.java.net/~dnsimon/8144944/ >>> >>> @SuppressFBWarnings(value = "ST_WRITE_TO_STATIC_FROM_INSTANCE_METHOD", justification = "only the initializing thread accesses this field") >>> >>> Is that still required? Which field was it? >> >> It used to be timerDepth but now it's initializingThread. So yes, it is still required. > > That's what I figured but thanks for confirming. Looks good. > >> >> -Doug

From dean.long at oracle.com Wed Dec 9 23:30:35 2015 From: dean.long at oracle.com (Dean Long) Date: Wed, 9 Dec 2015 15:30:35 -0800 Subject: Re: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: References: Message-ID: <5668B99B.5030103@oracle.com> The new System.identityHashCode optimization can't be turned off on the command-line, because InlineObjectHash only applies to Object.hashCode. Does it matter? dl On 12/9/2015 1:12 AM, Rahul Raghavan wrote:
> Hello,
>
> Please review the following patch for JDK-6378256.
>
> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ .
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 .
> Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times slower).
> Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options).
>
> sample unit test:
> public class Jdk6378256Test
> {
>     public static void main(String[] args)
>     {
>         Object obj = new Object();
>         long time = System.nanoTime();
>         for(int i = 0 ; i < 1000000 ; i++)
>             System.identityHashCode(obj); //compare to obj.hashCode();
>         System.out.println ("Result = " + (System.nanoTime() - time));
>     }
> }
>
> Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also.
> (looks in the header for the hashCode before calling into the VM).
> Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver.
> So also added required additional null check for System.identityHashCode case.
>
> Testing:
> - successful JPRT run (-testset hotspot).
> - JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System).
> (with -client / -XX:TieredStopAtLevel=1 etc. options).
> - Added 'noreg-perf' label for this performance bug.
> Manual testing done and confirmed expected performance values for unit tests with fix.
>
> Thanks,
> Rahul

From vladimir.x.ivanov at oracle.com Thu Dec 10 00:09:48 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 10 Dec 2015 03:09:48 +0300 Subject: [9] RFR (XS): 8144935: C2: safepoint is pruned from a non-counted loop Message-ID: <5668C2CC.4080400@oracle.com> http://cr.openjdk.java.net/~vlivanov/8144935/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8144935 The recent fix for 6869327 introduced a regression: C2 became too aggressive when pruning redundant safepoint checks in the loop. If it is required to keep a safepoint in a loop, it is not safe to remove any safepoints unless there's a safepoint dominating all paths in the loop body.
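For readers unfamiliar with the distinction, the two loop shapes involved can be illustrated in Java. The classification in the comments reflects typical C2 behavior and is an assumption of this sketch, not something stated in the thread; C2's counted-loop recognition depends on more conditions than shown. The UseCountedLoopSafepoints flag is the one added by 6869327.

```java
public class LoopShapes {
    // Typically a counted loop for C2: int induction variable, constant stride,
    // loop-invariant limit. Unless -XX:+UseCountedLoopSafepoints is set, C2 may
    // drop the safepoint poll from this loop's back-edge.
    static long sumArray(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    static final class Node {
        final int value;
        final Node next;

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Not a counted loop: the trip count depends on the data (the list length),
    // so C2 is expected to keep a safepoint poll on the back-edge.
    static long sumList(Node head) {
        long sum = 0;
        for (Node n = head; n != null; n = n.next) {
            sum += n.value;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        Node list = new Node(1, new Node(2, new Node(3, new Node(4, null))));
        System.out.println(sumArray(a) + " " + sumList(list)); // 10 10
    }
}
```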
The fix is to restore the original behavior. I also slightly enhanced the -XX:+TraceLoopOpts output. Testing: failing test. Best regards, Vladimir Ivanov

From nils.eliasson at oracle.com Thu Dec 10 09:17:28 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 10 Dec 2015 10:17:28 +0100 Subject: Re: RFR(S): 8144601: Premature assert in directive inline parsing In-Reply-To: <68A06BA5-520C-43B6-80BD-7F0EA3F4F1EE@oracle.com> References: <56682D59.6040307@oracle.com> <68A06BA5-520C-43B6-80BD-7F0EA3F4F1EE@oracle.com> Message-ID: <56694328.2030602@oracle.com> Thank you, Nils On 2015-12-09 15:37, Roland Westrelin wrote: >> Webrev: http://cr.openjdk.java.net/~neliasso/8144601/webrev.02/ > That looks good to me. > > Roland. >

From vladimir.x.ivanov at oracle.com Thu Dec 10 11:07:11 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 10 Dec 2015 14:07:11 +0300 Subject: [9] RFR (XS): 8145026: compiler/jsr292/NonInlinedCall/RedefineTest.java fails with: java.lang.NullPointerException in ClassFileInstaller.main Message-ID: <56695CDF.9030206@oracle.com> http://cr.openjdk.java.net/~vlivanov/8145026/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8145026 compiler/jsr292/NonInlinedCall/RedefineTest.java failed once in nightly testing when installing compiled classes on BCP. I don't have an explanation why it happened, but I decided to slightly clean up the tests to express the test scenario in a clearer way.
Testing: compiler/jsr292/NonInlinedCall/ tests, JPRT Best regards, Vladimir Ivanov

From roland.westrelin at oracle.com Thu Dec 10 11:08:52 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 10 Dec 2015 12:08:52 +0100 Subject: Re: [9] RFR (XS): 8145026: compiler/jsr292/NonInlinedCall/RedefineTest.java fails with: java.lang.NullPointerException in ClassFileInstaller.main In-Reply-To: <56695CDF.9030206@oracle.com> References: <56695CDF.9030206@oracle.com> Message-ID: <2AC26F48-E5EB-4759-82CA-4EC26C4ED8B6@oracle.com> > http://cr.openjdk.java.net/~vlivanov/8145026/webrev.00/ That looks good to me. Roland.

From roland.westrelin at oracle.com Thu Dec 10 11:09:07 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 10 Dec 2015 12:09:07 +0100 Subject: Re: [9] RFR (XS): 8144935: C2: safepoint is pruned from a non-counted loop In-Reply-To: <5668C2CC.4080400@oracle.com> References: <5668C2CC.4080400@oracle.com> Message-ID: <6AEA6CD2-7C23-46BC-A3D2-83558549DE94@oracle.com> > http://cr.openjdk.java.net/~vlivanov/8144935/webrev.00/ That looks good to me. Roland.

From vladimir.x.ivanov at oracle.com Thu Dec 10 11:15:13 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 10 Dec 2015 14:15:13 +0300 Subject: Re: [9] RFR (XS): 8145026: compiler/jsr292/NonInlinedCall/RedefineTest.java fails with: java.lang.NullPointerException in ClassFileInstaller.main In-Reply-To: <2AC26F48-E5EB-4759-82CA-4EC26C4ED8B6@oracle.com> References: <56695CDF.9030206@oracle.com> <2AC26F48-E5EB-4759-82CA-4EC26C4ED8B6@oracle.com> Message-ID: <56695EC1.8010908@oracle.com> Thanks, Roland. Best regards, Vladimir Ivanov On 12/10/15 2:08 PM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~vlivanov/8145026/webrev.00/ > > That looks good to me. > > Roland.
>

From vladimir.x.ivanov at oracle.com Thu Dec 10 11:15:20 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 10 Dec 2015 14:15:20 +0300 Subject: Re: [9] RFR (XS): 8144935: C2: safepoint is pruned from a non-counted loop In-Reply-To: <6AEA6CD2-7C23-46BC-A3D2-83558549DE94@oracle.com> References: <5668C2CC.4080400@oracle.com> <6AEA6CD2-7C23-46BC-A3D2-83558549DE94@oracle.com> Message-ID: <56695EC8.5040208@oracle.com> Thanks, Roland. Best regards, Vladimir Ivanov On 12/10/15 2:09 PM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~vlivanov/8144935/webrev.00/ > > That looks good to me. > > Roland. >

From hui.shi at linaro.org Thu Dec 10 14:48:05 2015 From: hui.shi at linaro.org (Hui Shi) Date: Thu, 10 Dec 2015 22:48:05 +0800 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode Message-ID: Hi All, Could someone comment on this change? Bug: https://bugs.openjdk.java.net/browse/JDK-8144993 webrev: http://cr.openjdk.java.net/~hshi/8144993/webrev/ This patch aims to remove the redundant memory barrier after an allocation node; on AArch64 it removes a redundant dmb when creating an object. The motivation is that the dmb after commonly used object allocations (for example, String and boxing objects) is redundant with the dmb inserted for final field writes. In the following small case:

    String foo(String s) {
        String copy = new String(s);
        return copy;
    }

there are two dmb instructions in the generated code. The first one is membar_storestore, inserted in PhaseMacroExpand::expand_allocate_common. The second one is membar_release, inserted at the exit of the initializer method because final fields are written. The allocated String doesn't escape in the String initializer method, and membar_release includes membar_storestore semantics, so the first one can be removed safely.

    0x0000007f85bbfa8c: prfm pstl1keep, [x11,#256]
    0x0000007f85bbfa90: str xzr, [x0,#16]
    0x0000007f85bbfa94: dmb ishst // first dmb to remove
    ....
    0x0000007fa01d83c0: ldrsb w10, [x20,#20]
    0x0000007fa01d83c4: ldr w12, [x20,#16]
    0x0000007fa01d83c8: ldr x11, [sp,#8]
    0x0000007fa01d83cc: strb w10, [x11,#20]
    0x0000007fa01d83d0: str w12, [x11,#16]
    0x0000007fa01d83d4: dmb ish // second dmb

The patch targets this pattern and removes the redundant memory barrier for the allocation node:

1. When inserting the memory barrier for a final field write, if the allocation node of the final fields' object is available, invoke AllocationNode::compute_MemBar_redundancy(initializer method).
2. In AllocationNode:
   2.1 Add a new field, the _is_allocation_MemBar_redundant flag, indicating whether the memory barrier after the allocation node is redundant.
   2.2 Add the method compute_MemBar_redundancy, which sets _is_allocation_MemBar_redundant to true if the first parameter "this" does not escape in the initializer method according to BCEscapeAnalyzer.
3. Skip inserting the memory barrier in PhaseMacroExpand::expand_allocate_common when the AllocationNode's _is_allocation_MemBar_redundant flag is true.

Regards
Hui

From dmitrij.pochepko at oracle.com Thu Dec 10 15:08:11 2015 From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko) Date: Thu, 10 Dec 2015 18:08:11 +0300 Subject: RFR: 8141351 - Create tests for direct invoke instructions testing Message-ID: <5669955B.8060703@oracle.com> Hi all, please review a patch for JDK-8141351 - Create tests for direct invoke instructions testing. There were no separate jtreg tests for the invokevirtual, invokespecial, invokestatic, invokeinterface and invokedynamic instructions before, so tests checking them with combinations of compiled, interpreted and native code were created. There are 8 common classes and a native part in test/compiler/calls/common which contain all the generic logic. The other files are just test descriptions. Every test basically consists of a caller method and a callee method, which is called using the tested instruction.
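To make the five instructions concrete, here is a small Java caller with one callee per instruction. The bytecode that javac emits at each call site is noted in the comments and can be confirmed with javap -c; this is an illustration only, with invented names, and is not code from the 8141351 tests.

```java
import java.util.function.IntUnaryOperator;

public class InvokeKindsSketch {
    interface Callee {
        int call(int x);
    }

    static final class Impl implements Callee {
        @Override
        public int call(int x) {
            return x + 1;
        }
    }

    static class Base {
        int base(int x) {
            return x + 10;
        }
    }

    static final class Derived extends Base {
        @Override
        int base(int x) {
            return x + 100;
        }

        // super.base(...) compiles to invokespecial: no virtual dispatch.
        int callSuper(int x) {
            return super.base(x);
        }
    }

    static int staticCallee(int x) {
        return x * 2;
    }

    // Caller exercising each instruction at least once.
    static int caller(int x) {
        int r = staticCallee(x);              // invokestatic
        Derived d = new Derived();            // invokespecial (<init>)
        r += d.base(x);                       // invokevirtual
        r += d.callSuper(x);                  // invokevirtual here, invokespecial inside
        Callee c = new Impl();                // invokespecial (<init>)
        r += c.call(x);                       // invokeinterface
        IntUnaryOperator op = y -> y - 1;     // invokedynamic (lambda bootstrap)
        r += op.applyAsInt(x);                // invokeinterface
        return r;
    }

    public static void main(String[] args) {
        System.out.println(caller(5)); // 140 = 10 + 105 + 15 + 6 + 4
    }
}
```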
The test descriptions run different combinations of caller and callee being compiled/interpreted/native. I've tested these tests on several platforms (linux/macos/solaris). CR: https://bugs.openjdk.java.net/browse/JDK-8141351 Webrev: http://cr.openjdk.java.net/~dpochepk/8141351/webrev.01/ Thanks, Dmitrij

From martin.doerr at sap.com Thu Dec 10 15:06:58 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 10 Dec 2015 15:06:58 +0000 Subject: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code In-Reply-To: References: <7C9B87B351A4BA4AA9EC95BB4181165672286C26@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB4181165672286EB4@DEWDFEMB19C.global.corp.sap> Message-ID: <7C9B87B351A4BA4AA9EC95BB41811656722874B8@DEWDFEMB19C.global.corp.sap> Hi, this new webrev applies to hs-rt: http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.01/ It only touches PPC64 files. I have made the changes requested by Thomas. I only had to remove a minor interpreter variable name change. The remainder fits hs-rt. Please have a look. Best regards, Martin From: Thomas Stüfe [mailto:thomas.stuefe at gmail.com] Sent: Wednesday, 9 December 2015 08:28 To: Doerr, Martin Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code Hi Martin, You could split the os kernel detection from the RTM change and submit the former to hs-rt now. Kind regards, Thomas On Tue, Dec 8, 2015 at 3:08 PM, Doerr, Martin > wrote: Hi Thomas, thanks for the hint. There are changes in hs-comp and hs-rt which would cause trouble with my change at the moment. I'll wait until they get merged and create a new webrev which hopefully applies to both repositories. Best regards, Martin From: Thomas Stüfe [mailto:thomas.stuefe at gmail.com] Sent: Tuesday, 8 December 2015 09:22 To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code Hi Martin, thanks for this addition :) It may make a lot of sense to rebase this change to hs-rt, because os_aix.cpp is quite different there after http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/ce87b1141c12. Otherwise we may have problems later applying your change on top of my change. ------------- About the AIX kernel version recognition: I know we talked about this, but I have second thoughts now. I guess I did not really think it through before, sorry. So, now I have a change request: Instead of introducing os::Aix::os_kernel_version (version,release,techlevel,sp) beside the already existing os::Aix::os_version (version,release) I would prefer just one parameter, os_version, and enriching this by techlevel and sp. So, exactly what you did for os_kernel_version. Basically, as a prototype:

    // -1 = uninitialized, otherwise 32 bit number:
    // 0xVVRRTTSS
    // VV - major version
    // RR - minor version
    // TT - tech level, if known, 0 otherwise
    // SS - service pack, if known, 0 otherwise
    static uint32_t os_version ();

Then please change the few users of os::Aix::os_version() to now expect a 32bit unsigned number. As far as I see there are only 3 callsites. ------------------- Other small nitpicks:
- in libodm_aix.cpp, please use trcVerbose() instead of if (Verbose) tty->.. . Please include misc_aix.hpp for trcVerbose(). We will change all those tracecalls to Unified logging in the near future and this would help me find all trace occurrences.
- please move ~dynamicOdm() and odmWrapper::clean_wrapper() from libodm_aix.hpp to libodm_aix.cpp and accordingly remove the includes dlfcn.h and stdlib.h from libodm_aix.hpp.
- I probably would change "static unsigned int determine_os_kernel_version(int major_aix_version, int minor_aix_version);" to "static bool fill_in_os_kernel_version(unsigned int* p_os_version);", but that is just a matter of taste.

Kind Regards, Thomas

On Mon, Dec 7, 2015 at 6:10 PM, Doerr, Martin wrote:

Hi,

I have created a webrev for further PPC64 updates:
AIX supports Transactional Memory with a certain kernel patch level. Add a detection for it and make UseRTMLocking usable on AIX.
In addition, implement Atomic::cmpxchg for jbyte.

The webrev is here:
http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.00/

Please review.

Best regards,
Martin

From thomas.stuefe at gmail.com Thu Dec 10 15:37:09 2015 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 10 Dec 2015 16:37:09 +0100 Subject: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code In-Reply-To: <7C9B87B351A4BA4AA9EC95BB41811656722874B8@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672286C26@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB4181165672286EB4@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB41811656722874B8@DEWDFEMB19C.global.corp.sap> Message-ID:

Hi Martin,

thanks for working on my requests!

Here are some more nits:

http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.01/src/cpu/ppc/vm/vm_version_ppc.cpp.udiff.html

+ if (os::Aix::os_version() >= 0x0701031e) { // at least AIX 7.1.3.30

Do we support RTM on PASE? If not, do we need to exclude it?

http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.01/src/os/aix/vm/os_aix.cpp.udiff.html

+ _os_version = (major << 24) | (minor << 16);
+ char ver_str[20] = {0};
+ char *name_str = "unknown OS";

Please make name_str a const char*.

- if (_os_version < 0x0503) {
+ // Determine detailed AIX version: Version, Release, Modification, Fix Level.
+ odmWrapper::determine_os_kernel_version(&_os_version);
+ if (os_version_short() < 0x0503) {

Let's do the kernel version number query only if needed, so inside the "if (os_version_short() < 0x0503) {"

http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.01/src/os/aix/vm/os_aix.hpp.udiff.html

- // -1 = uninitialized, otherwise 16 bit number:
+ // 0 = uninitialized, otherwise 16 bit number:
  // lower 8 bit - minor version
  // higher 8 bit - major version
  // For AIX, e.g. 0x0601 for AIX 6.1
  // for OS/400 e.g. 0x0504 for OS/400 V5R4
- static int _os_version;
-
- // 4 Byte kernel version: Version, Release, Tech Level, Service Pack.
- static unsigned int _os_kernel_version;
+ static uint32_t _os_version;

The comment needs to be adapted.

---------

I did not look closely at the RTM-related changes, just at the AIX side, so this is not a full review.

Kind Regards, Thomas

From martin.doerr at sap.com Thu Dec 10 16:30:29 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 10 Dec 2015 16:30:29 +0000 Subject: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code In-Reply-To: References: <7C9B87B351A4BA4AA9EC95BB4181165672286C26@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB4181165672286EB4@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB41811656722874B8@DEWDFEMB19C.global.corp.sap> Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672287514@DEWDFEMB19C.global.corp.sap>

Hi Thomas,

thanks for the review.

Current as400 implementations configure the processor in a way that Transactional Memory can't be used (which is already detected).
But, it is safer to check if we're really on AIX before we activate Transactional Memory.
I made the changes in this new webrev:
http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.02/

Best regards,
Martin
From thomas.stuefe at gmail.com Thu Dec 10 16:52:12 2015 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 10 Dec 2015 17:52:12 +0100 Subject: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672287514@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672286C26@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB4181165672286EB4@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB41811656722874B8@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB4181165672287514@DEWDFEMB19C.global.corp.sap> Message-ID:

Looks fine.

Thanks, Thomas
From vladimir.x.ivanov at oracle.com Thu Dec 10 19:43:38 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 10 Dec 2015 22:43:38 +0300 Subject: [9] RFR (XS): 8145137: Incorrent call signature can be used in nmethod::preserve_callee_argument_oops Message-ID: <5669D5EA.8030608@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8145137/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8145137

It's a followup on JDK-8072008 [1]. I missed one place where the VM consults the bytecode, which is inaccurate when there's an attached method in generated code.

Consider the MH.linkTo*(..., MemberName) case. When C2 inlines through the linker, but doesn't inline the callee, it issues a virtual/direct call and attaches a method to the call site. There's no MemberName oop pushed on the stack; it is present only in scopes for deoptimization purposes. But GC uses the method signature from the bytecode, so it tries to extract a MemberName from the stack and usually gets some garbage. The crash happens when GC tries to make sense out of it.

The fix is to use the attached method's signature when it is present.

Also, I adjusted the unit test to deoptimize everything. WB::deoptimize() doesn't work as expected since it triggers deoptimization only in f1/f2/... nmethods (marked w/ @DontInline) and not in linkTo* where the call sites are.
Testing: JPRT, failing test case with focused changes to trigger the problematic case (GC during call site resolution - SharedRuntime::resolve_*_C)

Best regards,
Vladimir Ivanov

[1] https://bugs.openjdk.java.net/browse/JDK-8072008
"Emit direct call instead of linkTo* for recursive indy/MH.invoke* calls"

From christian.thalinger at oracle.com Thu Dec 10 21:03:26 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 10 Dec 2015 11:03:26 -1000 Subject: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: <5668B99B.5030103@oracle.com> References: <5668B99B.5030103@oracle.com> Message-ID:

> On Dec 9, 2015, at 1:30 PM, Dean Long wrote:
>
> The new System.identityHashCode optimization can't be turned off on the command-line,
> because InlineObjectHash only applies to Object.hashCode. Does it matter?

All these command line flags are a pain since they are all product flags. Doesn't Compiler Control provide a way to disable an intrinsic?

Actually, InlineObjectHash is a develop flag so you can't turn it off in a release build:

src/share/vm/runtime/globals.hpp
806: develop(bool, InlineObjectHash, true, \

>
> dl
>
> On 12/9/2015 1:12 AM, Rahul Raghavan wrote:
>> Hello,
>>
>> Please review the following patch for JDK-6378256.
>>
>> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ .
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 .
>> Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times slower).
>> Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options).
>>
>> sample unit test:
>> public class Jdk6378256Test
>> {
>>     public static void main(String[] args)
>>     {
>>         Object obj = new Object();
>>         long time = System.nanoTime();
>>         for(int i = 0 ; i < 1000000 ; i++)
>>             System.identityHashCode(obj); //compare to obj.hashCode();
>>         System.out.println ("Result = " + (System.nanoTime() - time));
>>     }
>> }
>>
>> Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also.
>> (looks in the header for the hashCode before calling into the VM).
>> Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver.
>> So also added required additional null check for System.identityHashCode case.
>>
>> Testing:
>> - successful JPRT run (-testset hotspot).
>> - JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System).
>>   (with -client / -XX:TieredStopAtLevel=1 etc. options).
>> - Added 'noreg-perf' label for this performance bug.
>>   Manual testing done and confirmed expected performance values for unit tests with fix.
>>
>> Thanks,
>> Rahul
>

From vladimir.x.ivanov at oracle.com Thu Dec 10 23:38:37 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 11 Dec 2015 02:38:37 +0300 Subject: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: References: <5668B99B.5030103@oracle.com> Message-ID: <566A0CFD.2050000@oracle.com>

Also, doesn't -XX:DisableIntrinsic=_identityHashCode,_hashCode already solve the problem?

  diagnostic(ccstrlist, DisableIntrinsic, "", \
          "do not expand intrinsics whose (internal) names appear here") \

Best regards,
Vladimir Ivanov
From jan.civlin at intel.com Tue Dec 8 01:50:04 2015 From: jan.civlin at intel.com (Civlin, Jan) Date: Tue, 8 Dec 2015 01:50:04 +0000 Subject: RFR:8144771: AVX3 patch for MacroAssembler::string_compare In-Reply-To: <56659A17.6010300@oracle.com> References: <39F83597C33E5F408096702907E6C4500F108D6C@ORSMSX104.amr.corp.intel.com> <56659A17.6010300@oracle.com> Message-ID: <39F83597C33E5F408096702907E6C4500F108F7F@ORSMSX104.amr.corp.intel.com>

Tobias,

Thank you for spotting this. These comments were from the design and reflected the original order str1/str2. I'm removing them since the function calls say enough. The order should remain str2/str1 since the "result" is modified in the "str1" line.

Vladimir, could you please upload the updated patch (I still do not have access).

Yes, the test has been run:

[jcivlin at SKY71 test]$ date; echo $JAVA_HOME; ls -l $JAVA_HOME/lib/amd64/server/libjvm.so; time /home/jcivlin/Tools/jtreg/bin/jtreg compiler/intrinsics/string/TestStringIntrinsics.java
Mon Dec 7 11:00:07 PST 2015
/home/jcivlin/Java/mberg-100915-11K/build/linux-x86_64-normal-server-release/jdk
-rwxrwxr-x 1 jcivlin jcivlin 17999532 Dec 2 22:13 /home/jcivlin/Java/mberg-100915-11K/build/linux-x86_64-normal-server-release/jdk/lib/amd64/server/libjvm.so
Test results: passed: 1
Report written to /home/jcivlin/Java/mberg-100915-11K/hotspot/test/JTreport/html/report.html
Results written to /home/jcivlin/Java/mberg-100915-11K/hotspot/test/JTwork

Thank you,
Jan

-----Original Message-----
From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] Sent: Monday, December 07, 2015 6:39 AM To: Civlin, Jan; hotspot compiler Cc: Vladimir Kozlov Subject: Re: RFR:8144771: AVX3 patch for MacroAssembler::string_compare

Hi Jan,

the intrinsic looks good to me (not a reviewer). Here are two minor suggestions:

- The following comments are wrong:

8355 } else { //ae == StrIntrinsicNode::UL
8356   load_unsigned_short(cnt1, Address(str2, result, scale2)); // L string
8357   load_unsigned_byte(result, Address(str1, result, scale1)); // U string

The first line then loads a UTF16 (two-byte) String and the second line loads a Latin1 (one-byte) String. Maybe you should also exchange the lines to first load str1 and then load str2. I would omit the comment after "else" because ae could either be UL or LU (both have the Latin1 string in str1).

- Missing whitespace after comma:

8143 cmpl(cnt2,stride2x2);

I assume you executed the hotspot JTREG tests (including /compiler/intrinsics/string/TestStringIntrinsics.java).

Best,
Tobias

On 05.12.2015 05:07, Civlin, Jan wrote:
> We would like to contribute AVX3 patch for MacroAssembler::string_compare.
>
> This utilizes 512 bits registers on AVX3 architecture and delivers performance gain (speed-up) on long strings at about x 1.33 and on random string about x 1.22. This was measured vs AVX2 (256 bits registers).
>
>
> Contributors:
> MacroAssembler::string_compare - Jan Civlin.
> Rest of code, including all x86 AVX3 extensions - Michael Berg
>
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8144771
> Webrev: http://cr.openjdk.java.net/~kvn/8144771/webrev/
Name: oarle-clean.1.tar.bz2 Type: application/octet-stream Size: 1009516 bytes Desc: oarle-clean.1.tar.bz2 URL: From dean.long at oracle.com Fri Dec 11 03:36:10 2015 From: dean.long at oracle.com (Dean Long) Date: Thu, 10 Dec 2015 19:36:10 -0800 Subject: RFR(XS): 8144852: Corrupted oop in nmethod Message-ID: <566A44AA.1040101@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8144852 http://cr.openjdk.java.net/~dlong//8144852/webrev/ The fix for [1] introduced new functions nmethod::print_recorded_oops and nmethod::print_recorded_metadata that print all oop and metadata values in an nmethod. Currently NULL values are handled OK, but Universe::non_oop_word values cause a crash. (This bug is marked confidential because it was reported against one of our closed ports.) dl [1] JDK-8072008: Emit direct call instead of linkTo* for recursive indy/MH.invoke* calls From christian.thalinger at oracle.com Fri Dec 11 05:10:14 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 10 Dec 2015 19:10:14 -1000 Subject: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: <566A0CFD.2050000@oracle.com> References: <5668B99B.5030103@oracle.com> <566A0CFD.2050000@oracle.com> Message-ID: <4B2A25E2-C1B6-43A9-A4A6-10C6970DF651@oracle.com> > On Dec 10, 2015, at 1:38 PM, Vladimir Ivanov wrote: > > Also, doesn't -XX:DisableIntrinsic=_identityHashCode,_hashCode already solve the problem? > > diagnostic(ccstrlist, DisableIntrinsic, "", \ > "do not expand intrinsics whose (internal) names appear here") \ Even better! Let?s not add more flags. > > > Best regards, > Vladimir Ivanov > > On 12/11/15 12:03 AM, Christian Thalinger wrote: >> >>> On Dec 9, 2015, at 1:30 PM, Dean Long wrote: >>> >>> The new System.identityHashCode optimization can't be turned off on the command-line, >>> because InlineObjectHash only applies to Object.hashCode. Does it matter? 
>> >> All these command line flags are a pain since they are all product flags. Doesn't Compiler Control provide a way to disable an intrinsic? >> >> Actually, InlineObjectHash is a develop flag so you can't turn it off in a release build: >> >> src/share/vm/runtime/globals.hpp >> 806: develop(bool, InlineObjectHash, true, \ >> >>> >>> dl >>> >>> On 12/9/2015 1:12 AM, Rahul Raghavan wrote: >>>> Hello, >>>> >>>> Please review the following patch for JDK-6378256. >>>> >>>> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 . >>>> Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times slower). >>>> Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options). >>>> >>>> sample unit test: >>>> public class Jdk6378256Test >>>> { >>>> public static void main(String[] args) >>>> { >>>> Object obj = new Object(); >>>> long time = System.nanoTime(); >>>> for(int i = 0 ; i < 1000000 ; i++) >>>> System.identityHashCode(obj); //compare to obj.hashCode(); >>>> System.out.println ("Result = " + (System.nanoTime() - time)); >>>> } >>>> } >>>> >>>> Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also. >>>> (looks in the header for the hashCode before calling into the VM). >>>> Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver. >>>> So also added required additional null check for System.identityHashCode case. >>>> >>>> Testing: >>>> - successful JPRT run (-testset hotspot). >>>> - JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System). >>>> (with -client / -XX:TieredStopAtLevel=1 etc. options). >>>> - Added 'noreg-perf' label for this performance bug. >>>> Manual testing done and confirmed expected performance values for unit tests with fix.
>>>> >>>> Thanks, >>>> Rahul >>> >>
From roland.westrelin at oracle.com Fri Dec 11 08:23:26 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 11 Dec 2015 09:23:26 +0100 Subject: [9] RFR (XS): 8145137: Incorrent call signature can be used in nmethod::preserve_callee_argument_oops In-Reply-To: <5669D5EA.8030608@oracle.com> References: <5669D5EA.8030608@oracle.com> Message-ID: <793877C9-1E29-456A-95E2-84D05F9F9E9A@oracle.com>
> http://cr.openjdk.java.net/~vlivanov/8145137/webrev.00/ That looks good to me. Roland.
From goetz.lindenmaier at sap.com Fri Dec 11 09:36:10 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 11 Dec 2015 09:36:10 +0000 Subject: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code In-Reply-To: <7C9B87B351A4BA4AA9EC95BB41811656722874B8@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672286C26@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB4181165672286EB4@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB41811656722874B8@DEWDFEMB19C.global.corp.sap> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDDE3B@DEWDFEMB12A.global.corp.sap>
Hi Martin, thanks for doing this, looks good. I'll sponsor it. Best regards, Goetz. From: Doerr, Martin Sent: Donnerstag, 10. Dezember 2015 16:07 To: Thomas Stüfe ; Lindenmaier, Goetz Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code Hi, this new webrev applies to hs-rt: http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.01/ It only touches PPC64 files. I have made the changes requested by Thomas. I only had to remove a minor interpreter variable name change. The remainder fits to hs-rt. Please have a look. Best regards, Martin From: Thomas Stüfe [mailto:thomas.stuefe at gmail.com] Sent: Mittwoch, 9.
Dezember 2015 08:28 To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code Hi Martin, You could split the os kernel detection from the RTM change and submit the former to hs-rt now. Kind regards, Thomas On Tue, Dec 8, 2015 at 3:08 PM, Doerr, Martin > wrote: Hi Thomas, thanks for the hint. There are changes in hs-comp and hs-rt which would cause trouble with my change at the moment. I'll wait until they get merged and create a new webrev which hopefully applies to both repositories. Best regards, Martin From: Thomas Stüfe [mailto:thomas.stuefe at gmail.com] Sent: Dienstag, 8. Dezember 2015 09:22 To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code Hi Martin, thanks for this addition :) It may make a lot of sense to rebase this change to hs-rt, because os_aix.cpp is quite different there after http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/ce87b1141c12. Otherwise we may have problems later applying your change atop of my change. ------------- About the AIX kernel version recognition: I know we talked about this, but I have second thoughts now. I guess I did not think it really through before, sorry. So, now I have a change request: Instead of introducing os::Aix::os_kernel_version (version,release,techlevel,sp) beside the already existing os::Aix::os_version (version,release) I would prefer just one parameter, os_version, and enriching this by techlevel and sp. So, exactly what you did for os_kernel_version. Basically, as a prototype:
// -1 = uninitialized, otherwise 32 bit number:
// 0xVVRRTTSS
// VV - major version
// RR - minor version
// TT - tech level, if known, 0 otherwise
// SS - service pack, if known, 0 otherwise
static uint32_t os_version ();
Then please change the few users of os::Aix::os_version() to now expect a 32bit unsigned number.
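To illustrate the 0xVVRRTTSS layout proposed above, here is a small Java model of packing and unpacking such a version number. The class and method names are hypothetical (the real code would be C++ in os_aix.cpp), and the version values are only examples. The main payoff of the layout is that a single unsigned comparison orders versions correctly.

```java
// Hypothetical Java model of the proposed 0xVVRRTTSS encoding (not HotSpot code).
final class AixVersion {
    // -1 has all bits set (0xFFFFFFFF unsigned), so it sorts above every real
    // version in an unsigned comparison: callers must check for it first.
    static final int UNINITIALIZED = -1;

    // Packs major (VV), minor (RR), tech level (TT) and service pack (SS), one byte each.
    // TT and SS are 0 when unknown, as in the prototype comment.
    static int pack(int major, int minor, int techLevel, int servicePack) {
        return (major & 0xFF) << 24 | (minor & 0xFF) << 16
             | (techLevel & 0xFF) << 8 | (servicePack & 0xFF);
    }

    static int major(int v)       { return (v >>> 24) & 0xFF; }
    static int minor(int v)       { return (v >>> 16) & 0xFF; }
    static int techLevel(int v)   { return (v >>> 8) & 0xFF; }
    static int servicePack(int v) { return v & 0xFF; }

    public static void main(String[] args) {
        int v = pack(7, 1, 3, 2);
        System.out.printf("0x%08X%n", v); // prints 0x07010302
        // "At least 7.1 TL3" becomes one unsigned comparison:
        System.out.println(Integer.compareUnsigned(v, pack(7, 1, 3, 0)) >= 0); // prints true
    }
}
```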
As far as I see there are only 3 callsites. -------------------
Other small nitpicks:
- in libodm_aix.cpp, please use trcVerbose() instead of if (Verbose) tty->.. . Please include misc_aix.hpp for trcVerbose(). We will change all those tracecalls to Unified logging in the near future and this would help me find all trace occurrences.
- please move ~dynamicOdm() and odmWrapper::clean_wrapper() from libodm_aix.hpp to libodm_aix.cpp and accordingly remove the includes dlfcn.h and stdlib.h from libodm_aix.hpp.
- I probably would change "static unsigned int determine_os_kernel_version(int major_aix_version, int minor_aix_version);" to "static bool fill_in_os_kernel_version(unsigned int* p_os_version);", but that is just a matter of taste.
Kind Regards, Thomas On Mon, Dec 7, 2015 at 6:10 PM, Doerr, Martin > wrote: Hi, I have created a webrev for further PPC64 updates: AIX supports Transactional Memory with a certain kernel patch level. Add a detection for it and make UseRTMLocking usable on AIX. In addition, implement Atomic::cmpxchg for jbyte. The webrev is here: http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.00/ Please review. Best regards, Martin
From rahul.v.raghavan at oracle.com Fri Dec 11 09:36:47 2015 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Fri, 11 Dec 2015 01:36:47 -0800 (PST) Subject: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: <56680E94.40306@oracle.com> References: <56680E94.40306@oracle.com> Message-ID: <662bf26a-e9dc-4175-8960-c9369a23ad77@default>
> -----Original Message----- > From: Tobias Hartmann > Sent: Wednesday, December 09, 2015 4:51 PM > To: Rahul Raghavan; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler > > Hi Rahul, > > looks good to me (not a reviewer). Thank you Tobias.
> > Best, > Tobias > > On 09.12.2015 10:12, Rahul Raghavan wrote: > > Hello, > > > > Please review the following patch for JDK-6378256. > > > > webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 . > > Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times > slower). > > Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options). > > > > sample unit test: > > public class Jdk6378256Test > > { > > public static void main(String[] args) > > { > > Object obj = new Object(); > > long time = System.nanoTime(); > > for(int i = 0 ; i < 1000000 ; i++) > > System.identityHashCode(obj); //compare to obj.hashCode(); > > System.out.println ("Result = " + (System.nanoTime() - time)); > > } > > } > > > > Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also. > > (looks in the header for the hashCode before calling into the VM). > > Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver. > > So also added required additional null check for System.identityHashCode case. > > > > Testing: > > - successful JPRT run (-testset hotspot). > > - JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System). > > (with -client / -XX:TieredStopAtLevel=1 etc. options). > > - Added 'noreg-perf' label for this performance bug. > > Manual testing done and confirmed expected performance values for unit tests with fix. 
> > > > Thanks, > > Rahul > >
From paul.sandoz at oracle.com Fri Dec 11 09:36:58 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 11 Dec 2015 10:36:58 +0100 Subject: Reference.reachabilityFence In-Reply-To: <430729B7-AA2B-499A-8660-C0BBFFC69E5E@oracle.com> References: <2D27BCFB-77ED-4C83-985E-102DC4B41C97@oracle.com> <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com> <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> <20151207095825.952677@eggemoggin.niobe.net> <430729B7-AA2B-499A-8660-C0BBFFC69E5E@oracle.com> Message-ID: <2AE57802-9204-4E48-81E0-98E65D43F1E0@oracle.com>
Unless any strong objections are raised I plan to push the latest patch on Monday. > > Updated: > > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/src/java.base/share/classes/java/lang/ref/Reference.java.sdiff.html > > I think there is an opportunity to add further examples, but i would like to take a swing at that later on. > > >> - I now agree with you and Doug about calling this a "fence". Can we >> just name it "fence" rather than the wordier "reachabilityFence"? >> Looking at a typical invocation, >> >> Reference.reachabilityFence(); >> >> seems a bit redundant while >> >> Reference.fence(); >> >> reads quite nicely. Is there, or will there ever be, any other kind >> of reference-related fence? >> > > I doubt there will be another kind of reference fence, but it could be used in conjunction with other memory fences (currently on VarHandles) and if static imports are used it might look rather out of place as to what fence "fence" actually refers to. That is why i prefer the longer more descriptive name. > > Paul.
From jamsheed.c.m at oracle.com Fri Dec 11 09:37:10 2015 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Fri, 11 Dec 2015 15:07:10 +0530 Subject: RFR: 8144856 (XS): Fix assert in CompiledStaticCall::set_to_interpreted Message-ID: <566A9946.6010101@oracle.com>
Hi All, Summary: Assert code in CompiledStaticCall::set_to_interpreted is fixed. Previous implementation did multiple reads for multiple valid state check in same assert for the MT safe updates that can happen from other threads. This could cause bogus failures as after each read state could have changed. As a fix to this, made the code to read the value at once. Bug: https://bugs.openjdk.java.net/browse/JDK-8144856 webrev: http://cr.openjdk.java.net/~thartmann/8144856/webrev.00/ Testing: jprt hotspot. Thanks and Best Regards, Jamsheed
From tobias.hartmann at oracle.com Fri Dec 11 10:18:21 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 11 Dec 2015 11:18:21 +0100 Subject: RFR: 8144856 (XS): Fix assert in CompiledStaticCall::set_to_interpreted In-Reply-To: <566A9946.6010101@oracle.com> References: <566A9946.6010101@oracle.com> Message-ID: <566AA2ED.7000201@oracle.com>
Hi Jamsheed, looks good to me. Best, Tobias On 11.12.2015 10:37, Jamsheed C m wrote: > Hi All, > > > Summary: Assert code in CompiledStaticCall::set_to_interpreted is fixed. Previous implementation did multiple reads for multiple valid state check in same assert for the MT safe updates that can happen from other threads. This could cause bogus failures as after each read state could have changed. As a fix to this, made the code to read the value at once. > > Bug:https://bugs.openjdk.java.net/browse/JDK-8144856 > > webrev: http://cr.openjdk.java.net/~thartmann/8144856/webrev.00/ > > Testing: jprt hotspot.
> > Thanks and Best Regards, > Jamsheed > >
From martin.doerr at sap.com Fri Dec 11 10:20:14 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 11 Dec 2015 10:20:14 +0000 Subject: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDDE3B@DEWDFEMB12A.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672286C26@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB4181165672286EB4@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB41811656722874B8@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41EDDE3B@DEWDFEMB12A.global.corp.sap> Message-ID: <7C9B87B351A4BA4AA9EC95BB41811656722886FF@DEWDFEMB19C.global.corp.sap>
Thanks for reviewing and sponsoring. Best regards, Martin From: Lindenmaier, Goetz Sent: Freitag, 11. Dezember 2015 10:36 To: Doerr, Martin ; Thomas Stüfe Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code Hi Martin, thanks for doing this, looks good. I'll sponsor it. Best regards, Goetz. From: Doerr, Martin Sent: Donnerstag, 10. Dezember 2015 16:07 To: Thomas Stüfe >; Lindenmaier, Goetz > Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code Hi, this new webrev applies to hs-rt: http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.01/ It only touches PPC64 files. I have made the changes requested by Thomas. I only had to remove a minor interpreter variable name change. The remainder fits to hs-rt. Please have a look. Best regards, Martin From: Thomas Stüfe [mailto:thomas.stuefe at gmail.com] Sent: Mittwoch, 9.
Dezember 2015 08:28 To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code Hi Martin, You could split the os kernel detection from the RTM change and submit the former to hs-rt now. Kind regards, Thomas On Tue, Dec 8, 2015 at 3:08 PM, Doerr, Martin > wrote: Hi Thomas, thanks for the hint. There are changes in hs-comp and hs-rt which would cause trouble with my change at the moment. I'll wait until they get merged and create a new webrev which hopefully applies to both repositories. Best regards, Martin From: Thomas Stüfe [mailto:thomas.stuefe at gmail.com] Sent: Dienstag, 8. Dezember 2015 09:22 To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8144847: PPC64: Update Transactional Memory and Atomic::cmpxchg code Hi Martin, thanks for this addition :) It may make a lot of sense to rebase this change to hs-rt, because os_aix.cpp is quite different there after http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/ce87b1141c12. Otherwise we may have problems later applying your change atop of my change. ------------- About the AIX kernel version recognition: I know we talked about this, but I have second thoughts now. I guess I did not think it really through before, sorry. So, now I have a change request: Instead of introducing os::Aix::os_kernel_version (version,release,techlevel,sp) beside the already existing os::Aix::os_version (version,release) I would prefer just one parameter, os_version, and enriching this by techlevel and sp. So, exactly what you did for os_kernel_version. Basically, as a prototype:
// -1 = uninitialized, otherwise 32 bit number:
// 0xVVRRTTSS
// VV - major version
// RR - minor version
// TT - tech level, if known, 0 otherwise
// SS - service pack, if known, 0 otherwise
static uint32_t os_version ();
Then please change the few users of os::Aix::os_version() to now expect a 32bit unsigned number.
As far as I see there are only 3 callsites. -------------------
Other small nitpicks:
- in libodm_aix.cpp, please use trcVerbose() instead of if (Verbose) tty->.. . Please include misc_aix.hpp for trcVerbose(). We will change all those tracecalls to Unified logging in the near future and this would help me find all trace occurrences.
- please move ~dynamicOdm() and odmWrapper::clean_wrapper() from libodm_aix.hpp to libodm_aix.cpp and accordingly remove the includes dlfcn.h and stdlib.h from libodm_aix.hpp.
- I probably would change "static unsigned int determine_os_kernel_version(int major_aix_version, int minor_aix_version);" to "static bool fill_in_os_kernel_version(unsigned int* p_os_version);", but that is just a matter of taste.
Kind Regards, Thomas On Mon, Dec 7, 2015 at 6:10 PM, Doerr, Martin > wrote: Hi, I have created a webrev for further PPC64 updates: AIX supports Transactional Memory with a certain kernel patch level. Add a detection for it and make UseRTMLocking usable on AIX. In addition, implement Atomic::cmpxchg for jbyte. The webrev is here: http://cr.openjdk.java.net/~mdoerr/8144847_ppc_updates/webrev.00/ Please review. Best regards, Martin
From vladimir.x.ivanov at oracle.com Fri Dec 11 11:37:03 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 11 Dec 2015 14:37:03 +0300 Subject: [9] RFR (XS): 8145137: Incorrent call signature can be used in nmethod::preserve_callee_argument_oops In-Reply-To: <793877C9-1E29-456A-95E2-84D05F9F9E9A@oracle.com> References: <5669D5EA.8030608@oracle.com> <793877C9-1E29-456A-95E2-84D05F9F9E9A@oracle.com> Message-ID: <566AB55F.506@oracle.com>
Thanks, Roland. Best regards, Vladimir Ivanov On 12/11/15 11:23 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~vlivanov/8145137/webrev.00/ > > That looks good to me. > > Roland.
>
From vladimir.x.ivanov at oracle.com Fri Dec 11 11:49:32 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 11 Dec 2015 14:49:32 +0300 Subject: RFR(XS): 8144852: Corrupted oop in nmethod In-Reply-To: <566A44AA.1040101@oracle.com> References: <566A44AA.1040101@oracle.com> Message-ID: <566AB84C.1000603@oracle.com>
Dean, thanks for taking care of it. Can oopDesc::print_value_on and print_value_on_maybe_null be enhanced instead to handle non_oop_word case (in addition to NULL case)? Also, the following is slightly misleading since metadata pointers aren't oops:
void nmethod::print_recorded_metadata() {
+ if (m == (Metadata*)Universe::non_oop_word()) {
+ tty->print("non-oop word");
Best regards, Vladimir Ivanov On 12/11/15 6:36 AM, Dean Long wrote: > https://bugs.openjdk.java.net/browse/JDK-8144852 > http://cr.openjdk.java.net/~dlong//8144852/webrev/ > > The fix for [1] introduced new functions nmethod::print_recorded_oops > and nmethod::print_recorded_metadata that print all oop and metadata > values in an nmethod. Currently NULL values are handled OK, but > Universe::non_oop_word values cause a crash. > > (This bug is marked confidential because it was reported against one of > our closed ports.)
> > dl > > [1] JDK-8072008: Emit direct call instead of linkTo* for recursive > indy/MH.invoke* calls
From vitalyd at gmail.com Fri Dec 11 15:52:58 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 11 Dec 2015 10:52:58 -0500 Subject: Reference.reachabilityFence In-Reply-To: <2AE57802-9204-4E48-81E0-98E65D43F1E0@oracle.com> References: <2D27BCFB-77ED-4C83-985E-102DC4B41C97@oracle.com> <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com> <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> <20151207095825.952677@eggemoggin.niobe.net> <430729B7-AA2B-499A-8660-C0BBFFC69E5E@oracle.com> <2AE57802-9204-4E48-81E0-98E65D43F1E0@oracle.com> Message-ID:
Hi Paul, No objections, but just wanted to summarize a couple of possible key performance issues that were raised on the concurrency-interest thread. You may have picked them up already, so pardon the repetition:
1) current impl/prototype is purposely barred from inlining - this will be a compiler optimization fence, particularly bad in loops
2) the expected "try { ... use(r) ... } finally { reachabilityFence(r); }" idiom will significantly increase bytecode size, possibly impacting inlining.
I'm sure you guys will address this in the end, but just wanted to reiterate those just in case :). Thanks sent from my phone On Dec 11, 2015 4:37 AM, "Paul Sandoz" wrote: > Unless any strong objections are raised I plan to push the latest patch on Monday. > > > > > Updated: > > > > > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/src/java.base/share/classes/java/lang/ref/Reference.java.sdiff.html > < > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/src/java.base/share/classes/java/lang/ref/Reference.java.sdiff.html > > > > > > I think there is an opportunity to add further examples, but i would > like to take a swing at that later on. > > > > > >> - I now agree with you and Doug about calling this a "fence". Can we
Can we > >> just name it "fence" rather than the wordier "reachabilityFence"? > >> Looking at a typical invocation, > >> > >> Reference.reachabilityFence(); > >> > >> seems a bit redundant while > >> > >> Reference.fence(); > >> > >> reads quite nicely. Is there, or will there ever be, any other kind > >> of reference-related fence? > >> > > > > I doubt there will be another kind of reference fence, but it could be > used in conjunction with other memory fences (currently on VarHandles) and > if static imports are used it might look rather out of place as to what > fence ?fence? actually refers to. That is why i prefer the longer more > descriptive name. > > > > Paul. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Fri Dec 11 15:59:42 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 11 Dec 2015 10:59:42 -0500 Subject: Reference.reachabilityFence In-Reply-To: References: <2D27BCFB-77ED-4C83-985E-102DC4B41C97@oracle.com> <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com> <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> <20151207095825.952677@eggemoggin.niobe.net> <430729B7-AA2B-499A-8660-C0BBFFC69E5E@oracle.com> <2AE57802-9204-4E48-81E0-98E65D43F1E0@oracle.com> Message-ID: Sorry, one more point I forgot to mention: 3) what impact will this have, if any, on register allocation when a ref's lifetime is artificially extended without any "real" use. The thinking here is compiler should spill it and never reload, but it was unclear if it will do the right thing in its current form. sent from my phone On Dec 11, 2015 10:52 AM, "Vitaly Davidovich" wrote: > Hi Paul, > > No objections, but just wanted to summarize a couple of possible key > performance issues that were raised on the concurrency-interest thread. 
> You may have picked them up already, so pardon the repetition: > > 1) current impl/prototype is purposely barred from inlining - this will be > a compiler optimization fence, particularly bad in loops > > 2) the expected "try { ... use(r) ... } finally { reachabilityFence(r); > }" idiom will significantly increase bytecode size, possibly impacting > inlining. > > I'm sure you guys will address this in the end, but just wanted to > reiterate those just in case :). > > Thanks > > sent from my phone > On Dec 11, 2015 4:37 AM, "Paul Sandoz" wrote: > >> Unless any strong objections are raised I plan to push the latest patch on >> Monday. >> >> > >> > Updated: >> > >> > >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/src/java.base/share/classes/java/lang/ref/Reference.java.sdiff.html >> < >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/src/java.base/share/classes/java/lang/ref/Reference.java.sdiff.html >> > >> > >> > I think there is an opportunity to add further examples, but i would >> like to take a swing at that later on. >> > >> > >> >> - I now agree with you and Doug about calling this a "fence". Can we >> >> just name it "fence" rather than the wordier "reachabilityFence"? >> >> Looking at a typical invocation, >> >> >> >> Reference.reachabilityFence(); >> >> >> >> seems a bit redundant while >> >> >> >> Reference.fence(); >> >> >> >> reads quite nicely. Is there, or will there ever be, any other kind >> >> of reference-related fence? >> >> >> > >> > I doubt there will be another kind of reference fence, but it could be >> used in conjunction with other memory fences (currently on VarHandles) and >> if static imports are used it might look rather out of place as to what >> fence "fence" actually refers to. That is why i prefer the longer more >> descriptive name. >> > >> > Paul. >>
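The try/finally idiom from point 2 can be sketched as follows. The sketch assumes the Java 9 API discussed in this thread, java.lang.ref.Reference.reachabilityFence(Object), together with java.lang.ref.Cleaner. FakeNativeHeap and Resource are hypothetical stand-ins for a class that owns native state. The sketch shows only the placement of the fence; it does not measure the inlining or register-allocation costs raised in this thread.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for native memory: handle -> value.
final class FakeNativeHeap {
    static final Map<Long, Integer> SLOTS = new ConcurrentHashMap<>();
}

// Hypothetical resource whose "native" slot is freed by a Cleaner once it becomes unreachable.
final class Resource {
    private static final Cleaner CLEANER = Cleaner.create(); // backed by a daemon thread
    private final long handle;

    Resource(long handle, int value) {
        this.handle = handle;
        FakeNativeHeap.SLOTS.put(handle, value);
        long h = handle; // the cleaning action must not capture 'this'
        CLEANER.register(this, () -> FakeNativeHeap.SLOTS.remove(h));
    }

    int read() {
        try {
            // Once 'handle' has been loaded, 'this' could otherwise become unreachable
            // and the cleaner could free the slot before the lookup completes.
            return FakeNativeHeap.SLOTS.get(handle);
        } finally {
            Reference.reachabilityFence(this); // keeps 'this' strongly reachable up to here
        }
    }

    public static void main(String[] args) {
        Resource r = new Resource(1L, 42);
        System.out.println(r.read()); // prints 42
    }
}
```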
From dean.long at oracle.com Fri Dec 11 19:18:03 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 11 Dec 2015 11:18:03 -0800 Subject: RFR(XS): 8144852: Corrupted oop in nmethod In-Reply-To: <566AB84C.1000603@oracle.com> References: <566A44AA.1040101@oracle.com> <566AB84C.1000603@oracle.com> Message-ID: <566B216B.1020204@oracle.com>
[adding hotspot-runtime-dev] On 12/11/2015 3:49 AM, Vladimir Ivanov wrote: > Dean, thanks for taking care of it. > > Can oopDesc::print_value_on and print_value_on_maybe_null be enhanced > instead to handle non_oop_word case (in addition to NULL case)? > I thought of that, but didn't want to add print_value_on_maybe_null_or_non_oop :-) If you feel strongly about that, then I should probably get input from runtime too, since I think they own that code. > Also, the following is slightly misleading since metadata pointers > aren't oops: > void nmethod::print_recorded_metadata() { > + if (m == (Metadata*)Universe::non_oop_word()) { > + tty->print("non-oop word"); > Would "non-metadata word" be better? dl > Best regards, > Vladimir Ivanov > > On 12/11/15 6:36 AM, Dean Long wrote: >> https://bugs.openjdk.java.net/browse/JDK-8144852 >> http://cr.openjdk.java.net/~dlong//8144852/webrev/ >> >> The fix for [1] introduced new functions nmethod::print_recorded_oops >> and nmethod::print_recorded_metadata that print all oop and metadata >> values in an nmethod. Currently NULL values are handled OK, but >> Universe::non_oop_word values cause a crash. >> >> (This bug is marked confidential because it was reported against one of >> our closed ports.)
>> >> dl >> >> [1] JDK-8072008: Emit direct call instead of linkTo* for recursive >> indy/MH.invoke* calls From dean.long at oracle.com Fri Dec 11 19:38:31 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 11 Dec 2015 11:38:31 -0800 Subject: RFR: 8144856 (XS): Fix assert in CompiledStaticCall::set_to_interpreted In-Reply-To: <566A9946.6010101@oracle.com> References: <566A9946.6010101@oracle.com> Message-ID: <566B2637.1070405@oracle.com> I'm wondering if this fix is enough, because 8067247 reported the same problem, and it is still happening, even after the fix on x86 to read the data value only once. dl On 12/11/2015 1:37 AM, Jamsheed C m wrote: > Hi All, > > > Summary: Assert code in CompiledStaticCall::set_to_interpreted is > fixed. Previous implementation did multiple reads for multiple valid > state check in same assert for the MT safe updates that can happen > from other threads. This could cause bogus failures as after each read > state could have changed. As a fix to this, made the code to read the > value at once. > > Bug:https://bugs.openjdk.java.net/browse/JDK-8144856 > > webrev: http://cr.openjdk.java.net/~thartmann/8144856/webrev.00/ > > Testing: jprt hotspot. > > Thanks and Best Regards, > Jamsheed > > From peter.levart at gmail.com Sat Dec 12 11:27:22 2015 From: peter.levart at gmail.com (Peter Levart) Date: Sat, 12 Dec 2015 12:27:22 +0100 Subject: Reference.reachabilityFence In-Reply-To: <2AE57802-9204-4E48-81E0-98E65D43F1E0@oracle.com> References: <2D27BCFB-77ED-4C83-985E-102DC4B41C97@oracle.com> <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com> <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> <20151207095825.952677@eggemoggin.niobe.net> <430729B7-AA2B-499A-8660-C0BBFFC69E5E@oracle.com> <2AE57802-9204-4E48-81E0-98E65D43F1E0@oracle.com> Message-ID: <566C049A.90406@gmail.com> Hi Paul, Your latest code does not build with jdk9/dev as it uses @jdk.internal.vm.annotation.DontInline, but in jdk9/dev the @DontInline is still in java.lang.invoke. 
Is there a plan to push the move of DontInline annotation before this change as I haven't seen any RFR for the move yet? Regards, Peter On 12/11/2015 10:36 AM, Paul Sandoz wrote: > Unless any strong objections are raised I plan to push the latest patch on Monday. > >> Updated: >> >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/src/java.base/share/classes/java/lang/ref/Reference.java.sdiff.html >> >> I think there is an opportunity to add further examples, but i would like to take a swing at that later on. >> >> >>> - I now agree with you and Doug about calling this a "fence". Can we >>> just name it "fence" rather than the wordier "reachabilityFence"? >>> Looking at a typical invocation, >>> >>> Reference.reachabilityFence(); >>> >>> seems a bit redundant while >>> >>> Reference.fence(); >>> >>> reads quite nicely. Is there, or will there ever be, any other kind >>> of reference-related fence? >>> >> I doubt there will be another kind of reference fence, but it could be used in conjunction with other memory fences (currently on VarHandles) and if static imports are used it might look rather out of place as to what fence "fence" actually refers to. That is why i prefer the longer more descriptive name. >> >> Paul.
From chris.hegarty at oracle.com Sat Dec 12 11:42:35 2015 From: chris.hegarty at oracle.com (Chris Hegarty) Date: Sat, 12 Dec 2015 11:42:35 +0000 Subject: Reference.reachabilityFence In-Reply-To: <566C049A.90406@gmail.com> References: <2D27BCFB-77ED-4C83-985E-102DC4B41C97@oracle.com> <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com> <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> <20151207095825.952677@eggemoggin.niobe.net> <430729B7-AA2B-499A-8660-C0BBFFC69E5E@oracle.com> <2AE57802-9204-4E48-81E0-98E65D43F1E0@oracle.com> <566C049A.90406@gmail.com> Message-ID:
> On 12 Dec 2015, at 11:27 a.m., Peter Levart wrote: > > Hi Paul, > > Your latest code does not build with jdk9/dev as it uses @jdk.internal.vm.annotation.DontInline, but in jdk9/dev the @DontInline is still in java.lang.invoke. > > Is there a plan to push the move of DontInline annotation before this change as I haven't seen any RFR for the move yet? https://bugs.openjdk.java.net/browse/JDK-8144223 The change is making its way through hs-comp. -Chris. > Regards, Peter > >> On 12/11/2015 10:36 AM, Paul Sandoz wrote: >> Unless any strong objections are raised I plan to push the latest patch on Monday. >> >>> Updated: >>> >>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/src/java.base/share/classes/java/lang/ref/Reference.java.sdiff.html >>> >>> I think there is an opportunity to add further examples, but i would like to take a swing at that later on. >>> >>> >>>> - I now agree with you and Doug about calling this a "fence". Can we
>>> I doubt there will be another kind of reference fence, but it could be used in conjunction with other memory fences (currently on VarHandles) and if static imports are used it might look rather out of place as to what fence "fence" actually refers to. That is why i prefer the longer more descriptive name. >>> >>> Paul. > From peter.levart at gmail.com Sat Dec 12 11:44:49 2015 From: peter.levart at gmail.com (Peter Levart) Date: Sat, 12 Dec 2015 12:44:49 +0100 Subject: Reference.reachabilityFence In-Reply-To: References: <2D27BCFB-77ED-4C83-985E-102DC4B41C97@oracle.com> <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com> <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> <20151207095825.952677@eggemoggin.niobe.net> <430729B7-AA2B-499A-8660-C0BBFFC69E5E@oracle.com> <2AE57802-9204-4E48-81E0-98E65D43F1E0@oracle.com> <566C049A.90406@gmail.com> Message-ID: <566C08B1.6090909@gmail.com> On 12/12/2015 12:42 PM, Chris Hegarty wrote: >> On 12 Dec 2015, at 11:27 a.m., Peter Levart wrote: >> >> Hi Paul, >> >> Your latest code does not build with jdk9/dev as it uses @jdk.internal.vm.annotation.DontInline, but in jdk9/dev the @DontInline is still in java.lang.invoke. >> >> Is there a plan to push the move of DontInline annotation before this change as I haven't seen any RFR for the move yet? > https://bugs.openjdk.java.net/browse/JDK-8144223 > > The change is making its way through hs-comp. > > -Chris. Thanks Chris. Regards, Peter >> Regards, Peter >> >>> On 12/11/2015 10:36 AM, Paul Sandoz wrote: >>> Unless any strong objections are raised I plan to push the latest patch on Monday. >>> >>>> Updated: >>>> >>>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8133348-reachability-fence-jdk/webrev/src/java.base/share/classes/java/lang/ref/Reference.java.sdiff.html >>>> >>>> I think there is an opportunity to add further examples, but i would like to take a swing at that later on. >>>> >>>> >>>>> - I now agree with you and Doug about calling this a "fence".
Can we >>>>> just name it "fence" rather than the wordier "reachabilityFence"? >>>>> Looking at a typical invocation, >>>>> >>>>> Reference.reachabilityFence(); >>>>> >>>>> seems a bit redundant while >>>>> >>>>> Reference.fence(); >>>>> >>>>> reads quite nicely. Is there, or will there ever be, any other kind >>>>> of reference-related fence? >>>> I doubt there will be another kind of reference fence, but it could be used in conjunction with other memory fences (currently on VarHandles) and if static imports are used it might look rather out of place as to what fence "fence" actually refers to. That is why i prefer the longer more descriptive name. >>>> >>>> Paul. From doug.simon at oracle.com Sun Dec 13 21:49:46 2015 From: doug.simon at oracle.com (Doug Simon) Date: Sun, 13 Dec 2015 22:49:46 +0100 Subject: RFR: 8145270: Need to eagerly initialize JVMCI compiler under -Xcomp Message-ID: <6E277F4F-DF34-4915-9641-96F3940BEB90@oracle.com> In blocking compilation mode (i.e., -UseInterpreter), certain compilations are forced to be non-blocking if a JVMCI compiler is being used (i.e., +UseJVMCICompiler). This is to prevent deadlocks that can occur between an application thread and a JVMCI compiler thread. One condition for forcing a compilation to be non-blocking is if the JVMCI compiler is not yet initialized. This is problematic for tests that attempt to force a method to be compiled by JVMCI (e.g., -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -XX:-TieredCompilation -Xcomp -XX:CompileCommand=compileonly,Tester_*::*). If the test is small enough, JVMCI initialization (which is lazy) may still be executing when the test methods are scheduled to be compiled. The solution is to make JVMCI compiler initialization eager in blocking compilation mode.
https://bugs.openjdk.java.net/browse/JDK-8145270 http://cr.openjdk.java.net/~dnsimon/8145270/ From rahul.v.raghavan at oracle.com Mon Dec 14 05:50:41 2015 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Sun, 13 Dec 2015 21:50:41 -0800 (PST) Subject: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: <8544A13B-B408-4387-912F-C418202E1508@oracle.com> References: <8544A13B-B408-4387-912F-C418202E1508@oracle.com> Message-ID: <59e6ab1a-b422-42fd-8cfe-74b651ac9eda@default> > -----Original Message----- > From: Roland Westrelin > Sent: Wednesday, December 09, 2015 8:03 PM > To: Rahul Raghavan > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler > > > webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . > > Justifying the comment lines 2019-2022 in sharedRuntime_sparc.cpp (lines 1743-1746 in sharedRuntime_x86_32.cpp) again would be > nice. > Shouldn't we use this as an opportunity to add the same optimization to sharedRuntime_x86_64.cpp? Thank you Roland for comments. Yes, I will check adding the optimization for sharedRuntime_x86_64. > > Roland. From goetz.lindenmaier at sap.com Mon Dec 14 09:28:13 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 14 Dec 2015 09:28:13 +0000 Subject: RFR(XS): 8145300: ppc64: fix port of "8072008: Emit direct call instead of linkTo* for recursive indy/MH.invoke* calls" Message-ID: <4295855A5C1DE049A61835A1887419CC41EDE68D@DEWDFEMB12A.global.corp.sap> Hi, First, thanks for porting 8072008 to ppc. Good it came with a test, that showed a small nit still was missing. Ppc utilizes postalloc expand where Call nodes are expanded. 8072007 introduced field _override_symbolic_info to Call nodes that needs to be preserved in postalloc expand. Please review this small change: http://cr.openjdk.java.net/~goetz/webrevs/8145300-jtregInvk/webrev.00/ Best regards, Goetz.
From volker.simonis at gmail.com Mon Dec 14 10:10:14 2015 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 14 Dec 2015 11:10:14 +0100 Subject: RFR(XS): 8145300: ppc64: fix port of "8072008: Emit direct call instead of linkTo* for recursive indy/MH.invoke* calls" In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDE68D@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC41EDE68D@DEWDFEMB12A.global.corp.sap> Message-ID: Hi Goetz, looks good! Thanks, Volker On Mon, Dec 14, 2015 at 10:28 AM, Lindenmaier, Goetz wrote: > Hi, > > > > First, thanks for porting 8072008 to ppc. Good it came with a test, > > that showed a small nit still was missing. > > Ppc utilizes postalloc expand where Call nodes are expanded. > 8072007 introduced field _override_symbolic_info to Call nodes that > > needs to be preserved in postalloc expand. > > > > Please review this small change: > > http://cr.openjdk.java.net/~goetz/webrevs/8145300-jtregInvk/webrev.00/ > > > > Best regards, > > Goetz. From goetz.lindenmaier at sap.com Mon Dec 14 10:20:05 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 14 Dec 2015 10:20:05 +0000 Subject: RFR(XS): 8145300: ppc64: fix port of "8072008: Emit direct call instead of linkTo* for recursive indy/MH.invoke* calls" In-Reply-To: References: <4295855A5C1DE049A61835A1887419CC41EDE68D@DEWDFEMB12A.global.corp.sap> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDE6E6@DEWDFEMB12A.global.corp.sap> Thanks Volker! Best regards, Goetz. > -----Original Message----- > From: Volker Simonis [mailto:volker.simonis at gmail.com] > Sent: Monday, December 14, 2015 11:10 > To: Lindenmaier, Goetz > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XS): 8145300: ppc64: fix port of "8072008: Emit direct call > instead of linkTo* for recursive indy/MH.invoke* calls" > > Hi Goetz, > > looks good!
> > Thanks, > Volker > > > On Mon, Dec 14, 2015 at 10:28 AM, Lindenmaier, Goetz > wrote: > > Hi, > > > > > > > > First, thanks for porting 8072008 to ppc. Good it came with a test, > > > > that showed a small nit still was missing. > > > > Ppc utilizes postalloc expand where Call nodes are expanded. > > 8072007 introduced field _override_symbolic_info to Call nodes that > > > > needs to be preserved in postalloc expand. > > > > > > > > Please review this small change: > > > > http://cr.openjdk.java.net/~goetz/webrevs/8145300- > jtregInvk/webrev.00/ > > > > > > > > Best regards, > > > > Goetz. From paul.sandoz at oracle.com Mon Dec 14 10:26:07 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 14 Dec 2015 11:26:07 +0100 Subject: Reference.reachabilityFence In-Reply-To: References: <2D27BCFB-77ED-4C83-985E-102DC4B41C97@oracle.com> <0CCC1C56-EDC9-47C4-B170-5A66A6C81495@oracle.com> <7B0271EB-A012-435F-95D2-4F9E64E20220@oracle.com> <20151207095825.952677@eggemoggin.niobe.net> <430729B7-AA2B-499A-8660-C0BBFFC69E5E@oracle.com> <2AE57802-9204-4E48-81E0-98E65D43F1E0@oracle.com> Message-ID: > On 11 Dec 2015, at 16:52, Vitaly Davidovich wrote: > > Hi Paul, > > No objections, but just wanted to summarize a couple of possible key performance issues that were raised on the concurrency-interest thread. You may have picked them up already, so pardon the repetition: > > Thanks, that's a useful summary from a very long thread. I have added those (and that in your other email) as a comment on the current issue with links to the discussion: https://bugs.openjdk.java.net/browse/JDK-8133348?focusedCommentId=13877606&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13877606 Paul. > 1) current impl/prototype is purposely barred from inlining - this will be a compiler optimization fence, particularly bad in loops > > 2) the expected "try { ... use(r) ... 
} finally { reachabilityFence(r); }" idiom will significantly increase bytecode size, possibly impacting inlining. > > I'm sure you guys will address this in the end, but just wanted to reiterate those just in case :). > > Thanks > From nils.eliasson at oracle.com Mon Dec 14 14:53:07 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 14 Dec 2015 15:53:07 +0100 Subject: RFR(S/M): 8144246: CompilerControl: adding lots of directives via jcmd may produce OOM crash Message-ID: <566ED7D3.1010200@oracle.com> Hi, Please review this minor change. It introduces a limit to how many directives can be added. The limit can be controlled by the diagnostic flag CompilerDirectivesLimit. For normal use it would be very unusual to have more than a few directives. The flag PrintCompilerDirectives was changed to CompilerDirectivesPrint to have a consistent naming for all directive flags. This is a new flag and is not used anywhere yet. Testing: All the compiler control tests will have been run before submit. Bug: https://bugs.openjdk.java.net/browse/JDK-8144246 Webrev: http://cr.openjdk.java.net/~neliasso/8144246/webrev.01/ Regards, Nils From roland.westrelin at oracle.com Mon Dec 14 15:19:04 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 14 Dec 2015 16:19:04 +0100 Subject: RFR(S): 8139771: Eliminating CastPP nodes at Phis when they all come from a unique input may cause crash In-Reply-To: <56663B29.7050508@oracle.com> References: <56623E4A.9040504@oracle.com> <56663B29.7050508@oracle.com> Message-ID: <5021FF7F-DA52-44D0-A7E5-DAEFFC5992C1@oracle.com> Hi Vladimir, >>> I was confused by naming first and by placement of code. >>> And mixing code for casts and code in gcm.
In reality the cast code tries to find only "immediate"/near dominating cast. So why (i >= 100) and not, let's say, >= 10? >> >> It's arbitrary so if you think 100 is too much, sure we can go with 10. > > What is reasonable number, you think, based on tests you have? I think 100 is waste of time but 10 could be not enough. I don't have evidence that 100 is justified so let's go with 10 and increase it later on if we find it's not sufficient. All performance testing I've performed was with 100. Should I redo perf testing? >>> I am not sure that code should be in Phase classes. It is only work for cast nodes. I think it should be in ConstraintCast. >> >> The reason I'd like PhaseTransform:: is_dominator() is that for 2 other prototypes I'm working on, I also had 2 copies of the same logic, one that I want to apply during IGVN and one that I want to apply during loop opts. I'd like to be able to write that logic only once in a clean way and be able to call it both from IGVN and loop opts. > > Understood. Does all these cases check only Cast nodes or others too? It may not be safe in general case. Also since you are not looking at the whole graph the method name should be something like is_near_dominator(). These cases are with different nodes, not only cast nodes. If it's called is_near_dominator(), then it can't be a method shared by both loop opts and gvn so a logic that is applied during loop opts and gvn can't be written only once. >>> Remove -XX:+UseSerialGC flag from test. Otherwise we will get error when testing will try to use other GC. >> >> The reason I added that option is because the test doesn't fail with the default GC (G1). The reason I think is that the G1 post barrier has a wide barrier that prevents the load of saved_not_null to be optimized out (that's https://bugs.openjdk.java.net/browse/JDK-8087341). > > Add @requires vm.gc=="Serial" > > See: > https://bugs.openjdk.java.net/browse/JDK-8062537 Ok.
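The bounded dominance check debated above (a walk limit of 10 versus 100, and whether to call it is_near_dominator()) can be illustrated with a small Java model. This is a sketch under stated assumptions, not HotSpot's C++ code: `BlockNode`, `BoundedDominance`, `isNearDominator` and `WALK_LIMIT` are hypothetical names, and the only structure modeled is the immediate-dominator chain.

```java
/**
 * Hypothetical model of a bounded dominance query: only the immediate-dominator
 * chain of 'b' is inspected, for at most 'limit' steps. Not HotSpot's
 * implementation; all names here are illustrative.
 */
final class BlockNode {
    final String name;
    final BlockNode idom; // immediate dominator, or null for the root

    BlockNode(String name, BlockNode idom) {
        this.name = name;
        this.idom = idom;
    }
}

final class BoundedDominance {
    // Assumed default, following the "go with 10" outcome of the thread.
    static final int WALK_LIMIT = 10;

    /**
     * Returns true if 'a' is 'b' itself or one of its first 'limit' ancestors
     * on the idom chain. A false result is conservative: it means "not proven",
     * not "a does not dominate b".
     */
    static boolean isNearDominator(BlockNode a, BlockNode b, int limit) {
        BlockNode cur = b;
        for (int steps = 0; cur != null && steps <= limit; steps++) {
            if (cur == a) {
                return true; // found within the budget
            }
            cur = cur.idom;
        }
        return false; // ran past the root or exhausted the budget
    }

    static boolean isNearDominator(BlockNode a, BlockNode b) {
        return isNearDominator(a, b, WALK_LIMIT);
    }
}
```

The trade-off Vladimir raises shows up directly here: a larger limit proves more dominance relations but costs more per query, while a smaller one only ever loses optimization opportunities, never correctness, because callers must treat `false` as "unknown".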
Here is a new webrev: http://cr.openjdk.java.net/~roland/8139771/webrev.01/ As you suggested I made CheckCastPP inherit from ConstraintCast. I also hit the following bug: one iteration of a loop is peeled which causes a CastPP to be pinned between the loop and the predicates. When a predicate that depends on the CastPP is moved out of the loop, it is moved above the CastPP. I fixed it by marking all nodes that depend on a node pinned between a loop and the predicates as non loop invariant. I don't think fixing it by moving the cast up above the predicates is a safe fix in general. Roland. From roland.westrelin at oracle.com Mon Dec 14 16:42:44 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 14 Dec 2015 17:42:44 +0100 Subject: RFR(M): 8145322: Code generated from unsafe loops can be slightly improved Message-ID: http://cr.openjdk.java.net/~roland/8145322/webrev.00/ Paul spotted the following small inefficiencies: for (; wi < l; wi++) { long bi = ((long) Objects.checkIndex(wi, l, null)) << LOG2_ARRAY_LONG_INDEX_SCALE; long av = U.getLongUnaligned(a, aOffset + bi); long bv = U.getLongUnaligned(b, bOffset + bi); if (av != bv) { is compiled to: 0b0 B9: # B28 B10 <- B8 B13 Loop: B9-B13 inner main of N130 Freq: 977.661 0b0 movl RDX, RDI # spill 0b2 # castII of RDX 0b2 movq RBX, [R9 + #16 + RDX << #3] # long 0b7 movq RAX, [RSI + #16 + RDX << #3] # long 0bc cmpq RBX, RAX 0bf jne B28 P=0.000000 C=7836.000000 0bf 0c5 B10: # B28 B11 <- B9 Freq: 977.66 0c5 movl RDX, RDI # spill 0c7 incl RDX # int 0c9 # castII of RDX 0c9 movq RBX, [R9 + #16 + RDX << #3] # long 0ce movq RAX, [RSI + #16 + RDX << #3] # long 0d3 cmpq RBX, RAX 0d6 jne B28 P=0.000000 C=7836.000000 0d6 0dc B11: # B28 B12 <- B10 Freq: 977.66 0dc movl RDX, RDI # spill 0de addl RDX, #2 # int 0e1 # castII of RDX 0e1 movq RBX, [R9 + #16 + RDX << #3] # long 0e6 movq RAX, [RSI + #16 + RDX << #3] # long 0eb cmpq RBX, RAX 0ee jne B28 P=0.000000 C=7836.000000 0ee 0f4 B12: # B28 B13 <- B11 Freq: 977.659 0f4
movl RDX, RDI # spill 0f6 addl RDX, #3 # int 0f9 # castII of RDX 0f9 movq RBX, [R9 + #16 + RDX << #3] # long 0fe movq RAX, [RSI + #16 + RDX << #3] # long 103 cmpq RBX, RAX 106 jne B28 P=0.000000 C=7836.000000 106 10c B13: # B9 B14 <- B12 Freq: 977.659 10c addl RDI, #4 # int 10f cmpl RDI, RBP 111 jl,s B9 # loop end P=0.998980 C=7836.000000 But the intermediate increment of the induction variable: 0c7 incl RDX # int 0de addl RDX, #2 # int 0f6 addl RDX, #3 # int should be folded in the address computation of the memory accesses: ConvI2L(AddI(x, y)) should be converted to AddL(ConvI2L(x), ConvI2L(y)) but there's a CastII from the checkIndex between the AddI and the ConvI2L so we first need to push the CastII through the AddI. That's the first CastIINode::Ideal transformation. If we apply that transformation we then have several CastII that only differ by their type so we need the second transformation of CastIINode::Ideal so all of them fold after loop opts. for (; wi < length >> valuesPerWidth; wi++) { long bi = ((long) wi) << LOG2_ARRAY_LONG_INDEX_SCALE; long av = U.getLongUnaligned(a, aOffset + bi); long bv = U.getLongUnaligned(b, bOffset + bi); if (av != bv) { 0b0 B7: # B32 B8 <- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 0b0 movslq R8, RSI # i2l 0b3 movq RAX, [RDX + #16 + R8 << #3] # long 0b8 movq RDI, [RBP + #16 + R8 << #3] # long 0bd cmpq RAX, RDI 0c0 jne B32 P=0.000000 C=7836.000000 0c0 0c6 B8: # B33 B9 <- B7 Freq: 975.842 0c6 movl R8, RSI # spill 0c9 incl R8 # int 0cc movslq RDI, R8 # i2l 0cf movq RAX, [RDX + #16 + RDI << #3] # long 0d4 movq RDI, [RBP + #16 + RDI << #3] # long 0d9 cmpq RAX, RDI 0dc jne B33 P=0.000000 C=7836.000000 0dc 0e2 B9: # B33 B10 <- B8 Freq: 975.842 0e2 movl R8, RSI # spill 0e5 addl R8, #2 # int 0e9 movslq RDI, R8 # i2l 0ec movq RAX, [RDX + #16 + RDI << #3] # long 0f1 movq RDI, [RBP + #16 + RDI << #3] # long 0f6 cmpq RAX, RDI 0f9 jne B33 P=0.000000 C=7836.000000 0f9 0ff B10: # B33 B11 <- B9 Freq: 975.842 0ff movl R8, RSI #
spill 102 addl R8, #3 # int 106 movslq RDI, R8 # i2l 109 movq RAX, [RDX + #16 + RDI << #3] # long 10e movq RDI, [RBP + #16 + RDI << #3] # long 113 cmpq RAX, RDI 116 jne B33 P=0.000000 C=7836.000000 116 11c B11: # B33 B12 <- B10 Freq: 975.841 11c movl R8, RSI # spill 11f addl R8, #4 # int 123 movslq RDI, R8 # i2l 126 movq RAX, [RDX + #16 + RDI << #3] # long 12b movq RDI, [RBP + #16 + RDI << #3] # long 130 cmpq RAX, RDI 133 jne B33 P=0.000000 C=7836.000000 133 139 B12: # B33 B13 <- B11 Freq: 975.841 139 movl R8, RSI # spill 13c addl R8, #5 # int 140 movslq RDI, R8 # i2l 143 movq RAX, [RDX + #16 + RDI << #3] # long 148 movq RDI, [RBP + #16 + RDI << #3] # long 14d cmpq RAX, RDI 150 jne B33 P=0.000000 C=7836.000000 150 156 B13: # B33 B14 <- B12 Freq: 975.84 156 movl R8, RSI # spill 159 addl R8, #6 # int 15d movslq RDI, R8 # i2l 160 movq RAX, [RDX + #16 + RDI << #3] # long 165 movq RDI, [RBP + #16 + RDI << #3] # long 16a cmpq RAX, RDI 16d jne B33 P=0.000000 C=7836.000000 16d 173 B14: # B33 B15 <- B13 Freq: 975.84 173 movl R8, RSI # spill 176 addl R8, #7 # int 17a movslq RDI, R8 # i2l 17d movq RAX, [RDX + #16 + RDI << #3] # long 182 movq RDI, [RBP + #16 + RDI << #3] # long 187 cmpq RAX, RDI 18a jne B33 P=0.000000 C=7836.000000 18a 190 B15: # B7 B16 <- B14 Freq: 975.839 190 addl RSI, #8 # int 193 cmpl RSI, R11 196 jl B7 # loop end P=0.998980 C=7836.000000 Same as above the intermediate increment of the induction variable should fold into the address computation but ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) is not applied because the compiler loses track of the bounds of the induction variable. The i2l conversions should also fold into the address computations but they don't for the same reason. The change in loopnode.cpp tries to work around the problem by capturing the bounds of the loop as soon as the CountedLoop is created and before other transformations applied to the loop make it much harder for the compiler to figure the bounds out.
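To see why the ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) rewrite described above depends on knowing the bounds of the induction variable, here is a small Java sketch. It assumes nothing about C2's internals: the class and method names are hypothetical, and the `16` displacement and `<< 3` scale are simply taken from the listings above.

```java
/**
 * Why ConvI2L(AddI(x, y)) may be rewritten to AddL(ConvI2L(x), ConvI2L(y))
 * only when x + y is known not to overflow an int. Names are illustrative.
 */
final class ConvI2LRewrite {
    // Original shape: 32-bit add, then sign-extend (AddI feeding ConvI2L).
    static long i2lOfSum(int x, int y) {
        return (long) (x + y);
    }

    // Rewritten shape: sign-extend each operand, then 64-bit add.
    static long sumOfI2L(int x, int y) {
        return (long) x + (long) y;
    }

    // The rewrite is sound exactly when the 64-bit sum still fits in an int,
    // which is what known induction-variable bounds let the compiler prove.
    static boolean rewriteIsSafe(int x, int y) {
        long wide = (long) x + (long) y;
        return wide >= Integer.MIN_VALUE && wide <= Integer.MAX_VALUE;
    }

    // Address of element wi + k using the #16 displacement and << #3 scale
    // seen in the listings: 16 + ((wi + k) << 3).
    static long offsetUnfolded(int wi, int k) {
        return 16L + ((long) (wi + k) << 3);
    }

    // After the rewrite the constant k folds into the displacement:
    // (16 + 8 * k) + (wi << 3), so no separate incl/addl per unrolled copy.
    static long offsetFolded(int wi, int k) {
        return (16L + 8L * k) + ((long) wi << 3);
    }
}
```

For in-range loop indices the two offset forms agree, which is what lets the unrolled copies share one register with different displacements; at `wi = Integer.MAX_VALUE` they diverge, which is why the compiler must first prove the bound.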
I also relaxed the Phi type computation in PhiNode::Value(). I hit a couple of unrelated bugs during testing: the fix in x86_64.ad is obvious. The change to superword is because we sometimes end up there with an AddL while, as I understand, we only expect integer nodes. Using the AddL leads to broken graphs. Roland. From vladimir.kozlov at oracle.com Mon Dec 14 19:30:47 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Dec 2015 11:30:47 -0800 Subject: Re: Status of LogCompilation in latest JDK9 builds? In-Reply-To: References: Message-ID: <566F18E7.1050707@oracle.com> Thank you for reporting, Chris. Confirmed that the problem started in 1.9.0-ea-b92 after Nils pushed: 8137167: JEP165: Compiler Control: Implementation task I filed P1 bug and assigned it to Nils: https://bugs.openjdk.java.net/browse/JDK-8145345 Thanks, Vladimir On 12/11/15 10:22 AM, John Rose wrote: > On Dec 11, 2015, at 3:45 AM, Chris Newland wrote: >> >> Is this expected behaviour at this point in the development of unified >> logging? > > It had better *not* be. If somebody broke something, somebody better fix it. We need LC to service the JIT. > > https://bugs.openjdk.java.net/browse/JDK-8046148?focusedCommentId=13568278&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13568278 > > --
John > From vladimir.kozlov at oracle.com Mon Dec 14 19:57:07 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Dec 2015 11:57:07 -0800 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 In-Reply-To: <76440B2F-C4A6-447B-A202-8882D579573E@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <565E4DD2.1030200@oracle.com> <565E511E.9020503@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CF892@ORSMSX106.amr.corp.intel.com> <56620192.5050808@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569D1AC3@ORSMSX106.amr.corp.intel.com> <566623AD.8060709@oracle.com> <566729A9.2020609@oracle.com> <76440B2F-C4A6-447B-A202-8882D579573E@oracle.com> Message-ID: <566F1F13.9060500@oracle.com> Filed https://bugs.openjdk.java.net/browse/JDK-8145348 Vladimir On 12/8/15 11:05 AM, Christian Thalinger wrote: > >> On Dec 8, 2015, at 9:04 AM, Vladimir Kozlov wrote: >> >> Historically we have intrinsics flag as product. But in reality to have them diagnostic, for example, is also fine. But it would be different change. > > Maybe we should file an enhancement and change all of them to diagnostic. > >> >> Thanks, >> Vladimir >> >> On 12/8/15 10:57 AM, Christian Thalinger wrote: >>> + product(bool, UseVectorizedMismatchIntrinsic, false, \ >>> + "Enables intrinsification of ArraysSupport.vectorizedMismatch()") \ >>> >>> Do all these really need to be product flags? >>> >>>> On Dec 7, 2015, at 2:26 PM, Vladimir Kozlov wrote: >>>> >>>> Looks good. I will push it when closed part (flag = false) reviewed. >>>> I will modify vm_version_x86.cpp to move setting to false in 32-bit VM code to be #else part of flag's setting. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/7/15 11:29 AM, Deshpande, Vivek R wrote: >>>>> Hi Vladimir >>>>> >>>>> We have updated the jbs entry with your suggested changes for the flag. >>>>> Would you please review it. 
>>>>> jbs entry: https://bugs.openjdk.java.net/browse/JDK-8143355 >>>>> webrev is at: http://cr.openjdk.java.net/~mcberg/8143355/webrev.03/ >>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Friday, December 04, 2015 1:12 PM >>>>> To: Deshpande, Vivek R; hotspot compiler >>>>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; Paul Sandoz >>>>> Subject: Re: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 >>>>> >>>>> You don't need now #ifdef COMPILER2 (in vm_version_x86.cpp). >>>>> >>>>> + #ifdef COMPILER2 >>>>> + #ifdef _LP64 >>>>> + if (UseSSE42Intrinsics) { >>>>> >>>>> Also you need to add to all other platforms vm_version_.cpp setting flag to false. See UseAdler32Intrinsics settings as example. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/4/15 11:26 AM, Deshpande, Vivek R wrote: >>>>>> Hi Vladimir >>>>>> >>>>>> We have updated the webrev at the jbs entry with the global flag. >>>>>> This is the link for your review. >>>>>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.02/ >>>>>> >>>>>> Regards >>>>>> Vivek >>>>>> -----Original Message----- >>>>>> From: Deshpande, Vivek R >>>>>> Sent: Wednesday, December 02, 2015 11:21 AM >>>>>> To: 'Vladimir Kozlov'; hotspot compiler >>>>>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; 'Paul Sandoz' >>>>>> Subject: RE: RFR (M): 8143355: Update for addition of >>>>>> vectorizedMismatch intrinsic for x86 >>>>>> >>>>>> Hi Vladimir >>>>>> >>>>>> Yes the 2x performance gain is using AVX2 instructions for big arrays(~1k). >>>>>> We will update the patch and jbs entry with global flag and let you know soon. 
>>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>> Sent: Tuesday, December 01, 2015 6:02 PM >>>>>> To: Deshpande, Vivek R; hotspot compiler >>>>>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric >>>>>> Subject: Re: RFR (M): 8143355: Update for addition of >>>>>> vectorizedMismatch intrinsic for x86 >>>>>> >>>>>> 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. >>>>>> >>>>>> If that is the case the flag should be global. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 12/1/15 5:48 PM, Vladimir Kozlov wrote: >>>>>>> This seems fine. 2x is for AVX implementation? >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 11/24/15 4:00 PM, Deshpande, Vivek R wrote: >>>>>>>> Hi all >>>>>>>> >>>>>>>> We would like to contribute a patch from Intel which optimizes >>>>>>>> vectorizedMismatch() method in java.util.ArraysSupport.java for X86 >>>>>>>> architecture using AVX instructions. >>>>>>>> >>>>>>>> The improvement gives more than 2x gain over Unsafe implementation >>>>>>>> for long arrays. >>>>>>>> >>>>>>>> >>>>>>>> The bug is blocked by bug: vectorized support for array >>>>>>>> equals/compare/mismatch using Unsafe >>>>>>>> (https://bugs.openjdk.java.net/browse/JDK-8136924.) >>>>>>>> >>>>>>>> Could you please review and sponsor this patch. 
>>>>>>>> >>>>>>>> Bug-id: >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143355 >>>>>>>> webrev: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ >>>>>>>> >>>>>>>> Thanks and regards, >>>>>>>> >>>>>>>> Vivek >>>>>>>> >>> > From christian.thalinger at oracle.com Mon Dec 14 20:06:33 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 14 Dec 2015 10:06:33 -1000 Subject: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 In-Reply-To: <566F1F13.9060500@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568F16C1@ORSMSX106.amr.corp.intel.com> <565E4DD2.1030200@oracle.com> <565E511E.9020503@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CF892@ORSMSX106.amr.corp.intel.com> <56620192.5050808@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569D1AC3@ORSMSX106.amr.corp.intel.com> <566623AD.8060709@oracle.com> <566729A9.2020609@oracle.com> <76440B2F-C4A6-447B-A202-8882D579573E@oracle.com> <566F1F13.9060500@oracle.com> Message-ID: <296C476D-3FC3-45A0-9EC5-A36173503F92@oracle.com> > On Dec 14, 2015, at 9:57 AM, Vladimir Kozlov wrote: > > Filed https://bugs.openjdk.java.net/browse/JDK-8145348 Thanks. > > Vladimir > > On 12/8/15 11:05 AM, Christian Thalinger wrote: >> >>> On Dec 8, 2015, at 9:04 AM, Vladimir Kozlov wrote: >>> >>> Historically we have intrinsics flag as product. But in reality to have them diagnostic, for example, is also fine. But it would be different change. >> >> Maybe we should file an enhancement and change all of them to diagnostic. >> >>> >>> Thanks, >>> Vladimir >>> >>> On 12/8/15 10:57 AM, Christian Thalinger wrote: >>>> + product(bool, UseVectorizedMismatchIntrinsic, false, \ >>>> + "Enables intrinsification of ArraysSupport.vectorizedMismatch()") \ >>>> >>>> Do all these really need to be product flags? >>>> >>>>> On Dec 7, 2015, at 2:26 PM, Vladimir Kozlov wrote: >>>>> >>>>> Looks good. I will push it when closed part (flag = false) reviewed. 
>>>>> I will modify vm_version_x86.cpp to move setting to false in 32-bit VM code to be #else part of flag's setting. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/7/15 11:29 AM, Deshpande, Vivek R wrote: >>>>>> Hi Vladimir >>>>>> >>>>>> We have updated the jbs entry with your suggested changes for the flag. >>>>>> Would you please review it. >>>>>> jbs entry: https://bugs.openjdk.java.net/browse/JDK-8143355 >>>>>> webrev is at: http://cr.openjdk.java.net/~mcberg/8143355/webrev.03/ >>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>> Sent: Friday, December 04, 2015 1:12 PM >>>>>> To: Deshpande, Vivek R; hotspot compiler >>>>>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; Paul Sandoz >>>>>> Subject: Re: RFR (M): 8143355: Update for addition of vectorizedMismatch intrinsic for x86 >>>>>> >>>>>> You don't need now #ifdef COMPILER2 (in vm_version_x86.cpp). >>>>>> >>>>>> + #ifdef COMPILER2 >>>>>> + #ifdef _LP64 >>>>>> + if (UseSSE42Intrinsics) { >>>>>> >>>>>> Also you need to add to all other platforms vm_version_.cpp setting flag to false. See UseAdler32Intrinsics settings as example. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 12/4/15 11:26 AM, Deshpande, Vivek R wrote: >>>>>>> Hi Vladimir >>>>>>> >>>>>>> We have updated the webrev at the jbs entry with the global flag. >>>>>>> This is the link for your review. 
>>>>>>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.02/ >>>>>>> >>>>>>> Regards >>>>>>> Vivek >>>>>>> -----Original Message----- >>>>>>> From: Deshpande, Vivek R >>>>>>> Sent: Wednesday, December 02, 2015 11:21 AM >>>>>>> To: 'Vladimir Kozlov'; hotspot compiler >>>>>>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric; 'Paul Sandoz' >>>>>>> Subject: RE: RFR (M): 8143355: Update for addition of >>>>>>> vectorizedMismatch intrinsic for x86 >>>>>>> >>>>>>> Hi Vladimir >>>>>>> >>>>>>> Yes the 2x performance gain is using AVX2 instructions for big arrays(~1k). >>>>>>> We will update the patch and jbs entry with global flag and let you know soon. >>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>> Sent: Tuesday, December 01, 2015 6:02 PM >>>>>>> To: Deshpande, Vivek R; hotspot compiler >>>>>>> Cc: Yi, Liqi; Viswanathan, Sandhya; Kaczmarek, Eric >>>>>>> Subject: Re: RFR (M): 8143355: Update for addition of >>>>>>> vectorizedMismatch intrinsic for x86 >>>>>>> >>>>>>> 2) improving C1 (perhaps even the interpreter?) since the intrinsic is a stub which IIUC makes it easier to plug in. >>>>>>> >>>>>>> If that is the case the flag should be global. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 12/1/15 5:48 PM, Vladimir Kozlov wrote: >>>>>>>> This seems fine. 2x is for AVX implementation? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 11/24/15 4:00 PM, Deshpande, Vivek R wrote: >>>>>>>>> Hi all >>>>>>>>> >>>>>>>>> We would like to contribute a patch from Intel which optimizes >>>>>>>>> vectorizedMismatch() method in java.util.ArraysSupport.java for X86 >>>>>>>>> architecture using AVX instructions. >>>>>>>>> >>>>>>>>> The improvement gives more than 2x gain over Unsafe implementation >>>>>>>>> for long arrays. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> The bug is blocked by bug: vectorized support for array >>>>>>>>> equals/compare/mismatch using Unsafe >>>>>>>>> (https://bugs.openjdk.java.net/browse/JDK-8136924.) >>>>>>>>> >>>>>>>>> Could you please review and sponsor this patch. >>>>>>>>> >>>>>>>>> Bug-id: >>>>>>>>> >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143355 >>>>>>>>> webrev: >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143355/webrev.01/ >>>>>>>>> >>>>>>>>> Thanks and regards, >>>>>>>>> >>>>>>>>> Vivek >>>>>>>>> >>>> >> From vladimir.kozlov at oracle.com Mon Dec 14 21:11:51 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Dec 2015 13:11:51 -0800 Subject: [8u-dev] backport RFR: 6869327: Add new C2 flag to keep safepoints in counted loops In-Reply-To: <5666E383.3010102@oracle.com> References: <5666E383.3010102@oracle.com> Message-ID: <566F3097.5040403@oracle.com> Your 8u changes look fine but they introduced a bug in jdk9: https://bugs.openjdk.java.net/browse/JDK-8144935 You need to backport 8144935 fix too as separate changes. Regards, Vladimir On 12/8/15 6:04 AM, Andreas Eriksson wrote: > Hi, > > Please review this backport of JDK-6869327: Add new C2 flag to keep > safepoints in counted loops. > The only change in this backport is to the test, where the testlibrary > imports needed to be changed, and I also removed the @module tag. > > JDK 9 review: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-November/020110.html > > > Webrev for changes between 9 and 8: > http://cr.openjdk.java.net/~aeriksso/6869327/webrev.9_to_8/ > > Full 8u webrev: > http://cr.openjdk.java.net/~aeriksso/6869327/webrev.jdk8u/ > > Bug: 6869327: Add new C2 flag to keep safepoints in counted loops.
> https://bugs.openjdk.java.net/browse/JDK-6869327 > > Thanks, > Andreas From tom.rodriguez at oracle.com Mon Dec 14 21:14:04 2015 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 14 Dec 2015 13:14:04 -0800 Subject: RFR(XS): 8145338: compiler/jsr292/CallSiteDepContextTest.java fails: assert(dep_implicit_context_arg(dept) == 0) failed: sanity Message-ID: http://cr.openjdk.java.net/~never/8145338/webrev/ https://bugs.openjdk.java.net/browse/JDK-8145338 call_site_target_value dropped its implicit arg in JDK9 but the code brought over from JDK8 still had this assert. It should be dropped. tom From vladimir.kozlov at oracle.com Mon Dec 14 22:05:07 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Dec 2015 14:05:07 -0800 Subject: RFR:8144771: AVX3 patch for MacroAssembler::string_compare In-Reply-To: <39F83597C33E5F408096702907E6C4500F10905C@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F108D6C@ORSMSX104.amr.corp.intel.com> <56659A17.6010300@oracle.com> <39F83597C33E5F408096702907E6C4500F108F7F@ORSMSX104.amr.corp.intel.com> <56663C17.9060408@oracle.com> <566684AB.90600@oracle.com> <39F83597C33E5F408096702907E6C4500F10905C@ORSMSX104.amr.corp.intel.com> Message-ID: <566F3D13.7030008@oracle.com> Looks fine to me too so I am pushing it. Thanks, Vladimir On 12/8/15 12:16 AM, Civlin, Jan wrote: > Thank you, Tobias. > > Best, > Jan > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Monday, December 7, 2015 11:20 PM > To: Vladimir Kozlov ; Civlin, Jan ; hotspot compiler > Subject: Re: RFR:8144771: AVX3 patch for MacroAssembler::string_compare > > Hi Jan, > > On 08.12.2015 03:10, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/8144771/webrev.01/ >> >> Vladimir >> >> On 12/7/15 5:50 PM, Civlin, Jan wrote: >>> Tobias, >>> >>> Thank you for spotting this.
>>> These comments were from the design and reflected the original order str1/str2. I'm removing them since the function calls say enough. >>> The order should remain str2/str1 since the "result" is modified in the "str1" line. > > Right, looks good to me! > > Best, > Tobias > >>> >>> Vladimir, >>> could you please upload the updated patch (I still do not have an access). >>> >>> >>> Yes, the test has been run: >>> >>> [jcivlin at SKY71 test]$ date; echo $JAVA_HOME; ls -l >>> $JAVA_HOME/lib/amd64/server/libjvm.so; time >>> /home/jcivlin/Tools/jtreg/bin/jtreg >>> compiler/intrinsics/string/TestStringIntrinsics.java >>> Mon Dec 7 11:00:07 PST 2015 >>> /home/jcivlin/Java/mberg-100915-11K/build/linux-x86_64-normal-server- >>> release/jdk -rwxrwxr-x 1 jcivlin jcivlin 17999532 Dec 2 22:13 >>> /home/jcivlin/Java/mberg-100915-11K/build/linux-x86_64-normal-server- >>> release/jdk/lib/amd64/server/libjvm.so >>> Test results: passed: 1 >>> Report written to >>> /home/jcivlin/Java/mberg-100915-11K/hotspot/test/JTreport/html/report >>> .html Results written to >>> /home/jcivlin/Java/mberg-100915-11K/hotspot/test/JTwork >>> >>> Thank you, >>> >>> Jan >>> >>> -----Original Message----- >>> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] >>> Sent: Monday, December 07, 2015 6:39 AM >>> To: Civlin, Jan; hotspot compiler >>> Cc: Vladimir Kozlov >>> Subject: Re: RFR:8144771: AVX3 patch for >>> MacroAssembler::string_compare >>> >>> Hi Jan, >>> >>> the intrinsic looks good to me (not a reviewer). Here are two minor suggestions: >>> - The following comments are wrong: >>> 8355 } else { //ae == StrIntrinsicNode::UL >>> 8356 load_unsigned_short(cnt1, Address(str2, result, scale2)); // L string >>> 8357 load_unsigned_byte(result, Address(str1, result, scale1)); // U string >>> The first line then loads a UTF16 (two-byte) String and the second line loads a Latin1 (one-byte) String. Maybe you should also exchange the lines to first load str1 and then load str2. 
I would omit the comment after "else" because ae could either be UL or LU (both have the Latin1 string in str1). >>> - Missing whitespace after comma: >>> 8143 cmpl(cnt2,stride2x2); >>> >>> I assume you executed the hotspot JTREG tests (including /compiler/intrinsics/string/TestStringIntrinsics.java). >>> >>> Best, >>> Tobias >>> >>> On 05.12.2015 05:07, Civlin, Jan wrote: >>>> We would like to contribute an AVX3 patch for MacroAssembler::string_compare. >>>> >>>> This utilizes 512-bit registers on the AVX3 architecture and delivers a performance gain (speed-up) of about 1.33x on long strings and about 1.22x on random strings. This was measured vs AVX2 (256-bit registers). >>>> >>>> >>>> Contributors: >>>> MacroAssembler::string_compare - Jan Civlin. >>>> Rest of code, including all x86 AVX3 extensions - Michael Berg >>>> >>>> >>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8144771 >>>> Webrev: http://cr.openjdk.java.net/~kvn/8144771/webrev/ From christian.thalinger at oracle.com Mon Dec 14 22:24:14 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 14 Dec 2015 12:24:14 -1000 Subject: RFR: 8145270: Need to eagerly initialize JVMCI compiler under -Xcomp In-Reply-To: <6E277F4F-DF34-4915-9641-96F3940BEB90@oracle.com> References: <6E277F4F-DF34-4915-9641-96F3940BEB90@oracle.com> Message-ID: > On Dec 13, 2015, at 11:49 AM, Doug Simon wrote: > > In blocking compilation mode (i.e., -UseInterpreter), certain compilations are forced to be non-blocking if a JVMCI compiler is being used (i.e., +UseJVMCICompiler). This is to prevent deadlocks that can occur between an application thread and a JVMCI compiler thread. One condition for forcing a compilation to be non-blocking is if the JVMCI compiler is not yet initialized.
This is problematic for tests that attempt to force a method to be compiled by JVMCI (e.g., -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -XX:-TieredCompilation -Xcomp -XX:CompileCommand=compileonly,Tester_*::*). If the test is small enough, JVMCI initialization (which is lazy) may still be executing when the test methods are scheduled to be compiled. > > The solution is to make JVMCI compiler initialization eager in blocking compilation mode. > > https://bugs.openjdk.java.net/browse/JDK-8145270 > http://cr.openjdk.java.net/~dnsimon/8145270/ Looks good. The only comment I have is at that point in initialization we can handle exceptions already and I think we should. This gives us even useful stack traces: Error occurred during initialization of VM java.lang.Error: just testing at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:181) Here is the updated patch: diff -r d84bd22ab531 src/share/vm/compiler/compileBroker.cpp --- a/src/share/vm/compiler/compileBroker.cpp Wed Dec 09 14:54:40 2015 +0100 +++ b/src/share/vm/compiler/compileBroker.cpp Mon Dec 14 12:20:18 2015 -1000 @@ -56,6 +56,7 @@ #if INCLUDE_JVMCI #include "jvmci/jvmciCompiler.hpp" #include "jvmci/jvmciRuntime.hpp" +#include "jvmci/jvmciJavaClasses.hpp" #include "runtime/vframe.hpp" #endif #ifdef COMPILER2 @@ -498,7 +499,7 @@ CompilerCounters::CompilerCounters() { // CompileBroker::compilation_init // // Initialize the Compilation object -void CompileBroker::compilation_init() { +void CompileBroker::compilation_init(TRAPS) { _last_method_compiled[0] = '\0'; // No need to initialize compilation system if we do not use it. 
@@ -529,6 +530,17 @@ void CompileBroker::compilation_init() { } else { c1_count = JVMCIHostThreads; } + + if (!UseInterpreter) { + // Force initialization of JVMCI compiler otherwise JVMCI + // compilations will not block until JVMCI is initialized + ResourceMark rm; + TempNewSymbol getCompiler = SymbolTable::new_symbol("getCompiler", CHECK); + TempNewSymbol sig = SymbolTable::new_symbol("()Ljdk/vm/ci/runtime/JVMCICompiler;", CHECK); + Handle jvmciRuntime = JVMCIRuntime::get_HotSpotJVMCIRuntime(CHECK); + JavaValue result(T_OBJECT); + JavaCalls::call_virtual(&result, jvmciRuntime, HotSpotJVMCIRuntime::klass(), getCompiler, sig, CHECK); + } } } #endif // INCLUDE_JVMCI diff -r d84bd22ab531 src/share/vm/compiler/compileBroker.hpp --- a/src/share/vm/compiler/compileBroker.hpp Wed Dec 09 14:54:40 2015 +0100 +++ b/src/share/vm/compiler/compileBroker.hpp Mon Dec 14 12:20:18 2015 -1000 @@ -276,7 +276,7 @@ public: CompileQueue *q = compile_queue(comp_level); return q != NULL ? q->size() : 0; } - static void compilation_init(); + static void compilation_init(TRAPS); static void init_compiler_thread_log(); static nmethod* compile_method(const methodHandle& method, int osr_bci, diff -r d84bd22ab531 src/share/vm/runtime/thread.cpp --- a/src/share/vm/runtime/thread.cpp Wed Dec 09 14:54:40 2015 +0100 +++ b/src/share/vm/runtime/thread.cpp Mon Dec 14 12:20:18 2015 -1000 @@ -3628,7 +3628,7 @@ jint Threads::create_vm(JavaVMInitArgs* // initialize compiler(s) #if defined(COMPILER1) || defined(COMPILER2) || defined(SHARK) || INCLUDE_JVMCI - CompileBroker::compilation_init(); + CompileBroker::compilation_init(CHECK_JNI_ERR); #endif // Pre-initialize some JSR292 core classes to avoid deadlock during class loading. 
From vladimir.kozlov at oracle.com Mon Dec 14 22:43:05 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Dec 2015 14:43:05 -0800 Subject: RFR(S): 8139771: Eliminating CastPP nodes at Phis when they all come from a unique input may cause crash In-Reply-To: <5021FF7F-DA52-44D0-A7E5-DAEFFC5992C1@oracle.com> References: <56623E4A.9040504@oracle.com> <56663B29.7050508@oracle.com> <5021FF7F-DA52-44D0-A7E5-DAEFFC5992C1@oracle.com> Message-ID: <566F45F9.5000304@oracle.com> On 12/14/15 7:19 AM, Roland Westrelin wrote: > Hi Vladimir, > >>>> I was confused by naming first and by placement of code. >>>> And mixing code for casts and code in gcm. In reality the cast code tries to find only "immediate"/near dominating cast. So why (i >= 100) and not, let's say, >= 10? >>> >>> It's arbitrary so if you think 100 is too much, sure we can go with 10. >> >> What is a reasonable number, you think, based on the tests you have? I think 100 is a waste of time but 10 may not be enough. > > I don't have evidence that 100 is justified so let's go with 10 and increase it later on if we find it's not sufficient. All performance testing I've performed was with 100. Should I redo perf testing? No need to re-test. Fewer iterations should not increase time. >>>> I am not sure that code should be in Phase classes. It only works for cast nodes. I think it should be in ConstraintCast. >>> >>> The reason I'd like PhaseTransform:: is_dominator() is that for 2 other prototypes I'm working on, I also had 2 copies of the same logic, one that I want to apply during IGVN and one that I want to apply during loop opts. I'd like to be able to write that logic only once in a clean way and be able to call it both from IGVN and loop opts. >> >> Understood. Do all these cases check only Cast nodes or others too? It may not be safe in the general case. Also, since you are not looking at the whole graph, the method name should be something like is_near_dominator().
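The bounded search being discussed (give up after a fixed number of steps, 10 in the revised patch) can be sketched as a conservative dominance test. This is a minimal illustration under stated assumptions: Block, idom and is_near_dominator are hypothetical names, not C2's PhaseTransform or ConstraintCast code. The point is that exhausting the budget must answer "not proven to dominate", never "dominates".

```cpp
#include <cassert>

// Hypothetical CFG block: only the immediate-dominator link is modelled.
struct Block {
    const Block* idom;  // nullptr for the entry block
};

// Conservative, bounded dominance query: does `a` dominate `b`?
// Walks at most `limit` steps up b's idom chain. Running out of budget
// answers false ("not proven"), which is safe but can miss distant dominators.
bool is_near_dominator(const Block* a, const Block* b, int limit) {
    const Block* cur = b;
    for (int steps = 0; cur != nullptr && steps <= limit; ++steps) {
        if (cur == a) {
            return true;  // a block dominates itself, so a == b is found at step 0
        }
        cur = cur->idom;
    }
    return false;
}
```

Raising the limit only trades compile time for precision; correctness of a caller that treats `false` as "unknown" does not depend on it.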
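The bounded search being discussed (give up after a fixed number of steps, 10 in the revised patch) can be sketched as a conservative dominance test. This is a minimal illustration under stated assumptions: Block, idom and is_near_dominator are hypothetical names, not C2's PhaseTransform or ConstraintCast code. The point is that exhausting the budget must answer "not proven to dominate", never "dominates".

```cpp
#include <cassert>

// Hypothetical CFG block: only the immediate-dominator link is modelled.
struct Block {
    const Block* idom;  // nullptr for the entry block
};

// Conservative, bounded dominance query: does `a` dominate `b`?
// Walks at most `limit` steps up b's idom chain. Running out of budget
// answers false ("not proven"), which is safe but can miss distant dominators.
bool is_near_dominator(const Block* a, const Block* b, int limit) {
    const Block* cur = b;
    for (int steps = 0; cur != nullptr && steps <= limit; ++steps) {
        if (cur == a) {
            return true;  // a block dominates itself, so a == b is found at step 0
        }
        cur = cur->idom;
    }
    return false;
}
```

Raising the limit only trades compile time for precision; correctness of a caller that treats `false` as "unknown" does not depend on it.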
> These cases are with different nodes, not only cast nodes. > If it's called is_near_dominator(), then it can't be a method shared by both loop opts and gvn so a logic that is applied during loop opts and gvn can't be written only once. I see. Okay. >>>> Remove -XX:+UseSerialGC flag from test. Otherwise we will get error when testing will try to use other GC. >>> >>> The reason I added that option is because the test doesn't fail with the default GC (G1). The reason I think is that the G1 post barrier has a wide barrier that prevents the load of saved_not_null to be optimized out (that's https://bugs.openjdk.java.net/browse/JDK-8087341). >> >> Add @requires vm.gc=="Serial" >> >> See: >> https://bugs.openjdk.java.net/browse/JDK-8062537 > > Ok. > > Here is a new webrev: > > http://cr.openjdk.java.net/~roland/8139771/webrev.01/ Good. Except predicates - see next. > > As you suggested I made CheckCastPP inherit from ConstraintCast. I also hit the following bug: one iteration of a loop is peeled which causes a CastPP to be pinned between the loop and the predicates. When a predicate that depends on the CastPP is moved out of the loop, it is moved above the CastPP. I fixed by marking all nodes that depend on a node pinned between a loop and the predicates as non loop invariant. I don't think fixing it by moving the cast up above the predicates is a safe fix in general. Hmm. The test which depends on CastPP should be also peeled and it will dominate the test in main loop. If a test/predicate could be moved from main loop then it should be possible to use peeled one. What do you think? Thanks, Vladimir > > Roland.
> From doug.simon at oracle.com Mon Dec 14 22:46:18 2015 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 14 Dec 2015 22:46:18 +0000 Subject: RFR: 8145270: Need to eagerly initialize JVMCI compiler under -Xcomp In-Reply-To: References: <6E277F4F-DF34-4915-9641-96F3940BEB90@oracle.com> Message-ID: <3A7730EE-43D5-4827-967C-5F86CECA9795@oracle.com> I was tempted to do that myself but opted for keeping the patch minimal. However, the solution that propagates the exception is obviously better. > On 14 Dec 2015, at 22:24, Christian Thalinger wrote: > > >> On Dec 13, 2015, at 11:49 AM, Doug Simon wrote: >> >> In blocking compilation mode (i.e., -UseInterpreter), certain compilations are forced to be non-blocking if a JVMCI compiler is being used (i.e., +UseJVMCICompiler). This is to prevent deadlocks that can occur between an application thread and a JVMCI compiler thread. One condition for forcing a compilation to be non-blocking is if the JVMCI compiler is not yet initialized. This is problematic for tests that attempt to force a method to be compiled by JVMCI (e.g., -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -XX:-TieredCompilation -Xcomp -XX:CompileCommand=compileonly,Tester_*::*). If the test is small enough, JVMCI initialization (which is lazy) may still be executing when the test methods are scheduled to be compiled. >> >> The solution is to make JVMCI compiler initialization eager in blocking compilation mode. >> >> https://bugs.openjdk.java.net/browse/JDK-8145270 >> http://cr.openjdk.java.net/~dnsimon/8145270/ > > Looks good. The only comment I have is at that point in initialization we can handle exceptions already and I think we should. This gives us even useful stack traces:
This gives us even useful stack traces: > > Error occurred during initialization of VM > java.lang.Error: just testing > at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:181) > > Here is the updated patch: > > diff -r d84bd22ab531 src/share/vm/compiler/compileBroker.cpp > --- a/src/share/vm/compiler/compileBroker.cpp Wed Dec 09 14:54:40 2015 +0100 > +++ b/src/share/vm/compiler/compileBroker.cpp Mon Dec 14 12:20:18 2015 -1000 > @@ -56,6 +56,7 @@ > #if INCLUDE_JVMCI > #include "jvmci/jvmciCompiler.hpp" > #include "jvmci/jvmciRuntime.hpp" > +#include "jvmci/jvmciJavaClasses.hpp" > #include "runtime/vframe.hpp" > #endif > #ifdef COMPILER2 > @@ -498,7 +499,7 @@ CompilerCounters::CompilerCounters() { > // CompileBroker::compilation_init > // > // Initialize the Compilation object > -void CompileBroker::compilation_init() { > +void CompileBroker::compilation_init(TRAPS) { > _last_method_compiled[0] = '\0'; > > // No need to initialize compilation system if we do not use it. 
> @@ -529,6 +530,17 @@ void CompileBroker::compilation_init() { > } else { > c1_count = JVMCIHostThreads; > } > + > + if (!UseInterpreter) { > + // Force initialization of JVMCI compiler otherwise JVMCI > + // compilations will not block until JVMCI is initialized > + ResourceMark rm; > + TempNewSymbol getCompiler = SymbolTable::new_symbol("getCompiler", CHECK); > + TempNewSymbol sig = SymbolTable::new_symbol("()Ljdk/vm/ci/runtime/JVMCICompiler;", CHECK); > + Handle jvmciRuntime = JVMCIRuntime::get_HotSpotJVMCIRuntime(CHECK); > + JavaValue result(T_OBJECT); > + JavaCalls::call_virtual(&result, jvmciRuntime, HotSpotJVMCIRuntime::klass(), getCompiler, sig, CHECK); > + } > } > } > #endif // INCLUDE_JVMCI > diff -r d84bd22ab531 src/share/vm/compiler/compileBroker.hpp > --- a/src/share/vm/compiler/compileBroker.hpp Wed Dec 09 14:54:40 2015 +0100 > +++ b/src/share/vm/compiler/compileBroker.hpp Mon Dec 14 12:20:18 2015 -1000 > @@ -276,7 +276,7 @@ public: > CompileQueue *q = compile_queue(comp_level); > return q != NULL ? q->size() : 0; > } > - static void compilation_init(); > + static void compilation_init(TRAPS); > static void init_compiler_thread_log(); > static nmethod* compile_method(const methodHandle& method, > int osr_bci, > diff -r d84bd22ab531 src/share/vm/runtime/thread.cpp > --- a/src/share/vm/runtime/thread.cpp Wed Dec 09 14:54:40 2015 +0100 > +++ b/src/share/vm/runtime/thread.cpp Mon Dec 14 12:20:18 2015 -1000 > @@ -3628,7 +3628,7 @@ jint Threads::create_vm(JavaVMInitArgs* > > // initialize compiler(s) > #if defined(COMPILER1) || defined(COMPILER2) || defined(SHARK) || INCLUDE_JVMCI > - CompileBroker::compilation_init(); > + CompileBroker::compilation_init(CHECK_JNI_ERR); > #endif > > // Pre-initialize some JSR292 core classes to avoid deadlock during class loading. 
>> From vladimir.kozlov at oracle.com Mon Dec 14 22:59:05 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Dec 2015 14:59:05 -0800 Subject: RFR(S/M): 8144246: CompilerControl: adding lots of directives via jcmd may produce OOM crash In-Reply-To: <566ED7D3.1010200@oracle.com> References: <566ED7D3.1010200@oracle.com> Message-ID: <566F49B9.8090104@oracle.com> Looks good. Thanks, Vladimir On 12/14/15 6:53 AM, Nils Eliasson wrote: > Hi, > > Please review this minor change. It introduces a limit on how many > directives can be added. The limit can be controlled by the diagnostic > flag CompilerDirectivesLimit. For normal use it would be very unusual to > have more than a few directives. > > The flag PrintCompilerDirectives was changed to CompilerDirectivesPrint > to have consistent naming for all directives flags. This is a new flag > and is not used anywhere yet. > > Testing: > All the compiler control tests will have been run before submit. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8144246 > Webrev: http://cr.openjdk.java.net/~neliasso/8144246/webrev.01/ > > Regards, > Nils > From christian.thalinger at oracle.com Mon Dec 14 23:00:59 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 14 Dec 2015 13:00:59 -1000 Subject: RFR: 8145270: Need to eagerly initialize JVMCI compiler under -Xcomp In-Reply-To: <3A7730EE-43D5-4827-967C-5F86CECA9795@oracle.com> References: <6E277F4F-DF34-4915-9641-96F3940BEB90@oracle.com> <3A7730EE-43D5-4827-967C-5F86CECA9795@oracle.com> Message-ID: <81055BC5-133C-4222-8A34-59E68E2D14CC@oracle.com> Alright, I'll push that. > On Dec 14, 2015, at 12:46 PM, Doug Simon wrote: > > I was tempted to do that myself but opted for keeping the patch minimal. However, the solution that propagates the exception is obviously better.
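Christian's revision gives compilation_init a TRAPS parameter and calls it with CHECK_JNI_ERR. As a result, an exception thrown by the eager getCompiler() call makes Threads::create_vm return an error instead of continuing. The convention can be modelled in a few lines of standalone C++. Everything below (the Thread struct, init_jvmci_compiler, create_vm, and the macro bodies) is a hypothetical simplification that borrows HotSpot's macro names; it is not HotSpot's actual definitions.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical model of HotSpot's pending-exception convention.
struct Thread {
    std::string pending_exception;  // empty string means "nothing pending"
    bool has_pending_exception() const { return !pending_exception.empty(); }
};

// Stand-ins for jni.h's status codes.
const int JNI_OK = 0;
const int JNI_ERR = -1;

#define TRAPS Thread* THREAD
// Pass THREAD as the last argument, then return early if the callee left an
// exception pending on the thread.
#define CHECK THREAD); if (THREAD->has_pending_exception()) return; (void)(0
// Same idea for functions that must report failure as a JNI status code.
#define CHECK_JNI_ERR THREAD); if (THREAD->has_pending_exception()) return JNI_ERR; (void)(0

// Stand-in for the eager getCompiler() call; "throws" when asked to fail.
void init_jvmci_compiler(bool fail, TRAPS) {
    if (fail) {
        THREAD->pending_exception = "java.lang.Error: just testing";
    }
}

void compilation_init(bool fail, TRAPS) {
    init_jvmci_compiler(fail, CHECK);  // returns early if an exception is pending
    // ... remaining compiler initialization would follow here ...
}

int create_vm(bool fail, TRAPS) {
    compilation_init(fail, CHECK_JNI_ERR);  // turns a pending exception into JNI_ERR
    return JNI_OK;
}
```

The trailing `(void)(0` in each macro absorbs the call site's closing `)` and `;`, so `f(x, CHECK);` expands to a call followed by an early-return check.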
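Christian's revision gives compilation_init a TRAPS parameter and calls it with CHECK_JNI_ERR. As a result, an exception thrown by the eager getCompiler() call makes Threads::create_vm return an error instead of continuing. The convention can be modelled in a few lines of standalone C++. Everything below (the Thread struct, init_jvmci_compiler, create_vm, and the macro bodies) is a hypothetical simplification that borrows HotSpot's macro names; it is not HotSpot's actual definitions.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical model of HotSpot's pending-exception convention.
struct Thread {
    std::string pending_exception;  // empty string means "nothing pending"
    bool has_pending_exception() const { return !pending_exception.empty(); }
};

// Stand-ins for jni.h's status codes.
const int JNI_OK = 0;
const int JNI_ERR = -1;

#define TRAPS Thread* THREAD
// Pass THREAD as the last argument, then return early if the callee left an
// exception pending on the thread.
#define CHECK THREAD); if (THREAD->has_pending_exception()) return; (void)(0
// Same idea for functions that must report failure as a JNI status code.
#define CHECK_JNI_ERR THREAD); if (THREAD->has_pending_exception()) return JNI_ERR; (void)(0

// Stand-in for the eager getCompiler() call; "throws" when asked to fail.
void init_jvmci_compiler(bool fail, TRAPS) {
    if (fail) {
        THREAD->pending_exception = "java.lang.Error: just testing";
    }
}

void compilation_init(bool fail, TRAPS) {
    init_jvmci_compiler(fail, CHECK);  // returns early if an exception is pending
    // ... remaining compiler initialization would follow here ...
}

int create_vm(bool fail, TRAPS) {
    compilation_init(fail, CHECK_JNI_ERR);  // turns a pending exception into JNI_ERR
    return JNI_OK;
}
```

The trailing `(void)(0` in each macro absorbs the call site's closing `)` and `;`, so `f(x, CHECK);` expands to a call followed by an early-return check.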
> >> On 14 Dec 2015, at 22:24, Christian Thalinger wrote: >> >> >>> On Dec 13, 2015, at 11:49 AM, Doug Simon wrote: >>> >>> In blocking compilation mode (i.e., -UseInterpreter), certain compilations are forced to be non-blocking if a JVMCI compiler is being used (i.e., +UseJVMCICompiler). This is to prevent deadlocks that can occur between an application thread and a JVMCI compiler thread. One condition for forcing a compilation to be non-blocking is if the JVMCI compiler is not yet initialized. This is problematic for tests that attempt to force a method to be compiled by JVMCI (e.g., -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -XX:-TieredCompilation -Xcomp -XX:CompileCommand=compileonly,Tester_*::*). If the test is small enough, JVMCI initialization (which is lazy) may still be executing when the test methods are scheduled to be compiled. >>> >>> The solution is to make JVMCI compiler initialization eager in blocking compilation mode. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8145270 >>> http://cr.openjdk.java.net/~dnsimon/8145270/ >> >> Looks good. The only comment I have is at that point in initialization we can handle exceptions already and I think we should.
This gives us even useful stack traces: >> >> Error occurred during initialization of VM >> java.lang.Error: just testing >> at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:181) >> >> Here is the updated patch: >> >> diff -r d84bd22ab531 src/share/vm/compiler/compileBroker.cpp >> --- a/src/share/vm/compiler/compileBroker.cpp Wed Dec 09 14:54:40 2015 +0100 >> +++ b/src/share/vm/compiler/compileBroker.cpp Mon Dec 14 12:20:18 2015 -1000 >> @@ -56,6 +56,7 @@ >> #if INCLUDE_JVMCI >> #include "jvmci/jvmciCompiler.hpp" >> #include "jvmci/jvmciRuntime.hpp" >> +#include "jvmci/jvmciJavaClasses.hpp" >> #include "runtime/vframe.hpp" >> #endif >> #ifdef COMPILER2 >> @@ -498,7 +499,7 @@ CompilerCounters::CompilerCounters() { >> // CompileBroker::compilation_init >> // >> // Initialize the Compilation object >> -void CompileBroker::compilation_init() { >> +void CompileBroker::compilation_init(TRAPS) { >> _last_method_compiled[0] = '\0'; >> >> // No need to initialize compilation system if we do not use it. 
>> @@ -529,6 +530,17 @@ void CompileBroker::compilation_init() { >> } else { >> c1_count = JVMCIHostThreads; >> } >> + >> + if (!UseInterpreter) { >> + // Force initialization of JVMCI compiler otherwise JVMCI >> + // compilations will not block until JVMCI is initialized >> + ResourceMark rm; >> + TempNewSymbol getCompiler = SymbolTable::new_symbol("getCompiler", CHECK); >> + TempNewSymbol sig = SymbolTable::new_symbol("()Ljdk/vm/ci/runtime/JVMCICompiler;", CHECK); >> + Handle jvmciRuntime = JVMCIRuntime::get_HotSpotJVMCIRuntime(CHECK); >> + JavaValue result(T_OBJECT); >> + JavaCalls::call_virtual(&result, jvmciRuntime, HotSpotJVMCIRuntime::klass(), getCompiler, sig, CHECK); >> + } >> } >> } >> #endif // INCLUDE_JVMCI >> diff -r d84bd22ab531 src/share/vm/compiler/compileBroker.hpp >> --- a/src/share/vm/compiler/compileBroker.hpp Wed Dec 09 14:54:40 2015 +0100 >> +++ b/src/share/vm/compiler/compileBroker.hpp Mon Dec 14 12:20:18 2015 -1000 >> @@ -276,7 +276,7 @@ public: >> CompileQueue *q = compile_queue(comp_level); >> return q != NULL ? q->size() : 0; >> } >> - static void compilation_init(); >> + static void compilation_init(TRAPS); >> static void init_compiler_thread_log(); >> static nmethod* compile_method(const methodHandle& method, >> int osr_bci, >> diff -r d84bd22ab531 src/share/vm/runtime/thread.cpp >> --- a/src/share/vm/runtime/thread.cpp Wed Dec 09 14:54:40 2015 +0100 >> +++ b/src/share/vm/runtime/thread.cpp Mon Dec 14 12:20:18 2015 -1000 >> @@ -3628,7 +3628,7 @@ jint Threads::create_vm(JavaVMInitArgs* >> >> // initialize compiler(s) >> #if defined(COMPILER1) || defined(COMPILER2) || defined(SHARK) || INCLUDE_JVMCI >> - CompileBroker::compilation_init(); >> + CompileBroker::compilation_init(CHECK_JNI_ERR); >> #endif >> >> // Pre-initialize some JSR292 core classes to avoid deadlock during class loading. 
>> From doug.simon at oracle.com Mon Dec 14 23:03:02 2015 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 14 Dec 2015 23:03:02 +0000 Subject: RFR: 8145270: Need to eagerly initialize JVMCI compiler under -Xcomp In-Reply-To: <81055BC5-133C-4222-8A34-59E68E2D14CC@oracle.com> References: <6E277F4F-DF34-4915-9641-96F3940BEB90@oracle.com> <3A7730EE-43D5-4827-967C-5F86CECA9795@oracle.com> <81055BC5-133C-4222-8A34-59E68E2D14CC@oracle.com> Message-ID: Excellent - thanks for the review and the improvement! > On 14 Dec 2015, at 23:00, Christian Thalinger wrote: > > Alright, I'll push that. > >> On Dec 14, 2015, at 12:46 PM, Doug Simon wrote: >> >> I was tempted to do that myself but opted for keeping the patch minimal. However, the solution that propagates the exception is obviously better. >> >>> On 14 Dec 2015, at 22:24, Christian Thalinger wrote: >>> >>> >>>> On Dec 13, 2015, at 11:49 AM, Doug Simon wrote: >>>> >>>> In blocking compilation mode (i.e., -UseInterpreter), certain compilations are forced to be non-blocking if a JVMCI compiler is being used (i.e., +UseJVMCICompiler). This is to prevent deadlocks that can occur between an application thread and a JVMCI compiler thread. One condition for forcing a compilation to be non-blocking is if the JVMCI compiler is not yet initialized. This is problematic for tests that attempt to force a method to be compiled by JVMCI (e.g., -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -XX:-TieredCompilation -Xcomp -XX:CompileCommand=compileonly,Tester_*::*). If the test is small enough, JVMCI initialization (which is lazy) may still be executing when the test methods are scheduled to be compiled. >>>> >>>> The solution is to make JVMCI compiler initialization eager in blocking compilation mode. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8145270 >>>> http://cr.openjdk.java.net/~dnsimon/8145270/ >>> >>> Looks good.
The only comment I have is at that point in initialization we can handle exceptions already and I think we should. This gives us even useful stack traces: >>> >>> Error occurred during initialization of VM >>> java.lang.Error: just testing >>> at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:181) >>> >>> Here is the updated patch: >>> >>> diff -r d84bd22ab531 src/share/vm/compiler/compileBroker.cpp >>> --- a/src/share/vm/compiler/compileBroker.cpp Wed Dec 09 14:54:40 2015 +0100 >>> +++ b/src/share/vm/compiler/compileBroker.cpp Mon Dec 14 12:20:18 2015 -1000 >>> @@ -56,6 +56,7 @@ >>> #if INCLUDE_JVMCI >>> #include "jvmci/jvmciCompiler.hpp" >>> #include "jvmci/jvmciRuntime.hpp" >>> +#include "jvmci/jvmciJavaClasses.hpp" >>> #include "runtime/vframe.hpp" >>> #endif >>> #ifdef COMPILER2 >>> @@ -498,7 +499,7 @@ CompilerCounters::CompilerCounters() { >>> // CompileBroker::compilation_init >>> // >>> // Initialize the Compilation object >>> -void CompileBroker::compilation_init() { >>> +void CompileBroker::compilation_init(TRAPS) { >>> _last_method_compiled[0] = '\0'; >>> >>> // No need to initialize compilation system if we do not use it. 
>>> @@ -529,6 +530,17 @@ void CompileBroker::compilation_init() { >>> } else { >>> c1_count = JVMCIHostThreads; >>> } >>> + >>> + if (!UseInterpreter) { >>> + // Force initialization of JVMCI compiler otherwise JVMCI >>> + // compilations will not block until JVMCI is initialized >>> + ResourceMark rm; >>> + TempNewSymbol getCompiler = SymbolTable::new_symbol("getCompiler", CHECK); >>> + TempNewSymbol sig = SymbolTable::new_symbol("()Ljdk/vm/ci/runtime/JVMCICompiler;", CHECK); >>> + Handle jvmciRuntime = JVMCIRuntime::get_HotSpotJVMCIRuntime(CHECK); >>> + JavaValue result(T_OBJECT); >>> + JavaCalls::call_virtual(&result, jvmciRuntime, HotSpotJVMCIRuntime::klass(), getCompiler, sig, CHECK); >>> + } >>> } >>> } >>> #endif // INCLUDE_JVMCI >>> diff -r d84bd22ab531 src/share/vm/compiler/compileBroker.hpp >>> --- a/src/share/vm/compiler/compileBroker.hpp Wed Dec 09 14:54:40 2015 +0100 >>> +++ b/src/share/vm/compiler/compileBroker.hpp Mon Dec 14 12:20:18 2015 -1000 >>> @@ -276,7 +276,7 @@ public: >>> CompileQueue *q = compile_queue(comp_level); >>> return q != NULL ? q->size() : 0; >>> } >>> - static void compilation_init(); >>> + static void compilation_init(TRAPS); >>> static void init_compiler_thread_log(); >>> static nmethod* compile_method(const methodHandle& method, >>> int osr_bci, >>> diff -r d84bd22ab531 src/share/vm/runtime/thread.cpp >>> --- a/src/share/vm/runtime/thread.cpp Wed Dec 09 14:54:40 2015 +0100 >>> +++ b/src/share/vm/runtime/thread.cpp Mon Dec 14 12:20:18 2015 -1000 >>> @@ -3628,7 +3628,7 @@ jint Threads::create_vm(JavaVMInitArgs* >>> >>> // initialize compiler(s) >>> #if defined(COMPILER1) || defined(COMPILER2) || defined(SHARK) || INCLUDE_JVMCI >>> - CompileBroker::compilation_init(); >>> + CompileBroker::compilation_init(CHECK_JNI_ERR); >>> #endif >>> >>> // Pre-initialize some JSR292 core classes to avoid deadlock during class loading. 
>>> > From christian.thalinger at oracle.com Mon Dec 14 23:03:49 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 14 Dec 2015 13:03:49 -1000 Subject: RFR(XS): 8145338: compiler/jsr292/CallSiteDepContextTest.java fails: assert(dep_implicit_context_arg(dept) == 0) failed: sanity In-Reply-To: References: Message-ID: <42E72D3D-C090-436C-BCF3-3E35C0FBDEE4@oracle.com> Looks good. Are you going to integrate this yourself? > On Dec 14, 2015, at 11:14 AM, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8145338/webrev/ > https://bugs.openjdk.java.net/browse/JDK-8145338 > > call_site_target_value dropped its implicit arg in JDK9 but the code brought over from JDK8 still had this assert. It should be dropped. > > tom From christian.thalinger at oracle.com Mon Dec 14 23:08:25 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 14 Dec 2015 13:08:25 -1000 Subject: RFR(XS): 8145338: compiler/jsr292/CallSiteDepContextTest.java fails: assert(dep_implicit_context_arg(dept) == 0) failed: sanity In-Reply-To: <42E72D3D-C090-436C-BCF3-3E35C0FBDEE4@oracle.com> References: <42E72D3D-C090-436C-BCF3-3E35C0FBDEE4@oracle.com> Message-ID: Never mind. I'll submit one JPRT job for both JVMCI changes. Saves some time. > On Dec 14, 2015, at 1:03 PM, Christian Thalinger wrote: > > Looks good. Are you going to integrate this yourself? > >> On Dec 14, 2015, at 11:14 AM, Tom Rodriguez > wrote: >> >> http://cr.openjdk.java.net/~never/8145338/webrev/ >> https://bugs.openjdk.java.net/browse/JDK-8145338 >> >> call_site_target_value dropped its implicit arg in JDK9 but the code brought over from JDK8 still had this assert. It should be dropped. >> >> tom > From christian.thalinger at oracle.com Tue Dec 15 02:32:18 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 14 Dec 2015 16:32:18 -1000 Subject: RFR: 8144856 (XS): Fix assert in CompiledStaticCall::set_to_interpreted In-Reply-To: <566A9946.6010101@oracle.com> References: <566A9946.6010101@oracle.com> Message-ID: <59A392C9-D463-45B6-A274-7C3AA8A7B4FF@oracle.com> AArch64 folk, is the fact that the second assert is different on AArch64 from all the other platforms on purpose? - assert(method_holder->data() == 0 || jump->jump_destination() == entry, + assert(data == 0 || destination == entry, vs. - assert(jump->jump_destination() == (address)-1 || jump->jump_destination() == entry, + assert(destination == (address)-1 || destination == entry, > On Dec 10, 2015, at 11:37 PM, Jamsheed C m wrote: > > Hi All, > > > Summary: The assert code in CompiledStaticCall::set_to_interpreted is fixed. The previous implementation performed multiple reads to check multiple valid states in the same assert, while MT-safe updates can happen from other threads. This could cause bogus failures, because the state could have changed after each read. As a fix, the code now reads the value once. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8144856 > > webrev: http://cr.openjdk.java.net/~thartmann/8144856/webrev.00/ > > Testing: jprt hotspot. > > Thanks and Best Regards, > Jamsheed > > From vladimir.kozlov at oracle.com Tue Dec 15 02:40:02 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Dec 2015 18:40:02 -0800 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: Message-ID: <566F7D82.6030806@oracle.com> Very interesting! Please add a short statement to the comment in /macro.cpp for your case. The changes look fine to me. One nit could be to delay bytecode analysis until macro expansion; it may reduce compilation time. Bytecode analysis of each constructor could be expensive.
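The redundancy argument in this thread has a rough analogue in the C++ memory model. The argument: membar_release at the end of the constructor already orders every earlier store, so the membar_storestore right after the allocation adds nothing while the new object has not escaped. This sketch is an illustration under assumptions, not C2 code. Obj, g_published and both functions are hypothetical, and Java final-field semantics are only approximated here by a release fence followed by a publishing store.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of an allocated object: a "header" word written by the
// allocation path and a field written by the constructor.
struct Obj {
    int header;
    int value;  // plays the role of a Java final field
};

std::atomic<Obj*> g_published{nullptr};

// Allocation + construction + publication with a single release fence.
// No fence sits between the header store and the constructor store: `o` is
// not reachable from any other thread yet, so nothing can observe the gap.
void allocate_construct_publish(int v) {
    Obj* o = new Obj;
    o->header = 1;   // analogous to the allocation's header initialization
    o->value = v;    // analogous to the constructor's final-field store
    // One release fence orders BOTH stores above before the publishing store,
    // so an extra store-store fence after the header store would be redundant.
    std::atomic_thread_fence(std::memory_order_release);
    g_published.store(o, std::memory_order_relaxed);
}

// Reader side: an acquire load that observes the published pointer
// synchronizes with the release fence above.
const Obj* consume() {
    return g_published.load(std::memory_order_acquire);
}
```

If the object could escape between the two stores (for example, if the constructor handed `this` to another thread), the earlier barrier would no longer be redundant. Per the proposal, the BCEscapeAnalyzer check on the initializer's `this` parameter guards exactly that case.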
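The redundancy argument in this thread has a rough analogue in the C++ memory model. The argument: membar_release at the end of the constructor already orders every earlier store, so the membar_storestore right after the allocation adds nothing while the new object has not escaped. This sketch is an illustration under assumptions, not C2 code. Obj, g_published and both functions are hypothetical, and Java final-field semantics are only approximated here by a release fence followed by a publishing store.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of an allocated object: a "header" word written by the
// allocation path and a field written by the constructor.
struct Obj {
    int header;
    int value;  // plays the role of a Java final field
};

std::atomic<Obj*> g_published{nullptr};

// Allocation + construction + publication with a single release fence.
// No fence sits between the header store and the constructor store: `o` is
// not reachable from any other thread yet, so nothing can observe the gap.
void allocate_construct_publish(int v) {
    Obj* o = new Obj;
    o->header = 1;   // analogous to the allocation's header initialization
    o->value = v;    // analogous to the constructor's final-field store
    // One release fence orders BOTH stores above before the publishing store,
    // so an extra store-store fence after the header store would be redundant.
    std::atomic_thread_fence(std::memory_order_release);
    g_published.store(o, std::memory_order_relaxed);
}

// Reader side: an acquire load that observes the published pointer
// synchronizes with the release fence above.
const Obj* consume() {
    return g_published.load(std::memory_order_acquire);
}
```

If the object could escape between the two stores (for example, if the constructor handed `this` to another thread), the earlier barrier would no longer be redundant. Per the proposal, the BCEscapeAnalyzer check on the initializer's `this` parameter guards exactly that case.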
Thanks, Vladimir On 12/10/15 6:48 AM, Hui Shi wrote: > Hi All, > > > Could someone help review this change? > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8144993 > > webrev: http://cr.openjdk.java.net/~hshi/8144993/webrev/ > > > This patch aims to remove a redundant memory barrier after an allocation > node; on AArch64 it removes a redundant dmb when creating an object. The > motivation is that the dmb instruction after commonly used object allocations, > for example strings and boxing objects, is redundant with the dmb inserted for > the final field write. In the following small case: > > String foo(String s) > > { > > String copy = new String(s); > > return copy; > > } > > There are two dmb instructions in the generated code. The first one is > membar_storestore, inserted in PhaseMacroExpand::expand_allocate_common. > The second one is membar_release, inserted at the exit of the initializer method because a > final field write happens. The allocated String doesn't escape in the String > initializer method, and membar_release includes membar_storestore semantics. > So the first one can be removed safely. > > 0x0000007f85bbfa8c: prfm pstl1keep, [x11,#256] > > 0x0000007f85bbfa90: str xzr, [x0,#16] > > 0x0000007f85bbfa94: dmb ishst // first dmb to remove > > .... > > 0x0000007fa01d83c0: ldrsb w10, [x20,#20] > > 0x0000007fa01d83c4: ldr w12, [x20,#16] > > 0x0000007fa01d83c8: ldr x11, [sp,#8] > > 0x0000007fa01d83cc: strb w10, [x11,#20] > > 0x0000007fa01d83d0: str w12, [x11,#16] > > 0x0000007fa01d83d4: dmb ish // second dmb > > > The patch targets this pattern and removes the redundant memory barrier for the > allocation node. > > 1. When inserting a memory barrier for a final field write, if the final fields' > object allocation node is available, invoke > AllocationNode::compute_MemBar_redundancy(initializer method). > > 2.
In AllocationNode: > > 2.1 Add a new field _is_allocation_MemBar_redundant flag indicate > if memory barrier after allocation node is redundant. > > 2.2 Add method compute_MemBar_redundancy, set > _is_allocation_MemBar_redundant true if first parameter "this" does > not escape in initializer method according to BCEscapeAnalyzer. > > 3. skip inserting memory barrier in > PhaseMacroExpand::expand_allocate_common, when AllocationNode's > _is_allocation_MemBar_redundant flag is true. > > > Regards > > Hui > From vladimir.kozlov at oracle.com Tue Dec 15 02:56:55 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Dec 2015 18:56:55 -0800 Subject: RFR(M): 8145322: Code generated from unsafe loops can be slightly improved In-Reply-To: References: Message-ID: <566F8177.8080000@oracle.com> Second assembler output still have intermediate increments and also new movslq instructions. Why it should be better. Thanks, Vladimir On 12/14/15 8:42 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8145322/webrev.00/ > > Paul spotted the following small inefficiencies: > > for (; wi < l; wi++) { > long bi = ((long) Objects.checkIndex(wi, l, null)) << LOG2_ARRAY_LONG_INDEX_SCALE; > long av = U.getLongUnaligned(a, aOffset + bi); > long bv = U.getLongUnaligned(b, bOffset + bi); > if (av != bv) { > > is compiled to: > > 0b0 B9: # B28 B10 <- B8 B13 Loop: B9-B13 inner main of N130 Freq: 977.661 > 0b0 movl RDX, RDI # spill > 0b2 # castII of RDX > 0b2 movq RBX, [R9 + #16 + RDX << #3] # long > 0b7 movq RAX, [RSI + #16 + RDX << #3] # long > 0bc cmpq RBX, RAX > 0bf jne B28 P=0.000000 C=7836.000000 > 0bf > 0c5 B10: # B28 B11 <- B9 Freq: 977.66 > 0c5 movl RDX, RDI # spill > 0c7 incl RDX # int > 0c9 # castII of RDX > 0c9 movq RBX, [R9 + #16 + RDX << #3] # long > 0ce movq RAX, [RSI + #16 + RDX << #3] # long > 0d3 cmpq RBX, RAX > 0d6 jne B28 P=0.000000 C=7836.000000 > 0d6 > 0dc B11: # B28 B12 <- B10 Freq: 977.66 > 0dc movl RDX, RDI # spill > 0de addl RDX,
#2 # int > 0e1 # castII of RDX > 0e1 movq RBX, [R9 + #16 + RDX << #3] # long > 0e6 movq RAX, [RSI + #16 + RDX << #3] # long > 0eb cmpq RBX, RAX > 0ee jne B28 P=0.000000 C=7836.000000 > 0ee > 0f4 B12: # B28 B13 <- B11 Freq: 977.659 > 0f4 movl RDX, RDI # spill > 0f6 addl RDX, #3 # int > 0f9 # castII of RDX > 0f9 movq RBX, [R9 + #16 + RDX << #3] # long > 0fe movq RAX, [RSI + #16 + RDX << #3] # long > 103 cmpq RBX, RAX > 106 jne B28 P=0.000000 C=7836.000000 > 106 > 10c B13: # B9 B14 <- B12 Freq: 977.659 > 10c addl RDI, #4 # int > 10f cmpl RDI, RBP > 111 jl,s B9 # loop end P=0.998980 C=7836.000000 > > But the intermediate increment of the induction variable: > 0c7 incl RDX # int > 0de addl RDX, #2 # int > 0f6 addl RDX, #3 # int > > should be folded in the address computation of the memory accesses: ConvI2L(AddI(x, y)) should be converted to AddL(ConvI2L(x), ConvI2L(y)) but there's a CastII from the checkIndex between the AddI and the ConvI2L so we first need to push the CastII through the AddI. That's the first CastIINode::Ideal transformation. If we apply that transformation we then have several CastII that only differ by their type so we need the second transformation of CastIINode::Ideal so all of them fold after loop opts.
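The rewrite Roland describes depends on type information, and the reason is easy to model. The following is a minimal C++ sketch, not HotSpot code, and every function name in it is hypothetical. It models AddI with Java's 32-bit wraparound and compares ConvI2L(AddI(x, y)) against AddL(ConvI2L(x), ConvI2L(y)). The two forms agree only when the 32-bit add cannot overflow. Bounds such as those carried by the checkIndex CastII, or by the induction variable's type, are what would let C2 prove that.

```cpp
#include <cassert>
#include <cstdint>

// Java-style 32-bit add: wraps on overflow, like C2's AddI.
// Adding as uint32_t avoids signed-overflow UB; the narrowing cast back to
// int32_t is modular on mainstream compilers and guaranteed since C++20.
int32_t add_i(int32_t x, int32_t y) {
    return static_cast<int32_t>(static_cast<uint32_t>(x) + static_cast<uint32_t>(y));
}

// Shape before the rewrite: ConvI2L(AddI(x, y)).
int64_t conv_i2l_of_add_i(int32_t x, int32_t y) {
    return static_cast<int64_t>(add_i(x, y));
}

// Shape after the rewrite: AddL(ConvI2L(x), ConvI2L(y)).
int64_t add_l_of_conv_i2l(int32_t x, int32_t y) {
    return static_cast<int64_t>(x) + static_cast<int64_t>(y);
}

// Hypothetical check: given [lo, hi] ranges for x and y (the kind of bounds a
// CastII or a loop's induction-variable type carries), the rewrite is sound
// only if x + y provably stays within int32.
bool rewrite_is_safe(int64_t x_lo, int64_t x_hi, int64_t y_lo, int64_t y_hi) {
    assert(x_lo <= x_hi && y_lo <= y_hi);  // well-formed ranges
    return x_lo + y_lo >= INT32_MIN && x_hi + y_hi <= INT32_MAX;
}
```

In the unrolled checkIndex loop, the index is known to lie in [0, l) and the per-copy offsets are small constants, so a range check like the one above would succeed. The CastII sitting between the AddI and the ConvI2L is what prevents the rewrite from seeing the add directly.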
> > for (; wi < length >> valuesPerWidth; wi++) { > long bi = ((long) wi) << LOG2_ARRAY_LONG_INDEX_SCALE; > long av = U.getLongUnaligned(a, aOffset + bi); > long bv = U.getLongUnaligned(b, bOffset + bi); > if (av != bv) { > > 0b0 B7: # B32 B8 <- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 > 0b0 movslq R8, RSI # i2l > 0b3 movq RAX, [RDX + #16 + R8 << #3] # long > 0b8 movq RDI, [RBP + #16 + R8 << #3] # long > 0bd cmpq RAX, RDI > 0c0 jne B32 P=0.000000 C=7836.000000 > 0c0 > 0c6 B8: # B33 B9 <- B7 Freq: 975.842 > 0c6 movl R8, RSI # spill > 0c9 incl R8 # int > 0cc movslq RDI, R8 # i2l > 0cf movq RAX, [RDX + #16 + RDI << #3] # long > 0d4 movq RDI, [RBP + #16 + RDI << #3] # long > 0d9 cmpq RAX, RDI > 0dc jne B33 P=0.000000 C=7836.000000 > 0dc > 0e2 B9: # B33 B10 <- B8 Freq: 975.842 > 0e2 movl R8, RSI # spill > 0e5 addl R8, #2 # int > 0e9 movslq RDI, R8 # i2l > 0ec movq RAX, [RDX + #16 + RDI << #3] # long > 0f1 movq RDI, [RBP + #16 + RDI << #3] # long > 0f6 cmpq RAX, RDI > 0f9 jne B33 P=0.000000 C=7836.000000 > 0f9 > 0ff B10: # B33 B11 <- B9 Freq: 975.842 > 0ff movl R8, RSI # spill > 102 addl R8, #3 # int > 106 movslq RDI, R8 # i2l > 109 movq RAX, [RDX + #16 + RDI << #3] # long > 10e movq RDI, [RBP + #16 + RDI << #3] # long > 113 cmpq RAX, RDI > 116 jne B33 P=0.000000 C=7836.000000 > 116 > 11c B11: # B33 B12 <- B10 Freq: 975.841 > 11c movl R8, RSI # spill > 11f addl R8, #4 # int > 123 movslq RDI, R8 # i2l > 126 movq RAX, [RDX + #16 + RDI << #3] # long > 12b movq RDI, [RBP + #16 + RDI << #3] # long > 130 cmpq RAX, RDI > 133 jne B33 P=0.000000 C=7836.000000 > 133 > 139 B12: # B33 B13 <- B11 Freq: 975.841 > 139 movl R8, RSI # spill > 13c addl R8, #5 # int > 140 movslq RDI, R8 # i2l > 143 movq RAX, [RDX + #16 + RDI << #3] # long > 148 movq RDI, [RBP + #16 + RDI << #3] # long > 14d cmpq RAX, RDI > 150 jne B33 P=0.000000 C=7836.000000 > 150 > 156 B13: # B33 B14 <- B12 Freq: 975.84 > 156 movl R8, RSI # spill > 159 addl R8, #6 # int > 15d movslq RDI, R8 # i2l > 160 movq 
RAX, [RDX + #16 + RDI << #3] # long > 165 movq RDI, [RBP + #16 + RDI << #3] # long > 16a cmpq RAX, RDI > 16d jne B33 P=0.000000 C=7836.000000 > 16d > 173 B14: # B33 B15 <- B13 Freq: 975.84 > 173 movl R8, RSI # spill > 176 addl R8, #7 # int > 17a movslq RDI, R8 # i2l > 17d movq RAX, [RDX + #16 + RDI << #3] # long > 182 movq RDI, [RBP + #16 + RDI << #3] # long > 187 cmpq RAX, RDI > 18a jne B33 P=0.000000 C=7836.000000 > 18a > 190 B15: # B7 B16 <- B14 Freq: 975.839 > 190 addl RSI, #8 # int > 193 cmpl RSI, R11 > 196 jl B7 # loop end P=0.998980 C=7836.000000 > > Same as above the intermediate increment of the induction variable should fold into the address computation but ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) is not applied because the compiler loses track of the bounds of the induction variable. The i2l conversions should also fold into the address computations but they don't for the same reason. The change in loopnode.cpp tries to work around the problem by capturing the bounds of the loop as soon the CountedLoop is created and before other transformations applied to the loop makes it much harder for the compiler to figure the bounds out. I also relaxed the Phi type computation in PhiNode::Value(). > > I hit a couple unrelated bugs during testing: the fix in x86_64.ad is obvious. The change to superword is because we sometimes end up there with an AddL while, as I understand, we only expect integer nodes. Using the AddL leads to broken graphs. > > Roland. > From vladimir.kozlov at oracle.com Tue Dec 15 03:10:40 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Dec 2015 19:10:40 -0800 Subject: RFR(S): 8144850: C1: operator delete needs an implementation In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672287230@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672287230@DEWDFEMB19C.global.corp.sap> Message-ID: <566F84B0.5030603@oracle.com> Good. I am pushing it.
Thanks, Vladimir On 12/9/15 7:49 AM, Doerr, Martin wrote: > Hi, > > unfortunately, I didn't test the slow debug build when I overworked > JDK-8138890. > > Product and fastdebug build are working fine. > > However, we need another fix to support the slow debug build with xlC on > AIX. > > A webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8144850_c1_delete/webrev.00/ > > It would be great if somebody could review and sponsor. > > Thanks and best regards, > > Martin > From christian.thalinger at oracle.com Tue Dec 15 03:11:23 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 14 Dec 2015 17:11:23 -1000 Subject: String intrinsics defunct on arch != amd64 after 8141132: JEP 254: Compact Strings In-Reply-To: References: Message-ID: <13BE25EE-796F-4C15-A291-A4FC936FE14F@oracle.com> > On Nov 30, 2015, at 8:54 AM, Volker Simonis wrote: > > Hi, > > I'm currently trying to fix the String intrinsics > (SpecialStringCompareTo, SpecialStringEquals, SpecialStringIndexOf) on > ppc64 after change "8141132: JEP 254: Compact Strings" - at least for > the case where the strings are not compressed (i.e. UTF16 byte > arrays). > > For SpecialStringCompareTo and SpecialStringEquals this was not too > hard, but when I wanted to test the intrinsics for > SpecialStringIndexOf I realized that they never got generated. I think > it is because of this code in library_call.cpp > > bool LibraryCallKit::inline_string_indexOf(StrIntrinsicNode::ArgEnc ae) { > if (!Matcher::has_match_rule(Op_StrIndexOf) || !UseSSE42Intrinsics) { > return false; > } > > If "UseSSE42Intrinsics" is false, which it always is on ppc64, > inline_string_indexOf() will always return false. > > So I wonder how this code is supposed to ever work on non-amd64 architectures? > > Moreover, "UseSSE42Intrinsics" is clearly a architecture-dependant > option. I already wondered that according to vm_version_aarch64.cpp it > seems to exists on aarch64 (is this really true Andrew?).
But it's > surely not available on PowerPC, SPARC, ... > > So I think "UseSSE42Intrinsics" should be at least guarded with > X86_ONLY/AARCH64_ONLY in shared files. Probably even better would be > to move the check right into the Matcher. > > What do you think? That looks like bad design to me. We should find another way to do this. Your suggestion to move it into the matcher sounds reasonable. > > Regards, > Volker From vladimir.kozlov at oracle.com Tue Dec 15 05:37:38 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Dec 2015 21:37:38 -0800 Subject: String intrinsics defunct on arch != amd64 after 8141132: JEP 254: Compact Strings In-Reply-To: References: Message-ID: <566FA722.3070302@oracle.com> Yes, we should not use "platform specific" flags in shared code. I think we should set SpecialString* flags based on presence corresponding ISA in vm_version_.cpp and remove UseSSE42Intrinsics flag and UseSSE checks in shared code and other places related to these intrinsics. Thanks, Vladimir On 11/30/15 10:54 AM, Volker Simonis wrote: > Hi, > > I'm currently trying to fix the String intrinsics > (SpecialStringCompareTo, SpecialStringEquals, SpecialStringIndexOf) on > ppc64 after change "8141132: JEP 254: Compact Strings" - at least for > the case where the strings are not compressed (i.e. UTF16 byte > arrays). > > For SpecialStringCompareTo and SpecialStringEquals this was not too > hard, but when I wanted to test the intrinsics for > SpecialStringIndexOf I realized that they never got generated. I think > it is because of this code in library_call.cpp > > bool LibraryCallKit::inline_string_indexOf(StrIntrinsicNode::ArgEnc ae) { > if (!Matcher::has_match_rule(Op_StrIndexOf) || !UseSSE42Intrinsics) { > return false; > } > > If "UseSSE42Intrinsics" is false, which it always is on ppc64, > inline_string_indexOf() will always return false. > > So I wonder how this code is supposed to ever work on non-amd64 architectures? 
> > Moreover, "UseSSE42Intrinsics" is clearly a architecture-dependant > option. I already wondered that according to vm_version_aarch64.cpp it > seems to exists on aarch64 (is this really true Andrew?). But it's > surely not available on PowerPC, SPARC, ... > > So I think "UseSSE42Intrinsics" should be at least guarded with > X86_ONLY/AARCH64_ONLY in shared files. Probably even better would be > to move the check right into the Matcher. > > What do you think? > > Regards, > Volker > From tobias.hartmann at oracle.com Tue Dec 15 08:35:45 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 15 Dec 2015 09:35:45 +0100 Subject: RFR(M): 8145322: Code generated from unsafe loops can be slightly improved In-Reply-To: References: Message-ID: <566FD0E1.5000105@oracle.com> Hi Roland, unfortunately, your fixes for 8145322 and 8139771 break my prototype fix for JDK-6675699 but it's probably the easiest if I merge after you pushed your patches (I still need to address some performance problems). I recently hit the "incorrect size calculattion" assert in x86_32.ad and saw that you fixed the same problem for x86_64.ad (missing "break"). Could you fix it for 32 as well? There are two locations in "vec_stack_to_stack_helper" that miss a break: --- a/src/cpu/x86/vm/x86_32.ad Fri Dec 11 15:03:11 2015 +0300 +++ b/src/cpu/x86/vm/x86_32.ad Tue Dec 15 09:22:07 2015 +0100 @@ -1005,6 +1005,7 @@ __ vmovdqu(xmm0, Address(rsp, src_offset)); __ vmovdqu(Address(rsp, dst_offset), xmm0); __ vmovdqu(xmm0, Address(rsp, -32)); + break; case Op_VecZ: __ evmovdqul(Address(rsp, -64), xmm0, 2); __ evmovdqul(xmm0, Address(rsp, src_offset), 2); @@ -1045,6 +1046,7 @@ "vmovdqu [rsp + #%d], xmm0\n\t" "vmovdqu xmm0, [rsp - #32]", src_offset, dst_offset); + break; case Op_VecZ: st->print("vmovdqu [rsp - #64], xmm0\t# 512-bit mem-mem spill\n\t" "vmovdqu xmm0, [rsp + #%d]\n\t" Regarding your changes in superword.cpp, shouldn't it be possible for the invar to be of type long? 
I hit similar problems with JDK-6675699 after re-enabling the split-if optimization for ConvI2L and I planned to fix it like this (line 2993-2999): http://cr.openjdk.java.net/~thartmann/6675699/webrev.00/src/share/vm/opto/superword.cpp.sdiff.html (see JDK-8145313) Could you attach the assembly output with your fix? Best, Tobias On 14.12.2015 17:42, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8145322/webrev.00/ > > Paul spotted the following small inefficiencies: > > for (; wi < l; wi++) { > long bi = ((long) Objects.checkIndex(wi, l, null)) << LOG2_ARRAY_LONG_INDEX_SCALE; > long av = U.getLongUnaligned(a, aOffset + bi); > long bv = U.getLongUnaligned(b, bOffset + bi); > if (av != bv) { > > is compiled to: > > 0b0 B9: # B28 B10 <- B8 B13 Loop: B9-B13 inner main of N130 Freq: 977.661 > 0b0 movl RDX, RDI # spill > 0b2 # castII of RDX > 0b2 movq RBX, [R9 + #16 + RDX << #3] # long > 0b7 movq RAX, [RSI + #16 + RDX << #3] # long > 0bc cmpq RBX, RAX > 0bf jne B28 P=0.000000 C=7836.000000 > 0bf > 0c5 B10: # B28 B11 <- B9 Freq: 977.66 > 0c5 movl RDX, RDI # spill > 0c7 incl RDX # int > 0c9 # castII of RDX > 0c9 movq RBX, [R9 + #16 + RDX << #3] # long > 0ce movq RAX, [RSI + #16 + RDX << #3] # long > 0d3 cmpq RBX, RAX > 0d6 jne B28 P=0.000000 C=7836.000000 > 0d6 > 0dc B11: # B28 B12 <- B10 Freq: 977.66 > 0dc movl RDX, RDI # spill > 0de addl RDX, #2 # int > 0e1 # castII of RDX > 0e1 movq RBX, [R9 + #16 + RDX << #3] # long > 0e6 movq RAX, [RSI + #16 + RDX << #3] # long > 0eb cmpq RBX, RAX > 0ee jne B28 P=0.000000 C=7836.000000 > 0ee > 0f4 B12: # B28 B13 <- B11 Freq: 977.659 > 0f4 movl RDX, RDI # spill > 0f6 addl RDX, #3 # int > 0f9 # castII of RDX > 0f9 movq RBX, [R9 + #16 + RDX << #3] # long > 0fe movq RAX, [RSI + #16 + RDX << #3] # long > 103 cmpq RBX, RAX > 106 jne B28 P=0.000000 C=7836.000000 > 106 > 10c B13: # B9 B14 <- B12 Freq: 977.659 > 10c addl RDI, #4 # int > 10f cmpl RDI, RBP > 111 jl,s B9 # loop end P=0.998980 C=7836.000000 > > But the intermediate 
increment of the induction variable: > 0c7 incl RDX # int > 0de addl RDX, #2 # int > 0f6 addl RDX, #3 # int > > should be folded in the address computation of the memory accesses: ConvI2L(AddI(x, y)) should be converted to AddL(ConvI2L(x), ConvI2L(y)) but there?s a CastII from the checkIndex between the AddI and the ConvI2L so we first need to push the CastII through the AddI. That?s the first CastIINode::Ideal transformation. If we apply that transformation we then have several CastII that only differ by their type so we need the second transformation of CastIINode::Ideal so all of them fold after loop opts. > > for (; wi < length >> valuesPerWidth; wi++) { > long bi = ((long) wi) << LOG2_ARRAY_LONG_INDEX_SCALE; > long av = U.getLongUnaligned(a, aOffset + bi); > long bv = U.getLongUnaligned(b, bOffset + bi); > if (av != bv) { > > 0b0 B7: # B32 B8 <- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 > 0b0 movslq R8, RSI # i2l > 0b3 movq RAX, [RDX + #16 + R8 << #3] # long > 0b8 movq RDI, [RBP + #16 + R8 << #3] # long > 0bd cmpq RAX, RDI > 0c0 jne B32 P=0.000000 C=7836.000000 > 0c0 > 0c6 B8: # B33 B9 <- B7 Freq: 975.842 > 0c6 movl R8, RSI # spill > 0c9 incl R8 # int > 0cc movslq RDI, R8 # i2l > 0cf movq RAX, [RDX + #16 + RDI << #3] # long > 0d4 movq RDI, [RBP + #16 + RDI << #3] # long > 0d9 cmpq RAX, RDI > 0dc jne B33 P=0.000000 C=7836.000000 > 0dc > 0e2 B9: # B33 B10 <- B8 Freq: 975.842 > 0e2 movl R8, RSI # spill > 0e5 addl R8, #2 # int > 0e9 movslq RDI, R8 # i2l > 0ec movq RAX, [RDX + #16 + RDI << #3] # long > 0f1 movq RDI, [RBP + #16 + RDI << #3] # long > 0f6 cmpq RAX, RDI > 0f9 jne B33 P=0.000000 C=7836.000000 > 0f9 > 0ff B10: # B33 B11 <- B9 Freq: 975.842 > 0ff movl R8, RSI # spill > 102 addl R8, #3 # int > 106 movslq RDI, R8 # i2l > 109 movq RAX, [RDX + #16 + RDI << #3] # long > 10e movq RDI, [RBP + #16 + RDI << #3] # long > 113 cmpq RAX, RDI > 116 jne B33 P=0.000000 C=7836.000000 > 116 > 11c B11: # B33 B12 <- B10 Freq: 975.841 > 11c movl R8, RSI # spill > 
11f addl R8, #4 # int > 123 movslq RDI, R8 # i2l > 126 movq RAX, [RDX + #16 + RDI << #3] # long > 12b movq RDI, [RBP + #16 + RDI << #3] # long > 130 cmpq RAX, RDI > 133 jne B33 P=0.000000 C=7836.000000 > 133 > 139 B12: # B33 B13 <- B11 Freq: 975.841 > 139 movl R8, RSI # spill > 13c addl R8, #5 # int > 140 movslq RDI, R8 # i2l > 143 movq RAX, [RDX + #16 + RDI << #3] # long > 148 movq RDI, [RBP + #16 + RDI << #3] # long > 14d cmpq RAX, RDI > 150 jne B33 P=0.000000 C=7836.000000 > 150 > 156 B13: # B33 B14 <- B12 Freq: 975.84 > 156 movl R8, RSI # spill > 159 addl R8, #6 # int > 15d movslq RDI, R8 # i2l > 160 movq RAX, [RDX + #16 + RDI << #3] # long > 165 movq RDI, [RBP + #16 + RDI << #3] # long > 16a cmpq RAX, RDI > 16d jne B33 P=0.000000 C=7836.000000 > 16d > 173 B14: # B33 B15 <- B13 Freq: 975.84 > 173 movl R8, RSI # spill > 176 addl R8, #7 # int > 17a movslq RDI, R8 # i2l > 17d movq RAX, [RDX + #16 + RDI << #3] # long > 182 movq RDI, [RBP + #16 + RDI << #3] # long > 187 cmpq RAX, RDI > 18a jne B33 P=0.000000 C=7836.000000 > 18a > 190 B15: # B7 B16 <- B14 Freq: 975.839 > 190 addl RSI, #8 # int > 193 cmpl RSI, R11 > 196 jl B7 # loop end P=0.998980 C=7836.000000 > > Same as above the intermediate increment of the induction variable should fold into the address computation but ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) is not applied because the compiler loses track of the bounds of the induction variable. The i2l conversions should also fold into the address computations but they don?t for the same reason. The change in loopnode.cpp tries to work around the problem by capturing the bounds of the loop as soon the CountedLoop is created and before other transformations applied to the loop makes it much harder for the compiler to figure the bounds out. I also relaxed the Phi type computation in PhiNode::Value(). > > I hit a couple unrelated bugs during testing: the fix in x86_64.ad is obvious. 
The change to superword is because we sometimes end up there with an AddL while, as I understand, we only expect integer nodes. Using the AddL leads to broken graphs. > > Roland. > From roland.westrelin at oracle.com Tue Dec 15 08:55:08 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 15 Dec 2015 09:55:08 +0100 Subject: RFR(M): 8145322: Code generated from unsafe loops can be slightly improved In-Reply-To: <566F8177.8080000@oracle.com> References: <566F8177.8080000@oracle.com> Message-ID: <6BBA85D7-71DE-43AD-9DA9-CA97FF99F73D@oracle.com> Hi Vladimir, Thanks for looking at this. > Second assembler output still have intermediate increments and also new movslq instructions. Why it should be better. I think there is some confusion here. There are 2 problems I'd like to fix. One is when using checkIndex. In that case, the code should be as good as regular array accesses. The first assembly dump shows it's not. The second problem is when we don't use checkIndex but know the loop bounds: we should be able to do better. That's the second assembly dump. In my email I only showed assembly without my change.
With my change: first test case: 0c2 B11: # B37 B12 <- B8 B10 Loop: B11-B10 inner main of N142 Freq: 975.841 0c2 movq RAX, [RSI + #16 + RDI << #3] # long 0c7 movq RBX, [R9 + #16 + RDI << #3] # long 0cc cmpq RBX, RAX 0cf jne B37 P=0.000000 C=7836.000000 0cf 0d5 B12: # B38 B13 <- B11 Freq: 975.84 0d5 movq RAX, [RSI + #24 + RDI << #3] # long 0da movq RBX, [R9 + #24 + RDI << #3] # long 0df cmpq RBX, RAX 0e2 jne B38 P=0.000000 C=7836.000000 0e2 0e8 B13: # B40 B14 <- B12 Freq: 975.84 0e8 movq RAX, [RSI + #32 + RDI << #3] # long 0ed movq RBX, [R9 + #32 + RDI << #3] # long 0f2 cmpq RBX, RAX 0f5 jne B40 P=0.000000 C=7836.000000 0f5 0fb B14: # B42 B15 <- B13 Freq: 975.84 0fb movq RAX, [RSI + #40 + RDI << #3] # long 100 movq RBX, [R9 + #40 + RDI << #3] # long 105 cmpq RBX, RAX 108 jne B42 P=0.000000 C=7836.000000 108 10e B15: # B44 B16 <- B14 Freq: 975.839 10e movq RAX, [RSI + #48 + RDI << #3] # long 113 movq RBX, [R9 + #48 + RDI << #3] # long 118 movl RDX, RDI # spill 11a addl RDX, #4 # int 11d cmpq RBX, RAX 120 jne B44 P=0.000000 C=7836.000000 120 126 B16: # B39 B17 <- B15 Freq: 975.839 126 movq RAX, [RSI + #56 + RDI << #3] # long 12b movq RBX, [R9 + #56 + RDI << #3] # long 130 cmpq RBX, RAX 133 jne B39 P=0.000000 C=7836.000000 133 139 B17: # B41 B18 <- B16 Freq: 975.838 139 movq RAX, [RSI + #64 + RDI << #3] # long 13e movq RBX, [R9 + #64 + RDI << #3] # long 143 cmpq RBX, RAX 146 jne B41 P=0.000000 C=7836.000000 146 14c B18: # B43 B19 <- B17 Freq: 975.838 14c movq RAX, [RSI + #72 + RDI << #3] # long 151 movq RBX, [R9 + #72 + RDI << #3] # long 156 cmpq RBX, RAX 159 jne B43 P=0.000000 C=7836.000000 159 15f B19: # B10 B20 <- B18 Freq: 975.837 15f movl RDX, RDI # spill 161 addl RDX, #8 # int 164 cmpl RDX, RBP 166 jl B10 # loop end P=0.998980 C=7836.000000 second test case: 0a3 B7: # B32 B8 <- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 0a3 movq RDI, [RBP + #16 + RSI << #3] # long 0a8 movq RAX, [RDX + #16 + RSI << #3] # long 0ad cmpq RAX, RDI 0b0 jne B32 P=0.000000 
C=7836.000000 0b0 0b6 B8: # B33 B9 <- B7 Freq: 975.842 0b6 movq RDI, [RBP + #24 + RSI << #3] # long 0bb movq RAX, [RDX + #24 + RSI << #3] # long 0c0 cmpq RAX, RDI 0c3 jne B33 P=0.000000 C=7836.000000 0c3 0c9 B9: # B35 B10 <- B8 Freq: 975.842 0c9 movq RDI, [RBP + #32 + RSI << #3] # long 0ce movq RAX, [RDX + #32 + RSI << #3] # long 0d3 cmpq RAX, RDI 0d6 jne B35 P=0.000000 C=7836.000000 0d6 0dc B10: # B39 B11 <- B9 Freq: 975.842 0dc movq RDI, [RBP + #40 + RSI << #3] # long 0e1 movq RAX, [RDX + #40 + RSI << #3] # long 0e6 cmpq RAX, RDI 0e9 jne B39 P=0.000000 C=7836.000000 0e9 0ef B11: # B38 B12 <- B10 Freq: 975.841 0ef movq RDI, [RBP + #48 + RSI << #3] # long 0f4 movq RAX, [RDX + #48 + RSI << #3] # long 0f9 movl R8, RSI # spill 0fc addl R8, #4 # int 100 cmpq RAX, RDI 103 jne B38 P=0.000000 C=7836.000000 103 109 B12: # B34 B13 <- B11 Freq: 975.841 109 movq RDI, [RBP + #56 + RSI << #3] # long 10e movq RAX, [RDX + #56 + RSI << #3] # long 113 cmpq RAX, RDI 116 jne B34 P=0.000000 C=7836.000000 116 11c B13: # B36 B14 <- B12 Freq: 975.84 11c movq RDI, [RBP + #64 + RSI << #3] # long 121 movq RAX, [RDX + #64 + RSI << #3] # long 126 cmpq RAX, RDI 129 jne B36 P=0.000000 C=7836.000000 129 12f B14: # B38 B15 <- B13 Freq: 975.84 12f movq RDI, [RBP + #72 + RSI << #3] # long 134 movq RAX, [RDX + #72 + RSI << #3] # long 139 movl R8, RSI # spill 13c addl R8, #7 # int 140 cmpq RAX, RDI 143 jne B38 P=0.000000 C=7836.000000 143 149 B15: # B7 B16 <- B14 Freq: 975.839 149 addl RSI, #8 # int 14c cmpl RSI, R11 14f jl B7 # loop end P=0.998980 C=7836.000000 Roland. 
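The dumps with the change show the folding. All eight unrolled copies reuse the same scaled index register and differ only in the displacement, which runs #16, #24, ..., #72. The arithmetic behind this is sketched below in C++. It assumes a base offset of 16 and a scale of 8 (the "#16" and "<< #3" in the dumps), and the function names are hypothetical. Moving the constant k into the displacement is valid only when i + k cannot overflow, which is why the loop bounds matter.

```cpp
#include <cassert>
#include <cstdint>

// Assumed constants read off the dumps: "#16" as the long[] base offset and
// "<< #3" (LOG2_ARRAY_LONG_INDEX_SCALE) as the 8-byte element scale.
constexpr int64_t kBaseOffset = 16;
constexpr int kLog2Scale = 3;

// Before folding: each unrolled copy k computes i + k as an int, widens it,
// then scales it (the extra addl, and movslq in the second test case).
int64_t address_unfolded(int64_t base, int32_t i, int32_t k) {
    // What the compiler must prove from the loop bounds: no int overflow.
    assert(static_cast<int64_t>(i) + k <= INT32_MAX);
    int64_t idx = static_cast<int64_t>(i + k);
    return base + kBaseOffset + (idx << kLog2Scale);
}

// After folding: the constant k becomes part of the displacement.
int64_t displacement(int32_t k) {
    return kBaseOffset + (static_cast<int64_t>(k) << kLog2Scale);
}

int64_t address_folded(int64_t base, int32_t i, int32_t k) {
    return base + displacement(k) + (static_cast<int64_t>(i) << kLog2Scale);
}
```

For k = 0..7, `displacement(k)` yields 16, 24, ..., 72, which matches the eight loads per array in each unrolled iteration above.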
> > Thanks, > Vladimir > > On 12/14/15 8:42 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8145322/webrev.00/ >> >> Paul spotted the following small inefficiencies: >> >> for (; wi < l; wi++) { >> long bi = ((long) Objects.checkIndex(wi, l, null)) << LOG2_ARRAY_LONG_INDEX_SCALE; >> long av = U.getLongUnaligned(a, aOffset + bi); >> long bv = U.getLongUnaligned(b, bOffset + bi); >> if (av != bv) { >> >> is compiled to: >> >> 0b0 B9: # B28 B10 <- B8 B13 Loop: B9-B13 inner main of N130 Freq: 977.661 >> 0b0 movl RDX, RDI # spill >> 0b2 # castII of RDX >> 0b2 movq RBX, [R9 + #16 + RDX << #3] # long >> 0b7 movq RAX, [RSI + #16 + RDX << #3] # long >> 0bc cmpq RBX, RAX >> 0bf jne B28 P=0.000000 C=7836.000000 >> 0bf >> 0c5 B10: # B28 B11 <- B9 Freq: 977.66 >> 0c5 movl RDX, RDI # spill >> 0c7 incl RDX # int >> 0c9 # castII of RDX >> 0c9 movq RBX, [R9 + #16 + RDX << #3] # long >> 0ce movq RAX, [RSI + #16 + RDX << #3] # long >> 0d3 cmpq RBX, RAX >> 0d6 jne B28 P=0.000000 C=7836.000000 >> 0d6 >> 0dc B11: # B28 B12 <- B10 Freq: 977.66 >> 0dc movl RDX, RDI # spill >> 0de addl RDX, #2 # int >> 0e1 # castII of RDX >> 0e1 movq RBX, [R9 + #16 + RDX << #3] # long >> 0e6 movq RAX, [RSI + #16 + RDX << #3] # long >> 0eb cmpq RBX, RAX >> 0ee jne B28 P=0.000000 C=7836.000000 >> 0ee >> 0f4 B12: # B28 B13 <- B11 Freq: 977.659 >> 0f4 movl RDX, RDI # spill >> 0f6 addl RDX, #3 # int >> 0f9 # castII of RDX >> 0f9 movq RBX, [R9 + #16 + RDX << #3] # long >> 0fe movq RAX, [RSI + #16 + RDX << #3] # long >> 103 cmpq RBX, RAX >> 106 jne B28 P=0.000000 C=7836.000000 >> 106 >> 10c B13: # B9 B14 <- B12 Freq: 977.659 >> 10c addl RDI, #4 # int >> 10f cmpl RDI, RBP >> 111 jl,s B9 # loop end P=0.998980 C=7836.000000 >> >> But the intermediate increment of the induction variable: >> 0c7 incl RDX # int >> 0de addl RDX, #2 # int >> 0f6 addl RDX, #3 # int >> >> should be folded in the address computation of the memory accesses: ConvI2L(AddI(x, y)) should be converted to AddL(ConvI2L(x), 
ConvI2L(y)) but there?s a CastII from the checkIndex between the AddI and the ConvI2L so we first need to push the CastII through the AddI. That?s the first CastIINode::Ideal transformation. If we apply that transformation we then have several CastII that only differ by their type so we need the second transformation of CastIINode::Ideal so all of them fold after loop opts. >> >> for (; wi < length >> valuesPerWidth; wi++) { >> long bi = ((long) wi) << LOG2_ARRAY_LONG_INDEX_SCALE; >> long av = U.getLongUnaligned(a, aOffset + bi); >> long bv = U.getLongUnaligned(b, bOffset + bi); >> if (av != bv) { >> >> 0b0 B7: # B32 B8 <- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 >> 0b0 movslq R8, RSI # i2l >> 0b3 movq RAX, [RDX + #16 + R8 << #3] # long >> 0b8 movq RDI, [RBP + #16 + R8 << #3] # long >> 0bd cmpq RAX, RDI >> 0c0 jne B32 P=0.000000 C=7836.000000 >> 0c0 >> 0c6 B8: # B33 B9 <- B7 Freq: 975.842 >> 0c6 movl R8, RSI # spill >> 0c9 incl R8 # int >> 0cc movslq RDI, R8 # i2l >> 0cf movq RAX, [RDX + #16 + RDI << #3] # long >> 0d4 movq RDI, [RBP + #16 + RDI << #3] # long >> 0d9 cmpq RAX, RDI >> 0dc jne B33 P=0.000000 C=7836.000000 >> 0dc >> 0e2 B9: # B33 B10 <- B8 Freq: 975.842 >> 0e2 movl R8, RSI # spill >> 0e5 addl R8, #2 # int >> 0e9 movslq RDI, R8 # i2l >> 0ec movq RAX, [RDX + #16 + RDI << #3] # long >> 0f1 movq RDI, [RBP + #16 + RDI << #3] # long >> 0f6 cmpq RAX, RDI >> 0f9 jne B33 P=0.000000 C=7836.000000 >> 0f9 >> 0ff B10: # B33 B11 <- B9 Freq: 975.842 >> 0ff movl R8, RSI # spill >> 102 addl R8, #3 # int >> 106 movslq RDI, R8 # i2l >> 109 movq RAX, [RDX + #16 + RDI << #3] # long >> 10e movq RDI, [RBP + #16 + RDI << #3] # long >> 113 cmpq RAX, RDI >> 116 jne B33 P=0.000000 C=7836.000000 >> 116 >> 11c B11: # B33 B12 <- B10 Freq: 975.841 >> 11c movl R8, RSI # spill >> 11f addl R8, #4 # int >> 123 movslq RDI, R8 # i2l >> 126 movq RAX, [RDX + #16 + RDI << #3] # long >> 12b movq RDI, [RBP + #16 + RDI << #3] # long >> 130 cmpq RAX, RDI >> 133 jne B33 P=0.000000 
C=7836.000000 >> 133 >> 139 B12: # B33 B13 <- B11 Freq: 975.841 >> 139 movl R8, RSI # spill >> 13c addl R8, #5 # int >> 140 movslq RDI, R8 # i2l >> 143 movq RAX, [RDX + #16 + RDI << #3] # long >> 148 movq RDI, [RBP + #16 + RDI << #3] # long >> 14d cmpq RAX, RDI >> 150 jne B33 P=0.000000 C=7836.000000 >> 150 >> 156 B13: # B33 B14 <- B12 Freq: 975.84 >> 156 movl R8, RSI # spill >> 159 addl R8, #6 # int >> 15d movslq RDI, R8 # i2l >> 160 movq RAX, [RDX + #16 + RDI << #3] # long >> 165 movq RDI, [RBP + #16 + RDI << #3] # long >> 16a cmpq RAX, RDI >> 16d jne B33 P=0.000000 C=7836.000000 >> 16d >> 173 B14: # B33 B15 <- B13 Freq: 975.84 >> 173 movl R8, RSI # spill >> 176 addl R8, #7 # int >> 17a movslq RDI, R8 # i2l >> 17d movq RAX, [RDX + #16 + RDI << #3] # long >> 182 movq RDI, [RBP + #16 + RDI << #3] # long >> 187 cmpq RAX, RDI >> 18a jne B33 P=0.000000 C=7836.000000 >> 18a >> 190 B15: # B7 B16 <- B14 Freq: 975.839 >> 190 addl RSI, #8 # int >> 193 cmpl RSI, R11 >> 196 jl B7 # loop end P=0.998980 C=7836.000000 >> >> Same as above the intermediate increment of the induction variable should fold into the address computation but ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) is not applied because the compiler loses track of the bounds of the induction variable. The i2l conversions should also fold into the address computations but they don?t for the same reason. The change in loopnode.cpp tries to work around the problem by capturing the bounds of the loop as soon the CountedLoop is created and before other transformations applied to the loop makes it much harder for the compiler to figure the bounds out. I also relaxed the Phi type computation in PhiNode::Value(). >> >> I hit a couple unrelated bugs during testing: the fix in x86_64.ad is obvious. The change to superword is because we sometimes end up there with an AddL while, as I understand, we only expect integer nodes. Using the AddL leads to broken graphs. >> >> Roland. 
>> From roland.westrelin at oracle.com Tue Dec 15 08:58:30 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 15 Dec 2015 09:58:30 +0100 Subject: RFR(M): 8145322: Code generated from unsafe loops can be slightly improved In-Reply-To: <566FD0E1.5000105@oracle.com> References: <566FD0E1.5000105@oracle.com> Message-ID: <42C680DD-DC76-446B-9618-BB3A7AA8FC85@oracle.com> Hi Tobias, Thanks for looking at this. > unfortunately, your fixes for 8145322 and 8139771 break my prototype fix for JDK-6675699 but it's probably the easiest if I merge after you pushed your patches (I still need to address some performance problems). > > I recently hit the "incorrect size calculattion" assert in x86_32.ad and saw that you fixed the same problem for x86_64.ad (missing "break"). Could you fix it for 32 as well? There are two locations in "vec_stack_to_stack_helper" that miss a break: > > --- a/src/cpu/x86/vm/x86_32.ad Fri Dec 11 15:03:11 2015 +0300 > +++ b/src/cpu/x86/vm/x86_32.ad Tue Dec 15 09:22:07 2015 +0100 > @@ -1005,6 +1005,7 @@ > __ vmovdqu(xmm0, Address(rsp, src_offset)); > __ vmovdqu(Address(rsp, dst_offset), xmm0); > __ vmovdqu(xmm0, Address(rsp, -32)); > + break; > case Op_VecZ: > __ evmovdqul(Address(rsp, -64), xmm0, 2); > __ evmovdqul(xmm0, Address(rsp, src_offset), 2); > @@ -1045,6 +1046,7 @@ > "vmovdqu [rsp + #%d], xmm0\n\t" > "vmovdqu xmm0, [rsp - #32]", > src_offset, dst_offset); > + break; > case Op_VecZ: > st->print("vmovdqu [rsp - #64], xmm0\t# 512-bit mem-mem spill\n\t" > "vmovdqu xmm0, [rsp + #%d]\n\t" Sure. I didn't notice the same problem existed on 32 bits.
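A missing `break` can trip a size assert. The mechanism is shown by the simplified C++ model below, which is not the .ad code. The enum names are hypothetical stand-ins for Op_VecY and Op_VecZ. The model records one mnemonic per instruction instead of encoding bytes. The four-instruction VecZ sequence is an assumption, because the diff above shows only its first two lines. Without the `break`, the VecY path falls into the VecZ case and emits more instructions than a size computation for VecY alone would expect.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for Op_VecY / Op_VecZ in vec_stack_to_stack_helper.
enum class VecKind { VecY, VecZ };

// Simplified emitter model: one mnemonic per instruction, no real encoding.
// 'with_break' toggles whether the VecY case ends with a break.
std::vector<std::string> emit_mem_mem_spill(VecKind kind, bool with_break) {
    std::vector<std::string> out;
    switch (kind) {
    case VecKind::VecY:            // 256-bit spill through xmm0
        out.push_back("vmovdqu");  // save xmm0 below rsp
        out.push_back("vmovdqu");  // load source slot
        out.push_back("vmovdqu");  // store destination slot
        out.push_back("vmovdqu");  // restore xmm0
        if (with_break) {
            break;
        }
        // No break when with_break is false: control falls into VecZ (the bug).
    case VecKind::VecZ:            // 512-bit spill (assumed four instructions)
        out.push_back("evmovdqul");
        out.push_back("evmovdqul");
        out.push_back("evmovdqul");
        out.push_back("evmovdqul");
        break;
    }
    assert(!out.empty());
    return out;
}
```

With the fallthrough, the VecY spill produces eight instructions instead of four. This is the kind of mismatch between emitted and computed size that an "incorrect size" check would catch.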
I hit similar problems with JDK-6675699 after re-enabling the split-if optimization for ConvI2L and I planned to fix it like this (line 2993-2999): > http://cr.openjdk.java.net/~thartmann/6675699/webrev.00/src/share/vm/opto/superword.cpp.sdiff.html > (see JDK-8145313) But then you have to guarantee the long value fits in an integer otherwise a ConvL2I would not be correct here, right? Ending with a long invar is very uncommon in my testing. > Could you attach the assembly output with your fix? See my reply to Vladimir. Roland. > > Best, > Tobias > > On 14.12.2015 17:42, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8145322/webrev.00/ >> >> Paul spotted the following small inefficiencies: >> >> for (; wi < l; wi++) { >> long bi = ((long) Objects.checkIndex(wi, l, null)) << LOG2_ARRAY_LONG_INDEX_SCALE; >> long av = U.getLongUnaligned(a, aOffset + bi); >> long bv = U.getLongUnaligned(b, bOffset + bi); >> if (av != bv) { >> >> is compiled to: >> >> 0b0 B9: # B28 B10 <- B8 B13 Loop: B9-B13 inner main of N130 Freq: 977.661 >> 0b0 movl RDX, RDI # spill >> 0b2 # castII of RDX >> 0b2 movq RBX, [R9 + #16 + RDX << #3] # long >> 0b7 movq RAX, [RSI + #16 + RDX << #3] # long >> 0bc cmpq RBX, RAX >> 0bf jne B28 P=0.000000 C=7836.000000 >> 0bf >> 0c5 B10: # B28 B11 <- B9 Freq: 977.66 >> 0c5 movl RDX, RDI # spill >> 0c7 incl RDX # int >> 0c9 # castII of RDX >> 0c9 movq RBX, [R9 + #16 + RDX << #3] # long >> 0ce movq RAX, [RSI + #16 + RDX << #3] # long >> 0d3 cmpq RBX, RAX >> 0d6 jne B28 P=0.000000 C=7836.000000 >> 0d6 >> 0dc B11: # B28 B12 <- B10 Freq: 977.66 >> 0dc movl RDX, RDI # spill >> 0de addl RDX, #2 # int >> 0e1 # castII of RDX >> 0e1 movq RBX, [R9 + #16 + RDX << #3] # long >> 0e6 movq RAX, [RSI + #16 + RDX << #3] # long >> 0eb cmpq RBX, RAX >> 0ee jne B28 P=0.000000 C=7836.000000 >> 0ee >> 0f4 B12: # B28 B13 <- B11 Freq: 977.659 >> 0f4 movl RDX, RDI # spill >> 0f6 addl RDX, #3 # int >> 0f9 # castII of RDX >> 0f9 movq RBX, [R9 + #16 + RDX << #3] # long >> 0fe 
movq RAX, [RSI + #16 + RDX << #3] # long >> 103 cmpq RBX, RAX >> 106 jne B28 P=0.000000 C=7836.000000 >> 106 >> 10c B13: # B9 B14 <- B12 Freq: 977.659 >> 10c addl RDI, #4 # int >> 10f cmpl RDI, RBP >> 111 jl,s B9 # loop end P=0.998980 C=7836.000000 >> >> But the intermediate increment of the induction variable: >> 0c7 incl RDX # int >> 0de addl RDX, #2 # int >> 0f6 addl RDX, #3 # int >> >> should be folded in the address computation of the memory accesses: ConvI2L(AddI(x, y)) should be converted to AddL(ConvI2L(x), ConvI2L(y)) but there?s a CastII from the checkIndex between the AddI and the ConvI2L so we first need to push the CastII through the AddI. That?s the first CastIINode::Ideal transformation. If we apply that transformation we then have several CastII that only differ by their type so we need the second transformation of CastIINode::Ideal so all of them fold after loop opts. >> >> for (; wi < length >> valuesPerWidth; wi++) { >> long bi = ((long) wi) << LOG2_ARRAY_LONG_INDEX_SCALE; >> long av = U.getLongUnaligned(a, aOffset + bi); >> long bv = U.getLongUnaligned(b, bOffset + bi); >> if (av != bv) { >> >> 0b0 B7: # B32 B8 <- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 >> 0b0 movslq R8, RSI # i2l >> 0b3 movq RAX, [RDX + #16 + R8 << #3] # long >> 0b8 movq RDI, [RBP + #16 + R8 << #3] # long >> 0bd cmpq RAX, RDI >> 0c0 jne B32 P=0.000000 C=7836.000000 >> 0c0 >> 0c6 B8: # B33 B9 <- B7 Freq: 975.842 >> 0c6 movl R8, RSI # spill >> 0c9 incl R8 # int >> 0cc movslq RDI, R8 # i2l >> 0cf movq RAX, [RDX + #16 + RDI << #3] # long >> 0d4 movq RDI, [RBP + #16 + RDI << #3] # long >> 0d9 cmpq RAX, RDI >> 0dc jne B33 P=0.000000 C=7836.000000 >> 0dc >> 0e2 B9: # B33 B10 <- B8 Freq: 975.842 >> 0e2 movl R8, RSI # spill >> 0e5 addl R8, #2 # int >> 0e9 movslq RDI, R8 # i2l >> 0ec movq RAX, [RDX + #16 + RDI << #3] # long >> 0f1 movq RDI, [RBP + #16 + RDI << #3] # long >> 0f6 cmpq RAX, RDI >> 0f9 jne B33 P=0.000000 C=7836.000000 >> 0f9 >> 0ff B10: # B33 B11 <- B9 Freq: 
975.842 >> 0ff movl R8, RSI # spill >> 102 addl R8, #3 # int >> 106 movslq RDI, R8 # i2l >> 109 movq RAX, [RDX + #16 + RDI << #3] # long >> 10e movq RDI, [RBP + #16 + RDI << #3] # long >> 113 cmpq RAX, RDI >> 116 jne B33 P=0.000000 C=7836.000000 >> 116 >> 11c B11: # B33 B12 <- B10 Freq: 975.841 >> 11c movl R8, RSI # spill >> 11f addl R8, #4 # int >> 123 movslq RDI, R8 # i2l >> 126 movq RAX, [RDX + #16 + RDI << #3] # long >> 12b movq RDI, [RBP + #16 + RDI << #3] # long >> 130 cmpq RAX, RDI >> 133 jne B33 P=0.000000 C=7836.000000 >> 133 >> 139 B12: # B33 B13 <- B11 Freq: 975.841 >> 139 movl R8, RSI # spill >> 13c addl R8, #5 # int >> 140 movslq RDI, R8 # i2l >> 143 movq RAX, [RDX + #16 + RDI << #3] # long >> 148 movq RDI, [RBP + #16 + RDI << #3] # long >> 14d cmpq RAX, RDI >> 150 jne B33 P=0.000000 C=7836.000000 >> 150 >> 156 B13: # B33 B14 <- B12 Freq: 975.84 >> 156 movl R8, RSI # spill >> 159 addl R8, #6 # int >> 15d movslq RDI, R8 # i2l >> 160 movq RAX, [RDX + #16 + RDI << #3] # long >> 165 movq RDI, [RBP + #16 + RDI << #3] # long >> 16a cmpq RAX, RDI >> 16d jne B33 P=0.000000 C=7836.000000 >> 16d >> 173 B14: # B33 B15 <- B13 Freq: 975.84 >> 173 movl R8, RSI # spill >> 176 addl R8, #7 # int >> 17a movslq RDI, R8 # i2l >> 17d movq RAX, [RDX + #16 + RDI << #3] # long >> 182 movq RDI, [RBP + #16 + RDI << #3] # long >> 187 cmpq RAX, RDI >> 18a jne B33 P=0.000000 C=7836.000000 >> 18a >> 190 B15: # B7 B16 <- B14 Freq: 975.839 >> 190 addl RSI, #8 # int >> 193 cmpl RSI, R11 >> 196 jl B7 # loop end P=0.998980 C=7836.000000 >> >> Same as above the intermediate increment of the induction variable should fold into the address computation but ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) is not applied because the compiler loses track of the bounds of the induction variable. The i2l conversions should also fold into the address computations but they don?t for the same reason. 
The change in loopnode.cpp tries to work around the problem by capturing the bounds of the loop as soon the CountedLoop is created and before other transformations applied to the loop makes it much harder for the compiler to figure the bounds out. I also relaxed the Phi type computation in PhiNode::Value(). >> >> I hit a couple unrelated bugs during testing: the fix in x86_64.ad is obvious. The change to superword is because we sometimes end up there with an AddL while, as I understand, we only expect integer nodes. Using the AddL leads to broken graphs. >> >> Roland. >> From aleksey.shipilev at oracle.com Tue Dec 15 09:05:44 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 15 Dec 2015 12:05:44 +0300 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <566F7D82.6030806@oracle.com> References: <566F7D82.6030806@oracle.com> Message-ID: <566FD7E8.7000105@oracle.com> Also, I think this is a duplicate of: https://bugs.openjdk.java.net/browse/JDK-8032481 -Aleksey On 12/15/2015 05:40 AM, Vladimir Kozlov wrote: > Very interesting! > > Please, add short statement to the comment in /macro.cpp for your case. > > Changes looks fine to me. One nit could be to delay bytecode analysis > until macro expansion - it may reduce compilation time. Bytecode > analysis of each constructor could be expensive. > > Thanks, > Vladimir > > On 12/10/15 6:48 AM, Hui Shi wrote: >> Hi All, >> >> >> Could some one help comments this change? >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8144993 >> >> webrev: http://cr.openjdk.java.net/~hshi/8144993/webrev/ >> >> >> This patch aims to remove redundant memory barrier after allocation >> node, on AArch64 it removes redundant dmb when creating object. The >> motivation is dmb instructions after commonly used object allocation, >> for example string and boxing objects is redundant with dmb inserted for >> final field write. 
In following small case: >> >> String foo(String s) >> { >> String copy = new String(s); >> return copy; >> } >> >> There are two dmb instructions in generated code. First one is >> membar_storestore, inserted in PhaseMacroExpand::expand_allocate_common. >> Second one is membar_release, inserted at exit of initializer method as >> final fields write happens. Allocated String doesn't escape in String >> initializer method, membar_release includes membar_storestore semantic. >> So first one can be removed safely. >> >> 0x0000007f85bbfa8c: prfm pstl1keep, [x11,#256] >> 0x0000007f85bbfa90: str xzr, [x0,#16] >> 0x0000007f85bbfa94: dmb ishst // first dmb to remove >> .... >> >> 0x0000007fa01d83c0: ldrsb w10, [x20,#20] >> 0x0000007fa01d83c4: ldr w12, [x20,#16] >> 0x0000007fa01d83c8: ldr x11, [sp,#8] >> 0x0000007fa01d83cc: strb w10, [x11,#20] >> 0x0000007fa01d83d0: str w12, [x11,#16] >> 0x0000007fa01d83d4: dmb ish // second dmb >> >> Patch targets this pattern and remove redundant memory barrier for >> allocation node. >> >> 1. When inserting memory barrier for final field write. If final fields' >> object allocation node is available, invoke >> AllocationNode::compute_MemBar_redundancy(initializer method). >> >> 2. In AllocationNode: >> >> 2.1 Add a new field _is_allocation_MemBar_redundant flag indicate >> if memory barrier after allocation node is redundant. >> >> 2.2 Add method compute_MemBar_redundancy, set >> _is_allocation_MemBar_redundant true if first parameter "this" does >> not escape in initializer method according to BCEscapeAnalyzer. >> >> 3. skip inserting memory barrier in >> PhaseMacroExpand::expand_allocate_common, when AllocationNode's >> _is_allocation_MemBar_redundant flag is true.
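The three steps in Hui's list can be pictured as a toy Java model of the decision. All class and method names below are hypothetical stand-ins for the C2 concepts (the proposed flag, the BCEscapeAnalyzer result, macro expansion of the allocation); the escape result is passed in as a boolean instead of being computed by bytecode analysis, so this is a sketch of the reasoning, not of the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed barrier elision; names are hypothetical, not HotSpot code.
final class MemBarElisionSketch {
    static final class Allocation {
        // Stands in for the proposed _is_allocation_MemBar_redundant flag (step 2.1).
        boolean memBarRedundant = false;

        // Step 2.2: redundant when the receiver ("this") does not escape the constructor,
        // because the release barrier at constructor exit also orders the earlier stores.
        void computeMemBarRedundancy(boolean receiverEscapesConstructor) {
            memBarRedundant = !receiverEscapesConstructor;
        }
    }

    // Returns the barriers emitted for "new T(...)", in program order.
    static List<String> barriersFor(boolean writesFinalField, boolean receiverEscapesConstructor) {
        Allocation alloc = new Allocation();
        // Step 1: the redundancy check only runs when a final-field release barrier is emitted.
        if (writesFinalField) {
            alloc.computeMemBarRedundancy(receiverEscapesConstructor);
        }
        List<String> barriers = new ArrayList<>();
        // Step 3: macro expansion skips the StoreStore barrier when it is redundant.
        if (!alloc.memBarRedundant) {
            barriers.add("membar_storestore");
        }
        if (writesFinalField) {
            barriers.add("membar_release");
        }
        return barriers;
    }

    public static void main(String[] args) {
        System.out.println(barriersFor(true, false));  // like new String(s): [membar_release]
        System.out.println(barriersFor(true, true));   // [membar_storestore, membar_release]
        System.out.println(barriersFor(false, false)); // [membar_storestore]
    }
}
```

The model only captures the ordering argument made in the message (release subsumes StoreStore when nothing can observe the object before the constructor ends); it does not address other observers of the new object.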
>> >> >> Regards >> >> Hui >> From tobias.hartmann at oracle.com Tue Dec 15 09:08:33 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 15 Dec 2015 10:08:33 +0100 Subject: RFR(M): 8145322: Code generated from unsafe loops can be slightly improved In-Reply-To: <42C680DD-DC76-446B-9618-BB3A7AA8FC85@oracle.com> References: <566FD0E1.5000105@oracle.com> <42C680DD-DC76-446B-9618-BB3A7AA8FC85@oracle.com> Message-ID: <566FD891.6000304@oracle.com> On 15.12.2015 09:58, Roland Westrelin wrote: > Hi Tobias, > > Thanks for looking at this. > >> unfortunately, your fixes for 8145322 and 8139771 break my prototype fix for JDK-6675699 but it's probably the easiest if I merge after you pushed your patches (I still need to address some performance problems). >> >> I recently hit the "incorrect size calculattion" assert in x86_32.ad and saw that you fixed the same problem for x86_64.ad (missing "break"). Could you fix it for 32 as well? There are two locations in "vec_stack_to_stack_helper" that miss a break: >> >> --- a/src/cpu/x86/vm/x86_32.ad Fri Dec 11 15:03:11 2015 +0300 >> +++ b/src/cpu/x86/vm/x86_32.ad Tue Dec 15 09:22:07 2015 +0100 >> @@ -1005,6 +1005,7 @@ >> __ vmovdqu(xmm0, Address(rsp, src_offset)); >> __ vmovdqu(Address(rsp, dst_offset), xmm0); >> __ vmovdqu(xmm0, Address(rsp, -32)); >> + break; >> case Op_VecZ: >> __ evmovdqul(Address(rsp, -64), xmm0, 2); >> __ evmovdqul(xmm0, Address(rsp, src_offset), 2); >> @@ -1045,6 +1046,7 @@ >> "vmovdqu [rsp + #%d], xmm0\n\t" >> "vmovdqu xmm0, [rsp - #32]", >> src_offset, dst_offset); >> + break; >> case Op_VecZ: >> st->print("vmovdqu [rsp - #64], xmm0\t# 512-bit mem-mem spill\n\t" >> "vmovdqu xmm0, [rsp + #%d]\n\t? > > Sure. I didn't notice the same problem existed on 32 bits.
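The missing `break` Tobias points at is ordinary switch fall-through: after the `Op_VecY` case emits its moves, control continues into the `Op_VecZ` case and emits those as well, so the emitted code no longer matches what the size calculation expects. A minimal Java sketch of the same shape, using strings in place of real instructions (the instruction lists are simplified from the quoted patch and the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a vector mem-mem spill emitter; not the real .ad code.
final class MissingBreakSketch {
    static List<String> emitSpill(String kind, boolean withBreak) {
        List<String> out = new ArrayList<>();
        switch (kind) {
            case "VecY":
                out.add("vmovdqu [rsp - #32], xmm0");
                out.add("vmovdqu xmm0, [rsp + src]");
                out.add("vmovdqu [rsp + dst], xmm0");
                out.add("vmovdqu xmm0, [rsp - #32]");
                if (withBreak) {
                    break;
                }
                // Without the break, execution falls through into the VecZ case.
            case "VecZ":
                out.add("evmovdqul [rsp - #64], xmm0");
                out.add("evmovdqul xmm0, [rsp + src]");
                out.add("evmovdqul [rsp + dst], xmm0");
                out.add("evmovdqul xmm0, [rsp - #64]");
                break;
            default:
                throw new IllegalArgumentException("unexpected vector kind: " + kind);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(emitSpill("VecY", true).size());  // 4
        System.out.println(emitSpill("VecY", false).size()); // 8: the VecZ moves are emitted too
    }
}
```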
> >> Regarding your changes in superword.cpp, shouldn't it be possible for the invar to be of type long? I hit similar problems with JDK-6675699 after re-enabling the split-if optimization for ConvI2L and I planned to fix it like this (line 2993-2999): >> http://cr.openjdk.java.net/~thartmann/6675699/webrev.00/src/share/vm/opto/superword.cpp.sdiff.html >> (see JDK-8145313) > > But then you have to guarantee the long value fits in an integer otherwise a ConvL2I would not be correct here, right? Right. I assumed that this is always the case because the long value originated from a ConvI2L but the invariant part of the address computation may probably as well be an arbitrary long. > Ending with a long invar is very uncommon in my testing. Yes, I also only ever hit it once. I think your fix is fine then. >> Could you attach the assembly output with your fix? > > See my reply to Vladimir. Thanks! Tobias > > Roland. > >> >> Best, >> Tobias >> >> On 14.12.2015 17:42, Roland Westrelin wrote: >>> http://cr.openjdk.java.net/~roland/8145322/webrev.00/ >>> >>> Paul spotted the following small inefficiencies: >>> >>> for (; wi < l; wi++) { >>> long bi = ((long) Objects.checkIndex(wi, l, null)) << LOG2_ARRAY_LONG_INDEX_SCALE; >>> long av = U.getLongUnaligned(a, aOffset + bi); >>> long bv = U.getLongUnaligned(b, bOffset + bi); >>> if (av != bv) { >>> >>> is compiled to: >>> >>> 0b0 B9: # B28 B10 <- B8 B13 Loop: B9-B13 inner main of N130 Freq: 977.661 >>> 0b0 movl RDX, RDI # spill >>> 0b2 # castII of RDX >>> 0b2 movq RBX, [R9 + #16 + RDX << #3] # long >>> 0b7 movq RAX, [RSI + #16 + RDX << #3] # long >>> 0bc cmpq RBX, RAX >>> 0bf jne B28 P=0.000000 C=7836.000000 >>> 0bf >>> 0c5 B10: # B28 B11 <- B9 Freq: 977.66 >>> 0c5 movl RDX, RDI # spill >>> 0c7 incl RDX # int >>> 0c9 # castII of RDX >>> 0c9 movq RBX, [R9 + #16 + RDX << #3] # long >>> 0ce movq RAX, [RSI + #16 + RDX << #3] # long >>> 0d3 cmpq RBX, RAX >>> 0d6 jne B28 P=0.000000 C=7836.000000 >>> 0d6 >>> 0dc B11: # B28 B12 <- B10 
Freq: 977.66 >>> 0dc movl RDX, RDI # spill >>> 0de addl RDX, #2 # int >>> 0e1 # castII of RDX >>> 0e1 movq RBX, [R9 + #16 + RDX << #3] # long >>> 0e6 movq RAX, [RSI + #16 + RDX << #3] # long >>> 0eb cmpq RBX, RAX >>> 0ee jne B28 P=0.000000 C=7836.000000 >>> 0ee >>> 0f4 B12: # B28 B13 <- B11 Freq: 977.659 >>> 0f4 movl RDX, RDI # spill >>> 0f6 addl RDX, #3 # int >>> 0f9 # castII of RDX >>> 0f9 movq RBX, [R9 + #16 + RDX << #3] # long >>> 0fe movq RAX, [RSI + #16 + RDX << #3] # long >>> 103 cmpq RBX, RAX >>> 106 jne B28 P=0.000000 C=7836.000000 >>> 106 >>> 10c B13: # B9 B14 <- B12 Freq: 977.659 >>> 10c addl RDI, #4 # int >>> 10f cmpl RDI, RBP >>> 111 jl,s B9 # loop end P=0.998980 C=7836.000000 >>> >>> But the intermediate increment of the induction variable: >>> 0c7 incl RDX # int >>> 0de addl RDX, #2 # int >>> 0f6 addl RDX, #3 # int >>> >>> should be folded into the address computation of the memory accesses: ConvI2L(AddI(x, y)) should be converted to AddL(ConvI2L(x), ConvI2L(y)), but there's a CastII from the checkIndex between the AddI and the ConvI2L, so we first need to push the CastII through the AddI. That's the first CastIINode::Ideal transformation. If we apply that transformation we then have several CastII nodes that differ only in their type, so we need the second transformation of CastIINode::Ideal so that all of them fold after loop opts.
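The rewrite ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) is only legal when the 32-bit add cannot overflow, which is why C2 needs range information (here carried by the CastII from checkIndex) before applying it. The Java sketch below shows the two shapes at source level plus a simple interval check; the method names are illustrative and are not C2 API.

```java
// Source-level view of the two shapes; names are illustrative only.
final class ConvI2LSketch {
    // ConvI2L(AddI(x, y)): 32-bit add, then sign-extend the (possibly wrapped) result.
    static long convOfAdd(int x, int y) {
        return (long) (x + y);
    }

    // AddL(ConvI2L(x), ConvI2L(y)): sign-extend first, then add in 64 bits.
    static long addOfConv(int x, int y) {
        return (long) x + (long) y;
    }

    // For x in [xLo, xHi] and y in [yLo, yHi], the two shapes always agree
    // exactly when the int add can never overflow.
    static boolean rewriteIsSafe(int xLo, int xHi, int yLo, int yHi) {
        long min = (long) xLo + yLo;
        long max = (long) xHi + yHi;
        return min >= Integer.MIN_VALUE && max <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(convOfAdd(40, 3) == addOfConv(40, 3));      // true
        System.out.println(convOfAdd(Integer.MAX_VALUE, 1));           // -2147483648
        System.out.println(addOfConv(Integer.MAX_VALUE, 1));           // 2147483648
        System.out.println(rewriteIsSafe(0, 1_000_000, 0, 3));         // true
        System.out.println(rewriteIsSafe(0, Integer.MAX_VALUE, 0, 3)); // false
    }
}
```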
>>> >>> for (; wi < length >> valuesPerWidth; wi++) { >>> long bi = ((long) wi) << LOG2_ARRAY_LONG_INDEX_SCALE; >>> long av = U.getLongUnaligned(a, aOffset + bi); >>> long bv = U.getLongUnaligned(b, bOffset + bi); >>> if (av != bv) { >>> >>> 0b0 B7: # B32 B8 <- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 >>> 0b0 movslq R8, RSI # i2l >>> 0b3 movq RAX, [RDX + #16 + R8 << #3] # long >>> 0b8 movq RDI, [RBP + #16 + R8 << #3] # long >>> 0bd cmpq RAX, RDI >>> 0c0 jne B32 P=0.000000 C=7836.000000 >>> 0c0 >>> 0c6 B8: # B33 B9 <- B7 Freq: 975.842 >>> 0c6 movl R8, RSI # spill >>> 0c9 incl R8 # int >>> 0cc movslq RDI, R8 # i2l >>> 0cf movq RAX, [RDX + #16 + RDI << #3] # long >>> 0d4 movq RDI, [RBP + #16 + RDI << #3] # long >>> 0d9 cmpq RAX, RDI >>> 0dc jne B33 P=0.000000 C=7836.000000 >>> 0dc >>> 0e2 B9: # B33 B10 <- B8 Freq: 975.842 >>> 0e2 movl R8, RSI # spill >>> 0e5 addl R8, #2 # int >>> 0e9 movslq RDI, R8 # i2l >>> 0ec movq RAX, [RDX + #16 + RDI << #3] # long >>> 0f1 movq RDI, [RBP + #16 + RDI << #3] # long >>> 0f6 cmpq RAX, RDI >>> 0f9 jne B33 P=0.000000 C=7836.000000 >>> 0f9 >>> 0ff B10: # B33 B11 <- B9 Freq: 975.842 >>> 0ff movl R8, RSI # spill >>> 102 addl R8, #3 # int >>> 106 movslq RDI, R8 # i2l >>> 109 movq RAX, [RDX + #16 + RDI << #3] # long >>> 10e movq RDI, [RBP + #16 + RDI << #3] # long >>> 113 cmpq RAX, RDI >>> 116 jne B33 P=0.000000 C=7836.000000 >>> 116 >>> 11c B11: # B33 B12 <- B10 Freq: 975.841 >>> 11c movl R8, RSI # spill >>> 11f addl R8, #4 # int >>> 123 movslq RDI, R8 # i2l >>> 126 movq RAX, [RDX + #16 + RDI << #3] # long >>> 12b movq RDI, [RBP + #16 + RDI << #3] # long >>> 130 cmpq RAX, RDI >>> 133 jne B33 P=0.000000 C=7836.000000 >>> 133 >>> 139 B12: # B33 B13 <- B11 Freq: 975.841 >>> 139 movl R8, RSI # spill >>> 13c addl R8, #5 # int >>> 140 movslq RDI, R8 # i2l >>> 143 movq RAX, [RDX + #16 + RDI << #3] # long >>> 148 movq RDI, [RBP + #16 + RDI << #3] # long >>> 14d cmpq RAX, RDI >>> 150 jne B33 P=0.000000 C=7836.000000 >>> 150 >>> 156 B13: 
# B33 B14 <- B12 Freq: 975.84 >>> 156 movl R8, RSI # spill >>> 159 addl R8, #6 # int >>> 15d movslq RDI, R8 # i2l >>> 160 movq RAX, [RDX + #16 + RDI << #3] # long >>> 165 movq RDI, [RBP + #16 + RDI << #3] # long >>> 16a cmpq RAX, RDI >>> 16d jne B33 P=0.000000 C=7836.000000 >>> 16d >>> 173 B14: # B33 B15 <- B13 Freq: 975.84 >>> 173 movl R8, RSI # spill >>> 176 addl R8, #7 # int >>> 17a movslq RDI, R8 # i2l >>> 17d movq RAX, [RDX + #16 + RDI << #3] # long >>> 182 movq RDI, [RBP + #16 + RDI << #3] # long >>> 187 cmpq RAX, RDI >>> 18a jne B33 P=0.000000 C=7836.000000 >>> 18a >>> 190 B15: # B7 B16 <- B14 Freq: 975.839 >>> 190 addl RSI, #8 # int >>> 193 cmpl RSI, R11 >>> 196 jl B7 # loop end P=0.998980 C=7836.000000 >>> >>> Same as above, the intermediate increment of the induction variable should fold into the address computation, but ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) is not applied because the compiler loses track of the bounds of the induction variable. The i2l conversions should also fold into the address computations, but they don't for the same reason. The change in loopnode.cpp tries to work around the problem by capturing the bounds of the loop as soon as the CountedLoop is created, before other transformations applied to the loop make it much harder for the compiler to figure the bounds out. I also relaxed the Phi type computation in PhiNode::Value(). >>> >>> I hit a couple of unrelated bugs during testing: the fix in x86_64.ad is obvious. The change to superword is because we sometimes end up there with an AddL while, as I understand, we only expect integer nodes. Using the AddL leads to broken graphs. >>> >>> Roland.
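Why losing the loop bounds matters: if the induction variable is only known to be some int, `wi + 7` might overflow and the widening rewrite cannot be proven safe; once the range implied by `wi < length >> valuesPerWidth` is captured, it can. A small interval sketch in Java, assuming `length` is a non-negative array length and treating `valuesPerWidth` as a shift of 3 (both assumptions for illustration; it also ignores the extra limit adjustment that unrolling introduces):

```java
// Interval reasoning about an induction variable; helper names are hypothetical.
final class InductionRangeSketch {
    // True if i + k stays within int for every i in [lo, hi].
    static boolean addFits(int lo, int hi, int k) {
        long min = (long) lo + k;
        long max = (long) hi + k;
        return min >= Integer.MIN_VALUE && max <= Integer.MAX_VALUE;
    }

    // Largest value wi can take in: for (wi = 0; wi < (length >> shift); wi++),
    // assuming 0 <= length <= maxLength.
    static int maxInduction(int maxLength, int shift) {
        return (maxLength >> shift) - 1;
    }

    public static void main(String[] args) {
        int hi = maxInduction(Integer.MAX_VALUE, 3);
        System.out.println(hi);                                               // 268435454
        System.out.println(addFits(0, hi, 7));                                // true: bounds known
        System.out.println(addFits(Integer.MIN_VALUE, Integer.MAX_VALUE, 7)); // false: bounds lost
    }
}
```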
>>> >>> > From tobias.hartmann at oracle.com Tue Dec 15 09:13:32 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 15 Dec 2015 10:13:32 +0100 Subject: RFR(M): 8145322: Code generated from unsafe loops can be slightly improved In-Reply-To: <6BBA85D7-71DE-43AD-9DA9-CA97FF99F73D@oracle.com> References: <566F8177.8080000@oracle.com> <6BBA85D7-71DE-43AD-9DA9-CA97FF99F73D@oracle.com> Message-ID: <566FD9BC.5070309@oracle.com> Hi Roland, thanks for the assembly snippets. Looks good to me! Best, Tobias On 15.12.2015 09:55, Roland Westrelin wrote: > Hi Vladimir, > > Thanks for looking at this. > >> The second assembler output still has intermediate increments and also new movslq instructions. Why should it be better? > > I think there is some confusion here. There are 2 problems I'd like to fix. One is when using checkIndex. In that case, the code should be as good as regular array accesses. The first assembly dump shows it's not. The second problem is when not using checkIndex but we know the loop bounds: we should be able to do better. That's the second assembly dump. In my email I only showed assembly without my change.
With my change: > > first test case: > > 0c2 B11: # B37 B12 <- B8 B10 Loop: B11-B10 inner main of N142 Freq: 975.841 > 0c2 movq RAX, [RSI + #16 + RDI << #3] # long > 0c7 movq RBX, [R9 + #16 + RDI << #3] # long > 0cc cmpq RBX, RAX > 0cf jne B37 P=0.000000 C=7836.000000 > 0cf > 0d5 B12: # B38 B13 <- B11 Freq: 975.84 > 0d5 movq RAX, [RSI + #24 + RDI << #3] # long > 0da movq RBX, [R9 + #24 + RDI << #3] # long > 0df cmpq RBX, RAX > 0e2 jne B38 P=0.000000 C=7836.000000 > 0e2 > 0e8 B13: # B40 B14 <- B12 Freq: 975.84 > 0e8 movq RAX, [RSI + #32 + RDI << #3] # long > 0ed movq RBX, [R9 + #32 + RDI << #3] # long > 0f2 cmpq RBX, RAX > 0f5 jne B40 P=0.000000 C=7836.000000 > 0f5 > 0fb B14: # B42 B15 <- B13 Freq: 975.84 > 0fb movq RAX, [RSI + #40 + RDI << #3] # long > 100 movq RBX, [R9 + #40 + RDI << #3] # long > 105 cmpq RBX, RAX > 108 jne B42 P=0.000000 C=7836.000000 > 108 > 10e B15: # B44 B16 <- B14 Freq: 975.839 > 10e movq RAX, [RSI + #48 + RDI << #3] # long > 113 movq RBX, [R9 + #48 + RDI << #3] # long > 118 movl RDX, RDI # spill > 11a addl RDX, #4 # int > 11d cmpq RBX, RAX > 120 jne B44 P=0.000000 C=7836.000000 > 120 > 126 B16: # B39 B17 <- B15 Freq: 975.839 > 126 movq RAX, [RSI + #56 + RDI << #3] # long > 12b movq RBX, [R9 + #56 + RDI << #3] # long > 130 cmpq RBX, RAX > 133 jne B39 P=0.000000 C=7836.000000 > 133 > 139 B17: # B41 B18 <- B16 Freq: 975.838 > 139 movq RAX, [RSI + #64 + RDI << #3] # long > 13e movq RBX, [R9 + #64 + RDI << #3] # long > 143 cmpq RBX, RAX > 146 jne B41 P=0.000000 C=7836.000000 > 146 > 14c B18: # B43 B19 <- B17 Freq: 975.838 > 14c movq RAX, [RSI + #72 + RDI << #3] # long > 151 movq RBX, [R9 + #72 + RDI << #3] # long > 156 cmpq RBX, RAX > 159 jne B43 P=0.000000 C=7836.000000 > 159 > 15f B19: # B10 B20 <- B18 Freq: 975.837 > 15f movl RDX, RDI # spill > 161 addl RDX, #8 # int > 164 cmpl RDX, RBP > 166 jl B10 # loop end P=0.998980 C=7836.000000 > > > > second test case: > > 0a3 B7: # B32 B8 <- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 > 0a3 
movq RDI, [RBP + #16 + RSI << #3] # long > 0a8 movq RAX, [RDX + #16 + RSI << #3] # long > 0ad cmpq RAX, RDI > 0b0 jne B32 P=0.000000 C=7836.000000 > 0b0 > 0b6 B8: # B33 B9 <- B7 Freq: 975.842 > 0b6 movq RDI, [RBP + #24 + RSI << #3] # long > 0bb movq RAX, [RDX + #24 + RSI << #3] # long > 0c0 cmpq RAX, RDI > 0c3 jne B33 P=0.000000 C=7836.000000 > 0c3 > 0c9 B9: # B35 B10 <- B8 Freq: 975.842 > 0c9 movq RDI, [RBP + #32 + RSI << #3] # long > 0ce movq RAX, [RDX + #32 + RSI << #3] # long > 0d3 cmpq RAX, RDI > 0d6 jne B35 P=0.000000 C=7836.000000 > 0d6 > 0dc B10: # B39 B11 <- B9 Freq: 975.842 > 0dc movq RDI, [RBP + #40 + RSI << #3] # long > 0e1 movq RAX, [RDX + #40 + RSI << #3] # long > 0e6 cmpq RAX, RDI > 0e9 jne B39 P=0.000000 C=7836.000000 > 0e9 > 0ef B11: # B38 B12 <- B10 Freq: 975.841 > 0ef movq RDI, [RBP + #48 + RSI << #3] # long > 0f4 movq RAX, [RDX + #48 + RSI << #3] # long > 0f9 movl R8, RSI # spill > 0fc addl R8, #4 # int > 100 cmpq RAX, RDI > 103 jne B38 P=0.000000 C=7836.000000 > 103 > 109 B12: # B34 B13 <- B11 Freq: 975.841 > 109 movq RDI, [RBP + #56 + RSI << #3] # long > 10e movq RAX, [RDX + #56 + RSI << #3] # long > 113 cmpq RAX, RDI > 116 jne B34 P=0.000000 C=7836.000000 > 116 > 11c B13: # B36 B14 <- B12 Freq: 975.84 > 11c movq RDI, [RBP + #64 + RSI << #3] # long > 121 movq RAX, [RDX + #64 + RSI << #3] # long > 126 cmpq RAX, RDI > 129 jne B36 P=0.000000 C=7836.000000 > 129 > 12f B14: # B38 B15 <- B13 Freq: 975.84 > 12f movq RDI, [RBP + #72 + RSI << #3] # long > 134 movq RAX, [RDX + #72 + RSI << #3] # long > 139 movl R8, RSI # spill > 13c addl R8, #7 # int > 140 cmpq RAX, RDI > 143 jne B38 P=0.000000 C=7836.000000 > 143 > 149 B15: # B7 B16 <- B14 Freq: 975.839 > 149 addl RSI, #8 # int > 14c cmpl RSI, R11 > 14f jl B7 # loop end P=0.998980 C=7836.000000 > > Roland. 
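In the improved dumps each unrolled copy reuses the unmodified index register with a larger constant displacement (#16, #24, ..., #72) instead of first computing `i + k` in a register. The arithmetic behind that folding, as a Java sketch; 16 and 3 are the base offset and log2 element size as they appear in the dumps, and the class is illustrative only:

```java
// Address arithmetic behind the folded displacements; illustrative only.
final class DisplacementSketch {
    static final long BASE_OFFSET = 16; // "#16" in the dumps: offset of element 0
    static final int LOG2_SCALE = 3;    // "<< #3": 8-byte long elements

    // Before: compute i + k in an int register, then scale it.
    static long unfolded(int i, int k) {
        return BASE_OFFSET + ((long) (i + k) << LOG2_SCALE);
    }

    // After: the constant k moves into the displacement; only i is scaled at run time.
    static long displacement(int k) {
        return BASE_OFFSET + ((long) k << LOG2_SCALE);
    }

    static long folded(int i, int k) {
        return displacement(k) + ((long) i << LOG2_SCALE);
    }

    // Displacements for an unroll factor, space separated.
    static String displacements(int unroll) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < unroll; k++) {
            if (k > 0) {
                sb.append(' ');
            }
            sb.append(displacement(k));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(displacements(8));                        // 16 24 32 40 48 56 64 72
        System.out.println(unfolded(1000, 5) == folded(1000, 5));   // true
    }
}
```

The two forms are equal only as long as `i + k` does not overflow an int, which is why the compiler needs bounds on the induction variable before folding.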
> >> >> Thanks, >> Vladimir >> >> On 12/14/15 8:42 AM, Roland Westrelin wrote: >>> http://cr.openjdk.java.net/~roland/8145322/webrev.00/ >>> >>> Paul spotted the following small inefficiencies: >>> >>> for (; wi < l; wi++) { >>> long bi = ((long) Objects.checkIndex(wi, l, null)) << LOG2_ARRAY_LONG_INDEX_SCALE; >>> long av = U.getLongUnaligned(a, aOffset + bi); >>> long bv = U.getLongUnaligned(b, bOffset + bi); >>> if (av != bv) { >>> >>> is compiled to: >>> >>> 0b0 B9: # B28 B10 <- B8 B13 Loop: B9-B13 inner main of N130 Freq: 977.661 >>> 0b0 movl RDX, RDI # spill >>> 0b2 # castII of RDX >>> 0b2 movq RBX, [R9 + #16 + RDX << #3] # long >>> 0b7 movq RAX, [RSI + #16 + RDX << #3] # long >>> 0bc cmpq RBX, RAX >>> 0bf jne B28 P=0.000000 C=7836.000000 >>> 0bf >>> 0c5 B10: # B28 B11 <- B9 Freq: 977.66 >>> 0c5 movl RDX, RDI # spill >>> 0c7 incl RDX # int >>> 0c9 # castII of RDX >>> 0c9 movq RBX, [R9 + #16 + RDX << #3] # long >>> 0ce movq RAX, [RSI + #16 + RDX << #3] # long >>> 0d3 cmpq RBX, RAX >>> 0d6 jne B28 P=0.000000 C=7836.000000 >>> 0d6 >>> 0dc B11: # B28 B12 <- B10 Freq: 977.66 >>> 0dc movl RDX, RDI # spill >>> 0de addl RDX, #2 # int >>> 0e1 # castII of RDX >>> 0e1 movq RBX, [R9 + #16 + RDX << #3] # long >>> 0e6 movq RAX, [RSI + #16 + RDX << #3] # long >>> 0eb cmpq RBX, RAX >>> 0ee jne B28 P=0.000000 C=7836.000000 >>> 0ee >>> 0f4 B12: # B28 B13 <- B11 Freq: 977.659 >>> 0f4 movl RDX, RDI # spill >>> 0f6 addl RDX, #3 # int >>> 0f9 # castII of RDX >>> 0f9 movq RBX, [R9 + #16 + RDX << #3] # long >>> 0fe movq RAX, [RSI + #16 + RDX << #3] # long >>> 103 cmpq RBX, RAX >>> 106 jne B28 P=0.000000 C=7836.000000 >>> 106 >>> 10c B13: # B9 B14 <- B12 Freq: 977.659 >>> 10c addl RDI, #4 # int >>> 10f cmpl RDI, RBP >>> 111 jl,s B9 # loop end P=0.998980 C=7836.000000 >>> >>> But the intermediate increment of the induction variable: >>> 0c7 incl RDX # int >>> 0de addl RDX, #2 # int >>> 0f6 addl RDX, #3 # int >>> >>> should be folded in the address computation of the memory accesses: 
ConvI2L(AddI(x, y)) should be converted to AddL(ConvI2L(x), ConvI2L(y)) but there?s a CastII from the checkIndex between the AddI and the ConvI2L so we first need to push the CastII through the AddI. That?s the first CastIINode::Ideal transformation. If we apply that transformation we then have several CastII that only differ by their type so we need the second transformation of CastIINode::Ideal so all of them fold after loop opts. >>> >>> for (; wi < length >> valuesPerWidth; wi++) { >>> long bi = ((long) wi) << LOG2_ARRAY_LONG_INDEX_SCALE; >>> long av = U.getLongUnaligned(a, aOffset + bi); >>> long bv = U.getLongUnaligned(b, bOffset + bi); >>> if (av != bv) { >>> >>> 0b0 B7: # B32 B8 <- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 >>> 0b0 movslq R8, RSI # i2l >>> 0b3 movq RAX, [RDX + #16 + R8 << #3] # long >>> 0b8 movq RDI, [RBP + #16 + R8 << #3] # long >>> 0bd cmpq RAX, RDI >>> 0c0 jne B32 P=0.000000 C=7836.000000 >>> 0c0 >>> 0c6 B8: # B33 B9 <- B7 Freq: 975.842 >>> 0c6 movl R8, RSI # spill >>> 0c9 incl R8 # int >>> 0cc movslq RDI, R8 # i2l >>> 0cf movq RAX, [RDX + #16 + RDI << #3] # long >>> 0d4 movq RDI, [RBP + #16 + RDI << #3] # long >>> 0d9 cmpq RAX, RDI >>> 0dc jne B33 P=0.000000 C=7836.000000 >>> 0dc >>> 0e2 B9: # B33 B10 <- B8 Freq: 975.842 >>> 0e2 movl R8, RSI # spill >>> 0e5 addl R8, #2 # int >>> 0e9 movslq RDI, R8 # i2l >>> 0ec movq RAX, [RDX + #16 + RDI << #3] # long >>> 0f1 movq RDI, [RBP + #16 + RDI << #3] # long >>> 0f6 cmpq RAX, RDI >>> 0f9 jne B33 P=0.000000 C=7836.000000 >>> 0f9 >>> 0ff B10: # B33 B11 <- B9 Freq: 975.842 >>> 0ff movl R8, RSI # spill >>> 102 addl R8, #3 # int >>> 106 movslq RDI, R8 # i2l >>> 109 movq RAX, [RDX + #16 + RDI << #3] # long >>> 10e movq RDI, [RBP + #16 + RDI << #3] # long >>> 113 cmpq RAX, RDI >>> 116 jne B33 P=0.000000 C=7836.000000 >>> 116 >>> 11c B11: # B33 B12 <- B10 Freq: 975.841 >>> 11c movl R8, RSI # spill >>> 11f addl R8, #4 # int >>> 123 movslq RDI, R8 # i2l >>> 126 movq RAX, [RDX + #16 + RDI << #3] 
# long >>> 12b movq RDI, [RBP + #16 + RDI << #3] # long >>> 130 cmpq RAX, RDI >>> 133 jne B33 P=0.000000 C=7836.000000 >>> 133 >>> 139 B12: # B33 B13 <- B11 Freq: 975.841 >>> 139 movl R8, RSI # spill >>> 13c addl R8, #5 # int >>> 140 movslq RDI, R8 # i2l >>> 143 movq RAX, [RDX + #16 + RDI << #3] # long >>> 148 movq RDI, [RBP + #16 + RDI << #3] # long >>> 14d cmpq RAX, RDI >>> 150 jne B33 P=0.000000 C=7836.000000 >>> 150 >>> 156 B13: # B33 B14 <- B12 Freq: 975.84 >>> 156 movl R8, RSI # spill >>> 159 addl R8, #6 # int >>> 15d movslq RDI, R8 # i2l >>> 160 movq RAX, [RDX + #16 + RDI << #3] # long >>> 165 movq RDI, [RBP + #16 + RDI << #3] # long >>> 16a cmpq RAX, RDI >>> 16d jne B33 P=0.000000 C=7836.000000 >>> 16d >>> 173 B14: # B33 B15 <- B13 Freq: 975.84 >>> 173 movl R8, RSI # spill >>> 176 addl R8, #7 # int >>> 17a movslq RDI, R8 # i2l >>> 17d movq RAX, [RDX + #16 + RDI << #3] # long >>> 182 movq RDI, [RBP + #16 + RDI << #3] # long >>> 187 cmpq RAX, RDI >>> 18a jne B33 P=0.000000 C=7836.000000 >>> 18a >>> 190 B15: # B7 B16 <- B14 Freq: 975.839 >>> 190 addl RSI, #8 # int >>> 193 cmpl RSI, R11 >>> 196 jl B7 # loop end P=0.998980 C=7836.000000 >>> >>> Same as above the intermediate increment of the induction variable should fold into the address computation but ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) is not applied because the compiler loses track of the bounds of the induction variable. The i2l conversions should also fold into the address computations but they don?t for the same reason. The change in loopnode.cpp tries to work around the problem by capturing the bounds of the loop as soon the CountedLoop is created and before other transformations applied to the loop makes it much harder for the compiler to figure the bounds out. I also relaxed the Phi type computation in PhiNode::Value(). >>> >>> I hit a couple unrelated bugs during testing: the fix in x86_64.ad is obvious. 
The change to superword is because we sometimes end up there with an AddL while, as I understand, we only expect integer nodes. Using the AddL leads to broken graphs. >>> >>> Roland. >>> > From martin.doerr at sap.com Tue Dec 15 09:14:06 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 15 Dec 2015 09:14:06 +0000 Subject: RFR(S): 8144850: C1: operator delete needs an implementation In-Reply-To: <566F84B0.5030603@oracle.com> References: <7C9B87B351A4BA4AA9EC95BB4181165672287230@DEWDFEMB19C.global.corp.sap> <566F84B0.5030603@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672288C9F@DEWDFEMB19C.global.corp.sap> Thank you very much. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Dienstag, 15. Dezember 2015 04:11 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8144850: C1: operator delete needs an implementation Good. I am pushing it. Thanks, Vladimir On 12/9/15 7:49 AM, Doerr, Martin wrote: > Hi, > > unfortunately, I didn't test the slow debug build when I overworked > JDK-8138890. > > Product and fastdebug build are working fine. > > However, we need another fix to support the slow debug build with xlC on > AIX. > > A webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8144850_c1_delete/webrev.00/ > > It would be great if somebody could review and sponsor. 
> > Thanks and best regards, > > Martin > From roland.westrelin at oracle.com Tue Dec 15 09:14:59 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 15 Dec 2015 10:14:59 +0100 Subject: RFR(S): 8139771: Eliminating CastPP nodes at Phis when they all come from a unique input may cause crash In-Reply-To: <566F45F9.5000304@oracle.com> References: <56623E4A.9040504@oracle.com> <56663B29.7050508@oracle.com> <5021FF7F-DA52-44D0-A7E5-DAEFFC5992C1@oracle.com> <566F45F9.5000304@oracle.com> Message-ID: For reference, current webrev: http://cr.openjdk.java.net/~roland/8139771/webrev.01/ >> As you suggested I made CheckCastPP inherit from ConstraintCast. I also hit the following bug: one iteration of a loop is peeled, which causes a CastPP to be pinned between the loop and the predicates. When a predicate that depends on the CastPP is moved out of the loop, it is moved above the CastPP. I fixed it by marking all nodes that depend on a node pinned between a loop and the predicates as not loop invariant. I don't think fixing it by moving the cast up above the predicates is a safe fix in general. > > Hmm. The test which depends on CastPP should also be peeled and it will dominate the test in the main loop. If a test/predicate could be moved from the main loop then it should be possible to use the peeled one. What do you think? Let me take another look at this. Independently: so we never apply loop predication before peeling? Otherwise moving the peeled body before the loop predicate could be incorrect, right (predicates could have been moved out of the body before it's peeled)? Roland.
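For readers less familiar with peeling: peeling one iteration copies the first iteration in front of the loop, so any check in the peeled copy runs before, and therefore dominates, the remaining iterations. That is the property Vladimir relies on above. A source-level Java sketch (C2 performs this on its IR, not on source; the method names are hypothetical):

```java
// Source-level picture of peeling the first iteration; C2 does this on its IR.
final class LoopPeelingSketch {
    static long sum(int[] a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++) {
            s += a[i];
        }
        return s;
    }

    static long sumPeeled(int[] a, int n) {
        long s = 0;
        if (0 < n) {
            s += a[0]; // peeled iteration: its range check executes before the loop
            for (int i = 1; i < n; i++) {
                s += a[i];
            }
        }
        return s;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4, 1, 5};
        System.out.println(sum(a, a.length) == sumPeeled(a, a.length)); // true
        System.out.println(sumPeeled(a, 0));                            // 0
    }
}
```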
From andreas.eriksson at oracle.com Tue Dec 15 09:35:34 2015 From: andreas.eriksson at oracle.com (Andreas Eriksson) Date: Tue, 15 Dec 2015 10:35:34 +0100 Subject: [8u-dev] backport RFR: 6869327: Add new C2 flag to keep safepoints in counted loops In-Reply-To: <566F3097.5040403@oracle.com> References: <5666E383.3010102@oracle.com> <566F3097.5040403@oracle.com> Message-ID: <566FDEE6.5040804@oracle.com> Thanks Vladimir, On 2015-12-14 22:11, Vladimir Kozlov wrote: > You 8u changes looks fine but they introduced bug in jdk9: > > https://bugs.openjdk.java.net/browse/JDK-8144935 > > You need to backport 8144935 fix too as separate changes. Thanks for pointing that out, I'd missed that follow up bug, since apparently I was not subscribed to receive notifications for 6869327. - Andreas > > Regards, > Vladimir > > On 12/8/15 6:04 AM, Andreas Eriksson wrote: >> Hi, >> >> Please review this backport of JDK-6869327: Add new C2 flag to keep >> safepoints in counted loops. >> The only change in this backport is to the test, where the testlibrary >> imports needed to be changed, and I also removed the @module tag. >> >> JDK 9 review: >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-November/020110.html >> >> >> >> Webrev for changes between 9 and 8: >> http://cr.openjdk.java.net/~aeriksso/6869327/webrev.9_to_8/ >> >> Full 8u webrev: >> http://cr.openjdk.java.net/~aeriksso/6869327/webrev.jdk8u/ >> >> Bug: 6869327: Add new C2 flag to keep safepoints in counted loops. >> https://bugs.openjdk.java.net/browse/JDK-6869327 >> >> Thanks, >> Andreas From adinn at redhat.com Tue Dec 15 09:59:44 2015 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 15 Dec 2015 09:59:44 +0000 Subject: String intrinsics defunct on arch != amd64 after 8141132: JEP 254: Compact Strings In-Reply-To: References: Message-ID: <566FE490.6020006@redhat.com> On 30/11/15 18:54, Volker Simonis wrote: > Moreover, "UseSSE42Intrinsics" is clearly a architecture-dependant > option. 
I already wondered that, according to vm_version_aarch64.cpp, it > seems to exist on aarch64 (is this really true Andrew?). But it's > surely not available on PowerPC, SPARC, ... I assume you meant the other Andrew? Anyway, you get a 2nd Andrew for free. It appears that this bogus default setting of UseSSE42Intrinsics = true was included in our JDK8 tree when Ed Nevill implemented a StrIndexOf rule in the AArch64 ad file. I guess it was done precisely to get round the problem that a StrIndexOf rule only gets used if UseSSE42Intrinsics is set. So, yes, this needs cleaning up to allow StrIndexOf nodes to be implemented in non-x86_64 cases (including aarch64) without having to set a bogus flag. regards, Andrew Dinn ----------- From martin.doerr at sap.com Tue Dec 15 10:27:14 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 15 Dec 2015 10:27:14 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <566FD7E8.7000105@oracle.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> Hi, I think this change is good with respect to concurrent Java threads. However, I'm not sure if concurrent GC may have a problem when we optimize out the memory barrier (with or without this change). Is it guaranteed that no concurrent GC will ever read an object header of such a newly allocated object? A reference to this object may get written somewhere where GC can find it. If the GC reads the header, it may read stale data. Best regards, Martin -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Aleksey Shipilev Sent: Tuesday, 15.
December 2015 10:06 To: Vladimir Kozlov ; Hui Shi ; hotspot compiler ; aarch64-port-dev Subject: Re: RFR: 8144993: Elide redundant memory barrier after AllocationNode Also, I think this is a duplicate of: https://bugs.openjdk.java.net/browse/JDK-8032481 -Aleksey On 12/15/2015 05:40 AM, Vladimir Kozlov wrote: > Very interesting! > > Please add a short statement to the comment in /macro.cpp for your case. > > The changes look fine to me. One nit could be to delay bytecode analysis > until macro expansion - it may reduce compilation time. Bytecode > analysis of each constructor could be expensive. > > Thanks, > Vladimir > > On 12/10/15 6:48 AM, Hui Shi wrote: >> Hi All, >> >> Could someone help review this change? >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8144993 >> >> webrev: http://cr.openjdk.java.net/~hshi/8144993/webrev/ >> >> This patch aims to remove the redundant memory barrier after an allocation >> node; on AArch64 it removes a redundant dmb when creating an object. The >> motivation is that the dmb instruction after commonly used object allocations >> (for example, String and boxing objects) is redundant with the dmb inserted for >> the final field write. In the following small case: >> >> String foo(String s) >> { >> String copy = new String(s); >> return copy; >> } >> >> there are two dmb instructions in the generated code. The first one is >> membar_storestore, inserted in PhaseMacroExpand::expand_allocate_common. >> The second one is membar_release, inserted at the exit of the initializer method >> because a final field write happens. The allocated String doesn't escape in the String >> initializer method, and membar_release includes membar_storestore semantics.
>> So the first one can be removed safely. >> >> 0x0000007f85bbfa8c: prfm pstl1keep, [x11,#256] >> 0x0000007f85bbfa90: str xzr, [x0,#16] >> 0x0000007f85bbfa94: dmb ishst // first dmb to remove >> .... >> 0x0000007fa01d83c0: ldrsb w10, [x20,#20] >> 0x0000007fa01d83c4: ldr w12, [x20,#16] >> 0x0000007fa01d83c8: ldr x11, [sp,#8] >> 0x0000007fa01d83cc: strb w10, [x11,#20] >> 0x0000007fa01d83d0: str w12, [x11,#16] >> 0x0000007fa01d83d4: dmb ish // second dmb >> >> The patch targets this pattern and removes the redundant memory barrier for the >> allocation node. >> >> 1. When inserting the memory barrier for a final field write, if the final field's >> object allocation node is available, invoke >> AllocationNode::compute_MemBar_redundancy(initializer method). >> >> 2. In AllocationNode: >> >> 2.1 Add a new field, _is_allocation_MemBar_redundant, a flag indicating >> whether the memory barrier after the allocation node is redundant. >> >> 2.2 Add method compute_MemBar_redundancy, which sets >> _is_allocation_MemBar_redundant to true if the first parameter, "this", does >> not escape in the initializer method according to BCEscapeAnalyzer. >> >> 3. Skip inserting the memory barrier in >> PhaseMacroExpand::expand_allocate_common when AllocationNode's >> _is_allocation_MemBar_redundant flag is true. >> >> Regards >> >> Hui >> From aph at redhat.com Tue Dec 15 10:28:49 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 10:28:49 +0000 Subject: String intrinsics defunct on arch != amd64 after 8141132: JEP 254: Compact Strings In-Reply-To: References: Message-ID: <566FEB61.6040200@redhat.com> On 30/11/15 18:54, Volker Simonis wrote: > Moreover, "UseSSE42Intrinsics" is clearly an architecture-dependent > option. I already wondered that, according to vm_version_aarch64.cpp, it > seems to exist on aarch64 (is this really true Andrew?).
But it's > surely not available on PowerPC, SPARC, ... Sure it is. The flag is just oddly-named, that's all: it means "InlineStringIndexOf". I have no idea why a name like UseSSE42Intrinsics ever made its way into the shared code. Andrew. From aph at redhat.com Tue Dec 15 10:42:17 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 10:42:17 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> Message-ID: <566FEE89.5020300@redhat.com> On 15/12/15 10:27, Doerr, Martin wrote: > I think this change is good with respect to concurrent java threads. > However, I'm not sure if concurrent GC may have a problem when we > optimize out the memory barrier (with or without this change). > > Is it guaranteed that no concurrent GC will ever read an object > header of such a newly allocated object? > A reference to this object may get written somewhere where GC can > find it. If the GC reads the header, it may read stale data. We know that the reference to the newly-created object does not escape, so it is not reachable from any reference. The only other way a GC might find it is at a safepoint. But even if that happens, a safepoint is a memory barrier. So I think we're OK. Andrew. 
From aph at redhat.com Tue Dec 15 10:48:36 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 10:48:36 +0000 Subject: RFR: 8144856 (XS): Fix assert in CompiledStaticCall::set_to_interpreted In-Reply-To: <59A392C9-D463-45B6-A274-7C3AA8A7B4FF@oracle.com> References: <566A9946.6010101@oracle.com> <59A392C9-D463-45B6-A274-7C3AA8A7B4FF@oracle.com> Message-ID: <566FF004.6020104@redhat.com> On 15/12/15 02:32, Christian Thalinger wrote: > AArch64 folk, is the fact that the second assert is different on AArch64 from all the other platforms on purpose? I'm sorry, I can't remember. Andrew. From nils.eliasson at oracle.com Tue Dec 15 12:22:34 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 15 Dec 2015 13:22:34 +0100 Subject: RFR(S): 8145345: LogCompilation output is empty after JEP165: Compiler Control Message-ID: <5670060A.6010501@oracle.com> Hi, Please review this change that fixes LogCompilation output. It changes the default value of the log option in compilerDirectives.hpp to the command-line flag value, changes how CompileCommand=log updates the value, and adds a warning if per-method logging is used but LogCompilation is not set. Bug: https://bugs.openjdk.java.net/browse/JDK-8145345 Webrev: http://cr.openjdk.java.net/~neliasso/8145345/webrev.01/ Regards, Nils Eliasson From vladimir.x.ivanov at oracle.com Tue Dec 15 12:30:52 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 15 Dec 2015 15:30:52 +0300 Subject: [9] RFR (S): 8071374: Native disassembler implementation may be not thread-safe Message-ID: <567007FC.7090601@oracle.com> http://cr.openjdk.java.net/~vlivanov/8071374/webrev.00 https://bugs.openjdk.java.net/browse/JDK-8071374 Disassembler wraps a native disassembler, which is not necessarily thread-safe. It's not a problem for -XX:+PrintAssembly since access from compilers is serialized by Compile_lock. This is no longer the case when there are calls from the runtime (e.g., with -XX:+PrintSignatureHandlers).
The problem can manifest as a failure to parse the instruction stream. The fix is to serialize access to Disassembler on tty_lock. Since most of the calls to Disassembler::decode are already performed under tty_lock (which has the lowest rank), it's too burdensome to introduce a dedicated lock and a new rank just to satisfy the deadlock detection logic. Also, some cleanups are included. Testing: failing test case from the report, JPRT. Thanks! Best regards, Vladimir Ivanov PS: I noticed that the following code usually dumps some garbage at the end of the code block:
src/share/vm/interpreter/interpreter.cpp:
void SignatureHandlerLibrary::add(const methodHandle& method) {
  ...
      tty->print_cr(" --- associated result handler ---");
      address rh_end = rh_begin;
      while (*(int*)rh_end != 0) {
        rh_end += sizeof(int);
      }
      Disassembler::decode(rh_begin, rh_end);
$ java -XX:+PrintSignatureHandlers ...
...
argument handler #0 for: static java.lang.Object.registerNatives()V (fingerprint = 349, 11 bytes generated)
0x0000000106d55e60: movabs $0x106c1b118,%rax
0x0000000106d55e6a: retq
 --- associated result handler ---
0x0000000106c1b118: retq // T_VOID: _native_abi_to_tosca[6]
0x0000000106c1b119: retq // T_FLOAT: _native_abi_to_tosca[7]
0x0000000106c1b11a: retq // T_DOUBLE: _native_abi_to_tosca[8]
// T_OBJECT/T_ARRAY: _native_abi_to_tosca[9]
0x0000000106c1b11b: mov 0x10(%rbp),%rax
0x0000000106c1b11f: retq
=== end of AbstractInterpreter::_native_abi_to_tosca[]
Garbage until (*(int*)rh_end) == 0:
0x0000000106c1b120: rex add %eax,(%rax)
0x0000000106c1b123: add %cl,%ah
0x0000000106c1b125: int3
0x0000000106c1b126: int3
0x0000000106c1b127: int3
0x0000000106c1b128: insl (%dx),%es:(%rdi)
0x0000000106c1b129: sub $0x10521,%eax
0x0000000106c1b12e: add %al,(%rax)
0x0000000106c1b130: (bad)
0x0000000106c1b131: (bad)
0x0000000106c1b132: (bad)
0x0000000106c1b133: dec %esp
0x0000000106c1b135: int3
0x0000000106c1b136: int3
0x0000000106c1b137: int3
Maybe add some padding in either CodeletMark::~CodeletMark or
TemplateInterpreterGenerator::generate_all()? From goetz.lindenmaier at sap.com Tue Dec 15 13:09:58 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 15 Dec 2015 13:09:58 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <566FEE89.5020300@redhat.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> Hi Andrew, What if it's assigned to an object that's already completely alive, but does not escape itself? Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Andrew Haley > Sent: Tuesday, 15 December 2015 11:42 > To: Doerr, Martin ; Aleksey Shipilev > ; Vladimir Kozlov > ; Hui Shi ; hotspot > compiler ; aarch64-port-dev > ; Mikael Gerdin > (mikael.gerdin at oracle.com) > > Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory > barrier after AllocationNode > > On 15/12/15 10:27, Doerr, Martin wrote: > > > I think this change is good with respect to concurrent Java threads. > > However, I'm not sure if concurrent GC may have a problem when we > > optimize out the memory barrier (with or without this change). > > > > Is it guaranteed that no concurrent GC will ever read an object > > header of such a newly allocated object? > > A reference to this object may get written somewhere where GC can > > find it. If the GC reads the header, it may read stale data. > > We know that the reference to the newly-created object does not > escape, so it is not reachable from any reference. The only other way > a GC might find it is at a safepoint. But even if that happens, a > safepoint is a memory barrier. So I think we're OK. > > Andrew.
From aph at redhat.com Tue Dec 15 13:46:13 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 13:46:13 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> Message-ID: <567019A5.1000202@redhat.com> Hi, On 12/15/2015 01:09 PM, Lindenmaier, Goetz wrote: > What if it's assigned to an object that's already completely alive, > but does not escape itself? It's not clear to me exactly what this means. However, if neither object escapes, then they are both reachable to GC only via scanning the stack, and this can happen only at safepoints. Andrew. From goetz.lindenmaier at sap.com Tue Dec 15 13:53:39 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 15 Dec 2015 13:53:39 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <567019A5.1000202@redhat.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> Hi Andrew,
here is an example:
  A a = new A();     // a does not escape
  Safepoint();       // a is known to GC
  // Concurrent GC is running.
  B b = new B(a);
where
  B(A a) {
    StoreStore barrier   // This is removed by the optimization.
    a.x = this;          // Then this is not yet initialized, but is visible to GC
    final field store
    Membar_release
  }
Best regards, Martin and Goetz.
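Goetz's pseudo-code can be restated as plain Java to make the data flow concrete. This is a minimal sketch: the classes A and B, the fields x and f, and the value 42 are hypothetical, and the barriers appear only as comments because C2 inserts them and they have no source-level form (their placement follows Hui Shi's description of the patch). The class NonLeaking is an added, equally hypothetical contrast case in which `this` is never stored outside the object under construction.

```java
// Hypothetical classes restating the A/B example; names and values are illustrative.
final class A {
    Object x; // ordinary field; B's constructor stores `this` here
}

final class B {
    final int f; // final field: per the thread, a release barrier follows at constructor exit

    B(A a, int value) {
        // (allocation of B: header and zeroing stores, then, per Hui Shi's
        //  description, a StoreStore barrier from expand_allocate_common)
        a.x = this;      // `this` leaks into an object that already exists
        this.f = value;  // final field store
        // (release barrier for the final field store)
    }
}

// Contrast case: the constructor only writes its own final field,
// so `this` is not stored anywhere outside the new object.
final class NonLeaking {
    final int f;

    NonLeaking(int value) {
        this.f = value;
    }
}

final class EscapeDemo {
    static B publishThroughA(A a) {
        return new B(a, 42);
    }

    public static void main(String[] args) {
        A a = new A();
        B b = publishThroughA(a);
        System.out.println(a.x == b);            // b is reachable via a.x
        System.out.println(b.f);
        System.out.println(new NonLeaking(7).f);
    }
}
```

As Hui Shi argues later in the thread, the store `a.x = this` should make BCEscapeAnalyzer treat `this` as escaping B's constructor, so B's allocation barrier would be kept; only a constructor shaped like NonLeaking would be a candidate for the elision.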
> -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Tuesday, 15 December 2015 14:46 > To: Lindenmaier, Goetz ; Doerr, Martin > ; Aleksey Shipilev ; > Vladimir Kozlov ; Hui Shi ; > hotspot compiler ; aarch64-port- > dev ; Mikael Gerdin > (mikael.gerdin at oracle.com) > > Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory > barrier after AllocationNode > > Hi, > > On 12/15/2015 01:09 PM, Lindenmaier, Goetz wrote: > > > What if it's assigned to an object that's already completely alive, > > but does not escape itself? > > It's not clear to me exactly what this means. However, if neither > object escapes then they are both reachable to GC only via scanning > the stack, and this can happen only at safepoints. > > Andrew. From aph at redhat.com Tue Dec 15 14:05:34 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 14:05:34 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> Message-ID: <56701E2E.5000901@redhat.com> Hi, On 12/15/2015 01:53 PM, Lindenmaier, Goetz wrote: > here an example: > > A a = new A (); // a does not escape > Safepoint(); // a is known to GC > // Concurrent GC is running. > B b = new B(a); > > where > B(A a) { > > StoreStore barrier // This is removed by the optimization. > a.x = this; // Then this is not initialized, but visible to GC > final field store > Membar_release > } Hmm, interesting.
Here we're presented with two objects that escape analysis reveals as not escaping, but both are allocated anyway and are included in the OOP map. I'd argue that once you've put an object into an OOP map to be scanned it has escaped, but that may well not be how C2 handles it. For this reachability analysis to be correct, if you put a reference to an object into any object which is reachable as a GC root, then that object surely does escape. Andrew. From roland.westrelin at oracle.com Tue Dec 15 14:12:52 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 15 Dec 2015 15:12:52 +0100 Subject: RFR(S): 8139771: Eliminating CastPP nodes at Phis when they all come from a unique input may cause crash In-Reply-To: References: <56623E4A.9040504@oracle.com> <56663B29.7050508@oracle.com> <5021FF7F-DA52-44D0-A7E5-DAEFFC5992C1@oracle.com> <566F45F9.5000304@oracle.com> Message-ID: > For reference, current webrev: > > http://cr.openjdk.java.net/~roland/8139771/webrev.01/ > >>> As you suggested, I made CheckCastPP inherit from ConstraintCast. I also hit the following bug: one iteration of a loop is peeled, which causes a CastPP to be pinned between the loop and the predicates. When a predicate that depends on the CastPP is moved out of the loop, it is moved above the CastPP. I fixed it by marking all nodes that depend on a node pinned between a loop and the predicates as not loop invariant. I don't think fixing it by moving the cast up above the predicates is a safe fix in general. >> >> Hmm. The test which depends on CastPP should also be peeled, and it will dominate the test in the main loop. If a test/predicate could be moved from the main loop, then it should be possible to use the peeled one. What do you think? > > Let me take another look at this. > Independently: so we never apply loop predication before peeling? Otherwise moving the peeled body before the loop predicate could be incorrect, right (predicates could have been moved out of the body before it's peeled)?
So it's not a peel but a partial peel. The CastPP has _carry_dependency set to true. My concern is that if _carry_dependency is true, we have lost track of why we have that CastPP, so I wonder whether there can be a hidden dependency between the CastPP and the predicates: a pass of loop predication moves stuff out of the loop, which allows some optimizations to proceed that result in the CastPP with _carry_dependency, and then the loop is partially peeled. Roland. From vitalyd at gmail.com Tue Dec 15 14:28:35 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 15 Dec 2015 09:28:35 -0500 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <56701E2E.5000901@redhat.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> Message-ID: I'm curious why you guys think `a` and/or `b` would be in the oopmap if the compiler proves they don't escape. AFAIK, both `a` and `b` will be component-wise scalar replaced. Once that's done, there's a ref from scalar replaced a.x to `b`, but `b` itself is scalar replaced. In either case, I don't see why either of these needs to be known to GC at all (which would somewhat defeat the purpose of EA to begin with). On Tue, Dec 15, 2015 at 9:05 AM, Andrew Haley wrote: > Hi, > > On 12/15/2015 01:53 PM, Lindenmaier, Goetz wrote: > > > here an example: > > > > A a = new A (); // a does not escape > > Safepoint(); // a is known to GC > > // Concurrent GC is running. > > B b = new B(a); > > > > where > > B(A a) { > > > > StoreStore barrier // This is removed by the optimization.
> > a.x = this; // Then this is not initialized, > but visible to GC > > final field store > > Membar_release > > } > > Hmm, interesting. Here we're presented with two objects which > escape analysis reveals as not escaping but both are allocated > anyway and are included in the OOP map. > > I'd argue that once you've put an object into an OOP map to be scanned > it has escaped, but that may well not be how C2 handles it. For this > reachability analysis to be correct, if you put a reference to an > object into any object which is reachable as a GC root then that object > surely does escape. > > Andrew. > From aph at redhat.com Tue Dec 15 14:33:04 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 14:33:04 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> Message-ID: <567024A0.40409@redhat.com> On 12/15/2015 02:28 PM, Vitaly Davidovich wrote: > I'm curious why you guys think `a` and/or `b` would be in the oopmap if > compiler proves they don't escape. AFAIK, both `a` and `b` will be > component-wise scalar replaced. Once that's done, there's a ref from > scalar replaced a.x to `b`, but `b` itself is scalar replaced. In either > case, I don't see why either of these need to be known to GC at all (which > would somewhat defeat the purpose of EA to begin with). Are you saying that if escape analysis determined that an object does not escape then you know *for sure* that it will always be scalar- replaced? Andrew.
From goetz.lindenmaier at sap.com Tue Dec 15 14:37:51 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 15 Dec 2015 14:37:51 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap> If the object is arg_escape, locking, barriers, etc. can be relaxed, but scalar replacement is not possible. Oop maps are needed, else these objects don't survive the GC. Goetz. From: Vitaly Davidovich [mailto:vitalyd at gmail.com] Sent: Tuesday, 15 December 2015 15:29 To: Andrew Haley Cc: Lindenmaier, Goetz ; Doerr, Martin ; Aleksey Shipilev ; Vladimir Kozlov ; Hui Shi ; hotspot compiler ; aarch64-port-dev ; Mikael Gerdin (mikael.gerdin at oracle.com) Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode I'm curious why you guys think `a` and/or `b` would be in the oopmap if compiler proves they don't escape. AFAIK, both `a` and `b` will be component-wise scalar replaced. Once that's done, there's a ref from scalar replaced a.x to `b`, but `b` itself is scalar replaced. In either case, I don't see why either of these need to be known to GC at all (which would somewhat defeat the purpose of EA to begin with). On Tue, Dec 15, 2015 at 9:05 AM, Andrew Haley > wrote: Hi, On 12/15/2015 01:53 PM, Lindenmaier, Goetz wrote: > here an example: > > A a = new A (); // a does not escape > Safepoint(); // a is known to GC > // Concurrent GC is running.
> B b = new B(a); > > where > B(A a) { > > StoreStore barrier // This is removed by the optimization. > a.x = this; // Then this is not initialized, but visible to GC > final field store > Membar_release > } Hmm, interesting. Here we're presented with two objects which escape analysis reveals as not escaping but both are allocated anyway and are included in the OOP map. I'd argue that once you've put an object into an OOP map to be scanned it has escaped, but that may well not be how C2 handles it. For this reachability analysis to be correct, if you put a reference to an object into any object which is reachable as a GC root then that object surely does escape. Andrew. From aph at redhat.com Tue Dec 15 14:42:31 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 14:42:31 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap> Message-ID: <567026D7.6080908@redhat.com> On 12/15/2015 02:37 PM, Lindenmaier, Goetz wrote: > If object arg_escape, locking, barriers etc can be relaxed, but scalar replacement is not possible. > Oop maps are needed, else these don't survive the gc. I don't know what this means. Andrew.
From hui.shi at linaro.org Tue Dec 15 14:50:38 2015 From: hui.shi at linaro.org (Hui Shi) Date: Tue, 15 Dec 2015 22:50:38 +0800 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> Message-ID: Thanks, all! In Goetz's example, suppose the outer method is named foo and objects a and b do not escape foo. b does not escape foo because a does not escape foo. But b does escape in its initializer according to BCEscapeAnalyzer: in b's initializer method, "this" should be marked as escaping because it is assigned to a field of another parameter (a.x = this). As b escapes in its initializer, the StoreStore barrier will not be removed in this case, so it's safe. Regards Hui On 15 December 2015 at 21:53, Lindenmaier, Goetz wrote: > Hi Andrew, > > here an example: > > A a = new A (); // a does not escape > Safepoint(); // a is known to GC > // Concurrent GC is running. > B b = new B(a); > > where > B(A a) { > > StoreStore barrier // This is removed by the optimization. > a.x = this; // Then this is not initialized, > but visible to GC > final field store > Membar_release > } > > Best regards, > Martin and Goetz. > > > > -----Original Message----- > > From: Andrew Haley [mailto:aph at redhat.com] > > Sent: Tuesday, 15.
December 2015 14:46 > > To: Lindenmaier, Goetz ; Doerr, Martin > > ; Aleksey Shipilev ; > > Vladimir Kozlov ; Hui Shi < > hui.shi at linaro.org>; > > hotspot compiler ; aarch64-port- > > dev ; Mikael Gerdin > > (mikael.gerdin at oracle.com) > > > > Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory > > barrier after AllocationNode > > > > Hi, > > > > On 12/15/2015 01:09 PM, Lindenmaier, Goetz wrote: > > > > > What if it's assigned to an object that's already completely alive, > > > but does not escape itself? > > > > It's not clear to me exactly what this means. However, if neither > > object escapes then they are both reachable to GC only via scanning > > the stack, and this can happen only at safepoints. > > > > Andrew. > > > > From vitalyd at gmail.com Tue Dec 15 14:51:40 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 15 Dec 2015 09:51:40 -0500 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <567024A0.40409@redhat.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> Message-ID: Hotspot implements only the scalar replacement form of EA. On Tue, Dec 15, 2015 at 9:33 AM, Andrew Haley wrote: > On 12/15/2015 02:28 PM, Vitaly Davidovich wrote: > > I'm curious why you guys think `a` and/or `b` would be in the oopmap if > > compiler proves they don't escape. AFAIK, both `a` and `b` will be > > component-wise scalar replaced. Once that's done, there's a ref from > > scalar replaced a.x to `b`, but `b` itself is scalar replaced.
In either > > case, I don't see why either of these need to be known to GC at all > (which > > would somewhat defeat the purpose of EA to begin with). > > Are you saying that if escape analysis determined that an object does > not escape then you know *for sure* that it will always be scalar- > replaced? > > Andrew. > > From goetz.lindenmaier at sap.com Tue Dec 15 14:54:23 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 15 Dec 2015 14:54:23 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <567026D7.6080908@redhat.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap> <567026D7.6080908@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEF22@DEWDFEMB12A.global.corp.sap> Hi, It's explained in escape.hpp. The proper name is 'ArgEscape'.
  typedef enum {
    UnknownEscape = 0,
    NoEscape      = 1, // An object does not escape method or thread and it is
                       // not passed to call. It could be replaced with scalar.
    ArgEscape     = 2, // An object does not escape method or thread but it is
                       // passed as argument to call or referenced by argument
                       // and it does not escape during call.
    GlobalEscape  = 3  // An object escapes the method or thread.
  } EscapeState;
I.e., an object passed to a callee that is a pure function cannot be scalar-replaced, as you have to keep the object layout to pass it down. But the callee does not publish the reference to any other thread, so we don't need to execute locks. Also, we can remove barriers.
Actually, we have seen a whole bunch of errors on PPC recently. I thought they were all related to CompressedStrings, but not all have been investigated yet. So they could also stem from "8136596: Remove aarch64: MemBarRelease when final field's allocation is NoEscape or ArgEscape" http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/6cc606e29b74 We'll investigate ... Best regards, Goetz. > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Tuesday, 15 December 2015 15:43 > To: Lindenmaier, Goetz ; Vitaly Davidovich > > Cc: Doerr, Martin ; Aleksey Shipilev > ; Vladimir Kozlov > ; Hui Shi ; hotspot > compiler ; aarch64-port-dev > ; Mikael Gerdin > (mikael.gerdin at oracle.com) > > Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory > barrier after AllocationNode > > On 12/15/2015 02:37 PM, Lindenmaier, Goetz wrote: > > If object arg_escape, locking, barriers etc can be relaxed, but scalar > replacement is not possible. > > Oop maps are needed, else these don't survive the gc. > > I don't know what this means. > > Andrew. From aph at redhat.com Tue Dec 15 14:55:38 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 14:55:38 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> Message-ID: <567029EA.5030607@redhat.com> On 12/15/2015 02:51 PM, Vitaly Davidovich wrote: > Hotspot implements only the scalar replacement form of EA. Scalar replacement is not a form of escape analysis.
This does not answer my question, which was: > Are you saying that if escape analysis determined that an object does > not escape then you know *for sure* that it will always be scalar- > replaced? Andrew. From aph at redhat.com Tue Dec 15 14:57:59 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 14:57:59 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEF22@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap> <567026D7.6080908@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEF22@DEWDFEMB12A.global.corp.sap> Message-ID: <56702A77.7040407@redhat.com> On 12/15/2015 02:54 PM, Lindenmaier, Goetz wrote: > I.e., an object passed to a callee that is a pure function > can not be scalar replaced, as you have to keep the object > layout to pass it down. > But the callee does not publish the reference to any other > thread, so we don't need to execute locks. Also, we > can remove barriers. So the answer is obvious, surely? We can elide the locks only if NoEscape. Andrew. 
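The disagreement above is about which EscapeState values are strong enough for which optimization. As a minimal sketch, the C++ below reuses the enum quoted from escape.hpp and encodes the two readings stated in this thread as predicates; the helper names are hypothetical and are not HotSpot's API. One detail worth making explicit: UnknownEscape is 0, so a bare `state <= ArgEscape` comparison would also accept it.

```cpp
#include <cassert>

// Enum values as quoted from escape.hpp in this thread.
typedef enum {
  UnknownEscape = 0,
  NoEscape      = 1,  // does not escape method or thread, not passed to a call
  ArgEscape     = 2,  // passed to a call, but does not escape during the call
  GlobalEscape  = 3   // escapes the method or thread
} EscapeState;

// Hypothetical: Goetz's reading, "known, and no worse than ArgEscape".
// UnknownEscape must be excluded explicitly because its value is 0.
bool thread_local_relaxed(EscapeState s) {
  return s != UnknownEscape && s <= ArgEscape;
}

// Hypothetical: Andrew's stricter reading, NoEscape only.
bool thread_local_strict(EscapeState s) {
  return s == NoEscape;
}

// Hypothetical: an ArgEscape object must keep its layout to be passed
// to the callee, so only NoEscape objects are scalar replacement candidates.
bool scalar_replacement_candidate(EscapeState s) {
  return s == NoEscape;
}
```

Under the relaxed reading an ArgEscape object still qualifies for lock and barrier elision; under the strict one it does not. In both readings it is never a scalar replacement candidate.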
From vitalyd at gmail.com Tue Dec 15 15:00:42 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 15 Dec 2015 10:00:42 -0500 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> Message-ID: Well, scratch what I said; I see Goetz is referring to ArgEscape form, but I was thinking we're talking about the NoEscape version given the example is quite simple. On Tue, Dec 15, 2015 at 9:51 AM, Vitaly Davidovich wrote: > Hotspot implements only the scalar replacement form of EA. > > On Tue, Dec 15, 2015 at 9:33 AM, Andrew Haley wrote: > >> On 12/15/2015 02:28 PM, Vitaly Davidovich wrote: >> > I'm curious why you guys think `a` and/or `b` would be in the oopmap if >> > compiler proves they don't escape. AFAIK, both `a` and `b` will be >> > component-wise scalar replaced. Once that's done, there's a ref from >> > scalar replaced a.x to `b`, but `b` itself is scalar replaced. In >> either >> > case, I don't see why either of these need to be known to GC at all >> (which >> > would somewhat defeat the purpose of EA to begin with). >> >> Are you saying that if escape analysis determined that an object does >> not escape then you know *for sure* that it will always be scalar- >> replaced? >> >> Andrew.
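Vitaly's description of "component-wise" scalar replacement of `a` and `b` can be pictured as a source-level rewrite. The C++ below is only an analogy under stated assumptions: the classes `A` and `B` and their fields are hypothetical, and C2 performs the transformation on its IR, not on source code.

```cpp
#include <cassert>
#include <memory>

struct B;                               // forward declaration for the a.x link
struct A { B* x = nullptr; int v = 0; };
struct B { A* a = nullptr; int w = 0; };

// Before: both objects are heap-allocated and point at each other.
int with_allocation(int v, int w) {
  auto a = std::make_unique<A>();
  auto b = std::make_unique<B>();
  a->v = v;
  b->a = a.get();
  b->w = w;
  a->x = b.get();                       // a.x refers to b
  return a->x->w + b->a->v;
}

// After (conceptually): neither object escapes, so each field becomes a
// local value, the a.x -> b link is resolved statically, and no heap
// object remains that a GC would need to find through an oop map.
int scalar_replaced(int v, int w) {
  int a_v = v;
  int b_w = w;
  return b_w + a_v;
}
```

Both functions compute the same result; the point of contention in the thread is whether C2 is always able to reach the second form once an object is classified as NoEscape.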
From vitalyd at gmail.com Tue Dec 15 15:02:23 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 15 Dec 2015 10:02:23 -0500 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap> Message-ID: Ok, as I just replied to Andrew, I hadn't considered the ArgEscape scenario. Does an oop that's ArgEscape still get allocated on heap then? On Tue, Dec 15, 2015 at 9:37 AM, Lindenmaier, Goetz < goetz.lindenmaier at sap.com> wrote: > If object arg_escape, locking, barriers etc can be relaxed, but scalar > replacement is not possible. > > Oop maps are needed, else these don't survive the gc. > > > > Goetz. > > > > *From:* Vitaly Davidovich [mailto:vitalyd at gmail.com] > *Sent:* Dienstag, 15. Dezember 2015 15:29 > *To:* Andrew Haley > *Cc:* Lindenmaier, Goetz ; Doerr, Martin < > martin.doerr at sap.com>; Aleksey Shipilev ; > Vladimir Kozlov ; Hui Shi ; > hotspot compiler ; > aarch64-port-dev ; Mikael Gerdin < > mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com) < > mikael.gerdin at oracle.com> > *Subject:* Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory > barrier after AllocationNode > > > > I'm curious why you guys think `a` and/or `b` would be in the oopmap if > compiler proves they don't escape. AFAIK, both `a` and `b` will be > component-wise scalar replaced. Once that's done, there's a ref from > scalar replaced a.x to `b`, but `b` itself is scalar replaced.
In either > case, I don't see why either of these need to be known to GC at all (which > would somewhat defeat the purpose of EA to begin with). > > > > On Tue, Dec 15, 2015 at 9:05 AM, Andrew Haley wrote: > > Hi, > > On 12/15/2015 01:53 PM, Lindenmaier, Goetz wrote: > > > here an example: > > > > A a = new A (); // a does not escape > > Safepoint(); // a is known to GC > > // Concurrent GC is running. > > B b = new B(a); > > > > where > > B(A a) { > > > > StoreStore barrier // This is removed by the optimization. > > a.x = this; // Then this is not initialized, > but visible to GC > > final field store > > Membar_release > > } > > Hmm, interesting. Here we're presented with two objects which > escape analysis reveals as not escaping but both are allocated > anyway and are included in the OOP map. > > I'd argue that once you've put an object into an OOP map to be scanned > it has escaped, but that may well not be how C2 handles it. For this > reachability analysis to be correct, if you put a reference to an > object into any object which is reachable as a GC root then that object > surely does escape. > > Andrew. From nils.eliasson at oracle.com Tue Dec 15 15:04:50 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 15 Dec 2015 16:04:50 +0100 Subject: RFR(S/M): 8144246: CompilerControl: adding lots of directives via jcmd may produce OOM crash In-Reply-To: <566F49B9.8090104@oracle.com> References: <566ED7D3.1010200@oracle.com> <566F49B9.8090104@oracle.com> Message-ID: <56702C12.5050207@oracle.com> Thank you Vladimir! I added a test update to the webrev to limit the stress test to 1000 directives. http://cr.openjdk.java.net/~neliasso/8144246/webrev.02/ Best regards, //Nils On 2015-12-14 23:59, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 12/14/15 6:53 AM, Nils Eliasson wrote: >> Hi, >> >> Please review this minor change.
It introduced a limit to how many >> directives can be added. The limit can be controlled by the diagnostic >> flag CompilerDirectivesLimit. For normal use it would be very unusual to >> have more than a few directives. >> >> The Flag PrintCompilerDirectives was changed to CompilerDirectivesPrint >> to have a consistent naming for all directives flag. This is a new flag >> and is not used anywhere yet. >> >> Testing: >> All the compiler control tests will have been run before submit. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8144246 >> Webrev: http://cr.openjdk.java.net/~neliasso/8144246/webrev.01/ >> >> Regards, >> Nils >> From pavel.punegov at oracle.com Tue Dec 15 15:09:36 2015 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Tue, 15 Dec 2015 18:09:36 +0300 Subject: RFR(S/M): 8144246: CompilerControl: adding lots of directives via jcmd may produce OOM crash In-Reply-To: <56702C12.5050207@oracle.com> References: <566ED7D3.1010200@oracle.com> <566F49B9.8090104@oracle.com> <56702C12.5050207@oracle.com> Message-ID: Nils, thanks for updating tests. It looks good Pavel. > On 15 Dec 2015, at 18:04, Nils Eliasson wrote: > > Thank you Vladimir! > > I added a test update to the webrev to limit the stress test to 1000 directives. > > http://cr.openjdk.java.net/~neliasso/8144246/webrev.02/ > > Best regards, > //Nils > > On 2015-12-14 23:59, Vladimir Kozlov wrote: >> Looks good. >> >> Thanks, >> Vladimir >> >> On 12/14/15 6:53 AM, Nils Eliasson wrote: >>> Hi, >>> >>> Please review this minor change. It introduced a limit to how many >>> directives can be added. The limit can be controlled by the diagnostic >>> flag CompilerDirectivesLimit. For normal use it would be very unusual to >>> have more than a few directives. >>> >>> The Flag PrintCompilerDirectives was changed to CompilerDirectivesPrint >>> to have a consistent naming for all directives flag. This is a new flag >>> and is not used anywhere yet. 
>>> >>> Testing: >>> All the compiler control tests will have been run before submit. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144246 >>> Webrev: http://cr.openjdk.java.net/~neliasso/8144246/webrev.01/ >>> >>> Regards, >>> Nils >>> > From vitalyd at gmail.com Tue Dec 15 15:11:00 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 15 Dec 2015 10:11:00 -0500 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <567029EA.5030607@redhat.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> Message-ID: Yes that was my fault; I had forgotten about the ArgEscape analysis result. To answer your question somewhat, if an object is NoEscape then it's scalar replaced in the end. I don't think there's any other end result in hotspot (e.g there's no stack allocation). On Tuesday, December 15, 2015, Andrew Haley wrote: > On 12/15/2015 02:51 PM, Vitaly Davidovich wrote: > > Hotspot implements only the scalar replacement form of EA. > > Scalar replacement is not a form of escape analysis. This does > not answer my question, which was: > > > Are you saying that if escape analysis determined that an object does > > not escape then you know *for sure* that it will always be scalar- > > replaced? > > Andrew. > > -- Sent from my phone
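Returning to 8144246 earlier in this digest: the change caps how many compiler directives can be added (the diagnostic flag CompilerDirectivesLimit), so that repeated jcmd additions are refused instead of growing until the VM runs out of memory. A minimal sketch of that bounded-push idea; the class and method names are hypothetical and do not reflect HotSpot's actual directives code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical bounded directive stack: refuses additions past the limit.
class DirectiveStackSketch {
 public:
  explicit DirectiveStackSketch(std::size_t limit) : limit_(limit) {}

  // Returns false and leaves the stack unchanged once the cap is reached,
  // so the caller can report an error instead of exhausting memory.
  bool try_push(const std::string& directive) {
    if (stack_.size() >= limit_) {
      return false;
    }
    stack_.push_back(directive);
    return true;
  }

  std::size_t size() const { return stack_.size(); }

 private:
  std::size_t limit_;
  std::vector<std::string> stack_;
};
```

Rejecting at the point of insertion keeps the failure local and reportable, which is also why the stress test could be bounded (to 1000 directives, per Nils) rather than run until exhaustion.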
From goetz.lindenmaier at sap.com Tue Dec 15 15:14:02 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 15 Dec 2015 15:14:02 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEF66@DEWDFEMB12A.global.corp.sap> Hi Hui That depends how BCEscapeAnalysis is implemented. I don't know this in detail. But in theory, after analyzing a callee, you represent it by some function describing its semantics. From this you would derive that both are ArgEscape in the end. Best regards, Goetz. From: Hui Shi [mailto:hui.shi at linaro.org] Sent: Dienstag, 15. Dezember 2015 15:51 To: Lindenmaier, Goetz Cc: Andrew Haley ; Doerr, Martin ; Aleksey Shipilev ; Vladimir Kozlov ; hotspot compiler ; aarch64-port-dev ; Mikael Gerdin (mikael.gerdin at oracle.com) Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode Thanks All! In Goetz's example, suppose the outer method is named foo and object a, b is not escaped in foo. b is not escaped in foo as a is not escaped in foo. But b is escaped in its initializer in BCEscapeAnalysis. In b's initializer method, "this" should be marked escaped as it is assigned to another parameter "assign to a.x". As b is escaped in its initializer, storestore barrier will not be removed in this case, so it's safe. Regards Hui On 15 December 2015 at 21:53, Lindenmaier, Goetz > wrote: Hi Andrew, here is an example:
A a = new A ();      // a does not escape
Safepoint();         // a is known to GC
                     // Concurrent GC is running.
B b = new B(a);
where
B(A a) {
  StoreStore barrier // This is removed by the optimization.
  a.x = this;        // Then this is not initialized, but visible to GC
  final field store
  Membar_release
}
Best regards, Martin and Goetz. > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Dienstag, 15. Dezember 2015 14:46 > To: Lindenmaier, Goetz >; Doerr, Martin > >; Aleksey Shipilev >; > Vladimir Kozlov >; Hui Shi >; > hotspot compiler >; aarch64-port- > dev >; Mikael Gerdin > > (mikael.gerdin at oracle.com) > > > Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory > barrier after AllocationNode > > Hi, > > On 12/15/2015 01:09 PM, Lindenmaier, Goetz wrote: > > > What if it's assigned to an object that's already completely alive, > > but does not escape itself? > > It's not clear to me exactly what this means. However, if neither > object escapes then they are both reachable to GC only via scanning > the stack, and this can happen only at safepoints. > > Andrew. From goetz.lindenmaier at sap.com Tue Dec 15 16:01:40 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 15 Dec 2015 16:01:40 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> Yes, there is no stack allocation. But locks are removed, see escape.cpp:1844, which is executed under condition not_global_escape().
Look also at callnode:1770. Also, does_not_escape_thread() used here checks for <= ArgEscape. Further, if the object is NoEscape it might not be scalar replaced. If I remember correctly, there are various conditions, e.g., too big, allocated in loop. And, the constructor could be inlined (or does this happen after expand_allocate_common()?) Best regards, Goetz. From: Vitaly Davidovich [mailto:vitalyd at gmail.com] Sent: Dienstag, 15. Dezember 2015 16:11 To: Andrew Haley Cc: Lindenmaier, Goetz ; Doerr, Martin ; Aleksey Shipilev ; Vladimir Kozlov ; Hui Shi ; hotspot compiler ; aarch64-port-dev ; Mikael Gerdin (mikael.gerdin at oracle.com) Subject: Re: RFR: 8144993: Elide redundant memory barrier after AllocationNode Yes that was my fault; I had forgotten about the ArgEscape analysis result. To answer your question somewhat, if an object is NoEscape then it's scalar replaced in the end. I don't think there's any other end result in hotspot (e.g there's no stack allocation). On Tuesday, December 15, 2015, Andrew Haley > wrote: On 12/15/2015 02:51 PM, Vitaly Davidovich wrote: > Hotspot implements only the scalar replacement form of EA. Scalar replacement is not a form of escape analysis. This does not answer my question, which was: > Are you saying that if escape analysis determined that an object does not escape then you know *for sure* that it will always be scalar- > replaced? Andrew. -- Sent from my phone
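Goetz's correction can be read as a rule: NoEscape is necessary but not sufficient for scalar replacement, and an object that is still materialized can appear in an oop map at a safepoint, which is the situation in the StoreStore example earlier in this thread. The sketch below is a hypothetical, conservative decision helper; the `too_big` and `in_loop` flags only mirror the examples Goetz recalls and are not C2's actual eligibility checks.

```cpp
#include <cassert>

enum class Escape { Unknown, No, Arg, Global };

// Hypothetical description of an allocation site. The two extra flags
// stand in for "too big" and "allocated in loop" as recalled in the thread.
struct AllocSite {
  Escape state;
  bool too_big;
  bool in_loop;
};

// NoEscape alone does not guarantee the object disappears.
bool will_be_scalar_replaced(const AllocSite& s) {
  return s.state == Escape::No && !s.too_big && !s.in_loop;
}

// Conservative: if the object is materialized at all, a concurrent GC may
// reach it through an oop map, so keep the StoreStore barrier unless the
// allocation is eliminated entirely.
bool storestore_barrier_removable(const AllocSite& s) {
  return will_be_scalar_replaced(s);
}
```

This matches the strict notion of escape argued for in the reply that follows: the object must never become visible to another thread (including the GC) by any means.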
From aph at redhat.com Tue Dec 15 16:15:08 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 16:15:08 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> Message-ID: <56703C8C.4000801@redhat.com> On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote: > Further, if the object is NoEscape it might not be scalar > replaced. If I remember correctly, there are various conditions, > e.g., too big, allocated in loop. Well, that's the killer. The definition of "escape" we need to use here is the really, truly, honest-to-goodness one: that this object never becomes visible to any other thread by any means. Unless that is so, all bets are off. In this case, what is intended is "appears in an OOP map". Andrew. From roland.schatz at oracle.com Tue Dec 15 17:01:26 2015 From: roland.schatz at oracle.com (Roland Schatz) Date: Tue, 15 Dec 2015 18:01:26 +0100 Subject: RFR: 8144704: [JVMCI] add tests for simple code installation Message-ID: <56704766.3030509@oracle.com> Hi, Please review these new unittests for the JVMCI. JIRA: https://bugs.openjdk.java.net/browse/JDK-8144704 Webrev: http://cr.openjdk.java.net/~rschatz/JDK-8144704/webrev.00/ These tests try to generate, install and execute code. In order to do this, they need a small macro assembler for each platform. The TestAssembler is implemented for AMD64 and SPARC (i.e.
all platforms that currently have the JVMCI). The tests themselves are platform independent. They test: - SimpleCodeInstallationTest.java: Installation and execution of a simple "a + b" method. - DataPatchTest.java: Installation and execution of various different implementations of "return DataPatchTest.class;" methods, using all possible combinations of data patches (narrow/wide, oop/klass pointer, inline in code or through data section). - SimpleDebugInfoTest.java: Deoptimizations to various interpreter states with constant/register/stack values. - VirtualObjectDebugInfoTest.java: Deoptimization with a virtual object graph. These tests are by far not complete, if we want complete coverage of all combinations (especially data type / native location / jvm location combinations in the DebugInfo), we probably need generated tests. Aside from combination coverage, still not tested are: - deoptimizations to more complicated VM states (e.g. while holding a lock, or with more than one inlined stack frame) - safepoints and oopmaps - invokes Thanks, Roland From aph at redhat.com Tue Dec 15 18:00:57 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 18:00:57 +0000 Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV In-Reply-To: <1449588750.5880.28.camel@mylittlepony.linaroharston> References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com> <5661CF8B.6040405@redhat.com> <1449490934.12382.49.camel@mint> <566595B5.9060400@redhat.com> <1449588750.5880.28.camel@mylittlepony.linaroharston> Message-ID: <56705559.8020900@redhat.com> On 12/08/2015 03:32 PM, Edward Nevill wrote: > OK. Thanks, I have satisfied myself that this is correct. > > New webrev @ http://cr.openjdk.java.net/~enevill/8144498/webrev.2 By the powers newly vested in me I hereby approve this patch. Andrew. 
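Roland's DataPatchTest description names three independent axes for data patches. A minimal sketch of enumerating their cross product, as a generated test might do; the enum and function names are hypothetical and not taken from the JVMCI test code. A full product has 2 * 2 * 2 = 8 cases, though the real test may skip combinations that are not meaningful.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// The three axes named in the DataPatchTest description.
enum class Width { Narrow, Wide };
enum class Kind { Oop, KlassPointer };
enum class Placement { InlineInCode, DataSection };

struct PatchCase {
  Width width;
  Kind kind;
  Placement placement;
};

// Builds the full cross product of the three axes, in nested-loop order.
std::vector<PatchCase> all_patch_cases() {
  std::vector<PatchCase> cases;
  for (Width w : {Width::Narrow, Width::Wide})
    for (Kind k : {Kind::Oop, Kind::KlassPointer})
      for (Placement p : {Placement::InlineInCode, Placement::DataSection})
        cases.push_back({w, k, p});
  return cases;
}
```

The same nested-loop pattern extends to the data type / native location / JVM location combinations that Roland suggests would need generated tests for full DebugInfo coverage.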
From vladimir.x.ivanov at oracle.com Tue Dec 15 19:21:13 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 15 Dec 2015 22:21:13 +0300 Subject: [9] RFR (XS): 8140659: C1: invokedynamic call patching violates JVMS-6.5.invokedynamic Message-ID: <56706829.1080401@oracle.com> http://cr.openjdk.java.net/~vlivanov/8140659/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8140659 C1 call site patching for invokedynamic case doesn't comply with the following statement from the JVMS (6.5.invokedynamic): "If several threads simultaneously execute the bootstrap method for the same dynamic call site, the Java Virtual Machine must choose one returned call site object and install it visibly to all threads. Any other bootstrap methods executing for the dynamic call site are allowed to complete, but their results are ignored, and the threads' execution of the dynamic call site proceeds with the chosen call site object." The patching logic tries to update corresponding constant pool entry, but doesn't care about whether it has been already resolved. The fix is to reload appendix after updating the constant pool instead of using locally computed one. It guarantees that everybody will see the same call site object. CPCE::set_method_handle_common() ensures that initialization happens only once (it is guarded by a lock on resolved_references array). Testing: failing test case, JPRT. Thanks! Best regards, Vladimir Ivanov From aph at redhat.com Tue Dec 15 19:30:06 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 19:30:06 +0000 Subject: RFR: 8145438: Guarantee failures since 8144028: Use AArch64 bit-test instructions in C2 Message-ID: <56706A3E.4040809@redhat.com> This patch touches shared code, so I need a sponsor to push it. The AArch64 bit-test instructions are limited to a 32k displacement. In some fairly unusual circumstances the range can exceed this, so we get a compile-time failure. 
This patch implements long and short variants of the patterns. The shared code I changed is in adlc. The problem there is that when we search for short and long variants of a branch we do not consider the predicates. It makes no sense at all for short and long variants to have different predicates, so I suspect this is a bug in adlc. http://cr.openjdk.java.net/~aph/8145438 Thanks, Andrew. From christian.thalinger at oracle.com Tue Dec 15 20:21:02 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 15 Dec 2015 10:21:02 -1000 Subject: RFR: 8141351 - Create tests for direct invoke instructions testing In-Reply-To: <5669955B.8060703@oracle.com> References: <5669955B.8060703@oracle.com> Message-ID: Looks good. > On Dec 10, 2015, at 5:08 AM, Dmitrij Pochepko wrote: > > Hi all, > > please review a patch for JDK-8141351 - Create tests for direct invoke instructions testing > > There was no separate jtreg tests for invokevirtual, invokespecial, invokestatic, invokeinterface and invokedynamic instructions before, so, a tests to check it with combinations of compiled, interpreted and native code were created. > > There are 8 common classes and native part in test/compiler/calls/common which contains all generic logic. Other files are just test descriptions. > Every test here basically consists of caller method and callee method, which is called using tested instruction. > So, test descriptions runs different combinations of caller and callee being compiled/interpreted/native. > > I've tested these tests on several platforms(linux/macos/solaris).
> > CR: https://bugs.openjdk.java.net/browse/JDK-8141351 > > A webrev: http://cr.openjdk.java.net/~dpochepk/8141351/webrev.01/ > > Thanks, > Dmitrij From dmitrij.pochepko at oracle.com Tue Dec 15 20:39:19 2015 From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko) Date: Tue, 15 Dec 2015 23:39:19 +0300 Subject: RFR: 8141351 - Create tests for direct invoke instructions testing In-Reply-To: References: <5669955B.8060703@oracle.com> Message-ID: <56707A77.605@oracle.com> Thank you! > Looks good. > >> On Dec 10, 2015, at 5:08 AM, Dmitrij Pochepko wrote: >> >> Hi all, >> >> please review a patch for JDK-8141351 - Create tests for direct invoke instructions testing >> >> There was no separate jtreg tests for invokevirtual, invokespecial, invokestatic, invokeinterface and invokedynamic instructions before, so, a tests to check it with combinations of compiled, interpreted and native code were created. >> >> There are 8 common classes and native part in test/compiler/calls/common which contains all generic logic. Other files are just test descriptions. >> Every test here basically consists of caller method and callee method, which is called using tested instruction. >> So, test descriptions runs different combinations of caller and callee being compiled/interpreted/native. >> >> I've tested these tests on several platforms(linux/macos/solaris). >> >> CR: https://bugs.openjdk.java.net/browse/JDK-8141351 >> >> A webrev: http://cr.openjdk.java.net/~dpochepk/8141351/webrev.01/ >> >> Thanks, >> Dmitrij From christian.thalinger at oracle.com Tue Dec 15 21:19:57 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 15 Dec 2015 11:19:57 -1000 Subject: RFR: 8144704: [JVMCI] add tests for simple code installation In-Reply-To: <56704766.3030509@oracle.com> References: <56704766.3030509@oracle.com> Message-ID: <6341D6AE-2BD4-4AA5-8327-6CA6A784630E@oracle.com> That's neat! Thanks for doing this. Looks good. Andrew, can you implement TestAssembler for AArch64?
Then we could integrate 8143072. > On Dec 15, 2015, at 7:01 AM, Roland Schatz wrote: > > Hi, > > Please review these new unittests for the JVMCI. > > JIRA: https://bugs.openjdk.java.net/browse/JDK-8144704 > Webrev: http://cr.openjdk.java.net/~rschatz/JDK-8144704/webrev.00/ > > These tests try to generate, install and execute code. In order to do this, they need a small macro assembler for each platform. > The TestAssembler is implemented for AMD64 and SPARC (i.e. all platforms that currently have the JVMCI). > > The tests themselves are platform independent. They test: > > - SimpleCodeInstallationTest.java: Installation and execution of a simple "a + b" method. > - DataPatchTest.java: Installation and execution of various different implementations of "return DataPatchTest.class;" methods, using all possible combinations of data patches (narrow/wide, oop/klass pointer, inline in code or through data section). > - SimpleDebugInfoTest.java: Deoptimizations to various interpreter states with constant/register/stack values. > - VirtualObjectDebugInfoTest.java: Deoptimization with a virtual object graph. > > These tests are by far not complete, if we want complete coverage of all combinations (especially data type / native location / jvm location combinations in the DebugInfo), we probably need generated tests. > > Aside from combination coverage, still not tested are: > - deoptimizations to more complicated VM states (e.g. 
while holding a lock, or with more than one inlined stack frame) > - safepoints and oopmaps > - invokes > > Thanks, > Roland From vladimir.kozlov at oracle.com Tue Dec 15 22:37:31 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 15 Dec 2015 14:37:31 -0800 Subject: RFR(S/M): 8144246: CompilerControl: adding lots of directives via jcmd may produce OOM crash In-Reply-To: <56702C12.5050207@oracle.com> References: <566ED7D3.1010200@oracle.com> <566F49B9.8090104@oracle.com> <56702C12.5050207@oracle.com> Message-ID: <5670962B.2070904@oracle.com> Good. Thanks, Vladimir On 12/15/15 7:04 AM, Nils Eliasson wrote: > Thank you Vladimir! > > I added a test update to the webrev to limit the stress test to 1000 directives. > > http://cr.openjdk.java.net/~neliasso/8144246/webrev.02/ > > Best regards, > //Nils > > On 2015-12-14 23:59, Vladimir Kozlov wrote: >> Looks good. >> >> Thanks, >> Vladimir >> >> On 12/14/15 6:53 AM, Nils Eliasson wrote: >>> Hi, >>> >>> Please review this minor change. It introduced a limit to how many >>> directives can be added. The limit can be controlled by the diagnostic >>> flag CompilerDirectivesLimit. For normal use it would be very unusual to >>> have more than a few directives. >>> >>> The Flag PrintCompilerDirectives was changed to CompilerDirectivesPrint >>> to have a consistent naming for all directives flag. This is a new flag >>> and is not used anywhere yet. >>> >>> Testing: >>> All the compiler control tests will have been run before submit. 
>>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144246 >>> Webrev: http://cr.openjdk.java.net/~neliasso/8144246/webrev.01/ >>> >>> Regards, >>> Nils >>> > From vladimir.kozlov at oracle.com Tue Dec 15 22:57:58 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 15 Dec 2015 14:57:58 -0800 Subject: RFR(S): 8145345: LogCompilation output is empty after JEP165: Compiler Control In-Reply-To: <5670060A.6010501@oracle.com> References: <5670060A.6010501@oracle.com> Message-ID: <56709AF6.9070701@oracle.com> Nils, Should you also check (!LogOption && LogCompilation) "compilation logging should be enabled if LogCompilation is set"? Thanks, Vladimir On 12/15/15 4:22 AM, Nils Eliasson wrote: > Hi, > > Please review this change that fixes log compilation. It changed the default value for the log option to the command > line flag value in compilerDirectives.hpp, updates how CompileCommand=log updates the value, and adds a warning if per > method logging is used but LogCompilation is not set. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8145345 > Webrev: http://cr.openjdk.java.net/~neliasso/8145345/webrev.01/ > > Regards, > Nils Eliasson From vladimir.kozlov at oracle.com Tue Dec 15 23:03:43 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 15 Dec 2015 15:03:43 -0800 Subject: [9] RFR (S): 8071374: Native disassembler implementation may be not thread-safe In-Reply-To: <567007FC.7090601@oracle.com> References: <567007FC.7090601@oracle.com> Message-ID: <56709C4F.9000309@oracle.com> Changes looks good. Thanks, Vladimir On 12/15/15 4:30 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8071374/webrev.00 > https://bugs.openjdk.java.net/browse/JDK-8071374 > > Disassembler wraps some native disassembler, which is not necessarily thread-safe. It's not a problem for > -XX:+PrintAssembly since access from compilers is serialized by Compile_lock. 
> > It is not the case anymore when there are calls from runtime (e.g., with -XX:+PrintSignatureHandlers). The problem can > manifest as a failure to parse instruction stream. > > The fix is to serialize access to Disassembler on tty_lock. > > Considering most of the calls to Disassembler::decode are performed under tty_lock (which has the lowest rank), it's too > burdensome to introduce a dedicated lock and a new rank to please deadlock detection logic. > > Also, some cleanups are included. > > Testing: failing test case from the report, JPRT. > > Thanks! > > Best regards, > Vladimir Ivanov > > PS: I noted that the following code usually dumps some garbage at the end of the code block: > > src/share/vm/interpreter/interpreter.cpp: > void SignatureHandlerLibrary::add(const methodHandle& method) { > ... > tty->print_cr(" --- associated result handler ---"); > address rh_end = rh_begin; > while (*(int*)rh_end != 0) { > rh_end += sizeof(int); > } > Disassembler::decode(rh_begin, rh_end); > > $ java -XX:+PrintSignatureHandlers ... > ... 
> argument handler #0 for: static java.lang.Object.registerNatives()V (fingerprint = 349, 11 bytes generated) > 0x0000000106d55e60: movabs $0x106c1b118,%rax > 0x0000000106d55e6a: retq > --- associated result handler --- > 0x0000000106c1b118: retq // T_VOID: _native_abi_to_tosca[6] > 0x0000000106c1b119: retq // T_FLOAT: _native_abi_to_tosca[7] > 0x0000000106c1b11a: retq // T_DOUBLE: _native_abi_to_tosca[8] > // T_OBJECT/T_ARRAY: _native_abi_to_tosca[9] > 0x0000000106c1b11b: mov 0x10(%rbp),%rax > 0x0000000106c1b11f: retq > > === end of AbstractInterpreter::_native_abi_to_tosca[] > > Garbage until (*(int*)rh_end) == 0: > > 0x0000000106c1b120: rex add %eax,(%rax) > 0x0000000106c1b123: add %cl,%ah > 0x0000000106c1b125: int3 > 0x0000000106c1b126: int3 > 0x0000000106c1b127: int3 > 0x0000000106c1b128: insl (%dx),%es:(%rdi) > 0x0000000106c1b129: sub $0x10521,%eax > 0x0000000106c1b12e: add %al,(%rax) > 0x0000000106c1b130: (bad) > 0x0000000106c1b131: (bad) > 0x0000000106c1b132: (bad) > 0x0000000106c1b133: dec %esp > 0x0000000106c1b135: int3 > 0x0000000106c1b136: int3 > 0x0000000106c1b137: int3 > > Maybe add some padding in either CodeletMark::~CodeletMark or TemplateInterpreterGenerator::generate_all()? > > From vladimir.kozlov at oracle.com Tue Dec 15 23:06:28 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 15 Dec 2015 15:06:28 -0800 Subject: RFR: 8145438: Guarantee failures since 8144028: Use AArch64 bit-test instructions in C2 In-Reply-To: <56706A3E.4040809@redhat.com> References: <56706A3E.4040809@redhat.com> Message-ID: <56709CF4.4070800@oracle.com> Looks good. I agree with adlc change. I will sponsor it. Thanks, Vladimir On 12/15/15 11:30 AM, Andrew Haley wrote: > This patch touches shared code, so I need a sponsor to push it. > > The AArch64 bit-test instructions are limited to a 32k displacement. > In some fairly unusual circumstances the range can exceed this, so we > get a compile-time failure. 
This patch implements long and short > variants of the patterns. > > The shared code I changed is in adlc. The problem there is that when > we search for short and long variants of a branch we do not consider > the predicates. It makes no sense at all for short and long variants > to have different predicates, so I suspect this is a bug in adlc. > > http://cr.openjdk.java.net/~aph/8145438 > > Thanks, > > Andrew. > From sangheon.kim at oracle.com Tue Dec 15 23:10:08 2015 From: sangheon.kim at oracle.com (sangheon) Date: Tue, 15 Dec 2015 15:10:08 -0800 Subject: RFR(s): 8144949: TestOptionsWithRanges -XX:NUMAInterleaveGranularity=2147483648 crashes VM Message-ID: <56709DD0.80808@oracle.com> Hi all, Could I get some reviews for this change? The current 32-bit binary with NUMAInterleaveGranularity=2g in server mode fires an assert on Windows because we proceed without checking for memory allocation failure in CodeCache::reserve_heap_memory. I think the constraint function can be removed by setting the maximum range to 2G (32-bit) / 8192G (64-bit). These are the maximum amounts of available memory on Windows; smaller values could be used, but I wanted to avoid adding an artificial limit. With this limit, the current constraint function for the overflow check is not needed, and we need to check for allocation failure. This issue is not reproducible in client mode because the different default value of ReservedCodeCacheSize sets SegmentedCodeCache to false, and CodeCache::reserve_heap_memory() is called only when SegmentedCodeCache is enabled. I skipped adding a test, as TestOptionsWithRanges.java is enough when combined with the nightly VM option rotation. CR: https://bugs.openjdk.java.net/browse/JDK-8144949 Webrev: http://cr.openjdk.java.net/~sangheki/8144949/webrev.00 Testing: JPRT (with TestOptionsWithRanges.java enabled), manual tests on a Windows machine (to test several option combinations).
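The two failure modes this change discusses are a size computation that can overflow 32-bit arithmetic and a memory reservation that can fail. The following is a minimal, hypothetical C++ sketch of both, not HotSpot code: `checked_align_up` and `reserve_aligned` are illustrative names, and `std::malloc` stands in for the real address-space reservation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <optional>

// Round `size` up to a multiple of `alignment` (a power of two) using
// 32-bit arithmetic, as a 32-bit VM would. Returns std::nullopt instead
// of silently wrapping around when the result does not fit in 32 bits.
std::optional<uint32_t> checked_align_up(uint32_t size, uint32_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  uint32_t mask = alignment - 1;
  if (size > UINT32_MAX - mask) {
    return std::nullopt;  // size + mask would overflow
  }
  return (size + mask) & ~mask;
}

// Hypothetical reservation helper: the caller must check for nullptr
// instead of proceeding as if the reservation had succeeded.
void* reserve_aligned(uint32_t size, uint32_t alignment) {
  std::optional<uint32_t> aligned = checked_align_up(size, alignment);
  if (!aligned) {
    return nullptr;  // the aligned request cannot even be represented
  }
  return std::malloc(*aligned);  // stand-in for a real reserve call; may return nullptr
}
```

With a 2G alignment, `checked_align_up(0x90000000u, 0x80000000u)` reports the overflow rather than wrapping the size to 0, and `reserve_aligned` turns that into an ordinary failure the caller has to handle.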
Thanks, Sangheon From sangheon.kim at oracle.com Tue Dec 15 23:47:44 2015 From: sangheon.kim at oracle.com (sangheon) Date: Tue, 15 Dec 2015 15:47:44 -0800 Subject: RFR(s): 8144949: TestOptionsWithRanges -XX:NUMAInterleaveGranularity=2147483648 crashes VM In-Reply-To: <5670A345.6030200@oracle.com> References: <56709DD0.80808@oracle.com> <5670A345.6030200@oracle.com> Message-ID: <5670A6A0.6010000@oracle.com> Hi Jesper, Thanks for the review! Sangheon On 12/15/2015 03:33 PM, Jesper Wilhelmsson wrote: > Looks good! > /Jesper > > Den 16/12/15 kl. 00:10, skrev sangheon: >> Hi all, >> >> Could I get some reviews for this change? >> >> Current 32bit binary with NUMAInterleavingGranularity=2g on server >> mode fires an >> assert on Windows as we are proceeding without memory allocation >> failure check >> at CodeCache::reserve_heap_memory. >> >> I think the constraint function can be removed with maximum range of >> 2G/8192G. >> These are the maximum available memory on Windows and smaller values >> can be used >> but I wanted to avoid adding artificial limit. With this limitation, >> current >> constraint function for overflow check is not needed. >> And we need to check allocation failure. >> >> This issue is not reproducible with client mode as the different >> default value >> of ReservedCodeCacheSize sets SegmentedCodeCache to false. And >> CodeCache::reserve_heap_memory() is called only when >> SegmentedCodeCache is enabled. >> >> Skipped adding a test as TestOptionsWithRanges.java is enough when >> this is >> combined with nightly vm option rotation. >> >> CR: https://bugs.openjdk.java.net/browse/JDK-8144949 >> Webrev: http://cr.openjdk.java.net/~sangheki/8144949/webrev.00 >> Testing: JPRT (with TestOptionsWithRanges.java enabled), manual tests >> on Windows >> machine(to test several option combination). 
>> >> Thanks, >> Sangheon >> From vladimir.kozlov at oracle.com Wed Dec 16 01:33:55 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 15 Dec 2015 17:33:55 -0800 Subject: RFR(S): 8139771: Eliminating CastPP nodes at Phis when they all come from a unique input may cause crash In-Reply-To: References: <56623E4A.9040504@oracle.com> <56663B29.7050508@oracle.com> <5021FF7F-DA52-44D0-A7E5-DAEFFC5992C1@oracle.com> <566F45F9.5000304@oracle.com> Message-ID: <5670BF83.7060907@oracle.com> On 12/15/15 1:14 AM, Roland Westrelin wrote: > For reference, current webrev: > > http://cr.openjdk.java.net/~roland/8139771/webrev.01/ > >>> As you suggested I made CheckCastPP inherit from ConstraintCast. I also hit the following bug: one iteration of a loop is peeled which causes a CastPP to be pinned between the loop and the predicates. When a predicate that depends on the CastPP is moved out of the loop, it is moved above the CastPP. I fixed by marking all nodes that depend on a node pinned between a loop and the predicates as non loop invariant. I don?t think fixing it by moving the cast up above the predicates is a safe fix in general. >> >> Hmm. The test which depends on CastPP should be also peeled and it will dominate the test in main loop. If a test/predicate could be moved from main loop then it should be possible to use peeled one. What do you think? > > Let me take another look at this. > Independently: so we never apply loop predication before peeling? Otherwise moving the peeled body before the loop predicate could be incorrect, right (predicates could have been moved out of the body before it?s peeled)? We never peel before predicates. Peeling does not know about them. The peeled iteration is placed between predicates and peeled loop head. Vladimir > > Roland. 
> From vladimir.kozlov at oracle.com Wed Dec 16 01:38:32 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 15 Dec 2015 17:38:32 -0800 Subject: RFR(M): 8145322: Code generated from unsafe loops can be slightly improved In-Reply-To: <6BBA85D7-71DE-43AD-9DA9-CA97FF99F73D@oracle.com> References: <566F8177.8080000@oracle.com> <6BBA85D7-71DE-43AD-9DA9-CA97FF99F73D@oracle.com> Message-ID: <5670C098.1030301@oracle.com> Very nice! You may need to change the code in castnode.cpp according to the new changes in 8145096 if they are pushed first (not yet). And also 32-bit, as Tobias pointed out. Thanks, Vladimir On 12/15/15 12:55 AM, Roland Westrelin wrote: > Hi Vladimir, > > Thanks for looking at this. > >> The second assembler output still has intermediate increments and also new movslq instructions. Why should it be better? > > I think there is some confusion here. There are 2 problems I'd like to fix. One is when using checkIndex. In that case, the code should be as good as regular array accesses. The first assembly dump shows it's not. The second problem is when not using checkIndex but we know the loop bounds: we should be able to do better. That's the second assembly dump. In my email I only showed assembly without my change.
With my change: > > first test case: > > 0c2 B11: # B37 B12 <- B8 B10 Loop: B11-B10 inner main of N142 Freq: 975.841 > 0c2 movq RAX, [RSI + #16 + RDI << #3] # long > 0c7 movq RBX, [R9 + #16 + RDI << #3] # long > 0cc cmpq RBX, RAX > 0cf jne B37 P=0.000000 C=7836.000000 > 0cf > 0d5 B12: # B38 B13 <- B11 Freq: 975.84 > 0d5 movq RAX, [RSI + #24 + RDI << #3] # long > 0da movq RBX, [R9 + #24 + RDI << #3] # long > 0df cmpq RBX, RAX > 0e2 jne B38 P=0.000000 C=7836.000000 > 0e2 > 0e8 B13: # B40 B14 <- B12 Freq: 975.84 > 0e8 movq RAX, [RSI + #32 + RDI << #3] # long > 0ed movq RBX, [R9 + #32 + RDI << #3] # long > 0f2 cmpq RBX, RAX > 0f5 jne B40 P=0.000000 C=7836.000000 > 0f5 > 0fb B14: # B42 B15 <- B13 Freq: 975.84 > 0fb movq RAX, [RSI + #40 + RDI << #3] # long > 100 movq RBX, [R9 + #40 + RDI << #3] # long > 105 cmpq RBX, RAX > 108 jne B42 P=0.000000 C=7836.000000 > 108 > 10e B15: # B44 B16 <- B14 Freq: 975.839 > 10e movq RAX, [RSI + #48 + RDI << #3] # long > 113 movq RBX, [R9 + #48 + RDI << #3] # long > 118 movl RDX, RDI # spill > 11a addl RDX, #4 # int > 11d cmpq RBX, RAX > 120 jne B44 P=0.000000 C=7836.000000 > 120 > 126 B16: # B39 B17 <- B15 Freq: 975.839 > 126 movq RAX, [RSI + #56 + RDI << #3] # long > 12b movq RBX, [R9 + #56 + RDI << #3] # long > 130 cmpq RBX, RAX > 133 jne B39 P=0.000000 C=7836.000000 > 133 > 139 B17: # B41 B18 <- B16 Freq: 975.838 > 139 movq RAX, [RSI + #64 + RDI << #3] # long > 13e movq RBX, [R9 + #64 + RDI << #3] # long > 143 cmpq RBX, RAX > 146 jne B41 P=0.000000 C=7836.000000 > 146 > 14c B18: # B43 B19 <- B17 Freq: 975.838 > 14c movq RAX, [RSI + #72 + RDI << #3] # long > 151 movq RBX, [R9 + #72 + RDI << #3] # long > 156 cmpq RBX, RAX > 159 jne B43 P=0.000000 C=7836.000000 > 159 > 15f B19: # B10 B20 <- B18 Freq: 975.837 > 15f movl RDX, RDI # spill > 161 addl RDX, #8 # int > 164 cmpl RDX, RBP > 166 jl B10 # loop end P=0.998980 C=7836.000000 > > > > second test case: > > 0a3 B7: # B32 B8 <- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 > 0a3 
movq RDI, [RBP + #16 + RSI << #3] # long > 0a8 movq RAX, [RDX + #16 + RSI << #3] # long > 0ad cmpq RAX, RDI > 0b0 jne B32 P=0.000000 C=7836.000000 > 0b0 > 0b6 B8: # B33 B9 <- B7 Freq: 975.842 > 0b6 movq RDI, [RBP + #24 + RSI << #3] # long > 0bb movq RAX, [RDX + #24 + RSI << #3] # long > 0c0 cmpq RAX, RDI > 0c3 jne B33 P=0.000000 C=7836.000000 > 0c3 > 0c9 B9: # B35 B10 <- B8 Freq: 975.842 > 0c9 movq RDI, [RBP + #32 + RSI << #3] # long > 0ce movq RAX, [RDX + #32 + RSI << #3] # long > 0d3 cmpq RAX, RDI > 0d6 jne B35 P=0.000000 C=7836.000000 > 0d6 > 0dc B10: # B39 B11 <- B9 Freq: 975.842 > 0dc movq RDI, [RBP + #40 + RSI << #3] # long > 0e1 movq RAX, [RDX + #40 + RSI << #3] # long > 0e6 cmpq RAX, RDI > 0e9 jne B39 P=0.000000 C=7836.000000 > 0e9 > 0ef B11: # B38 B12 <- B10 Freq: 975.841 > 0ef movq RDI, [RBP + #48 + RSI << #3] # long > 0f4 movq RAX, [RDX + #48 + RSI << #3] # long > 0f9 movl R8, RSI # spill > 0fc addl R8, #4 # int > 100 cmpq RAX, RDI > 103 jne B38 P=0.000000 C=7836.000000 > 103 > 109 B12: # B34 B13 <- B11 Freq: 975.841 > 109 movq RDI, [RBP + #56 + RSI << #3] # long > 10e movq RAX, [RDX + #56 + RSI << #3] # long > 113 cmpq RAX, RDI > 116 jne B34 P=0.000000 C=7836.000000 > 116 > 11c B13: # B36 B14 <- B12 Freq: 975.84 > 11c movq RDI, [RBP + #64 + RSI << #3] # long > 121 movq RAX, [RDX + #64 + RSI << #3] # long > 126 cmpq RAX, RDI > 129 jne B36 P=0.000000 C=7836.000000 > 129 > 12f B14: # B38 B15 <- B13 Freq: 975.84 > 12f movq RDI, [RBP + #72 + RSI << #3] # long > 134 movq RAX, [RDX + #72 + RSI << #3] # long > 139 movl R8, RSI # spill > 13c addl R8, #7 # int > 140 cmpq RAX, RDI > 143 jne B38 P=0.000000 C=7836.000000 > 143 > 149 B15: # B7 B16 <- B14 Freq: 975.839 > 149 addl RSI, #8 # int > 14c cmpl RSI, R11 > 14f jl B7 # loop end P=0.998980 C=7836.000000 > > Roland. 
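The rewrite discussed in this thread, turning ConvI2L(AddI(x, y)) into AddL(ConvI2L(x), ConvI2L(y)) so that the constant offset can fold into the address, is only valid when x + y cannot overflow an int. That is why C2 needs to know the bounds of the induction variable. The small C++ sketch below is not C2 code. It models Java's wrapping int addition to show where the two forms diverge.

```cpp
#include <cassert>
#include <cstdint>

// Java int addition wraps on overflow. Unsigned addition is well defined in
// C++; converting back to int32_t wraps modulo 2^32 on mainstream compilers
// (and is guaranteed to since C++20).
int32_t java_add_int(int32_t x, int32_t y) {
  return static_cast<int32_t>(static_cast<uint32_t>(x) + static_cast<uint32_t>(y));
}

// ConvI2L(AddI(x, y)): add in 32 bits, then sign-extend to 64 bits.
int64_t conv_after_add(int32_t x, int32_t y) {
  return static_cast<int64_t>(java_add_int(x, y));
}

// AddL(ConvI2L(x), ConvI2L(y)): sign-extend first, then add in 64 bits.
int64_t add_after_conv(int32_t x, int32_t y) {
  return static_cast<int64_t>(x) + static_cast<int64_t>(y);
}

// The rewrite preserves the value exactly when the true sum fits in an int,
// which the compiler can only prove if it knows the ranges of x and y.
bool rewrite_is_safe(int32_t x, int32_t y) {
  int64_t exact = add_after_conv(x, y);
  return exact >= INT32_MIN && exact <= INT32_MAX;
}
```

In the unrolled loops above, x is the induction variable and y is a small constant. Once the loop bounds show that the index stays below the limit, the sum cannot overflow, and the offset can become part of the addressing mode (for example `[RBP + #24 + RSI << #3]`).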
> >> >> Thanks, >> Vladimir >> >> On 12/14/15 8:42 AM, Roland Westrelin wrote: >>> http://cr.openjdk.java.net/~roland/8145322/webrev.00/ >>> >>> Paul spotted the following small inefficiencies: >>> >>> for (; wi < l; wi++) { >>> long bi = ((long) Objects.checkIndex(wi, l, null)) << LOG2_ARRAY_LONG_INDEX_SCALE; >>> long av = U.getLongUnaligned(a, aOffset + bi); >>> long bv = U.getLongUnaligned(b, bOffset + bi); >>> if (av != bv) { >>> >>> is compiled to: >>> >>> 0b0 B9: # B28 B10 <- B8 B13 Loop: B9-B13 inner main of N130 Freq: 977.661 >>> 0b0 movl RDX, RDI # spill >>> 0b2 # castII of RDX >>> 0b2 movq RBX, [R9 + #16 + RDX << #3] # long >>> 0b7 movq RAX, [RSI + #16 + RDX << #3] # long >>> 0bc cmpq RBX, RAX >>> 0bf jne B28 P=0.000000 C=7836.000000 >>> 0bf >>> 0c5 B10: # B28 B11 <- B9 Freq: 977.66 >>> 0c5 movl RDX, RDI # spill >>> 0c7 incl RDX # int >>> 0c9 # castII of RDX >>> 0c9 movq RBX, [R9 + #16 + RDX << #3] # long >>> 0ce movq RAX, [RSI + #16 + RDX << #3] # long >>> 0d3 cmpq RBX, RAX >>> 0d6 jne B28 P=0.000000 C=7836.000000 >>> 0d6 >>> 0dc B11: # B28 B12 <- B10 Freq: 977.66 >>> 0dc movl RDX, RDI # spill >>> 0de addl RDX, #2 # int >>> 0e1 # castII of RDX >>> 0e1 movq RBX, [R9 + #16 + RDX << #3] # long >>> 0e6 movq RAX, [RSI + #16 + RDX << #3] # long >>> 0eb cmpq RBX, RAX >>> 0ee jne B28 P=0.000000 C=7836.000000 >>> 0ee >>> 0f4 B12: # B28 B13 <- B11 Freq: 977.659 >>> 0f4 movl RDX, RDI # spill >>> 0f6 addl RDX, #3 # int >>> 0f9 # castII of RDX >>> 0f9 movq RBX, [R9 + #16 + RDX << #3] # long >>> 0fe movq RAX, [RSI + #16 + RDX << #3] # long >>> 103 cmpq RBX, RAX >>> 106 jne B28 P=0.000000 C=7836.000000 >>> 106 >>> 10c B13: # B9 B14 <- B12 Freq: 977.659 >>> 10c addl RDI, #4 # int >>> 10f cmpl RDI, RBP >>> 111 jl,s B9 # loop end P=0.998980 C=7836.000000 >>> >>> But the intermediate increment of the induction variable: >>> 0c7 incl RDX # int >>> 0de addl RDX, #2 # int >>> 0f6 addl RDX, #3 # int >>> >>> should be folded in the address computation of the memory accesses: 
ConvI2L(AddI(x, y)) should be converted to AddL(ConvI2L(x), ConvI2L(y)) but there?s a CastII from the checkIndex between the AddI and the ConvI2L so we first need to push the CastII through the AddI. That?s the first CastIINode::Ideal transformation. If we apply that transformation we then have several CastII that only differ by their type so we need the second transformation of CastIINode::Ideal so all of them fold after loop opts. >>> >>> for (; wi < length >> valuesPerWidth; wi++) { >>> long bi = ((long) wi) << LOG2_ARRAY_LONG_INDEX_SCALE; >>> long av = U.getLongUnaligned(a, aOffset + bi); >>> long bv = U.getLongUnaligned(b, bOffset + bi); >>> if (av != bv) { >>> >>> 0b0 B7: # B32 B8 <- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 >>> 0b0 movslq R8, RSI # i2l >>> 0b3 movq RAX, [RDX + #16 + R8 << #3] # long >>> 0b8 movq RDI, [RBP + #16 + R8 << #3] # long >>> 0bd cmpq RAX, RDI >>> 0c0 jne B32 P=0.000000 C=7836.000000 >>> 0c0 >>> 0c6 B8: # B33 B9 <- B7 Freq: 975.842 >>> 0c6 movl R8, RSI # spill >>> 0c9 incl R8 # int >>> 0cc movslq RDI, R8 # i2l >>> 0cf movq RAX, [RDX + #16 + RDI << #3] # long >>> 0d4 movq RDI, [RBP + #16 + RDI << #3] # long >>> 0d9 cmpq RAX, RDI >>> 0dc jne B33 P=0.000000 C=7836.000000 >>> 0dc >>> 0e2 B9: # B33 B10 <- B8 Freq: 975.842 >>> 0e2 movl R8, RSI # spill >>> 0e5 addl R8, #2 # int >>> 0e9 movslq RDI, R8 # i2l >>> 0ec movq RAX, [RDX + #16 + RDI << #3] # long >>> 0f1 movq RDI, [RBP + #16 + RDI << #3] # long >>> 0f6 cmpq RAX, RDI >>> 0f9 jne B33 P=0.000000 C=7836.000000 >>> 0f9 >>> 0ff B10: # B33 B11 <- B9 Freq: 975.842 >>> 0ff movl R8, RSI # spill >>> 102 addl R8, #3 # int >>> 106 movslq RDI, R8 # i2l >>> 109 movq RAX, [RDX + #16 + RDI << #3] # long >>> 10e movq RDI, [RBP + #16 + RDI << #3] # long >>> 113 cmpq RAX, RDI >>> 116 jne B33 P=0.000000 C=7836.000000 >>> 116 >>> 11c B11: # B33 B12 <- B10 Freq: 975.841 >>> 11c movl R8, RSI # spill >>> 11f addl R8, #4 # int >>> 123 movslq RDI, R8 # i2l >>> 126 movq RAX, [RDX + #16 + RDI << #3] 
# long >>> 12b movq RDI, [RBP + #16 + RDI << #3] # long >>> 130 cmpq RAX, RDI >>> 133 jne B33 P=0.000000 C=7836.000000 >>> 133 >>> 139 B12: # B33 B13 <- B11 Freq: 975.841 >>> 139 movl R8, RSI # spill >>> 13c addl R8, #5 # int >>> 140 movslq RDI, R8 # i2l >>> 143 movq RAX, [RDX + #16 + RDI << #3] # long >>> 148 movq RDI, [RBP + #16 + RDI << #3] # long >>> 14d cmpq RAX, RDI >>> 150 jne B33 P=0.000000 C=7836.000000 >>> 150 >>> 156 B13: # B33 B14 <- B12 Freq: 975.84 >>> 156 movl R8, RSI # spill >>> 159 addl R8, #6 # int >>> 15d movslq RDI, R8 # i2l >>> 160 movq RAX, [RDX + #16 + RDI << #3] # long >>> 165 movq RDI, [RBP + #16 + RDI << #3] # long >>> 16a cmpq RAX, RDI >>> 16d jne B33 P=0.000000 C=7836.000000 >>> 16d >>> 173 B14: # B33 B15 <- B13 Freq: 975.84 >>> 173 movl R8, RSI # spill >>> 176 addl R8, #7 # int >>> 17a movslq RDI, R8 # i2l >>> 17d movq RAX, [RDX + #16 + RDI << #3] # long >>> 182 movq RDI, [RBP + #16 + RDI << #3] # long >>> 187 cmpq RAX, RDI >>> 18a jne B33 P=0.000000 C=7836.000000 >>> 18a >>> 190 B15: # B7 B16 <- B14 Freq: 975.839 >>> 190 addl RSI, #8 # int >>> 193 cmpl RSI, R11 >>> 196 jl B7 # loop end P=0.998980 C=7836.000000 >>> >>> Same as above the intermediate increment of the induction variable should fold into the address computation but ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) is not applied because the compiler loses track of the bounds of the induction variable. The i2l conversions should also fold into the address computations but they don?t for the same reason. The change in loopnode.cpp tries to work around the problem by capturing the bounds of the loop as soon the CountedLoop is created and before other transformations applied to the loop makes it much harder for the compiler to figure the bounds out. I also relaxed the Phi type computation in PhiNode::Value(). >>> >>> I hit a couple unrelated bugs during testing: the fix in x86_64.ad is obvious. 
The change to superword is because we sometimes end up there with an AddL while, as I understand, we only expect integer nodes. Using the AddL leads to broken graphs. >>> >>> Roland. >>> > From roland.westrelin at oracle.com Wed Dec 16 08:49:42 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 16 Dec 2015 09:49:42 +0100 Subject: RFR(S): 8139771: Eliminating CastPP nodes at Phis when they all come from a unique input may cause crash In-Reply-To: <5670BF83.7060907@oracle.com> References: <56623E4A.9040504@oracle.com> <56663B29.7050508@oracle.com> <5021FF7F-DA52-44D0-A7E5-DAEFFC5992C1@oracle.com> <566F45F9.5000304@oracle.com> <5670BF83.7060907@oracle.com> Message-ID: >> For reference, current webrev: >> >> http://cr.openjdk.java.net/~roland/8139771/webrev.01/ >> >>>> As you suggested I made CheckCastPP inherit from ConstraintCast. I also hit the following bug: one iteration of a loop is peeled which causes a CastPP to be pinned between the loop and the predicates. When a predicate that depends on the CastPP is moved out of the loop, it is moved above the CastPP. I fixed by marking all nodes that depend on a node pinned between a loop and the predicates as non loop invariant. I don?t think fixing it by moving the cast up above the predicates is a safe fix in general. >>> >>> Hmm. The test which depends on CastPP should be also peeled and it will dominate the test in main loop. If a test/predicate could be moved from main loop then it should be possible to use peeled one. What do you think? >> >> Let me take another look at this. >> Independently: so we never apply loop predication before peeling? Otherwise moving the peeled body before the loop predicate could be incorrect, right (predicates could have been moved out of the body before it?s peeled)? > > We never peel before predicates. Peeling does not know about them. The peeled iteration is placed between predicates and peeled loop head. 
The comment in PhaseIdealLoop::do_peeling() implies that the peeled iteration is above the predicates. We can apply loop predication and then peeling. If the peeled iteration is above the predicates, isn't there a risk that the peeled iteration is executed before a predicate it depends on for correctness? Roland. > > Vladimir > >> >> Roland. From roland.westrelin at oracle.com Wed Dec 16 09:00:39 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 16 Dec 2015 10:00:39 +0100 Subject: [9] RFR (XS): 8140659: C1: invokedynamic call patching violates JVMS-6.5.invokedynamic In-Reply-To: <56706829.1080401@oracle.com> References: <56706829.1080401@oracle.com> Message-ID: <5FC424D5-7785-4106-B7FF-AF489427CEDB@oracle.com> > http://cr.openjdk.java.net/~vlivanov/8140659/webrev.00/ That looks good to me. Maybe we can use this opportunity to remove C1PatchInvokeDynamic now that that code has been tested for a while? Roland. From hui.shi at linaro.org Wed Dec 16 12:27:00 2015 From: hui.shi at linaro.org (Hui Shi) Date: Wed, 16 Dec 2015 20:27:00 +0800 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <56703C8C.4000801@redhat.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> Message-ID: Thanks Andrew, Goetz and all! The major concern is whether removing the storestore barrier could cause other threads to read stale data for a newly allocated object. Other threads include Java threads and concurrent GC threads. It should be safe according to the following analysis. 1.
If BCEA result "this"(b) escapes in its initializer, the change will not optimize the storestore barrier.
2. If BCEA result "this"(b) does not escape in its initializer, it's safe to remove the storestore barrier.
2.1 If there is a safepoint between storestore and release, b is visible to GC in the initializer, but there should be a memory barrier at the safepoint.
2.2 If there is no safepoint between storestore and release, b will be visible to other threads after the release memory barrier.

Case #1

A a = new A();
safepoint          // a can be reached from GC
new B(a)

allocation
-------
b.klass = ...
b.markword = ...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
a.x = this;        // b might be visible to other threads here
....
release
-------- init end

The BCEA result indicates that "this" (b) is not local and not arg_stack, so b is treated as escaping in its initializer and the change will not optimize the storestore barrier.

[EA] estimated escape information for B::
  non-escaping args: {}
  stack-allocatable args: {1}
  return non-local value
  modified args: 0x6 0x6
  flags:

b ("this") is not local and not arg_stack.
a is arg_stack, which means it is passed in and not assigned to another object in the initializer.

Case #2.1

allocation
-------
b.klass = ...
b.markword = ...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
safepoint          // "this" is in the oop map and might be visible to the GC thread here
....
release
-------- init end

Case #2.2

allocation
-------
b.klass = ...
b.markword = ...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
release
-------- init end

Regards,
Hui

On 16 December 2015 at 00:15, Andrew Haley wrote:
> On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote:
>
> > Further, if the object is NoEscape it might not be scalar
> > replaced. If I remember correctly, there are various conditions,
> > e.g., too big, allocated in loop.
>
> Well, that's the killer.
The definition of "escape" we need to use > here is the really, truly, honest-to-goodness one: that this object > never becomes visible to any other thread by any means. Unless that > is so, all bets are off. In this case, what is intended is "appears > in an OOP map". > > Andrew. > From roland.westrelin at oracle.com Wed Dec 16 12:53:35 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 16 Dec 2015 13:53:35 +0100 Subject: RFR(S): 8139771: Eliminating CastPP nodes at Phis when they all come from a unique input may cause crash In-Reply-To: References: <56623E4A.9040504@oracle.com> <56663B29.7050508@oracle.com> <5021FF7F-DA52-44D0-A7E5-DAEFFC5992C1@oracle.com> <566F45F9.5000304@oracle.com> Message-ID: >> For reference, current webrev: >> >> http://cr.openjdk.java.net/~roland/8139771/webrev.01/ >> >>>> As you suggested I made CheckCastPP inherit from ConstraintCast. I also hit the following bug: one iteration of a loop is peeled which causes a CastPP to be pinned between the loop and the predicates. When a predicate that depends on the CastPP is moved out of the loop, it is moved above the CastPP. I fixed by marking all nodes that depend on a node pinned between a loop and the predicates as non loop invariant. I don't think fixing it by moving the cast up above the predicates is a safe fix in general. >>> >>> Hmm. The test which depends on CastPP should be also peeled and it will dominate the test in main loop. If a test/predicate could be moved from main loop then it should be possible to use peeled one. What do you think? >> >> Let me take another look at this. >> Independently: so we never apply loop predication before peeling? Otherwise moving the peeled body before the loop predicate could be incorrect, right (predicates could have been moved out of the body before it's peeled)? > > So it's not a peel but a partial peel.
The CastPP has _carry_dependency set to true. My concern is that if _carry_dependency is true, we lost track of why we have that CastPP, so I wonder if there can be a hidden dependency between the CastPP and the predicates: a pass of loop predication moves stuff out of the loop, that allowed some optimizations to proceed that resulted in the CastPP with _carry_dependency, and then the loop is partially peeled.

An alternate way to fix this could be:

diff --git a/src/share/vm/opto/castnode.cpp b/src/share/vm/opto/castnode.cpp
--- a/src/share/vm/opto/castnode.cpp
+++ b/src/share/vm/opto/castnode.cpp
@@ -79,7 +79,29 @@
 // Return a node which is more "ideal" than the current node. Strip out
 // control copies
 Node *ConstraintCastNode::Ideal(PhaseGVN *phase, bool can_reshape) {
-  return (in(0) && remove_dead_region(phase, can_reshape)) ? this : NULL;
+  Node* c = in(0);
+  if (c != NULL && remove_dead_region(phase, can_reshape)) {
+    return this;
+  }
+  if (LoopLimitCheck &&
+      c != NULL &&
+      c->is_Proj() &&
+      c->as_Proj()->is_uncommon_trap_if_pattern(Deoptimization::Reason_loop_limit_check)) {
+    c = c->in(0)->in(0);
+  }
+  if (UseLoopPredicate &&
+      c != NULL &&
+      c->is_Proj() &&
+      c->as_Proj()->is_uncommon_trap_if_pattern(Deoptimization::Reason_predicate) &&
+      c->in(0)->in(1)->Opcode() == Op_Conv2B &&
+      c->in(0)->in(1)->in(1)->Opcode() == Op_Opaque1) {
+    c = c->in(0)->in(0);
+  }
+  if (c != in(0)) {
+    set_req(0, c);
+    return this;
+  }
+  return NULL;
 }
 
 uint ConstraintCastNode::cmp(const Node &n) const {

That is, move the CastPP above the predicates if no predicates were added yet. It fixes the failure I see, but it's unclear to me that it covers all cases.

Roland.
From vladimir.x.ivanov at oracle.com Wed Dec 16 13:29:06 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 16 Dec 2015 16:29:06 +0300 Subject: [9] RFR (S): 8071374: Native disassembler implementation may be not thread-safe In-Reply-To: <5670ACD3.1050404@oracle.com> References: <567007FC.7090601@oracle.com> <5670ACD3.1050404@oracle.com> Message-ID: <56716722.7080703@oracle.com> Ioi, Sorry for the confusion, the fix does exactly that - adds ttyLocker in all 3 Disassembler::decode variants [1]. Other changes are cleanups - use ttyLocker for multi-line dumps. Also, what do you think about result handler output w/ -XX:+PrintSignatureHandlers? Leave it as is? Best regards, Vladimir Ivanov [1] http://cr.openjdk.java.net/~vlivanov/8071374/webrev.00/src/share/vm/compiler/disassembler.cpp.udiff.html On 12/16/15 3:14 AM, Ioi Lam wrote: > Hi Vladimir, > > Thanks for fixing this bug. I have a suggestion: instead of doing this: > > 2642 ttyLocker ttyl; > 2643 tty->print_cr("implicit exception happened at " INTPTR_FORMAT, > p2i(pc)); > 2644 print(); > 2645 method()->print_codes(); > 2646 print_code(); > 2647 print_pcs(); > > Maybe the ttyLocker should be acquired inside Disassembler::decode? > > Your code has the advantage of keeping the related info in the same part > of the print-out, but that's a different problem than the one you want > to solve in this bug. If another thread forgets to take the tty lock > before calling Disassembler::decode, it would still cause your code to > crash. > > Thanks > - Ioi > > > On 12/15/15 4:30 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8071374/webrev.00 >> https://bugs.openjdk.java.net/browse/JDK-8071374 >> >> Disassembler wraps some native disassembler, which is not necessarily >> thread-safe. It's not a problem for -XX:+PrintAssembly since access >> from compilers is serialized by Compile_lock. 
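Taking the tty lock inside every `Disassembler::decode` variant works even though many callers already hold it, presumably because the same thread can re-acquire `ttyLocker` without deadlocking. The following minimal C++ sketch shows that pattern under that assumption. `std::recursive_mutex` stands in for the tty lock, and the decoder class and function names are hypothetical, not HotSpot's.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Stand-in for the VM-wide tty lock. It must allow re-acquisition by the
// owning thread because many callers already hold it when calling decode().
std::recursive_mutex tty_lock;

// Hypothetical wrapper around a native disassembler whose internal state is
// shared between calls and therefore not thread-safe.
class NonThreadSafeDecoder {
 public:
  std::string decode(const std::vector<unsigned char>& code) {
    scratch_.clear();  // shared scratch state: racing callers would corrupt it
    for (unsigned char b : code) {
      scratch_ += "byte ";
      scratch_ += std::to_string(b);
      scratch_ += '\n';
    }
    return scratch_;
  }

 private:
  std::string scratch_;
};

NonThreadSafeDecoder native_decoder;

// Every entry point takes the lock itself, so a caller that forgets to
// lock is still serialized.
std::string decode(const std::vector<unsigned char>& code) {
  std::lock_guard<std::recursive_mutex> guard(tty_lock);
  return native_decoder.decode(code);
}

// A caller printing a multi-line dump holds the lock across several calls so
// the output is not interleaved; the nested acquisition in decode() succeeds
// because the mutex is recursive.
std::string print_dump(const std::vector<unsigned char>& a,
                       const std::vector<unsigned char>& b) {
  std::lock_guard<std::recursive_mutex> guard(tty_lock);
  std::string out = "--- first block ---\n";
  out += decode(a);
  out += "--- second block ---\n";
  out += decode(b);
  return out;
}
```

A non-recursive mutex here would deadlock in `print_dump`. That is the practical reason to reuse an existing re-entrant lock rather than introduce a dedicated one.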
>> >> It is not the case anymore when there are calls from runtime (e.g., >> with -XX:+PrintSignatureHandlers). The problem can manifest as a >> failure to parse instruction stream. >> >> The fix is to serialize access to Disassembler on tty_lock. >> >> Considering most of the calls to Disassembler::decode are performed >> under tty_lock (which has the lowest rank), it's too burdensome to >> introduce a dedicated lock and a new rank to please deadlock detection >> logic. >> >> Also, some cleanups are included. >> >> Testing: failing test case from the report, JPRT. >> >> Thanks! >> >> Best regards, >> Vladimir Ivanov >> >> PS: I noted that the following code usually dumps some garbage at the >> end of the code block: >> >> src/share/vm/interpreter/interpreter.cpp: >> void SignatureHandlerLibrary::add(const methodHandle& method) { >> ... >> tty->print_cr(" --- associated result handler ---"); >> address rh_end = rh_begin; >> while (*(int*)rh_end != 0) { >> rh_end += sizeof(int); >> } >> Disassembler::decode(rh_begin, rh_end); >> >> $ java -XX:+PrintSignatureHandlers ... >> ... 
>> argument handler #0 for: static java.lang.Object.registerNatives()V >> (fingerprint = 349, 11 bytes generated) >> 0x0000000106d55e60: movabs $0x106c1b118,%rax >> 0x0000000106d55e6a: retq >> --- associated result handler --- >> 0x0000000106c1b118: retq // T_VOID: _native_abi_to_tosca[6] >> 0x0000000106c1b119: retq // T_FLOAT: _native_abi_to_tosca[7] >> 0x0000000106c1b11a: retq // T_DOUBLE: _native_abi_to_tosca[8] >> // T_OBJECT/T_ARRAY: _native_abi_to_tosca[9] >> 0x0000000106c1b11b: mov 0x10(%rbp),%rax >> 0x0000000106c1b11f: retq >> >> === end of AbstractInterpreter::_native_abi_to_tosca[] >> >> Garbage until (*(int*)rh_end) == 0: >> >> 0x0000000106c1b120: rex add %eax,(%rax) >> 0x0000000106c1b123: add %cl,%ah >> 0x0000000106c1b125: int3 >> 0x0000000106c1b126: int3 >> 0x0000000106c1b127: int3 >> 0x0000000106c1b128: insl (%dx),%es:(%rdi) >> 0x0000000106c1b129: sub $0x10521,%eax >> 0x0000000106c1b12e: add %al,(%rax) >> 0x0000000106c1b130: (bad) >> 0x0000000106c1b131: (bad) >> 0x0000000106c1b132: (bad) >> 0x0000000106c1b133: dec %esp >> 0x0000000106c1b135: int3 >> 0x0000000106c1b136: int3 >> 0x0000000106c1b137: int3 >> >> Maybe add some padding in either CodeletMark::~CodeletMark or >> TemplateInterpreterGenerator::generate_all()? >> >> > From vladimir.x.ivanov at oracle.com Wed Dec 16 13:37:09 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 16 Dec 2015 16:37:09 +0300 Subject: [9] RFR (XS): 8140659: C1: invokedynamic call patching violates JVMS-6.5.invokedynamic In-Reply-To: <5FC424D5-7785-4106-B7FF-AF489427CEDB@oracle.com> References: <56706829.1080401@oracle.com> <5FC424D5-7785-4106-B7FF-AF489427CEDB@oracle.com> Message-ID: <56716905.8020707@oracle.com> Thanks, Roland. > That looks good to me. > Maybe we can use this opportunity to remove C1PatchInvokeDynamic now that that code has been tested for a while? Good point. 
Updated version: http://cr.openjdk.java.net/~vlivanov/8140659/webrev.01/ Best regards, Vladimir Ivanov From nils.eliasson at oracle.com Wed Dec 16 13:37:51 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 16 Dec 2015 14:37:51 +0100 Subject: RFR(S): 8145345: LogCompilation output is empty after JEP165: Compiler Control In-Reply-To: <56709AF6.9070701@oracle.com> References: <5670060A.6010501@oracle.com> <56709AF6.9070701@oracle.com> Message-ID: <5671692F.40304@oracle.com> Vladimir, Consider -XX:+LogCompilation -XX:CompileCommand=log,apatternthatmaynevermatch.amethod LogCompilation will be on, but the LogOption (set by command or directive) will be false. The logs will still contain nmethod installs and compilation statistics. That's why no tests using LogCompilation failed - they all use the information in the compilation statistics. Regards, Nils On 2015-12-15 23:57, Vladimir Kozlov wrote: > Nils, > > Should you also check (!LogOption && LogCompilation) "compilation > logging should be enabled if LogCompilation is set"? > > Thanks, > Vladimir > > On 12/15/15 4:22 AM, Nils Eliasson wrote: >> Hi, >> >> Please review this change that fixes log compilation. It changed the >> default value for the log option to the command >> line flag value in compilerDirectives.hpp, updates how >> CompileCommand=log updates the value, and adds a warning if per >> method logging is used but LogCompilation is not set. 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8145345 >> Webrev: http://cr.openjdk.java.net/~neliasso/8145345/webrev.01/ >> >> Regards, >> Nils Eliasson From roland.westrelin at oracle.com Wed Dec 16 13:40:30 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 16 Dec 2015 14:40:30 +0100 Subject: [9] RFR (XS): 8140659: C1: invokedynamic call patching violates JVMS-6.5.invokedynamic In-Reply-To: <56716905.8020707@oracle.com> References: <56706829.1080401@oracle.com> <5FC424D5-7785-4106-B7FF-AF489427CEDB@oracle.com> <56716905.8020707@oracle.com> Message-ID: <81B0564B-A704-4AD7-B353-202F96FA1065@oracle.com> >> Maybe we can use this opportunity to remove C1PatchInvokeDynamic now that that code has been tested for a while? > Good point. Updated version: Thanks. > http://cr.openjdk.java.net/~vlivanov/8140659/webrev.01/ Still good. Roland. From vladimir.x.ivanov at oracle.com Wed Dec 16 13:52:39 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 16 Dec 2015 16:52:39 +0300 Subject: [9] RFR (XS): 8140659: C1: invokedynamic call patching violates JVMS-6.5.invokedynamic In-Reply-To: <81B0564B-A704-4AD7-B353-202F96FA1065@oracle.com> References: <56706829.1080401@oracle.com> <5FC424D5-7785-4106-B7FF-AF489427CEDB@oracle.com> <56716905.8020707@oracle.com> <81B0564B-A704-4AD7-B353-202F96FA1065@oracle.com> Message-ID: <56716CA7.7020006@oracle.com> Thank you, Roland. Best regards, Vladimir Ivanov On 12/16/15 4:40 PM, Roland Westrelin wrote: >>> Maybe we can use this opportunity to remove C1PatchInvokeDynamic now that that code has been tested for a while? >> Good point. Updated version: > > Thanks. > >> http://cr.openjdk.java.net/~vlivanov/8140659/webrev.01/ > > Still good. > > Roland. 
> From roland.westrelin at oracle.com Wed Dec 16 13:54:41 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 16 Dec 2015 14:54:41 +0100 Subject: RFR(XS): 8144851: java/lang/StackWalker/LocalsAndOperands.java: SEGV in StackValue::create_stack_value Message-ID: <32C6E193-577C-44C6-B8AE-1F3B034824F0@oracle.com> http://cr.openjdk.java.net/~roland/8144851/webrev.00/ The crash occurs because the local which is retrieved by the stack walking code is stored in rbp. The correct rbp location is only kept in the RegisterMap if _update_map is true for that RegisterMap which is never true for vframeStream. Roland. From vladimir.x.ivanov at oracle.com Wed Dec 16 16:55:16 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 16 Dec 2015 19:55:16 +0300 Subject: [9] RFR (S): 8133612: new clone logic added in 8042235 is missing from compiler intrinsics Message-ID: <56719774.4020203@oracle.com> http://cr.openjdk.java.net/~vlivanov/8133612/webrev.00 https://bugs.openjdk.java.net/browse/JDK-8133612 MemberName instances should be properly registered in the JVM to support class redefinition. JVM_Clone has special logic to do so, but C2 intrinsic doesn't. The proposed fix is to always call into JVM when cloning MemberNames. It is achieved by changing JVM_ACC_IS_CLONEABLE flag meaning and setting it only on classes which don't require any additional work when cloned. It allows to preserve exact intrinsic shape. The downside is that before throwing CloneNotSupportedException, it is not enough to check JVM_ACC_IS_CLONEABLE anymore, but interface type check should be redone as well. Also, it affects MemberName case, which isn't performance critical. The alternative would be to introduce 1 more guard on fast path (MemberName exact class check) which would affect all intrinsified cases. Testing: manual (trivial test w/ -XX:+CountJVMCalls), JPRT. 
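A minimal sketch of the flag scheme described above, assuming a simplified class model (the `Klass` struct, `fast_clone_flag`, `CloneResult` and `clone_object` below are hypothetical, not HotSpot types). The flag is set only for classes that are Cloneable and need no extra VM work, so the intrinsic keeps a single guard; everything else, including the not-Cloneable case, falls into the slow path, which must redo the interface check before reporting CloneNotSupportedException.

```cpp
#include <cassert>

// Hypothetical class model; not HotSpot's Klass.
struct Klass {
  bool implements_cloneable;    // result of the interface type check
  bool needs_extra_clone_work;  // e.g. a MemberName-like class that must be registered
  bool fast_clone_flag;         // stand-in for the redefined JVM_ACC_IS_CLONEABLE

  Klass(bool cloneable, bool extra_work)
      : implements_cloneable(cloneable),
        needs_extra_clone_work(extra_work),
        // Set once, only when a plain copy is sufficient.
        fast_clone_flag(cloneable && !extra_work) {}
};

enum class CloneResult { FastCopy, VmCopy, NotSupported };

// Slow path ("call into the VM"): a clear flag no longer implies
// "not Cloneable", so the interface check has to be redone here.
CloneResult clone_slow_path(const Klass& k) {
  if (!k.implements_cloneable) {
    return CloneResult::NotSupported;  // would throw CloneNotSupportedException
  }
  // The VM copies the object and performs any extra work (e.g. registration).
  return CloneResult::VmCopy;
}

// Intrinsic shape: exactly one flag test on the fast path.
CloneResult clone_object(const Klass& k) {
  if (k.fast_clone_flag) {
    assert(k.implements_cloneable);  // invariant of how the flag is set
    return CloneResult::FastCopy;
  }
  return clone_slow_path(k);
}
```

The alternative mentioned in the mail, an extra exact-class check for MemberName on the fast path, would add a second guard to every intrinsified clone; keeping the fast path to one guard is the point of redefining the flag.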
Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Dec 16 16:59:18 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 16 Dec 2015 19:59:18 +0300 Subject: RFR(XS): 8144851: java/lang/StackWalker/LocalsAndOperands.java: SEGV in StackValue::create_stack_value In-Reply-To: <32C6E193-577C-44C6-B8AE-1F3B034824F0@oracle.com> References: <32C6E193-577C-44C6-B8AE-1F3B034824F0@oracle.com> Message-ID: <56719866.5000200@oracle.com> Looks good. Best regards, Vladimir Ivanov On 12/16/15 4:54 PM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8144851/webrev.00/ > > The crash occurs because the local which is retrieved by the stack walking code is stored in rbp. The correct rbp location is only kept in the RegisterMap if _update_map is true for that RegisterMap which is never true for vframeStream. > > Roland. > From vladimir.kozlov at oracle.com Wed Dec 16 17:10:05 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 16 Dec 2015 09:10:05 -0800 Subject: RFR(XS): 8144851: java/lang/StackWalker/LocalsAndOperands.java: SEGV in StackValue::create_stack_value In-Reply-To: <32C6E193-577C-44C6-B8AE-1F3B034824F0@oracle.com> References: <32C6E193-577C-44C6-B8AE-1F3B034824F0@oracle.com> Message-ID: <56719AED.7040805@oracle.com> Good. Vladimir K On 12/16/15 5:54 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8144851/webrev.00/ > > The crash occurs because the local which is retrieved by the stack walking code is stored in rbp. The correct rbp location is only kept in the RegisterMap if _update_map is true for that RegisterMap which is never true for vframeStream. > > Roland. 
> From vladimir.x.ivanov at oracle.com Wed Dec 16 18:41:13 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 16 Dec 2015 21:41:13 +0300 Subject: [9] RFR (S): 8071374: Native disassembler implementation may be not thread-safe In-Reply-To: <5671AFC6.4050308@oracle.com> References: <567007FC.7090601@oracle.com> <5670ACD3.1050404@oracle.com> <56716722.7080703@oracle.com> <5671AFC6.4050308@oracle.com> Message-ID: <5671B049.5010506@oracle.com> Thanks, Ioi and Vladimir. Best regards, Vladimir Ivanov On 12/16/15 9:39 PM, Ioi Lam wrote: > Oops, sorry I missed the changes in Disassembler::decode(). The changes > look good. > > Thanks > - Ioi > > On 12/16/15 5:29 AM, Vladimir Ivanov wrote: >> Ioi, >> >> Sorry for the confusion, the fix does exactly that - adds ttyLocker in >> all 3 Disassembler::decode variants [1]. >> >> Other changes are cleanups - use ttyLocker for multi-line dumps. >> >> Also, what do you think about result handler output w/ >> -XX:+PrintSignatureHandlers? Leave it as is? >> >> Best regards, >> Vladimir Ivanov >> >> [1] >> http://cr.openjdk.java.net/~vlivanov/8071374/webrev.00/src/share/vm/compiler/disassembler.cpp.udiff.html >> >> >> On 12/16/15 3:14 AM, Ioi Lam wrote: >>> Hi Vladimir, >>> >>> Thanks for fixing this bug. I have a suggestion: instead of doing this: >>> >>> 2642 ttyLocker ttyl; >>> 2643 tty->print_cr("implicit exception happened at " INTPTR_FORMAT, >>> p2i(pc)); >>> 2644 print(); >>> 2645 method()->print_codes(); >>> 2646 print_code(); >>> 2647 print_pcs(); >>> >>> Maybe the ttyLocker should be acquired inside Disassembler::decode? >>> >>> Your code has the advantage of keeping the related info in the same part >>> of the print-out, but that's a different problem than the one you want >>> to solve in this bug. If another thread forgets to take the tty lock >>> before calling Disassembler::decode, it would still cause your code to >>> crash. 
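A minimal sketch of Ioi's suggestion, taking the lock inside the decode entry point itself so that a caller that forgets to lock is still serialized. It assumes a generic `std::recursive_mutex` standing in for tty_lock/ttyLocker (this sketch does not reproduce either), and a toy `native_decode` with global state standing in for the non-thread-safe native disassembler; all names are hypothetical. The recursive mutex lets callers that already hold the lock, as in the multi-line dumps mentioned above, call `decode` without deadlocking.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical stand-in for a global output lock such as tty_lock.
std::recursive_mutex& output_lock() {
  static std::recursive_mutex lock;
  return lock;
}

// Toy "native disassembler": it keeps state in a global, so two
// unsynchronized concurrent calls would race on it.
static int g_insn_counter = 0;

std::string native_decode(const unsigned char* begin, const unsigned char* end) {
  g_insn_counter = 0;
  std::ostringstream out;
  for (const unsigned char* p = begin; p < end; ++p) {
    ++g_insn_counter;
    out << "insn " << g_insn_counter << ": 0x" << std::hex
        << static_cast<int>(*p) << std::dec << "\n";
  }
  return out.str();
}

// The entry point takes the lock itself, so callers that forget to lock
// are still serialized.
std::string decode(const unsigned char* begin, const unsigned char* end) {
  std::lock_guard<std::recursive_mutex> guard(output_lock());
  assert(begin <= end);
  return native_decode(begin, end);
}

// A caller that wants related output kept together may also hold the lock;
// re-acquiring it inside decode() is safe because the mutex is recursive.
std::string dump_with_header(const unsigned char* begin, const unsigned char* end) {
  std::lock_guard<std::recursive_mutex> guard(output_lock());
  return "--- code ---\n" + decode(begin, end);
}
```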
>>> >>> Thanks >>> - Ioi >>> >>> >>> On 12/15/15 4:30 AM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/8071374/webrev.00 >>>> https://bugs.openjdk.java.net/browse/JDK-8071374 >>>> >>>> Disassembler wraps some native disassembler, which is not necessarily >>>> thread-safe. It's not a problem for -XX:+PrintAssembly since access >>>> from compilers is serialized by Compile_lock. >>>> >>>> It is not the case anymore when there are calls from runtime (e.g., >>>> with -XX:+PrintSignatureHandlers). The problem can manifest as a >>>> failure to parse instruction stream. >>>> >>>> The fix is to serialize access to Disassembler on tty_lock. >>>> >>>> Considering most of the calls to Disassembler::decode are performed >>>> under tty_lock (which has the lowest rank), it's too burdensome to >>>> introduce a dedicated lock and a new rank to please deadlock detection >>>> logic. >>>> >>>> Also, some cleanups are included. >>>> >>>> Testing: failing test case from the report, JPRT. >>>> >>>> Thanks! >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> PS: I noted that the following code usually dumps some garbage at the >>>> end of the code block: >>>> >>>> src/share/vm/interpreter/interpreter.cpp: >>>> void SignatureHandlerLibrary::add(const methodHandle& method) { >>>> ... >>>> tty->print_cr(" --- associated result handler ---"); >>>> address rh_end = rh_begin; >>>> while (*(int*)rh_end != 0) { >>>> rh_end += sizeof(int); >>>> } >>>> Disassembler::decode(rh_begin, rh_end); >>>> >>>> $ java -XX:+PrintSignatureHandlers ... >>>> ... 
>>>> argument handler #0 for: static java.lang.Object.registerNatives()V >>>> (fingerprint = 349, 11 bytes generated) >>>> 0x0000000106d55e60: movabs $0x106c1b118,%rax >>>> 0x0000000106d55e6a: retq >>>> --- associated result handler --- >>>> 0x0000000106c1b118: retq // T_VOID: _native_abi_to_tosca[6] >>>> 0x0000000106c1b119: retq // T_FLOAT: _native_abi_to_tosca[7] >>>> 0x0000000106c1b11a: retq // T_DOUBLE: _native_abi_to_tosca[8] >>>> // T_OBJECT/T_ARRAY: _native_abi_to_tosca[9] >>>> 0x0000000106c1b11b: mov 0x10(%rbp),%rax >>>> 0x0000000106c1b11f: retq >>>> >>>> === end of AbstractInterpreter::_native_abi_to_tosca[] >>>> >>>> Garbage until (*(int*)rh_end) == 0: >>>> >>>> 0x0000000106c1b120: rex add %eax,(%rax) >>>> 0x0000000106c1b123: add %cl,%ah >>>> 0x0000000106c1b125: int3 >>>> 0x0000000106c1b126: int3 >>>> 0x0000000106c1b127: int3 >>>> 0x0000000106c1b128: insl (%dx),%es:(%rdi) >>>> 0x0000000106c1b129: sub $0x10521,%eax >>>> 0x0000000106c1b12e: add %al,(%rax) >>>> 0x0000000106c1b130: (bad) >>>> 0x0000000106c1b131: (bad) >>>> 0x0000000106c1b132: (bad) >>>> 0x0000000106c1b133: dec %esp >>>> 0x0000000106c1b135: int3 >>>> 0x0000000106c1b136: int3 >>>> 0x0000000106c1b137: int3 >>>> >>>> Maybe add some padding in either CodeletMark::~CodeletMark or >>>> TemplateInterpreterGenerator::generate_all()? >>>> >>>> >>> > From john.r.rose at oracle.com Wed Dec 16 19:55:28 2015 From: john.r.rose at oracle.com (John Rose) Date: Wed, 16 Dec 2015 11:55:28 -0800 Subject: RFR(S): 8145345: LogCompilation output is empty after JEP165: Compiler Control In-Reply-To: <5670060A.6010501@oracle.com> References: <5670060A.6010501@oracle.com> Message-ID: <5C59C82F-EAED-4F6B-8768-0FFB100415AA@oracle.com> Let's strengthen the regression test so it will catch this kind of outage next time. As we unify the various logging mechanisms in the future, we want a way to detect such problems before promotion. --
John On Dec 15, 2015, at 4:22 AM, Nils Eliasson wrote: > > Hi, > > Please review this change that fixes log compilation. It changed the default value for the log option to the command line flag value in compilerDirectives.hpp, updates how CompileCommand=log updates the value, and adds a warning if per method logging is used but LogCompilation is not set. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8145345 > Webrev: http://cr.openjdk.java.net/~neliasso/8145345/webrev.01/ > > Regards, > Nils Eliasson From pavel.punegov at oracle.com Wed Dec 16 19:56:37 2015 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Wed, 16 Dec 2015 22:56:37 +0300 Subject: RFR (XXS): 8145025: compiler/compilercontrol/commandfile/CompileOnlyTest.java and compiler/compilercontrol/commands/CompileOnlyTest.java fail: java.lang.RuntimeException: FAILED: method ... compilable: false, but should: true Message-ID: Please review this small fix to a test bug. Issue: when test builds a state for a method that doesn't match any compileonly command it should consider that this method wasn't set compiled/excluded with any other compileonly or exclude commands. This means that it should check that appropriate Optional is not present (isn't set). bug: https://bugs.openjdk.java.net/browse/JDK-8145025 webrev: http://cr.openjdk.java.net/~ppunegov/8145025/webrev.00/ Thanks, Pavel Punegov
From vladimir.kozlov at oracle.com Wed Dec 16 20:26:07 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 16 Dec 2015 12:26:07 -0800 Subject: RFR(S): 8145345: LogCompilation output is empty after JEP165: Compiler Control In-Reply-To: <5671692F.40304@oracle.com> References: <5670060A.6010501@oracle.com> <56709AF6.9070701@oracle.com> <5671692F.40304@oracle.com> Message-ID: <5671C8DF.2070406@oracle.com> On 12/16/15 5:37 AM, Nils Eliasson wrote: > Vladimir, > > Consider -XX:+LogCompilation -XX:CompileCommand=log,apatternthatmaynevermatch.amethod > > LogCompilation will be on, but the LogOption (set by command or directive) will be false. The logs will still contain > nmethod installs and compilation statistics. That's why no tests using LogCompilation failed - they all use the > information in the compilation statistics. Are you saying that it is acceptable case (LogCompilation=true && LogOption=false)? That is why you don't want additional assert I suggested? Current assert only fail when (LogCompilation=false && LogOption=true) which is not related to this bug failure. Thanks, Vladimir > > Regards, > Nils > > On 2015-12-15 23:57, Vladimir Kozlov wrote: >> Nils, >> >> Should you also check (!LogOption && LogCompilation) "compilation logging should be enabled if LogCompilation is set"? >> >> Thanks, >> Vladimir >> >> On 12/15/15 4:22 AM, Nils Eliasson wrote: >>> Hi, >>> >>> Please review this change that fixes log compilation. It changed the default value for the log option to the command >>> line flag value in compilerDirectives.hpp, updates how CompileCommand=log updates the value, and adds a warning if per >>> method logging is used but LogCompilation is not set.
>>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8145345 >>> Webrev: http://cr.openjdk.java.net/~neliasso/8145345/webrev.01/ >>> >>> Regards, >>> Nils Eliasson > From dean.long at oracle.com Wed Dec 16 20:41:56 2015 From: dean.long at oracle.com (Dean Long) Date: Wed, 16 Dec 2015 12:41:56 -0800 Subject: RFR(XS): 8144852: Corrupted oop in nmethod In-Reply-To: <566B216B.1020204@oracle.com> References: <566A44AA.1040101@oracle.com> <566AB84C.1000603@oracle.com> <566B216B.1020204@oracle.com> Message-ID: <5671CC94.2080205@oracle.com> Ping. Could runtime folks please comment on Vladimir's suggestion to have oopDesc::print_*_on and Metadata::print_*_maybe_null support Universe::non_oop_word() values without crashing, or if I should keep this change in nmethod only. thanks, dl On 12/11/2015 11:18 AM, Dean Long wrote: > [adding hotspot-runtime-dev] > > On 12/11/2015 3:49 AM, Vladimir Ivanov wrote: >> Dean, thanks for taking care of it. >> >> Can oopDesc::print_value_on and print_value_on_maybe_null be enhanced >> instead to handle non_oop_word case (in addition to NULL case)? >> > > I thought of that, but didn't want to add > print_value_on_maybe_null_or_non_oop :-) > > If you feel strongly about that, then I should probably get input from > runtime too, since I think they own that code. > >> Also, the following is slightly misleading since metadata pointers >> aren't oops: >> void nmethod::print_recorded_metadata() { >> + if (m == (Metadata*)Universe::non_oop_word()) { >> + tty->print("non-oop word"); >> > > Would "non-metadata word" be better? > > dl > >> Best regards, >> Vladimir Ivanov >> >> On 12/11/15 6:36 AM, Dean Long wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8144852 >>> http://cr.openjdk.java.net/~dlong//8144852/webrev/ >>> >>> The fix for [1] introduced new functions nmethod::print_recorded_oops >>> and nmethod::print_recorded_metadata that print all oop and metadata >>> values in an nmethod. 
Currently NULL values are handled OK, but >>> Universe::non_oop_word values cause a crash. >>> >>> (This bug is marked confidential because it was reported against one of >>> our closed ports.) >>> >>> dl >>> >>> [1] JDK-8072008: Emit direct call instead of linkTo* for recursive >>> indy/MH.invoke* calls > From nils.eliasson at oracle.com Wed Dec 16 20:50:03 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 16 Dec 2015 21:50:03 +0100 Subject: RFR(S): 8145345: LogCompilation output is empty after JEP165: Compiler Control In-Reply-To: <5671C8DF.2070406@oracle.com> References: <5670060A.6010501@oracle.com> <56709AF6.9070701@oracle.com> <5671692F.40304@oracle.com> <5671C8DF.2070406@oracle.com> Message-ID: <5671CE7B.6070009@oracle.com> On 2015-12-16 21:26, Vladimir Kozlov wrote: > On 12/16/15 5:37 AM, Nils Eliasson wrote: >> Vladimir, >> >> Consider -XX:+LogCompilation >> -XX:CompileCommand=log,apatternthatmaynevermatch.amethod >> >> LogCompilation will be on, but the LogOption (set by command or >> directive) will be false. The logs will still contain >> nmethod installs and compilation statistics. That's why no tests >> using LogCompilation failed - they all use the >> information in the compilation statistics. > > Are you saying that it is acceptable case (LogCompilation=true && > LogOption=false)? That is why you don't want additionla assert I > suggested? Yes, it is an acceptable case. > Current assert only fail when (LogCompilation=false && LogOption=true) > which is not related to this bug failure. Assert? Do you mean this check: *+ if (LogOption && !LogCompilation) {* *+ st->print_cr("Warning: +LogCompilation must be set to enable compilation logging from directives");* *+ } * There is a similar check and warning for compile commands, I added the same for directives. 
Thanks, Nils > > Thanks, > Vladimir > >> >> Regards, >> Nils >> >> On 2015-12-15 23:57, Vladimir Kozlov wrote: >>> Nils, >>> >>> Should you also check (!LogOption && LogCompilation) "compilation >>> logging should be enabled if LogCompilation is set"? >>> >>> Thanks, >>> Vladimir >>> >>> On 12/15/15 4:22 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> Please review this change that fixes log compilation. It changed >>>> the default value for the log option to the command >>>> line flag value in compilerDirectives.hpp, updates how >>>> CompileCommand=log updates the value, and adds a warning if per >>>> method logging is used but LogCompilation is not set. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8145345 >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8145345/webrev.01/ >>>> >>>> Regards, >>>> Nils Eliasson >> From vladimir.kozlov at oracle.com Wed Dec 16 20:53:13 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 16 Dec 2015 12:53:13 -0800 Subject: RFR(S): 8145345: LogCompilation output is empty after JEP165: Compiler Control In-Reply-To: <5671CE7B.6070009@oracle.com> References: <5670060A.6010501@oracle.com> <56709AF6.9070701@oracle.com> <5671692F.40304@oracle.com> <5671C8DF.2070406@oracle.com> <5671CE7B.6070009@oracle.com> Message-ID: <5671CF39.8040000@oracle.com> On 12/16/15 12:50 PM, Nils Eliasson wrote: > > On 2015-12-16 21:26, Vladimir Kozlov wrote: >> On 12/16/15 5:37 AM, Nils Eliasson wrote: >>> Vladimir, >>> >>> Consider -XX:+LogCompilation -XX:CompileCommand=log,apatternthatmaynevermatch.amethod >>> >>> LogCompilation will be on, but the LogOption (set by command or directive) will be false. The logs will still contain >>> nmethod installs and compilation statistics. That's why no tests using LogCompilation failed - they all use the >>> information in the compilation statistics.
>> >> Are you saying that it is acceptable case (LogCompilation=true && LogOption=false)? That is why you don't want >> additional assert I suggested? > > Yes, it is an acceptable case. Fine. > >> Current assert only fail when (LogCompilation=false && LogOption=true) which is not related to this bug failure. > Assert? Do you mean this check: Yes, this one. > > + if (LogOption && !LogCompilation) { > + st->print_cr("Warning: +LogCompilation must be set to enable compilation logging from directives"); > + } > > There is a similar check and warning for compile commands, I added the same for directives. Okay. Thanks, Vladimir > > Thanks, > Nils > > >> >> Thanks, >> Vladimir >> >>> >>> Regards, >>> Nils >>> >>> On 2015-12-15 23:57, Vladimir Kozlov wrote: >>>> Nils, >>>> >>>> Should you also check (!LogOption && LogCompilation) "compilation logging should be enabled if LogCompilation is set"? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/15/15 4:22 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> Please review this change that fixes log compilation. It changed the default value for the log option to the command >>>>> line flag value in compilerDirectives.hpp, updates how CompileCommand=log updates the value, and adds a warning if per >>>>> method logging is used but LogCompilation is not set.
>>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8145345 >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8145345/webrev.01/ >>>>> >>>>> Regards, >>>>> Nils Eliasson >>> > From nils.eliasson at oracle.com Wed Dec 16 20:58:08 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 16 Dec 2015 21:58:08 +0100 Subject: RFR(S): 8145345: LogCompilation output is empty after JEP165: Compiler Control In-Reply-To: <5671CF39.8040000@oracle.com> References: <5670060A.6010501@oracle.com> <56709AF6.9070701@oracle.com> <5671692F.40304@oracle.com> <5671C8DF.2070406@oracle.com> <5671CE7B.6070009@oracle.com> <5671CF39.8040000@oracle.com> Message-ID: <5671D060.2090903@oracle.com> Thank you Vladimir! Best regards, Nils On 2015-12-16 21:53, Vladimir Kozlov wrote: > On 12/16/15 12:50 PM, Nils Eliasson wrote: >> >> On 2015-12-16 21:26, Vladimir Kozlov wrote: >>> On 12/16/15 5:37 AM, Nils Eliasson wrote: >>>> Vladimir, >>>> >>>> Consider -XX:+LogCompilation >>>> -XX:CompileCommand=log,apatternthatmaynevermatch.amethod >>>> >>>> LogCompilation will be on, but the LogOption (set by command or >>>> directive) will be false. The logs will still contain >>>> nmethod installs and compilation statistics. That's why no tests >>>> using LogCompilation failed - they all use the >>>> information in the compilation statistics. >>> >>> Are you saying that it is acceptable case (LogCompilation=true && >>> LogOption=false)? That is why you don't want >>> additional assert I suggested? >> >> Yes, it is an acceptable case. > > Fine. > >> >>> Current assert only fail when (LogCompilation=false && >>> LogOption=true) which is not related to this bug failure. >> Assert? Do you mean this check: > > Yes, this one. > >> >> + if (LogOption && !LogCompilation) { >> + st->print_cr("Warning: +LogCompilation must be set to enable >> compilation logging from directives"); >> + } >> >> There is a similar check and warning for compile commands, I added >> the same for directives. > > Okay.
> > Thanks, > Vladimir > >> >> Thanks, >> Nils >> >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Regards, >>>> Nils >>>> >>>> On 2015-12-15 23:57, Vladimir Kozlov wrote: >>>>> Nils, >>>>> >>>>> Should you also check (!LogOption && LogCompilation) "compilation >>>>> logging should be enabled if LogCompilation is set"? >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/15/15 4:22 AM, Nils Eliasson wrote: >>>>>> Hi, >>>>>> >>>>>> Please review this change that fixes log compilation. It changed >>>>>> the default value for the log option to the command >>>>>> line flag value in compilerDirectives.hpp, updates how >>>>>> CompileCommand=log updates the value, and adds a warning if per >>>>>> method logging is used but LogCompilation is not set. >>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8145345 >>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8145345/webrev.01/ >>>>>> >>>>>> Regards, >>>>>> Nils Eliasson >>>> >> From vladimir.x.ivanov at oracle.com Wed Dec 16 22:41:45 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 17 Dec 2015 01:41:45 +0300 Subject: RFR(XS): 8144852: Corrupted oop in nmethod In-Reply-To: <566B216B.1020204@oracle.com> References: <566A44AA.1040101@oracle.com> <566AB84C.1000603@oracle.com> <566B216B.1020204@oracle.com> Message-ID: <5671E8A9.7030304@oracle.com> Sorry for the late response, Dean. > > I thought of that, but didn't want to add > print_value_on_maybe_null_or_non_oop :-) Frankly speaking, I don't see much value in oopDesc::print_value_on_maybe_null vs oopDesc::print_value_on separation. So, I'm in favor of just enhancing current versions and not introducing new methods/longer names :-) > > If you feel strongly about that, then I should probably get input from > runtime too, since I think they own that code. I don't have a strong opinion here, but it definitely looks cleaner. 
> >> Also, the following is slightly misleading since metadata pointers >> aren't oops: >> void nmethod::print_recorded_metadata() { >> + if (m == (Metadata*)Universe::non_oop_word()) { >> + tty->print("non-oop word"); >> > > Would "non-metadata word" be better? Yes. Best regards, Vladimir Ivanov >> >> On 12/11/15 6:36 AM, Dean Long wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8144852 >>> http://cr.openjdk.java.net/~dlong//8144852/webrev/ >>> >>> The fix for [1] introduced new functions nmethod::print_recorded_oops >>> and nmethod::print_recorded_metadata that print all oop and metadata >>> values in an nmethod. Currently NULL values are handled OK, but >>> Universe::non_oop_word values cause a crash. >>> >>> (This bug is marked confidential because it was reported against one of >>> our closed ports.) >>> >>> dl >>> >>> [1] JDK-8072008: Emit direct call instead of linkTo* for recursive >>> indy/MH.invoke* calls > From dean.long at oracle.com Thu Dec 17 02:15:10 2015 From: dean.long at oracle.com (Dean Long) Date: Wed, 16 Dec 2015 18:15:10 -0800 Subject: RFR(XS): 8144852: Corrupted oop in nmethod In-Reply-To: <5671F5F6.9060605@oracle.com> References: <566A44AA.1040101@oracle.com> <566AB84C.1000603@oracle.com> <566B216B.1020204@oracle.com> <5671CC94.2080205@oracle.com> <5671F5F6.9060605@oracle.com> Message-ID: <56721AAE.8040106@oracle.com> Thanks Ioi for looking at this. Vladimir, are you OK with keeping the changes in method? If so, I will push what I have. dl On 12/16/2015 3:38 PM, Ioi Lam wrote: > Currently non_oop_word is used only in the nmethod code. If this value > is assigned to an oop or a metadata* elsewhere we would probably see > massive crashes. Adding non_oop_word to oopDesc::print_*_on would > imply that it's OK to assign this value in a more general context, > which is not true. > > So I would suggest keeping knowledge of non_oop_word inside nmethod > for now, and we can revisit this if other places start to use > non_oop_word. 
> > Thanks > - Ioi > > On 12/16/15 12:41 PM, Dean Long wrote: >> Ping. >> >> Could runtime folks please comment on Vladimir's suggestion to have >> oopDesc::print_*_on and >> Metadata::print_*_maybe_null support Universe::non_oop_word() values >> without crashing, or if I should keep this change in nmethod only. >> >> thanks, >> >> dl >> >> On 12/11/2015 11:18 AM, Dean Long wrote: >>> [adding hotspot-runtime-dev] >>> >>> On 12/11/2015 3:49 AM, Vladimir Ivanov wrote: >>>> Dean, thanks for taking care of it. >>>> >>>> Can oopDesc::print_value_on and print_value_on_maybe_null be >>>> enhanced instead to handle non_oop_word case (in addition to NULL >>>> case)? >>>> >>> >>> I thought of that, but didn't want to add >>> print_value_on_maybe_null_or_non_oop :-) >>> >>> If you feel strongly about that, then I should probably get input >>> from runtime too, since I think they own that code. >>> >>>> Also, the following is slightly misleading since metadata pointers >>>> aren't oops: >>>> void nmethod::print_recorded_metadata() { >>>> + if (m == (Metadata*)Universe::non_oop_word()) { >>>> + tty->print("non-oop word"); >>>> >>> >>> Would "non-metadata word" be better? >>> >>> dl >>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> On 12/11/15 6:36 AM, Dean Long wrote: >>>>> https://bugs.openjdk.java.net/browse/JDK-8144852 >>>>> http://cr.openjdk.java.net/~dlong//8144852/webrev/ >>>>> >>>>> The fix for [1] introduced new functions nmethod::print_recorded_oops >>>>> and nmethod::print_recorded_metadata that print all oop and metadata >>>>> values in an nmethod. Currently NULL values are handled OK, but >>>>> Universe::non_oop_word values cause a crash. >>>>> >>>>> (This bug is marked confidential because it was reported against >>>>> one of >>>>> our closed ports.) 
>>>>> >>>>> dl >>>>> >>>>> [1] JDK-8072008: Emit direct call instead of linkTo* for >>>>> recursive >>>>> indy/MH.invoke* calls >>> >> > From nils.eliasson at oracle.com Thu Dec 17 10:05:32 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 17 Dec 2015 11:05:32 +0100 Subject: RFR (XXS): 8145025: compiler/compilercontrol/commandfile/CompileOnlyTest.java and compiler/compilercontrol/commands/CompileOnlyTest.java fail: java.lang.RuntimeException: FAILED: method ... compilable: false, but should: true In-Reply-To: References: Message-ID: <567288EC.3020001@oracle.com> Hi Pavel, Looks good. //Nils On 2015-12-16 20:56, Pavel Punegov wrote: > Please review this small fix to a test bug. > > Issue: when test builds a state for a method that doesn't match any > compileonly command it should consider that this method wasn't set > compiled/excluded with any other compileonly or exclude commands. This > means that it should check that appropriate Optional is not present > (isn't set). > > bug: https://bugs.openjdk.java.net/browse/JDK-8145025 > webrev: http://cr.openjdk.java.net/~ppunegov/8145025/webrev.00/ > > > Thanks, > Pavel Punegov > From volker.simonis at gmail.com Thu Dec 17 11:15:29 2015 From: volker.simonis at gmail.com (Volker Simonis) Date: Thu, 17 Dec 2015 12:15:29 +0100 Subject: String intrinsics defunct on arch != amd64 after 8141132: JEP 254: Compact Strings In-Reply-To: <566FEB61.6040200@redhat.com> References: <566FEB61.6040200@redhat.com> Message-ID: Hi, thanks everybody for your comments. I would suggest the following: 1.
UseSSE42Intrinsics - in shared code this flag is only used in library_call.cpp: (!Matcher::has_match_rule(Op_StrIndexOf) || !UseSSE42Intrinsics) this can be easily replaced by: (!Matcher::match_rule_supported(Op_StrIndexOf)) where the check for UseSSE42Intrinsics is moved into the arch-dependent match_rule_supported() function in the .ad files - afterwards the flag can be moved from globals.hpp to globals_x86.hpp - and deleted from vm_version_aarch64.cpp (I've checked that match_rule_supported() does 'the right thing' on aarch64) - I'll put all this in my change for "8145336: PPC64: fix string intrinsics after CompactStrings change" [1] 2. UseSSE - this is a little more complicated because it is more widely used in shared code - I'd suggest to also move the option from globals.hpp to globals_x86.hpp - and clean up the shared code usages (most of it may be done by using already existing flags like Matcher::strict_fp_requires_explicit_rounding. Maybe we'll have to introduce more (but not expose them as command line option)) - because this is a bigger change I've opened "8145665: Make UseSSE an x86-specific option and cleanup its usage in shared code" [2] for it. @Andrew&Andrew: I've noticed that you don't set UseSSE on aarch64 which means you are using the default value '99'. On sparc and ppc64 we set UseSSE to zero. Maybe you want to do that as well on aarch64 until 8145665 will be resolved? Regards, Volker [1] https://bugs.openjdk.java.net/browse/JDK-8145336 [2] https://bugs.openjdk.java.net/browse/JDK-8145665 On Tue, Dec 15, 2015 at 11:28 AM, Andrew Haley wrote: > On 30/11/15 18:54, Volker Simonis wrote: >> Moreover, "UseSSE42Intrinsics" is clearly an architecture-dependent >> option. I already wondered that according to vm_version_aarch64.cpp it >> seems to exist on aarch64 (is this really true Andrew?). But it's >> surely not available on PowerPC, SPARC, ... > > Sure it is. The flag is just oddly-named, that's all: it means > "InlineStringIndexOf". I have no idea why a name like
I have no idea why a name like > UseSSE42Intrinsics ever made its way into the shared code. > > Andrew. From aph at redhat.com Thu Dec 17 12:30:22 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 17 Dec 2015 12:30:22 +0000 Subject: String intrinsics defunct on arch != amd64 after 8141132: JEP 254: Compact Strings In-Reply-To: References: <566FEB61.6040200@redhat.com> Message-ID: <5672AADE.70909@redhat.com> On 12/17/2015 11:15 AM, Volker Simonis wrote: > @Andrew&Andrew: I've noticed that you don't set UseSSE on aarch64 > which means you are using the default value '99'. On sparc and ppc64 > we set UseSSE to zero. Maybe you want to do that as well on aarch64 > until 8145665 will be resolved? What for, exactly? Will it fix anything? Make anything faster? Surely I want to leave this set to 99 until 8145665 is resolved. Andrew. From vladimir.x.ivanov at oracle.com Thu Dec 17 13:07:26 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 17 Dec 2015 16:07:26 +0300 Subject: RFR(XS): 8144852: Corrupted oop in nmethod In-Reply-To: <56721AAE.8040106@oracle.com> References: <566A44AA.1040101@oracle.com> <566AB84C.1000603@oracle.com> <566B216B.1020204@oracle.com> <5671CC94.2080205@oracle.com> <5671F5F6.9060605@oracle.com> <56721AAE.8040106@oracle.com> Message-ID: <5672B38E.5040901@oracle.com> > Vladimir, are you OK with keeping the changes in method? If so, I will > push what I have. I'm fine with leaving oopDesc::print_* as is. Best regards, Vladimir Ivanov > On 12/16/2015 3:38 PM, Ioi Lam wrote: >> Currently non_oop_word is used only in the nmethod code. If this value >> is assigned to an oop or a metadata* elsewhere we would probably see >> massive crashes. Adding non_oop_word to oopDesc::print_*_on would >> imply that it's OK to assign this value in a more general context, >> which is not true. >> >> So I would suggest keeping knowledge of non_oop_word inside nmethod >> for now, and we can revisit this if other places start to use >> non_oop_word. 
>> >> Thanks >> - Ioi >> >> On 12/16/15 12:41 PM, Dean Long wrote: >>> Ping. >>> >>> Could runtime folks please comment on Vladimir's suggestion to have >>> oopDesc::print_*_on and >>> Metadata::print_*_maybe_null support Universe::non_oop_word() values >>> without crashing, or if I should keep this change in nmethod only. >>> >>> thanks, >>> >>> dl >>> >>> On 12/11/2015 11:18 AM, Dean Long wrote: >>>> [adding hotspot-runtime-dev] >>>> >>>> On 12/11/2015 3:49 AM, Vladimir Ivanov wrote: >>>>> Dean, thanks for taking care of it. >>>>> >>>>> Can oopDesc::print_value_on and print_value_on_maybe_null be >>>>> enhanced instead to handle non_oop_word case (in addition to NULL >>>>> case)? >>>>> >>>> >>>> I thought of that, but didn't want to add >>>> print_value_on_maybe_null_or_non_oop :-) >>>> >>>> If you feel strongly about that, then I should probably get input >>>> from runtime too, since I think they own that code. >>>> >>>>> Also, the following is slightly misleading since metadata pointers >>>>> aren't oops: >>>>> void nmethod::print_recorded_metadata() { >>>>> + if (m == (Metadata*)Universe::non_oop_word()) { >>>>> + tty->print("non-oop word"); >>>>> >>>> >>>> Would "non-metadata word" be better? >>>> >>>> dl >>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>> On 12/11/15 6:36 AM, Dean Long wrote: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8144852 >>>>>> http://cr.openjdk.java.net/~dlong//8144852/webrev/ >>>>>> >>>>>> The fix for [1] introduced new functions nmethod::print_recorded_oops >>>>>> and nmethod::print_recorded_metadata that print all oop and metadata >>>>>> values in an nmethod. Currently NULL values are handled OK, but >>>>>> Universe::non_oop_word values cause a crash. >>>>>> >>>>>> (This bug is marked confidential because it was reported against >>>>>> one of >>>>>> our closed ports.) 
>>>>>> >>>>>> dl >>>>>> >>>>>> [1] JDK-8072008: Emit direct call instead of linkTo* for >>>>>> recursive >>>>>> indy/MH.invoke* calls >>>> >>> >> > From pavel.punegov at oracle.com Thu Dec 17 13:44:24 2015 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Thu, 17 Dec 2015 16:44:24 +0300 Subject: RFR (XXS): 8145025: compiler/compilercontrol/commandfile/CompileOnlyTest.java and compiler/compilercontrol/commands/CompileOnlyTest.java fail: java.lang.RuntimeException: FAILED: method ... compilable: false, but should: true In-Reply-To: <567288EC.3020001@oracle.com> References: <567288EC.3020001@oracle.com> Message-ID: <037269E6-9A07-4436-86E5-3E19D260D063@oracle.com> Thanks for review, Nils. Pavel. > On 17 Dec 2015, at 13:05, Nils Eliasson wrote: > > Hi Pavel, > > Looks good. > > //Nils > > On 2015-12-16 20:56, Pavel Punegov wrote: >> Please review this small fix to a test bug. >> >> Issue: when test builds a state for a method that doesn't match any compileonly command it should consider that this method wasn't set compiled/excluded with any other compileonly or exclude commands. This means that it should check that appropriate Optional is not present (isn't set). >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8145025 >> webrev: http://cr.openjdk.java.net/~ppunegov/8145025/webrev.00/ >> Thanks, >> Pavel Punegov >> >
From martin.doerr at sap.com Thu Dec 17 13:54:20 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 17 Dec 2015 13:54:20 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> Hi Hui Shi,

my concern was not limited to 8144993; it also applies to 8136596, which is already pushed. I have written the following small Java example:

public class TestAllocMemBar {
    static final int loop_cnt = 20000;

    void dont_inline_me() {}

    public class A {
        public B b;
    }

    public class B {
        public B(A a) {
            a.b = B.this;
        }
    }

    public void TestMethod() {
        A a = new A();
        dont_inline_me();
        //System.gc();
        B b = new B(a);
    }

    public static void main(String args[]) {
        TestAllocMemBar xyz = new TestAllocMemBar();
        long duration = System.nanoTime();
        for (int x = 0; x < loop_cnt; x++) {
            xyz.TestMethod();
        }
        duration = System.nanoTime() - duration;
        System.out.println("duration: " + duration/1000/loop_cnt + " us per iteration");
    }
}

Execution shows (tested on PPC64):

openjdk_9/bin/java -XX:+UseConcMarkSweepGC -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:CompileCommand="exclude TestAllocMemBar::dont_inline_me" -XX:+PrintInlining -XX:+PrintEscapeAnalysis -XX:-EliminateAllocations TestAllocMemBar
======== Connection graph for TestAllocMemBar::TestMethod
JavaObject NoEscape(NoEscape) [ 59F 179F [ 37 42 ]] 25 Allocate === 5 6 7 8 1 ( 23 21 22 1 10 1 1 ) [[ 26 27 28 35 36 37 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) TestAllocMemBar::TestMethod @ bci:0 !jvms: TestAllocMemBar::TestMethod @ bci:0
LocalVar [ 25P [ 42 59b ]] 37 Proj === 25 [[ 38 42 59 ]] #5 !jvms: TestAllocMemBar::TestMethod @ bci:0
LocalVar [ 37 25P [ 179b ]] 42 CheckCastPP === 39 37 [[ 179 183 179 119 98 93 ]] #TestAllocMemBar$A:NotNull:exact * Oop:TestAllocMemBar$A:NotNull:exact * !jvms: TestAllocMemBar::TestMethod @ bci:0
JavaObject NoEscape(NoEscape) NSR [ 153F [ 131 136 180 179 ]] 119 Allocate === 105 100 101 8 1 ( 54 117 22 1 10 42 1 ) [[ 120 121 122 129 130 131 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) TestAllocMemBar::TestMethod @ bci:13 !jvms: TestAllocMemBar::TestMethod @ bci:13
LocalVar [ 119P [ 136 153b ]] 131 Proj === 119 [[ 132 136 153 ]] #5 !jvms: TestAllocMemBar::TestMethod @ bci:13
LocalVar [ 131 119P [ 180 ]] 136 CheckCastPP === 133 131 [[ 180 193 ]] #TestAllocMemBar$B:NotNull:exact * Oop:TestAllocMemBar$B:NotNull:exact * !jvms: TestAllocMemBar::TestMethod @ bci:13
LocalVar [ 136 119P [ 179 ]] 180 EncodeP === _ 136 [[ 181 ]] #narrowoop: TestAllocMemBar$B:NotNull:exact * !jvms: TestAllocMemBar$B:: @ bci:11 TestAllocMemBar::TestMethod @ bci:19
@ 5 TestAllocMemBar$A:: (10 bytes) inline (hot)
@ 6 java.lang.Object:: (1 bytes) inline (hot)
@ 10 TestAllocMemBar::dont_inline_me (1 bytes) not compilable (disabled)
@ 19 TestAllocMemBar$B:: (15 bytes) inline (hot)
@ 6 java.lang.Object:: (1 bytes) inline (hot)
@ 6 java.lang.Object:: (1 bytes) inline (hot)
@ 6 java.lang.Object:: (1 bytes) inline (hot)
duration: 3 us per iteration

So you can see that both Allocations have the state NoEscape, but there's a safepoint (the non-inlined call) between them. Concurrent GC could access the obj header and read stale data (and possibly crash).
OptoAssembly shows that the MemBar was optimized out (probably due to 8136596).

However, we may have luck. Maybe no concurrent GC accesses the header of newly created objects. But I don't know if this is true, which is the reason why I posted this question originally. Keep in mind that objects can get allocated in old gen. I could still imagine that these 2 optimizations may be dangerous.

Best regards,
Martin

From: Hui Shi [mailto:hui.shi at linaro.org] Sent: Wednesday, 16 December 2015 13:27 To: Andrew Haley Cc: Lindenmaier, Goetz ; Vitaly Davidovich ; Doerr, Martin ; Aleksey Shipilev ; Vladimir Kozlov ; hotspot compiler ; aarch64-port-dev ; Mikael Gerdin (mikael.gerdin at oracle.com) Subject: Re: RFR: 8144993: Elide redundant memory barrier after AllocationNode

Thanks Andrew, Goetz and all!

The major concern is whether removing the storestore barrier can cause other threads to read stale data for a newly allocated object. Other threads include Java threads and concurrent GC threads. It should be safe, per the following analysis.

1. If the BCEA result says "this" (b) escapes in its initializer, the change does not remove the storestore barrier.
2. If the BCEA result says "this" (b) does not escape in its initializer, it's safe to remove the storestore.
2.1 If there is a safepoint between the storestore and the release, b is visible to the GC inside the initializer, but there should be a memory barrier at the safepoint.
2.2 If there is no safepoint between the storestore and the release, b becomes visible to other threads after the release memory barrier.

Case #1
A a = new A();
safepoint   // a can be reached from GC
new B(a)

allocation
-------
b.klass =...
b.markword =...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
a.x = this;   // b might be visible to other threads here
....
release
-------- init end

The BCEA result indicates that "this" (b) is not local and not arg_stack, so b is treated as escaping in its initializer and the change does not remove the storestore barrier.
[EA] estimated escape information for B::
  non-escaping args: {}
  stack-allocatable args: {1}
  return non-local value
  modified args: 0x6 0x6
  flags:

b ("this") is not local and not arg_stack; a is arg_stack, which means it is passed in and not assigned to any other object in the initializer.

Case #2.1
allocation
-------
b.klass =...
b.markword =...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
safepoint   // "this" is in the oop map and might be visible to the GC thread here
....
release
-------- init end

Case #2.2
allocation
-------
b.klass =...
b.markword =...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
release
-------- init end

Regards
Hui

On 16 December 2015 at 00:15, Andrew Haley > wrote: On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote: > Further, if the object is NoEscape it might not be scalar > replaced. If I remember correctly, there are various conditions, > e.g., too big, allocated in loop. Well, that's the killer. The definition of "escape" we need to use here is the really, truly, honest-to-goodness one: that this object never becomes visible to any other thread by any means. Unless that is so, all bets are off. In this case, what is intended is "appears in an OOP map". Andrew.
From aph at redhat.com Thu Dec 17 13:59:47 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 17 Dec 2015 13:59:47 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> Message-ID: <5672BFD3.7040307@redhat.com> On 12/17/2015 01:54 PM, Doerr, Martin wrote: > So you can see that both Allocations have the state NoEscape, but > there's a safepoint (the non-inlined call) between them. Concurrent > GC could access the obj header and read stale data (and possibly > crash). OptoAssembly shows that the MemBar was optimized out > (probably due to 8136596). > > However, we may have luck. Maybe no concurrent GC accesses the > header of newly created objects. But I don't know if this is true > which is the reason why I posted this question originally. Keep in > mind that objects can get allocated in old gen. So, they are both NoEscape. So do the objects actually get allocated? Or are they scalar-replaced? Andrew.
From hui.shi at linaro.org Thu Dec 17 15:28:35 2015 From: hui.shi at linaro.org (Hui Shi) Date: Thu, 17 Dec 2015 23:28:35 +0800 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <5672BFD3.7040307@redhat.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> Message-ID: Thanks Martin! Could we limit the discussion in this thread to 8144993? As stated in my earlier mail, it looks safe in all 3 cases, for references from both the GC thread and other Java threads. 8136596 extends the original optimization from NoEscape to both NoEscape and ArgEscape. As your new example shows, both allocations are NoEscape, so it's not directly related to 8136596. How about starting a new thread to discuss whether there is a possible danger in the original storestore barrier optimization? Regards Hui On 17 December 2015 at 21:59, Andrew Haley wrote: > On 12/17/2015 01:54 PM, Doerr, Martin wrote: > > > So you can see that both Allocations have the state NoEscape, but > > there's a safepoint (the non-inlined call) between them. Concurrent > > GC could access the obj header and read stale data (and possibly > > crash). OptoAssembly shows that the MemBar was optimized out > > (probably due to 8136596). > > > > However, we may have luck. Maybe no concurrent GC accesses the > > header of newly created objects. But I don't know if this is true > > which is the reason why I posted this question originally.
Keep in > > mind that objects can get allocated in old gen. > > So, they are both NoEscape. So do the objects actually get allocated? > Or are they scalar-replaced? > > Andrew. > From aph at redhat.com Thu Dec 17 15:34:54 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 17 Dec 2015 15:34:54 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> Message-ID: <5672D61E.3020805@redhat.com> On 12/17/2015 03:28 PM, Hui Shi wrote: > Could we limit the discussion in this thread to 8144993? As stated in my earlier > mail, it looks safe in all 3 cases, for references from both the GC thread and > other Java threads. > > 8136596 extends the original optimization from NoEscape to both NoEscape and > ArgEscape. As your new example shows, both allocations are NoEscape, so > it's not directly related to 8136596. How about starting a new thread > to discuss whether there is a possible danger in the original storestore barrier > optimization? I say we should not do that. Martin's concern is real, and you have shown no reason to suppose that removing the memory barriers will not result in a concurrent GC seeing stale object headers. As it stands, unless someone can come up with something convincing, we're going to have to restore those memory barriers. 8144993 should not be committed until this issue is resolved. Andrew.
From aph at redhat.com Thu Dec 17 15:43:38 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 17 Dec 2015 15:43:38 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <5672D61E.3020805@redhat.com> References: <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> Message-ID: <5672D82A.309@redhat.com> The potential problem only arises if "this" is published unsafely and the object to which it is published doesn't escape. Can't we detect unsafe publication? It ought to be easier than escape analysis: it's a matter of detecting that "this" escapes from the constructor. Andrew. 
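Andrew's proposal amounts to recognizing constructors in which "this" becomes reachable from another object before the constructor returns. The Java sketch below shows, at source level only, the two shapes such a check would have to tell apart; Holder, Leaky and NonLeaky are hypothetical names, and nothing here reflects how C2's bytecode escape analysis would actually implement the check. It leans on one fact from the Java Language Specification (section 17.5): the guarantee that other threads see a final field fully initialized holds only if "this" does not escape during construction.

```java
// Hypothetical holder type; it plays the role of A.b in Martin's TestAllocMemBar.
final class Holder {
    Object published; // plain field: no happens-before edge for readers on other threads
}

// "this" escapes from the constructor, like B(A a) { a.b = B.this; } in the thread.
final class Leaky {
    final int value;

    Leaky(Holder h, int value) {
        h.published = this; // 'this' leaks before the constructor has finished
        this.value = value; // a racing reader of h.published may still see value == 0
    }
}

// The constructor only writes its own fields; publication happens after it returns.
final class NonLeaky {
    final int value;

    private NonLeaky(int value) {
        this.value = value;
    }

    static NonLeaky createAndPublish(Holder h, int value) {
        NonLeaky obj = new NonLeaky(value); // constructor completes first
        h.published = obj;                  // 'obj' becomes reachable only afterwards
        return obj;
    }
}
```

Leaky is the unsafe-publication pattern a detector would flag; NonLeaky keeps the object unreachable until construction is complete, so the final-field guarantee applies to it.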
From martin.doerr at sap.com Thu Dec 17 17:58:22 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 17 Dec 2015 17:58:22 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <5672D82A.309@redhat.com> References: <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> Hi Andrew, thanks for your emails. Many memory barriers are only there for concurrent java threads and are not relevant for GC. They are opportunities for EscapeAnalysis-based optimizations. The MemBarStoreStore after the Allocation actually has this purpose plus the additional purpose to satisfy GC requirements. EscapeAnalysis was not designed to analyze "escape to concurrent GC". I guess it is difficult to analyze this in general. So maybe it would be better to change the condition for the MemBarStoreStore barrier insertion to something like "gc_requires_initialized_new_obj_headers() || !alloc->does_not_escape..." with the first function containing the knowledge about all GCs. You also had asked if the objects in my example were scalar replaced. By default, they do get scalar-replaced, but I had prevented this by -XX:-EliminateAllocations which does not influence the escape state and the membar optimizations. Best regards, Martin -----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Donnerstag, 17. 
Dezember 2015 16:44 To: Hui Shi Cc: Doerr, Martin ; Lindenmaier, Goetz ; Vitaly Davidovich ; Aleksey Shipilev ; Vladimir Kozlov ; hotspot compiler ; aarch64-port-dev ; Mikael Gerdin (mikael.gerdin at oracle.com) Subject: Re: RFR: 8144993: Elide redundant memory barrier after AllocationNode The potential problem only arises if "this" is published unsafely and the object to which it is published doesn't escape. Can't we detect unsafe publication? It ought to be easier than escape analysis: it's a matter of detecting that "this" escapes from the constructor. Andrew. From vitalyd at gmail.com Thu Dec 17 18:10:44 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Thu, 17 Dec 2015 13:10:44 -0500 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> References: <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> Message-ID: > > You also had asked if the objects in my example were scalar replaced. By > default, they do get scalar-replaced, but I had prevented this by > -XX:-EliminateAllocations which does not influence the escape state and the > membar optimizations. I'd say that's a big problem, no? The membar elimination is only safe if the allocation is actually removed. 
If the analysis says it's NoEscape but compiler still allocates it for whatever reason (Goetz mentioned a couple earlier in this thread), then it seems insufficient to rely on just the analysis result. On Thu, Dec 17, 2015 at 12:58 PM, Doerr, Martin wrote: > Hi Andrew, > > thanks for your emails. > > Many memory barriers are only there for concurrent java threads and are > not relevant for GC. They are opportunities for EscapeAnalysis-based > optimizations. > > The MemBarStoreStore after the Allocation actually has this purpose plus > the additional purpose to satisfy GC requirements. EscapeAnalysis was not > designed to analyze "escape to concurrent GC". I guess it is difficult to > analyze this in general. > > So maybe it would be better to change the condition for the > MemBarStoreStore barrier insertion to something like > "gc_requires_initialized_new_obj_headers() || !alloc->does_not_escape..." > with the first function containing the knowledge about all GCs. > > You also had asked if the objects in my example were scalar replaced. By > default, they do get scalar-replaced, but I had prevented this by > -XX:-EliminateAllocations which does not influence the escape state and the > membar optimizations. > > Best regards, > Martin > > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Donnerstag, 17. 
Dezember 2015 16:44 > To: Hui Shi > Cc: Doerr, Martin ; Lindenmaier, Goetz < > goetz.lindenmaier at sap.com>; Vitaly Davidovich ; > Aleksey Shipilev ; Vladimir Kozlov < > vladimir.kozlov at oracle.com>; hotspot compiler < > hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev < > aarch64-port-dev at openjdk.java.net>; Mikael Gerdin < > mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com) < > mikael.gerdin at oracle.com> > Subject: Re: RFR: 8144993: Elide redundant memory barrier after > AllocationNode > > The potential problem only arises if "this" is published unsafely and > the object to which it is published doesn't escape. > > Can't we detect unsafe publication? It ought to be easier than escape > analysis: it's a matter of detecting that "this" escapes from the > constructor. > > Andrew. > From dean.long at oracle.com Thu Dec 17 19:14:30 2015 From: dean.long at oracle.com (Dean Long) Date: Thu, 17 Dec 2015 11:14:30 -0800 Subject: RFR(XS): 8144852: Corrupted oop in nmethod In-Reply-To: <5672B38E.5040901@oracle.com> References: <566A44AA.1040101@oracle.com> <566AB84C.1000603@oracle.com> <566B216B.1020204@oracle.com> <5671CC94.2080205@oracle.com> <5671F5F6.9060605@oracle.com> <56721AAE.8040106@oracle.com> <5672B38E.5040901@oracle.com> Message-ID: <56730996.4020702@oracle.com> Thanks Vladimir. dl On 12/17/2015 5:07 AM, Vladimir Ivanov wrote: >> Vladimir, are you OK with keeping the changes in method? If so, I will >> push what I have. > I'm fine with leaving oopDesc::print_* as is. > > Best regards, > Vladimir Ivanov > >> On 12/16/2015 3:38 PM, Ioi Lam wrote: >>> Currently non_oop_word is used only in the nmethod code. If this value >>> is assigned to an oop or a metadata* elsewhere we would probably see >>> massive crashes. Adding non_oop_word to oopDesc::print_*_on would >>> imply that it's OK to assign this value in a more general context, >>> which is not true.
>>> >>> So I would suggest keeping knowledge of non_oop_word inside nmethod >>> for now, and we can revisit this if other places start to use >>> non_oop_word. >>> >>> Thanks >>> - Ioi >>> >>> On 12/16/15 12:41 PM, Dean Long wrote: >>>> Ping. >>>> >>>> Could runtime folks please comment on Vladimir's suggestion to have >>>> oopDesc::print_*_on and >>>> Metadata::print_*_maybe_null support Universe::non_oop_word() values >>>> without crashing, or if I should keep this change in nmethod only. >>>> >>>> thanks, >>>> >>>> dl >>>> >>>> On 12/11/2015 11:18 AM, Dean Long wrote: >>>>> [adding hotspot-runtime-dev] >>>>> >>>>> On 12/11/2015 3:49 AM, Vladimir Ivanov wrote: >>>>>> Dean, thanks for taking care of it. >>>>>> >>>>>> Can oopDesc::print_value_on and print_value_on_maybe_null be >>>>>> enhanced instead to handle non_oop_word case (in addition to NULL >>>>>> case)? >>>>>> >>>>> >>>>> I thought of that, but didn't want to add >>>>> print_value_on_maybe_null_or_non_oop :-) >>>>> >>>>> If you feel strongly about that, then I should probably get input >>>>> from runtime too, since I think they own that code. >>>>> >>>>>> Also, the following is slightly misleading since metadata pointers >>>>>> aren't oops: >>>>>> void nmethod::print_recorded_metadata() { >>>>>> + if (m == (Metadata*)Universe::non_oop_word()) { >>>>>> + tty->print("non-oop word"); >>>>>> >>>>> >>>>> Would "non-metadata word" be better? >>>>> >>>>> dl >>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>> On 12/11/15 6:36 AM, Dean Long wrote: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8144852 >>>>>>> http://cr.openjdk.java.net/~dlong//8144852/webrev/ >>>>>>> >>>>>>> The fix for [1] introduced new functions >>>>>>> nmethod::print_recorded_oops >>>>>>> and nmethod::print_recorded_metadata that print all oop and >>>>>>> metadata >>>>>>> values in an nmethod. Currently NULL values are handled OK, but >>>>>>> Universe::non_oop_word values cause a crash. 
>>>>>>> >>>>>>> (This bug is marked confidential because it was reported against >>>>>>> one of >>>>>>> our closed ports.) >>>>>>> >>>>>>> dl >>>>>>> >>>>>>> [1] JDK-8072008: Emit direct call instead of linkTo* for >>>>>>> recursive >>>>>>> indy/MH.invoke* calls >>>>> >>>> >>> >> From christian.thalinger at oracle.com Thu Dec 17 23:12:07 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 17 Dec 2015 13:12:07 -1000 Subject: RFR (S): 8145714: [JVMCI] SPARC broken after JDK-8134994 Message-ID: https://bugs.openjdk.java.net/browse/JDK-8145714 http://cr.openjdk.java.net/~twisti/8145714/webrev.01/ I forgot to move the SPARC part to vmStructs_jvmci.cpp. From vladimir.kozlov at oracle.com Thu Dec 17 23:15:26 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 17 Dec 2015 15:15:26 -0800 Subject: RFR (S): 8145714: [JVMCI] SPARC broken after JDK-8134994 In-Reply-To: References: Message-ID: <5673420E.1020605@oracle.com> Good. Vladimir On 12/17/15 3:12 PM, Christian Thalinger wrote: > https://bugs.openjdk.java.net/browse/JDK-8145714 > http://cr.openjdk.java.net/~twisti/8145714/webrev.01/ > > I forgot to move the SPARC part to vmStructs_jvmci.cpp. 
> From christian.thalinger at oracle.com Fri Dec 18 00:01:00 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 17 Dec 2015 14:01:00 -1000 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: <97A31572-DD8D-45E0-AAF5-E47B251CE633@oracle.com> References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> <565DDFC1.7020006@redhat.com> <565ED4B9.3020003@oracle.com> <565ED70A.9060509@redhat.com> <565EDF64.7080504@oracle.com> <565EE277.7030907@redhat.com> <97A31572-DD8D-45E0-AAF5-E47B251CE633@oracle.com> Message-ID: <20D7BD6F-0413-4C18-A6B7-3FAC85FBAF93@oracle.com> Quick update on this one. I am trying to get the AArch64 Graal port building using your patch. I had to make small changes to your code, and once I have it working I will push this to hs-comp. > On Dec 2, 2015, at 8:49 AM, Christian Thalinger wrote: > >> >> On Dec 2, 2015, at 2:22 AM, Andrew Haley wrote: >> >> On 02/12/15 12:09, Roland Schatz wrote: >>> On 12/02/2015 12:33 PM, Andrew Haley wrote: >>>> On 02/12/15 11:23, Roland Schatz wrote: >>>>> Perhaps we should have a series of test analogous to the >>>>> test/compiler/jvmci/errors tests, but for "working" instead of "broken" >>>>> code installation. For that we would need a platform dependent "fake" >>>>> compiler (e.g. handwritten assembly for well-known test methods). >>>> Maybe. But if there is no way to actually exercise the code which >>>> is in HotSpot, why is it there? >>> It's an interface for compilers. You can exercise the code, you just >>> have to write a compiler ;) >> >> Sure, but I don't see why we can't have a tiny compiler in the test >> suite. > > Wow. This is getting crazy now :-) > > Anyway, let's push what we have now and wait for the AArch64 backend to be functional. Then we can fix the CodeInstaller methods.
> > >> > >> Andrew. From jan.civlin at intel.com Fri Dec 18 04:03:06 2015 From: jan.civlin at intel.com (Civlin, Jan) Date: Fri, 18 Dec 2015 04:03:06 +0000 Subject: RFR: 8145717: Use AVX3 instructions for Arrays.equals() intrinsic Message-ID: <39F83597C33E5F408096702907E6C4500F10A172@ORSMSX104.amr.corp.intel.com>

We would like to contribute an AVX3 patch for the Arrays.equals() intrinsic. It uses the 512-bit registers available on AVX3 hardware and delivers the following speed-ups:
- on random inputs (any size): about 10%;
- on long arrays (randomly equal or not): about 25%;
- on long equal arrays (the longest case, since every byte matters): almost 40%; test time dropped from 12 sec to 7.5 sec.

Contributor: Jan Civlin.
Bug-id: https://bugs.openjdk.java.net/browse/JDK-8145717
Webrev: http://cr.openjdk.java.net/~kvn/8145717/webrev/

From goetz.lindenmaier at sap.com Fri Dec 18 10:43:44 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 18 Dec 2015 10:43:44 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41EE35EF@DEWDFEMB12A.global.corp.sap> Hi Hui, > Subject: Re: RFR: 8144993: Elide redundant memory barrier after > AllocationNode > > Thanks Andrew, Goetz and all!
> > Major concern is will removing storestore barrier cause other threads read > stale data for newly allocated object. Other threads include java thread or > concurrent GC thread. It should be safe with following analysis. > > 1. If BCEA result "this"(b) escapes in its initializer, change will not optimize > storestore barrier. > 2. If BCEA result "this"(b) does not escape in its initializer, it's safe to remove > storestore. > 2.1 If there is a safe point between storestore and release, b is visible to GC > in initializer, but at safe point, it should have a memory barrier. > 2.2 If there is no safe point between storestore and release. b will be visible > to other thread after release memory barrier. I think this describes the situation correctly with respect to my counterexample. I'm not sure whether there are other possibilities. Is the test for 1.) already implemented? How do you do this? Is inlining of the constructor delayed when you do your optimization, so you can find the call to it? Or do you find the BCEA information via the class that is reachable over the type information? How do you know then which constructor was called if there are several? Best regards, Goetz. > > Case #1 > A a = new A(); > safepoint // a can be reached from GC > new B(a) > > allocation > ------- > b.klass =... > b.markword =... > b.f1 = 0 > .. > b.fn = 0 > storestore > -------- init start > .... > a.x = this; // b might visible to other threads here > .... > release > -------- init end > > BCEA result indicate "this"(b) is not local and not arg_stack. So "b" will be > treated as escaped in its initializer, so change will not optimize storestore > barrier. > [EA] estimated escape information for B:: > non-escaping args: {} > stack-allocatable args: {1} > return non-local value > modified args: 0x6 0x6 > flags: > b="this" is not local and not arg_stack > a is arg_stack means it is passed in and not assigned to other object in > initializer.
> > Case #2.1 > allocation > ------- > b.klass =... > b.markword =... > b.f1 = 0 > .. > b.fn = 0 > storestore > -------- init start > .... > safepoint // "this" is in oop map and might visible to GC thread here > .... > release > -------- init end > > Case #2.2 > allocation > ------- > b.klass =... > b.markword =... > b.f1 = 0 > .. > b.fn = 0 > storestore > -------- init start > .... > release > -------- init end > > Regards > Hui > > On 16 December 2015 at 00:15, Andrew Haley > wrote: > > > On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote: > > > Further, if the object is NoEscape it might not be scalar > > replaced. If I remember correctly, there are various conditions, > > e.g., too big, allocated in loop. > > Well, that's the killer. The definition of "escape" we need to use > here is the really, truly, honest-to-goodness one: that this object > never becomes visible to any other thread by any means. Unless > that > is so, all bets are off. In this case, what is intended is "appears > in an OOP map". > > Andrew. 
> > From goetz.lindenmaier at sap.com Fri Dec 18 11:09:41 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 18 Dec 2015 11:09:41 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> Message-ID: <4295855A5C1DE049A61835A1887419CC41EE3661@DEWDFEMB12A.global.corp.sap> Hi > > You also had asked if the objects in my example were scalar replaced. > > By default, they do get scalar-replaced, but I had prevented this by -XX:-EliminateAllocations, which does not influence the escape state and the > > membar optimizations. > > I'd say that's a big problem, no? The membar elimination is only safe if the > allocation is actually removed. If the analysis says it's NoEscape but compiler > still allocates it for whatever reason (Goetz mentioned a couple earlier in this > thread), then it seems insufficient to rely on just the analysis result. Well, if it's NoEscape, it's safe to remove the barriers with respect to Java semantics, no matter what other optimizations (here: scalar replacement) do. But here we look at the importance of the barrier to the runtime system, which is VM implementation specific. In particular, the new optimization also addresses objects that escape, as long as they don't escape before the barrier at the end of the constructor. Best regards, Goetz.
> On Thu, Dec 17, 2015 at 12:58 PM, Doerr, Martin > wrote: > > > Hi Andrew, > > thanks for your emails. > > Many memory barriers are only there for concurrent java threads > and are not relevant for GC. They are opportunities for EscapeAnalysis-based > optimizations. > > The MemBarStoreStore after the Allocation actually has this purpose > plus the additional purpose to satisfy GC requirements. EscapeAnalysis was > not designed to analyze "escape to concurrent GC". I guess it is difficult to > analyze this in general. > > So maybe it would be better to change the condition for the > MemBarStoreStore barrier insertion to something like > "gc_requires_initialized_new_obj_headers() || !alloc->does_not_escape..." with the first function containing the knowledge > about all GCs. > > You also had asked if the objects in my example were scalar replaced. > By default, they do get scalar-replaced, but I had prevented this by -XX:-EliminateAllocations, which does not influence the escape state and the > membar optimizations. > > Best regards, > Martin > > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com > ] > Sent: Thursday, 17 December 2015 16:44 > To: Hui Shi > > Cc: Doerr, Martin >; Lindenmaier, Goetz > >; > Vitaly Davidovich >; > Aleksey Shipilev >; Vladimir Kozlov > >; > hotspot compiler >; aarch64-port-dev > dev at openjdk.java.net> >; Mikael Gerdin > (mikael.gerdin at oracle.com > ) > > Subject: Re: RFR: 8144993: Elide redundant memory barrier after > AllocationNode > > > The potential problem only arises if "this" is published unsafely and > the object to which it is published doesn't escape. > > Can't we detect unsafe publication? It ought to be easier than escape > analysis: it's a matter of detecting that "this" escapes from the > constructor. > > Andrew.
> > From nils.eliasson at oracle.com Fri Dec 18 11:51:53 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 18 Dec 2015 12:51:53 +0100 Subject: RFR(S): 8145566: PrintNMethods compile command broken since b89 Message-ID: <5673F359.7080608@oracle.com> Hi, Please review this patch. It fixes an issue where TypedOptionMatchers freed their Symbols twice (they are already decremented in the superclass destructor). This breaks the CompileCommand option after a while, once the Symbols are cleaned. Bug: https://bugs.openjdk.java.net/browse/JDK-8145566 Webrev: http://cr.openjdk.java.net/~neliasso/8145566/ Regards, Nils From tobias.hartmann at oracle.com Fri Dec 18 12:30:23 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 18 Dec 2015 13:30:23 +0100 Subject: RFR(S): 8145566: PrintNMethods compile command broken since b89 In-Reply-To: <5673F359.7080608@oracle.com> References: <5673F359.7080608@oracle.com> Message-ID: <5673FC5F.9080600@oracle.com> Hi Nils, looks good to me. I verified that your patch fixes the problem. Best, Tobias On 18.12.2015 12:51, Nils Eliasson wrote: > Hi, > > Please review this patch. Fixes an issue where TypedOptionMatchers freed their Symbols twice (already decremented in super class destructor). This affects CompileCommand option after a while when the Symbols are cleaned.
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8145566 > Webrev: http://cr.openjdk.java.net/~neliasso/8145566/ > > Regards, > Nils From hui.shi at linaro.org Fri Dec 18 12:45:43 2015 From: hui.shi at linaro.org (Hui Shi) Date: Fri, 18 Dec 2015 20:45:43 +0800 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EE35EF@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <4295855A5C1DE049A61835A1887419CC41EE35EF@DEWDFEMB12A.global.corp.sap> Message-ID: Thanks Goetz! Case 1) can be handled with the current patch. The BCEA information is obtained from the owning method when the release memory barrier for a final field write is inserted. The final field is initialized in the constructor of its owning allocation node. The following code is in Parse::do_exits; alloc->compute_MemBar_redundancy gets the constructor's BCEA information and checks whether the allocation escapes in the constructor. if (method()->is_initializer() && (wrote_final() || PPC64_ONLY(wrote_volatile() ||) (AlwaysSafeConstructors && wrote_fields()))) { _exits.insert_mem_bar(Op_MemBarRelease, alloc_with_final()); + + // If Memory barrier is created for final fields write + // and allocation node does not escape the initialize method, + // then barrier introduced by allocation node can be removed.
+ if (DoEscapeAnalysis && alloc_with_final()) { + AllocateNode *alloc = AllocateNode::Ideal_allocation(alloc_with_final(), &_gvn); + alloc->compute_MemBar_redundancy(method()); + } Regards Hui On 18 December 2015 at 18:43, Lindenmaier, Goetz wrote: > Hi Hui, > > > Subject: Re: RFR: 8144993: Elide redundant memory barrier after > > AllocationNode > > > > Thanks Andrew, Goetz and all! > > > > Major concern is will removing storestore barrier cause other threads > read > > stale data for newly allocated object. Other threads include java thread > or > > concurrent GC thread. It should be safe with following analysis. > > > > 1. If BCEA result "this"(b) escapes in its initializer, change will not > optimize > > storestore barrier. > > 2. If BCEA result "this"(b) does not escape in its initializer, it's > safe to remove > > storestore. > > 2.1 If there is a safe point between storestore and release, b is > visible to GC > > in initializer, but at safe point, it should have a memory barrier. > > 2.2 If there is no safe point between storestore and release. b will > be visible > > to other thread after release memory barrier. > I think this describes the situation correctly wrt. to my counterexample. > I'm > not sure whether there are other possibilities. > > Is the test for 1.) already implemented? > How do you do this? Is inlining of the constructor delayed when you do > your optimization, so you can find the call to it? Or do you find the > BCEA information > via the class that is reachable over the type information? How do you > known then > which constructor was called if there are several ones? > > Best regards, > Goetz. > > > > > > > > > Case #1 > > A a = new A(); > > safepoint // a can be reached from GC > > new B(a) > > > > allocation > > ------- > > b.klass =... > > b.markword =... > > b.f1 = 0 > > .. > > b.fn = 0 > > storestore > > -------- init start > > .... > > a.x = this; // b might visible to other threads here > > .... 
> > release > > -------- init end > > > > BCEA result indicate "this"(b) is not local and not arg_stack. So "b" > will be > > treated as escaped in its initializer, so change will not optimize > storestore > > barrier. > > [EA] estimated escape information for B:: > > non-escaping args: {} > > stack-allocatable args: {1} > > return non-local value > > modified args: 0x6 0x6 > > flags: > > b="this" is not local and not arg_stack > > a is arg_stack means it is passed in and not assigned to other > object in > > initializer. > > > > Case #2.1 > > allocation > > ------- > > b.klass =... > > b.markword =... > > b.f1 = 0 > > .. > > b.fn = 0 > > storestore > > -------- init start > > .... > > safepoint // "this" is in oop map and might visible to GC thread here > > .... > > release > > -------- init end > > > > Case #2.2 > > allocation > > ------- > > b.klass =... > > b.markword =... > > b.f1 = 0 > > .. > > b.fn = 0 > > storestore > > -------- init start > > .... > > release > > -------- init end > > > > Regards > > Hui > > > > On 16 December 2015 at 00:15, Andrew Haley > > wrote: > > > > > > On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote: > > > > > Further, if the object is NoEscape it might not be scalar > > > replaced. If I remember correctly, there are various conditions, > > > e.g., too big, allocated in loop. > > > > Well, that's the killer. The definition of "escape" we need to use > > here is the really, truly, honest-to-goodness one: that this object > > never becomes visible to any other thread by any means. Unless > > that > > is so, all bets are off. In this case, what is intended is > "appears > > in an OOP map". > > > > Andrew. > > > > > >
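Hui's case analysis turns on whether `this` escapes before the constructor finishes. The minimal Java sketch below shows the two constructor shapes. Classes `A` and `B` and field `x` follow the pseudo-code quoted above; class `C`, field `f`, and the class name `ConstructorEscape` are hypothetical. The StoreStore and release barriers themselves are emitted by C2 and never appear in Java source, so the sketch shows only which shape is which.

```java
// Illustrative only: the StoreStore/release barriers discussed in this
// thread are emitted by the JIT; they are not visible in Java source.
public final class ConstructorEscape {

    static final class A {
        Object x; // plain (non-volatile) field, as in "a.x = this" above
    }

    // Case #1: "this" is stored into an argument's field inside the
    // constructor, so it escapes before the constructor finishes.
    static final class B {
        final int f;
        B(A a) {
            a.x = this;  // unsafe publication of a partially constructed object
            this.f = 42;
        }
    }

    // Case #2: the constructor only initializes its own fields, so
    // "this" does not escape the constructor.
    static final class C {
        final int f;
        C() {
            this.f = 42;
        }
    }

    public static void main(String[] args) {
        A a = new A();
        B b = new B(a);
        C c = new C();
        System.out.println((a.x == b) + " " + b.f + " " + c.f); // true 42 42
    }
}
```

In case #1, BCEA reports `this` as not local, so the barrier after the allocation stays in place. In case #2, `this` becomes visible to other threads only after the release barrier at the end of the constructor.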
From hui.shi at linaro.org Fri Dec 18 13:10:06 2015 From: hui.shi at linaro.org (Hui Shi) Date: Fri, 18 Dec 2015 21:10:06 +0800 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> References: <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> Message-ID: Thanks Andrew and Martin! Agreed, it's better to fix the original StoreStore barrier optimization that relies on escape information. When entering PhaseMacroExpand::expand_allocate_common, the object must be allocated on the heap and can't be scalar replaced, right? This issue can't be solved by detecting unsafe publication only in the constructor: in the following example, b is published outside the constructor, and the StoreStore barrier still can't be removed. public void TestMethod() { A a = new A(); dont_inline_me(); //System.gc(); B b = new B(); // empty constructor // no safepoint a.b = b; } Martin's proposed fix looks reasonable: disable the original StoreStore barrier optimization if GC threads might reference the allocated object. Regards Hui On 18 December 2015 at 01:58, Doerr, Martin wrote: > Hi Andrew, > > thanks for your emails. > > Many memory barriers are only there for concurrent java threads and are > not relevant for GC. They are opportunities for EscapeAnalysis-based > optimizations.
> > The MemBarStoreStore after the Allocation actually has this purpose plus > the additional purpose to satisfy GC requirements. EscapeAnalysis was not > designed to analyze "escape to concurrent GC". I guess it is difficult to > analyze this in general. > > So maybe it would be better to change the condition for the > MemBarStoreStore barrier insertion to something like > "gc_requires_initialized_new_obj_headers() || !alloc->does_not_escape..." > with the first function containing the knowledge about all GCs. > > You also had asked if the objects in my example were scalar replaced. By > default, they do get scalar-replaced, but I had prevented this by > -XX:-EliminateAllocations which does not influence the escape state and the > membar optimizations. > > Best regards, > Martin > > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Thursday, 17 December 2015 16:44 > To: Hui Shi > Cc: Doerr, Martin ; Lindenmaier, Goetz < > goetz.lindenmaier at sap.com>; Vitaly Davidovich ; > Aleksey Shipilev ; Vladimir Kozlov < > vladimir.kozlov at oracle.com>; hotspot compiler < > hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev < > aarch64-port-dev at openjdk.java.net>; Mikael Gerdin < > mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com) < > mikael.gerdin at oracle.com> > Subject: Re: RFR: 8144993: Elide redundant memory barrier after > AllocationNode > > The potential problem only arises if "this" is published unsafely and > the object to which it is published doesn't escape. > > Can't we detect unsafe publication? It ought to be easier than escape > analysis: it's a matter of detecting that "this" escapes from the > constructor. > > Andrew. >
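Martin's proposed condition for keeping the MemBarStoreStore after an allocation is a decision over two inputs. The Java sketch below models it. It is hypothetical: `gcRequiresInitializedNewObjHeaders` stands in for the proposed `gc_requires_initialized_new_obj_headers()`, and `allocDoesNotEscape` stands in for the escape-analysis result. Neither is a real HotSpot API, and the class `StoreStorePolicy` is invented for illustration.

```java
// Hypothetical model of the proposed barrier-insertion condition; names are
// illustrative stand-ins, not HotSpot APIs.
public final class StoreStorePolicy {

    /**
     * Models: gc_requires_initialized_new_obj_headers() || !alloc->does_not_escape...
     * Returns true if the StoreStore barrier after the allocation must be kept.
     */
    static boolean keepStoreStoreBarrier(boolean gcRequiresInitializedNewObjHeaders,
                                         boolean allocDoesNotEscape) {
        return gcRequiresInitializedNewObjHeaders || !allocDoesNotEscape;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        // Print the full truth table of the proposed condition.
        for (boolean gc : values) {
            for (boolean noEscape : values) {
                System.out.printf("gcNeedsHeaders=%b noEscape=%b -> keep=%b%n",
                        gc, noEscape, keepStoreStoreBarrier(gc, noEscape));
            }
        }
    }
}
```

In this model the barrier can be elided in exactly one case: no collector needs initialized headers and the allocation does not escape. The GC-specific knowledge sits in one predicate, so a new collector only has to answer that single question.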
From axel.siebenborn at sap.com Fri Dec 18 13:11:12 2015 From: axel.siebenborn at sap.com (Axel Siebenborn) Date: Fri, 18 Dec 2015 14:11:12 +0100 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EE3661@DEWDFEMB12A.global.corp.sap> References: <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41EE3661@DEWDFEMB12A.global.corp.sap> Message-ID: <567405F0.6060707@sap.com> Hi, the concern raised in this mail thread is that an object will be known to the GC while it could still be seen as not fully initialized. I share the opinion that this might be the case. However, I don't see how it can be a problem with the concurrent threads of G1 or CMS. The G1 collector uses its snapshot-at-the-beginning and only scans objects that were reachable at the initial marking. Newly allocated objects are considered live. The CMS collector doesn't scan the young generation. For the exceptional cases where an object is allocated directly in the old generation, there is a store_store_membar after setting the class field to null when allocating a chunk from the free list. Concurrent threads can handle objects with a null class and won't scan them. A function "gc_requires_initialized_new_obj_headers()" would return false for all current GC implementations, but might be useful to make both compiler and GC developers aware of a potential problem.
Regards, Axel From tobias.hartmann at oracle.com Fri Dec 18 14:52:34 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 18 Dec 2015 15:52:34 +0100 Subject: [9] RFR(S): 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI Message-ID: <56741DB2.20100@oracle.com> Hi, please review the following patch. https://bugs.openjdk.java.net/browse/JDK-8145754 http://cr.openjdk.java.net/~thartmann/8145754/webrev.00/ PhaseIdealLoop::is_scaled_iv_plus_offset() returns false if the expression is an AddI node with the scaled iv as second input because is_scaled_iv() is only invoked for exp->in(1). We need to check exp->in(2) as well like we do for the SubI node (and also in SWPointer::scaled_iv_plus_offset()). Background: I caught this bug with my prototype fix for JDK-6675699 while investigating a regression with the SPECjvm2008 mpeg benchmark. It turned out that with my fix I was hitting the LoopUnrollLimit for loops in some hot methods and therefore the loops were not completely unrolled/removed. Compared to the baseline version, some RC predicates were not emitted and therefore range checks in the loop body were not eliminated. As a result, the loop body node count was too high for unrolling. The RC predicates were not added because IdealLoopTree::is_range_check_if() uses is_scaled_iv_plus_offset() which fails. I think the input nodes of the AddI node are swapped with my fix because the node indices are slightly different and commute() (see addnode.cpp) sorts inputs according to their index: // Otherwise, sort inputs (commutativity) to help value numbering. if( in1->_idx > in2->_idx ) { add->swap_edges(1, 2); return true; } I verified that this also happens without my fix for JDK-6675699 by adding assert(!is_scaled_iv(exp->in(2), iv, p_scale), "Found input at position 2!"); and running JPRT. As expected, we hit the assert. 
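The 8145754 failure mode can be reproduced with a toy expression matcher. The Java sketch below is a hypothetical model, not HotSpot code: `Node`, `isScaledIv`, and the two matcher variants are invented for illustration. `commute` sorts the inputs of the add by node index, following the quoted addnode.cpp rule, so the scaled induction variable can end up as the second input. A matcher that only inspects the first input then misses it.

```java
// Toy model of the 8145754 bug: a commutative AddI whose scaled-iv input
// was moved to position 2 by index-based input sorting.
public final class ScaledIvMatch {

    // Minimal IR node: an opcode, a unique index, and up to two inputs.
    static final class Node {
        final String op;
        final int idx;
        Node in1, in2;
        Node(String op, int idx, Node in1, Node in2) {
            this.op = op; this.idx = idx; this.in1 = in1; this.in2 = in2;
        }
    }

    // Toy stand-in for is_scaled_iv(): matches "iv * constant".
    static boolean isScaledIv(Node n, Node iv) {
        return n.op.equals("MulI") && n.in1 == iv && n.in2.op.equals("ConI");
    }

    // Buggy variant: only looks for the scaled iv in the first input.
    static boolean matchFirstInputOnly(Node exp, Node iv) {
        return exp.op.equals("AddI") && isScaledIv(exp.in1, iv);
    }

    // Fixed variant: AddI is commutative, so check both inputs.
    static boolean matchEitherInput(Node exp, Node iv) {
        return exp.op.equals("AddI")
                && (isScaledIv(exp.in1, iv) || isScaledIv(exp.in2, iv));
    }

    // Mirrors the quoted commute() rule: swap inputs if in1->_idx > in2->_idx.
    static void commute(Node add) {
        if (add.in1.idx > add.in2.idx) {
            Node tmp = add.in1; add.in1 = add.in2; add.in2 = tmp;
        }
    }

    public static void main(String[] args) {
        Node iv = new Node("Phi", 10, null, null);
        Node scale = new Node("ConI", 11, null, null);
        Node scaled = new Node("MulI", 20, iv, scale);   // iv * (constant)
        Node offset = new Node("LoadI", 15, null, null); // lower index than 'scaled'
        Node add = new Node("AddI", 30, scaled, offset);
        commute(add); // inputs are now (offset, scaled)
        System.out.println(matchFirstInputOnly(add, iv) + " " + matchEitherInput(add, iv));
    }
}
```

Running `main` prints `false true`. The first-input-only matcher misses the scaled iv after the swap, and the variant that checks both inputs finds it.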
Thanks, Tobias From tobias.hartmann at oracle.com Fri Dec 18 15:24:21 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 18 Dec 2015 16:24:21 +0100 Subject: [9] RFR(S): 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true Message-ID: <56742525.7040600@oracle.com> Hi, please review the following patch. https://bugs.openjdk.java.net/browse/JDK-8144487 http://cr.openjdk.java.net/~thartmann/8144487/webrev.00/ The fix for JDK-7107042 introduced a 'skip_loop_opts' flag for PhaseIdealLoop::build_and_optimize() to not execute loop optimizations before EA. We need to restore the major_progress flag before calling igvn.optimize() because other code depends on the fact that we don't execute more loop optimizations if major_progress() is not set (for example, in ConvI2LNode::Ideal). Thanks, Tobias From aleksey.shipilev at oracle.com Fri Dec 18 16:05:23 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 18 Dec 2015 19:05:23 +0300 Subject: [9] RFR (S): 8071374: Native disassembler implementation may be not thread-safe In-Reply-To: <567007FC.7090601@oracle.com> References: <567007FC.7090601@oracle.com> Message-ID: <56742EC3.9060707@oracle.com> On 12/15/2015 03:30 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8071374/webrev.00 Thank you, PrintAssembly works in fastdebug with this patch. I can finally use JMH perfasm there again. Cheers, -Aleksey
From vladimir.x.ivanov at oracle.com Fri Dec 18 16:08:04 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 18 Dec 2015 19:08:04 +0300 Subject: [9] RFR (S): 8071374: Native disassembler implementation may be not thread-safe In-Reply-To: <56742EC3.9060707@oracle.com> References: <567007FC.7090601@oracle.com> <56742EC3.9060707@oracle.com> Message-ID: <56742F64.8060804@oracle.com> Thanks for the confirmation, Aleksey. Best regards, Vladimir Ivanov On 12/18/15 7:05 PM, Aleksey Shipilev wrote: > On 12/15/2015 03:30 PM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8071374/webrev.00 > > Thank you, PrintAssembly works in fastdebug with this patch. > I can finally use JMH perfasm there again. > > Cheers, > -Aleksey > From roland.westrelin at oracle.com Fri Dec 18 16:13:58 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 18 Dec 2015 17:13:58 +0100 Subject: [9] RFR (S): 8133612: new clone logic added in 8042235 is missing from compiler intrinsics In-Reply-To: <56719774.4020203@oracle.com> References: <56719774.4020203@oracle.com> Message-ID: <1C4254C0-90CE-411B-AF75-1A678ABBBD45@oracle.com> > http://cr.openjdk.java.net/~vlivanov/8133612/webrev.00 That looks good to me. Roland. From vladimir.x.ivanov at oracle.com Fri Dec 18 16:14:55 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 18 Dec 2015 19:14:55 +0300 Subject: [9] RFR (S): 8133612: new clone logic added in 8042235 is missing from compiler intrinsics In-Reply-To: <1C4254C0-90CE-411B-AF75-1A678ABBBD45@oracle.com> References: <56719774.4020203@oracle.com> <1C4254C0-90CE-411B-AF75-1A678ABBBD45@oracle.com> Message-ID: <567430FF.3070808@oracle.com> Thanks, Roland! Best regards, Vladimir Ivanov On 12/18/15 7:13 PM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~vlivanov/8133612/webrev.00 > > That looks good to me. > > Roland.
> From vladimir.kozlov at oracle.com Fri Dec 18 18:58:40 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 18 Dec 2015 10:58:40 -0800 Subject: RFR(S): 8145566: PrintNMethods compile command broken since b89 In-Reply-To: <5673F359.7080608@oracle.com> References: <5673F359.7080608@oracle.com> Message-ID: <56745760.3060307@oracle.com> Good. Thanks, Vladimir On 12/18/15 3:51 AM, Nils Eliasson wrote: > Hi, > > Please review this patch. Fixes an issue where TypedOptionMatchers freed their Symbols twice (already decremented in > super class destructor). This affects CompileCommand option after a while when the Symbols are cleaned. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8145566 > Webrev: http://cr.openjdk.java.net/~neliasso/8145566/ > > Regards, > Nils From vladimir.kozlov at oracle.com Fri Dec 18 19:00:38 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 18 Dec 2015 11:00:38 -0800 Subject: [9] RFR(S): 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI In-Reply-To: <56741DB2.20100@oracle.com> References: <56741DB2.20100@oracle.com> Message-ID: <567457D6.6000500@oracle.com> Looks good. Thanks, Vladimir On 12/18/15 6:52 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8145754 > http://cr.openjdk.java.net/~thartmann/8145754/webrev.00/ > > PhaseIdealLoop::is_scaled_iv_plus_offset() returns false if the expression is an AddI node with the scaled iv as second input because is_scaled_iv() is only invoked for exp->in(1). We need to check exp->in(2) as well like we do for the SubI node (and also in SWPointer::scaled_iv_plus_offset()). > > Background: > I caught this bug with my prototype fix for JDK-6675699 while investigating a regression with the SPECjvm2008 mpeg benchmark. It turned out that with my fix I was hitting the LoopUnrollLimit for loops in some hot methods and therefore the loops were not completely unrolled/removed. 
Compared to the baseline version, some RC predicates were not emitted and therefore range checks in the loop body were not eliminated. As a result, the loop body node count was too high for unrolling. The RC predicates were not added because IdealLoopTree::is_range_check_if() uses is_scaled_iv_plus_offset() which fails. I think the input nodes of the AddI node are swapped with my fix because the node indices are slightly different and commute() (see addnode.cpp) sorts inputs according to their index: > // Otherwise, sort inputs (commutativity) to help value numbering. > if( in1->_idx > in2->_idx ) { > add->swap_edges(1, 2); > return true; > } > > I verified that this also happens without my fix for JDK-6675699 by adding > assert(!is_scaled_iv(exp->in(2), iv, p_scale), "Found input at position 2!"); > and running JPRT. As expected, we hit the assert. > > Thanks, > Tobias > From vladimir.kozlov at oracle.com Fri Dec 18 19:02:12 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 18 Dec 2015 11:02:12 -0800 Subject: [9] RFR(S): 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true In-Reply-To: <56742525.7040600@oracle.com> References: <56742525.7040600@oracle.com> Message-ID: <56745834.50506@oracle.com> Fix is good. Thanks, Vladimir On 12/18/15 7:24 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8144487 > http://cr.openjdk.java.net/~thartmann/8144487/webrev.00/ > > The fix for JDK-7107042 introduced a 'skip_loop_opts' flag for PhaseIdealLoop::build_and_optimize() to not execute loop optimizations before EA. We need to restore the major_progress flag before calling igvn.optimize() because other code depends on the fact that we don't execute more loop optimizations if major_progress() is not set (for example, in ConvI2LNode::Ideal). 
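The 8144487 pattern is: save a progress flag, then restore it before an early exit so that later igvn transforms see the correct state. The Java sketch below is a toy model, not the HotSpot implementation. All names (`MajorProgressModel`, `Compile`, `buildAndOptimize`, `igvnMayApplyLateTransform`, `restoreFix`) are hypothetical. Two further points are assumptions of the model: that the phase clears the flag on entry, and that a set flag means "more loop optimizations will run". The `ConvI2LNode::Ideal` dependency is reduced to a single boolean check.

```java
// Toy model of the 8144487 issue: an early exit that forgets to restore a
// cleared "major progress" flag misleads later transforms.
public final class MajorProgressModel {

    // Toy compile state; majorProgress set means "more loop opts will run".
    static final class Compile {
        boolean majorProgress;
    }

    // Toy igvn check: a transform that is only legal once no more loop
    // optimizations will run (cf. the ConvI2LNode::Ideal example).
    static boolean igvnMayApplyLateTransform(Compile c) {
        return !c.majorProgress;
    }

    // Toy build_and_optimize(): clears the flag on entry (an assumption of
    // this model); on the skip_loop_opts early exit it must restore it.
    static boolean buildAndOptimize(Compile c, boolean skipLoopOpts, boolean restoreFix) {
        boolean oldProgress = c.majorProgress;
        c.majorProgress = false;            // reset for the driver's heuristics
        if (skipLoopOpts) {
            if (restoreFix && oldProgress) {
                c.majorProgress = true;     // the fix: restore before igvn runs
            }
            return igvnMayApplyLateTransform(c);
        }
        // ... loop optimizations would run here in the real phase ...
        return igvnMayApplyLateTransform(c);
    }

    public static void main(String[] args) {
        Compile withoutFix = new Compile();
        withoutFix.majorProgress = true;
        System.out.println("without fix: late transform allowed = "
                + buildAndOptimize(withoutFix, true, false));
        Compile withFix = new Compile();
        withFix.majorProgress = true;
        System.out.println("with fix:    late transform allowed = "
                + buildAndOptimize(withFix, true, true));
    }
}
```

Without the restore, the late transform is wrongly allowed (`true`), even though loop optimizations are still pending. With the restore, it is deferred (`false`).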
> > Thanks, > Tobias > From vladimir.kozlov at oracle.com Fri Dec 18 19:16:07 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 18 Dec 2015 11:16:07 -0800 Subject: RFR: 8145717: Use AVX3 instructions for Arrays.equals() intrinsic In-Reply-To: <39F83597C33E5F408096702907E6C4500F10A172@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F10A172@ORSMSX104.amr.corp.intel.com> Message-ID: <56745B77.4080505@oracle.com> Looks good. I will sponsor it. Thanks, Vladimir On 12/17/15 8:03 PM, Civlin, Jan wrote: > We would like to contribute AVX3 patch for Arrays.equals() intrinsic. > > This utilizes 512 bits registers on AVX3 architecture and delivers performance gain (speed-up): > - on a random (any size) about 10% speed-up; > - on a long (equal or not - randomly) about 25%; > - on a long equal (it is the longest processing, since all the bytes matter) almost 40% - test time dropped from 12 sec to 7.5 sec. > > > Contributor: Jan Civlin. > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8145717 > Webrev: http://cr.openjdk.java.net/~kvn/8145717/webrev/ > From jan.civlin at intel.com Fri Dec 18 19:17:56 2015 From: jan.civlin at intel.com (Civlin, Jan) Date: Fri, 18 Dec 2015 19:17:56 +0000 Subject: RFR: 8145717: Use AVX3 instructions for Arrays.equals() intrinsic In-Reply-To: <56745B77.4080505@oracle.com> References: <39F83597C33E5F408096702907E6C4500F10A172@ORSMSX104.amr.corp.intel.com> <56745B77.4080505@oracle.com> Message-ID: <39F83597C33E5F408096702907E6C4500F10A284@ORSMSX104.amr.corp.intel.com> Thank you, Vladimir. -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, December 18, 2015 11:16 AM To: Civlin, Jan ; hotspot compiler Subject: Re: RFR: 8145717: Use AVX3 instructions for Arrays.equals() intrinsic Looks good. I will sponsor it. Thanks, Vladimir On 12/17/15 8:03 PM, Civlin, Jan wrote: > We would like to contribute AVX3 patch for Arrays.equals() intrinsic. 
> > This utilizes 512 bits registers on AVX3 architecture and delivers performance gain (speed-up): > - on a random (any size) about 10% speed-up; > - on a long (equal or not - randomly) about 25%; > - on a long equal (it is the longest processing, since all the bytes matter) almost 40% - test time dropped from 12 sec to 7.5 sec. > > > Contributor: Jan Civlin. > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8145717 > Webrev: http://cr.openjdk.java.net/~kvn/8145717/webrev/ > From aph at redhat.com Fri Dec 18 20:23:07 2015 From: aph at redhat.com (Andrew Haley) Date: Fri, 18 Dec 2015 20:23:07 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <567405F0.6060707@sap.com> References: <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41EE3661@DEWDFEMB12A.global.corp.sap> <567405F0.6060707@sap.com> Message-ID: <56746B2B.4040602@redhat.com> Hi, On 18/12/15 13:11, Axel Siebenborn wrote: > the concern raised in this mail thread is, that an object will be known > known to the GC, though it could be seen, as not fully initialized. I > share the opinion, that this might be the case. > However, I don't see, how it can be a problem with the concurrent > threads of G1 or CMS. > > The G1 collector uses its snapshot at the beginning and just scans > objects, that where reachable at the initial marking. Newly allocated > objects are considered as life. Right, so say the safepoint in question is the SATB. Then the newly-created objects get scanned, and we have the problem as described. Right? Or perhaps I'm missing something. Andrew.
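For the AVX3 Arrays.equals() patch (8145717) in this thread, the speed-up comes from comparing 64 bytes (one 512-bit register) per step. The Java sketch below is a hypothetical scalar model of that blocking strategy, not the intrinsic. It compares each 64-byte block as eight 8-byte words through `ByteBuffer.getLong(int)`, then finishes the remaining tail byte by byte. The class `BlockedEquals` and method `blockedEquals` are invented names.

```java
import java.nio.ByteBuffer;

// Scalar model of a 64-byte-stride comparison. The real intrinsic uses
// AVX-512 vector loads and compares instead of eight long compares.
public final class BlockedEquals {

    private static final int BLOCK = 64; // bytes per 512-bit vector

    static boolean blockedEquals(byte[] a, byte[] b) {
        if (a == b) return true;
        if (a == null || b == null || a.length != b.length) return false;
        ByteBuffer ba = ByteBuffer.wrap(a);
        ByteBuffer bb = ByteBuffer.wrap(b);
        int i = 0;
        // Main loop: one 64-byte block per iteration, 8 bytes at a time.
        for (; i + BLOCK <= a.length; i += BLOCK) {
            for (int j = i; j < i + BLOCK; j += Long.BYTES) {
                if (ba.getLong(j) != bb.getLong(j)) return false;
            }
        }
        // Tail: fewer than 64 bytes remain.
        for (; i < a.length; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] x = new byte[1000];
        byte[] y = new byte[1000];
        for (int k = 0; k < x.length; k++) { x[k] = (byte) k; y[k] = (byte) k; }
        System.out.println(blockedEquals(x, y)); // true
        y[999] ^= 1;                             // differ in the 40-byte tail
        System.out.println(blockedEquals(x, y)); // false
    }
}
```

For equal arrays, every block must be loaded and compared. This is consistent with the long-equal case being the slowest one, and with it showing the largest gain in the numbers reported for the patch.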
From christian.thalinger at oracle.com Sat Dec 19 01:01:51 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 18 Dec 2015 15:01:51 -1000 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: <20D7BD6F-0413-4C18-A6B7-3FAC85FBAF93@oracle.com> References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> <565DDFC1.7020006@redhat.com> <565ED4B9.3020003@oracle.com> <565ED70A.9060509@redhat.com> <565EDF64.7080504@oracle.com> <565EE277.7030907@redhat.com> <97A31572-DD8D-45E0-AAF5-E47B251CE633@oracle.com> <20D7BD6F-0413-4C18-A6B7-3FAC85FBAF93@oracle.com> Message-ID: Hmm. My build of open AArch64 crashes either with: # SIGSEGV (0xb) at pc=0x0000ffff9c08f694, pid=18650, tid=18651 # # JRE version: OpenJDK Runtime Environment (9.0) (build 9-internal+0-2015-12-18-163208.root.hs-comp-openjdk) # Java VM: OpenJDK 64-Bit Server VM (9-internal+0-2015-12-18-163208.root.hs-comp-openjdk, mixed mode, tiered, compressed oops, g1 gc, linux-aarch64) # Problematic frame: # j java.lang.invoke.MemberName$Factory.newMemberBuffer(I)[Ljava/lang/invoke/MemberName;+18 or # SIGSEGV (0xb) at pc=0x0000ffff8c31761c, pid=18763, tid=18764 # # JRE version: OpenJDK Runtime Environment (9.0) (build 9-internal+0-2015-12-18-163208.root.hs-comp-openjdk) # Java VM: OpenJDK 64-Bit Server VM (9-internal+0-2015-12-18-163208.root.hs-comp-openjdk, mixed mode, tiered, compressed oops, g1 gc, linux-aarch64) # Problematic frame: # v ~RuntimeStub::g1_post_barrier_slow Runtime1 stub Parallel GC works. Maybe I screwed up the merge. Could some please check? > On Dec 17, 2015, at 2:01 PM, Christian Thalinger wrote: > > Quick update on this one. I am trying to get the AArch64 Graal port building using your patch. I had to make smaller changes to your code and once I have it working I will push this to hs-comp. 
> > >> On Dec 2, 2015, at 8:49 AM, Christian Thalinger > wrote: >> >>> >>> On Dec 2, 2015, at 2:22 AM, Andrew Haley > wrote: >>> >>> On 02/12/15 12:09, Roland Schatz wrote: >>>> On 12/02/2015 12:33 PM, Andrew Haley wrote: >>>>> On 02/12/15 11:23, Roland Schatz wrote: >>>>>> Perhaps we should have a series of test analogous to the >>>>>> test/compiler/jvmci/errors tests, but for "working" instead of "broken" >>>>>> code installation. For that we would need a platform dependent "fake" >>>>>> compiler (e.g. handwritten assembly for well-known test methods). >>>>> Maybe. But if there is no way to actually exercise the code which >>>>> is in HotSpot, why is it there? >>>> It's an interface for compilers. You can exercise the code, you just >>>> have to write a compiler ;) >>> >>> Sure, but I don't see why we can't have a tiny compiler in the test >>> suite. >> >> Wow. This is getting crazy now :-) >> >> Anyway, let's push what we have now and wait for the AArch64 backend to be functional. Then we can fix the CodeInstaller methods. >> >>> >>> Andrew. > From vivek.r.deshpande at intel.com Sat Dec 19 03:05:22 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Sat, 19 Dec 2015 03:05:22 +0000 Subject: RFR (M): 8145688: Update for x86 pow in the math lib Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569DCDB0@ORSMSX106.amr.corp.intel.com> Hi all I would like to contribute a patch which optimizes Math.pow() for 64 and 32 bit X86 architecture using Intel LIBM implementation. This passes all the jtreg test in hotspot and PowTests.java in jdk/tests/java/lang/Math. Could you please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8145688 webrev: http://cr.openjdk.java.net/~mcberg/8145688/webrev.02/ Thanks and regards, Vivek
From nils.eliasson at oracle.com Sat Dec 19 20:12:30 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Sat, 19 Dec 2015 21:12:30 +0100 Subject: RFR(S): 8145566: PrintNMethods compile command broken since b89 In-Reply-To: <56745760.3060307@oracle.com> References: <5673F359.7080608@oracle.com> <56745760.3060307@oracle.com> Message-ID: <5675BA2E.6040306@oracle.com> Thanks for your reviews Tobias and Vladimir, Regards, Nils On 2015-12-18 19:58, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 12/18/15 3:51 AM, Nils Eliasson wrote: >> Hi, >> >> Please review this patch. Fixes an issue where TypedOptionMatchers >> freed their Symbols twice (already decremented in >> super class destructor). This affects CompileCommand option after a >> while when the Symbols are cleaned. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8145566 >> Webrev: http://cr.openjdk.java.net/~neliasso/8145566/ >> >> Regards, >> Nils From nils.eliasson at oracle.com Sun Dec 20 10:23:42 2015 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Sun, 20 Dec 2015 11:23:42 +0100 Subject: RFR(XS): 8145328: SEGV in DirectivesStack::getMatchingDirective Message-ID: <567681AE.3060603@oracle.com> Hi, Please review this fix of getMatchingDirective when running with JVMCI. This also fixes JDK-8145331 that will be closed as a dupe as soon as this fix is in. Testing: All tests that failed passes now. Bug: https://bugs.openjdk.java.net/browse/JDK-8145328 Webrev: http://cr.openjdk.java.net/~neliasso/8145328/webrev.01/ Regards, Nils Eliasson From christian.thalinger at oracle.com Mon Dec 21 06:34:47 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Sun, 20 Dec 2015 20:34:47 -1000 Subject: RFR(XS): 8145328: SEGV in DirectivesStack::getMatchingDirective In-Reply-To: <567681AE.3060603@oracle.com> References: <567681AE.3060603@oracle.com> Message-ID: <0A50AABD-3C63-42BB-9B90-3EFFDCEF09DA@oracle.com> This fix makes sense. Looks good.
> On Dec 20, 2015, at 12:23 AM, Nils Eliasson wrote: > > Hi, > > Please review this fix of getMatchingDirective when running with JVMCI. > > This also fixes JDK-8145331 that will be closed as a dupe as soon as this fix is in. > > Testing: > All tests that failed passes now. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8145328 > Webrev: http://cr.openjdk.java.net/~neliasso/8145328/webrev.01/ > > Regards, > Nils Eliasson From christian.thalinger at oracle.com Mon Dec 21 06:42:46 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Sun, 20 Dec 2015 20:42:46 -1000 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> <565DDFC1.7020006@redhat.com> <565ED4B9.3020003@oracle.com> <565ED70A.9060509@redhat.com> <565EDF64.7080504@oracle.com> <565EE277.7030907@redhat.com> <97A31572-DD8D-45E0-AAF5-E47B251CE633@oracle.com> <20D7BD6F-0413-4C18-A6B7-3FAC85FBAF93@oracle.com> Message-ID: Alright, here is what I would like to integrate: http://cr.openjdk.java.net/~twisti/8143072/webrev.01/ There are not many source changes but I wanted to send a webrev anyway. You will notice that the test changes are gone. That's because of the unfortunate situation with the open and closed AArch64 port. Before I haven't found a way to reuse the open AArch64 JVMCI changes with our closed port I can't enable the tests on AArch64 because we would see all of them fail. And I really want to avoid to copy the files verbatim into our closed port. I hope you are okay with that for a little while. > On Dec 18, 2015, at 3:01 PM, Christian Thalinger wrote: > > Hmm.
My build of open AArch64 crashes either with: > > # SIGSEGV (0xb) at pc=0x0000ffff9c08f694, pid=18650, tid=18651 > # > # JRE version: OpenJDK Runtime Environment (9.0) (build 9-internal+0-2015-12-18-163208.root.hs-comp-openjdk) > # Java VM: OpenJDK 64-Bit Server VM (9-internal+0-2015-12-18-163208.root.hs-comp-openjdk, mixed mode, tiered, compressed oops, g1 gc, linux-aarch64) > # Problematic frame: > # j java.lang.invoke.MemberName$Factory.newMemberBuffer(I)[Ljava/lang/invoke/MemberName;+18 > > or > > # SIGSEGV (0xb) at pc=0x0000ffff8c31761c, pid=18763, tid=18764 > # > # JRE version: OpenJDK Runtime Environment (9.0) (build 9-internal+0-2015-12-18-163208.root.hs-comp-openjdk) > # Java VM: OpenJDK 64-Bit Server VM (9-internal+0-2015-12-18-163208.root.hs-comp-openjdk, mixed mode, tiered, compressed oops, g1 gc, linux-aarch64) > # Problematic frame: > # v ~RuntimeStub::g1_post_barrier_slow Runtime1 stub > > Parallel GC works. Maybe I screwed up the merge. Could some please check? > >> On Dec 17, 2015, at 2:01 PM, Christian Thalinger wrote: >> >> Quick update on this one. I am trying to get the AArch64 Graal port building using your patch. I had to make smaller changes to your code and once I have it working I will push this to hs-comp. >> >>> On Dec 2, 2015, at 8:49 AM, Christian Thalinger wrote: >>> >>>> >>>> On Dec 2, 2015, at 2:22 AM, Andrew Haley wrote: >>>> >>>> On 02/12/15 12:09, Roland Schatz wrote: >>>>> On 12/02/2015 12:33 PM, Andrew Haley wrote: >>>>>> On 02/12/15 11:23, Roland Schatz wrote: >>>>>>> Perhaps we should have a series of test analogous to the >>>>>>> test/compiler/jvmci/errors tests, but for "working" instead of "broken" >>>>>>> code installation. For that we would need a platform dependent "fake" >>>>>>> compiler (e.g. handwritten assembly for well-known test methods). >>>>>> Maybe. But if there is no way to actually exercise the code which >>>>>> is in HotSpot, why is it there? >>>>> It's an interface for compilers. 
You can exercise the code, you just >>>>> have to write a compiler ;) >>>> >>>> Sure, but I don't see why we can't have a tiny compiler in the test >>>> suite. >>> >>> Wow. This is getting crazy now :-) >>> >>> Anyway, let's push what we have now and wait for the AArch64 backend to be functional. Then we can fix the CodeInstaller methods. >>> >>>> >>>> Andrew. >> > From vladimir.kozlov at oracle.com Mon Dec 21 07:32:26 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sun, 20 Dec 2015 23:32:26 -0800 Subject: RFR(XS): 8145328: SEGV in DirectivesStack::getMatchingDirective In-Reply-To: <567681AE.3060603@oracle.com> References: <567681AE.3060603@oracle.com> Message-ID: <5677AB0A.80403@oracle.com> Good. Thanks, Vladimir On 12/20/15 2:23 AM, Nils Eliasson wrote: > Hi, > > Please review this fix of getMatchingDirective when running with JVMCI. > > This also fixes JDK-8145331 that will be closed as a dupe as soon as this fix is in. > > Testing: > All tests that failed passes now. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8145328 > Webrev: http://cr.openjdk.java.net/~neliasso/8145328/webrev.01/ > > Regards, > Nils Eliasson From tobias.hartmann at oracle.com Mon Dec 21 09:04:34 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 21 Dec 2015 10:04:34 +0100 Subject: [9] RFR(S): 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI In-Reply-To: <567457D6.6000500@oracle.com> References: <56741DB2.20100@oracle.com> <567457D6.6000500@oracle.com> Message-ID: <5677C0A2.9050707@oracle.com> Thanks, Vladimir! Best, Tobias On 18.12.2015 20:00, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 12/18/15 6:52 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch.
>> >> https://bugs.openjdk.java.net/browse/JDK-8145754 >> http://cr.openjdk.java.net/~thartmann/8145754/webrev.00/ >> >> PhaseIdealLoop::is_scaled_iv_plus_offset() returns false if the expression is an AddI node with the scaled iv as second input because is_scaled_iv() is only invoked for exp->in(1). We need to check exp->in(2) as well like we do for the SubI node (and also in SWPointer::scaled_iv_plus_offset()). >> >> Background: >> I caught this bug with my prototype fix for JDK-6675699 while investigating a regression with the SPECjvm2008 mpeg benchmark. It turned out that with my fix I was hitting the LoopUnrollLimit for loops in some hot methods and therefore the loops were not completely unrolled/removed. Compared to the baseline version, some RC predicates were not emitted and therefore range checks in the loop body were not eliminated. As a result, the loop body node count was too high for unrolling. The RC predicates were not added because IdealLoopTree::is_range_check_if() uses is_scaled_iv_plus_offset() which fails. I think the input nodes of the AddI node are swapped with my fix because the node indices are slightly different and commute() (see addnode.cpp) sorts inputs according to their index: >> // Otherwise, sort inputs (commutativity) to help value numbering. >> if( in1->_idx > in2->_idx ) { >> add->swap_edges(1, 2); >> return true; >> } >> >> I verified that this also happens without my fix for JDK-6675699 by adding >> assert(!is_scaled_iv(exp->in(2), iv, p_scale), "Found input at position 2!"); >> and running JPRT. As expected, we hit the assert. 
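The underlying issue is that AddI is commutative and `commute()` may swap its inputs by node index, so a matcher that only inspects `in(1)` misses the mirrored form. The toy model below is a sketch of that matching rule only: `ScaledIvMatcher`, `Node` and `Op` are hypothetical names, it recognizes just `iv` and `iv*con` (no shifts), and it does not model how C2 extracts or validates the offset.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy IR, not C2's: recognizes "iv*scale + offset" in either operand order. */
final class ScaledIvMatcher {
    enum Op { IV, CON, MUL, ADD, SUB }

    static final class Node {
        final Op op;
        final long value;          // only meaningful for CON
        final List<Node> inputs;

        Node(Op op, long value, Node... inputs) {
            this.op = op;
            this.value = value;
            this.inputs = Arrays.asList(inputs);
        }

        Node in(int i) { return inputs.get(i - 1); } // 1-based, like C2's in(1)/in(2)
    }

    static Node iv() { return new Node(Op.IV, 0); }
    static Node con(long v) { return new Node(Op.CON, v); }
    static Node node(Op op, Node a, Node b) { return new Node(op, 0, a, b); }

    /** iv itself, or iv multiplied by a constant in either order. */
    static boolean isScaledIv(Node n) {
        if (n.op == Op.IV) return true;
        if (n.op == Op.MUL) {
            return (n.in(1).op == Op.IV && n.in(2).op == Op.CON)
                || (n.in(2).op == Op.IV && n.in(1).op == Op.CON);
        }
        return false;
    }

    static boolean isScaledIvPlusOffset(Node exp) {
        if (isScaledIv(exp)) return true;                 // offset == 0
        if (exp.op == Op.ADD || exp.op == Op.SUB) {
            // The inputs may have been reordered (e.g. sorted by node index),
            // so the scaled iv can sit in either position: check both.
            return isScaledIv(exp.in(1)) || isScaledIv(exp.in(2));
        }
        return false;
    }
}
```

Both `iv*4 + 8` and `8 + iv*4` match here; a version that only checked `in(1)` would reject the second, which is the shape described above after the inputs were swapped.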
>> >> Thanks, >> Tobias >> From tobias.hartmann at oracle.com Mon Dec 21 09:04:41 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 21 Dec 2015 10:04:41 +0100 Subject: [9] RFR(S): 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true In-Reply-To: <56745834.50506@oracle.com> References: <56742525.7040600@oracle.com> <56745834.50506@oracle.com> Message-ID: <5677C0A9.8060706@oracle.com> Thanks, Vladimir! Best, Tobias On 18.12.2015 20:02, Vladimir Kozlov wrote: > Fix is good. > > Thanks, > Vladimir > > On 12/18/15 7:24 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8144487 >> http://cr.openjdk.java.net/~thartmann/8144487/webrev.00/ >> >> The fix for JDK-7107042 introduced a 'skip_loop_opts' flag for PhaseIdealLoop::build_and_optimize() to not execute loop optimizations before EA. We need to restore the major_progress flag before calling igvn.optimize() because other code depends on the fact that we don't execute more loop optimizations if major_progress() is not set (for example, in ConvI2LNode::Ideal). 
>> >> Thanks, >> Tobias >> From axel.siebenborn at sap.com Mon Dec 21 09:10:08 2015 From: axel.siebenborn at sap.com (Axel Siebenborn) Date: Mon, 21 Dec 2015 10:10:08 +0100 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <56746B2B.4040602@redhat.com> References: <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41EE3661@DEWDFEMB12A.global.corp.sap> <567405F0.6060707@sap.com> <56746B2B.4040602@redhat.com> Message-ID: <5677C1F0.7040007@sap.com> Hi, On 18.12.2015 21:23, Andrew Haley wrote: > Hi, > > On 18/12/15 13:11, Axel Siebenborn wrote: > >> the concern raised in this mail thread is, that an object will be known >> known to the GC, though it could be seen, as not fully initialized. I >> share the opinion, that this might be the case. >> However, I don't see, how it can be a problem with the concurrent >> threads of G1 or CMS. >> >> The G1 collector uses its snapshot at the beginning and just scans >> objects, that where reachable at the initial marking. Newly allocated >> objects are considered as life. > Right, so say the safepoint in question is the SATB. Then the newly- > created objects get scanned, and we have the problem as described. > Right? Or perhaps I'm missing something. > > Andrew. Only objects, that where live at the SATB safepoint get scanned. Newly-created objects will not be scanned. 
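Axel's point can be pictured with a deliberately simplified single-region model. The region records a top-at-mark-start boundary when marking begins (the `_next_top_at_mark_start` mark he describes later in the thread); anything allocated afterwards lands at or above it and counts as live without being scanned, while marking only traverses objects below it. Everything here (`TamsRegionSketch`, integer addresses, a map of references) is hypothetical and omits G1's mark bitmaps, barriers and evacuation; it illustrates the invariant, not G1's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model of SATB marking in one region with a top-at-mark-start (TAMS) boundary. */
final class TamsRegionSketch {
    private final int tams;                                            // fixed when marking starts
    private int top;                                                   // bump-pointer allocation top
    private final Map<Integer, List<Integer>> refs = new HashMap<>();  // object -> referenced objects
    private final Set<Integer> marked = new HashSet<>();
    private final List<Integer> scanned = new ArrayList<>();

    TamsRegionSketch(int tams) {
        this.tams = tams;
        this.top = tams;          // everything allocated from now on is at or above TAMS
    }

    /** Registers an object that already existed when marking started. */
    void addOld(int addr, List<Integer> fields) {
        if (addr >= tams) throw new IllegalArgumentException("pre-existing objects lie below TAMS");
        refs.put(addr, fields);
    }

    /** Allocation during marking always lands at or above TAMS. */
    int allocate(List<Integer> fields) {
        int addr = top++;
        refs.put(addr, fields);
        return addr;
    }

    /** At or above TAMS: implicitly live. Below TAMS: live only if marked. */
    boolean isLive(int addr) {
        return addr >= tams || marked.contains(addr);
    }

    /** Traces from the snapshot roots, never pushing or scanning anything at or above TAMS. */
    void mark(List<Integer> snapshotRoots) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int root : snapshotRoots) {
            if (root < tams && marked.add(root)) stack.push(root);
        }
        while (!stack.isEmpty()) {
            int obj = stack.pop();
            scanned.add(obj);     // the only place an object's fields are read
            for (int ref : refs.getOrDefault(obj, Collections.emptyList())) {
                if (ref < tams && marked.add(ref)) stack.push(ref);
            }
        }
    }

    List<Integer> scanned() {
        return Collections.unmodifiableList(scanned);
    }
}
```

In this model the marking loop never reads the fields of an object allocated after marking started; whether real G1 or CMS guarantee the same in every case is the open question in the surrounding discussion.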
Axel From aph at redhat.com Mon Dec 21 09:14:01 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 21 Dec 2015 09:14:01 +0000 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> <565DDFC1.7020006@redhat.com> <565ED4B9.3020003@oracle.com> <565ED70A.9060509@redhat.com> <565EDF64.7080504@oracle.com> <565EE277.7030907@redhat.com> <97A31572-DD8D-45E0-AAF5-E47B251CE633@oracle.com> <20D7BD6F-0413-4C18-A6B7-3FAC85FBAF93@oracle.com> Message-ID: <5677C2D9.5090702@redhat.com> On 21/12/15 06:42, Christian Thalinger wrote: > Alright, here is what I would like to integrate: > > http://cr.openjdk.java.net/~twisti/8143072/webrev.01/ > > There are not many source changes but I wanted to send a webrev > anyway. > > You will notice that the test changes are gone. That's because of > the unfortunate situation with the open and closed AArch64 port. > Before I haven't found a way to reuse the open AArch64 JVMCI changes > with our closed port I can't enable the tests on AArch64 because we > would see all of them fail. And I really want to avoid to copy the > files verbatim into our closed port. Needs must, I guess. It's a silly situation, but we are where we are. :-) I kinda assumed that the proprietary AArch64 port had a different name internally, ARM64 or something. We are going to have to fix this somehow. Andrew.
From aph at redhat.com Mon Dec 21 09:22:23 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 21 Dec 2015 09:22:23 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <5677C1F0.7040007@sap.com> References: <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41EE3661@DEWDFEMB12A.global.corp.sap> <567405F0.6060707@sap.com> <56746B2B.4040602@redhat.com> <5677C1F0.7040007@sap.com> Message-ID: <5677C4CF.6060208@redhat.com> On 21/12/15 09:10, Axel Siebenborn wrote: > > On 18.12.2015 21:23, Andrew Haley wrote: >> >> On 18/12/15 13:11, Axel Siebenborn wrote: >> >>> the concern raised in this mail thread is, that an object will be known >>> known to the GC, though it could be seen, as not fully initialized. I >>> share the opinion, that this might be the case. >>> However, I don't see, how it can be a problem with the concurrent >>> threads of G1 or CMS. >>> >>> The G1 collector uses its snapshot at the beginning and just scans >>> objects, that where reachable at the initial marking. Newly allocated >>> objects are considered as life. >> >> Right, so say the safepoint in question is the SATB. Then the newly- >> created objects get scanned, and we have the problem as described. >> Right? Or perhaps I'm missing something. > > Only objects, that where live at the SATB safepoint get scanned. > Newly-created objects will not be scanned. So why are these newly-created objects not prematurely collected? How does the GC know? Is it that the immediately reachable objects get scanned and their references pushed immediately but SATB does not follow those references? 
If there is a mechanism to prevent the problem as described, can you tell us how that mechanism works? Andrew. From axel.siebenborn at sap.com Mon Dec 21 10:09:24 2015 From: axel.siebenborn at sap.com (Axel Siebenborn) Date: Mon, 21 Dec 2015 11:09:24 +0100 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <5677C4CF.6060208@redhat.com> References: <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41EE3661@DEWDFEMB12A.global.corp.sap> <567405F0.6060707@sap.com> <56746B2B.4040602@redhat.com> <5677C1F0.7040007@sap.com> <5677C4CF.6060208@redhat.com> Message-ID: <5677CFD4.6040709@sap.com> On 21.12.2015 10:22, Andrew Haley wrote: > On 21/12/15 09:10, Axel Siebenborn wrote: >> On 18.12.2015 21:23, Andrew Haley wrote: >>> On 18/12/15 13:11, Axel Siebenborn wrote: >>> >>>> the concern raised in this mail thread is, that an object will be known >>>> known to the GC, though it could be seen, as not fully initialized. I >>>> share the opinion, that this might be the case. >>>> However, I don't see, how it can be a problem with the concurrent >>>> threads of G1 or CMS. >>>> >>>> The G1 collector uses its snapshot at the beginning and just scans >>>> objects, that where reachable at the initial marking. Newly allocated >>>> objects are considered as life. >>> Right, so say the safepoint in question is the SATB. Then the newly- >>> created objects get scanned, and we have the problem as described. >>> Right? Or perhaps I'm missing something. >> Only objects, that where live at the SATB safepoint get scanned. >> Newly-created objects will not be scanned. 
> So why are these newly-created objects not prematurely collected? The newly-created objects are not part of the SATB. > > How does the GC know? Is it that the immediately reachable objects > get scanned and their references pushed immediately but SATB does not > follow those references? The GC maintains marks (_next_top_at_mark_start) for each region to indicate if objects have to be scanned. Concurrent marking threads follow all references, that are below that marks. Objects above this mark don't get scanned, and are not pushed onto a marking stack. The newly-allocated object are considered live. They will be copied if the region is to be evacuated, no matter if they are reachable or not. > > If there is a mechanism to prevent the problem as described, can you > tell us how that mechanism works? > > Andrew. > I hope, the comments above make it clearer. Axel From aph at redhat.com Mon Dec 21 10:23:40 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 21 Dec 2015 10:23:40 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <5677CFD4.6040709@sap.com> References: <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41EE3661@DEWDFEMB12A.global.corp.sap> <567405F0.6060707@sap.com> <56746B2B.4040602@redhat.com> <5677C1F0.7040007@sap.com> <5677C4CF.6060208@redhat.com> <5677CFD4.6040709@sap.com> Message-ID: <5677D32C.2030400@redhat.com> On 21/12/15 10:09, Axel Siebenborn wrote: >> > How does the GC know? Is it that the immediately reachable >> > objects get scanned and their references pushed immediately but >> > SATB does not follow those references? 
> > The GC maintains marks (_next_top_at_mark_start) for each region to > indicate if objects have to be scanned. Concurrent marking threads > follow all references, that are below that marks. Objects above this > mark don't get scanned, and are not pushed onto a marking stack. The > newly-allocated object are considered live. They will be copied if the > region is to be evacuated, no matter if they are reachable or not. Okay, I think I'm getting there. So, in this scenario, the older object is below the mark, so it must be scanned. Ths older object contains a reference to a newly-allocated object which is above the mark. This newly-allocated object is not yet observable by the GC thread because there has been no memory fence. So, the newly-allocated object must not be scanned by the GC thread. There is no need to scan the new object for references to other objects because those objects must have been reachable from a root at the time of the SATB or are newly-created in which case they won't be collected. If that's right it seems reasonable, but I confess it makes me very nervous to base our code generation on some particular properties of a concurrent GC. Andrew. From vladimir.kozlov at oracle.com Mon Dec 21 20:43:23 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 21 Dec 2015 12:43:23 -0800 Subject: RFR (M): 8145688: Update for x86 pow in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A569DCDB0@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A569DCDB0@ORSMSX106.amr.corp.intel.com> Message-ID: <5678646B.8000307@oracle.com> Hi Vivek. Also always use {}: + if (VM_Version::supports_sse3()) StubRoutines::_dlog = generate_libmLog(); + StubRoutines::_dpow = generate_libmPow(); I see fast_log() is using movddup() sse3 instruction. Can you replace it with sse2 equivalent and have conditional code generation depending on sse3 presence instead of limiting it in stubGenerator? 
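One way to read Vladimir's request: choose the instruction sequence from the detected CPU features when the stub is generated, instead of skipping the stub entirely on machines without SSE3. The sketch below emits mnemonic strings rather than machine code; `DupLowDoubleEmitter` and its `supportsSse3` flag are hypothetical stand-ins for the MacroAssembler and `VM_Version::supports_sse3()`. For the register form of `movddup` (copy the low double into both lanes), one SSE2 substitute is `pshufd dst, src, 0x44`, and a small model of PSHUFD's lane selection is included to show why that immediate works. `movapd` followed by `unpcklpd dst, dst` is another SSE2 option that stays in the floating-point domain.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical emitter sketch: SSE3 instruction when available, SSE2 fallback otherwise. */
final class DupLowDoubleEmitter {
    private final boolean supportsSse3;           // stand-in for VM_Version::supports_sse3()
    private final List<String> code = new ArrayList<>();

    DupLowDoubleEmitter(boolean supportsSse3) {
        this.supportsSse3 = supportsSse3;
    }

    /** Emits dst = {src.lo, src.lo} for two packed doubles. */
    void dupLowDouble(String dst, String src) {
        if (supportsSse3) {
            code.add("movddup " + dst + ", " + src);
        } else {
            // imm 0x44 selects dwords {0, 1, 0, 1}: the low qword copied into both halves.
            code.add("pshufd " + dst + ", " + src + ", 0x44");
        }
    }

    List<String> code() {
        return code;
    }

    /** Reference model of PSHUFD on four 32-bit lanes (lane 0 = least significant). */
    static int[] pshufd(int[] src, int imm) {
        int[] dst = new int[4];
        for (int i = 0; i < 4; i++) {
            dst[i] = src[(imm >>> (2 * i)) & 3];  // two selector bits per destination lane
        }
        return dst;
    }
}
```

With this shape a single stub can be generated on any SSE2-capable x86 CPU, while SSE3 machines still use `movddup`.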
And please add comments to all #endif in macroAssembler_x86_libm.cpp. I forgot to ask that in previous changes. When #ifdef block is big we put comment to see what scope is that: #endif // _LP64 #endif // !_LP64 Thanks, Vladimir On 12/18/15 7:05 PM, Deshpande, Vivek R wrote: > Hi all > > I would like to contribute a patch which optimizes Math.pow() for 64 and 32 bit X86 architecture using Intel LIBM > implementation. > > This passes all the jtreg test in hotspot and PowTests.java in jdk/tests/java/lang/Math. > > Could you please review and sponsor this patch. > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8145688 > webrev: > > http://cr.openjdk.java.net/~mcberg/8145688/webrev.02/ > > Thanks and regards, > > Vivek > From christian.thalinger at oracle.com Tue Dec 22 00:49:16 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 21 Dec 2015 14:49:16 -1000 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: <5677C2D9.5090702@redhat.com> References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> <565DDFC1.7020006@redhat.com> <565ED4B9.3020003@oracle.com> <565ED70A.9060509@redhat.com> <565EDF64.7080504@oracle.com> <565EE277.7030907@redhat.com> <97A31572-DD8D-45E0-AAF5-E47B251CE633@oracle.com> <20D7BD6F-0413-4C18-A6B7-3FAC85FBAF93@oracle.com> <5677C2D9.5090702@redhat.com> Message-ID: > On Dec 20, 2015, at 11:14 PM, Andrew Haley wrote: > > On 21/12/15 06:42, Christian Thalinger wrote: >> Alright, here is what I would like to integrate: >> >> http://cr.openjdk.java.net/~twisti/8143072/webrev.01/ >> >> There are not many source changes but I wanted to send a webrev >> anyway. >> >> You will notice that the test changes are gone. That's because of >> the unfortunate situation with the open and closed AArch64 port.
>> Before I haven't found a way to reuse the open AArch64 JVMCI changes >> with our closed port I can't enable the tests on AArch64 because we >> would see all of them fail. And I really want to avoid to copy the >> files verbatim into our closed port. > > Needs must, I guess. It's a silly situation, but we are where we are. > :-) > > I kinda assumed that the proprietary AArch64 port had a different name > internally, ARM64 or something. We are going to have to fix this > somehow. It does have a different name in the build system but not in jtreg. Anyway, I think I found a solution we can use for now. I sorted out the build issues and since all existing tests (except the ones from 8144704 which do not run on AArch64, yet) only need the Java part of JVMCI we are good. Also, I noticed you pulled the CodeInstaller changes in your latest webrev. Here is my webrev: http://cr.openjdk.java.net/~twisti/8143072/webrev.02/ While doing all these changes I realized it would be better to move the _features/_cpuFeatures fields up into the parent class Abstract_VM_Version: http://cr.openjdk.java.net/~twisti/8143072/webrev.02/src/share/vm/runtime/vm_version.hpp.udiff.html and use that on all platforms. Internally we discussed if the name should contain "cpu" and we ended up deciding to not do it but instead file an Enhancement to move the CPU features logic into a separate class: https://bugs.openjdk.java.net/browse/JDK-8145956 A JPRT job is running right now to verify I didn't break anything. > > Andrew. > > From doug.simon at oracle.com Tue Dec 22 14:50:42 2015 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 22 Dec 2015 15:50:42 +0100 Subject: RFR: 8146001: Remove support for command line options from JVMCI Message-ID: The effort of maintaining JVMCI across different JDK versions (including a potential backport to JDK7) is reduced by making JVMCI as small as possible.
The support for command line options in JVMCI (based around the @Option annotation) is a good candidate for removal: 1. It's almost entirely implemented on top of system properties and so can be made to work without VM support. 2. JVMCI itself only currently uses 3 options which can be replaced with usage of sun.misc.VM.getSavedProperty(). The latter ensures application code can't override JVMCI properties set on the command line. This change removes the JVMCI command line option support. https://bugs.openjdk.java.net/browse/JDK-8146001 http://cr.openjdk.java.net/~dnsimon/8146001/ -Doug From vivek.r.deshpande at intel.com Wed Dec 23 00:55:53 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Wed, 23 Dec 2015 00:55:53 +0000 Subject: RFR (M): 8145688: Update for x86 pow in the math lib In-Reply-To: <5678646B.8000307@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A569DCDB0@ORSMSX106.amr.corp.intel.com> <5678646B.8000307@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569DFEAD@ORSMSX106.amr.corp.intel.com> Hi Vladimir Please find the updated webrev for pow at this location for your review. http://cr.openjdk.java.net/~vdeshpande/libm_pow/8145688/webrev.00/ Bug ID: https://bugs.openjdk.java.net/browse/JDK-8145688 Thank you. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, December 21, 2015 12:43 PM To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net Cc: Viswanathan, Sandhya Subject: Re: RFR (M): 8145688: Update for x86 pow in the math lib Hi Vivek. Also always use {}: + if (VM_Version::supports_sse3()) StubRoutines::_dlog = generate_libmLog(); + StubRoutines::_dpow = generate_libmPow(); I see fast_log() is using movddup() sse3 instruction. Can you replace it with sse2 equivalent and have conditional code generation depending on sse3 presence instead of limiting it in stubGenerator? And please add comments to all #endif in macroAssembler_x86_libm.cpp.
I forgot to ask that in previous changes. When #ifdef block is big we put comment to see what scope is that: #endif // _LP64 #endif // !_LP64 Thanks, Vladimir On 12/18/15 7:05 PM, Deshpande, Vivek R wrote: > Hi all > > I would like to contribute a patch which optimizes Math.pow() for 64 and 32 bit X86 architecture using Intel LIBM > implementation. > > This passes all the jtreg test in hotspot and PowTests.java in jdk/tests/java/lang/Math. > > Could you please review and sponsor this patch. > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8145688 > webrev: > > http://cr.openjdk.java.net/~mcberg/8145688/webrev.02/ > > Thanks and regards, > > Vivek > From vivek.r.deshpande at intel.com Wed Dec 23 01:41:54 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Wed, 23 Dec 2015 01:41:54 +0000 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <566234C6.8010806@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> <5660B40D.4050800@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> <566234C6.8010806@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569DFF26@ORSMSX106.amr.corp.intel.com> HI All I have uploaded the patch for sin and cos tests with input and allowed outputs at this location for your review. 
http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/jdk/webrev.00/ Bug ID: https://bugs.openjdk.java.net/browse/JDK-8143353 Thank you. Regards, Vivek -----Original Message----- From: Joseph D. Darcy [mailto:joe.darcy at oracle.com] Sent: Friday, December 04, 2015 4:50 PM To: Deshpande, Vivek R; Vladimir Kozlov Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib Hi Vivek, On 12/3/2015 2:01 PM, Deshpande, Vivek R wrote: > Hi > > Sure I will add the tests. Shall I use StrictMath result as a reference for exact result. > Let me know your thoughts. As a rough test of another sin/cos implementation, StrictMath.{sin, cos} can be used as a reference with the following caveat: there isn't an indication of which way the error is in a StrictMath result. Let me give an example, if StrictMath.sin(x) => y then one of the following should be true Math.sin(x) => y Math.sin(x) => Math.nextUp(y) Math.sin(x) => Math.nextDown(y) That is, Math.sin(x) should either be the same as StrictMath.sin(x) OR equal to one of the floating-point numbers adjacent to that result. Of these three options, only two are allowed by the accuracy requirements of the StrictMath.sin specification. However, since StrictMath.sin doesn't give an indication of which way its error went (if it rounded up or down), there is no indication without additional work which of nextUp(y) and nextDown(y) is allowable (assuming StrictMath.sin isn't buggy).
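Joe's acceptance rule can be written down as a small check. The following is a minimal sketch, not code from the webrev or from the JDK test suite; the class name SinUlpCheck and its methods are hypothetical. It only tests that a candidate result is StrictMath.sin(x) or one of its two neighbouring doubles. As Joe notes, that condition is necessary, but it cannot tell which of nextUp(y) and nextDown(y) is actually permitted.

```java
// Hypothetical helper illustrating the "StrictMath result or an adjacent double" rule.
public final class SinUlpCheck {

    /** True if candidate equals y, Math.nextUp(y) or Math.nextDown(y), where y = StrictMath.sin(x). */
    public static boolean withinOneUlpOfStrict(double x, double candidate) {
        double y = StrictMath.sin(x);
        return sameBits(candidate, y)
                || sameBits(candidate, Math.nextUp(y))
                || sameBits(candidate, Math.nextDown(y));
    }

    // Compare bit patterns so that -0.0 and +0.0 are distinguished and NaN matches NaN.
    private static boolean sameBits(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        // Includes values near the 2^-252 and 90112 boundaries mentioned later in the thread.
        double[] inputs = {
            0.0, -0.0, Math.scalb(1.0, -252), Math.nextDown(Math.scalb(1.0, -252)),
            0.5, Math.PI / 6, Math.nextDown(90112.0), 90112.0, 1.0e9
        };
        for (double x : inputs) {
            double m = Math.sin(x);
            System.out.printf("x=%s Math.sin=%s ok=%b%n", x, m, withinOneUlpOfStrict(x, m));
        }
    }
}
```

Comparing raw bits instead of using == keeps the check strict about the sign of zero, which matters for sin(-0.0).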
HTH, -Joe > > Regards, > Vivek > > -----Original Message----- > From: joe darcy [mailto:joe.darcy at oracle.com] > Sent: Thursday, December 03, 2015 1:29 PM > To: Vladimir Kozlov; Deshpande, Vivek R > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math > lib > > Hello, > > On 12/3/2015 1:25 PM, Vladimir Kozlov wrote: >> Vivek, >> >> I think Joe is asking you to write these tests as hotspot regression >> test in hotspot/test/compiler. > Exactly; if not generally applicable sin/cos tests that could be hosted in the jdk repo (alongside the regression and unit tests for java.lang.Math), then test of intrinsics in the HotSpot repo alongside other tests targeting intrinsics. > > Thanks, > > -Joe > >> Vladimir >> >> On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: >>> Hi Joe >>> >>> It would be great if you would please share the additional tests >>> with us. >>> >>> Regards, >>> Vivek >>> >>> -----Original Message----- >>> From: joe darcy [mailto:joe.darcy at oracle.com] >>> Sent: Thursday, December 03, 2015 1:17 PM >>> To: Vladimir Kozlov; Deshpande, Vivek R >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>> math lib >>> >>> I think it is unwise for this large of an implementation change to >>> be pushed with no tests targeting the specifics of the new implementation. >>> >>> The worst-case tests in the jdk repo are the mathematical worst >>> cases for floating-point approximations, in other words the cases >>> where the exact mathematical answer is closest to half-way between two >>> representable floating-point numbers. Passing such tests is >>> necessary but not sufficient condition for a new implementation. >>> >>> Cheers, >>> >>> -Joe >>> >>> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >>>> Okay, looks reasonable to me.
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>>>> Hi Vladimir >>>>> >>>>> This is the link for the updated webrev with latest hotspot source >>>>> as base for your review. >>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>>>> Thank you. >>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> -----Original Message----- >>>>> From: Deshpande, Vivek R >>>>> Sent: Wednesday, December 02, 2015 10:33 PM >>>>> To: 'Vladimir Kozlov'; joe darcy >>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> Hi Vladimir >>>>> >>>>> This is the link for the updated webrev for your review. >>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>>>> Thank you. >>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, December 01, 2015 6:06 PM >>>>> To: Deshpande, Vivek R; joe darcy >>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> Please send link to new webrev on cr server. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>>>> Hi Vladimir >>>>>> >>>>>> Please find the webrev with your suggested updates attached with >>>>>> the mail. >>>>>> We will update it in the jbs entry soon. >>>>>> Please let me know if it needs further changes. 
>>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> -----Original Message----- >>>>>> From: Deshpande, Vivek R >>>>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>>>> To: 'joe darcy'; Vladimir Kozlov >>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>> math lib >>>>>> >>>>>> HI Vladimir, Joe >>>>>> >>>>>> I have done the jtreg tests in hotspot and tests from jdk you >>>>>> have mentioned. It passed those tests. >>>>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>>>>> The performance gain is 3.2x over base jdk, that is over current >>>>>> fsin/fcos intrinsic. This gain is more realistic. >>>>>> >>>>>> Could I get those tests around the boundary values. Would >>>>>> WorstCaseTests.java jtreg test in jdk test those ? >>>>>> If yes, then it has passed those boundary cases. >>>>>> >>>>>> I would work on adding either diagnostic flag or just one flag >>>>>> for libm and send out the webrev soon. >>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>> Sent: Monday, November 23, 2015 6:28 PM >>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>> math lib >>>>>> >>>>>> Hello, >>>>>> >>>>>> Just getting added to the thread.. >>>>>> >>>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>>>> Thank you, for explanation, Vivek. >>>>>>> >>>>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >>>>>>> Hotspot tests. >>>>>>> >>>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>>>> Hi Vladimir >>>>>>>> >>>>>>>> The result we obtain with LIBM are within +/- 1ulp from >>>>>>>> StrictMath result and not exact result. 
So I added the flag to >>>>>>>> switch between FDLIBM and LIBM. >>>>>>>> >>>>>>>> Quick explanation: >>>>>>>> This is what we observed with comparison to HPA Library >>>>>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result >>>>>>>> should be = 0.19457293629570216 >>>>>>>> (4596178249117717084L) (StrictMath result) or >>>>>>>> 0.1945729362957022 >>>>>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA >>>>>>>> library result is between the above two values and Exact result >>>>>>>> would be pretty close to it. >>>>>>>> So here StrictMath result is less than quad-precision result, >>>>>>>> Math result should be StrictMath or StrictMath + 1ulp and not >>>>>>>> StrictMath - 1ulp, according to our test. >>>>>>> Note, java.lang.Math allows to have 1ulp off (in both direction, >>>>>>> I think) and it should be consistent for Interpreter and code >>>>>>> generated by JIT compilers: >>>>>>> >>>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28double%29 >>>>>>> >>>>>> That interpretation of the spec is not quite right. For the Math >>>>>> methods with a 1/2 ulp error bound, the floating-point result >>>>>> closest to the exact result must be returned. For the methods >>>>>> with a 1 ulp error bound, either of the floating-point result bracketing >>>>>> the true result can be returned, subject to the monotonicity >>>>>> constraints of the specification of the particular method. >>>>>> >>>>>>>> I have done the experiments with XX:+UnlockDiagnosticVMOptions >>>>>>>> -XX:DisableIntrinsic=_dsin and XX:+UnlockDiagnosticVMOptions >>>>>>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter >>>>>>>> would go through LIBM and C1 and c2 through FDLIBM.
>>>>>>>> If we want to disable LIBM completely, we need the flags >>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>> I was thinking about using existing >>>>>>> DirectiveSet::is_intrinsic_disabled() and >>>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add additional >>>>>>> versions of functions which accept intrinsic ID instead of >>>>>>> methodHandle. >>>>>>> >>>>>>> If you still want to use flags make them diagnostic. >>>>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>>>>> >>>>>>>> Also the performance gain ~4x is with >>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>>>> You confused me here. So you get 4x when only Interpreter use >>>>>>> LIBM code and compilers use FDLIB? >>>>>> Just to be clear, are you comparing the new code to FDLIBM >>>>>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)? >>>>>> >>>>>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>>>>> Port fdlibm to Java), which is providing a significant speed >>>>>> boost to the StrictMath methods that have been ported. >>>>>> >>>>>> I find the current patch *insufficient* as-is in terms of its >>>>>> testing. >>>>>> For example, part of patch says >>>>>> >>>>>> # For sin >>>>>> >>>>>> +// This means that the main path is actually only taken for >>>>>> +// 2^-252 <= |X| < 90112. >>>>>> >>>>>> # For cos >>>>>> >>>>>> +// This means that the main path is actually only taken for >>>>>> +// 2^-252 <= |X| < 90112. >>>>>> >>>>>> If nothing else, there are no tests at around those boundary >>>>>> values, which is unacceptable. There should also be some tests of >>>>>> values of interest to the algorithm in question. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> -Joe >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>>> Let me know your thoughts on this. I would answer more >>>>>>>> questions and give more data if needed. 
>>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>> the math lib >>>>>>>> >>>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>>>> What is the reason you decided to add new flags? exp() and >>>>>>>>> log() changes did not have flags. >>>>>>>>> >>>>>>>>> It would be interesting to see what happens if you disable >>>>>>>>> intrinsics using existing flag, for example: >>>>>>>>> >>>>>>>>> -XX:+UnlockDiagnosticVMOptions >>>>>>>>> -XX:DisableIntrinsic=_dexp >>>>>>>> Hi Vivek, >>>>>>>> >>>>>>>> I want to point that you can do this experiment later. We can >>>>>>>> file bugs and fixed them after FC. >>>>>>>> >>>>>>>> For now, please, answer my question about flags only. This is >>>>>>>> the only thing holding it from push. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>>>> Hi all >>>>>>>>>> >>>>>>>>>> I would like to contribute a patch which optimizes Math.sin() >>>>>>>>>> and >>>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>>>>>> implementation. >>>>>>>>>> >>>>>>>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>>>>>>> >>>>>>>>>> The option to use the optimizations are >>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>> >>>>>>>>>> Could you please review and sponsor this patch. 
>>>>>>>>>> >>>>>>>>>> Bug-id: >>>>>>>>>> >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>>>> webrev: >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>>>> >>>>>>>>>> Thanks and regards, >>>>>>>>>> >>>>>>>>>> Vivek >>>>>>>>>> From vladimir.kozlov at oracle.com Wed Dec 23 09:25:09 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Dec 2015 01:25:09 -0800 Subject: RFR: 143072: [JVMCI] Port JVMCI to AArch64 In-Reply-To: References: <564A1917.3020802@redhat.com> <564A39E6.60808@redhat.com> <2B0C94AB-C9F1-4D5E-A5B4-F3B000F58B06@oracle.com> <565C6888.2040803@redhat.com> <565C9196.4060702@redhat.com> <565D9839.50705@oracle.com> <565DB160.7000505@redhat.com> <565DDFC1.7020006@redhat.com> <565ED4B9.3020003@oracle.com> <565ED70A.9060509@redhat.com> <565EDF64.7080504@oracle.com> <565EE277.7030907@redhat.com> <97A31572-DD8D-45E0-AAF5-E47B251CE633@oracle.com> <20D7BD6F-0413-4C18-A6B7-3FAC85FBAF93@oracle.com> <5677C2D9.5090702@redhat.com> Message-ID: <567A6875.1080008@oracle.com> Looks good to me. Changes are done according to discussion. Thanks, Vladimir On 12/21/15 4:49 PM, Christian Thalinger wrote: > >> On Dec 20, 2015, at 11:14 PM, Andrew Haley wrote: >> >> On 21/12/15 06:42, Christian Thalinger wrote: >>> Alright, here is what I would like to integrate: >>> >>> http://cr.openjdk.java.net/~twisti/8143072/webrev.01/ >>> >>> There are not many source changes but I wanted to send a webrev >>> anyway. >>> >>> You will notice that the test changes are gone. That's because of >>> the unfortunate situation with the open and closed AArch64 port. >>> Before I haven't found a way to reuse the open AArch64 JVMCI changes >>> with our closed port I can't enable the tests on AArch64 because we >>> would see all of them fail. And I really want to avoid to copy the >>> files verbatim into our closed port. >> >> Needs must, I guess. It's a silly situation, but we are where we are.
>> :-) >> >> I kinda assumed that the proprietary AArch64 port had a different name >> internally, ARM64 or something. We are going to have to fix this >> somehow. > > It does have a different name in the build system but not in jtreg. > > Anyway, I think I found a solution we can use for now. I sorted out the build issues and since all existing tests (except the ones from 8144704 which do not run on AArch64, yet) only need the Java part of JVMCI we are good. Also, I noticed you pulled the CodeInstaller changes in your latest webrev. > > Here is my webrev: > > http://cr.openjdk.java.net/~twisti/8143072/webrev.02/ > > While doing all these changes I realized it would be better to move the _features/_cpuFeatures fields up into the parent class Abstract_VM_Version: > > http://cr.openjdk.java.net/~twisti/8143072/webrev.02/src/share/vm/runtime/vm_version.hpp.udiff.html > > and use that on all platforms. Internally we discussed if the name should contain "cpu" and we ended up deciding to not do it but instead file an Enhancement to move the CPU features logic into a separate class: > > https://bugs.openjdk.java.net/browse/JDK-8145956 > > A JPRT job is running right now to verify I didn't break anything. > >> >> Andrew. >> >> > From martin.doerr at sap.com Wed Dec 23 15:42:47 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 23 Dec 2015 15:42:47 +0000 Subject: RFR(M): 8145913: PPC64: add Montgomery multiply intrinsic Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672289E1A@DEWDFEMB19C.global.corp.sap> Hi, I've ported the Montgomery multiplication from x86. The webrev is here: http://cr.openjdk.java.net/~mdoerr/8145913_ppc_montgomery/webrev.00/ It only touches PPC64 files. It also contains some early feedback from Götz and some additional PPC64 cleanup. Please review. Best regards, Martin
From hui.shi at linaro.org Wed Dec 23 16:01:27 2015 From: hui.shi at linaro.org (Hui Shi) Date: Thu, 24 Dec 2015 00:01:27 +0800 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <5677D32C.2030400@redhat.com> References: <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41EE3661@DEWDFEMB12A.global.corp.sap> <567405F0.6060707@sap.com> <56746B2B.4040602@redhat.com> <5677C1F0.7040007@sap.com> <5677C4CF.6060208@redhat.com> <5677CFD4.6040709@sap.com> <5677D32C.2030400@redhat.com> Message-ID: Thanks all! Axel has explained how concurrent GC thread will not scan newly created object for both CMS and G1. So original optimziation skips generating storestore memory barrier for none escape allocation is safe. It might need more explanation here. Back to discussion about patch for this thread, Gotez's concern about newly allocated object referenced by object visible to concurrent GC thread is explained. I'm not sure if there is other correctness concerns about patch for this thread. Could I go forward and ask for sponsor help on JTPR test and push? Regards Hui On 21 December 2015 at 18:23, Andrew Haley wrote: > On 21/12/15 10:09, Axel Siebenborn wrote: > > >> > How does the GC know? Is it that the immediately reachable > >> > objects get scanned and their references pushed immediately but > >> > SATB does not follow those references? > > > > The GC maintains marks (_next_top_at_mark_start) for each region to > > indicate if objects have to be scanned. Concurrent marking threads > > follow all references, that are below that marks. Objects above this
Objects above this > > mark don't get scanned, and are not pushed onto a marking stack. The > > newly-allocated object are considered live. They will be copied if the > > region is to be evacuated, no matter if they are reachable or not. > > Okay, I think I'm getting there. > > So, in this scenario, the older object is below the mark, so it must > be scanned. Ths older object contains a reference to a > newly-allocated object which is above the mark. This newly-allocated > object is not yet observable by the GC thread because there has been > no memory fence. So, the newly-allocated object must not be scanned > by the GC thread. There is no need to scan the new object for > references to other objects because those objects must have been > reachable from a root at the time of the SATB or are newly-created in > which case they won't be collected. > > If that's right it seems reasonable, but I confess it makes me very > nervous to base our code generation on some particular properties of a > concurrent GC. > > Andrew. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aph at redhat.com Wed Dec 23 17:42:36 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 23 Dec 2015 17:42:36 +0000 Subject: RFR(M): 8145913: PPC64: add Montgomery multiply intrinsic In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672289E1A@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB4181165672289E1A@DEWDFEMB19C.global.corp.sap> Message-ID: <567ADD0C.6090807@redhat.com> Hi, On 23/12/15 15:42, Doerr, Martin wrote: > I've ported the Montgomery multiplication from x86. > > The webrev is here: > http://cr.openjdk.java.net/~mdoerr/8145913_ppc_montgomery/webrev.00/ > > It only touches PPC64 files. It also contains some early feedback from G?tz and some additional PPC64 cleanup. > > Please review. Looks good. This needs work: +// The threshold at which squaring is advantageous was determined +// experimentally on an i7-3930K (Ivy Bridge) CPU @ 3.5GHz. 
+#define MONTGOMERY_SQUARING_THRESHOLD 64 I'm sure it won't take long to find an appropriate threshold for the CPU you most care about. Hey, 64 might be best for you too, but at least you get to insert the name of a PowerPC in that comment. Andrew. From aph at redhat.com Wed Dec 23 17:55:44 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 23 Dec 2015 17:55:44 +0000 Subject: RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41EE3661@DEWDFEMB12A.global.corp.sap> <567405F0.6060707@sap.com> <56746B2B.4040602@redhat.com> <5677C1F0.7040007@sap.com> <5677C4CF.6060208@redhat.com> <5677CFD4.6040709@sap.com> <5677D32C.2030400@redhat.com> Message-ID: <567AE020.8050902@redhat.com> On 23/12/15 16:01, Hui Shi wrote: > Axel has explained how concurrent GC thread will not scan newly created > object for both CMS and G1. So original optimziation skips generating > storestore memory barrier for none escape allocation is safe. It might need > more explanation here. > > Back to discussion about patch for this thread, Gotez's concern about newly > allocated object referenced by object visible to concurrent GC thread is > explained. I'm not sure if there is other correctness concerns about patch > for this thread. Could I go forward and ask for sponsor help on JTPR test > and push? I think so. I'm a little nervous, but it is a performance improvement. We're going to need some appropriate comments. Andrew. 
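The Montgomery multiply intrinsic under review in 8145913 above computes a Montgomery product, a*b*R^-1 mod n, for an odd modulus n and a power of two R > n. With that choice, reduction needs only multiplications, masks and shifts rather than a division by n. The sketch below is a rough illustration only: it shows the textbook REDC step written with java.math.BigInteger. The class and method names are hypothetical, and the sketch says nothing about how the x86 or PPC64 stubs are actually structured.

```java
import java.math.BigInteger;

// Hypothetical textbook Montgomery arithmetic for an odd modulus n, with R = 2^k > n.
public final class MontgomerySketch {
    private final BigInteger n;      // odd modulus
    private final int k;             // R = 2^k
    private final BigInteger mask;   // R - 1, so "x mod R" is x.and(mask)
    private final BigInteger nPrime; // -n^-1 mod R

    public MontgomerySketch(BigInteger n) {
        if (n.signum() <= 0 || !n.testBit(0)) {
            throw new IllegalArgumentException("modulus must be positive and odd");
        }
        this.n = n;
        this.k = n.bitLength();
        BigInteger r = BigInteger.ONE.shiftLeft(k);
        this.mask = r.subtract(BigInteger.ONE);
        this.nPrime = n.modInverse(r).negate().mod(r);
    }

    /** Maps a into Montgomery form: a * R mod n. */
    public BigInteger toMont(BigInteger a) {
        return a.shiftLeft(k).mod(n);
    }

    /** Maps a Montgomery-form value back: aBar * R^-1 mod n. */
    public BigInteger fromMont(BigInteger aBar) {
        return redc(aBar);
    }

    /** Montgomery product of two values already in Montgomery form (each in [0, n)). */
    public BigInteger montMul(BigInteger aBar, BigInteger bBar) {
        return redc(aBar.multiply(bBar));
    }

    /** REDC: returns t * R^-1 mod n, assuming 0 <= t < n * R. */
    private BigInteger redc(BigInteger t) {
        BigInteger m = t.and(mask).multiply(nPrime).and(mask); // m = (t mod R) * n' mod R
        BigInteger u = t.add(m.multiply(n)).shiftRight(k);     // t + m*n is divisible by R
        return u.compareTo(n) >= 0 ? u.subtract(n) : u;        // u < 2n, so one subtraction suffices
    }
}
```

Squaring is simply montMul(aBar, aBar). A dedicated squaring routine can skip roughly half of the partial products because of symmetry, and the MONTGOMERY_SQUARING_THRESHOLD quoted above presumably selects the operand length at which that pays off.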
From igor.ignatyev at oracle.com Wed Dec 23 20:20:45 2015 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 23 Dec 2015 23:20:45 +0300 Subject: RFR(XS) : 8146129 : quarantine compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig.java Message-ID: <6F0B3CB2-E600-457E-9140-6F7F7334D146@oracle.com> http://cr.openjdk.java.net/~iignatyev/8146129/webrev.01/ > 1 line changed: 1 ins; 0 del; 0 mod; Hi all, Could you please review the patch which quarantines compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig.java? TestAESIntrinsicsOnSupportedConfig is a newly added test which timeouts, so it should be quarantined to reduce noise level. JBS: https://bugs.openjdk.java.net/browse/JDK-8146129 Thanks, -- Igor From christian.thalinger at oracle.com Wed Dec 23 21:00:56 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 23 Dec 2015 11:00:56 -1000 Subject: RFR (XS): 8146100: compiler/jvmci/code/SimpleCodeInstallationTest.java JUnit Failure: expected:<12> but was:<109710641> Message-ID: https://bugs.openjdk.java.net/browse/JDK-8146100 Windows uses different argument registers so we have to get them from the RegisterConfig. Tested with a JPRT job that ran all JVMCI tests.
diff -r 46122d93612d test/compiler/jvmci/code/amd64/AMD64TestAssembler.java --- a/test/compiler/jvmci/code/amd64/AMD64TestAssembler.java Mon Dec 21 22:17:23 2015 +0100 +++ b/test/compiler/jvmci/code/amd64/AMD64TestAssembler.java Tue Dec 22 23:32:01 2015 -0800 @@ -34,8 +34,10 @@ import jdk.vm.ci.code.DebugInfo; import jdk.vm.ci.code.InfopointReason; import jdk.vm.ci.code.Register; import jdk.vm.ci.code.StackSlot; +import jdk.vm.ci.code.CallingConvention.Type; import jdk.vm.ci.hotspot.HotSpotConstant; import jdk.vm.ci.meta.JavaConstant; +import jdk.vm.ci.meta.JavaKind; import jdk.vm.ci.meta.LIRKind; import jdk.vm.ci.meta.VMConstant; @@ -61,11 +63,11 @@ public class AMD64TestAssembler extends } public Register emitIntArg0() { - return AMD64.rsi; + return codeCache.getRegisterConfig().getCallingConventionRegisters(Type.JavaCall, JavaKind.Int)[0]; } public Register emitIntArg1() { - return AMD64.rdx; + return codeCache.getRegisterConfig().getCallingConventionRegisters(Type.JavaCall, JavaKind.Int)[1]; } private void emitREX(boolean w, int r, int x, int b) { diff -r 46122d93612d test/compiler/jvmci/code/sparc/SPARCTestAssembler.java --- a/test/compiler/jvmci/code/sparc/SPARCTestAssembler.java Mon Dec 21 22:17:23 2015 +0100 +++ b/test/compiler/jvmci/code/sparc/SPARCTestAssembler.java Tue Dec 22 23:32:01 2015 -0800 @@ -32,8 +32,10 @@ import jdk.vm.ci.code.DebugInfo; import jdk.vm.ci.code.InfopointReason; import jdk.vm.ci.code.Register; import jdk.vm.ci.code.StackSlot; +import jdk.vm.ci.code.CallingConvention.Type; import jdk.vm.ci.hotspot.HotSpotConstant; import jdk.vm.ci.meta.JavaConstant; +import jdk.vm.ci.meta.JavaKind; import jdk.vm.ci.meta.LIRKind; import jdk.vm.ci.meta.VMConstant; import jdk.vm.ci.sparc.SPARC; @@ -80,11 +82,11 @@ public class SPARCTestAssembler extends } public Register emitIntArg0() { - return SPARC.i0; + return codeCache.getRegisterConfig().getCallingConventionRegisters(Type.JavaCallee, JavaKind.Int)[0]; } public Register emitIntArg1() { - return 
SPARC.i1; + return codeCache.getRegisterConfig().getCallingConventionRegisters(Type.JavaCallee, JavaKind.Int)[1]; } public Register emitLoadInt(int c) { From vladimir.kozlov at oracle.com Wed Dec 23 21:01:04 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Dec 2015 13:01:04 -0800 Subject: [9] RFR (XS) 8146119: java/lang/Math/PowTests.java fails on solaris-x64 using -Xcomp Message-ID: <567B0B90.7000004@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8146119 http://cr.openjdk.java.net/~kvn/8146119/webrev/ New SunStudio C++ compiler generates incorrect code in library_call.cpp. All build versions are affected. It is also failed with -xO0 level so I removed any optimizations. Tested with failed test. Thanks, Vladimir From vladimir.kozlov at oracle.com Wed Dec 23 21:02:16 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Dec 2015 13:02:16 -0800 Subject: RFR (XS): 8146100: compiler/jvmci/code/SimpleCodeInstallationTest.java JUnit Failure: expected:<12> but was:<109710641> In-Reply-To: References: Message-ID: <567B0BD8.3010300@oracle.com> Looks good. Thanks, Vladimir On 12/23/15 1:00 PM, Christian Thalinger wrote: > https://bugs.openjdk.java.net/browse/JDK-8146100 > > Windows uses different argument registers so we have to get them from the RegisterConfig. Tested with a JPRT job that > ran all JVMCI tests.
> > diff -r 46122d93612d test/compiler/jvmci/code/amd64/AMD64TestAssembler.java > --- a/test/compiler/jvmci/code/amd64/AMD64TestAssembler.java Mon Dec 21 22:17:23 2015 +0100 > +++ b/test/compiler/jvmci/code/amd64/AMD64TestAssembler.java Tue Dec 22 23:32:01 2015 -0800 > @@ -34,8 +34,10 @@ import jdk.vm.ci.code.DebugInfo; > import jdk.vm.ci.code.InfopointReason; > import jdk.vm.ci.code.Register; > import jdk.vm.ci.code.StackSlot; > +import jdk.vm.ci.code.CallingConvention.Type; > import jdk.vm.ci.hotspot.HotSpotConstant; > import jdk.vm.ci.meta.JavaConstant; > +import jdk.vm.ci.meta.JavaKind; > import jdk.vm.ci.meta.LIRKind; > import jdk.vm.ci.meta.VMConstant; > > > @@ -61,11 +63,11 @@ public class AMD64TestAssembler extends > } > > > public Register emitIntArg0() { > - return AMD64.rsi; > + return codeCache.getRegisterConfig().getCallingConventionRegisters(Type.JavaCall, JavaKind.Int)[0]; > } > > > public Register emitIntArg1() { > - return AMD64.rdx; > + return codeCache.getRegisterConfig().getCallingConventionRegisters(Type.JavaCall, JavaKind.Int)[1]; > } > > > private void emitREX(boolean w, int r, int x, int b) { > diff -r 46122d93612d test/compiler/jvmci/code/sparc/SPARCTestAssembler.java > --- a/test/compiler/jvmci/code/sparc/SPARCTestAssembler.java Mon Dec 21 22:17:23 2015 +0100 > +++ b/test/compiler/jvmci/code/sparc/SPARCTestAssembler.java Tue Dec 22 23:32:01 2015 -0800 > @@ -32,8 +32,10 @@ import jdk.vm.ci.code.DebugInfo; > import jdk.vm.ci.code.InfopointReason; > import jdk.vm.ci.code.Register; > import jdk.vm.ci.code.StackSlot; > +import jdk.vm.ci.code.CallingConvention.Type; > import jdk.vm.ci.hotspot.HotSpotConstant; > import jdk.vm.ci.meta.JavaConstant; > +import jdk.vm.ci.meta.JavaKind; > import jdk.vm.ci.meta.LIRKind; > import jdk.vm.ci.meta.VMConstant; > import jdk.vm.ci.sparc.SPARC; > @@ -80,11 +82,11 @@ public class SPARCTestAssembler extends > } > > > public Register emitIntArg0() { > - return SPARC.i0; > + return
codeCache.getRegisterConfig().getCallingConventionRegisters(Type.JavaCallee, JavaKind.Int)[0]; > } > > > public Register emitIntArg1() { > - return SPARC.i1; > + return codeCache.getRegisterConfig().getCallingConventionRegisters(Type.JavaCallee, JavaKind.Int)[1]; > } > > > public Register emitLoadInt(int c) { > From christian.thalinger at oracle.com Wed Dec 23 21:02:23 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 23 Dec 2015 11:02:23 -1000 Subject: RFR(XS) : 8146129 : quarantine compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig.java In-Reply-To: <6F0B3CB2-E600-457E-9140-6F7F7334D146@oracle.com> References: <6F0B3CB2-E600-457E-9140-6F7F7334D146@oracle.com> Message-ID: <23A93A6B-7ED3-42DD-B305-3E664307A615@oracle.com> Yes, please! Looks good. > On Dec 23, 2015, at 10:20 AM, Igor Ignatyev wrote: > > http://cr.openjdk.java.net/~iignatyev/8146129/webrev.01/ >> 1 line changed: 1 ins; 0 del; 0 mod; > > Hi all, > > Could you please review the patch which quarantines compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig.java? > TestAESIntrinsicsOnSupportedConfig is a newly added test which timeouts, so it should be quarantined to reduce noise level. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8146129 > > Thanks, > -- Igor From christian.thalinger at oracle.com Wed Dec 23 21:05:30 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 23 Dec 2015 11:05:30 -1000 Subject: [9] RFR (XS) 8146119: java/lang/Math/PowTests.java fails on solaris-x64 using -Xcomp In-Reply-To: <567B0B90.7000004@oracle.com> References: <567B0B90.7000004@oracle.com> Message-ID: Unfortunate but looks good. > On Dec 23, 2015, at 11:01 AM, Vladimir Kozlov wrote: > > https://bugs.openjdk.java.net/browse/JDK-8146119 > > http://cr.openjdk.java.net/~kvn/8146119/webrev/ > > New SunStudio C++ compiler generates incorrect code in library_call.cpp. All build versions are affected. > It is also failed with -xO0 level so I removed any optimizations.
> > Tested with failed test. > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Wed Dec 23 21:06:30 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Dec 2015 13:06:30 -0800 Subject: [9] RFR (XS) 8146119: java/lang/Math/PowTests.java fails on solaris-x64 using -Xcomp In-Reply-To: References: <567B0B90.7000004@oracle.com> Message-ID: <567B0CD6.2070107@oracle.com> Thanks! On 12/23/15 1:05 PM, Christian Thalinger wrote: > Unfortunate but looks good. > >> On Dec 23, 2015, at 11:01 AM, Vladimir Kozlov wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8146119 >> >> http://cr.openjdk.java.net/~kvn/8146119/webrev/ >> >> New SunStudio C++ compiler generates incorrect code in library_call.cpp. All build versions are affected. >> It is also failed with -xO0 level so I removed any optimizations. >> >> Tested with failed test. >> >> Thanks, >> Vladimir > From rednaxelafx at gmail.com Wed Dec 23 23:07:17 2015 From: rednaxelafx at gmail.com (Krystal Mok) Date: Wed, 23 Dec 2015 15:07:17 -0800 Subject: IGVN worklist ordering Message-ID: Hi compiler team, I'd like to ask about the IGVN worklist ordering: 1. Is there a rule of thumb for which nodes should call record_for_igvn()? If so, in what order (e.g. data nodes vs. their control dependence)? Apparently, nodes that get a record_for_igvn() call right after their creation usually want to delay the call to transform(), perhaps due to potential optimizations to their control input, or because they're created after parsing and might affect other nodes. But what would be a good list of guidelines about exactly what kind of nodes, or what kind of patterns in the IR, that should be considered as candidates to put onto the IGVN worklist, and vice versa? 2.
Since it's problematic for IGVN to process a node whose control path is already dead (but not yet collapsed to TOP), why isn't there a mechanism built into IGVN's worklist or PhaseIterGVN::transform_old() so that dead control paths always collapse before IGVN decides to process a node? I've had a question on this topic for quite a while now, so I'm seeking advice from all of you who have had to deal with bugs in this area. There have been a lot of bugs related to the IGVN worklist ordering in the past, and most of the fixes do one of the following: a) add missing record_for_igvn() calls for problematic nodes b) switch the order of adjacent record_for_igvn() calls on related nodes Is it possible, or a good idea, to fix it down at the core of IGVN? Happy holidays, guys! Best regards, Kris From vitalyd at gmail.com Thu Dec 24 00:56:01 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 23 Dec 2015 19:56:01 -0500 Subject: Optimization question Message-ID: Hi guys, Consider code like this: static double mean(double[] array, double[] weights) { if (array.length != weights.length) throw ...; double sum = 0; double wsum = 0; for(int i = 0; i < array.length; i++) { sum += array[i] * weights[i]; wsum += weights[i]; } return sum / wsum; } static double mean(double[] array) { return mean(array, allOnes(array.length)); } static double[] allOnes(int n) { double[] d = new double[n]; Arrays.fill(d, 1); return d; } Now suppose I call mean(double[]) overload like this: double[] d = {1,2,3,4}; Using 8u51 with C2 compiler: 1) it looks like the array allocation from allOnes isn't eliminated. 2) moreover it looked like array was zeroed (rep stosd with rax holding zero). Unless I misread the asm, I thought an allocation followed by Arrays.fill skips the zeroing?
3) ideally, this case would reduce to code that just does a plain unweighted mean with no multiplication by the weight and no summation for the weighted sum (weight sum is just array length). Is this simply too much analysis to ask for? Thanks -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From rednaxelafx at gmail.com Thu Dec 24 01:33:06 2015 From: rednaxelafx at gmail.com (Krystal Mok) Date: Wed, 23 Dec 2015 17:33:06 -0800 Subject: Optimization question In-Reply-To: References: Message-ID: Hi Vitaly, For 2), there's this bug [1] which isn't really fixed, but rather, the "skip zeroing of an array fully covered by a Arrays.fill()" optimization was turned off for the moment. For 1), if all methods are inlined as expected, I believe it's because of HotSpot C2's order of optimizations: escape analysis / scalar replacement happens before loop optimizations. The latter can fully unroll your loop in mean(double[] array, double[] weights) if the input array is a constant like the one you tested, but it's too late -- the former cannot scalar replace an array if there are stores to it with unknown indices. See [2] for more details. - Kris [1]: https://bugs.openjdk.java.net/browse/JDK-7196857 [2]: ConnectionGraph::adjust_scalar_replaceable_state() in opto/escape.cpp // 3. An object is not scalar replaceable if it has a field with unknown // offset (array's element is accessed in loop). 
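For anyone who wants to reproduce the behavior discussed in this thread, here is a self-contained version of Vitaly's example. The elided `throw ...;` is filled in with an `IllegalArgumentException`. The class name and the warm-up loop in `main` are assumptions added for illustration. Inspecting the compiled code still requires HotSpot's diagnostic options, which are not shown here.

```java
import java.util.Arrays;

// Self-contained version of the example from the thread. The original elided the
// exception ("throw ...;"); IllegalArgumentException is an assumption.
class WeightedMean {
    static double mean(double[] array, double[] weights) {
        if (array.length != weights.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0;
        double wsum = 0;
        for (int i = 0; i < array.length; i++) {
            sum += array[i] * weights[i];
            wsum += weights[i];
        }
        return sum / wsum;
    }

    // Unweighted overload: delegates with an all-ones weight array. This is the
    // allocation the thread hopes C2 can eliminate.
    static double mean(double[] array) {
        return mean(array, allOnes(array.length));
    }

    static double[] allOnes(int n) {
        double[] d = new double[n];
        Arrays.fill(d, 1);
        return d;
    }

    public static void main(String[] args) {
        double[] d = {1, 2, 3, 4};
        double result = 0;
        // Call repeatedly so the code gets hot enough to be JIT-compiled; the
        // iteration count is an arbitrary assumption, not a tuned threshold.
        for (int i = 0; i < 1_000_000; i++) {
            result = mean(d);
        }
        System.out.println(result);
    }
}
```

With the input `{1,2,3,4}`, the program prints 2.5, the same value that Vladimir's compiled output later loads as a constant.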
On Wed, Dec 23, 2015 at 4:56 PM, Vitaly Davidovich wrote: > Hi guys, > > Consider code like this: > > static double mean(double[] array, double[] weights) { > if (array.length != weights.length) throw ...; > double sum = 0; > double wsum = 0; > for(int i = 0; i < array.length; i++) { > sum += array[i] * weights[i]; > wsum += weights[i]; > } > return sum / wsum; > } > > static double mean(double[] array) { > return mean(array, allOnes(array.length)); > } > > static double[] allOnes(int n) { > double[] d = new double[n]; > Arrays.fill(d, 1); > return d; > } > > Now suppose I call mean(double[]) overload like this: > > double[] d = {1,2,3,4}; > > Using 8u51 with C2 compiler: > > 1) it looks like the array allocation from allOnes isn't eliminated. > 2) moreover it looked like array was zeroed (rep stosd with rax holding > zero). Unless I misread the asm, I thought an allocation followed by > Arrays.fill skips the zeroing? > 3) ideally, this case would reduce to code that just does a plain > unweighted mean with no multiplication by the weight and no summation for the > weighted sum (weight sum is just array length). Is this simply too much > analysis to ask for? > > Thanks > > > -- > Sent from my phone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Thu Dec 24 01:55:01 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 23 Dec 2015 20:55:01 -0500 Subject: Optimization question In-Reply-To: References: Message-ID: Hi Kris, Thanks for the reply. Some comments below. On Wednesday, December 23, 2015, Krystal Mok wrote: > Hi Vitaly, > > For 2), there's this bug [1] which isn't really fixed, but rather, the > "skip zeroing of an array fully covered by a Arrays.fill()" optimization > was turned off for the moment. > That JBS talks about ints but I take it the problem is more general and the opto was disabled entirely? 
In the case I showed above, the fill is direct successor of the allocation with no uses of the zero values, but of course if the optimization is disabled entirely that would explain it. > > For 1), if all methods are inlined as expected, I believe it's because of > HotSpot C2's order of optimizations: escape analysis / scalar replacement > happens before loop optimizations. The latter can fully unroll your loop in > mean(double[] array, double[] weights) if the input array is a constant > like the one you tested, but it's too late -- the former cannot scalar > replace an array if there are stores to it with unknown indices. See [2] > for more details. > Yes everything is inlined here. So what stores are we talking about here? The weights array is filled, yes, but only read from in mean(). In fact, both arrays are read only. Also, in this particular example the array indices are "known" given the actual input array is constant. Thanks and happy holidays! > > - Kris > > [1]: https://bugs.openjdk.java.net/browse/JDK-7196857 > [2]: ConnectionGraph::adjust_scalar_replaceable_state() in opto/escape.cpp > // 3. An object is not scalar replaceable if it has a field with > unknown > // offset (array's element is accessed in loop). 
> > On Wed, Dec 23, 2015 at 4:56 PM, Vitaly Davidovich > wrote: > >> Hi guys, >> >> Consider code like this: >> >> static double mean(double[] array, double[] weights) { >> if (array.length != weights.length) throw ...; >> double sum = 0; >> double wsum = 0; >> for(int i = 0; i < array.length; i++) { >> sum += array[i] * weights[i]; >> wsum += weights[i]; >> } >> return sum / wsum; >> } >> >> static double mean(double[] array) { >> return mean(array, allOnes(array.length)); >> } >> >> static double[] allOnes(int n) { >> double[] d = new double[n]; >> Arrays.fill(d, 1); >> return d; >> } >> >> Now suppose I call mean(double[]) overload like this: >> >> double[] d = {1,2,3,4}; >> >> Using 8u51 with C2 compiler: >> >> 1) it looks like the array allocation from allOnes isn't eliminated. >> 2) moreover it looked like array was zeroed (rep stosd with rax holding >> zero). Unless I misread the asm, I thought an allocation followed by >> Arrays.fill skips the zeroing? >> 3) ideally, this case would reduce to code that just does a plain >> unweighted mean with no multiplication by the weight and no summation for the >> weighted sum (weight sum is just array length). Is this simply too much >> analysis to ask for? >> >> Thanks >> >> >> -- >> Sent from my phone >> > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Thu Dec 24 01:59:58 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Dec 2015 17:59:58 -0800 Subject: Optimization question In-Reply-To: References: Message-ID: <567B519E.9070409@oracle.com> Unfortunately whole loop unrolling happens after Escape analysis is done. As result we can't eliminate allocations since we don't know which element of arrays is referenced in loop: JavaObject NoEscape(NoEscape) NSR [ 397F 275F 276F 398F [ 197 202 ]] 185 AllocateArray === 127 124 178 8 1 ( 111 99 20 98 1 72 1 1 130 1 NSR - Non Scalar Replaceable. 
After loop is unrolled the result is calculated but arrays are still allocated. We do remove NSR allocations for boxing objects but not regular allocations: // Eliminate boxing allocations which are not used // regardless scalar replaceable status. bool boxing_alloc = C->eliminate_boxing() && tklass->klass()->is_instance_klass() && tklass->klass()->as_instance_klass()->is_box_klass(); if (!alloc->_is_scalar_replaceable && (!boxing_alloc || (res != NULL))) { return false; } Only allocation followed by arraycopy skips zeroing, not by fill() call. Arrays.fill() is implemented as loop. if (init != NULL && init->is_complete_with_arraycopy() && k->is_type_array_klass()) { // Don't zero type array during slow allocation in VM since // it will be initialized later by arraycopy in compiled code. slow_call_address = OptoRuntime::new_array_nozero_Java(); Regards, Vladimir On 12/23/15 4:56 PM, Vitaly Davidovich wrote: > Hi guys, > > Consider code like this: > > static double mean(double[] array, double[] weights) { > if (array.length != weights.length) throw ...; > double sum = 0; > double wsum = 0; > for(int i = 0; i < array.length; i++) { > sum += array[i] * weights[i]; > wsum += weights[i]; > } > return sum / wsum; > } > > static double mean(double[] array) { > return mean(array, allOnes(array.length)); > } > > static double[] allOnes(int n) { > double[] d = new double[n]; > Arrays.fill(d, 1); > return d; > } > > Now suppose I call mean(double[]) overload like this: > > double[] d = {1,2,3,4}; > > Using 8u51 with C2 compiler: > > 1) it looks like the array allocation from allOnes isn't eliminated. > 2) moreover it looked like array was zeroed (rep stosd with rax holding zero). Unless I misread the asm, I thought an > allocation followed by Arrays.fill skips the zeroing? > 3) ideally, this case would reduce to code that just does a plain unweighted mean with no multiplication by the weight > and no summation for the weighted sum (weight sum is just array length). 
Is this simply too much analysis to ask for? > > Thanks > > > -- > Sent from my phone From rednaxelafx at gmail.com Thu Dec 24 02:03:14 2015 From: rednaxelafx at gmail.com (Krystal Mok) Date: Wed, 23 Dec 2015 18:03:14 -0800 Subject: Optimization question In-Reply-To: References: Message-ID: Comments inline below: On Wed, Dec 23, 2015 at 5:55 PM, Vitaly Davidovich wrote: > Hi Kris, > > Thanks for the reply. Some comments below. > > On Wednesday, December 23, 2015, Krystal Mok > wrote: > >> Hi Vitaly, >> >> For 2), there's this bug [1] which isn't really fixed, but rather, the >> "skip zeroing of an array fully covered by a Arrays.fill()" optimization >> was turned off for the moment. >> > > That JBS talks about ints but I take it the problem is more general and > the opto was disabled entirely? In the case I showed above, the fill is > direct successor of the allocation with no uses of the zero values, but of > course if the optimization is disabled entirely that would explain it. > Yes, the problem is more general. There's an issue with the implementation of the optimization, which doesn't take proper dominance information into account, so if there's a read of an array element between the allocation and the Arrays.fill(), it might see a non-zero value. That's not just for int[]'s, but for all array types. So the optimization was temporarily turned off. I haven't been following the status of that bug, though, so I'm not sure if it's fixed in some newer version already. > >> For 1), if all methods are inlined as expected, I believe it's because of >> HotSpot C2's order of optimizations: escape analysis / scalar replacement >> happens before loop optimizations. The latter can fully unroll your loop in >> mean(double[] array, double[] weights) if the input array is a constant >> like the one you tested, but it's too late -- the former cannot scalar >> replace an array if there are stores to it with unknown indices. See [2] >> for more details. 
>> > > Yes everything is inlined here. So what stores are we talking about here? > The weights array is filled, yes, but only read from in mean(). In fact, > both arrays are read only. Also, in this particular example the array > indices are "known" given the actual input array is constant. > Oops, I misread your code (I only skimmed through it, didn't read it carefully, sorry!) But it's the same thing, if you have either a load or a store on a element whose index is unknown to EA, it can't scalar replace the array. > > Thanks and happy holidays! > The same to you! - Kris > >> - Kris >> >> [1]: https://bugs.openjdk.java.net/browse/JDK-7196857 >> [2]: ConnectionGraph::adjust_scalar_replaceable_state() in opto/escape.cpp >> // 3. An object is not scalar replaceable if it has a field with >> unknown >> // offset (array's element is accessed in loop). >> >> On Wed, Dec 23, 2015 at 4:56 PM, Vitaly Davidovich >> wrote: >> >>> Hi guys, >>> >>> Consider code like this: >>> >>> static double mean(double[] array, double[] weights) { >>> if (array.length != weights.length) throw ...; >>> double sum = 0; >>> double wsum = 0; >>> for(int i = 0; i < array.length; i++) { >>> sum += array[i] * weights[i]; >>> wsum += weights[i]; >>> } >>> return sum / wsum; >>> } >>> >>> static double mean(double[] array) { >>> return mean(array, allOnes(array.length)); >>> } >>> >>> static double[] allOnes(int n) { >>> double[] d = new double[n]; >>> Arrays.fill(d, 1); >>> return d; >>> } >>> >>> Now suppose I call mean(double[]) overload like this: >>> >>> double[] d = {1,2,3,4}; >>> >>> Using 8u51 with C2 compiler: >>> >>> 1) it looks like the array allocation from allOnes isn't eliminated. >>> 2) moreover it looked like array was zeroed (rep stosd with rax holding >>> zero). Unless I misread the asm, I thought an allocation followed by >>> Arrays.fill skips the zeroing? 
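Kris's rule, and the escape.cpp comment he quotes, depends on whether the index of each array access is known at compile time. The sketch below contrasts the two access shapes. The class and method names are hypothetical. Whether C2 actually scalar-replaces either array also depends on inlining, the JDK version, and other heuristics, so treat this as an illustration rather than a guarantee.

```java
// Hypothetical illustration of the two array-access shapes discussed above.
class IndexShapes {
    // Every element access uses a constant index, so each offset is known at
    // compile time. By itself, this shape does not trigger the "unknown offset"
    // rule quoted from escape.cpp.
    static double constantIndices(double a, double b) {
        double[] tmp = new double[2];
        tmp[0] = a;
        tmp[1] = b;
        return tmp[0] + tmp[1];
    }

    // Same computation, but the elements are read at a loop index. Unless the
    // loop is fully unrolled before escape analysis runs, the offset is unknown,
    // and that marks the array as not scalar replaceable.
    static double loopIndices(double a, double b) {
        double[] tmp = new double[2];
        tmp[0] = a;
        tmp[1] = b;
        double s = 0;
        for (int i = 0; i < tmp.length; i++) {
            s += tmp[i];
        }
        return s;
    }
}
```

Both methods compute the same value. They differ only in how the compiler sees the array indices.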
>>> 3) ideally, this case would reduce to code that just does a plain >>> unweighted mean with no multiplication by the weight and no summation for the >>> weighted sum (weight sum is just array length). Is this simply too much >>> analysis to ask for? >>> >>> Thanks >>> >>> >>> -- >>> Sent from my phone >>> >> >> > > -- > Sent from my phone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Thu Dec 24 02:13:33 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 23 Dec 2015 21:13:33 -0500 Subject: Optimization question In-Reply-To: <567B519E.9070409@oracle.com> References: <567B519E.9070409@oracle.com> Message-ID: Hi Vladimir, On Wednesday, December 23, 2015, Vladimir Kozlov wrote: > Unfortunately whole loop unrolling happens after Escape analysis is done. > As result we can't eliminate allocations since we don't know which element > of arrays is referenced in loop: > > JavaObject NoEscape(NoEscape) NSR [ 397F 275F 276F 398F [ 197 202 ]] > 185 AllocateArray === 127 124 178 8 1 ( 111 99 20 98 1 > 72 1 1 130 1 > > NSR - Non Scalar Replaceable. > > After loop is unrolled the result is calculated but arrays are still > allocated. Ah ok, that's what Kris was saying as well. But why does unrolling matter for this purpose? Even if loop is not unrolled is it not known which elements are accessed? Also, what do you mean by "result is calculated"? What result? :) > We do remove NSR allocations for boxing objects but not regular > allocations: > > // Eliminate boxing allocations which are not used > // regardless scalar replaceable status. > bool boxing_alloc = C->eliminate_boxing() && > tklass->klass()->is_instance_klass() && > tklass->klass()->as_instance_klass()->is_box_klass(); > if (!alloc->_is_scalar_replaceable && (!boxing_alloc || (res != NULL))) { > return false; > } > > Only allocation followed by arraycopy skips zeroing, not by fill() call. > Arrays.fill() is implemented as loop. 
> > if (init != NULL && init->is_complete_with_arraycopy() && > k->is_type_array_klass()) { > // Don't zero type array during slow allocation in VM since > // it will be initialized later by arraycopy in compiled code. > slow_call_address = OptoRuntime::new_array_nozero_Java(); Hmm, I'm pretty sure fill() following an allocation had the same zeroing elision applied to it. Kris notes the opto is turned off due to implementation issues, which would explain why I still see zeroing. Thanks > > Regards, > Vladimir > > On 12/23/15 4:56 PM, Vitaly Davidovich wrote: > >> Hi guys, >> >> Consider code like this: >> >> static double mean(double[] array, double[] weights) { >> if (array.length != weights.length) throw ...; >> double sum = 0; >> double wsum = 0; >> for(int i = 0; i < array.length; i++) { >> sum += array[i] * weights[i]; >> wsum += weights[i]; >> } >> return sum / wsum; >> } >> >> static double mean(double[] array) { >> return mean(array, allOnes(array.length)); >> } >> >> static double[] allOnes(int n) { >> double[] d = new double[n]; >> Arrays.fill(d, 1); >> return d; >> } >> >> Now suppose I call mean(double[]) overload like this: >> >> double[] d = {1,2,3,4}; >> >> Using 8u51 with C2 compiler: >> >> 1) it looks like the array allocation from allOnes isn't eliminated. >> 2) moreover it looked like array was zeroed (rep stosd with rax holding >> zero). Unless I misread the asm, I thought an >> allocation followed by Arrays.fill skips the zeroing? >> 3) ideally, this case would reduce to code that just does a plain >> unweighted mean with no multiplication by the weight >> and no summation for the weighted sum (weight sum is just array length). >> Is this simply too much analysis to ask for? >> >> Thanks >> >> >> -- >> Sent from my phone >> > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vladimir.kozlov at oracle.com Thu Dec 24 02:16:08 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Dec 2015 18:16:08 -0800 Subject: IGVN worklist ordering In-Reply-To: References: Message-ID: <567B5568.2010508@oracle.com> One of the problems is that the IGVN worklist is not a "true" queue. There is no guarantee that nodes will be processed in the order they were placed on the worklist. Long ago I played with a "real" queue worklist, but it only brought an increase in memory consumption. 1. We only delay transformation when there are merge paths, to wait until all paths are complete. 2. In reality you can't say that some path is dead until the data nodes, and the condition code that depends on them, are processed. We do try to eliminate dead nodes recursively when we find one - see remove_globally_dead_node() and kill_dead_code(). But it does not always work. Vladimir On 12/23/15 3:07 PM, Krystal Mok wrote: > Hi compiler team, > > I'd like to ask about the IGVN worklist ordering: > > 1. Is there a rule of thumb for which nodes should call record_for_igvn()? If so, in what order (e.g. data nodes vs. > their control dependence)? > > Apparently, nodes that get a record_for_igvn() call right after their creation usually want to delay the call to > transform(), perhaps due to potential optimizations to their control input, or because they're created after parsing and > might affect other nodes. > > But what would be a good list of guidelines about exactly what kind of nodes, or what kind of patterns in the IR, > should be considered as candidates to put onto the IGVN worklist, and vice versa? > > 2. Since it's problematic for IGVN to process a node whose control path is already dead (but not yet collapsed to TOP), > why isn't there a mechanism built into IGVN's worklist or PhaseIterGVN::transform_old() so that dead control paths > always collapse before IGVN decides to process a node?
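Vladimir says the worklist is not a FIFO queue. The hypothetical Java model below makes that concrete: the worklist holds each node index at most once and pops from the end. It is not HotSpot's actual worklist class. It only shows how a node recorded first can be processed last, and how recording an already-queued node again does not move it.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of a "unique" worklist: each node index appears at most once,
// and pop() takes from the end (LIFO). This is NOT HotSpot's implementation, only
// an illustration of why processing order can differ from recording order.
class UniqueWorklist {
    private final List<Integer> items = new ArrayList<>();
    private final BitSet inList = new BitSet();

    // Recording a node that is already queued is a no-op: it keeps its old position.
    void push(int nodeIdx) {
        if (!inList.get(nodeIdx)) {
            inList.set(nodeIdx);
            items.add(nodeIdx);
        }
    }

    boolean isEmpty() {
        return items.isEmpty();
    }

    // Removes and returns the most recently added node index.
    int pop() {
        int n = items.remove(items.size() - 1);
        inList.clear(n);
        return n;
    }
}
```

Suppose nodes 1, 2 and 3 are recorded and then node 1 is recorded again. This model processes them as 3, 2, 1, so node 1 is handled last even though it was recorded first and then re-recorded.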
> > > I've had a question on this topic for quite a while now, so I'm seeking advice from all of you who have had to deal with > bugs in this area. > There have been a lot of bugs related to the IGVN worklist ordering in the past, and most of the fixes do one of the following: > a) add missing record_for_igvn() calls for problematic nodes > b) switch the order of adjacent record_for_igvn() calls on related nodes > Is it possible, or a good idea, to fix it down at the core of IGVN? > > Happy holidays, guys! > > Best regards, > Kris From vitalyd at gmail.com Thu Dec 24 02:17:57 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 23 Dec 2015 21:17:57 -0500 Subject: Optimization question In-Reply-To: References: Message-ID: On Wednesday, December 23, 2015, Krystal Mok wrote: > Comments inline below: > > On Wed, Dec 23, 2015 at 5:55 PM, Vitaly Davidovich > wrote: > >> Hi Kris, >> >> Thanks for the reply. Some comments below. >> >> On Wednesday, December 23, 2015, Krystal Mok > > wrote: >> >>> Hi Vitaly, >>> >>> For 2), there's this bug [1] which isn't really fixed, but rather, the >>> "skip zeroing of an array fully covered by a Arrays.fill()" optimization >>> was turned off for the moment. >>> >> >> That JBS talks about ints but I take it the problem is more general and >> the opto was disabled entirely? In the case I showed above, the fill is >> direct successor of the allocation with no uses of the zero values, but of >> course if the optimization is disabled entirely that would explain it. >> > > Yes, the problem is more general. There's an issue with the implementation > of the optimization, which doesn't take proper dominance information into > account, so if there's a read of an array element between the allocation > and the Arrays.fill(), it might see a non-zero value. That's not just for > int[]'s, but for all array types. > So the optimization was temporarily turned off.
I haven't been following > the status of that bug, though, so I'm not sure if it's fixed in some newer > version already. > Understood, thanks > > >> >>> For 1), if all methods are inlined as expected, I believe it's because >>> of HotSpot C2's order of optimizations: escape analysis / scalar >>> replacement happens before loop optimizations. The latter can fully unroll >>> your loop in mean(double[] array, double[] weights) if the input array is a >>> constant like the one you tested, but it's too late -- the former cannot >>> scalar replace an array if there are stores to it with unknown indices. See >>> [2] for more details. >>> >> >> Yes everything is inlined here. So what stores are we talking about here? >> The weights array is filled, yes, but only read from in mean(). In fact, >> both arrays are read only. Also, in this particular example the array >> indices are "known" given the actual input array is constant. >> > > Oops, I misread your code (I only skimmed through it, didn't read it > carefully, sorry!) > But it's the same thing, if you have either a load or a store on a element > whose index is unknown to EA, it can't scalar replace the array. > No worries. Ok so Vladimir mentioned the same thing about any access that EA doesn't know about. I guess I'm still unclear on why unrolling needs to happen given array length can be deduced, loop stride is constant, and loop body shows which arrays and indices are accessed and in what manner (read, write, both). It seems like all the info is there even without unrolling. Is this just an implementation detail or am I missing something fundamental? > > >> >> Thanks and happy holidays! >> > > The same to you! > > - Kris > > >> >>> - Kris >>> >>> [1]: https://bugs.openjdk.java.net/browse/JDK-7196857 >>> [2]: ConnectionGraph::adjust_scalar_replaceable_state() in >>> opto/escape.cpp >>> // 3. An object is not scalar replaceable if it has a field with >>> unknown >>> // offset (array's element is accessed in loop). 
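One way to see what full unrolling buys in this example is to write out mean(array, weights) by hand for a length-4 input. After unrolling, every access uses a constant index, which is the property Kris says escape analysis needs. The fixed length and the method names below are assumptions for illustration only.

```java
// Hand-unrolled sketch of mean(array, weights) for a length-4 input.
// The fixed length is an illustrative assumption; real code would keep the loop.
class UnrolledMean {
    static double mean4(double[] a, double[] w) {
        if (a.length != 4 || w.length != 4) {
            throw new IllegalArgumentException("sketch assumes length 4");
        }
        // Every index is a compile-time constant, so each element is a known "field".
        double sum = a[0] * w[0] + a[1] * w[1] + a[2] * w[2] + a[3] * w[3];
        double wsum = w[0] + w[1] + w[2] + w[3];
        return sum / wsum;
    }

    // What the computation reduces to once the weights are known to be all ones:
    // no multiplications, and the weight sum is simply the length.
    static double plainMean4(double[] a) {
        return (a[0] + a[1] + a[2] + a[3]) / 4;
    }
}
```

With all-ones weights, mean4 and plainMean4 return the same value (2.5 for `{1,2,3,4}`). That plain unweighted mean is the reduction Vitaly hoped the compiler would find.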
>>> >>> On Wed, Dec 23, 2015 at 4:56 PM, Vitaly Davidovich >>> wrote: >>> >>>> Hi guys, >>>> >>>> Consider code like this: >>>> >>>> static double mean(double[] array, double[] weights) { >>>> if (array.length != weights.length) throw ...; >>>> double sum = 0; >>>> double wsum = 0; >>>> for(int i = 0; i < array.length; i++) { >>>> sum += array[i] * weights[i]; >>>> wsum += weights[i]; >>>> } >>>> return sum / wsum; >>>> } >>>> >>>> static double mean(double[] array) { >>>> return mean(array, allOnes(array.length)); >>>> } >>>> >>>> static double[] allOnes(int n) { >>>> double[] d = new double[n]; >>>> Arrays.fill(d, 1); >>>> return d; >>>> } >>>> >>>> Now suppose I call mean(double[]) overload like this: >>>> >>>> double[] d = {1,2,3,4}; >>>> >>>> Using 8u51 with C2 compiler: >>>> >>>> 1) it looks like the array allocation from allOnes isn't eliminated. >>>> 2) moreover it looked like array was zeroed (rep stosd with rax holding >>>> zero). Unless I misread the asm, I thought an allocation followed by >>>> Arrays.fill skips the zeroing? >>>> 3) ideally, this case would reduce to code that just does a plain >>>> unweighted mean with no multiplication by the weight and no summation for the >>>> weighted sum (weight sum is just array length). Is this simply too much >>>> analysis to ask for? >>>> >>>> Thanks >>>> >>>> >>>> -- >>>> Sent from my phone >>>> >>> >>> >> >> -- >> Sent from my phone >> > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rednaxelafx at gmail.com Thu Dec 24 02:20:14 2015 From: rednaxelafx at gmail.com (Krystal Mok) Date: Wed, 23 Dec 2015 18:20:14 -0800 Subject: Optimization question In-Reply-To: References: <567B519E.9070409@oracle.com> Message-ID: Comments inline below: On Wed, Dec 23, 2015 at 6:13 PM, Vitaly Davidovich wrote: > Hi Vladimir, > > On Wednesday, December 23, 2015, Vladimir Kozlov < > vladimir.kozlov at oracle.com> wrote: > >> Unfortunately whole loop unrolling happens after Escape analysis is done. >> As result we can't eliminate allocations since we don't know which >> element of arrays is referenced in loop: >> >> JavaObject NoEscape(NoEscape) NSR [ 397F 275F 276F 398F [ 197 202 ]] >> 185 AllocateArray === 127 124 178 8 1 ( 111 99 20 98 1 >> 72 1 1 130 1 >> >> NSR - Non Scalar Replaceable. >> >> After loop is unrolled the result is calculated but arrays are still >> allocated. > > > Ah ok, that's what Kris was saying as well. But why does unrolling matter > for this purpose? Even if loop is not unrolled is it not known which > elements are accessed? > When the loop is fully unrolled, the indices of the elements in the array becomes a compile-time constant. That, in turn, can be recognized by EA, allowing the array to be scalar replaceable -- if EA happens after loop unrolling, that is. > > Also, what do you mean by "result is calculated"? What result? :) > Vladimir is probably talking about the "sum" and "wsum" local variable in your program is constant folded after loop unrolling. But the array had to stay. - Kris > > >> We do remove NSR allocations for boxing objects but not regular >> allocations: >> >> // Eliminate boxing allocations which are not used >> // regardless scalar replaceable status. 
>> bool boxing_alloc = C->eliminate_boxing() && >> tklass->klass()->is_instance_klass() && >> >> tklass->klass()->as_instance_klass()->is_box_klass(); >> if (!alloc->_is_scalar_replaceable && (!boxing_alloc || (res != NULL))) >> { >> return false; >> } >> >> Only allocation followed by arraycopy skips zeroing, not by fill() call. >> Arrays.fill() is implemented as loop. >> >> if (init != NULL && init->is_complete_with_arraycopy() && >> k->is_type_array_klass()) { >> // Don't zero type array during slow allocation in VM since >> // it will be initialized later by arraycopy in compiled code. >> slow_call_address = OptoRuntime::new_array_nozero_Java(); > > > Hmm, I'm pretty sure fill() following an allocation had the same zeroing > elision applied to it. Kris notes the opto is turned off due to > implementation issues, which would explain why I still see zeroing. > > Thanks > >> >> Regards, >> Vladimir >> >> On 12/23/15 4:56 PM, Vitaly Davidovich wrote: >> >>> Hi guys, >>> >>> Consider code like this: >>> >>> static double mean(double[] array, double[] weights) { >>> if (array.length != weights.length) throw ...; >>> double sum = 0; >>> double wsum = 0; >>> for(int i = 0; i < array.length; i++) { >>> sum += array[i] * weights[i]; >>> wsum += weights[i]; >>> } >>> return sum / wsum; >>> } >>> >>> static double mean(double[] array) { >>> return mean(array, allOnes(array.length)); >>> } >>> >>> static double[] allOnes(int n) { >>> double[] d = new double[n]; >>> Arrays.fill(d, 1); >>> return d; >>> } >>> >>> Now suppose I call mean(double[]) overload like this: >>> >>> double[] d = {1,2,3,4}; >>> >>> Using 8u51 with C2 compiler: >>> >>> 1) it looks like the array allocation from allOnes isn't eliminated. >>> 2) moreover it looked like array was zeroed (rep stosd with rax holding >>> zero). Unless I misread the asm, I thought an >>> allocation followed by Arrays.fill skips the zeroing? 
>>> 3) ideally, this case would reduce to code that just does a plain >>> unweighted mean with no multiplication by the weight >>> and no summation for the weighted sum (weight sum is just array >>> length). Is this simply too much analysis to ask for? >>> >>> Thanks >>> >>> >>> -- >>> Sent from my phone >>> >> > > -- > Sent from my phone > From vladimir.kozlov at oracle.com Thu Dec 24 02:22:32 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Dec 2015 18:22:32 -0800 Subject: Optimization question In-Reply-To: References: <567B519E.9070409@oracle.com> Message-ID: <567B56E8.2080502@oracle.com> On 12/23/15 6:13 PM, Vitaly Davidovich wrote: > Hi Vladimir, > > On Wednesday, December 23, 2015, Vladimir Kozlov > wrote: > > Unfortunately whole loop unrolling happens after Escape analysis is done. > As result we can't eliminate allocations since we don't know which element of arrays is referenced in loop: > > JavaObject NoEscape(NoEscape) NSR [ 397F 275F 276F 398F [ 197 202 ]] 185 AllocateArray === 127 124 178 > 8 1 ( 111 99 20 98 1 72 1 1 130 1 > > NSR - Non Scalar Replaceable. > > After loop is unrolled the result is calculated but arrays are still allocated. > > > Ah ok, that's what Kris was saying as well. But why does unrolling matter for this purpose? Even if loop is not > unrolled is it not known which elements are accessed? If the loop is not fully unrolled, we can't eliminate the load instructions, since we don't know which element is loaded: for(int i = 0; i < array.length; i++) { sum += array[i] * weights[i]; > > Also, what do you mean by "result is calculated"? What result?
:) After loop is unrolled mean()is collapsed to return pre-calculated result if we have both allocation inlined: static double result; static void test() { double[] d = {1,2,3,4}; result = mean(d); } 10c # MachConstantBaseNode (empty encoding) 10c movsd XMM0, [constant table base + #0] # load from constant table: double=#2.500000 114 movq R10, java/lang/Class:exact * # ptr 11e movsd [R10 + #104 (8-bit)], XMM0 # double ! Field: TestFillArray.result 124 addq rsp, 16 # Destroy frame popq rbp testl rax, [rip + #offset_to_poll_page] # Safepoint: poll for GC 12f ret Vladimir > > > We do remove NSR allocations for boxing objects but not regular allocations: > > // Eliminate boxing allocations which are not used > // regardless scalar replaceable status. > bool boxing_alloc = C->eliminate_boxing() && > tklass->klass()->is_instance_klass() && > tklass->klass()->as_instance_klass()->is_box_klass(); > if (!alloc->_is_scalar_replaceable && (!boxing_alloc || (res != NULL))) { > return false; > } > > Only allocation followed by arraycopy skips zeroing, not by fill() call. Arrays.fill() is implemented as loop. > > if (init != NULL && init->is_complete_with_arraycopy() && > k->is_type_array_klass()) { > // Don't zero type array during slow allocation in VM since > // it will be initialized later by arraycopy in compiled code. > slow_call_address = OptoRuntime::new_array_nozero_Java(); > > > Hmm, I'm pretty sure fill() following an allocation had the same zeroing elision applied to it. Kris notes the opto is > turned off due to implementation issues, which would explain why I still see zeroing. 
> > Thanks > > > Regards, > Vladimir > > On 12/23/15 4:56 PM, Vitaly Davidovich wrote: > > Hi guys, > > Consider code like this: > > static double mean(double[] array, double[] weights) { > if (array.length != weights.length) throw ...; > double sum = 0; > double wsum = 0; > for(int i = 0; i < array.length; i++) { > sum += array[i] * weights[i]; > wsum += weights[i]; > } > return sum / wsum; > } > > static double mean(double[] array) { > return mean(array, allOnes(array.length)); > } > > static double[] allOnes(int n) { > double[] d = new double[n]; > Arrays.fill(d, 1); > return d; > } > > Now suppose I call mean(double[]) overload like this: > > double[] d = {1,2,3,4}; > > Using 8u51 with C2 compiler: > > 1) it looks like the array allocation from allOnes isn't eliminated. > 2) moreover it looked like array was zeroed (rep stosd with rax holding zero). Unless I misread the asm, I > thought an > allocation followed by Arrays.fill skips the zeroing? > 3) ideally, this case would reduce to code that just does a plain unweighted mean with no multiplication by the > weight > and no summation for the weighted sum (weight sum is just array length). Is this simply too much analysis to > ask for? > > Thanks > > > -- > Sent from my phone > > > > -- > Sent from my phone From vitalyd at gmail.com Thu Dec 24 02:30:13 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 23 Dec 2015 21:30:13 -0500 Subject: Optimization question In-Reply-To: <567B56E8.2080502@oracle.com> References: <567B519E.9070409@oracle.com> <567B56E8.2080502@oracle.com> Message-ID: On Wednesday, December 23, 2015, Vladimir Kozlov wrote: > On 12/23/15 6:13 PM, Vitaly Davidovich wrote: > >> Hi Vladimir, >> >> On Wednesday, December 23, 2015, Vladimir Kozlov < >> vladimir.kozlov at oracle.com > wrote: >> >> Unfortunately whole loop unrolling happens after Escape analysis is >> done. 
>> As result we can't eliminate allocations since we don't know which >> element of arrays is referenced in loop: >> >> JavaObject NoEscape(NoEscape) NSR [ 397F 275F 276F 398F [ 197 202 ]] >> 185 AllocateArray === 127 124 178 >> 8 1 ( 111 99 20 98 1 72 1 1 130 1 >> >> NSR - Non Scalar Replaceable. >> >> After loop is unrolled the result is calculated but arrays are still >> allocated. >> >> >> Ah ok, that's what Kris was saying as well. But why does unrolling >> matter for this purpose? Even if loop is not >> unrolled is it not known which elements are accessed? >> > > If loop is not whole unrolled we can't eliminate load instructions sine we > don't which element is loaded: Ok. Can't say I understand why it needs to be fully unrolled to determine which elements will be accessed, but that's fine - thanks. > > for(int i = 0; i < array.length; i++) { > sum += array[i] * weights[i]; > > >> Also, what do you mean by "result is calculated"? What result? :) >> > > After loop is unrolled mean()is collapsed to return pre-calculated result > if we have both allocation inlined: > > static double result; > > static void test() { > double[] d = {1,2,3,4}; > result = mean(d); > } > > 10c # MachConstantBaseNode (empty encoding) > 10c movsd XMM0, [constant table base + #0] # load from > constant table: double=#2.500000 > 114 movq R10, java/lang/Class:exact * # ptr > 11e movsd [R10 + #104 (8-bit)], XMM0 # double ! Field: > TestFillArray.result > 124 addq rsp, 16 # Destroy frame > popq rbp > testl rax, [rip + #offset_to_poll_page] # Safepoint: poll > for GC > 12f ret Interesting, I'm pretty sure I didn't see a precomputed constant returned but I'll double check again tomorrow. What's this pseudo assembly above? Is that available in product builds or is it some debug build output? Thanks for the replies. 
> > Vladimir > > >> >> We do remove NSR allocations for boxing objects but not regular >> allocations: >> >> // Eliminate boxing allocations which are not used >> // regardless scalar replaceable status. >> bool boxing_alloc = C->eliminate_boxing() && >> tklass->klass()->is_instance_klass() && >> >> tklass->klass()->as_instance_klass()->is_box_klass(); >> if (!alloc->_is_scalar_replaceable && (!boxing_alloc || (res != >> NULL))) { >> return false; >> } >> >> Only allocation followed by arraycopy skips zeroing, not by fill() >> call. Arrays.fill() is implemented as loop. >> >> if (init != NULL && init->is_complete_with_arraycopy() && >> k->is_type_array_klass()) { >> // Don't zero type array during slow allocation in VM since >> // it will be initialized later by arraycopy in compiled code. >> slow_call_address = OptoRuntime::new_array_nozero_Java(); >> >> >> Hmm, I'm pretty sure fill() following an allocation had the same zeroing >> elision applied to it. Kris notes the opto is >> turned off due to implementation issues, which would explain why I still >> see zeroing. >> >> Thanks >> >> >> Regards, >> Vladimir >> >> On 12/23/15 4:56 PM, Vitaly Davidovich wrote: >> >> Hi guys, >> >> Consider code like this: >> >> static double mean(double[] array, double[] weights) { >> if (array.length != weights.length) throw ...; >> double sum = 0; >> double wsum = 0; >> for(int i = 0; i < array.length; i++) { >> sum += array[i] * weights[i]; >> wsum += weights[i]; >> } >> return sum / wsum; >> } >> >> static double mean(double[] array) { >> return mean(array, allOnes(array.length)); >> } >> >> static double[] allOnes(int n) { >> double[] d = new double[n]; >> Arrays.fill(d, 1); >> return d; >> } >> >> Now suppose I call mean(double[]) overload like this: >> >> double[] d = {1,2,3,4}; >> >> Using 8u51 with C2 compiler: >> >> 1) it looks like the array allocation from allOnes isn't >> eliminated. >> 2) moreover it looked like array was zeroed (rep stosd with rax >> holding zero). 
Unless I misread the asm, I >> thought an >> allocation followed by Arrays.fill skips the zeroing? >> 3) ideally, this case would reduce to code that just does a plain >> unweighted mean with no multiplication by the >> weight >> and no summation for the weighted sum (weight sum is just array >> length). Is this simply too much analysis to >> ask for? >> >> Thanks >> >> >> -- >> Sent from my phone >> >> >> >> -- >> Sent from my phone >> > -- Sent from my phone From vitalyd at gmail.com Thu Dec 24 02:39:59 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 23 Dec 2015 21:39:59 -0500 Subject: Optimization question In-Reply-To: References: <567B519E.9070409@oracle.com> <567B56E8.2080502@oracle.com> Message-ID: By the way, would it make sense to run the EA pass again if at least one loop was fully unrolled in the loop transformation pass? Or is this too expensive? I know EA is costly so just wondering. On Wednesday, December 23, 2015, Vitaly Davidovich wrote: > > > On Wednesday, December 23, 2015, Vladimir Kozlov < > vladimir.kozlov at oracle.com > > wrote: > >> On 12/23/15 6:13 PM, Vitaly Davidovich wrote: >> >>> Hi Vladimir, >>> >>> On Wednesday, December 23, 2015, Vladimir Kozlov < >>> vladimir.kozlov at oracle.com > wrote: >>> >>> Unfortunately whole loop unrolling happens after Escape analysis is >>> done. >>> As result we can't eliminate allocations since we don't know which >>> element of arrays is referenced in loop: >>> >>> JavaObject NoEscape(NoEscape) NSR [ 397F 275F 276F 398F [ 197 202 >>> ]] 185 AllocateArray === 127 124 178 >>> 8 1 ( 111 99 20 98 1 72 1 1 130 1 >>> >>> NSR - Non Scalar Replaceable. >>> >>> After loop is unrolled the result is calculated but arrays are still >>> allocated. >>> >>> >>> Ah ok, that's what Kris was saying as well. But why does unrolling >>> matter for this purpose?
Even if loop is not >>> unrolled is it not known which elements are accessed? >>> >> >> If loop is not whole unrolled we can't eliminate load instructions sine >> we don't which element is loaded: > > > Ok. Can't say I understand why it needs to be fully unrolled to determine > which elements will be accessed, but that's fine - thanks. > > >> >> for(int i = 0; i < array.length; i++) { >> sum += array[i] * weights[i]; >> >> >>> Also, what do you mean by "result is calculated"? What result? :) >>> >> >> After loop is unrolled mean()is collapsed to return pre-calculated result >> if we have both allocation inlined: >> >> static double result; >> >> static void test() { >> double[] d = {1,2,3,4}; >> result = mean(d); >> } >> >> 10c # MachConstantBaseNode (empty encoding) >> 10c movsd XMM0, [constant table base + #0] # load from >> constant table: double=#2.500000 >> 114 movq R10, java/lang/Class:exact * # ptr >> 11e movsd [R10 + #104 (8-bit)], XMM0 # double ! Field: >> TestFillArray.result >> 124 addq rsp, 16 # Destroy frame >> popq rbp >> testl rax, [rip + #offset_to_poll_page] # Safepoint: poll >> for GC >> 12f ret > > > Interesting, I'm pretty sure I didn't see a precomputed constant returned > but I'll double check again tomorrow. > > What's this pseudo assembly above? Is that available in product builds or > is it some debug build output? > > Thanks for the replies. > > >> >> Vladimir >> >> >>> >>> We do remove NSR allocations for boxing objects but not regular >>> allocations: >>> >>> // Eliminate boxing allocations which are not used >>> // regardless scalar replaceable status. >>> bool boxing_alloc = C->eliminate_boxing() && >>> tklass->klass()->is_instance_klass() && >>> >>> tklass->klass()->as_instance_klass()->is_box_klass(); >>> if (!alloc->_is_scalar_replaceable && (!boxing_alloc || (res != >>> NULL))) { >>> return false; >>> } >>> >>> Only allocation followed by arraycopy skips zeroing, not by fill() >>> call. Arrays.fill() is implemented as loop. 
>>> >>> if (init != NULL && init->is_complete_with_arraycopy() && >>> k->is_type_array_klass()) { >>> // Don't zero type array during slow allocation in VM since >>> // it will be initialized later by arraycopy in compiled code. >>> slow_call_address = OptoRuntime::new_array_nozero_Java(); >>> >>> >>> Hmm, I'm pretty sure fill() following an allocation had the same zeroing >>> elision applied to it. Kris notes the opto is >>> turned off due to implementation issues, which would explain why I still >>> see zeroing. >>> >>> Thanks >>> >>> >>> Regards, >>> Vladimir >>> >>> On 12/23/15 4:56 PM, Vitaly Davidovich wrote: >>> >>> Hi guys, >>> >>> Consider code like this: >>> >>> static double mean(double[] array, double[] weights) { >>> if (array.length != weights.length) throw ...; >>> double sum = 0; >>> double wsum = 0; >>> for(int i = 0; i < array.length; i++) { >>> sum += array[i] * weights[i]; >>> wsum += weights[i]; >>> } >>> return sum / wsum; >>> } >>> >>> static double mean(double[] array) { >>> return mean(array, allOnes(array.length)); >>> } >>> >>> static double[] allOnes(int n) { >>> double[] d = new double[n]; >>> Arrays.fill(d, 1); >>> return d; >>> } >>> >>> Now suppose I call mean(double[]) overload like this: >>> >>> double[] d = {1,2,3,4}; >>> >>> Using 8u51 with C2 compiler: >>> >>> 1) it looks like the array allocation from allOnes isn't >>> eliminated. >>> 2) moreover it looked like array was zeroed (rep stosd with rax >>> holding zero). Unless I misread the asm, I >>> thought an >>> allocation followed by Arrays.fill skips the zeroing? >>> 3) ideally, this case would reduce to code that just does a >>> plain unweighted mean with no multiplication by the >>> weight >>> and no summation for the weighted sum (weight sum is just array >>> length). Is this simply too much analysis to >>> ask for? 
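The distinction Vladimir draws between the two allocation shapes can be written out in Java. This is only a sketch: the method names are hypothetical, and whether C2 actually skips zeroing for the second shape depends on the JVM version and on how the code is compiled. Nothing here measures that.

```java
import java.util.Arrays;

public class ZeroingShapes {
    // Shape 1: allocation followed by Arrays.fill(). Per the thread, on 8u
    // C2 this does not qualify for zeroing elision, because fill() is
    // compiled as an ordinary loop.
    static double[] filled(int n, double value) {
        double[] d = new double[n];
        Arrays.fill(d, value);
        return d;
    }

    // Shape 2: allocation immediately followed by an arraycopy that covers
    // the whole new array. This is the shape the quoted
    // is_complete_with_arraycopy() check is concerned with.
    static double[] copied(double[] src) {
        double[] d = new double[src.length];
        System.arraycopy(src, 0, d, 0, src.length);
        return d;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(filled(4, 1.0)));              // [1.0, 1.0, 1.0, 1.0]
        System.out.println(Arrays.toString(copied(new double[] {1, 2}))); // [1.0, 2.0]
    }
}
```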
>>> >>> Thanks >>> >>> >>> -- >>> Sent from my phone >>> >>> >>> >>> -- >>> Sent from my phone >>> >> > > -- > Sent from my phone > -- Sent from my phone From rednaxelafx at gmail.com Thu Dec 24 02:42:47 2015 From: rednaxelafx at gmail.com (Krystal Mok) Date: Wed, 23 Dec 2015 18:42:47 -0800 Subject: IGVN worklist ordering In-Reply-To: <567B5568.2010508@oracle.com> References: <567B5568.2010508@oracle.com> Message-ID: Hi Vladimir, Thanks a lot for your reply! I really appreciate it. On Wed, Dec 23, 2015 at 6:16 PM, Vladimir Kozlov wrote: > One of the problems is that IGVN worklist is not "true" queue. There is no > guarantee that nodes will be processed in the order they were placed on > worklist. Long ago I played with "real" queue worklist but it bring only > increase in memory consumption. > > 1. We only delay transformation when there are merge paths to wait when > all paths are complete. > Yes, this is a good rule of thumb. Let's take Parse::sharpen_type_after_if() for example. // Look for opportunities to sharpen the type of a node // whose klass is compared with a constant klass. if (btest == BoolTest::eq && tcon->isa_klassptr()) { Node* obj = extract_obj_from_klass_load(&_gvn, val); const TypeOopPtr* con_type = tcon->isa_klassptr()->as_instance_type(); if (obj != NULL && (con_type->isa_instptr() || con_type->isa_aryptr())) { // Found: // Bool(CmpP(LoadKlass(obj._klass), ConP(Foo.klass)), [eq]) // or the narrowOop equivalent. const Type* obj_type = _gvn.type(obj); const TypeOopPtr* tboth = obj_type->join(con_type)->isa_oopptr(); if (tboth != NULL && tboth->klass_is_exact() && tboth != obj_type && tboth->higher_equal(obj_type)) { // obj has to be of the exact type Foo if the CmpP succeeds.
int obj_in_map = map()->find_edge(obj); JVMState* jvms = this->jvms(); if (obj_in_map >= 0 && (jvms->is_loc(obj_in_map) || jvms->is_stk(obj_in_map))) { TypeNode* ccast = new (C) CheckCastPPNode(control(), obj, tboth); const Type* tcc = ccast->as_Type()->type(); assert(tcc != obj_type && tcc->higher_equal(obj_type), "must improve"); // Delay transform() call to allow recovery of pre-cast value // at the control merge. _gvn.set_type_bottom(ccast); record_for_igvn(ccast); // Here's the payoff. replace_in_map(obj, ccast); } } } } The CheckCastPP node replaces the original value on a certain path after a control split. It's not a node whose control is a merge point (Region, Loop, etc); the merge point after this split should see the original (pre-cast) value. Then is the record_for_igvn(ccast) call necessary here? If so, how does it really work? (I wrote the code above, but thinking of it now, I don't seem to fully understand why I wrote it that way...) > 2. In reality you can't say that some path is dead until some data nodes > and depending on them condition code are processed. We do try to eliminate > dead nodes recursively when we find one - see remove_globally_dead_node() > and kill_dead_code(). But it not always work. > > Thanks a lot for the pointer. I'm looking at kill_dead_code() right now. I might have additional questions with regards to them later. Happy holidays! Best regards, Kris > Vladimir > > > On 12/23/15 3:07 PM, Krystal Mok wrote: > >> Hi compiler team, >> >> I'd like to ask about the IGVN worklist ordering: >> >> 1. Is there a rule of thumb of which nodes should call record_for_igvn()? >> If so, in what order (e.g. data nodes vs. >> their control dependence)? >> >> Apparently, nodes that get a record_for_igvn() call right after their >> creation usually wants to delay the call to >> transform(), perhaps due to potential optimizations to their control >> input, or because they're created after parsing and >> might affect other nodes.
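Vladimir's remark that the IGVN worklist is not a "true" queue can be illustrated with a small duplicate-free worklist. This is a hypothetical Java analogy, assuming LIFO pop and a removal that fills the hole with the last entry; it is not HotSpot's implementation. With those two rules, nodes pushed as a, b, c, d can come out as c, b, d after one removal, so push order says little about processing order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical duplicate-free worklist; an analogy, not HotSpot code. */
public class UniqueWorklist<T> {
    private final List<T> items = new ArrayList<>();
    private final Set<T> members = new HashSet<>();

    // A node that is already on the list is not queued a second time.
    public void push(T node) {
        if (members.add(node)) {
            items.add(node);
        }
    }

    // LIFO: the most recently pushed node is processed first.
    public T pop() {
        T node = items.remove(items.size() - 1);
        members.remove(node);
        return node;
    }

    // Removal moves the last entry into the vacated slot, reordering the list.
    public void remove(T node) {
        if (!members.remove(node)) {
            return;
        }
        int i = items.indexOf(node);
        T last = items.remove(items.size() - 1);
        if (i < items.size()) {
            items.set(i, last);
        }
    }

    public boolean isEmpty() {
        return items.isEmpty();
    }
}
```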
>> >> But what would be a good list of guidelines about exactly what kind of >> nodes, or what kind of patterns in the IR, that >> should be considered as candidates to put onto the IGVN worklist, and >> vice versa? >> >> 2. Since it's problematic for IGVN to process a node whose control path >> is already dead (but not yet collapsed to TOP), >> why isn't there a mechanism built-in to IGVN's worklist or >> PhaseIterGVN::transform_old() so that dead control paths >> always collapses before IGVN decides to process a node? >> >> >> I've had a doubt on this topic for quite a while now, so I'm seeking >> advice from all of you who have had to deal with >> bugs in this area. >> There have been a lot of bugs related to the IGVN worklist ordering in >> the past, the most of the fixes does either: >> a) add missing record_for_igvn() calls for problematic nodes >> b) switch the order of adjacent record_for_igvn() calls on related nodes >> Is it possible or a good idea to fix it down at the core of IGVN? >> >> Happy holidays, guys! >> >> Best regards, >> Kris >> > From rednaxelafx at gmail.com Thu Dec 24 02:49:26 2015 From: rednaxelafx at gmail.com (Krystal Mok) Date: Wed, 23 Dec 2015 18:49:26 -0800 Subject: Optimization question In-Reply-To: References: Message-ID: Resending to the list... On Wed, Dec 23, 2015 at 6:17 PM, Vitaly Davidovich wrote: > Ok so Vladimir mentioned the same thing about any access that EA doesn't > know about. I guess I'm still unclear on why unrolling needs to happen > given array length can be deduced, loop stride is constant, and loop body > shows which arrays and indices are accessed and in what manner (read, > write, both). It seems like all the info is there even without unrolling. > Is this just an implementation detail or am I missing something fundamental? > > It's an implementation detail of HotSpot C2.
The loop structure information that you're talking about is actually not available until loop optimization kicks in: - Counted loops are discovered (which computes the loop stride and bounds) - Loops are then unrolled (and only then do the indices become constants) Unfortunately that happens after EA. Now, suppose EA tries to handle the "array" in your example, and scalar-replaces its elements with local variables e0, e1, e2 and e3. Then what does "array[i]" translate to? If the index "i" is known to be a constant, e.g. 2, then array[2] would be translated to e2. Otherwise there's no straightforward way to translate it: since the 4 new local variables are "unrelated" (not guaranteed to be packed together anymore), you can't even try to efficiently make an interior pointer that dynamically points to them. - Kris From vitalyd at gmail.com Thu Dec 24 02:56:40 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 23 Dec 2015 21:56:40 -0500 Subject: Optimization question In-Reply-To: References: Message-ID: On Wednesday, December 23, 2015, Krystal Mok wrote: > Resending to the list... > > On Wed, Dec 23, 2015 at 6:17 PM, Vitaly Davidovich > wrote: > >> Ok so Vladimir mentioned the same thing about any access that EA doesn't >> know about. I guess I'm still unclear on why unrolling needs to happen >> given array length can be deduced, loop stride is constant, and loop body >> shows which arrays and indices are accessed and in what manner (read, >> write, both). It seems like all the info is there even without unrolling. >> Is this just an implementation detail or am I missing something fundamental? >> >> > It's an implementation detail of HotSpot C2.
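To make the e0..e3 argument concrete, here is a hand-written, source-level analogy. C2 works on its IR, not on Java source, so this only illustrates the idea: with a variable index a load cannot be tied to one scalar, while after full unrolling (done by hand here for length 4) every index is a constant and each element can live in its own local. Both method names are hypothetical.

```java
import java.util.Arrays;

public class ScalarReplacementSketch {
    // Loop form: weights[i] uses a variable index, so no single scalar can
    // stand in for the load; the array has to remain a real allocation.
    static double meanLoop(double[] array) {
        double[] weights = new double[array.length];
        Arrays.fill(weights, 1);
        double sum = 0;
        double wsum = 0;
        for (int i = 0; i < array.length; i++) {
            sum += array[i] * weights[i];
            wsum += weights[i];
        }
        return sum / wsum;
    }

    // Hand-unrolled form, assuming array.length == 4: every index is a
    // constant, so each weight becomes a separate local (w0..w3, playing
    // the role of e0..e3) and the weights array is gone.
    static double meanUnrolled4(double[] array) {
        double w0 = 1, w1 = 1, w2 = 1, w3 = 1;
        double sum = 0;
        sum += array[0] * w0;
        sum += array[1] * w1;
        sum += array[2] * w2;
        sum += array[3] * w3;
        double wsum = w0 + w1 + w2 + w3;
        return sum / wsum;
    }

    public static void main(String[] args) {
        double[] d = {1, 2, 3, 4};
        System.out.println(meanLoop(d));      // 2.5
        System.out.println(meanUnrolled4(d)); // 2.5
    }
}
```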
> > The loop structure information that you're talking about are actually not > available until when loop optimization kicks in: > - Counted loops are discovered (which computes the loop stride and bounds) > - Loops are then unrolled (and only then the indices become constants) > > Unfortunately that happens after EA. > > Now, suppose EA tries to handle the "array" in your example, and scalar > replaces its elements into local variable e0, e1, e2 and e3. > Then what does "array[i]" translate to? If the index "i" is known as a > constant, e.g. 2, then array[2] would be translated to e2. > Otherwise there no straightforward way to translate it, since the 4 new > local variable are "unrelated" (not guaranteed to be packed together > anymore), you can't even try to efficiently make an interior pointer to > dynamically point to them. > Yeah, I see how piggybacking on unrolling helps. My thinking was that you know the range of indices that will be accessed (assuming the constant array length is propagated), including the number of items. If the number of accesses is within some threshold for scalar replacement, you could then allocate that many locals and assign them from the array based on the range and stride. I should also mention that Kris told me about PrintOptoAssembly, which is what Vladimir's output was; alas, it's a non-product flag, so I'll be sticking to reading the raw full assembly :). At any rate, I think you guys have fully answered my original questions, so thanks very much for taking the time to do that. > - Kris > -- Sent from my phone From vladimir.kozlov at oracle.com Thu Dec 24 03:13:25 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Dec 2015 19:13:25 -0800 Subject: IGVN worklist ordering In-Reply-To: References: <567B5568.2010508@oracle.com> Message-ID: <567B62D5.9010908@oracle.com> > Then is the record_for_igvn(ccast) call necessary here?
If so, how does it really work? Consider the following code: class A {} class A1 extends A {} class A2 extends A {} void test(A a) { if (b1) { a = new A1(); } else if (b2) { a = new A2(); } if (a.getClass() == A.class) // Here cast (A)a could be incorrectly replaced by // (A1)a cast if the second path was not processed yet Vladimir On 12/23/15 6:42 PM, Krystal Mok wrote: > Hi Vladimir, > > Thanks a lot for your reply! I really appreciate it. > > On Wed, Dec 23, 2015 at 6:16 PM, Vladimir Kozlov > wrote: > > One of the problems is that IGVN worklist is not "true" queue. There is no guarantee that nodes will be processed in > the order they were placed on worklist. Long ago I played with "real" queue worklist but it bring only increase in > memory consumption. > > 1. We only delay transformation when there are merge paths to wait when all paths are complete. > > > Yes, this is a good rule of thumb. > > Let's take Parse::sharpen_type_after_if() for example. > > // Look for opportunities to sharpen the type of a node > // whose klass is compared with a constant klass. > if (btest == BoolTest::eq && tcon->isa_klassptr()) { > Node* obj = extract_obj_from_klass_load(&_gvn, val); > const TypeOopPtr* con_type = tcon->isa_klassptr()->as_instance_type(); > if (obj != NULL && (con_type->isa_instptr() || con_type->isa_aryptr())) { > // Found: > // Bool(CmpP(LoadKlass(obj._klass), ConP(Foo.klass)), [eq]) > // or the narrowOop equivalent. > const Type* obj_type = _gvn.type(obj); > const TypeOopPtr* tboth = obj_type->join(con_type)->isa_oopptr(); > if (tboth != NULL && tboth->klass_is_exact() && tboth != obj_type && > tboth->higher_equal(obj_type)) { > // obj has to be of the exact type Foo if the CmpP succeeds.
> int obj_in_map = map()->find_edge(obj); > JVMState* jvms = this->jvms(); > if (obj_in_map >= 0 && > (jvms->is_loc(obj_in_map) || jvms->is_stk(obj_in_map))) { > TypeNode* ccast = new (C) CheckCastPPNode(control(), obj, tboth); > const Type* tcc = ccast->as_Type()->type(); > assert(tcc != obj_type && tcc->higher_equal(obj_type), "must improve"); > // Delay transform() call to allow recovery of pre-cast value > // at the control merge. > _gvn.set_type_bottom(ccast); > record_for_igvn(ccast); > // Here's the payoff. > replace_in_map(obj, ccast); > } > } > } > } > > The CheckCastPP node replaces the original value on a certain path after a control split. It's not a node whose control > is a merge point (Region, Loop, etc); the merge point after this split should see the original (pre-cast) value. > Then is the record_for_igvn(ccast) call necessary here? If so, how does it really work? > > (I wrote the code above, but thinking of it now, I don't seem to fully understand why I wrote it that way...) > > > 2. In reality you can't say that some path is dead until some data nodes and depending on them condition code are > processed. We do try to eliminate dead nodes recursively when we find one - see remove_globally_dead_node() and > kill_dead_code(). But it not always work. > > Thanks a lot for the pointer. I'm looking at kill_dead_code() right now. I might have additional questions with regards > to them later. > > Happy holidays! > > Best regards, > Kris > > Vladimir > > > On 12/23/15 3:07 PM, Krystal Mok wrote: > > Hi compiler team, > > I'd like to ask about the IGVN worklist ordering: > > 1. Is there a rule of thumb of which nodes should call record_for_igvn()? If so, in what order (e.g. data nodes vs. > their control dependence)?
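Vladimir's example can be made runnable as follows. The flags b1 and b2 become parameters (an assumption; the original leaves them undefined), and the method returns a string instead of casting. The a.getClass() == A.class test is the shape Parse::sharpen_type_after_if() looks for. At that point a may be an A, an A1 or an A2, so the only sound type for the merged value is A; a cast narrowed to A1 because only one incoming path had been processed would be wrong for the other paths.

```java
public class ExactTypeCheck {
    public static class A {}
    public static class A1 extends A {}
    public static class A2 extends A {}

    // b1 and b2 are parameters here (assumption: undefined in the original).
    public static String test(A a, boolean b1, boolean b2) {
        if (b1) {
            a = new A1();
        } else if (b2) {
            a = new A2();
        }
        // After the merge, a may be an A, an A1 or an A2: its type is A.
        // The exact-class test is the pattern the parser can use to sharpen
        // a's type, but only inside the taken branch.
        if (a.getClass() == A.class) {
            return "exact A";
        }
        return a.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(test(new A(), false, false)); // exact A
        System.out.println(test(new A(), true, false));  // A1
        System.out.println(test(new A(), false, true));  // A2
    }
}
```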
> > Apparently, nodes that get a record_for_igvn() call right after their creation usually wants to delay the call to > transform(), perhaps due to potential optimizations to their control input, or because they're created after > parsing and > might affect other nodes. > > But what would be a good list of guidelines about exactly what kind of nodes, or what kind of patterns in the > IR, that > should be considered as candidates to put onto the IGVN worklist, and vice versa? > > 2. Since it's problematic for IGVN to process a node whose control path is already dead (but not yet collapsed > to TOP), > why isn't there a mechanism built-in to IGVN's worklist or PhaseIterGVN::transform_old() so that dead control paths > always collapses before IGVN decides to process a node? > > > I've had a doubt on this topic for quite a while now, so I'm seeking advice from all of you who have had to deal > with > bugs in this area. > There have been a lot of bugs related to the IGVN worklist ordering in the past, the most of the fixes does either: > a) add missing record_for_igvn() calls for problematic nodes > b) switch the order of adjacent record_for_igvn() calls on related nodes > Is it possible or a good idea to fix it down at the core of IGVN? > > Happy holidays, guys! > > Best regards, > Kris > > From vladimir.kozlov at oracle.com Thu Dec 24 05:11:13 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Dec 2015 21:11:13 -0800 Subject: RFR (M): 8145688: Update for x86 pow in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A569DFEAD@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A569DCDB0@ORSMSX106.amr.corp.intel.com> <5678646B.8000307@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569DFEAD@ORSMSX106.amr.corp.intel.com> Message-ID: <567B7E71.1040005@oracle.com> Looks good. I will push it shortly.
Thanks, Vladimir On 12/22/15 4:55 PM, Deshpande, Vivek R wrote: > Hi Vladimir > > Please find the updated webrev for pow at this location for your review. > http://cr.openjdk.java.net/~vdeshpande/libm_pow/8145688/webrev.00/ > Bug ID: https://bugs.openjdk.java.net/browse/JDK-8145688 > Thank you. > > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, December 21, 2015 12:43 PM > To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net > Cc: Viswanathan, Sandhya > Subject: Re: RFR (M): 8145688: Update for x86 pow in the math lib > > Hi Vivek. > > Also always use {}: > > + if (VM_Version::supports_sse3()) > StubRoutines::_dlog = generate_libmLog(); > + StubRoutines::_dpow = generate_libmPow(); > > I see fast_log() is using movddup() sse3 instruction. Can you replace it with sse2 equivalent and have conditional code generation depending on sse3 presence instead of limiting it in stubGenerator? > > And please add comments to all #endif in macroAssembler_x86_libm.cpp. I forgot to ask that in previous changes. > When #ifdef block is big we put comment to see what scope is that: > > #endif // _LP64 > > #endif // !_LP64 > > Thanks, > Vladimir > > On 12/18/15 7:05 PM, Deshpande, Vivek R wrote: >> Hi all >> >> I would like to contribute a patch which optimizes Math.pow() for 64 and 32 bit X86 architecture using Intel LIBM >> implementation. >> >> This passes all the jtreg test in hotspot and PowTests.java in jdk/tests/java/lang/Math. >> >> Could you please review and sponsor this patch. 
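A quick sanity check one might run alongside PowTests.java compares Math.pow, which may or may not go through the new stub depending on whether the calls are interpreted or compiled, with StrictMath.pow. This is a hedged sketch: the inputs are arbitrary, and the 2-ulp bound used in the check follows only from Math.pow's documented 1-ulp accuracy plus the assumption that the fdlibm-based StrictMath result is also within 1 ulp of the exact value. It says nothing specific about this patch.

```java
public class PowUlpCheck {
    // Distance between value and reference, in units of ulp(reference).
    // Meant for finite, non-zero references only.
    static double ulpDistance(double value, double reference) {
        return Math.abs(value - reference) / Math.ulp(reference);
    }

    public static void main(String[] args) {
        double[][] cases = { {2, 10}, {10, -3}, {1.5, 2.5}, {0.5, 0.25}, {7, 1.0 / 3} };
        for (double[] c : cases) {
            double fast = Math.pow(c[0], c[1]);         // may use the platform stub
            double strict = StrictMath.pow(c[0], c[1]); // fdlibm-based reference
            System.out.printf("pow(%s, %s): %.1f ulp%n", c[0], c[1], ulpDistance(fast, strict));
        }
    }
}
```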
>> >> Bug-id: >> >> https://bugs.openjdk.java.net/browse/JDK-8145688 >> webrev: >> >> http://cr.openjdk.java.net/~mcberg/8145688/webrev.02/ >> >> Thanks and regards, >> >> Vivek >> From igor.ignatyev at oracle.com Thu Dec 24 20:52:04 2015 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 24 Dec 2015 23:52:04 +0300 Subject: RFR(XS) : 8146205 : quarantine compiler/jvmci/compilerToVM/ExecuteInstalledCodeTest.java Message-ID: <76EF69C7-EBBC-4B88-BCED-F42763CFFE1E@oracle.com> http://cr.openjdk.java.net/~iignatyev/8146205/webrev.00/ > 1 line changed: 1 ins; 0 del; 0 mod; Hi all, Could you please review the patch which quarantines compiler/jvmci/compilerToVM/ExecuteInstalledCodeTest.java? ExecuteInstalledCodeTest fails intermittently even w/o extra VM flags, so it should be quarantined while 8139383 isn't fixed. 8146205 : https://bugs.openjdk.java.net/browse/JDK-8146205 8139383 : https://bugs.openjdk.java.net/browse/JDK-8139383 Thanks, Igor From kishor.kharbas at intel.com Thu Dec 24 22:26:18 2015 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Thu, 24 Dec 2015 22:26:18 +0000 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <566228AD.6060704@oracle.com> References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> Message-ID: Hello all, Thank you Vladimir and Anthony for your inputs so far. I have updated the hotspot based on the suggestions and also added CTR mode to jtreg test. During testing I also noticed that the Java code for CounterMode.crypt() reuses the partially used encrypted counter from the previous invocation and also saves the last encryptedCounter for the next invocation. This case was not handled by the intrinsic. I have fixed this in the latest patch. Summary of changes: 1. Proper disabling of UseAESCTRIntrinsic flag based on hardware support 2. Adding the missing support explained above. 3. Added CTR mode in jtreg test 7184394 4.
Added and changed some encodings (pextr and pinsr) in assembler_x86.cpp The updated hotspot webrev is at : http://cr.openjdk.java.net/~vdeshpande/8143925/webrev.00/ There is no update to jdk webrev posted earlier which is http://cr.openjdk.java.net/~mcberg/8143925/jdk/webrev.02/ Bug id : https://bugs.openjdk.java.net/browse/JDK-8143925 Much appreciated! Happy holidays! Kishor -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, December 04, 2015 3:59 PM To: Kharbas, Kishor; hotspot-compiler-dev at openjdk.java.net Cc: Anthony Scarpino Subject: Re: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES jdk: http://cr.openjdk.java.net/~mcberg/8143925/jdk/webrev.02/ JDK changes looks good to me. hotspot: http://cr.openjdk.java.net/~mcberg/8143925/hotspot/webrev.04/ Please, set flag to 'false' on platforms which does not support this intrinsic: if (UseAESCTRIntrinsics) { warning("AES/CTR intrinsics are not available on this CPU"); FLAG_SET_DEFAULT(UseAESCTRIntrinsics, false); } Also Anthony asked to add test for this intrinsic. Please do it: "2) It would be good to add CTR to the TestAES tests. It's in hotspot/test/compiler/codegen/7184394/. The test currently has CBC, ECB, and GCM in it, so it should be easy. It's also the only test I know of that tests the intrinsic. None of the tests in the jdk repo that I know of loop enough to trigger the intrinsic." Thanks, Vladimir On 12/4/15 1:40 PM, Kharbas, Kishor wrote: > Thanks Vladimir for the feedback! > > I have updated the jbs entry with the new patch. > > JDK changes : added range checks in the JDK using additional methods. > Hotspot changes : renamed the UseCTRAESIntrinsics flag to > UseAESCTRIntrinsics > > Further review and feedback is appreciated! 
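The state Kishor describes, an encrypted counter block that one crypt() call only partly consumes and the next call must resume, can be sketched in Java on top of plain AES block encryption. This is a hypothetical helper, not the JDK's CounterMode: it assumes a 16-byte counter incremented big-endian over the whole block and processes one byte at a time for clarity, not speed. Because each keystream block depends only on its counter value, blocks can be generated independently, which is what allows an intrinsic to work on several blocks (6 in this patch) in parallel.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical CTR helper, not com.sun.crypto.provider.CounterMode. */
public class CtrSketch {
    private final Cipher aesEcb;                          // raw AES block encryption
    private final byte[] counter;                         // current counter block
    private final byte[] encryptedCounter = new byte[16]; // current keystream block
    private int used = 16;                                // keystream bytes already consumed

    public CtrSketch(byte[] key, byte[] iv) throws Exception {
        aesEcb = Cipher.getInstance("AES/ECB/NoPadding");
        aesEcb.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        counter = iv.clone();
    }

    // Big-endian increment of the full 128-bit counter, with carry.
    private void increment() {
        for (int i = counter.length - 1; i >= 0; i--) {
            if (++counter[i] != 0) {
                break;
            }
        }
    }

    /** XORs len bytes of in with the keystream, resuming mid-block if needed. */
    public void crypt(byte[] in, int inOff, int len, byte[] out, int outOff) throws Exception {
        for (int k = 0; k < len; k++) {
            if (used == 16) {
                // Next keystream block: depends only on the counter value.
                aesEcb.doFinal(counter, 0, 16, encryptedCounter, 0);
                increment();
                used = 0;
            }
            out[outOff + k] = (byte) (in[inOff + k] ^ encryptedCounter[used++]);
        }
    }
}
```

Splitting one message into uneven chunks (for example 5, 13, 20 and 7 bytes) should give the same ciphertext as a single AES/CTR/NoPadding doFinal over the whole message; carrying the leftover keystream across calls is exactly what makes that hold.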
> > - Kishor > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, December 01, 2015 5:32 PM > To: Kharbas, Kishor; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES > > Hotspot changes seems fine. But JDK changes should have additional method for range checks - this is new requirement for intrinsics which access arrays. See, for example, cryptBlockCheck() in AESCrypt.java. > > Thanks, > Vladimir > > On 11/24/15 2:33 PM, Kharbas, Kishor wrote: >> Hello all, >> >> I request the community to review a patch for enhancing >> CounterMode.crypt() for AES. This patch defines intrinsic for >> CounterMode.crypt() to leverage the parallel nature of AES in Counter >> (CTR) Mode. >> >> This is achieved by operating on 6 blocks in parallel to issue >> independent x86 AES-NI instructions and keep the CPU pipeline full. >> >> Testing on micro-benchmark has shown a speedup of 4x-6x. >> >> Bug id: >> >> https://bugs.openjdk.java.net/browse/JDK-8143925 >> >> Webrev: >> >> hotspot: >> http://cr.openjdk.java.net/~mcberg/8143925/hotspot/webrev.02/ >> >> jdk: http://cr.openjdk.java.net/~mcberg/8143925/jdk/webrev.01/ >> >> Much appreciated! >> >> Kishor Kharbas >> From vladimir.kozlov at oracle.com Fri Dec 25 00:00:00 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 24 Dec 2015 16:00:00 -0800 Subject: RFR(XS) : 8146205 : quarantine compiler/jvmci/compilerToVM/ExecuteInstalledCodeTest.java In-Reply-To: <76EF69C7-EBBC-4B88-BCED-F42763CFFE1E@oracle.com> References: <76EF69C7-EBBC-4B88-BCED-F42763CFFE1E@oracle.com> Message-ID: <